Research on Intrusion Detection Technology Based on Mutual Information and KNN
Published: 2019-04-22 12:14
[Abstract]: The rapid development of network technology has made network security problems increasingly serious and security protection measures increasingly important. Intrusion detection is a prevention-oriented, dynamic security measure. It has long been an active research topic in information security and plays a central role in the field.

To address the low efficiency of the traditional KNN algorithm, this thesis proposes a fast KNN (F-KNN) algorithm with three improvements:

First, prune the training set. The many duplicate records in the training set are removed, which reduces the computation required during classification.

Second, build an index model. One training sample is chosen at random as a reference point. The distance from every other training sample to this reference point is computed and sorted in ascending order, giving an ordered linear list. Samples at equal intervals in this list are then extracted to form an index table. The index table and the ordered list together are used to locate the k nearest neighbors of a test sample quickly, which narrows the search range.

Third, add a cache. Before a test sample is classified, it is compared with the cached test samples that have already been classified. If an identical sample is found, its class label is assigned directly. Otherwise the sample is classified in the usual way.

The experiments use the KDD CUP99 data set. The data set is first preprocessed, a mutual-information-based feature reduction algorithm is then used for feature selection, and finally F-KNN performs anomaly detection on the reduced data set. The experimental results show that F-KNN substantially improves classification efficiency without reducing classification accuracy.
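The three improvements described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the class name FastKNN, the parameters step and seed, the use of Euclidean distance, majority voting, and the stopping rule based on the triangle inequality are all assumptions made here to obtain a runnable example.

```python
import bisect
import heapq
import math
import random
from collections import Counter


class FastKNN:
    """Illustrative sketch: deduplication + reference-distance index + result cache."""

    def __init__(self, samples, labels, k=3, step=4, seed=0):
        # Improvement 1: drop exact duplicate (features, label) pairs.
        unique = list(dict.fromkeys((tuple(x), y) for x, y in zip(samples, labels)))
        # Improvement 2: order the samples by distance to a random reference point.
        self.ref = random.Random(seed).choice(unique)[0]
        rows = sorted(((math.dist(x, self.ref), x, y) for x, y in unique),
                      key=lambda r: r[0])
        self.dists = [r[0] for r in rows]
        self.points = [r[1] for r in rows]
        self.labels = [r[2] for r in rows]
        # Index table: every `step`-th distance of the ordered list (assumed layout).
        self.step = step
        self.index = self.dists[::step]
        self.k = k
        # Improvement 3: cache of already classified test samples.
        self.cache = {}

    def _start(self, dq):
        # Pick a block through the index table, then search only inside that block.
        block = max(bisect.bisect_right(self.index, dq) - 1, 0)
        lo = block * self.step
        hi = min(lo + self.step, len(self.dists))
        return bisect.bisect_left(self.dists, dq, lo, hi)

    def _neighbors(self, q):
        dq = math.dist(q, self.ref)
        right = self._start(dq)
        left = right - 1
        best = []  # max-heap through negated distances: (-distance, position)
        while left >= 0 or right < len(self.dists):
            gap_l = dq - self.dists[left] if left >= 0 else math.inf
            gap_r = self.dists[right] - dq if right < len(self.dists) else math.inf
            if gap_l <= gap_r:
                gap, i, left = gap_l, left, left - 1
            else:
                gap, i, right = gap_r, right, right + 1
            # Triangle inequality: dist(q, x) >= |dist(x, ref) - dist(q, ref)|,
            # so no remaining sample can beat the current k-th best distance.
            if len(best) == self.k and gap >= -best[0][0]:
                break
            d = math.dist(q, self.points[i])
            if len(best) < self.k:
                heapq.heappush(best, (-d, i))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, i))
        return [i for _, i in best]

    def predict(self, q):
        key = tuple(q)
        if key not in self.cache:  # cache miss: classify, then remember the label
            votes = Counter(self.labels[i] for i in self._neighbors(key))
            self.cache[key] = votes.most_common(1)[0][0]
        return self.cache[key]
```

A call such as FastKNN(train_x, train_y, k=3).predict(sample) returns the majority label among the k nearest training samples. The search expands outward from the position found through the index table, so only samples whose reference distance is close to that of the test sample are compared in full.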
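[Degree-granting institution]: East China University of Science and Technology
[Degree level]: Master's
[Year conferred]: 2014
[Classification number]: TP393.08
Article ID: 2462823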
Article link: http://www.wukwdryxk.cn/guanlilunwen/ydhl/2462823.html