Research on Malicious Web Page and PDF Document Detection Technology Based on the SVM Model
Published: 2019-04-11 15:02
[Abstract]: While the Internet provides people with more convenient and faster information services, its openness and fragility also open the door to attackers. Among current network attacks, the most prevalent approach uses script elements as the carrier of attack code and exploits vulnerabilities in browsers and their plug-ins to covertly download and execute malicious programs on the client, and thereby attack the user. This typical web page Trojan attack already poses a serious threat to Internet security.

Traditional antivirus engines based on static signatures mainly detect web page Trojans by signature matching. This method cannot detect obfuscated malicious code, and the static signature database grows very large over time, which eventually degrades detection performance. It is therefore necessary to study a technique that can quickly detect obfuscated malicious code without relying on a static signature database. In addition, with the wide use of PDF documents and the many vulnerabilities in PDF reader software, PDF documents have gradually become a propagation vector for web page Trojans. A hybrid-sample detection engine that can detect both malicious web pages and malicious PDF documents therefore has broad market prospects.

Based on these considerations, this thesis analyzes the structure of Web samples and PDF samples, and then uses support vector machine (SVM) technology, which is grounded in statistical learning theory, together with shellcode emulation based on dynamic execution, to implement a detection engine that can quickly detect malicious code hidden in web pages or PDF documents. The main work of the thesis is as follows:

(1) The attack and defense techniques for web page Trojans are comprehensively summarized. The thesis describes the basic attack principles and attack methods of web page Trojans, and analyzes the defense techniques for different stages (such as the website server side, the intermediate proxy side, and the client side) along with their advantages and disadvantages.

(2) SVM technology is used to detect obfuscated malicious web page code, which overcomes the shortcomings of traditional detection based on static signatures. The structure of each sample under test is analyzed and the JS code it contains is extracted; an SVM is then trained on a large number of JS feature characters to obtain a classifier that can distinguish malicious samples from benign ones, which enables fast detection (classification) of obfuscated malicious code.

(3) The stream objects in the PDF document structure are statically analyzed to extract the JS code nested in them, and the SVM-based detection is then applied to that JS code, which enables the detection of malicious PDF documents.

(4) A dynamic emulation tool is used to run the shellcode contained in malicious scripts, which yields a detailed behavior analysis report for the malicious code and helps analysts examine it intuitively and in detail.
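Items (2) and (3) can be illustrated with a minimal sketch of the preprocessing they imply: pulling candidate JS out of PDF stream objects and turning it into a few numeric features that a classifier such as an SVM could consume. The abstract does not specify the thesis's parser or feature set, so every name here (`extract_js_from_pdf`, `js_features`, `build_sample_pdf`) and the choice of features are assumptions for illustration only. The sketch handles only unfiltered and Flate-compressed streams, and the SVM training step itself is not shown.

```python
from __future__ import annotations

import re
import zlib

# Raw bytes between the "stream" and "endstream" keywords. The lookbehind
# keeps the tail of "endstream" from being read as a new "stream" keyword.
STREAM_RE = re.compile(rb"(?<!end)stream\r?\n(.*?)\nendstream", re.DOTALL)
# Crude hint that a decoded stream holds JavaScript (illustrative assumption).
JS_HINT_RE = re.compile(r"eval|unescape|function|app\.")


def extract_js_from_pdf(pdf_bytes: bytes) -> list[str]:
    """Return decoded stream bodies that look like JavaScript."""
    scripts = []
    for body in STREAM_RE.findall(pdf_bytes):
        try:
            # Flate-compressed stream; trailing bytes after the data are ignored.
            body = zlib.decompressobj().decompress(body)
        except zlib.error:
            pass  # not zlib data: treat the stream as uncompressed
        text = body.decode("latin-1")
        if JS_HINT_RE.search(text):
            scripts.append(text)
    return scripts


def js_features(script: str) -> dict[str, int]:
    """Hypothetical obfuscation indicators, not the thesis's feature set."""
    return {
        "length": len(script),
        "eval_calls": script.count("eval("),
        "unescape_calls": script.count("unescape("),
        "unicode_escapes": len(re.findall(r"%u[0-9a-fA-F]{4}", script)),
        "longest_token": max((len(tok) for tok in script.split()), default=0),
    }


def build_sample_pdf(js: str) -> bytes:
    """Build a tiny PDF-like byte string with one Flate-compressed stream."""
    payload = zlib.compress(js.encode("latin-1"))
    header = b"%PDF-1.4\n1 0 obj\n<< /Length " + str(len(payload)).encode()
    return (header + b" /Filter /FlateDecode >>\nstream\n" + payload
            + b"\nendstream\nendobj\n%%EOF\n")


SAMPLE_JS = 'var s = unescape("%u9090%u9090"); eval(s);'
scripts = extract_js_from_pdf(build_sample_pdf(SAMPLE_JS))
features = js_features(scripts[0])
print(features)
```

In a full pipeline, feature vectors like these would be computed for labeled malicious and benign samples and passed to an SVM trainer. A real PDF parser would also need to handle object streams, chained filters, and encrypted documents, which this sketch ignores.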
[Degree-granting institution]: Jiangxi University of Science and Technology
[Degree level]: Master's
[Year degree awarded]: 2014
[Classification number]: TP393.092; TP393.08
Article ID: 2456507
Article link: http://www.wukwdryxk.cn/guanlilunwen/ydhl/2456507.html