Web Spam Fraud Detection Based on Cost-Sensitive Methods

Published: 2018-05-30 23:20

  Topics: web spam detection + cost-sensitive learning; source: Southwest Jiaotong University, 2017 master's thesis


[Abstract]: Over the past 20 years, the rapid development of Internet technology has produced a steady stream of websites and Web applications that make daily life more convenient. As a by-product of this growth, the Web also hosts a large number of spam pages containing fraudulent or harmful content, and these pages, spread by spammers, seriously harm the interests of Internet users. Accurately identifying and detecting such spam pages is a current focus of research.

This thesis first addresses binary spam page detection. It examines the different costs incurred when a spam page and a normal page are misclassified, and adopts a detection method based on a cost-sensitive support vector machine (SVM). Because many cost-sensitive schemes require the costs to be specified by hand, the thesis builds a spam page detection framework that incorporates cost computation, based on the particle swarm optimization (PSO) algorithm. Specifically, the cost-sensitive SVM is wrapped as the PSO fitness function: the cost parameters of the cost-sensitive classifier are the variables that PSO optimizes, and the AUC of the classifier is the output of the fitness function. This preserves detection performance while reducing the influence of human factors on the algorithm.

Second, the thesis studies multi-level spam page detection, which is finer-grained than binary detection and requires spam pages to be detected according to their degree of harm. Multi-level detection is implemented with a "one-vs-one" combination of cost-sensitive SVMs for multi-class classification. This approach preserves detection performance and avoids the problem of merging costs in the cost matrix. PSO is again used to compute the multiple misclassification costs.

Based on the original class labels of the UK2007 spam page dataset, the thesis constructs a new three-class dataset, MC-UK2007. UK2007 and MC-UK2007 are then used for binary and multi-class detection experiments with integrated cost computation, and several groups of comparison experiments are run with other algorithms. The results show that both proposed methods achieve better AUC values, indicating that they detect spam pages more effectively.
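The idea of wrapping a cost-sensitive classifier as a PSO fitness function can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the thesis's implementation: the classifier is a linear SVM with class-weighted hinge loss trained by subgradient descent (the abstract does not state the SVM variant or kernel), the data are synthetic two-dimensional points and not UK2007, only a single scalar cost for the spam class is searched, and all function names and parameter values (`train_cs_linear_svm`, `pso_maximize`, the inertia and acceleration constants) are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC as the fraction of (spam, normal) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((1.0 if p > n else (0.5 if p == n else 0.0)) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def decision(w, b, x):
    """Signed distance-like score of a linear classifier."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_cs_linear_svm(X, y, cost_spam, cost_normal=1.0, epochs=20, lr=0.05, lam=0.01):
    """Stand-in cost-sensitive SVM: hinge loss weighted by a per-class cost."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            t = 1.0 if yi == 1 else -1.0
            c = cost_spam if yi == 1 else cost_normal
            margin = t * decision(w, b, xi)
            w = [wj * (1.0 - lr * lam) for wj in w]      # L2 shrinkage
            if margin < 1.0:                              # margin violated: subgradient step
                w = [wj + lr * c * t * xj for wj, xj in zip(w, xi)]
                b += lr * c * t
    return w, b

def fitness(cost_spam, train, valid):
    """PSO fitness: train with the candidate cost, return validation AUC."""
    w, b = train_cs_linear_svm(train[0], train[1], cost_spam)
    return auc([decision(w, b, x) for x in valid[0]], valid[1])

def pso_maximize(f, lo, hi, n_particles=6, iters=10, seed=0, inertia=0.7, c1=1.5, c2=1.5):
    """Plain one-dimensional PSO that maximizes f over [lo, hi]."""
    rng = random.Random(seed)
    pos = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vel = [0.0] * n_particles
    pbest, pbest_val = pos[:], [f(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            vel[i] = (inertia * vel[i]
                      + c1 * rng.random() * (pbest[i] - pos[i])
                      + c2 * rng.random() * (gbest - pos[i]))
            pos[i] = min(hi, max(lo, pos[i] + vel[i]))    # keep the cost inside its bounds
            val = f(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i], val
    return gbest, gbest_val

def make_data(n_spam, n_normal, seed):
    """Synthetic imbalanced data: spam (label 1) is the minority class."""
    rng = random.Random(seed)
    rows = [([rng.gauss(1.0, 1.0), rng.gauss(1.0, 1.0)], 1) for _ in range(n_spam)]
    rows += [([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0) for _ in range(n_normal)]
    rng.shuffle(rows)
    return [r[0] for r in rows], [r[1] for r in rows]

train = make_data(20, 100, seed=1)
valid = make_data(20, 100, seed=2)
best_cost, best_auc = pso_maximize(lambda c: fitness(c, train, valid), lo=1.0, hi=20.0)
print(f"best spam cost ~ {best_cost:.2f}, validation AUC = {best_auc:.3f}")
```

For the three-class case described in the abstract, one plausible extension would be to let each particle hold a vector of costs, one set per pairwise one-vs-one classifier, in place of the single scalar searched here.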
[Degree-granting institution]: Southwest Jiaotong University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP393.092; TP18
