Identifying Weibo Advertisement Publishers Based on Cluster Analysis
Published: 2018-10-14 18:40
[Abstract]: Weibo contains a large amount of advertising content, which seriously degrades the experience of ordinary users and interferes with related research. Existing studies mostly use classification algorithms such as support vector machines (SVM) or random forests to handle advertising microblogs, but classification methods require large manually labeled training sets, which are difficult to build. This paper therefore proposes a method for identifying Weibo advertisement publishers based on cluster analysis. At the user level, advertisement publishers dilute their advertising content by posting large numbers of ordinary microblogs; to address this, the paper introduces the concept of core microblogs. The method extracts each user's core microblog topics and the corresponding microblog sequences, computes user features and the text features of the corresponding microblogs, and clusters these features with a clustering algorithm to identify advertisement publishers. Experimental results show a precision of 92%, a recall of 97%, and an F-value of 95%, which demonstrates that the method can accurately identify Weibo advertisement publishers even when the advertising content has been deliberately diluted, and that it can provide theoretical support and a practical method for identifying and cleaning up Weibo spam.
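The pipeline described in the abstract (core microblogs, then features, then clustering) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: posts are hypothetical `(topic, text)` pairs, the "core" topic is simply the user's most frequent topic, the two features (`link_ratio`, `repeat_ratio`) are invented stand-ins for the paper's user and text features, the clustering step is plain k-means with k = 2, and the cluster with the higher link ratio is treated as the suspected advertiser group.

```python
from collections import Counter
from math import dist


def core_sequence(posts):
    """Return the texts that share the user's most frequent topic label.

    Assumption: the "core" topic is just the most common topic; the paper's
    actual extraction step is not described in the abstract.
    """
    top_topic, _ = Counter(topic for topic, _ in posts).most_common(1)[0]
    return [text for topic, text in posts if topic == top_topic]


def features(posts):
    """Two hypothetical text features, computed on the core sequence only."""
    core = core_sequence(posts)
    link_ratio = sum("http" in text for text in core) / len(core)
    repeat_ratio = 1 - len(set(core)) / len(core)
    return (link_ratio, repeat_ratio)


def kmeans(points, k=2, iterations=20):
    """Plain k-means (Lloyd's algorithm) with a deterministic farthest-point start."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda j: dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels, centroids


# Toy, made-up accounts: user_b and user_d pad a repeated advert with ordinary posts.
users = {
    "user_a": [("food", "great noodles today"), ("food", "trying a new cafe"),
               ("travel", "photos from the lake")],
    "user_b": [("weather", "so hot today"), ("music", "love this song"),
               ("sports", "what a match"), ("movie", "good film"),
               ("sale", "buy now http://shop.example"),
               ("sale", "buy now http://shop.example"),
               ("sale", "half price http://shop.example")],
    "user_c": [("music", "new album is out"),
               ("music", "concert tonight http://tickets.example"),
               ("music", "great encore"), ("food", "late dinner")],
    "user_d": [("food", "breakfast"), ("travel", "on the train"),
               ("promo", "free gift http://deal.example"),
               ("promo", "free gift http://deal.example")],
}

points = [features(posts) for posts in users.values()]
labels, centroids = kmeans(points, k=2)

# Assumption: the cluster whose centroid has the higher link ratio is the advertiser group.
ad_cluster = max(range(len(centroids)), key=lambda j: centroids[j][0])
suspects = [name for name, label in zip(users, labels) if label == ad_cluster]
print(suspects)
```

Computing the features on the core sequence rather than on all posts is what keeps the diluting posts from masking the advert: in the toy data, `user_b` has more ordinary posts than adverts, yet the adverts still form the largest single topic.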
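[Author affiliation]: Software Institute, Nanjing University
[Funding]: Jiangsu Province Prospective Joint Research Project on Industry-University-Research Cooperation (BY2015069-03)
[Classification number]: TP391.1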
Article ID: 2271296
Article link: http://www.wukwdryxk.cn/wenyilunwen/guanggaoshejilunwen/2271296.html