Research and Application of News Hot Topic Discovery and Evolution Analysis
Topic: LDA model. Focus: hot topic discovery. Source: Nanjing University of Science and Technology, 2017 master's thesis. Type: degree thesis.
[Abstract]: Hot topics are topics that attract wide public attention as a result of online news coverage. Research on hot topic discovery and evolution helps the public learn the current focus of public opinion and helps the government guide public opinion constructively. It can also prevent malicious actors from exploiting the convenience and uncontrollability of the Internet to seek improper gains and stir up social conflict. This thesis studies the discovery of hot news topics and the evolution drift of those topics. The main work is as follows:
1. The LDA topic model is introduced, and news reports are modeled as text vectors in two ways: a term-weight model based on TF-IDF and an LDA model based on semantic understanding. On this basis, to address the weakness of the traditional single-core topic description model in describing multi-core topics, a multi-core topic description model is proposed that can identify the different cores of attention within the same topic. A construction method for the model is given: news reports are clustered precisely by combining partitional clustering with hierarchical clustering. Experiments show that combining multiple text vector models and using the multi-core topic description model improve the clustering of news topics.
2. Based on an analysis of hot topic features, the heat of news is quantified as media coverage heat and netizen attention heat, and a composite attention measure based on both is used to describe the heat of a hot topic. A "topic index" is also introduced, and a time-window-based segmented topic clustering method is used to analyze the life-cycle evolution of hot topics. A topic evolution drift analysis method based on the multi-core topic description model is proposed, which treats evolution as the transfer between core events within a topic. Experiments show that the method finds the evolution drift of hot topics well.
3. Based on the above results, a news hot topic discovery and evolution analysis subsystem is designed and implemented. The subsystem is an important functional module of a mobile news monitoring and analysis platform. It integrates news report preprocessing, hot topic discovery, and hot topic evolution analysis, and it can discover current hot topics in real time and present them to users.
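The abstract does not give the formula for the composite attention measure or the details of the time-window analysis, so the following is only a minimal sketch under stated assumptions. It assumes the composite heat is a weighted sum of a report count (media coverage heat) and a comment count (netizen attention heat) with a hypothetical weight `alpha`, and it assumes each report already carries a core-event label from an earlier clustering step. The names `Report`, `composite_heat`, `windowed_evolution`, and `drift_points` are illustrative and do not come from the thesis.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Report:
    day: int       # publication day, counted from the start of the topic
    core: str      # core-event label assigned by a prior clustering step (assumed)
    comments: int  # reader comments, used here as a stand-in for netizen attention


def composite_heat(n_reports, n_comments, alpha=0.5):
    """Weighted sum of media heat and netizen heat; alpha is an assumed weight."""
    return alpha * n_reports + (1.0 - alpha) * n_comments


def windowed_evolution(reports, window_days=7, alpha=0.5):
    """Return (window index, heat, dominant core) for each non-empty window."""
    windows = defaultdict(list)
    for r in reports:
        windows[r.day // window_days].append(r)

    timeline = []
    for idx in sorted(windows):
        batch = windows[idx]
        heat = composite_heat(len(batch), sum(r.comments for r in batch), alpha)
        # Dominant core: the core event with the most reports in this window.
        dominant = Counter(r.core for r in batch).most_common(1)[0][0]
        timeline.append((idx, heat, dominant))
    return timeline


def drift_points(timeline):
    """Windows whose dominant core differs from the previous window's."""
    return [
        (cur[0], prev[2], cur[2])
        for prev, cur in zip(timeline, timeline[1:])
        if prev[2] != cur[2]
    ]


# Made-up sample data, for illustration only.
reports = [
    Report(day=0, core="incident", comments=10),
    Report(day=2, core="incident", comments=30),
    Report(day=8, core="incident", comments=5),
    Report(day=9, core="investigation", comments=40),
    Report(day=10, core="investigation", comments=20),
    Report(day=15, core="verdict", comments=8),
]
timeline = windowed_evolution(reports, window_days=7, alpha=0.5)
print(timeline)
print(drift_points(timeline))
```

With this sample data the dominant core moves from "incident" to "investigation" to "verdict" across three weekly windows, which is the kind of core-event transfer the abstract describes. A real system would likely normalize the two heat components before combining them, since raw report and comment counts are on different scales.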
[Degree-granting institution]: Nanjing University of Science and Technology
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: TP391.1
Article ID: 1609233
Article link: http://www.wukwdryxk.cn/shoufeilunwen/xixikjs/1609233.html