Research on Several Key Technologies for Text Mining of Traditional Chinese Medicine Medical Records

Published: 2018-06-29 21:13

Topics: ontology learning + named entity recognition; source: Shandong Normal University, 2016 doctoral dissertation

[Abstract]: Traditional Chinese medicine (TCM) summarizes the diagnostic and therapeutic experience that the Chinese people have accumulated over thousands of years of fighting disease. Over its long development it has formed a distinctive system of diagnosis and treatment grounded in the theory of yin-yang and the five elements, and it has left behind a large body of literature of value for guiding clinical decisions. These "massive" collections of TCM medical records are a valuable resource for clinical diagnosis and treatment. Applying text mining methods to extract understandable, usable knowledge from this literature, in order to analyze prescribing patterns and to guide TCM clinical research, teaching, and new drug development, has increasingly become a research focus in the field.
However, the textual information in TCM medical records has not yet been mined and used effectively, for four reasons: building a unified ontology of TCM medical records is difficult; named entity recognition is not efficient enough; the vector space model for text representation ignores associations between words and so cannot represent latent semantic information well; and traditional text clustering algorithms depend too heavily on initial values and easily become trapped in local optima. To address these problems, and building on earlier work, this dissertation proposes an ontology-based named entity recognition algorithm and a text clustering method for TCM medical records based on the firefly algorithm.
The research was supported by the Shandong Province Science and Technology Development Plan project "Design and Implementation of a Literature Data Retrieval and Mining Algorithm Based on Medical Enzyme Semantics" (No. 2010G0020121), the Shandong Province Electronics Special Project "Development and Promotion of a Diagnosis and Treatment Decision Support System for Renowned Veteran TCM Physicians of Shandong Province" (No. 2150511), and the Shandong Province TCM Science and Technology Development Plan project "Research on a Comprehensive Prevention and Treatment Scheme for Heart Failure Based on Bionic Intelligent Algorithms" (No. 2013-230).
The data consist of 2,400 medical records collected between June 2013 and June 2015 in the outpatient clinic of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine by Professor Ding Shuwen, a renowned veteran TCM physician recognized at both the national and the Shandong provincial level. The records cover 757 patients and 251 Chinese medicinal substances.
The main research content and results are as follows:
1. The artificial bee colony algorithm is applied to building an ontology library of TCM medical records. An ontology learning technique based on the artificial bee colony algorithm is designed. Using Chinese word segmentation, mutual information, and rule-based filtering, a concept extraction method is designed, analyzed, and validated on a corpus drawn from five fields of the records: the four TCM diagnostic examinations, TCM diagnosis, Western medicine diagnosis, syndrome type, and treatment method. Niche techniques and evolutionary algorithms are combined to enrich population diversity, and the fast optimization of the artificial bee colony algorithm is used to extract non-taxonomic relations and construct the ontology. Experiments show that, when extracting non-taxonomic relations from TCM medical records, the combined artificial bee colony algorithm outperforms the standard artificial bee colony algorithm in both individual diversity and average fitness.
2. An ontology-based named entity recognition method for TCM medical records is proposed. Conditional random fields are combined with ontology-based correction and feature-template correction to recognize named entities, yielding an ontology-based named entity recognition algorithm. Validation tests produced the best experimental results for the four TCM diagnostic examinations, TCM diagnosis, Western medicine diagnosis, syndrome type, and treatment method. Experiments show that the ontology-based algorithm achieves good results on named entity recognition in TCM medical records.
3. A vector space model for TCM medical records based on word co-occurrence combinations is designed. An association rule algorithm extracts second-order word co-occurrence combinations from the records, a measure of word co-occurrence is defined, and a vector space model is built on these combinations. Experiments show that, for knowledge acquisition and classification of TCM medical records, the method discriminates better than the classical vector space model, and they confirm an association between the syndrome differentiation and treatment themes of the records and second-order word co-occurrence.
4. A text clustering algorithm for TCM medical records based on the firefly algorithm is proposed. Drawing on granular computing, the method uses changes in fitness to determine dynamically the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm, and it enlarges the simulated annealing perturbation to widen the range of population selection. The method is validated on the experimental data. Experiments show that, compared with the traditional K-medoids clustering method, it maintains good individual diversity and mitigates the difficulty of reaching a global optimum. The clustering results were endorsed by domain experts and have some clinical reference value.
In summary, this dissertation analyzes several key technologies for text mining of TCM medical records, designs algorithms suited to the task, and integrates and validates them in a text mining system. Experiments show that the proposed designs are effective and advanced, and that they can serve as a reference for TCM clinical practice, research, teaching, and new drug development.
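The vector space model built on second-order word co-occurrence (the third contribution above) can be illustrated with a minimal sketch. The abstract does not state the co-occurrence measure that the dissertation defines, so this sketch assumes binary weights and a plain support threshold, which corresponds to the 2-itemset step of an Apriori-style association rule search. The function names, the threshold, and the toy records are hypothetical and are not taken from the dissertation.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(records, min_support):
    """Return term pairs that co-occur in at least min_support records."""
    counts = Counter()
    for terms in records:
        # Each unordered pair is counted at most once per record.
        counts.update(combinations(sorted(set(terms)), 2))
    return sorted(pair for pair, n in counts.items() if n >= min_support)


def cooccurrence_vectors(records, min_support=2):
    """Represent each record over single terms plus frequent term pairs."""
    vocab = sorted({t for terms in records for t in terms})
    pairs = frequent_pairs(records, min_support)
    features = vocab + pairs
    vectors = []
    for terms in records:
        present = set(terms)
        # Assumption: binary weights; the dissertation defines its own measure.
        row = [1 if t in present else 0 for t in vocab]
        row += [1 if a in present and b in present else 0 for a, b in pairs]
        vectors.append(row)
    return features, vectors


# Made-up, already segmented records; not data from the dissertation.
records = [
    ["palpitation", "insomnia", "qi deficiency"],
    ["palpitation", "insomnia", "blood stasis"],
    ["cough", "phlegm", "qi deficiency"],
]
features, vectors = cooccurrence_vectors(records, min_support=2)
for name, column in zip(features, zip(*vectors)):
    print(name, column)
```

In this toy input only the pair ("insomnia", "palpitation") reaches the support threshold, so it becomes one extra dimension that the first two records share and the third lacks. A classical vector space model would have only the single-term dimensions.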
[Degree-granting institution]: Shandong Normal University
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: TP391.1

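This dissertation: Yuan Feng (袁鋒); Research on Several Key Technologies for Text Mining of Traditional Chinese Medicine Medical Records [D]; Shandong Normal University; 2016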

Article ID: 2083388

Article link: http://www.wukwdryxk.cn/shoufeilunwen/xxkjbs/2083388.html
