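Research on Several Key Technologies for Text Mining of Traditional Chinese Medicine Medical Records
Topics: ontology learning + named entity recognition; Source: Shandong Normal University, 2016 doctoral dissertation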
[Abstract]: Traditional Chinese medicine (TCM) summarizes the rich diagnostic and therapeutic experience that the Chinese people have accumulated over thousands of years of fighting disease. Over its long development it has formed a distinctive system of diagnosis and treatment grounded in the theory of yin-yang and the five elements, and it has left behind a large body of literature that can guide TCM clinical decision-making. This "massive" collection of TCM medical records is a valuable resource for clinical diagnosis and treatment. Applying text mining methods to extract understandable, usable knowledge from this literature, in order to analyze the prescribing patterns of TCM diagnosis and treatment and to guide clinical research, teaching, and new drug development, has increasingly become a research focus in the field.

However, the textual information in TCM medical records has not yet been mined and used effectively, for four reasons: building a unified ontology for TCM medical records is difficult; named entity recognition is not efficient enough; the vector space model for text representation ignores the associations between words and therefore represents latent semantic information poorly; and traditional text clustering algorithms depend too heavily on initial values and easily become trapped in local optima. To address these problems, and building on earlier work, this thesis proposes an ontology-based named entity recognition algorithm and a firefly-algorithm-based method for clustering TCM medical record texts.

This research was supported by the Shandong Province Science and Technology Development Plan project "Design and Implementation of a Literature Data Retrieval and Mining Algorithm Based on Medical Enzyme Semantics" (No. 2010G0020121), the Shandong Province Electronics Special Project "Development and Promotion of a Diagnosis and Treatment Decision Support System for Renowned Veteran TCM Physicians of Shandong Province" (No. 2150511), and the Shandong Province Traditional Chinese Medicine Science and Technology Development Plan project "Research on a Comprehensive Prevention and Treatment Scheme for Heart Failure Based on Bionic Intelligent Algorithms" (No. 2013-230).

The data consist of 2,400 medical records collected by Professor Ding Shuwen, a renowned veteran TCM physician recognized at both the national and Shandong provincial levels, in the outpatient department of the Affiliated Hospital of Shandong University of Traditional Chinese Medicine between June 2013 and June 2015. The records cover 757 patients and 251 Chinese medicines.

The main research content and results are summarized as follows:

1. The artificial bee colony algorithm is applied to the construction of an ontology library for TCM medical records. An ontology learning technique based on the artificial bee colony algorithm is designed. Using Chinese word segmentation, mutual information, and rule-based filtering, the four TCM diagnostic methods, TCM diagnoses, Western medicine diagnoses, syndrome types, and treatment methods recorded in the medical records serve as the corpus for analysis and validation, and a concept extraction method is designed. Niche techniques and an evolutionary algorithm are combined to enrich population diversity, and the fast optimization of the artificial bee colony algorithm is exploited to extract non-taxonomic relations and build the ontology. Experiments show that, in extracting non-taxonomic relations from TCM medical records, the combined artificial bee colony algorithm outperforms the ordinary artificial bee colony algorithm in both individual diversity and average fitness.

2. An ontology-based named entity recognition method for TCM medical records is proposed. Conditional random fields, ontology-based correction, and feature-template correction are applied to recognize named entities in TCM medical records, and an ontology-based named entity recognition algorithm is constructed. Validation tests yield the best experimental results for the four TCM diagnostic methods, TCM diagnoses, Western medicine diagnoses, syndrome types, and treatment methods. Experiments show that the ontology-based algorithm achieves good results on named entity recognition in TCM medical records.

3. A vector space model for TCM medical records based on word co-occurrence combinations is designed. An association rule algorithm extracts second-order word co-occurrence combinations from the medical records, a measure of word co-occurrence is defined, and a vector space model based on these combinations is built. Experiments show that the method discriminates better than the classical vector space model in knowledge acquisition and classification for TCM medical records, and they confirm the association between the syndrome-differentiation and treatment themes of the records and second-order word co-occurrence.

4. A firefly-algorithm-based text clustering algorithm for TCM medical records is proposed. Drawing on the idea of granular computing, changes in fitness dynamically determine the iterations of the firefly algorithm and the sampling of the simulated annealing algorithm, and the simulated annealing perturbation is enlarged to widen the selection range of the population. The approach is validated on the experimental data. Experiments show that, compared with the traditional K-medoids clustering method, the method maintains good individual diversity and mitigates the difficulty of reaching the global optimum. The clustering results were endorsed by experts and have some clinical reference value.

In summary, this thesis analyzes several key technologies for text mining of TCM medical records, designs algorithms suited to that task, and integrates and validates them in a text mining system. Experiments indicate that the proposed design is effective and advanced, and that it can serve as a reference for TCM clinical practice, research, teaching, and new drug development.
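The word co-occurrence vector space model described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's association rule algorithm or its co-occurrence measure, so the code below makes two loud assumptions: "second-order co-occurrence" is read as a pair of words appearing in the same record, and the weight of a pair is simply its support (the fraction of records containing both words). The function names `frequent_pairs` and `pair_vector` and the sample tokens are hypothetical and are not taken from the thesis or its data.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(records, min_support):
    """Return {pair: support} for word pairs whose support >= min_support.

    Assumption: support = fraction of records that contain both words.
    This is the 2-itemset step of a basic frequent-itemset search, not
    necessarily the association rule algorithm used in the thesis.
    """
    counts = Counter()
    for words in records:
        # Deduplicate and sort so each pair has one canonical form.
        for pair in combinations(sorted(set(words)), 2):
            counts[pair] += 1
    n = len(records)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def pair_vector(words, pairs):
    """Represent one record with one dimension per frequent pair.

    Assumption: a dimension holds the pair's support when both words occur
    in the record, and 0.0 otherwise.
    """
    present = set(words)
    return [
        support if a in present and b in present else 0.0
        for (a, b), support in sorted(pairs.items())
    ]


# Hypothetical, already-segmented records (illustrative tokens only).
records = [
    ["chest_tightness", "palpitation", "qi_deficiency"],
    ["chest_tightness", "palpitation", "blood_stasis"],
    ["insomnia", "qi_deficiency"],
    ["chest_tightness", "palpitation"],
]

pairs = frequent_pairs(records, min_support=0.5)
vectors = [pair_vector(r, pairs) for r in records]
```

With these sample records only the pair (`chest_tightness`, `palpitation`) reaches the support threshold, so each record collapses to a one-dimensional vector. In practice the pair dimensions would likely be combined with ordinary term dimensions before classification or clustering, but that combination is also an assumption rather than something the abstract states.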
[Degree-granting institution]: Shandong Normal University
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP391.1