
Biomedical Relation Extraction Based on Word Representation and Deep Learning

Published: 2018-06-24 09:02

  Topics: word representation + deep learning; Source: doctoral dissertation, Dalian University of Technology, 2016


[Abstract]: Protein relation extraction and drug relation extraction matter for building biomedical databases, for life science research, for drug development, and for disease prevention and treatment. Most existing biomedical relation extraction methods concentrate on selecting feature sets and designing kernel functions. After more than a decade of development, feature-based and kernel-based methods are relatively mature and leave limited room for improvement. To improve performance further, this thesis studies extraction methods based on word representation and deep learning. Deep learning makes it possible to build deeper relation extraction models, while word representation encodes semantic information into word vectors and is therefore a prerequisite for deep learning.
The main contributions are as follows. First, a word representation model is designed for the characteristics of biomedical text. On top of a traditional word representation model, it fuses five kinds of information (word form, part of speech, stem, syntactic chunk, and biomedical named entity) to strengthen the semantic representation capacity of the word vectors. It achieves good results on protein relation extraction and drug relation extraction, which confirms the value of incorporating part-of-speech, entity, and similar information into word representations and provides a sound basis for relation extraction with deep learning.
Second, for binary protein relation extraction, an extraction model based on instance representation is proposed to overcome the traditional dependence on features and kernel functions. The model has three parts: word vectors, skeleton features, and feature combination. It takes the characteristics of protein relation instances into account, uses word vectors as input, and combines them with skeleton features and vector composition so that the instance representation carries rich semantic information. On a relatively large corpus its extraction performance reaches the current state of the art, which confirms the effectiveness of word representation and deep learning for protein relation extraction.
Third, for multi-class drug relation extraction, a two-stage method is proposed. In the first stage, instance representation is combined with syntactic features, and a logistic regression classifier identifies the positive drug relation instances. In the second stage, a long short-term memory (LSTM) network assigns each positive instance to one of four drug relation types. To improve the second stage, the influence of several factors on the LSTM network is examined in terms of importance, implementation cost, and computational cost. Experiments show that word vectors, distance vectors, part-of-speech vectors, and a two-layer bidirectional LSTM network all improve second-stage classification, and they are important reasons why the two-stage method performs well.
In summary, this thesis proposes extraction methods based on representation and deep learning for binary relation extraction between proteins and multi-class relation extraction between drugs. The methods overcome, to some extent, the limitations of feature-based and kernel-based approaches and achieve good results. Word representation and deep learning have been active research topics in recent years but were adopted relatively late in biomedical text mining. The results obtained here on biomedical relation extraction confirm the effectiveness of the proposed methods and indicate that word representation and deep learning leave considerable room for further research in biomedical text mining.
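The two-stage control flow described in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's implementation: the four label names are placeholders (the abstract does not name the relation types), the feature vectors and weights are made up, and the second stage uses a simple linear argmax scorer as a stand-in for the LSTM network. All function names are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Features = Sequence[float]

# Placeholder labels: the abstract only states that there are four types.
RELATION_TYPES = ("type_1", "type_2", "type_3", "type_4")


def make_logistic_detector(weights: Features, bias: float,
                           threshold: float = 0.5) -> Callable[[Features], bool]:
    """Stage 1: a logistic-regression-style detector with fixed weights."""
    def detect(x: Features) -> bool:
        z = bias + sum(w * xi for w, xi in zip(weights, x))
        return 1.0 / (1.0 + math.exp(-z)) >= threshold
    return detect


def make_argmax_typer(weight_rows: Sequence[Features]) -> Callable[[Features], str]:
    """Stage 2 stand-in: linear scores plus argmax (NOT an LSTM)."""
    def classify(x: Features) -> str:
        scores = [sum(w * xi for w, xi in zip(row, x)) for row in weight_rows]
        return RELATION_TYPES[scores.index(max(scores))]
    return classify


def two_stage_extract(instances: List[Features],
                      is_positive: Callable[[Features], bool],
                      classify_type: Callable[[Features], str]) -> List[Optional[str]]:
    """Return a relation type for each positive instance, None for negatives."""
    results: List[Optional[str]] = []
    for x in instances:
        # Only instances accepted by stage 1 reach the stage 2 classifier.
        results.append(classify_type(x) if is_positive(x) else None)
    return results


# Toy usage with made-up two-dimensional features and weights.
detector = make_logistic_detector([2.0, -1.0], bias=-0.5)
typer = make_argmax_typer([[1, 0], [0, 1], [-1, 0], [0, -1]])
instances = [[1.0, 0.0], [0.0, 1.0], [2.0, 3.0]]
predictions = two_stage_extract(instances, detector, typer)
```

Separating detection from typing means the second-stage classifier only ever sees instances that the first stage accepted, so it can be trained and tuned on positive examples alone.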
[Degree-granting institution]: Dalian University of Technology
[Degree level]: Doctoral
[Year degree conferred]: 2016
[Classification number]: TP391.1


Article ID: 2060941


Link to this article: http://www.wukwdryxk.cn/shoufeilunwen/xxkjbs/2060941.html

