
Multi-site Subcellular Localization Prediction Based on Sequence Features

Published: 2018-05-31 15:05

  Topics: subcellular localization + multi-label learning. Source: Northeast Normal University, master's thesis, 2017


[Abstract]: The function of a protein is closely tied to its location in the cell: a newly synthesized protein must be transported to a specific organelle (that is, a subcellular compartment) before it can carry out its function correctly. Predicting the subcellular localization of proteins is therefore extremely important for determining the function of an unknown protein, understanding protein interactions and, through them, various biological processes, and studying the pathogenesis of certain diseases. Traditional experimental techniques such as subcellular fractionation, green fluorescent protein fusion, mass spectrometry, and isotope-coded affinity tags can provide fairly accurate localization data, but these experiments are mostly expensive and time-consuming, so relying on them alone for subcellular localization studies is usually costly. In recent years, as biological data have become far more abundant, the interdisciplinary field of bioinformatics has developed rapidly, and more and more researchers use computational techniques to help solve topical biological problems. Predicting protein subcellular localization with machine learning is one of these active topics and is the main research goal of this thesis.

After years of effort, research on machine-learning-assisted subcellular localization prediction has produced a series of meaningful results: a variety of computational methods have been proposed, prediction accuracy has improved steadily, and prediction platforms have appeared one after another, all of which provide valuable information for subsequent protein function analysis. Despite this progress, three areas still need improvement. (1) Most existing methods apply only to binary-classification data, whereas many proteins may have one or more subcellular locations, so what is needed is a classifier capable of multi-label subcellular localization prediction. (2) Although some methods have introduced multi-label learning to identify proteins with one or more subcellular locations, their datasets contain too few multi-label proteins. (3) Some predictors use Gene Ontology (GO) to improve prediction accuracy, but the resulting feature dimension is too large and the extraction process is cumbersome, so an effective dimensionality reduction method is required.

On the basis of a thorough comparative study of current protein subcellular localization prediction algorithms, this thesis proposes improvements that address the shortcomings of existing classifiers and describes them in detail from four aspects: dataset acquisition, protein sequence feature extraction, the prediction algorithm, and the performance evaluation of the prediction algorithm. The dataset used by the proposed method comes from the widely recognized tool iLoc-Animal; its label "diversity" reaches 1.8922 and the total number of predicted classes reaches 20. Sequence feature extraction uses amino acid composition (AAC) and the clustering-based LIFT features, which avoids the tedious and time-consuming construction of features from GO. After comparing commonly used multi-label prediction algorithms and strategies, the thesis adopts multi-label K-nearest neighbor as the prediction algorithm. In the performance testing stage, ten-fold cross-validation is used to evaluate five metrics, namely Precision, Accuracy, Recall, Absolute-True, and Absolute-False, and the results are compared with those of the classical algorithm iLoc-Animal.

The experimental results show that the proposed method achieves an Accuracy of 74.35% and an Absolute-True rate of 71.17%, clearly higher than the Accuracy (62.28%) and Absolute-True rate (45.62%) of iLoc-Animal, and that it also outperforms iLoc-Animal on each of the other evaluation metrics. Besides its higher prediction accuracy, the method is simple to implement and fast to respond. It is hoped that this work will inform and advance current research on protein subcellular localization prediction.
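To make two of the ingredients named in the abstract concrete, the following is a minimal Python sketch of amino acid composition (AAC) features and of the five multi-label evaluation metrics. It is not the thesis's implementation. The function names `aac` and `multilabel_metrics` are hypothetical, and because the abstract does not give the metric formulas, the set-based definitions often used in multi-label localization work are assumed here. LIFT features and the multi-label K-nearest neighbor classifier are not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac(sequence):
    """Return the 20-dimensional amino acid composition of a sequence.

    Each entry is the fraction of residues equal to that amino acid.
    Non-standard characters (e.g. 'X') are ignored (an assumption).
    """
    counts = Counter(ch for ch in sequence.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[a] / total for a in AMINO_ACIDS]


def multilabel_metrics(true_sets, pred_sets, n_labels):
    """Average five set-based multi-label metrics over all proteins.

    true_sets / pred_sets: lists of sets of location labels.
    n_labels: total number of possible locations.
    The formulas are assumed definitions, not taken from the thesis.
    """
    n = len(true_sets)
    sums = {"precision": 0.0, "recall": 0.0, "accuracy": 0.0,
            "absolute_true": 0.0, "absolute_false": 0.0}
    for t, p in zip(true_sets, pred_sets):
        inter = len(t & p)   # correctly predicted locations
        union = len(t | p)   # locations in either set
        sums["precision"] += inter / len(p) if p else 0.0
        sums["recall"] += inter / len(t) if t else 0.0
        sums["accuracy"] += inter / union if union else 1.0
        sums["absolute_true"] += 1.0 if t == p else 0.0
        sums["absolute_false"] += (union - inter) / n_labels
    return {name: value / n for name, value in sums.items()}


# Toy usage: two proteins, four possible locations.
features = aac("AACD")            # A = 0.5, C = 0.25, D = 0.25, rest 0
scores = multilabel_metrics([{0, 1}, {2}], [{0}, {2}], n_labels=4)
```

In this toy case the first protein is only partly correct, so Absolute-True (exact set match) is 0.5 while Accuracy (intersection over union) is 0.75. This shows why Absolute-True is the strictest of the five metrics.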
[Degree-granting institution]: Northeast Normal University
[Degree level]: Master's
[Year degree granted]: 2017
[Classification number]: Q26;TP181



