Research on Gene Selection Methods Based on Scoring Criteria and Particle Swarm Optimization
Topic: gene expression profile data  Focus: gene selection  Source: Jiangsu University, 2017 master's thesis  Document type: degree thesis
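【Abstract】: Cancer is a major killer in today's society. It takes many forms that call for different treatments, so early diagnosis and targeted treatment are the key to saving lives. The emergence of gene chips has opened a new path for understanding disease mechanisms at the molecular level, and mining gene expression profile data to discover disease-causing genes is of great significance for the diagnosis and treatment of cancer. Although many gene selection methods can select gene subsets with high classification performance, they suffer from high time cost, and the genes they select are poorly interpretable and highly redundant. To overcome these shortcomings, this thesis proposes an effective scoring mechanism and, on that basis, uses particle swarm optimization (PSO) and the extreme learning machine (ELM) for gene selection, obtaining gene sets with high classification performance and good interpretability. The main work is as follows:

(1) To address the high time cost of traditional gene selection methods and the poor interpretability of the selected subsets, a gene selection method based on a scoring criterion and an improved PSO algorithm is proposed. First, the classification information index is used to preprocess the original gene pool. Drawing on the soundness of statistical sampling, a matrix of gene sets with a limited number of genes is generated at random, each gene set is evaluated with an ELM, and the gene sets that satisfy the condition are retained. The scoring criterion is then used to evaluate and rank the genes and to screen out the relevant ones. Finally, PSO improved with simulated annealing performs a further selection among the genes that passed the scoring criterion. The method has simple steps and low time cost. Experiments on several public gene expression profile datasets show that, compared with other methods, it removes a large amount of redundancy accurately and can therefore quickly and efficiently select gene subsets highly related to tumor class.

(2) To address two remaining defects, namely that the scoring mechanism does not fully use the direct information relating genes to classification and that PSO is still prone to local optima, an improved method based on gene information weighting and semi-initialization of particles is proposed. First, the number of times the average fitness value is computed is adjusted according to the size of the variance. Next, the classification weight information carried by each gene is added to the scoring criterion as a new evaluation standard. Finally, to counter PSO's tendency to become trapped in local optima, an update threshold is set and half of the particles are forced to update within a range. The improved method makes full use of the information contained in the genes themselves, which makes the scoring mechanism more reasonable, and it escapes local optima faster than other methods. Experiments on four datasets show that the method based on information weighting and PSO further improves the classification accuracy of the selected gene subsets.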
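The random-subset scoring step in part (1) can be sketched as follows. The abstract does not give the exact scoring criterion, so this sketch assumes the simplest reading: a gene earns one point each time it appears in a randomly drawn subset whose classification accuracy meets a threshold. The thesis evaluates subsets with an extreme learning machine; to stay dependency-free, the sketch swaps in a leave-one-out nearest-centroid classifier. The names `toy_data`, `nearest_centroid_accuracy`, and `score_genes`, and all parameter values, are hypothetical.

```python
import random


def toy_data(n_samples=20, n_genes=6, seed=1):
    """Hypothetical data: gene 0 separates the two classes, the rest are noise."""
    rng = random.Random(seed)
    y = [i % 2 for i in range(n_samples)]
    X = [[rng.random() for _ in range(n_genes)] for _ in range(n_samples)]
    for i in range(n_samples):
        X[i][0] += 10.0 * y[i]  # shift gene 0 for class 1
    return X, y


def nearest_centroid_accuracy(X, y, genes):
    """Leave-one-out accuracy on the chosen genes.

    Stand-in evaluator (assumption): the thesis uses an extreme learning machine.
    """
    n = len(X)
    correct = 0
    for i in range(n):
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue  # hold out sample i
            acc = sums.setdefault(y[j], [0.0] * len(genes))
            for k, g in enumerate(genes):
                acc[k] += X[j][g]
            counts[y[j]] = counts.get(y[j], 0) + 1
        best, best_dist = None, float("inf")
        for label, s in sums.items():
            dist = sum((X[i][g] - s[k] / counts[label]) ** 2
                       for k, g in enumerate(genes))
            if dist < best_dist:
                best, best_dist = label, dist
        correct += best == y[i]
    return correct / n


def score_genes(X, y, subset_size=2, n_subsets=200, threshold=0.9, seed=0):
    """Count how often each gene appears in a subset that passes the threshold."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    scores = [0] * n_genes
    for _ in range(n_subsets):
        genes = rng.sample(range(n_genes), subset_size)
        if nearest_centroid_accuracy(X, y, genes) >= threshold:
            for g in genes:
                scores[g] += 1  # assumed scoring rule: one point per passing subset
    return scores
```

Calling `score_genes(*toy_data())` and sorting genes by score gives the ranking from which the top candidates would be passed on to the PSO stage.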
[Abstract]: Cancer is a leading killer in today's society. There are many kinds of cancer with differing treatments, and early, correct diagnosis is the key to saving lives. The emergence of the gene chip provides a new way to understand the mechanism of disease from a molecular perspective, and mining gene expression profile data for pathogenic genes is of great significance for the diagnosis and treatment of cancer. Many gene selection methods can select subsets of genes with high classification performance, but they are time-consuming, and the genes they select are poorly interpretable and highly redundant. To overcome these shortcomings, this thesis proposes an effective scoring mechanism and then uses particle swarm optimization (PSO) and the extreme learning machine (ELM) to select gene sets with high classification performance and good interpretability. The main work of this thesis is as follows.

(1) Because traditional gene selection methods are time-consuming and yield poorly interpretable gene subsets, a gene selection method based on a scoring criterion and an improved PSO algorithm is proposed. First, the classification information index is used to preprocess the original gene pool. Following the principles of statistical sampling, a matrix of gene sets, each with a limited number of genes, is generated at random. Each gene set is evaluated by an ELM, and the gene sets that meet the condition are selected. The genes are then evaluated and ranked by the scoring criterion, and the relevant genes are screened out. Finally, simulated annealing is used to improve the PSO algorithm, which further selects among the genes evaluated by the scoring criterion. The steps of this method are simple and its time cost is low. Experimental results on multiple public gene expression datasets show that, compared with other methods, it accurately removes a large amount of redundancy and can quickly and efficiently select a subset of genes that is highly related to the tumor category.

(2) The scoring mechanism does not make full use of the direct information relating genes to classification, and the PSO algorithm is still prone to falling into local optima. To address both defects, an improved method of gene information weighting and particle semi-initialization is proposed. First, the number of times the average fitness is computed is adjusted according to the magnitude of the variance. Then, the classification weight information contained in each gene is used as a new evaluation criterion to improve the scoring mechanism. Finally, because the PSO algorithm is prone to falling into local optima, an update threshold is set and the algorithm is improved by forcing half of the particles to update within a range. The improved method makes full use of the information contained in the genes themselves, makes the scoring mechanism more reasonable, and escapes local optima faster than other methods. Experimental results on four datasets show that the classification accuracy of the selected gene subset is further improved on the basis of information weighting and particle swarm optimization.
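The particle semi-initialization idea in part (2) can be illustrated with a small binary PSO. The abstract only says that an update threshold is set and half of the particles are forced to update, so the details here are assumptions: the threshold is read as a count of iterations without improvement of the global best (`stall_limit`), and the forced update is a random re-initialization of half the swarm. The simulated annealing step from part (1) is not shown. `toy_fitness`, `INFORMATIVE`, and all coefficients are hypothetical; in the thesis the fitness would come from classifier accuracy on the selected genes.

```python
import math
import random

INFORMATIVE = {0, 3}  # hypothetical "relevant genes" for the toy objective


def toy_fitness(bits):
    """Reward selecting informative genes, penalize subset size (toy objective)."""
    selected = [d for d, b in enumerate(bits) if b]
    hits = sum(1 for d in selected if d in INFORMATIVE)
    return hits - 0.1 * len(selected)


def binary_pso(fitness, n_bits, n_particles=20, iters=100, stall_limit=5, seed=0):
    rng = random.Random(seed)

    def rand_pos():
        return [rng.randint(0, 1) for _ in range(n_bits)]

    pos = [rand_pos() for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    stall = 0  # iterations since the global best last improved

    for _ in range(iters):
        improved = False
        for i in range(n_particles):
            for d in range(n_bits):
                v = (0.7 * vel[i][d]
                     + 1.5 * rng.random() * (pbest[i][d] - pos[i][d])
                     + 1.5 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-4.0, min(4.0, v))
                # Sigmoid transfer: velocity sets the probability of the bit being 1.
                prob_one = 1.0 / (1.0 + math.exp(-vel[i][d]))
                pos[i][d] = 1 if rng.random() < prob_one else 0
            f = fitness(pos[i])
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
            if f > gbest_fit:
                gbest, gbest_fit = pos[i][:], f
                improved = True
        stall = 0 if improved else stall + 1
        if stall >= stall_limit:
            # Assumed semi-initialization: re-randomize half the swarm,
            # keeping personal bests so earlier progress is not lost.
            for i in rng.sample(range(n_particles), n_particles // 2):
                pos[i] = rand_pos()
                vel[i] = [0.0] * n_bits
            stall = 0
    return gbest, gbest_fit
```

For example, `binary_pso(toy_fitness, n_bits=8)` returns a bit mask over eight candidate genes together with its fitness. Re-randomizing only half the swarm keeps the other half exploiting the current best region while the rest explores.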
【Degree-granting institution】: Jiangsu University
【Degree level】: Master's
【Year degree conferred】: 2017
【Classification number】: R73-3; TP18