Research on Key Problems in Multi-Label Learning

Published: 2018-06-27 12:27

Topics: multi-label learning + multi-label classification; source: Xidian University, 2016 doctoral dissertation

[Abstract]: With advances in technology, a growing number of applications involve multi-label problems, such as text classification, image annotation, and gene function analysis. Unlike traditional single-label problems (binary or multi-class classification), a multi-label problem allows one instance to be associated with several labels at once, and the labels exhibit richer relationships among themselves, which makes such problems harder to analyze. Multi-label learning studies how to assign all appropriate class labels to a test instance in a multi-label problem. Driven by application needs, more and more researchers have taken up the topic, and it has become one of the research hotspots in machine learning and pattern recognition.

Although multi-label learning has made great progress, it still faces several key challenges: the classification performance of existing multi-label classification algorithms still needs improvement; a high-dimensional label space leads to high training and testing time costs; and a high-dimensional feature space easily causes the trained model to overfit. Multi-label classification, label space dimension reduction, and multi-label dimensionality reduction are therefore three major research directions. Multi-label classification research aims to improve classification performance. Label space dimension reduction exploits label relationships by reducing the dimension of the label space, with the goal of improving classification performance while reducing training and testing time. Multi-label dimensionality reduction addresses the "curse of dimensionality" in multi-label learning by reducing the dimension of the feature space to obtain a better instance representation. This dissertation studies multi-label learning along these three directions. The main contributions are as follows.

1. Because labels often exhibit cluster-structured relationships, a multi-label classification algorithm based on clustered intrinsic label relationships is proposed. In this algorithm the weight vector of each label consists of a common component and a label-specific component. The common component is shared by all labels and corresponds to background information in the instances. The label-specific component belongs to a single label and corresponds to the information in the instances that is unique to that label. The intrinsic relationships among labels are reflected in the relationships among the label-specific components, and these relationships often form clusters. The proposed method extends the support vector machine with this weight vector structure and imposes a cluster-relationship regularizer on the label-specific components of all labels, so that clustered label relationships are used to improve classification performance. By relaxing the orthogonality constraint, the non-convex problem is turned into a jointly convex semidefinite programming problem, and an optimization method is derived from block coordinate descent with alternating iterative update rules. Experimental results show that the classification performance of the proposed algorithm is clearly better than that of related multi-label classification algorithms.

2. Existing multi-label classification algorithms train every label on the same instance representation. To address this, a multi-label classification algorithm is proposed that uses the distribution of the instances to construct, for each label, a new instance representation that is easier to discriminate. Because a single instance representation cannot reflect the characteristics of each label well, the proposed algorithm uses the one-vs-all strategy to transform the multi-label classification problem into multiple binary classification subproblems, one per label. Within each subproblem, the correlations between the local structures of the positive and negative instances play an important role in building an effective classification model. To mine these correlations, a new spectral clustering method, spectral instance calibration, is proposed. The multi-label classification algorithm uses the clustering results of spectral instance calibration to build, for each label, an instance representation that better matches the characteristics of that label, and then trains a binary classification model on the new representation. Experimental results verify the effectiveness of the algorithm.

3. To make full use of instance information during label space dimension reduction, a label space dimension reduction algorithm based on dependence maximization is proposed. Its objective function has two parts: an encoding loss and a dependence loss. The encoding loss measures the information lost when the label matrix is compressed with principal component analysis. After the label vectors have been reduced to code vectors, a regression model from the feature space to the code space still has to be learned, so the relationship between instances and code vectors is important; the dependence loss measures how much of this dependence is lost. To exploit instance information, the proposed algorithm is the first to use the Hilbert-Schmidt independence criterion to measure the dependence loss, so that the dependence between instances and code vectors can be mined and used more fully. The effect of two different instance kernel matrices on the algorithm is also examined: one is based on global structure information and the other on local latent structure information. Experimental results show that the algorithm not only greatly shortens training and testing time but also effectively improves classification performance. The variant using the latter kernel matrix achieves better classification performance, while its training and testing time is comparable to that of the variant using the former.

4. To address outliers in instances and label vectors, a robust label space dimension reduction algorithm based on the l2,1 norm is proposed. Because of problems with data acquisition equipment, the instances in a dataset often contain outliers. A label vector outlier is a label vector that clearly deviates from the main label relationships exploited by the label space dimension reduction algorithm. The objective function consists of an encoding loss and a dependence loss. The encoding loss measures the information lost when the label matrix is compressed with principal component analysis. The dependence loss measures the loss in the linear regression relationship between instances and code vectors. To handle outliers, both losses use the l2,1 norm. The resulting problem is non-smooth; a modified alternating iterative update method is proposed that solves it effectively, and its convergence is analyzed. Experimental results show that the proposed robust label space dimension reduction both shortens training and testing time and improves classification performance. In addition, experiments on datasets with contaminated labels show that the algorithm is more robust than other label space dimension reduction algorithms.

5. Existing multi-label dimensionality reduction methods do not exploit local latent structure, even though research on traditional dimensionality reduction has shown that such structure is useful. A new multi-label dimensionality reduction method, multi-label local discriminant embedding, is therefore proposed. The method uses an asymmetric label relationship matrix that better matches real data, which both gives larger weights to instances that carry more information and overcomes the over-counting problem in multi-label learning. It constructs two sets of adjacency graphs to analyze local latent structure, so that the intrinsic geometric structure of the data is better mined and used and the reduced representation has better within-class compactness and between-class separability. Imposing an orthogonality constraint on the resulting optimization problem yields a set of orthogonal projection vectors. Experimental results show that, compared with related multi-label dimensionality reduction methods, the results of this method are more reasonable and yield more discriminative features, and thus better classification accuracy.
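The one-vs-all decomposition mentioned in the second contribution can be illustrated with a minimal sketch. It only shows how a multi-label training set is split into one binary problem per label; it is not the dissertation's algorithm, which additionally builds a label-specific instance representation. The function name `one_vs_all_problems` and the toy data are assumptions made for illustration.

```python
def one_vs_all_problems(instances, label_sets, all_labels):
    """Build one binary training set per label (one-vs-all decomposition).

    Hypothetical helper for illustration; not taken from the dissertation.
    """
    problems = {}
    for label in all_labels:
        # An instance is positive (+1) for this label if it carries it,
        # and negative (-1) otherwise.
        problems[label] = [
            (x, 1 if label in labels else -1)
            for x, labels in zip(instances, label_sets)
        ]
    return problems


# Toy data (made up): three instances, each with a set of relevant labels.
instances = [[0.1, 0.9], [0.8, 0.2], [0.5, 0.5]]
label_sets = [{"sports"}, {"politics", "economy"}, {"sports", "economy"}]
problems = one_vs_all_problems(
    instances, label_sets, ["sports", "politics", "economy"]
)
```

Each entry of `problems` can then be handed to any binary classifier; an instance's predicted label set is the set of labels whose classifier outputs +1.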
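The third contribution measures the dependence between instances and code vectors with the Hilbert-Schmidt independence criterion (HSIC). A common empirical estimate is tr(K H L H) / (n - 1)^2, where K and L are kernel matrices over the two sets of vectors and H = I - (1/n) 1 1^T is the centering matrix. The sketch below computes that estimate in plain Python. The linear kernel, the function names, and the toy data are assumptions; the dissertation's own kernels (global and local-structure based) and its full objective are not reproduced here.

```python
def linear_kernel(rows):
    """Gram matrix of inner products (linear kernel is an assumption here)."""
    return [[sum(a * b for a, b in zip(u, v)) for v in rows] for u in rows]


def center(K):
    """Double-center a kernel matrix, i.e. compute H K H."""
    n = len(K)
    row_mean = [sum(row) / n for row in K]
    col_mean = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [K[i][j] - row_mean[i] - col_mean[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


def hsic(K, L):
    """Empirical HSIC: tr(K H L H) / (n - 1)^2, computed as tr((H K H) L)."""
    n = len(K)
    Kc = center(K)
    trace = sum(Kc[i][j] * L[j][i] for i in range(n) for j in range(n))
    return trace / (n - 1) ** 2


# Toy data (made up): one-dimensional instances and two candidate codes.
X = [[1.0], [2.0], [3.0], [4.0]]
codes_dependent = [[1.0], [2.0], [3.0], [4.0]]  # varies with X
codes_constant = [[5.0], [5.0], [5.0], [5.0]]   # carries no information about X

hsic_dependent = hsic(linear_kernel(X), linear_kernel(codes_dependent))
hsic_constant = hsic(linear_kernel(X), linear_kernel(codes_constant))
```

A larger value indicates stronger dependence, so a method that maximizes dependence would prefer `codes_dependent` over `codes_constant` in this toy case.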
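The robustness argument behind the l2,1 norm in the fourth contribution can be seen on a small residual matrix. The l2,1 norm sums the Euclidean norms of the rows, so a row with a large error contributes linearly, whereas the squared Frobenius norm lets it contribute quadratically. The sketch below assumes that each row holds the residual of one instance; that layout, the function names, and the numbers are illustrative and not taken from the dissertation.

```python
import math


def l21_norm(M):
    """Sum of the Euclidean norms of the rows of M."""
    return sum(math.sqrt(sum(v * v for v in row)) for row in M)


def frobenius_squared(M):
    """Sum of squared entries of M (the usual least-squares loss)."""
    return sum(v * v for row in M for v in row)


# Toy residuals (made up): the last row plays the role of an outlier.
residuals = [[3.0, 4.0], [0.0, 0.0], [30.0, 40.0]]

l21 = l21_norm(residuals)                # 5 + 0 + 50 = 55.0
frob_sq = frobenius_squared(residuals)   # 25 + 0 + 2500 = 2525.0
outlier_share_l21 = 50.0 / l21
outlier_share_frob = 2500.0 / frob_sq
```

The outlier row accounts for about 91% of the l2,1 loss but about 99% of the squared Frobenius loss, which is why an l2,1 objective is pulled less strongly toward fitting outliers.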
[Degree-granting institution]: Xidian University
[Degree level]: Doctoral
[Year conferred]: 2016
[Classification number]: TP181


Article ID: 2073918


Article link: http://www.wukwdryxk.cn/shoufeilunwen/xxkjbs/2073918.html
