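Research on and Application of Classification Algorithms Based on Logistics Information
Published: 2018-07-10 08:24
Topic: logistics + data mining; Reference: Beijing University of Posts and Telecommunications, master's thesis, 2015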
[Abstract]: In recent years, advances in information technology have driven the adoption of information systems in enterprise logistics management, and the volume of data stored by enterprises has grown explosively. Treating that data as a resource and applying data mining to logistics management, in particular studying data mining techniques based on logistics information and their applications, helps enterprises improve operational efficiency, reduce costs, and make timely decisions. It has become an effective way to strengthen their competitiveness.

This thesis studies the K-nearest neighbor (KNN) algorithm, one of the classification algorithms used in data mining. After reviewing the core idea of classical KNN and the current state of research, it identifies two shortcomings. (1) The traditional algorithm assumes that all attributes of a sample are equally important to classification, so irrelevant attributes cause misclassification and lower accuracy. (2) To select the neighbors of a sample to be classified, the traditional algorithm computes its distance to every training sample. This is computationally expensive and sensitive to noisy samples, which hurts both efficiency and accuracy.

Two improvements are proposed to address these shortcomings. (1) An improved algorithm based on attribute reduction, which reflects the differing influence of attributes on the classification result. It uses information entropy to compute a correlation coefficient between each condition attribute and the decision attribute, uses these coefficients to distinguish the importance of the condition attributes during classification, and reduces the attribute set by adjusting a threshold on the coefficient. Numerical analysis shows that the improved algorithm raises classification accuracy to some extent. (2) An improved algorithm based on clustering-driven sample pruning, intended to handle massive data sets and reduce time complexity. It uses hierarchical clustering to fix the initial cluster centers of K-means, so that random initialization does not affect the clustering result, then uses K-means to refine the hierarchical clustering result and selects a representative sample set from it for classification. Simulation experiments show that, with this pruning, the improved algorithm reduces the classifier's computation and improves classification efficiency while improving or maintaining classification accuracy.

Finally, building on this work, the thesis designs an improved KNN collaborative filtering recommendation model. The model is applied to rating data for logistics routes in Beijing to verify its effectiveness and feasibility on a practical problem. Experiments show that the improved algorithm significantly increases recommendation accuracy and that the model helps customers quickly find a suitable logistics company among a large amount of specialized information, which gives it practical value.
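
The attribute-reduction strategy described in the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's formula for the entropy-based correlation coefficient, so this example assumes normalized information gain (information gain divided by the entropy of the decision attribute) as a stand-in. It also assumes discrete attributes and a weighted overlap distance. The function names and the toy data are hypothetical and are not taken from the thesis.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(values).values())


def relevance(column, labels):
    """Assumed correlation coefficient: information gain of the attribute
    about the decision attribute, normalized by the label entropy."""
    h_labels = entropy(labels)
    if h_labels == 0:
        return 0.0
    # Group labels by attribute value to get H(labels | attribute).
    groups = {}
    for value, label in zip(column, labels):
        groups.setdefault(value, []).append(label)
    h_cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return (h_labels - h_cond) / h_labels


def select_attributes(rows, labels, threshold):
    """Keep attributes whose relevance reaches the threshold.
    Returns {attribute index: relevance}, reused as distance weights."""
    weights = {}
    for j in range(len(rows[0])):
        r = relevance([row[j] for row in rows], labels)
        if r >= threshold:
            weights[j] = r
    return weights


def knn_predict(rows, labels, weights, query, k=3):
    """Majority vote among the k nearest samples, using a weighted
    overlap distance over the retained attributes only."""
    def distance(row):
        return sum(w for j, w in weights.items() if row[j] != query[j])

    nearest = sorted(range(len(rows)), key=lambda i: distance(rows[i]))[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]


# Hypothetical toy data: attribute 0 determines the class, attribute 1 is noise.
rows = [("north", "a"), ("north", "b"), ("south", "a"), ("south", "b")]
labels = ["fast", "fast", "slow", "slow"]

weights = select_attributes(rows, labels, threshold=0.1)
prediction = knn_predict(rows, labels, weights, ("south", "a"), k=3)
```

On this toy data the noise attribute has zero relevance and is dropped, so only attribute 0 contributes to the distance. Raising or lowering `threshold` controls how aggressively attributes are removed, which mirrors the threshold adjustment mentioned in the abstract.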
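[Degree-granting institution]: Beijing University of Posts and Telecommunications
[Degree level]: Master's
[Year degree awarded]: 2015
[Classification number]: TP311.13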