An Algorithm for Analyzing Literature Records in HTML Pages

Published: 2019-04-26 00:39

[Abstract]: To help publishers promptly locate the literature they need among large numbers of web pages, an algorithm is required that automatically extracts literature information from Hypertext Markup Language (HTML) pages. To this end, a literature record analysis algorithm based on conditional random fields (CRF) is designed. First, a segmentation algorithm for the document object tree is designed: separator tags divide the page data into independent blocks, each consisting of a sequence of tags and text. Next, this sequence is used as the feature vector of a conditional random field model to build a labeling model for literature information. Finally, a heuristic algorithm is designed to extract the literature information from the labeling model, and experiments verify its effectiveness.
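The segmentation step described in the abstract can be illustrated with a minimal sketch. The abstract does not list the separator tags or the features actually used, so the `SEPARATOR_TAGS` set, the `BlockSegmenter` class, and the `token_features` function below are all hypothetical names and choices made for illustration only. The sketch splits a page into blocks of tag and text tokens and produces one feature dictionary per token, in a shape that a CRF sequence labeler could plausibly consume; it does not train or run a CRF.

```python
from html.parser import HTMLParser

# Assumption: these tags act as record separators. The paper's actual
# separator set is not given in the abstract.
SEPARATOR_TAGS = {"li", "p", "tr", "div"}


class BlockSegmenter(HTMLParser):
    """Split a page into blocks, each a sequence of (kind, value) tokens."""

    def __init__(self):
        super().__init__()
        self.blocks = []     # finished blocks
        self._current = []   # tokens of the block being built

    def _flush(self):
        # Keep only blocks that contain at least one text token.
        if any(kind == "text" for kind, _ in self._current):
            self.blocks.append(self._current)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in SEPARATOR_TAGS:
            self._flush()                      # a separator closes the block
        else:
            self._current.append(("tag", tag))

    def handle_endtag(self, tag):
        if tag in SEPARATOR_TAGS:
            self._flush()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self._current.append(("text", text))

    def close(self):
        super().close()
        self._flush()                          # emit any trailing block


def segment(html):
    """Return the list of token blocks found in an HTML string."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    return parser.blocks


def token_features(block, i):
    """Hypothetical per-token features for a CRF sequence labeler."""
    kind, value = block[i]
    return {
        "kind": kind,
        "value": value.lower() if kind == "tag" else "",
        "has_digit": any(ch.isdigit() for ch in value),
        "prev_kind": block[i - 1][0] if i > 0 else "BOS",
        "next_kind": block[i + 1][0] if i < len(block) - 1 else "EOS",
    }
```

Calling `segment` on a page whose list items each hold one literature record would yield one block per record. Mapping `token_features` over a block gives the per-token observations that a labeling model would tag with fields such as author, title, or year.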
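[Affiliation]: School of Information Engineering, Beijing Institute of Graphic Communication; Postdoctoral Research Station of Computer Science and Technology, Tsinghua University; Satellite Live Broadcast Management Center for Radio and Television, State Administration of Press, Publication, Radio, Film and Television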
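[Funding]: Beijing Municipal Education Commission Science and Technology Innovation Service Capacity Building Project (PXM2016_014223_000025); Beijing Institute of Graphic Communication Key Project (ea201507); Beijing Institute of Graphic Communication Faculty Development Doctoral Start-up Fund Project (27170116005/062); Beijing Institute of Graphic Communication Research Project, Publication Data Asset Evaluation Laboratory Construction Project (20190116005/006)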
[Classification number]: TP393.092

Article ID: 2465603

Article link: http://www.wukwdryxk.cn/guanlilunwen/ydhl/2465603.html
