
Design and Implementation of an HDFS-Based Storage System for Massive Small Files

Published: 2018-06-10 00:19

Topics: massive small-file storage + distributed file system; Source: National University of Defense Technology, master's thesis, 2012


[Abstract]: In recent years, both enterprise and personal data have grown explosively. Google CEO Eric Schmidt has said that the world now creates as much data every two days as was produced from the dawn of human civilization through 2003. Storing data at this scale has become a major challenge for current storage systems. Traditional centralized storage can no longer meet the demand, which has led to distributed file systems designed for large-scale data storage, such as the Google File System (GFS), the Hadoop Distributed File System (HDFS), PVFS, and Lustre.

These distributed file systems offer good scalability and fault tolerance and can meet the demands of massive data storage. Many applications, however, must store massive numbers of small files in addition to massive numbers of large files. Although distributed file systems such as GFS and HDFS store large files efficiently, they are very inefficient at storing massive numbers of small files. Industry and academia have proposed many approaches to this problem, but these generally suffer from low performance, limited system reliability, and inefficient storage of small-file metadata. To address these challenges, this thesis designs and implements an HDFS-based storage system for massive small files.

The main design idea is to pack the small files within one folder, under the existing HDFS directory tree structure, into a single large file for storage; this file is called the small-file data file. A small-file index is generated at the same time, recording the location of each small file within its corresponding data file.

The system designed and implemented in this thesis is a scalable, highly fault-tolerant, distributed cluster storage system for massive small files. The proposed small-file aggregated storage technique stores small-file data in HDFS data files, which provides distributed storage and fault tolerance for the data. The proposed distributed small-file index management technique distributes the index across the data nodes for management, which removes the bottleneck that a single metadata node becomes when storing massive numbers of small files. The index fault-tolerance mechanism protects the index against failures and so reduces the risk of losing small files. Creating multiple data files within a single directory resolves access conflicts for small files in the same directory. On top of these mechanisms, the client caches the index locations of frequently used small files together with data file stream information, which improves file access efficiency.

Experiments show that the system's small-file read/write latency and throughput improve substantially compared with native HDFS without small-file support. The system also effectively addresses the problem of excessively large metadata when storing massive numbers of small files, and its index fault-tolerance mechanism improves the reliability of the system.
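The packing idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the in-memory byte buffer stands in for an HDFS data file, and the names `IndexEntry`, `pack_small_files`, and `read_small_file` are hypothetical, chosen only for illustration. The sketch shows how an index of (offset, length) pairs lets a reader recover one small file from a packed data file.

```python
from __future__ import annotations

import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    """Location of one small file inside the packed data file (hypothetical layout)."""
    offset: int
    length: int


def pack_small_files(files: dict[str, bytes]) -> tuple[bytes, dict[str, IndexEntry]]:
    """Concatenate small files into one blob and build a name -> location index."""
    data = io.BytesIO()  # stand-in for a data file stored in HDFS
    index: dict[str, IndexEntry] = {}
    for name, payload in files.items():
        # Record where this file starts before appending its bytes.
        index[name] = IndexEntry(offset=data.tell(), length=len(payload))
        data.write(payload)
    return data.getvalue(), index


def read_small_file(data_file: bytes, index: dict[str, IndexEntry], name: str) -> bytes:
    """Look up a small file in the index and slice it out of the data file."""
    entry = index[name]
    return data_file[entry.offset:entry.offset + entry.length]


# One "folder" of small files packed into a single data file.
folder = {"a.txt": b"hello", "b.txt": b"small files", "c.txt": b""}
data_file, index = pack_small_files(folder)
```

With this layout the file system tracks one data file per folder instead of one entry per small file, and a read costs one index lookup plus one ranged read. How the index itself is distributed across data nodes and made fault tolerant, as the abstract describes, is outside the scope of this sketch.
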
[Degree-granting institution]: National University of Defense Technology
[Degree level]: Master's
[Year degree granted]: 2012
[Classification number]: TP333




Article ID: 2001333


Article link: http://www.wukwdryxk.cn/kejilunwen/jisuanjikexuelunwen/2001333.html

