Wednesday, 23 July 2014

SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems



The explosive growth in data volume and complexity poses great challenges for file systems. To address these challenges, an innovative namespace management scheme is urgently needed to make data access both easy and efficient. In almost all of today's file systems, namespace management is based on hierarchical directory trees. This tree-based namespace scheme is prone to severe performance bottlenecks and often fails to provide real-time responses to complex data lookups. This paper proposes a Semantic-Aware Namespace scheme, called SANE, which provides dynamic and adaptive namespace management for ultra-large storage systems with billions of files. SANE introduces a new naming methodology based on the notion of a semantic-aware per-file namespace, which exploits semantic correlations among files to dynamically aggregate correlated files into small, flat, and readily manageable groups, achieving fast and accurate lookups. SANE is implemented as middleware in conventional file systems and works orthogonally with hierarchical directory trees. The semantic correlations and file groups identified in SANE can also be used to facilitate file prefetching and data de-duplication, among other system-level optimizations. Extensive trace-driven experiments on our prototype implementation validate the efficacy and efficiency of SANE.
According to a recent survey of 1,780 data center managers in 26 countries, over 36 percent of respondents faced two critical challenges: efficiently supporting a flood of emerging applications and handling sharply increased data management complexity. This reflects a reality in which we are generating and storing much more data than ever before, and the trend continues at an accelerating pace. This explosion in data volume poses great challenges for storage systems, particularly for the metadata management of file systems. For example, many systems are required to perform hundreds of thousands of metadata operations per second, and their performance is severely restricted by the hierarchical directory-tree based metadata management scheme used in almost all file systems today.
The most important functions of namespace management are file identification and lookup. As an information-organizing infrastructure, the file system namespace is fundamental to a system's quality of service, including performance, scalability, and ease of use. Unfortunately, almost all current file systems are based on hierarchical directory trees, which have the following drawbacks:
• Limited system scalability.
• Reliance on end-users to organize and look up data.
• Lack of metadata-semantics exploration.

We propose a new namespace management scheme, called SANE, which provides a flat but small, manageable, and efficient namespace for each file. SANE introduces the notion of a semantic-aware per-file namespace, in which a file is represented by its semantic correlations to other files instead of by a conventional static file name. Our goal is not to replace conventional directory-tree management, which already has a large user base. Instead, we aim to provide another metadata overlay that is orthogonal to directory trees. SANE runs concurrently with the conventional file system that integrates it and, when necessary, takes over the responsibilities of file search and semantic file grouping from the file system. Moreover, while providing the same functionality, SANE uses a new naming scheme that requires only constant-scale complexity to identify and aggregate semantically correlated files. SANE extracts the semantic correlation information from a hierarchical tree.


• The metadata of files that are strongly correlated is automatically aggregated and then stored together in SANE.

• SANE is implemented as transparent middleware that can be deployed or embedded in most existing file systems without modifying the kernels or applications.
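
The idea of a per-file namespace built from semantic correlations can be illustrated with a small Python sketch. This is not SANE's actual algorithm: it assumes each file is described by a hypothetical numeric attribute vector (for example, normalized size, creation time, and access time) and substitutes plain cosine similarity with a brute-force scan for whatever correlation measure and indexing the real system uses. The file names, vectors, and the helper names `cosine` and `per_file_namespace` are all invented for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def per_file_namespace(target, files, k=2):
    """Return the k files most correlated with `target`.

    The result is a small, flat group that stands in for the file's
    namespace in this toy model (an assumption, not SANE's definition).
    """
    scores = [
        (cosine(files[target], vec), name)
        for name, vec in files.items()
        if name != target
    ]
    # Highest similarity first; ties broken by name for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [name for _, name in scores[:k]]


# Hypothetical metadata vectors: [size, creation time, access time], normalized.
files = {
    "report_q1.doc": [0.9, 0.1, 0.2],
    "report_q2.doc": [0.8, 0.2, 0.2],
    "budget.xls": [0.7, 0.1, 0.3],
    "holiday.jpg": [0.1, 0.9, 0.8],
    "beach.jpg": [0.2, 0.8, 0.9],
}

if __name__ == "__main__":
    print(per_file_namespace("report_q1.doc", files, k=2))
    print(per_file_namespace("holiday.jpg", files, k=1))
```

In this toy model, a lookup for files related to `report_q1.doc` consults only its small correlated group rather than walking a directory tree. The brute-force scan here costs time linear in the number of files, so it does not reproduce the constant-scale complexity described above; it only shows what a correlation-based group looks like.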



Processor        : Pentium IV
Speed            : 1.1 GHz
RAM              : 512 MB (min)
Hard Disk        : 40 GB
Keyboard         : Standard Windows Keyboard
Mouse            : Two or Three Button Mouse
Monitor          : LCD/LED

Operating System : Windows XP
Coding Language  : .NET
Database         : SQL Server 2005
Tool             : Visual Studio 2008

Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, and Lei Xu, "SANE: Semantic-Aware Namespace in Ultra-Large-Scale File Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 5, May 2014.
