EFFICIENTLY REPRESENTING MEMBERSHIP FOR
VARIABLE LARGE DATA SETS
ABSTRACT:
Cloud computing has raised new challenges for the membership representation
schemes of storage systems that manage very large data sets. This paper
proposes DBA, a dynamic Bloom filter array that represents membership for
variable large data sets in storage systems in a scalable way. DBA consists of
dynamically created groups of space-efficient Bloom filters (BFs) that
accommodate changes in set size. Within a group, the BFs are homogeneous, and
the data layout is optimized at the bit level to enable parallel access and
thus high query performance. DBA can effectively control its query accuracy by
partially adjusting the error rates of its constituent BFs, where each BF
represents only an independent subset, which helps locate elements and confirm
membership. Further, DBA supports element deletion through a lazy update
policy. We prototype and evaluate DBA as a scalable fast index in the MAD2
deduplication storage system. Experimental results show that DBA (with 64 BFs
per group) achieves significantly higher query performance than the
state-of-the-art approach while scaling up to 160 BFs. Theoretical analysis and
experimental evaluation also show that DBA excels in scalability, query
accuracy, and space efficiency.
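The accuracy claim rests on standard Bloom filter analysis. The textbook formulas below are given as a reference point; they are not taken from the paper. A BF with m bits, k hash functions, and n inserted elements has approximately the false-positive rate p shown first. For an array of g BFs B_1 through B_g with individual rates p_i, all of which are checked on a query, the equality holds when their false positives are independent. The union bound on the right holds regardless. This is one way to see why adjusting the error rates of individual BFs controls the accuracy of the whole array.

```latex
\begin{aligned}
p &\approx \left(1 - e^{-kn/m}\right)^{k} \\
k_{\mathrm{opt}} &= \frac{m}{n}\ln 2 \quad\Rightarrow\quad p \approx 2^{-k_{\mathrm{opt}}} \approx 0.6185^{\,m/n} \\
P_{\mathrm{array}} &= 1 - \prod_{i=1}^{g}\left(1 - p_i\right) \;\le\; \sum_{i=1}^{g} p_i
\end{aligned}
```

As a hypothetical example, if every BF is built with p_i = 0.001, an array of 160 BFs has an overall false-positive rate of at most 160 × 0.001 = 0.16. The per-BF rates therefore have to shrink as the array grows if the overall rate is to stay fixed.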
EXISTING SYSTEM:
A straightforward approach to recording membership is to keep an ordered full
index in memory. When a membership query arrives, a search algorithm is invoked
to locate the target item. However, this primitive method faces two challenges
when dealing with variable large data sets. First, maintaining an ordered full
index is cost-ineffective, because the logical/physical structure of the index
must be adjusted frequently to accommodate the addition or deletion of
elements. Moreover, storage systems such as Amazon’s Dynamo and Microsoft’s
ChunkStash allow complicated keys (opaque byte arrays and 20-byte SHA-1 hashes,
respectively) that cannot be sorted efficiently. Second, as the amount of data
grows, the whole index can become too large to fit in RAM.
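Below is a minimal Python sketch of the ordered full index described above. It uses the standard `bisect` module, and the `SortedIndex` class is hypothetical, for illustration only. Lookups are binary searches. Every insertion or deletion, however, shifts list entries to preserve order, which is the maintenance cost the paragraph refers to. Every key is also held in RAM.

```python
import bisect
import hashlib


class SortedIndex:
    """Ordered full index: every key is held in RAM in sorted order (illustrative)."""

    def __init__(self) -> None:
        self.keys: list[bytes] = []

    def contains(self, key: bytes) -> bool:
        # Binary search: O(log n) comparisons per membership query.
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def insert(self, key: bytes) -> None:
        # Finding the slot is O(log n), but list.insert shifts later entries: O(n).
        i = bisect.bisect_left(self.keys, key)
        if i == len(self.keys) or self.keys[i] != key:
            self.keys.insert(i, key)

    def delete(self, key: bytes) -> None:
        # Removal also shifts entries to keep the list ordered: O(n).
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]


if __name__ == "__main__":
    index = SortedIndex()
    for n in range(5):
        index.insert(hashlib.sha1(f"chunk-{n}".encode()).digest())  # 20-byte keys
    print(index.contains(hashlib.sha1(b"chunk-3").digest()))  # True
```

A sorted Python list stands in here for whatever ordered structure a real system would use. Balanced trees avoid the O(n) shifts but add per-entry pointer overhead, which worsens the RAM problem.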
DISADVANTAGES OF EXISTING SYSTEM:
❖ An ordered full index is cost-ineffective to maintain.
❖ The index can become too large to fit in RAM.
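A back-of-the-envelope comparison makes the RAM concern concrete. The figures are illustrative and not from the paper. Assume a hypothetical set of N = 10^9 elements keyed by 20-byte SHA-1 hashes. Compare it with a Bloom filter sized for a 1% false-positive rate, using the textbook optimum m/n = -ln p / (ln 2)^2:

```latex
\begin{aligned}
\text{full index (keys only)} &= N \times 20\,\mathrm{B} = 10^{9} \times 20\,\mathrm{B} = 20\,\mathrm{GB} \\
\frac{m}{n} &= \frac{-\ln 0.01}{(\ln 2)^{2}} \approx \frac{4.605}{0.4805} \approx 9.59\ \text{bits per element} \\
\text{Bloom filter} &\approx 9.59 \times 10^{9}\ \text{bits} \approx 1.2 \times 10^{9}\,\mathrm{B} \approx 1.2\,\mathrm{GB}
\end{aligned}
```

The BF is roughly 17 times smaller. However, it only answers "possibly present" or "definitely absent" and cannot by itself say where an element is stored. DBA's per-subset BFs are meant to help with that locating step.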
PROPOSED SYSTEM:
This paper proposes DBA, a dynamic Bloom filter array that represents
membership for variable large data sets in storage systems in a scalable way.
DBA consists of dynamically created groups of space-efficient Bloom filters
(BFs) that accommodate changes in set size. Within a group, the BFs are
homogeneous, and the data layout is optimized at the bit level to enable
parallel access and thus high query performance. DBA can effectively control
its query accuracy by partially adjusting the error rates of its constituent
BFs, where each BF represents only an independent subset, which helps locate
elements and confirm membership. Further, DBA supports element deletion
through a lazy update policy.
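The structure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation. The class names, the double-hashing scheme over SHA-256, the per-BF capacity, and the tombstone set used for deletion are all hypothetical. The sketch assumes that "optimized at the bit level" means a bit-sliced layout. Because the 64 BFs in a group are homogeneous (same size m and same k hash functions), bit i of every BF can be packed into one 64-bit word. A query then computes its k positions once and ANDs k words to learn which BFs in the group may hold the element. The tombstone set only stands in for the lazy update policy, whose details the text does not give.

```python
import hashlib

GROUP_WIDTH = 64  # BFs per group; bit j of each word belongs to BF j


def bit_positions(key: bytes, m: int, k: int) -> list[int]:
    """k positions in [0, m) by double hashing a SHA-256 digest (assumed scheme)."""
    digest = hashlib.sha256(key).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:16], "little") | 1  # odd step
    return [(h1 + i * h2) % m for i in range(k)]


class BFGroup:
    """GROUP_WIDTH homogeneous BFs, bit-sliced: bit j of words[i] is bit i of BF j."""

    def __init__(self, m: int, k: int, capacity: int) -> None:
        self.m, self.k, self.capacity = m, k, capacity
        self.words = [0] * m               # one 64-bit "word" per bit position
        self.counts = [0] * GROUP_WIDTH    # elements inserted into each BF
        self.active = 0                    # BF currently accepting inserts

    def full(self) -> bool:
        return self.active >= GROUP_WIDTH

    def insert(self, key: bytes) -> int:
        j = self.active
        for p in bit_positions(key, self.m, self.k):
            self.words[p] |= 1 << j
        self.counts[j] += 1
        if self.counts[j] >= self.capacity:
            self.active += 1               # BF j reached capacity; move on
        return j

    def query(self, key: bytes) -> int:
        """Bitmask of BFs that may contain key: AND of the k words it hashes to."""
        mask = (1 << GROUP_WIDTH) - 1
        for p in bit_positions(key, self.m, self.k):
            mask &= self.words[p]
            if mask == 0:                  # no BF in this group can hold key
                break
        return mask


class DBA:
    """Array of BF groups; a new group is created whenever the last one is full."""

    def __init__(self, m: int = 4096, k: int = 7, capacity: int = 400) -> None:
        self.params = (m, k, capacity)
        self.groups: list[BFGroup] = []
        self.tombstones: set[bytes] = set()  # hypothetical stand-in for lazy deletion

    def add(self, key: bytes) -> tuple[int, int]:
        """Insert key; return (group index, BF index) of the subset it joins."""
        if not self.groups or self.groups[-1].full():
            self.groups.append(BFGroup(*self.params))
        self.tombstones.discard(key)
        g = len(self.groups) - 1
        return g, self.groups[g].insert(key)

    def delete(self, key: bytes) -> None:
        # Lazy: the BF bits stay set; the key is only masked out at query time.
        self.tombstones.add(key)

    def locate(self, key: bytes) -> list[tuple[int, int]]:
        """Candidate (group, BF) locations; false positives are possible."""
        if key in self.tombstones:
            return []
        hits = []
        for g, group in enumerate(self.groups):
            mask = group.query(key)
            while mask:
                low = mask & -mask                 # lowest set bit
                hits.append((g, low.bit_length() - 1))
                mask ^= low
        return hits
```

For example, after `dba = DBA()` and `dba.add(b"chunk-1")`, `dba.locate(b"chunk-1")` returns `[(0, 0)]`, meaning group 0, BF 0. In a fuller array, any additional entries would be false positives that the storage system must rule out by checking the candidate subsets.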
ADVANTAGES OF PROPOSED SYSTEM:
❖ It gives high query performance.
❖ It represents membership for large data sets in storage systems in a
scalable way.
❖ It minimizes energy cost.
SYSTEM CONFIGURATION:-
HARDWARE REQUIREMENTS:-
Processor - Pentium IV
Speed - 1.1 GHz
RAM - 512 MB (min)
Hard Disk - 40 GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - LCD/LED
SOFTWARE REQUIREMENTS:
Operating System : Windows XP
Coding Language : .NET
Database : SQL Server 2005
Tool : Visual Studio 2008
REFERENCE:
Jiansheng Wei, Hong Jiang, Ke Zhou, and Dan Feng, “Efficiently Representing
Membership for Variable Large Data Sets,” IEEE Transactions on Parallel and
Distributed Systems, vol. 25, no. 4, April 2014.