SIGNATURE SEARCHING IN
A NETWORKED COLLECTION OF FILES
ABSTRACT:
A signature is a data pattern of interest in a large
data file or set of large data files. Such signatures that need to be found
arise in applications such as DNA sequence analysis, network intrusion
detection, biometrics, large scientific experiments, speech recognition and
sensor networks. Related to this is string matching. More specifically we
envision a problem where long linear data files (i.e., flat files) contain
multiple signatures that are to be found using a multiplicity of processors
(parallel processor). This paper evaluates the performance of finding
signatures in files residing in the nodes of parallel processors configured as
trees, two dimensional meshes and hypercubes. We assume various combinations of
sequential and parallel searching. A unique feature of this work is that it is
assumed that data is pre-loaded onto processors, as may occur in practice, thus
load distribution time need not be accounted for. Elegant expressions are found
for average signature searching time and speedup, and graphical results are
provided.
EXISTING SYSTEM:
String searching, which is similar to our concept of
signature searching, is a special case of pattern searching. String searching
generically involves finding a pattern of length m in a text of length n over
some alphabet. The worst-case complexity of exact string matching is O(n), but
the proportionality constant of the linear term can be very different depending
on the string matching algorithm, ranging from m for the naive algorithm to 2
for the Knuth-Morris-Pratt algorithm. Approximate string matching involves
string matching that allows errors; that is, the pattern and/or the text may
have suffered some corruption. Applications include noisy channels, speech recognition,
handwriting recognition, finding DNA sequences in the presence of mutations, and
text searching. Approximate string matching algorithms use a distance metric
to quantify the difference between two strings. For instance, the
edit distance is the minimum number of single-character insertions, deletions,
and substitutions needed to transform one string into the other. The computational complexity of
approximate string matching can range from linear to NP-complete depending on
the error mechanism.
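The two ideas above, linear-time exact matching and a distance metric for approximate matching, can be sketched in Java as follows. This is only a minimal illustration, not the code of the project or of the referenced paper: the class name StringMatchSketch and its method names are hypothetical, the exact matcher is a textbook Knuth-Morris-Pratt scan, and the distance shown is the Levenshtein edit distance, which is one common choice of metric.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; the name is not taken from the paper.
class StringMatchSketch {

    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] buildFailureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Knuth-Morris-Pratt: start offsets of every exact occurrence of
    // pattern in text; the text index never moves backwards.
    static List<Integer> kmpSearch(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits;
        }
        int[] fail = buildFailureTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                hits.add(i - k + 1);
                k = fail[k - 1]; // keep going to find overlapping matches
            }
        }
        return hits;
    }

    // Levenshtein edit distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

For example, kmpSearch("ACGTACGTAC", "ACGT") returns the offsets 0 and 4, and editDistance("kitten", "sitting") is 3.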
DISADVANTAGES OF EXISTING SYSTEM:
- Approximate string matching involves string matching that allows errors.
- Pattern and/or text suffer some corruption.
PROPOSED SYSTEM:
In this paper we assume the data is held in flat
files (i.e., very long linear sequences of data) stored at the nodes of certain interconnection
networks. We assume computational complexity that is linear in the file size, which
applies to exact string matching and to some approximate string matching.
Naturally, more sophisticated database methodologies are possible, and flat files
are often converted into other structures, but for initial raw data processing
flat files are natural. We envision a scenario where files containing
signatures are placed on a multiplicity of (parallel) processors tied together
by an interconnection network. Unlike much of the divisible load
theory literature, we do not account for the time needed to distribute the load to the
processors and links. Rather, we assume the files are pre-loaded
onto the processors prior to time t = 0. This is relevant in certain
applications, where load is often distributed to processors spontaneously, without
being scheduled, monitored or timed.
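The pre-loaded scenario described above can be sketched in Java as follows. This is a rough illustration under stated assumptions, not the project's implementation: each worker thread stands in for one processor, each in-memory string stands in for the flat file already resident at that node, and the names PreloadedSearchSketch, Hit, searchNode and searchAll are hypothetical. The sketch does not model the interconnection network (tree, mesh or hypercube) or the cost of reporting results.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration class; names are not taken from the paper.
class PreloadedSearchSketch {

    // One signature occurrence: which node holds it and at what offset.
    static final class Hit {
        final int node;
        final int offset;

        Hit(int node, int offset) {
            this.node = node;
            this.offset = offset;
        }
    }

    // Linear scan of the file already resident at one node.
    static List<Hit> searchNode(int node, String file, String signature) {
        List<Hit> hits = new ArrayList<>();
        if (signature.isEmpty()) {
            return hits;
        }
        int from = file.indexOf(signature);
        while (from >= 0) {
            hits.add(new Hit(node, from));
            from = file.indexOf(signature, from + 1);
        }
        return hits;
    }

    // All nodes search concurrently; no time is spent distributing data,
    // because nodeFiles is assumed to be in place before the search starts.
    static List<Hit> searchAll(List<String> nodeFiles, String signature) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, nodeFiles.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (int n = 0; n < nodeFiles.size(); n++) {
                final int node = n;
                Callable<List<Hit>> task =
                        () -> searchNode(node, nodeFiles.get(node), signature);
                pending.add(pool.submit(task));
            }
            // Collect results in node order so the output is deterministic.
            List<Hit> all = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                all.addAll(f.get());
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

Calling searchAll with one string per node returns the hits grouped by node index. A sequential protocol could be approximated by calling searchNode on each node in turn instead of submitting all tasks at once.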
ADVANTAGES OF PROPOSED SYSTEM:
- Differing degrees of sequentiality and concurrency in the search strategy.
- Its goal is to determine expected search time and speedup under a variety of search protocols that largely differ in a number of aspects.
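To give a feel for what "expected search time" and "speedup" mean here, the following Java sketch works through a deliberately simple toy model. It is not one of the expressions derived in the referenced paper. The assumptions are: N processors each hold a file of the same length, exactly one signature exists and is equally likely to sit at any position, scanning one position costs one time unit, and communication is free. The class and method names are hypothetical.

```java
// Hypothetical toy model; not the expressions from the referenced paper.
class SearchTimeSketch {

    // Sequential protocol: files are scanned one after another, so the
    // search ends after g time units when the signature sits at global
    // position g (1-based) across all files.
    static double expectedSequentialTime(int processors, int fileLength) {
        long total = (long) processors * fileLength;
        double sum = 0;
        for (long g = 1; g <= total; g++) {
            sum += g;
        }
        return sum / total;
    }

    // Parallel protocol: every processor scans its own file at the same
    // time, so the search ends at the signature's offset within its file.
    static double expectedParallelTime(int processors, int fileLength) {
        long total = (long) processors * fileLength;
        double sum = 0;
        for (long g = 1; g <= total; g++) {
            sum += ((g - 1) % fileLength) + 1;
        }
        return sum / total;
    }

    // Speedup = expected sequential time / expected parallel time.
    static double speedup(int processors, int fileLength) {
        return expectedSequentialTime(processors, fileLength)
                / expectedParallelTime(processors, fileLength);
    }
}
```

With 4 processors and files of length 100, this toy model gives an expected sequential time of 200.5, an expected parallel time of 50.5, and a speedup just under 4. Protocols that mix sequential and concurrent searching, or that search for multiple signatures, would need a different model.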
SYSTEM CONFIGURATION:-
HARDWARE REQUIREMENTS:-
Processor - Pentium IV
Speed - 1.1 GHz
RAM - 512 MB (min)
Hard Disk - 40 GB
Keyboard - Standard Windows Keyboard
Mouse - Two or Three Button Mouse
Monitor - LCD/LED
SOFTWARE REQUIREMENTS:
Operating System : Windows XP
Coding Language : Java
Database : MySQL
Tool : NetBeans
REFERENCE:
Zhongwen Ying and Thomas G. Robertazzi, “Signature Searching in a Networked Collection
of Files,” IEEE Transactions on Parallel and Distributed
Systems, vol. 25, no. 5, May 2014.