Wednesday, 23 July 2014

Signature Searching in a Networked Collection of Files




ABSTRACT:

A signature is a data pattern of interest in a large data file or set of large data files. Signatures that need to be found arise in applications such as DNA sequence analysis, network intrusion detection, biometrics, large scientific experiments, speech recognition and sensor networks. String matching is a closely related problem. More specifically, we envision a problem where long linear data files (i.e., flat files) contain multiple signatures that are to be found using a multiplicity of processors (a parallel processor). This paper evaluates the performance of finding signatures in files residing in the nodes of parallel processors configured as trees, two-dimensional meshes and hypercubes. We assume various combinations of sequential and parallel searching. A unique feature of this work is the assumption that data is pre-loaded onto the processors, as may occur in practice, so load distribution time need not be accounted for. Elegant expressions are found for average signature searching time and speedup, and graphical results are provided.
EXISTING SYSTEM:
String searching, which is similar to our concept of signature searching, is a special case of pattern searching. String searching generically involves finding a pattern of length m in a text of length n over some alphabet. The worst-case complexity of exact string matching is O(n), but the proportionality constant of the linear term can be very different depending on the string matching algorithm, ranging from m for the naive algorithm to 2 for the Knuth-Morris-Pratt algorithm. Approximate string matching is string matching that allows errors; that is, the pattern and/or the text may have suffered some corruption. Applications include noisy channels, speech recognition, handwriting recognition, finding DNA sequences in the presence of mutations, and text searching. Approximate string matching algorithms use a distance metric to quantify the difference between two strings. For instance, the edit distance is the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. The computational complexity of approximate string matching can range from linear to NP-complete depending on the error mechanism.
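To make the linear-time exact matching mentioned above concrete, here is a minimal Java sketch of Knuth-Morris-Pratt search. It is an illustration only, not code from the referenced paper; the class name KmpSearch, the method names and the sample DNA-like strings are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not taken from the paper.
class KmpSearch {
    // fail[i] = length of the longest proper prefix of pattern[0..i]
    // that is also a suffix of pattern[0..i].
    static int[] failureTable(String pattern) {
        int[] fail = new int[pattern.length()];
        int k = 0;
        for (int i = 1; i < pattern.length(); i++) {
            while (k > 0 && pattern.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1];
            }
            if (pattern.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            fail[i] = k;
        }
        return fail;
    }

    // Returns the start index of every occurrence (overlaps included).
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        if (pattern.isEmpty()) {
            return hits; // treat an empty signature as "nothing to find"
        }
        int[] fail = failureTable(pattern);
        int k = 0; // number of pattern characters currently matched
        for (int i = 0; i < text.length(); i++) {
            while (k > 0 && text.charAt(i) != pattern.charAt(k)) {
                k = fail[k - 1]; // fall back without re-reading the text
            }
            if (text.charAt(i) == pattern.charAt(k)) {
                k++;
            }
            if (k == pattern.length()) {
                hits.add(i - k + 1);
                k = fail[k - 1];
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("GATTACAGATTACA", "TTAC")); // prints [2, 9]
    }
}
```

The text index i never moves backwards, which is why the scan stays linear in the file size regardless of the pattern.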
DISADVANTAGES OF EXISTING SYSTEM:
- Approximate string matching involves string matching that allows errors.
- The pattern and/or the text may suffer some corruption.
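
The error tolerance described above is usually quantified with a distance metric. As a rough sketch, the edit (Levenshtein) distance between two strings can be computed with the standard dynamic-programming recurrence shown below. The class name EditDistance and the sample strings are assumptions for illustration, and the paper itself may rely on a different error model.

```java
// Hypothetical helper class for illustration; not taken from the paper.
class EditDistance {
    // Minimum number of single-character insertions, deletions and
    // substitutions needed to turn a into b. Uses two rolling rows,
    // so memory is O(|b|) and time is O(|a| * |b|).
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    public static void main(String[] args) {
        System.out.println(levenshtein("kitten", "sitting")); // prints 3
    }
}
```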

PROPOSED SYSTEM:
In this paper we assume the data is stored in flat files (i.e., very long linear sequences of data) located at the nodes of certain interconnection networks. We assume computational complexity that is linear in the file size, which applies to exact string matching and to some approximate string matching. Naturally, more sophisticated database methodologies are possible, and flat files are often converted into other structures, but flat files are natural for initial raw data processing. We envision a scenario where files containing signatures are placed on a multiplicity of (parallel) processors tied together by an interconnection network. Unlike much of the divisible load theory literature, we do not take into account the time needed to distribute the load to the processors and links. Rather, we assume the files are pre-loaded onto the processors prior to time t = 0. This is relevant in certain applications, where load is often spontaneously distributed to processors without being scheduled, monitored or timed.
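The pre-loaded scenario can be mimicked on a single machine, purely as an illustration: each in-memory string below stands in for a flat file already resident on one node, and a thread pool stands in for the parallel processors. The class name PreloadedSearch, the use of java.util.concurrent and the sample data are assumptions of this sketch, not the paper's implementation, and no interconnection topology is modelled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical helper class for illustration; not taken from the paper.
class PreloadedSearch {
    // Searches all pre-loaded files concurrently. Returns, for each node in
    // order, the offset of the first occurrence of the signature, or -1.
    static List<Integer> searchAll(List<String> nodeFiles, String signature) {
        ExecutorService pool =
                Executors.newFixedThreadPool(Math.max(1, nodeFiles.size()));
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (String file : nodeFiles) {
                // String.indexOf is used for brevity; any linear-time
                // matcher could be substituted here.
                tasks.add(() -> file.indexOf(signature));
            }
            List<Integer> offsets = new ArrayList<>();
            // invokeAll returns futures in the same order as the tasks.
            for (Future<Integer> result : pool.invokeAll(tasks)) {
                offsets.add(result.get());
            }
            return offsets;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> files = List.of("AAGATTACA", "CCCC", "GATTACA");
        System.out.println(searchAll(files, "GATTACA")); // prints [2, -1, 0]
    }
}
```

Because the data is already in place when the search starts, the only time measured in such a setup is the search itself, which mirrors the assumption that distribution time is excluded.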
ADVANTAGES OF PROPOSED SYSTEM:
- Supports differing degrees of sequentiality and concurrency in the search strategy.
- Determines the expected search time and speedup under a variety of search protocols.
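
To give a feel for how sequential and concurrent strategies differ in expected search time, the sketch below works through a deliberately simple toy model. These are not the expressions derived in the paper. The assumptions are: exactly one signature exists, it is equally likely to lie in any of N equal-sized files, its position within that file is uniform, every file takes the same time T to scan fully, and the search stops as soon as the signature is found. The class name SearchTimeModel is hypothetical.

```java
// Toy model for illustration only; assumptions are listed in the text above.
class SearchTimeModel {
    // Nodes are scanned one after another. If the signature sits on the
    // k-th node (k = 0..N-1), k full scans are wasted and, on average,
    // half a scan is spent on the node that holds it.
    static double expectedSequential(int nodes, double scanTime) {
        double total = 0.0;
        for (int k = 0; k < nodes; k++) {
            total += k * scanTime + scanTime / 2.0;
        }
        return total / nodes; // equals nodes * scanTime / 2
    }

    // All nodes scan at once; the holder finds the signature after,
    // on average, half of its own scan.
    static double expectedConcurrent(double scanTime) {
        return scanTime / 2.0;
    }

    static double speedup(int nodes, double scanTime) {
        return expectedSequential(nodes, scanTime) / expectedConcurrent(scanTime);
    }

    public static void main(String[] args) {
        System.out.println(expectedSequential(8, 1.0)); // prints 4.0
        System.out.println(speedup(8, 1.0));            // prints 8.0
    }
}
```

Under these assumptions the speedup equals the number of nodes. Models with multiple signatures, unequal file sizes or mixed sequential and concurrent stages would give different results.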

SYSTEM CONFIGURATION:

HARDWARE REQUIREMENTS:

Processor          :   Pentium IV
Speed              :   1.1 GHz
RAM                :   512 MB (min)
Hard Disk          :   40 GB
Keyboard           :   Standard Windows Keyboard
Mouse              :   Two or Three Button Mouse
Monitor            :   LCD/LED

SOFTWARE REQUIREMENTS:

Operating System   :   Windows XP
Coding Language    :   Java
Database           :   MySQL
Tool               :   NetBeans

REFERENCE:
Zhongwen Ying and Thomas G. Robertazzi, “Signature Searching in a Networked Collection of Files,” IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 5, May 2014.
