OCCT: A ONE-CLASS CLUSTERING TREE FOR IMPLEMENTING ONE-TO-MANY DATA LINKAGE
ABSTRACT:
One-to-many data linkage is an essential task in many domains, yet only a
handful of prior publications have addressed this issue. Furthermore, while
data linkage is traditionally performed among entities of the same type, there
is also a need for linkage techniques that link matching entities of different
types. In this paper we propose a new one-to-many data linkage method that
links entities of different natures. The proposed method is based on a
one-class clustering tree (OCCT), which characterizes the entities that should
be linked together. The tree is built such that it is easy to understand and to
transform into association rules; i.e., the inner nodes consist only of features
describing the first set of entities, while the leaves of the tree represent
features of their matching entities from the second dataset. We propose four
splitting criteria and two different pruning methods which can be used for
inducing the OCCT. The method was evaluated using datasets from three different
domains. The results affirm the effectiveness of the proposed method and show
that the OCCT yields better performance in terms of precision and recall (in
most cases the difference is statistically significant) when compared to a C4.5
decision tree-based linkage method.
EXISTING SYSTEM:
Data linkage is the task of identifying different entries (i.e., data items)
that refer to the same entity across different data sources. The goal of the
data linkage task is to join datasets that do not share a common identifier
(i.e., a foreign key). Common data linkage scenarios include: linking data when
combining two different databases; data deduplication (a data compression
technique for eliminating redundant data), which is commonly done as a
preprocessing step for data mining tasks; identifying individuals across
different census datasets; linking similar DNA sequences; and matching
astronomical objects from different catalogues. It is common to divide data
linkage into two types: one-to-one and one-to-many. In one-to-one data linkage,
the goal is to associate an entity from one dataset with a single matching
entity in another dataset. In one-to-many data linkage, the goal is to
associate an entity from the first dataset with a group of matching entities
from the other dataset. Most previous work focuses on one-to-one data linkage.
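The difference between the two linkage types can be illustrated with a minimal Java sketch. The class name LinkageKinds, both method names, and the match predicate are hypothetical and are not taken from the paper. The sketch assumes that a pairwise match test is already available, and it takes the first match as a stand-in for a single best match in the one-to-one case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical illustration of the two linkage types; not the paper's code.
class LinkageKinds {

    // One-to-one: each left record is linked to at most one right record
    // (here simply the first one that matches).
    static <A, B> Map<A, B> linkOneToOne(List<A> left, List<B> right,
                                         BiPredicate<A, B> matches) {
        Map<A, B> links = new LinkedHashMap<>();
        for (A a : left) {
            for (B b : right) {
                if (matches.test(a, b)) {
                    links.put(a, b);
                    break; // stop after the first match
                }
            }
        }
        return links;
    }

    // One-to-many: each left record is linked to the whole group of
    // matching right records (possibly empty).
    static <A, B> Map<A, List<B>> linkOneToMany(List<A> left, List<B> right,
                                                BiPredicate<A, B> matches) {
        Map<A, List<B>> links = new LinkedHashMap<>();
        for (A a : left) {
            List<B> group = new ArrayList<>();
            for (B b : right) {
                if (matches.test(a, b)) {
                    group.add(b);
                }
            }
            links.put(a, group);
        }
        return links;
    }

    public static void main(String[] args) {
        List<String> left = List.of("a", "b");
        List<String> right = List.of("a1", "a2", "b1");
        // Toy match rule: a right record matches if it starts with the left key.
        BiPredicate<String, String> rule = (x, y) -> y.startsWith(x);
        System.out.println(linkOneToOne(left, right, rule));
        System.out.println(linkOneToMany(left, right, rule));
    }
}
```

In this toy setting the one-to-one result keeps a single right record per key, while the one-to-many result keeps every matching record as a group.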
DISADVANTAGES OF EXISTING SYSTEM:
· It is not secure.
· It can perform one-to-one data linkage only.
· It consumes a large amount of time.
PROPOSED SYSTEM:
We propose a new data linkage method aimed at performing one-to-many linkage
(it can also be extended to many-to-many linkage). In addition, while data
linkage is usually performed among entities of the same type, the proposed
data linkage technique can match entities of different types. For example, in
a student database we might want to link a student record with the courses she
should take (according to features describing the student and features
describing the courses). The proposed method links the entities using a
One-Class Clustering Tree (OCCT). A clustering tree is a tree in which each of
the leaves contains a cluster instead of a single classification. Each cluster
is generalized by a set of rules (e.g., a set of conditional probabilities)
that is stored in the appropriate leaf.
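The tree described above can be sketched in Java for the student and course example. This is only an illustration under stated assumptions, not the authors' implementation: the class OcctSketch, its methods, the attribute names "major" and "department", and all probability values are hypothetical. It also assumes that a candidate pair is scored by multiplying the leaf's conditional probabilities (treating course attributes as independent) and that a pair is linked when the score reaches a chosen threshold. Tree induction (splitting criteria and pruning) is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a clustering tree used for linkage.
class OcctSketch {
    interface Node { }

    // Inner node: tests one attribute of the first-table entity (the student).
    static final class Inner implements Node {
        final String attribute;
        final Map<String, Node> children = new HashMap<>();
        Inner(String attribute) { this.attribute = attribute; }
    }

    // Leaf: for each attribute of the second-table entity (the course),
    // stores P(value | leaf) as a set of conditional probabilities.
    static final class Leaf implements Node {
        final Map<String, Map<String, Double>> probs = new HashMap<>();
    }

    // Walk the tree using only attributes of a, then score b under the leaf
    // model. Assumption: attributes of b are independent given the leaf.
    static double score(Node root, Map<String, String> a, Map<String, String> b) {
        Node node = root;
        while (node instanceof Inner) {
            Inner inner = (Inner) node;
            node = inner.children.get(a.get(inner.attribute));
            if (node == null) {
                return 0.0; // attribute value not covered by the tree
            }
        }
        Leaf leaf = (Leaf) node;
        double score = 1.0;
        for (Map.Entry<String, Map<String, Double>> e : leaf.probs.entrySet()) {
            String value = b.get(e.getKey());
            Double p = (value == null) ? null : e.getValue().get(value);
            score *= (p == null) ? 0.0 : p;
        }
        return score;
    }

    // One-to-many linkage: keep every candidate whose score reaches the threshold.
    static List<Map<String, String>> link(Node root, Map<String, String> a,
                                          List<Map<String, String>> candidates,
                                          double threshold) {
        List<Map<String, String>> matches = new ArrayList<>();
        for (Map<String, String> b : candidates) {
            if (score(root, a, b) >= threshold) {
                matches.add(b);
            }
        }
        return matches;
    }

    // Hand-built toy tree with made-up probabilities, for illustration only.
    static Node exampleTree() {
        Leaf csLeaf = new Leaf();
        csLeaf.probs.put("department", Map.of("CS", 0.9, "Math", 0.1));
        Leaf mathLeaf = new Leaf();
        mathLeaf.probs.put("department", Map.of("CS", 0.2, "Math", 0.8));
        Inner root = new Inner("major");
        root.children.put("CS", csLeaf);
        root.children.put("Math", mathLeaf);
        return root;
    }

    public static void main(String[] args) {
        Map<String, String> student = Map.of("major", "CS");
        List<Map<String, String>> courses = List.of(
                Map.of("department", "CS"),
                Map.of("department", "Math"));
        System.out.println(link(exampleTree(), student, courses, 0.5));
    }
}
```

The inner nodes only inspect the student's features and the leaves only describe course features, which mirrors the structure the text attributes to the OCCT. The threshold value of 0.5 is an arbitrary choice for the example.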
ADVANTAGES OF PROPOSED SYSTEM:
· It addresses three major problem domains: data leakage prevention,
recommender systems, and fraud detection.
· It can detect abnormal access to database records that might indicate a
potential data leakage or data misuse.
SYSTEM CONFIGURATION:-
HARDWARE REQUIREMENTS:-
· Processor - Pentium IV
· Speed - 1.1 GHz
· RAM - 512 MB (min)
· Hard Disk - 40 GB
· Keyboard - Standard Windows Keyboard
· Mouse - Two or Three Button Mouse
· Monitor - LCD/LED
SOFTWARE REQUIREMENTS:
• Operating system : Windows XP
• Coding Language : Java
• Database : MySQL
• Tool : NetBeans IDE
REFERENCE:
Ma'ayan Gafny, Asaf Shabtai, Lior Rokach, Yuval Elovici, "OCCT: A One-Class
Clustering Tree for Implementing One-to-Many Data Linkage," IEEE Transactions
on Knowledge and Data Engineering, TKDE-2011-09-0577.