Presently, there are two major approaches to duplicate record detection. Research in databases emphasizes relatively simple and fast duplicate detection techniques that can be applied to databases with millions of records; such techniques typically do not rely on the existence of training data and favor efficiency over effectiveness. Research in machine learning and statistics, on the other hand, aims to develop more sophisticated matching techniques that rely on probabilistic models. An interesting direction for future research is to develop techniques that combine the best of both worlds. Most of the duplicate detection systems available today offer various algorithmic approaches for speeding up the duplicate detection process. The varying nature of the duplicate detection task also calls for adaptive methods that detect different duplication patterns and adjust themselves automatically over time.
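To make the contrast concrete, the following is a minimal sketch (not taken from any particular system) of the two styles: a fast, fixed-threshold similarity rule in the spirit of database-oriented techniques, and a toy probabilistic log-likelihood score in the spirit of Fellegi-Sunter-style models. The field names, weights, and thresholds are illustrative assumptions.

```python
# Illustrative sketch: heuristic vs. probabilistic record matching.
# All parameter values (thresholds, m/u probabilities) are assumptions.
from difflib import SequenceMatcher
import math

def field_similarity(a: str, b: str) -> float:
    """Normalized string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def heuristic_match(rec1: dict, rec2: dict, threshold: float = 0.85) -> bool:
    """Database-style rule: average field similarity above a fixed threshold."""
    fields = rec1.keys() & rec2.keys()
    avg = sum(field_similarity(rec1[f], rec2[f]) for f in fields) / len(fields)
    return avg >= threshold

def probabilistic_score(rec1: dict, rec2: dict, m: float = 0.9, u: float = 0.1) -> float:
    """Toy Fellegi-Sunter-style score: each agreeing field contributes
    log(m/u), each disagreeing field contributes log((1-m)/(1-u))."""
    score = 0.0
    for f in rec1.keys() & rec2.keys():
        agree = field_similarity(rec1[f], rec2[f]) >= 0.9
        score += math.log(m / u) if agree else math.log((1 - m) / (1 - u))
    return score

if __name__ == "__main__":
    a = {"name": "John Smith", "city": "New York"}
    b = {"name": "Jon Smith",  "city": "New York"}
    print(heuristic_match(a, b))        # fast rule-of-thumb decision
    print(probabilistic_score(a, b))    # higher score = more likely a match
```

A hybrid system in the spirit of the combined approach above would use the cheap heuristic to prune clearly non-matching pairs and reserve the probabilistic model for the ambiguous ones.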
Finally, a huge amount of structured information is now derived from unstructured text and from the web. This information is typically imprecise and noisy, and duplicate record detection techniques are essential for improving the quality of the extracted data. The increasing popularity of information extraction techniques will make this issue more common in the future, highlighting the need for robust and scalable solutions. This only adds to the evidence that more research is needed in duplicate record detection and, more generally, in data cleaning and information quality. We conclude with coverage of existing tools and a brief discussion of the open problems in duplicate record detection.
The removal of duplicate entries and the transformation of data into a suitable format fall under data preprocessing (i.e., data cleaning methods).
There are several data cleaning methods, and existing tools (such as WEKA and RapidMiner) allow users to remove duplicate entries through filters.
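As an analogous illustration of such a filter (not the WEKA or RapidMiner API), the sketch below removes exact duplicate rows from a CSV file as a preprocessing step; the file names and the normalization rule are assumptions.

```python
# Minimal sketch of exact-duplicate removal as a data cleaning step.
# File names and the lowercase/strip normalization are illustrative.
import csv

def remove_exact_duplicates(in_path: str, out_path: str) -> int:
    """Copy a CSV file, keeping only the first occurrence of each row.
    Returns the number of duplicate rows dropped."""
    seen = set()
    removed = 0
    with open(in_path, newline="") as src, open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            key = tuple(cell.strip().lower() for cell in row)  # normalize before comparing
            if key in seen:
                removed += 1
                continue
            seen.add(key)
            writer.writerow(row)
    return removed

# Usage (illustrative file names):
# dropped = remove_exact_duplicates("customers.csv", "customers_clean.csv")
```

Tools such as WEKA and RapidMiner apply the same idea to loaded datasets rather than raw files, typically exposing it as a configurable filter or operator.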
Data mining techniques can then be used to extract interesting and valuable information from the cleaned dataset.