Data should be checked for accuracy each time a piece of data is placed into the database. Duplicate copies should be secured at multiple sites, with safeguards against data tampering.
If the data is a collection of waveforms / signals: before the analysis there should be a data cleaning step, where we can apply noise removal algorithms and outlier removal techniques, and fill in the missing values (if any), in order to make sure the data is appropriate for analysis.
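A minimal sketch of such a cleaning step in Python, assuming a 1-D signal held in a pandas Series; the synthetic signal, the 3-sigma outlier threshold, and the median-filter kernel size are illustrative assumptions, not fixed recommendations:

```python
import numpy as np
import pandas as pd
from scipy.signal import medfilt

# Hypothetical example signal with simulated defects.
rng = np.random.default_rng(0)
raw = pd.Series(np.sin(np.linspace(0, 10, 500)) + rng.normal(0, 0.1, 500))
raw.iloc[[50, 200]] = np.nan   # simulate missing samples
raw.iloc[300] = 8.0            # simulate a gross outlier

# 1. Outlier removal: mask samples more than 3 standard deviations from the mean.
z = (raw - raw.mean()) / raw.std()
cleaned = raw.mask(z.abs() > 3)

# 2. Missing-value filling: linear interpolation over the gaps.
cleaned = cleaned.interpolate(method="linear")

# 3. Noise removal: a median filter suppresses impulsive noise.
cleaned = pd.Series(medfilt(cleaned.to_numpy(), kernel_size=5))
```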
Data quality can be improved by having good data governance (DG), which is the overall management of the quality, availability, usability, integrity, and security of data used in an enterprise. A sound data governance program includes a governing body or council, a defined set of procedures, and a plan to execute those procedures.
Look for patterns in the data that you would not normally expect; run an autocorrelation analysis if you suspect this is the case. Double-check for typos or misplaced decimals. Do sampling checks to ensure the plausibility of each piece of data sampled. Run checks to ensure that the data type is consistent across the entire data set.
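As a rough illustration, here is a Python sketch of two such checks, type consistency and lag-1 autocorrelation; the DataFrame and column name are made up for the example:

```python
import pandas as pd

def basic_quality_checks(df: pd.DataFrame, column: str) -> dict:
    """Illustrative checks: type consistency and lag-1 autocorrelation."""
    col = df[column]
    # Type consistency: every non-null value should share one Python type.
    n_types = col.dropna().map(type).nunique()
    # Lag-1 autocorrelation: unexpectedly strong correlation can reveal
    # patterns you would not normally expect in independent measurements.
    autocorr = pd.to_numeric(col, errors="coerce").autocorr(lag=1)
    return {"consistent_type": n_types == 1, "lag1_autocorr": autocorr}

# Hypothetical usage on a small, made-up sample:
df = pd.DataFrame({"reading": [1.0, 1.1, 0.9, "1,2", 1.05]})
print(basic_quality_checks(df, "reading"))
```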
Data quality begins with the data collection procedure. It depends on the sensitivity of the measurement equipment and tools, which should give a small error margin for measurements, and ideally it should not depend on the diligence and experience of the workers, which may otherwise lead to bias in measurements.
The method of data collection is an important factor. Mixed-method data collection and data triangulation are very effective for getting quality data for analysis.
There are many techniques for improving data quality. One of them is filling missing values. I recently published a paper, "Generic Data Imputation and Feature Extraction for Signals from Multifunctional Printers". It's open access; here is the link: http://ceur-ws.org/Vol-2322/dsi4-1.pdf I hope it will help you with your data.
Sampling is a basic part of data science. In statistical quality assurance a few empirical formulae are available, of which variance is the most important. The frequency of sampling and the random collection of data at different points also play an important role.
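For illustration only, a small Python sketch of repeated random sampling with a sample-variance comparison; the population, sample size, and number of draws are arbitrary placeholders:

```python
import numpy as np

# Draw several random samples from different points in the data set and
# compare their variances as a rough plausibility check.
rng = np.random.default_rng(42)
population = rng.normal(loc=10.0, scale=2.0, size=10_000)

sample_variances = [
    np.var(rng.choice(population, size=100, replace=False), ddof=1)
    for _ in range(5)
]
# A sample whose variance diverges sharply from the rest may indicate
# a collection problem in that part of the data.
print(sample_variances)
```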
It is not a top-down process but an iterative one, so you can start by performing an initial data manipulation phase, then analyse and assess whether the result was sufficiently satisfying. Even the very concept of an outlier is not straightforward; it depends on the type of data and analysis involved.
The first step is to determine which data quality dimensions are applicable to your project or subject area. There are six or seven dimensions, such as Completeness, Accuracy, Timeliness, Consistency, Uniqueness, and Validity.
Not all of them will necessarily be applicable to your data, hence at the beginning of the project we should target the low-hanging fruit.
For example, to achieve 'Validity', set the rule that the PhoneNumber column should only contain numeric values, as in the sketch below.
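A minimal sketch of that rule in Python with pandas, assuming a hypothetical PhoneNumber column stored as strings; a stricter pattern would be needed for real phone-number formats:

```python
import pandas as pd

# Made-up example data with both valid and invalid entries.
df = pd.DataFrame({"PhoneNumber": ["5551234", "555-1234", "abc", "98765"]})

# Validity rule: the column may contain digits only.
is_valid = df["PhoneNumber"].str.fullmatch(r"\d+")
violations = df[~is_valid]
print(f"{len(violations)} rows violate the validity rule:")
print(violations)
```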
For the first cycle of data quality work, try to achieve one or two DQ dimensions; this will give the team some experience and learning.
In the second cycle we should apply that learning and improve further.
A data quality project requires detailed planning and research; it is a bit difficult to write it all in a message, but soon I will publish my paper so that it helps the community.