I am looking for data analysis tools that can be added to a new database (a legacy system migration where the legacy system's database is not available) which takes structured data (pre-determined format, assumed to be correct) as input.
That depends on many things, e.g. the nature of the data, how data quality is defined, and what systems are being used. Systematic data validations could be a good remedy. For the analysis part, you may want to add a few exploratory data analysis options for visual data quality checks.
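For example, a quick pass with pandas already surfaces many quality signals; the file name and plotting step below are placeholders, not anything from your setup:

```python
# Minimal sketch of summary/visual data quality checks with pandas.
# "input.csv" is a placeholder for however your structured data arrives.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("input.csv")

# Structural overview: dtypes, non-null counts, memory use.
df.info()

# Per-column summary statistics (numeric and categorical).
print(df.describe(include="all"))

# Fraction of missing values per column, a common first quality signal.
print(df.isna().mean().sort_values(ascending=False))

# Simple visual check: histograms of the numeric columns.
df.hist(figsize=(10, 6))
plt.show()
```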
Depending on the systems, you can also deploy AI-based autonomous anomaly detection methods to help ensure quality.
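As a rough illustration (assuming scikit-learn is available; the file name and the idea of using all numeric columns are assumptions, not part of your question), an isolation forest can flag unusual rows for manual review:

```python
# Hedged sketch of unsupervised anomaly detection on incoming rows
# using scikit-learn's IsolationForest.
import pandas as pd
from sklearn.ensemble import IsolationForest

df = pd.read_csv("input.csv")                 # placeholder file name
numeric = df.select_dtypes(include="number")

# Fit a model that scores how "isolated" each row is in feature space.
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(numeric.fillna(numeric.median()))

# -1 marks rows flagged as anomalies; route them to manual review.
suspects = df[labels == -1]
print(f"{len(suspects)} rows flagged for review")
```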
Nowadays, most data scientists assess the quality of their data at the point where they check whether the statistical model they are using is well suited to the problem. There are multiple goodness-of-fit tests that can give you a good understanding of how well your observed values agree with the values expected under the model, so you can decide whether the data you manage is sufficiently representative for your task.
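For instance, a Kolmogorov-Smirnov test from SciPy can check whether an observed numeric column plausibly follows the distribution your model assumes; the normal assumption and the simulated stand-in data below are only for illustration:

```python
# Illustrative goodness-of-fit check: do the observed values look like
# draws from the distribution the model expects?
import numpy as np
from scipy import stats

# Stand-in for a real column of observed values.
observed = np.random.default_rng(0).normal(loc=100, scale=15, size=500)

# KS test against a normal distribution fitted to the data.
# (Strictly, estimating the parameters from the same data makes the
# p-value approximate, but it works as a screening check.)
mu, sigma = observed.mean(), observed.std(ddof=1)
statistic, p_value = stats.kstest(observed, "norm", args=(mu, sigma))

print(f"KS statistic={statistic:.3f}, p-value={p_value:.3f}")
# A very small p-value suggests the data deviate from the expected model.
```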
However, this approach is not the most efficient one when you want to boost the data quality of your data warehouse. In that case, the best strategy is to define a set of conditions (protocols) that your input data needs to meet before it is taken into further consideration. To carry this out you can apply a wide range of rules and validation methods, for example:
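A minimal sketch of such row-level checks in Python; the field names, formats and thresholds are illustrative assumptions, not from the question:

```python
# Rule-based validation "protocol" applied before a record is accepted.
from datetime import date

def validate_record(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []

    # Completeness: required fields must be present and non-empty.
    for field in ("id", "name", "birth_date"):
        if not record.get(field):
            errors.append(f"missing required field: {field}")

    # Format/plausibility: birth_date must parse and not lie in the future.
    raw = record.get("birth_date")
    if raw:
        try:
            if date.fromisoformat(raw) > date.today():
                errors.append("birth_date is in the future")
        except ValueError:
            errors.append("birth_date is not an ISO date")

    # Range: amounts must be non-negative.
    if record.get("amount") is not None and record["amount"] < 0:
        errors.append("amount must be non-negative")

    return errors

print(validate_record({"id": 1, "name": "Ann", "birth_date": "2030-01-01", "amount": -5}))
# ['birth_date is in the future', 'amount must be non-negative']
```

Records that fail any rule can then be quarantined or sent back to the source system rather than loaded into the warehouse.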
There are many data analytic tools (usually commercial) that claim to provide information about the quality of your files, and some also claim that they can subsequently deliver files where the errors are 'corrected'. These are usually called 'profiling' tools. I am not aware of any that work in even a minimally effective manner.

The difficult errors involve determining 'duplicates' in files using quasi-identifying information such as name, address, date of birth, etc. Two records may be duplicates (represent the same person or business) even when the quasi-identifying information has representational or typographical errors, and any quantitative information in the two records representing the same entity may show slight (or major) differences.

If the records have missing values in the data that an individual wishes to analyze, then the missing values should be filled in with a principled method that preserves joint distributions (e.g., Little and Rubin's book on missing data, 2002). The 'corrected' data may also need to satisfy edit constraints (such as a child under 16 cannot be married).
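To make the duplicate problem concrete, here is a hedged sketch of near-duplicate detection on quasi-identifiers using only fuzzy string similarity from the Python standard library; real record-linkage software uses far more sophisticated blocking and comparison methods, so treat the fields and the threshold as illustrative assumptions:

```python
# Toy near-duplicate detection on quasi-identifying fields.
from difflib import SequenceMatcher
from itertools import combinations

records = [
    {"id": 1, "name": "Jon Smith",  "address": "12 Oak St",  "dob": "1980-03-02"},
    {"id": 2, "name": "John Smith", "address": "12 Oak Str", "dob": "1980-03-02"},
    {"id": 3, "name": "Ann Jones",  "address": "9 Elm Ave",  "dob": "1975-11-20"},
]

def similarity(a: dict, b: dict) -> float:
    """Average character-level similarity across the quasi-identifying fields."""
    fields = ("name", "address", "dob")
    return sum(SequenceMatcher(None, a[f], b[f]).ratio() for f in fields) / len(fields)

# Flag pairs whose similarity exceeds a (tunable, assumed) threshold.
for a, b in combinations(records, 2):
    score = similarity(a, b)
    if score > 0.85:
        print(f"possible duplicate: ids {a['id']} and {b['id']} (score {score:.2f})")
```

Even this toy version shows why the problem is hard: the threshold trades false matches against missed duplicates, and nothing here handles transposed fields, nicknames, or changed addresses.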