When assessing data quality during the pre-processing phase (in a Big Data context), should data profiling be performed before data sampling (that is, on the whole data set), or is it acceptable to profile only a subset of the data?

If we take the second approach, how can the sampling be done without any prior information about the data (that is, without even some level of profiling)?
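
To make the second approach concrete, here is a minimal sketch of one candidate technique, reservoir sampling (Algorithm R). It draws a uniform random sample in a single pass and needs no prior knowledge of the data, not even the record count. The function name `reservoir_sample` and its parameters `records`, `k`, and `seed` are illustrative assumptions, not part of any specific tool.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return a uniform random sample of up to k records in one pass.

    Works on any iterable (file, stream, cursor) of unknown length,
    so no profiling of the full data set is required beforehand.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            reservoir.append(record)
        else:
            # Keep the new record with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                reservoir[j] = record
    return reservoir
```

A sample drawn this way can then be profiled in place of the full data set. Note that a uniform sample can miss rare values and outliers, so checks such as uniqueness or value ranges may still need a pass over all the data.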
