To assess data quality during the pre-processing phase (in a Big Data context), should data profiling be performed on the whole data set before sampling, or is it acceptable to profile only a subset of the data?
If we take the second approach, how can the sample be drawn without any prior information about the data (that is, without at least some level of profiling)?