Big data systems aggregate massive data streams from heterogeneous data sources. What techniques or solutions can mitigate the effects of collecting these raw data streams and reduce the data volume without affecting the value of the data?
In addition to what Mr Ashish Dutt said, you can certainly normalize the data set to reduce the range of randomness in it. Beyond that, there are various methods for dimensionality reduction, such as SOM, which is based on the influence of the independent attributes on the dependent attributes. There are also possibilities for clustering or classification based on supervised and unsupervised learning in a massive heterogeneous data set.
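To make the normalization step concrete, here is a minimal sketch using scikit-learn; the file name and the choice of MinMaxScaler are my own illustrative assumptions, not something the answer prescribes.

```python
# Minimal normalization sketch (assumes pandas and scikit-learn are available).
# "stream_batch.csv" is a placeholder for one batch of the raw stream.
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv("stream_batch.csv")
numeric_cols = df.select_dtypes("number").columns

scaler = MinMaxScaler()                           # rescale each feature to [0, 1]
df[numeric_cols] = scaler.fit_transform(df[numeric_cols])
```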
In addition to Mr Manas Gaur's answer, I would like to emphasize that the computation of the dimensionality reduction step is itself an issue. Even if you use PCA, which is probably the simplest method, computing the covariance matrix and its eigenvalue decomposition are difficult problems when dealing with big data. Dedicated methods should be used.
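As one example of such a dedicated method (my own illustration, not something the answer names), scikit-learn's IncrementalPCA fits the projection in mini-batches, so the full covariance matrix never has to be held and decomposed in memory:

```python
# Sketch of out-of-core PCA with scikit-learn's IncrementalPCA.
# "stream_batches()" is a hypothetical generator standing in for your data feed;
# it yields arrays of shape (batch_size, n_features).
import numpy as np
from sklearn.decomposition import IncrementalPCA

def stream_batches(n_batches=100, batch_size=1000, n_features=50):
    rng = np.random.default_rng(0)
    for _ in range(n_batches):
        yield rng.normal(size=(batch_size, n_features))

ipca = IncrementalPCA(n_components=10)
for batch in stream_batches():
    ipca.partial_fit(batch)          # update the components batch by batch

# Project any new batch onto the reduced space
reduced = ipca.transform(next(stream_batches()))
```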
A balanced use of correlative methods (PCA, Cluster Analysis, Multidimensional Scaling) and sampling methods will allow you to check the basic invariance of the data structure (in terms of mutual correlation between variables) across different samplings.
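One rough way to carry out that invariance check (a sketch of my own, assuming the data fit in a pandas DataFrame) is to compare the correlation matrices obtained from independent random subsamples:

```python
# Sketch: compare correlation matrices computed on independent subsamples.
# "df" is assumed to be a pandas DataFrame of numeric variables.
import numpy as np
import pandas as pd

def correlation_stability(df: pd.DataFrame, n_samples: int = 5,
                          frac: float = 0.1, seed: int = 0) -> float:
    """Mean pairwise Frobenius distance between subsample correlation matrices."""
    rng = np.random.default_rng(seed)
    corrs = [df.sample(frac=frac, random_state=int(rng.integers(1 << 31)))
               .corr().to_numpy()
             for _ in range(n_samples)]
    dists = [np.linalg.norm(a - b)
             for i, a in enumerate(corrs)
             for b in corrs[i + 1:]]
    return float(np.mean(dists))

# A value close to zero suggests the correlation structure is stable across samplings.
```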
Moreover, before applying the above methods, the usual data trimming steps (keeping only one variable of any pair correlated above 0.90, eliminating zero-variance variables, and dropping variables with too many missing values) are in any case a good start.
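A possible pandas sketch of those trimming steps follows; the 0.90 correlation cut-off comes from the answer, while the 50% missing-value threshold is my own assumption.

```python
# Sketch of the preliminary trimming steps described above.
# Assumes a pandas DataFrame "df" of numeric variables.
import numpy as np
import pandas as pd

def trim(df: pd.DataFrame, corr_cut: float = 0.90,
         max_missing: float = 0.5) -> pd.DataFrame:
    # 1. Drop variables with too many missing values (threshold is an assumption).
    df = df.loc[:, df.isna().mean() <= max_missing]
    # 2. Drop zero-variance variables.
    df = df.loc[:, df.std() > 0]
    # 3. For each pair correlated above corr_cut, keep only one of the two.
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > corr_cut).any()]
    return df.drop(columns=to_drop)
```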