I guess there is some confusion around the very meaning of Big Data. Just because a data set is large doesn't automatically make it “Big Data”. So how does one know if he/she actually has a Big Data problem?
Short answer: dimensionality, nonlinearity, and "confidence" that whatever was measured/observed actually produced the data it did, and that these data reflect some phenomenon or phenomena that are "real" (see #3 below for more detail).
Long answer:
1) Dimensionality:
Let's say I'm interested in whether temperatures show greater variability at particular ages. If I take the temperature of 6 million people, I have a lot of data points, but in 2-dimensional space. I can plot them and learn quite a bit without running a single statistical test. In fact, standard tests can be LESS accurate than my informal, visual scan, because I can "see" how much my plot approximates a linear function, whether there are potential outliers, etc. The fact that I took the temperatures of 6 million people helps a whole lot as well (provided I did a decent job balancing the ages of my sample).
What happens, though, if instead of taking people's temperatures I am interested in global surface temperatures over a single year? Let's assume that I have a thermometer every square meter across the globe. I still know that surface effects, ranging from anthropogenic changes in land morphology due to farming to the UHI effect, come into play. Also, while I can use a thermometer to measure the temperature of a person living in Canada and another in Egypt without worrying about location, thermometers measuring surface temperatures can't be treated like that (as if location didn't matter). Even with a simple model that only takes into account a few constants to correct for things like the UHI effect and relatively few variables (humidity, surrounding urban density, altitude, proximity to the poles or the tropics, distance above the ground, etc.), I am comparing thermometers along multiple axes of varying scales. Things get subtle in higher dimensions. In R2 or R3, it's pretty simple in most applications to judge when points are close to one another. But when I am comparing one thermometer to another as points in, say, 15-dimensional space, where one variable (temperature) can range from -1 degrees Celsius to 36 while another variable (altitude) can range from -100 meters to a few thousand, etc., small changes along enough axes are easily missed (they can't be easily plotted) and can render a number of statistical measures/methods useless.
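To make the scaling problem concrete, here's a toy sketch (entirely my own illustration; the three stations and their numbers are made up). The question "which thermometer is closest to which?" flips depending on whether you put the axes on a comparable scale first, and with 15+ axes you can't eyeball that the way you can in 2D:

```python
import numpy as np

# columns: temperature (deg C), altitude (m), humidity (%)
stations = np.array([
    [-1.0,  -100.0, 80.0],   # A: cold, below sea level, humid
    [36.0,  -100.0, 20.0],   # B: hot, below sea level, dry
    [-1.0,  2500.0, 80.0],   # C: cold, high altitude, humid
])

def dist(a, b):
    return np.linalg.norm(a - b)

# In raw units, altitude (thousands of meters) swamps every other axis,
# so A looks far closer to B than to C, even though A and B have opposite climates.
print(dist(stations[0], stations[1]), dist(stations[0], stations[2]))

# Standardize each column (zero mean, unit variance) and compare again:
# now A's nearest neighbour is C, not B. Same data, different answer.
z = (stations - stations.mean(axis=0)) / stations.std(axis=0)
print(dist(z[0], z[1]), dist(z[0], z[2]))
```

Neither answer is "the" right one; the point is that in high dimensions the choice of scaling quietly decides what "close" means, and no plot will warn you about it.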
2) Linear/nonlinear:
Lorenz practically founded what is commonly known as chaos theory by imagining the simplest atmosphere possible. However, even a single molecule in such an idealized model turned out to behave in a very complex way. Its dynamics are nonlinear: they are governed by sudden, qualitative changes like bifurcations, by unpredictable changes large and small, they are far from equilibrium, etc.
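Here's a minimal sketch of Lorenz's famous system (the standard sigma = 10, rho = 28, beta = 8/3 parameters; the crude integrator and step size are just my choices for illustration). Two trajectories that start a hundred-millionth apart end up in completely different states:

```python
import numpy as np

def lorenz_step(state, dt=0.001, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """One forward-Euler step of the Lorenz equations (fine for illustration)."""
    x, y, z = state
    return state + dt * np.array([sigma * (y - x),
                                  x * (rho - z) - y,
                                  x * y - beta * z])

a = np.array([1.0, 1.0, 1.0])
b = a + np.array([1e-8, 0.0, 0.0])   # perturb one coordinate by 10^-8

for step in range(40001):            # ~40 "time units" at dt = 0.001
    if step % 10000 == 0:
        print(f"t = {step / 1000:4.0f}   separation = {np.linalg.norm(a - b):.3e}")
    a, b = lorenz_step(a), lorenz_step(b)
```

No amount of extra data fixes this: the sensitivity is a property of the dynamics themselves, which is why "nonlinear" matters at least as much as "big".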
Dimensionality crops up here too. Imagine a pile of rocks that a wave carries and deposits on the shore vs. the sand it carries, creating a sandpile onshore. Knowing information about many dimensions can give you a pretty good approximation of the final configuration of the rocks (initial conditions, from the positions of the rocks to the geometry of the ocean floor). But is the difference between knowing the final configuration of the sandpile vs. the rock pile just a matter of there being more sand grains than rocks? No. For the rocks, the problem is already hard enough because there are so many dimensions one needs to account for, and the trajectories will not be linear. But you can make it simpler and reduce the dimensionality with some safe assumptions: grouping the rocks into size & weight ranges instead of keeping a separate value of each for each rock (which combines two variables into one and reduces the set's cardinality), using more approximate units of position, averaging the forces of currents and reducing their scale or the effect they will have, etc. Sand, however, makes all of this impossible. Small changes in currents result in a completely different configuration. Sand will do little to impede the trajectory of a rock propelled by a water wave, but the reverse isn't true, so debris has to be taken into account. And so on.
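That "safe simplification" move for the rocks can be made concrete with a toy sketch (the cut-offs and variable names are mine, purely for illustration): collapse each rock's exact mass and diameter into a single coarse size/weight class, which is exactly the kind of dimensionality reduction the sand won't tolerate:

```python
import numpy as np

rng = np.random.default_rng(0)
mass_kg = rng.uniform(0.5, 50.0, size=1000)       # exact mass of each rock
diameter_cm = rng.uniform(5.0, 60.0, size=1000)   # exact diameter of each rock

# Combine two continuous variables into one ordinal class (0 = small,
# 1 = medium, 2 = large), shrinking a continuum of states to three bins.
score = mass_kg / 50.0 + diameter_cm / 60.0       # crude combined scale in (0, 2)
size_class = np.digitize(score, bins=[0.7, 1.4])

print(np.bincount(size_class))   # how many rocks land in each class
```

For rocks, that coarse-graining barely costs you anything; for sand, throwing away the fine detail is precisely throwing away what determines the outcome.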
3) Validity:
Finally, there's the question of what your data consist of and how confident you are that you are using valid measurements of something that is a valid phenomenon. It is very difficult to model living systems. Single cells have too many strongly interacting parts. But at least you can look at the cell and at your model and see how close they are (most of the time and to a certain extent, anyway). What if I'm interested in the effect of religiosity & political orientation on intelligence and mental health? With the exception of neurological disorders, all mental disorders are based on classifications of symptoms, and intelligence on certain constructs that are supposed to be measures of whatever intelligence is. I can have a small population and a good but small sample, but how do I know I am measuring what I think I am? I can't x-ray religiosity. I can't use an fMRI scan to determine intelligence. I can't even use neuroimaging to diagnose mental disorders. So as complicated as modelling, e.g., the weather can be, if the model predicts rain tomorrow and it doesn't rain, then my model was wrong. Most importantly, I know that.