I think your question should be: what are the differences between static and dynamic data?
With static data, regardless of its amount, your prediction or classification model is also static, i.e. it does not change over time.
With dynamic data, there may be concept drift, so the classification or prediction model must be updated accordingly. To read more about this topic, you can Google "concept drift tutorial" or "concept drift detection". Most works on this topic treat the dynamic data as a data stream.
A data stream can be static, and a data set can be dynamic. Large data sets are usually sourced from data streams, so they can evolve over time, but in a more stable manner than data streams.
1. The main feature of data streams is that you "cannot" access all the data, because it flows in gradually over time. With a data set, on the other hand, you have access to the whole data (at least in theory). A standard single-pass technique for coping with this is sketched below.
2. The second difference is the "unstable" nature of data streams: the concepts and clusters in the stream can drift or change completely over time.
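As a concrete illustration of point 1, here is a minimal sketch (my own addition, not part of the answer above) of reservoir sampling, a standard technique that keeps a uniform random sample of a stream in a single pass and bounded memory:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of size k from a single pass over `stream`."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = random.randint(0, i)    # new item enters with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: sample 5 items from a stream we can only traverse once.
print(reservoir_sample(iter(range(1_000_000)), 5))
```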
The biggest restriction imposed on data streams is that one can read the data only once, and even then only a part of the data (called a "window") is visible at any instant. "Large data" simply refers to the volume of the data; no such restrictions apply there.
The single-read restriction on data streams is true in practice. However, it is not a property of data streams as such, but rather a limitation of current real-time analytic methods. Although the real-time processing requirement usually forces analyzers to discard some data, smart analyzers can implement re-readable, closed-loop internal data streams to revisit interesting windows of the main data stream.
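A minimal sketch of both ideas in Python (the `SlidingWindow` class and its `flag`/`replay` names are my own illustrative inventions, not any library's API): only a bounded window of the stream is visible at a time, but flagged windows are snapshotted into an internal buffer so they can be revisited later:

```python
from collections import deque

class SlidingWindow:
    """Bounded view over a stream, with a replay buffer for flagged windows."""
    def __init__(self, size):
        self.window = deque(maxlen=size)   # only the latest `size` items are visible
        self.replay = []                   # snapshots of windows worth revisiting

    def push(self, item):
        self.window.append(item)

    def flag(self):
        """Snapshot the current window so it can be re-read later."""
        self.replay.append(list(self.window))

# Usage: process a stream once, but keep interesting windows re-readable.
w = SlidingWindow(size=3)
for x in [1, 5, 2, 9, 9, 9, 3]:
    w.push(x)
    if sum(w.window) > 20:   # toy "interesting" condition
        w.flag()
print(w.replay)              # revisit the flagged windows offline
```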
If you would like to run an experiment (data mining, opinion mining, sentiment knowledge discovery, etc.), I think you need to choose a specific corpus. Otherwise, if you are interested in developing an algorithm for the data stream model, consider a source such as Twitter; its data follows the data stream model. In this model, data arrive at high speed, and data mining algorithms must be able to predict in real time under strict constraints of space and time. I think all the professionals above have highlighted this area according to their views, and I hope you will find your missing answer :)
Concept drift / covariate shift are intrinsically linked to the sequential nature of a data stream.
From a data set, you are free to choose as many access rules as you like, creating as many different data streams (possibly with repeated access to the same data); each of these streams may or may not exhibit concept drift or covariate shift. If it does, this is not a property of the data set itself but of the chosen access rule: random access is not supposed to create such a drifting/shifting stream, but sequential access might, depending on how the data were stored.
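A toy illustration of this point (my own example, not from the answer above): the same static data set yields a drifting stream under sequential access and a stationary one under random access:

```python
import random

# A static data set: a feature drawn from two regimes, stored contiguously.
data = [("low", random.gauss(0, 1)) for _ in range(500)] + \
       [("high", random.gauss(5, 1)) for _ in range(500)]

def mean_halves(stream):
    """Compare the feature mean of the first and second half of a stream."""
    xs = [x for _, x in stream]
    half = len(xs) // 2
    return sum(xs[:half]) / half, sum(xs[half:]) / half

# Sequential access: the stored regime change shows up as drift between halves.
print("sequential:", mean_halves(data))

# Random access: shuffling destroys the drift; both halves look alike.
shuffled = data[:]
random.shuffle(shuffled)
print("shuffled:  ", mean_halves(shuffled))
```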
Data streams are transient and have a time component associated with them. Concept drift can occur in data streams. Incremental learning algorithms combined with sliding-window techniques can be applied to handle it; a minimal sketch follows.
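A minimal sketch of this combination, assuming scikit-learn (whose SGDClassifier supports incremental updates via partial_fit). The accuracy-drop check on a sliding window is my own simple illustrative heuristic, not a full drift detector such as DDM:

```python
from collections import deque

import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()                # incremental linear classifier
window = deque(maxlen=200)             # 1 if the last prediction was correct, else 0
classes = np.array([0, 1])
rng = np.random.default_rng(0)

for t in range(2000):
    x = rng.normal(size=(1, 2))
    # Synthetic concept drift: the true decision rule flips at t = 1000.
    y = np.array([int(x[0, 0] > 0) if t < 1000 else int(x[0, 0] < 0)])
    if t > 0:                          # "test-then-train": predict before updating
        window.append(int(model.predict(x)[0] == y[0]))
    model.partial_fit(x, y, classes=classes)
    if len(window) == window.maxlen and sum(window) / len(window) < 0.6:
        print(f"possible drift around t={t}: windowed accuracy dropped")
        window.clear()                 # start monitoring afresh after the alarm
```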
Three main differences:
1. Data streams usually have some time structure, whereas large data sets do not have to.
2. With data streams you have to consider effects such as changes in the underlying distribution (non-stationarity); with large data sets you are usually interested in global statistics.
3. Data streams are potentially infinite, and you have to consider how to deal with that (one constant-memory technique is sketched below).
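To illustrate point 3, here is a minimal sketch (my own addition) of Welford's online algorithm, which tracks the mean and variance of a potentially infinite stream in constant memory:

```python
class RunningStats:
    """Welford's online algorithm: mean/variance of a stream in O(1) memory."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# The stream can be arbitrarily long; memory use stays constant.
stats = RunningStats()
for x in (i % 7 for i in range(1_000_000)):
    stats.update(x)
print(stats.mean, stats.variance)
```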
Of course, a large sample from a data stream automatically becomes a large data set.