I think your question should be: what are the differences between static and dynamic data?
With static data, regardless of its amount, your prediction or classification model is also static, i.e. it does not change over time.
With dynamic data, there may be concept drift, so the classification or prediction model must be updated accordingly. To read more about this topic, you can Google "concept drift tutorial" or "concept drift detection". Most works on this topic treat the dynamic data as a data stream.
A data stream can be static, and a data set can be dynamic. Large data sets are usually sourced from data streams, so they can evolve over time, but in a more stable manner than data streams.
1. The main feature of data streams is that you "cannot" access all the data, because it flows in gradually over time. With a data set, on the other hand, you have access to the whole data (at least in theory). A standard single-pass technique for coping with this is sketched below.
2. The second difference is the "unstable" nature of data streams: the concepts and clusters in the stream can drift or change completely over time.
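As a concrete illustration of point 1, here is a minimal sketch (my own addition, not part of the answer above) of reservoir sampling, a standard technique that keeps a uniform random sample of a stream in a single pass and bounded memory:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of size k from a single pass over `stream`."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)      # fill the reservoir first
        else:
            j = random.randint(0, i)    # new item enters with probability k/(i+1)
            if j < k:
                reservoir[j] = item
    return reservoir

# Example: sample 5 items from a stream we can only traverse once.
print(reservoir_sample(iter(range(1_000_000)), 5))
```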
The biggest restriction imposed on data streams is that one can read the data only once, and even then only a part of the data (called a "window") is visible at any instant. "Large data" simply refers to the volume of the data; no such restrictions apply there.
The single-read restriction on data streams is true in practice. However, it is not a property of data streams as such, but rather a limitation of current real-time analytic methods. Although the real-time processing requirement usually forces analyzers to discard some data, smart analyzers can implement re-readable, closed-loop internal data streams to revisit interesting windows of the main data stream.
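A minimal sketch of both ideas in Python (the `SlidingWindow` class and its `flag`/`replay` names are my own illustrative inventions, not any library's API): only a bounded window of the stream is visible at a time, but flagged windows are snapshotted into an internal buffer so they can be revisited later:

```python
from collections import deque

class SlidingWindow:
    """Bounded view over a stream, with a replay buffer for flagged windows."""
    def __init__(self, size):
        self.window = deque(maxlen=size)   # only the latest `size` items are visible
        self.replay = []                   # snapshots of windows worth revisiting

    def push(self, item):
        self.window.append(item)

    def flag(self):
        """Snapshot the current window so it can be re-read later."""
        self.replay.append(list(self.window))

# Usage: process a stream once, but keep interesting windows re-readable.
w = SlidingWindow(size=3)
for x in [1, 5, 2, 9, 9, 9, 3]:
    w.push(x)
    if sum(w.window) > 20:   # toy "interesting" condition
        w.flag()
print(w.replay)              # revisit the flagged windows offline
```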
If you would like to run an experiment (data mining, opinion mining, sentiment knowledge discovery, etc.), I think you need to choose a specific corpus. Otherwise, if you are interested in developing an algorithm for the data stream model, consider a source such as Twitter; its data follows the data stream model. In this model, data arrive at high speed, and data mining algorithms must be able to predict in real time under strict constraints of space and time. I think all the professionals above have highlighted this area according to their views, and I hope you will find your missing answer :)
Concept drift / covariate shift are intrinsically linked to the sequential nature of a data stream.
From a data set, you are free to choose as many access rules as you like, creating as many different data streams (possibly with repeated access to the same data); each of these streams may or may not exhibit concept drift or covariate shift. If it does, this is not a property of the data set itself but of the chosen access rule: random access is not supposed to create such a drifting/shifting stream, but sequential access might, depending on how the data were stored.
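A toy illustration of this point (my own example, not from the answer above): the same static data set yields a drifting stream under sequential access and a stationary one under random access:

```python
import random

# A static data set: a feature drawn from two regimes, stored contiguously.
data = [("low", random.gauss(0, 1)) for _ in range(500)] + \
       [("high", random.gauss(5, 1)) for _ in range(500)]

def mean_halves(stream):
    """Compare the feature mean of the first and second half of a stream."""
    xs = [x for _, x in stream]
    half = len(xs) // 2
    return sum(xs[:half]) / half, sum(xs[half:]) / half

# Sequential access: the stored regime change shows up as drift between halves.
print("sequential:", mean_halves(data))

# Random access: shuffling destroys the drift; both halves look alike.
shuffled = data[:]
random.shuffle(shuffled)
print("shuffled:  ", mean_halves(shuffled))
```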
Data streams are transient and have a time component associated with them. Concept drift can occur in data streams. Incremental learning algorithms combined with sliding-window techniques can be applied to handle it; a minimal sketch follows.
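A minimal sketch of this combination, assuming scikit-learn (whose SGDClassifier supports incremental updates via partial_fit). The accuracy-drop check on a sliding window is my own simple illustrative heuristic, not a full drift detector such as DDM:

```python
from collections import deque

import numpy as np
from sklearn.linear_model import SGDClassifier

model = SGDClassifier()                # incremental linear classifier
window = deque(maxlen=200)             # 1 if the last prediction was correct, else 0
classes = np.array([0, 1])
rng = np.random.default_rng(0)

for t in range(2000):
    x = rng.normal(size=(1, 2))
    # Synthetic concept drift: the true decision rule flips at t = 1000.
    y = np.array([int(x[0, 0] > 0) if t < 1000 else int(x[0, 0] < 0)])
    if t > 0:                          # "test-then-train": predict before updating
        window.append(int(model.predict(x)[0] == y[0]))
    model.partial_fit(x, y, classes=classes)
    if len(window) == window.maxlen and sum(window) / len(window) < 0.6:
        print(f"possible drift around t={t}: windowed accuracy dropped")
        window.clear()                 # start monitoring afresh after the alarm
```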
Three main differences:
1. Data streams usually have some time structure, whereas large data sets do not have to.
2. With data streams you have to consider effects such as changes in the underlying distribution (non-stationarity); with large data sets you are usually interested in global statistics.
3. Data streams are potentially infinite, and you have to consider how to deal with that (one constant-memory technique is sketched below).
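To illustrate point 3, here is a minimal sketch (my own addition) of Welford's online algorithm, which tracks the mean and variance of a potentially infinite stream in constant memory:

```python
class RunningStats:
    """Welford's online algorithm: mean/variance of a stream in O(1) memory."""
    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0    # sum of squared deviations from the current mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

# The stream can be arbitrarily long; memory use stays constant.
stats = RunningStats()
for x in (i % 7 for i in range(1_000_000)):
    stats.update(x)
print(stats.mean, stats.variance)
```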
Of course, a large sample from a data stream automatically becomes a large data set.