I collected more than 200,000 samples, but someone said the data was not Big Data. I asked him to define Big Data; unfortunately, he could not.
It probably depends on the nature of your data and the application you are considering. I work in biology, and the scale at which people call a project 'Big Data' is definitely not the same as for people working in social media, for instance.
One way Big Data has been characterized is using the 3Vs definition (https://www.whishworks.com/blog/big-data/understanding-the-3-vs-of-big-data-volume-velocity-and-variety):
* VOLUME: the amount of data generated.
* VELOCITY: the speed with which data are being generated.
* VARIETY: the nature of the data, covering all the structured and unstructured data that can be generated either by humans or by machines.
So, in your case, this person might say it is not Big Data because it was generated once (low velocity) and because 200,000 samples is 'small' compared to the current scale of millions or billions (low volume). Also, could the data be represented in a large spreadsheet with little effort (low variety)?
I agree with Kévin Vervier. It depends on the nature and application of the data. For example, in image applications (image recognition, object detection) a dataset of 200k samples is considered small/medium, while in other applications this number can be considered large.
Volume. Big data implies enormous volumes of data. ...
Variety. Variety refers to the many sources and types of data both structured and unstructured.
Velocity. Big Data Velocity deals with the pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices, etc.
Veracity. Big Data veracity refers to the biases, noise and abnormality in data: is the data being stored and mined meaningful to the problem being analyzed?
Validity. Related to veracity is the issue of validity, meaning whether the data is correct and accurate for the intended use.
Volatility. Big data volatility refers to how long the data remains valid and how long it should be stored.
Big data describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. In addition, it is characterized by the 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. Although big data doesn't equate to any specific volume of data, the term is often used to describe terabytes, petabytes and even exabytes of data captured over time.
Some scientists say that any volume of data that you cannot process on your local machine is Big Data.
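As a rough illustration of that rule of thumb (a minimal sketch only; the file name, the 50% memory margin and the chunk size are my own assumptions, not part of anyone's post above), one might check whether a dataset fits comfortably in RAM and fall back to chunked processing otherwise:

```python
import os
import psutil  # third-party package, assumed available; used only to read free memory
import pandas as pd

def process_locally_or_in_chunks(path, chunksize=1_000_000):
    """Rule-of-thumb check: if the file is much smaller than free RAM,
    load it at once; otherwise stream it in chunks (or move to a cluster)."""
    file_size = os.path.getsize(path)
    free_ram = psutil.virtual_memory().available

    if file_size < 0.5 * free_ram:      # arbitrary 50% safety margin
        df = pd.read_csv(path)          # fits in memory: ordinary local analysis
        return df.describe()
    else:
        # Too large for this machine's memory: aggregate chunk by chunk.
        totals = None
        for chunk in pd.read_csv(path, chunksize=chunksize):
            part = chunk.select_dtypes("number").sum()
            totals = part if totals is None else totals.add(part, fill_value=0)
        return totals
```

On this view, the same 200,000-sample dataset may or may not be 'big' depending on the machine at hand, which is exactly why this rule of thumb is contested.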
As data sizes grow alarmingly, we move from information overload to big data, because services and systems themselves start generating data. Though some scholars list as many as 9 Vs, big data is mainly characterized by 3 Vs: Volume, Velocity and Variety. Depending on the situation and application, petabytes or even zettabytes of data may be available (volume), generated and made available at high speed (velocity), in a wide variety of structured, semi-structured and unstructured multimedia forms such as image, video, audio and text (variety).
I agree with @Kevin_Vervier. I worked in banking 20 years ago, and there was no such thing then as "big data". But I assure you our data was massive and high volume. I do believe it is the combination of velocity and volume, together with the unstructured nature of having a variety of different data sources in the same data set, that makes it "big data".
@Hamad and @Carbone, HPC is used in several cases, but I found that a million records could be processed on a normal computer. Is HPC an essential requirement for Big Data?
No... I am just trying to make you aware of the orders-of-magnitude differences in data volumes and the corresponding processing requirements. What some people believe is big is often very small. There isn't a rule of thumb, nor should there be. It's always about the trade-offs among the 5 or 6 V's (velocity, volume, etc., depending on who you talk to) and the processing necessary. If you always focus on that, you are always thinking the right way. Weather models just happen to produce some very, very large data sets and are processed most often on HPC systems. Hope this helps.
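A quick back-of-envelope calculation (a sketch with purely hypothetical figures: one million records with 100 numeric columns versus a single weather-model variable on a modest 1000x1000x100 grid at hourly steps for a year, all stored as 8-byte floats) illustrates those orders of magnitude:

```python
# Back-of-envelope sizes, assuming 8-byte (float64) values throughout.
BYTES_PER_VALUE = 8

# ~1 million tabular records with 100 numeric columns (hypothetical survey-style data).
survey = 1_000_000 * 100 * BYTES_PER_VALUE
print(f"1M records x 100 columns: {survey / 1e9:.1f} GB")   # ~0.8 GB: fits in laptop RAM

# One year of hourly output for a single variable on a 1000 x 1000 x 100 grid.
weather = 1000 * 1000 * 100 * 24 * 365 * BYTES_PER_VALUE
print(f"Weather model, 1 variable, 1 year: {weather / 1e12:.0f} TB")  # ~7 TB: beyond one laptop
```

So a million records is comfortably a single-machine job, whereas even one weather variable for a year already pushes toward HPC territory, which is the point about magnitudes above.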
Big data is not so much about bigness, but about its characteristics: (1) ALL rather than samples, (2) measured (x, y, z, t) rather than estimated, and (3) individual rather than aggregated. Sampling is not a legitimate concept in big data. Big data is not just a new type of data, but a new paradigm:
Research: Big Data Is a New Paradigm
Article: The Evolution of Natural Cities from the Perspective of Loca...
Big data has a volume large enough to cross domain boundaries and indirectly reveal valuable information in other domains, often presenting a big surprise for users. Data originally designed to be collected, sensed and recorded for purpose A ends up contributing to purpose B through association. In traditional, non-big data, the original collection design is consistent with its intended use.