I collected more than 200,000 samples, but someone said the data was not Big Data. I asked him to define Big Data; unfortunately, he could not.
It probably depends on the nature of your data and the application you are considering. I work in biology, and the scale at which people call a project 'Big Data' is definitely not the same as for people working in social media, for instance.
One way Big Data has been characterized is using the 3Vs definition (https://www.whishworks.com/blog/big-data/understanding-the-3-vs-of-big-data-volume-velocity-and-variety):
* VOLUME: the amount of data generated.
* VELOCITY: the speed with which data are being generated.
* VARIETY: the nature of the data, covering all the structured and unstructured data that can be generated either by humans or by machines.
So, in your case, this person might say it is not Big Data because it was generated once (low velocity) and because 200,000 samples is 'small' compared to the current scale of millions or billions (low volume). Also, could the data be represented in a large spreadsheet with little effort (low variety)?
I agree with Kévin Vervier. It depends on the nature and application of the data. For example, in image applications (image recognition, object detection) a dataset of 200k samples is considered small/medium, while in other applications this number can be considered large.
Volume. Big data implies enormous volumes of data. ...
Variety. Variety refers to the many sources and types of data both structured and unstructured.
Velocity. Big Data Velocity deals with the pace at which data flows in from sources like business processes, machines, networks and human interaction with things like social media sites, mobile devices, etc.
Veracity. Big Data veracity refers to the biases, noise and abnormality in data: is the data being stored and mined meaningful to the problem being analyzed?
Validity. Related to veracity is the issue of validity, meaning whether the data is correct and accurate for the intended use.
Volatility. Big data volatility refers to how long the data remains valid and how long it should be stored.
Big data describes any voluminous amount of structured, semi-structured and unstructured data that has the potential to be mined for information. In addition, it is characterized by the 3Vs: the extreme volume of data, the wide variety of data types and the velocity at which the data must be processed. Although big data doesn't equate to any specific volume of data, the term is often used to describe terabytes, petabytes and even exabytes of data captured over time.
Some scientists say that any volume of data that you cannot process on your local machine is Big Data.
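As a rough illustration of that rule of thumb (a minimal sketch only; the file name, the 50% memory margin and the chunk size are my own assumptions, not part of anyone's post above), one might check whether a dataset fits comfortably in RAM and fall back to chunked processing otherwise:

```python
import os
import psutil  # third-party package, assumed available; used only to read free memory
import pandas as pd

def process_locally_or_in_chunks(path, chunksize=1_000_000):
    """Rule-of-thumb check: if the file is much smaller than free RAM,
    load it at once; otherwise stream it in chunks (or move to a cluster)."""
    file_size = os.path.getsize(path)
    free_ram = psutil.virtual_memory().available

    if file_size < 0.5 * free_ram:      # arbitrary 50% safety margin
        df = pd.read_csv(path)          # fits in memory: ordinary local analysis
        return df.describe()
    else:
        # Too large for this machine's memory: aggregate chunk by chunk.
        totals = None
        for chunk in pd.read_csv(path, chunksize=chunksize):
            part = chunk.select_dtypes("number").sum()
            totals = part if totals is None else totals.add(part, fill_value=0)
        return totals
```

On this view, the same 200,000-sample dataset may or may not be 'big' depending on the machine at hand, which is exactly why this rule of thumb is contested.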
As data sizes grow alarmingly, we move from information overload to big data, because services and systems themselves start generating data. Though some scholars list as many as 9 Vs, big data is mainly characterized by 3 Vs: Volume, Velocity and Variety. Depending on the situation and application, petabytes or even zettabytes of data may be available (volume), generated and made available at high speed (velocity), in a wide variety of structured, semi-structured and unstructured multimedia forms such as image, video, audio and text (variety).
I agree with @Kevin_Vervier. I worked in banking 20 years ago, and there was no such thing then as "big data". But I assure you our data was massive and high volume. I do believe it is the combination of velocity and volume, together with the unstructured nature of having a variety of different data sources in the same data set, that makes it "big data".
@Hamad and @Carbone, HPC is used in several cases, but I found that a million records could be processed on a normal computer. Is HPC an essential requirement for Big Data?
No... I am just trying to make you aware of the orders-of-magnitude differences in data volumes and the corresponding processing requirements. What some people believe is big is often very small. There isn't a rule of thumb, nor should there be. It's always about the trade-offs among the 5 or 6 V's (velocity, volume, etc., depending on who you talk to) and the processing necessary. If you always focus on that, you are always thinking the right way. Weather models just happen to produce some very, very large data sets and are processed most often on HPC systems. Hope this helps.
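A quick back-of-envelope calculation (a sketch with purely hypothetical figures: one million records with 100 numeric columns versus a single weather-model variable on a modest 1000x1000x100 grid at hourly steps for a year, all stored as 8-byte floats) illustrates those orders of magnitude:

```python
# Back-of-envelope sizes, assuming 8-byte (float64) values throughout.
BYTES_PER_VALUE = 8

# ~1 million tabular records with 100 numeric columns (hypothetical survey-style data).
survey = 1_000_000 * 100 * BYTES_PER_VALUE
print(f"1M records x 100 columns: {survey / 1e9:.1f} GB")   # ~0.8 GB: fits in laptop RAM

# One year of hourly output for a single variable on a 1000 x 1000 x 100 grid.
weather = 1000 * 1000 * 100 * 24 * 365 * BYTES_PER_VALUE
print(f"Weather model, 1 variable, 1 year: {weather / 1e12:.0f} TB")  # ~7 TB: beyond one laptop
```

So a million records is comfortably a single-machine job, whereas even one weather variable for a year already pushes toward HPC territory, which is the point about magnitudes above.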
Big data is not so much about bigness, but about its characteristics: (1) ALL rather than samples, (2) measured (x, y, z, t) rather than estimated, and (3) individual rather than aggregated. Sampling is not a legitimate concept in big data. Big data is not just a new type of data, but a new paradigm:
Research: Big Data Is a New Paradigm
Article: The Evolution of Natural Cities from the Perspective of Loca...
Big data has a volume large enough to cross domain boundaries and indirectly reveal valuable information in other domains, often presenting a big surprise for users. Data originally designed to be collected, sensed and recorded for purpose A ends up contributing to purpose B through association. In traditional, non-big data, the original collection design is consistent with its intended use.