I think Big Data occurs when someone needs to select the most relevant experimental data from an initial set of data, for example via the ABC curves method, or alternatively to perform a statistical treatment of such data. In this context, the concept of Big Data is relative: it depends on the type of data set and on the researcher. For me, a set larger than 100,000 data is Big Data.
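As a rough illustration of selecting the most relevant data with an ABC (Pareto-style) classification, here is a minimal sketch. The function name `abc_classify` and the 80% / 95% cumulative cut-offs are assumptions made for this example; they are conventional defaults, not values given in the post above.

```python
from typing import Dict, List, Tuple


def abc_classify(values: Dict[str, float],
                 a_cut: float = 0.80,
                 b_cut: float = 0.95) -> Dict[str, str]:
    """Assign each item to class A, B, or C by cumulative share of the total.

    The 80% / 95% cut-offs are assumed conventional defaults,
    not values taken from the discussion.
    """
    total = sum(values.values())
    if total <= 0:
        raise ValueError("total value must be positive")

    # Rank items from the largest to the smallest contribution.
    ranked: List[Tuple[str, float]] = sorted(
        values.items(), key=lambda kv: kv[1], reverse=True
    )

    classes: Dict[str, str] = {}
    cumulative = 0.0
    for name, value in ranked:
        cumulative += value
        share = cumulative / total  # cumulative share including this item
        if share <= a_cut:
            classes[name] = "A"
        elif share <= b_cut:
            classes[name] = "B"
        else:
            classes[name] = "C"
    return classes


if __name__ == "__main__":
    # Hypothetical measurements, keyed by an illustrative label.
    print(abc_classify({"a": 70, "b": 20, "c": 6, "d": 4}))
```

Class A would then be the "most relevant" subset to keep for further treatment. Other variants of the method put the item that crosses a threshold into the higher class.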
It is a relative concept. However, it can generally be defined as a collection of data that we cannot (or find it difficult to) deal with manually. Numerically, a set of big data is not less than one hundred data.
I propose a non-academic, industry-grade definition: Big data D(M,C1,C2,T) is M-dimensional data that is or will be processed by a computer with capacity C1 and interpreted by a specialist with competency C2 over a period of time T, such that C1 > q.C2, where q > 1 is called the “sanity” factor. Qualitatively, if the data would require more than q people to analyze and draw insights from, then it is already considered big data, and in practice one should just let the computer handle it.
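The condition C1 > q.C2 can be written as a one-line check. This is only a sketch: the function name `is_big_data` is hypothetical, and it assumes that C1 and C2 are expressed in the same (unspecified) unit, which the definition above does not state.

```python
def is_big_data(c1: float, c2: float, q: float) -> bool:
    """Return True when C1 > q * C2, with sanity factor q > 1.

    c1: capacity of the computer (assumed unit, e.g. records per period T)
    c2: competency of the specialist (assumed to be in the same unit as c1)
    q:  the "sanity" factor from the definition; must be greater than 1
    """
    if q <= 1:
        raise ValueError("the sanity factor q must be greater than 1")
    return c1 > q * c2


if __name__ == "__main__":
    # Hypothetical numbers: the computer handles 10 units, a specialist 2, q = 3.
    print(is_big_data(c1=10.0, c2=2.0, q=3.0))
```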
Thank you for your answers. My comments are as follows.
1. Dear Professor James F Peters, your recommended paper is important, but it is not related to a "Mathematical Definition of Big Data". Twenty years ago I published a book on the essentials of discrete mathematics. I hope to give a definition of big data like the definition of a group. I also think of the definition of a data warehouse, which has a mathematical flavor. However, data mining has no mathematical definition, and neither does machine learning.
2. Dear Paulo Laerte Natti,
"A set larger than 100,000 data is Big Data" is not a mathematical definition. I have considered your idea in my paper [1].
[1] Sun Z, Wang PP (2017) A Mathematical Foundation of Big Data. Journal of New Mathematics and Natural Computation. 13(2): 83-99. DOI: 10.1142/S1793005717400014
3. Tareq Al-shami,
"A collection of data that we cannot (or find it difficult to) deal with manually" has a little mathematical flavor, but it is not a mathematical definition.
For "A set of big data is not less than one hundred data", please also see Sun Z, Wang PP (2017), mentioned above.
4. Dear Santhosh Kumar Balan,
Thank you for your papers. I will download each of them to read.
5. Dear Emmanuel Gonzalez,
Your non-academic industry-grade definition, "Big data D(M,C1,C2,T) is an M-dimensional data that is or will be processed by a computer with capacity C1, interpreted by a specialist with competency C2, over a period of time T, such that C1 > q.C2", can be changed to:
Big data D(C1,C2,T,V) is data that is processed by an entity E1 with capacity C1 and interpreted by an entity E2 with competency C2 over a period of time T, such that the data can be changed into big value V, where an entity is a set of humans, computing machines, robots, or intelligent agents.
This is a market-oriented definition, or pragmatic definition.
In your definition, "M-dimensional data" should be replaced by "data". Have a look at topology: distance plays no role there, so "M-dimensional" is too specific. It is also better to replace "a specialist" in "interpreted by a specialist" with "an entity".
Likewise, "a computer" is replaced by "an entity". Similarly, das Sein (German for "being") is the most general concept in philosophy.
Thank you for the three papers you provided. It seems that they draw attention to the mathematics behind big data, but they do not address what I asked. Even so, I am happy, because in one of the three papers or presentation slides I found a better explanation of Google PageRank using matrix and graph theory. It deserves further thought and development on my part.
Big data is a term used to refer to data sets that are too large or complex for traditional data-processing application software to adequately deal with. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.
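To see why more attributes make spurious findings more likely, here is a small sketch. It computes the probability of at least one false positive across many tests (the family-wise error rate, a quantity related to but not the same as the false discovery rate). It assumes that one independent significance test is run per attribute and that no attribute has a real effect. The function name and the 0.05 level are assumptions chosen for illustration.

```python
def prob_any_false_positive(num_attributes: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among independent tests.

    Assumes one test per attribute, all null hypotheses true, and
    independence between tests; alpha is the per-test significance level.
    """
    if num_attributes < 0:
        raise ValueError("num_attributes must be non-negative")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # P(no false positive in any test) = (1 - alpha) ** num_attributes
    return 1.0 - (1.0 - alpha) ** num_attributes


if __name__ == "__main__":
    for m in (1, 10, 100):
        print(m, round(prob_any_false_positive(m), 3))
```

Under these assumptions the probability grows quickly with the number of columns, which is one reason wide data sets usually call for a multiple-testing correction.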
Thank you for your comments. To my knowledge, no Mathematical Definition of Big Data exists anywhere in the world. I have paid attention to this since 2012; see also my "A Mathematical Foundation of Big Data". Neither mathematicians nor computer scientists have introduced a Mathematical Definition of Big Data. This is the reason why I raised this question.
At the moment, I would like to use the 10 Bigs (see "Big data with ten big characteristics") as data elements to propose a Mathematical Definition of Big Data.
This mathematical definition is based on an integration of calculus, logic, and algebra. For example, a Mathematical Definition of Big Data is based partly on the definition of a group in algebra and the definition of a limit in calculus. Only in this way is it a real Mathematical Definition of Big Data.
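For readers who want the two templates at hand, the standard textbook definitions of a group and of a limit are recalled below. They are not a definition of big data; they only show the axiomatic style that such a definition would presumably follow.

```latex
% Group: a set G with a binary operation satisfying three axioms.
A \emph{group} is a pair $(G, \cdot)$, where $\cdot : G \times G \to G$, such that
\begin{enumerate}
  \item $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ for all $a, b, c \in G$ (associativity);
  \item there exists $e \in G$ with $e \cdot a = a \cdot e = a$ for all $a \in G$ (identity);
  \item for every $a \in G$ there exists $a^{-1} \in G$ with
        $a \cdot a^{-1} = a^{-1} \cdot a = e$ (inverse).
\end{enumerate}

% Limit: the epsilon-delta definition.
$\displaystyle \lim_{x \to a} f(x) = L$ if and only if
\[
  \forall \varepsilon > 0 \;\; \exists \delta > 0 \;\; \forall x :\quad
  0 < |x - a| < \delta \;\Longrightarrow\; |f(x) - L| < \varepsilon .
\]
```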