The key problem is that the data sets (genomic and proteomic in particular) are growing at rates faster than Moore's law. This is only going to get worse with the next generation of gene sequencers, and it will certainly continue for at least five years.
I find it hard to imagine a solution allowing "true" big data when the data is growing faster than computation. Quickly pruning "uninteresting" data will be critical. I see a similarity with the LHC trigger system, where the particle detectors generate too much data to process in detail or even to store. There is therefore a hierarchy of "triggers": things that seem uninteresting at first glance are discarded, and possibly "interesting" events/data are passed on to the next level of trigger/filter. Because each level has less data to deal with than the previous one, more work can be done on it and more detailed processing can be accomplished.
Some method of very quickly discarding parts of the data that don't matter much is the key, IMHO.
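A minimal sketch of such a hierarchical trigger, with hypothetical filter functions and record fields (nothing here comes from the LHC itself): each stage sees less data than the previous one, so it can afford a more expensive test.

```python
def cheap_filter(record):
    # Stage 1: a fast, coarse test (assumes records carry a 'signal' score)
    return record["signal"] > 0.1

def detailed_filter(record):
    # Stage 2: a slower, more selective test applied only to stage-1 survivors
    return record["signal"] > 0.8 and record["quality"] == "good"

def trigger_pipeline(records, stages):
    """Apply each filter stage in order, discarding records as early as possible."""
    for stage in stages:
        records = [r for r in records if stage(r)]
    return records

events = [
    {"signal": 0.05, "quality": "good"},
    {"signal": 0.95, "quality": "good"},
    {"signal": 0.90, "quality": "bad"},
]
print(trigger_pipeline(events, [cheap_filter, detailed_filter]))
# -> [{'signal': 0.95, 'quality': 'good'}]
```

The point of the structure is that the cheap test runs on everything, while the expensive test only ever runs on the small fraction that survives it.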
Big data presents a number of challenges relating to its complexity. One challenge is how we can understand and use big data when it comes in an unstructured format, such as text or video. Another is how we can capture the most important data as it happens and deliver it to the right people in real time. A third is how we can store the data, and how we can analyze and understand it given its size and our computational capacity. And there are numerous other challenges, from privacy and security to access and deployment.
Many issues connected with BIG DATA have been raised in this thread, and presumably more will be. Most deal with computational aspects.
I would like to stress the following: screening such data sets without a good question (or several) will only yield garbage answers. Yet, IMHO, one additional compelling issue is a statistical one: the more data you screen and build models for, the more likely a 'hit' becomes, i.e. something associated with a parameter in your screen. Avoiding false positive hits can then be a huge problem. Simply applying a Bonferroni threshold might be a way to go, or it might be too conservative. Essentially, a significance level needs to be established for every kind of data set, and this is a very demanding task indeed.
My fear is that the BIG DATA business will avoid this, since a positive finding ('X' correlates with 'Y') is almost always good PR, alas, regardless of its significance.
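A minimal sketch of the multiple-testing point above, using NumPy and SciPy (an assumption; any statistics package would do): ten thousand tests run on pure noise still produce hundreds of nominal "hits" at p < 0.05, while a Bonferroni-corrected threshold removes essentially all of them.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_tests, n_samples = 10_000, 50

# Pure noise: no real association exists in any of the 10,000 "parameters".
group_a = rng.normal(size=(n_tests, n_samples))
group_b = rng.normal(size=(n_tests, n_samples))
p_values = stats.ttest_ind(group_a, group_b, axis=1).pvalue

alpha = 0.05
print("naive hits:     ", np.sum(p_values < alpha))            # roughly 500 false positives
print("Bonferroni hits:", np.sum(p_values < alpha / n_tests))  # usually 0
```

Whether Bonferroni, a false discovery rate procedure, or a domain-specific threshold is the right correction is exactly the per-data-set question raised above.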
As the name implies, BIG DATA consumes so much space that it wouldn't fit in memory even with thousands of machines, and if it did, communication between those machines would saturate the network. Hence, researchers in different applications are finding ways to approximate, partition, or even preprocess the data offline to speed up the online part of the application they are working on. This also inspired Google to use the MapReduce paradigm for parallel processing. I therefore think the next five years will see research on parallel frameworks similar to MapReduce, parallel partitioning, clever preprocessing techniques, and incremental computing, with incremental computing being the most targeted for research, as it is not fully supported in the open-source MapReduce project, Hadoop.
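For readers unfamiliar with the pattern, here is a minimal single-machine sketch of map/shuffle/reduce (a word count); it is only an illustration of the idea. Real frameworks such as Hadoop distribute these same three stages across many machines and perform the shuffle over the network.

```python
from collections import defaultdict
from functools import reduce

documents = ["big data needs big machines", "map reduce splits big jobs"]

# Map: emit (key, value) pairs independently for each input record.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group all values by key (a distributed framework does this across the network).
groups = defaultdict(list)
for key, value in mapped:
    groups[key].append(value)

# Reduce: combine the values for each key.
counts = {key: reduce(lambda a, b: a + b, values) for key, values in groups.items()}
print(counts)  # {'big': 3, 'data': 1, ...}
```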
Big Data is somewhat of a misnomer and has many definitions. In general, data-related challenges differ between researchers, and so do priorities. Unstructured data do not have to be large in volume to pose the range of challenges already mentioned here; at the end of the day, they will be distilled into structured data suitable for analysis or other processing. Well-structured data have well-defined challenges of storage, preservation (of data, methods, and algorithms), interoperability, IPR, security, privacy, and so on. All of them need to be solved, the sooner the better.
Cataloguing and indexing information in a smart and efficient way, which may be expensive: you need experts in the respective fields to do it right and do it efficiently.
Just to clarify things, I don't think that Big Data is just a synonym for Data Mining. Data Mining is only one aspect of the problem, and in most cases it involves statistical studies and correlation methods. But I think there is much more behind Big Data, such as the storage problem (very often linked to network problems), dynamic flows of data (many small individual data items, but a massive flow), etc.
I personally think that connected objects (the famous Internet of Things), like smart watches and all the other smart things we will be surrounded by in the near future, represent a potential source of Big Data generation. Thus, I think there are a lot of research opportunities in massive dynamic flows of small data.
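As a minimal sketch of what incremental processing of such flows can look like (the device names and readings below are hypothetical), one can keep only a small running summary per device instead of storing the raw stream:

```python
# Hypothetical stream of small sensor readings: (device_id, temperature).
readings = [("watch-1", 36.5), ("watch-2", 37.1), ("watch-1", 36.7)]

# Incremental aggregation: keep only a running count and mean per device,
# so the raw flow never has to be stored in full.
state = {}  # device_id -> (count, mean)
for device, temp in readings:
    count, mean = state.get(device, (0, 0.0))
    count += 1
    mean += (temp - mean) / count  # running-mean update
    state[device] = (count, mean)

print(state)  # e.g. {'watch-1': (2, 36.6), 'watch-2': (1, 37.1)}, up to float rounding
```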
I would like to contribute this link regarding the question that James R. Parker raised: whether big data is a marketing product or something really new.
Personally, I agree with Xavier Bonnaire: Big Data involves more than just Data Mining; it also means analyzing other aspects such as storage and processing capabilities.
But, as Hassan Abedi said, there is no solid definition of it.
According to DZone, Big Data raises the following three concerns:
1. Data Privacy
There is no second thought about the benefits of Big Data-powered apps and services, but these benefits can cost us our privacy. Companies should address this concern in their projects. How could we forget the famous Facebook data leakage case?
2. Security of Data
Every time you click "I Agree" while logging in to an app or website, you give it several permissions to access your devices and accounts. Can we trust these companies to keep our data safe? Companies already struggled with data security even before this increase in data complexity.
3. Data Discrimination
Along with helping businesses become better marketers and service providers, big data also lets them discriminate. What if all the insights from big data make it more difficult for certain populations to find the information they need? This was the question raised by the Federal Trade Commission: "Big Data: A Tool for Inclusion or Exclusion?".
Challenges with Big Data:
Data recognition: discovering techniques to find the specific data that can help you.
Modeling and simulation of the problems that can be solved with Big Data.
Effective ideas to analyze and visualize the outcomes of Big Data.
Storage, streaming and processing of Big Data.
There are many sub-problems beneath the problem, and technology is coming up with solutions. Recognizing Big Data as a problem was a solution in itself.
Big Data has also come as a solution and has many applications in various sectors. You can go through them here - https://data-flair.training/blogs/big-data-applications-various-domains/
To tackle the increasing challenges of agricultural production, the complex agricultural ecosystems need to be better understood. This can happen by means of modern digital technologies that monitor large quantities of data continuously and at an unprecedented pace. Analysis of this big data would enable farmers and companies to extract value from it, improving their productivity.
I asked a question about this subject and there are some answers that may help you in this specific field.