Big Data can be successfully stored, manipulated, and processed using relational database servers.
In Oracle there are special object-oriented packages such as Multimedia, XML, Spatial, and Topology (the first one is not very strong). Additionally, it is possible to create user-defined types that can represent complex objects, and to use PL/SQL or Java Stored Procedures (JSP). When Java classes are used, you obtain a very powerful tool for processing objects on the database server side. You can find an example of this approach in "Query by voice example and sound similarity based on the Dynamic Time Warping algorithm" (http://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-article-BPOM-0030-0003)
or in "Implementation of MFCC vector generation in classification context" (http://yadda.icm.edu.pl/baztech/element/bwmeta1.element.baztech-article-LOD8-0002-0011).
Of course, there are many publications by other authors.
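To make the mechanism concrete, here is a minimal sketch of an Oracle user-defined object type with a PL/SQL member method; the names and the audio format are illustrative assumptions, not taken from the cited papers, and the method could equally delegate to a Java stored procedure loaded with loadjava:

    -- Oracle SQL / PL/SQL; illustrative names, hypothetical audio type.
    CREATE TYPE sound_clip_t AS OBJECT (
        id      NUMBER,
        samples BLOB,
        MEMBER FUNCTION duration_seconds RETURN NUMBER
    );
    /
    CREATE TYPE BODY sound_clip_t AS
        MEMBER FUNCTION duration_seconds RETURN NUMBER IS
        BEGIN
            -- assumes 16-bit mono PCM at 44100 Hz (an assumption for this sketch)
            RETURN DBMS_LOB.GETLENGTH(samples) / (2 * 44100);
        END;
    END;
    /
    -- The type can then be used as an ordinary column type:
    CREATE TABLE sound_archive (clip sound_clip_t);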
In MSSQL, object representation is built in only for Spatial and XML data, but there is a very strong mechanism for user-defined types using the CLR (Common Language Runtime). Objects can be built in any language of the .NET platform. I have built several such types, for example to represent network connections (social networks), and using user-defined aggregates I have built an analytical (statistical) library. Unfortunately, all descriptions of my work are in Polish. In all cases I obtained very high efficiency. You can find much useful information in the book Professional SQL Server 2005 CLR Programming (http://shop.oreilly.com/product/9780470054031.do).
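As a hedged sketch of how the CLR mechanism is wired up (the assembly, type, and aggregate names below are hypothetical examples, not the ones from my Polish-language library):

    -- T-SQL; hypothetical names. The type itself is written in any
    -- .NET language and compiled to a DLL before registration.
    CREATE ASSEMBLY SocialNetTypes
        FROM 'C:\lib\SocialNetTypes.dll'
        WITH PERMISSION_SET = SAFE;
    GO
    -- Expose a CLR class as a SQL Server user-defined type:
    CREATE TYPE dbo.NetConnection
        EXTERNAL NAME SocialNetTypes.[SocialNet.NetConnection];
    GO
    -- User-defined aggregates are registered the same way:
    CREATE AGGREGATE dbo.Median (@value FLOAT) RETURNS FLOAT
        EXTERNAL NAME SocialNetTypes.[SocialNet.Median];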
I believe that both database servers provide tools strong enough to give programmers and developers the ability to obtain any functionality necessary to store and process Big Data.
Conventional databases do not seem appropriate for handling big data; special approaches to storage optimisation are necessary. But do you really mean big data, or semantic data? You are trying to compare solutions on two different levels.
There is always structure behind big data. Of course, it is not as easy to define, nor as homogeneous, as in an RDBMS; it is looser, and we should rather think in terms of graphs. One possible way of optimising storage is to discover structures in big data (here, similar graph structures) and hold them close together so that they can easily be accessed by queries.
I think we should construct a bridge between big data, represented and manipulated by conventional relational databases, and the emerging semantic information evoked by big data. New semantic-information generation processes should be supported by cognitive knowledge bases.
The structure behind big data is implicit and may vary as knowledge discovery proceeds.
RDBMS schemas assume that structure is explicit. I think that only a multi-paradigm approach mixing NoSQL databases, graph databases, and RDBMSs (for master data) can handle the complexity of big data.
I think that in the near future you cannot entirely avoid the use of relational databases to represent and manipulate big data. Such databases are so common that we have to imagine hybrid approaches, at least to integrate big data outputs into legacy information systems.
I think this is a great question that should be addressed outside the scope of the SQL/NoSQL debate. What do we store in a database? How do we organize information? Should we compromise accuracy in exchange for speed and the storage of high-level features (i.e., semantics)? Should we keep data enclosed in rigid boxes, or should we let it self-organize into meaningful connections that we could not envisage a priori? Coming from a background in artificial neural networks, I think these are really fascinating questions that we should start to get our heads into, and we should start to look at data as something with its own life instead of dead bits stored ad infinitum in isolated boxes.
@Armando I do agree - self-organizing data!! Even if only partially realized, this would already be a big breakthrough: the amount of information that is not evident a priori, especially in genomics but not necessarily limited to such areas, is immense!
Big Data is a term used for a very huge amount of data. The data that needs to be processed is increasing manyfold every year, and our relational databases will not be able to keep pace with the increasing data needs. Work is going on to make relational databases distributed, but we have to sacrifice at least one of the ACID properties of relational databases to process Big Data. As we move towards processing huge data, the technology we have been using for many decades will not suffice.
Relational databases will coexist in the future, but will have different use cases than Big Data databases.
Since no one has stated the traditional algorithmic/data-structure answer, here it comes:
You need to know what functions you want to perform on top of the data, because this affects how the data is structured. In case of conflict, you must choose to optimize the data structure according to the function you prioritize (cf. readers/writers of a shared object in concurrent programming). Additionally, there are pragmatics such as being able to sacrifice ACID properties (as @Saurabh Singh suggested is needed), given that the decision process based on the functions applied to the data can handle it (there are lots of examples where this is possible; cf. naming services, DNS, yellow pages, and consider eventual consistency or cache consistency). The size of the data is a pragmatic requirement imposing particular problems on the functions and the data structures; that is all.
If the functions, data-structure requirements, and pragmatic requirements (in particular, the size) suit a conventional relational database, then yes. Otherwise, no. Given this, I would like to rephrase the question as: "What functionality, in what application domains where we have big data, is suitable for conventional relational databases?" The application domain carries the data-structure requirements as well as the pragmatic requirements in that particular domain. Compare any pair of the following to get an idea: airline ticket booking, stock-market exchange, banking, drive-by-wire, surveillance, support for the elderly. From my point of view, the question is too general and will result in a lot of "maybe yes" and "maybe no".
For example, if you optimize the indexing technique for a database where you never update existing tuples and only add new ones, then you can get queries that scale better than in traditional conventional relational databases (I think it was from O(N log N) to O(N), but I might be wrong; in any case, less overhead per computational step). However, if you cannot meet the pragmatic requirement of this indexing technique, namely that existing tuples are never updated, then you cannot use it (and it is very useful for certain kinds of Big Data). If anyone is interested, I can dig up the reference (end of the '90s, if I do not misremember).
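Until I dig up that reference: one modern analogue of append-only-friendly indexing is PostgreSQL's BRIN index; this is an analogy of mine under the same no-update assumption, not necessarily the technique from that paper:

    -- PostgreSQL; a modern analogue, not necessarily the 90's technique above.
    CREATE TABLE sensor_readings (
        recorded_at timestamptz NOT NULL,
        sensor_id   integer     NOT NULL,
        value       double precision
    );
    -- A BRIN index stores only one small summary per block range, so its
    -- maintenance cost grows linearly with inserts, and range queries can
    -- skip most blocks, as long as rows arrive (and stay) in recorded_at order.
    CREATE INDEX readings_brin ON sensor_readings USING brin (recorded_at);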
Further, I am unsure how much weight the pragmatic requirement of the size (and complexity) of Big Data (I dislike this concept!) carries compared to the other pragmatic requirements of a specific application domain.
@Jonas and @Armando, I agree with your point of view. The origin of any decision in any particular project lies in the nature of the user requirements and the semantics being modeled in the problem domain, where we should always take care, early and in advance, of the unity of the data structures and the functions performed over them.
Let me note that not only physical-organization factors are important. It is not just a technical dilemma of relational DBMS vs. NoSQL systems. Let us recall the solid mathematical background of the relational data model. By satisfying the normal-form conditions, particularly the very restrictive 1NF (with regard to the RDBMS vs. NoSQL dilemma), and then 2NF through BCNF, we try to overcome not only physical but also evident logical problems of data structuring. One important issue is how the problem-domain semantics reflect the problem of update anomalies if we deliberately decide to violate some normal-form condition. May update anomalies appear in practice? If they may, is it tolerable to allow them? What about data consistency, in two aspects: formal or merely semantic violation of data constraints? Not all paradigms, embedded in the various data models, give the same answers to these issues.
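To make the anomaly tangible, a small hypothetical sketch: the table below deliberately violates 2NF/BCNF because supplier_city depends on supplier_name rather than on the key, so one real-world fact is stored in many rows:

    -- Generic SQL; hypothetical denormalized schema.
    CREATE TABLE orders_denorm (
        order_id      INTEGER PRIMARY KEY,
        supplier_name VARCHAR(100),
        supplier_city VARCHAR(100),  -- depends on supplier_name, not order_id
        amount        DECIMAL(10, 2)
    );
    -- Update anomaly: the same fact must be rewritten in every matching row;
    -- a partial failure leaves two "cities" on record for one supplier.
    UPDATE orders_denorm
       SET supplier_city = 'Gdansk'
     WHERE supplier_name = 'Acme';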
@Ivan, I agree. This is what I included in the "functions, data-structure requirements and pragmatic requirements (in particular, the size) suit a conventional relational database" part of my argument. Personally, I think the answers to your questions vary with the application domain.
Guys, my opinion on the original question is YES as well, but big data has big volume and variety, according to the definition. This means that in many cases we will have very badly structured tables and insufficient warehouse scalability, so we will need distributed storage and processing and a data model oriented towards unstructured data. In such cases, Hadoop-like solutions will be preferable.
@Vladimir, exactly - they are badly structured, but for a reason: much about them is still unknown beyond some small snippets of already-gleaned information (at least in genomics this is the norm). Hence, self-organizing data would be a very big breakthrough :) if/when realized.
The first issue is the definition of Big Data. In my opinion, it is better to talk about semi-structured data, or data with an imprecise structure definition. In many cases Big Data elements are really big (they have large size: graphics, audio, video, DNA), but Big Data is not the same as a large volume of data. There is no problem when we have a small set of such elements: it is possible to store them in a file system and process them almost manually. As the number of Big Data elements increases, we have to use additional tools to organize them; in my opinion, commercial relational databases are the only choice.
When the volume increases still further, we can organize DB servers into a network called a cluster or grid (more complex and stronger than Hadoop). Such an organization offers increased processing power (multiplied storage, parallel processing, parallel I/O operations, etc.), but from the user's (operator's) side it looks like a single node. It is a practical realization of so-called cloud computing at the DB-server level. As for the competition: NoSQL databases are still work in progress and not as strong, while object-oriented databases do not exist commercially and still cannot store large volumes. When we add the possibility of creating object-oriented user-defined data types in commercial DB servers, we have everything we need to store and process data of the Big Data type.
Nowadays we should not think about relational databases in such a restrictive way. We do not "keep data enclosed in rigid boxes", because they are relational-object-oriented, or object-extended, databases. The possibilities for self-description of data stored as objects are really very wide, and the automation of such processes via DB object triggers is just as versatile. But because I do not believe in strong artificial intelligence, the creation of a self-organizing database server is, in my opinion, impossible. Additionally, I am not sure I would want such a tool, in which I have no control over what goes on inside.
I do not understand @Ivan's problem with the 1NF restrictions. If we treat an object stored in a table field as atomic, the normalization problem disappears, and we can think about nested tables in the same way. If we thought otherwise, it would be impossible to create any object-oriented relational database server such as Oracle.
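For illustration, a minimal Oracle sketch (the names are illustrative): the whole nested table is stored through a single column, which can be treated as one atomic value from the normalization standpoint:

    -- Oracle SQL; illustrative names.
    CREATE TYPE phone_list_t AS TABLE OF VARCHAR2(20);
    /
    CREATE TABLE customers (
        id     NUMBER PRIMARY KEY,
        name   VARCHAR2(100),
        phones phone_list_t        -- a whole collection in one "atomic" field
    ) NESTED TABLE phones STORE AS customers_phones_tab;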
"But because I’m not believe in strong artificial intelligence, creation of self-organized database server is in my opinion impossible. Additionally I’m not sure if I would like to have such tool, where I have no control what going on inside".
By that reasoning, one can only have as much control as one's understanding reaches. Fine! Yet we do not understand the full implications of the DNA information stored even in the genome of the smallest bacterium, M. genitalium, or even in that of the flu virus. Nonetheless, those self-contained Big Data repositories can self-organize themselves into viable, operational entities.
We have tried to build self-organizing data storage for large computer clusters using data similarity. Look at http://link.springer.com/chapter/10.1007/978-3-642-32153-5_10
and http://www.sciencedirect.com/science/article/pii/S0306437913001300. I can send a copy.
@Vladimir yes please; this approach in particular seems quite intriguing: "Scalable Distributed Algorithm for Approximate Nearest Neighbor Search Problem in High Dimensional General Metric Spaces" :) !!
The question sounds to me like: do traffic signs have an impact on the engineering of car engines?
That is because big data is just a synonym for huge masses of available data and their complexity, especially concerning linked open data. So far, big data has no really good definition, and there is also no specification regarding storage possibilities.
However, big data has no real causal relation to the storage engine. In my opinion, relational databases are the most established and among the fastest processing and storage technologies: you just design a good database schema that allows further extensions, etc. The biggest advantage of relational databases is the possibility of optimizing them with indexes. I know there are a couple of solutions built on Hadoop that are very fast too, but with respect to maintainability, scalability, data safety, and much more, the good old relational databases are the most appropriate solutions I would recommend. They are also the best solution for complex querying. Yes, Solr and other techniques are sometimes a little faster (we are mostly talking about milliseconds), but this often depends strongly on how you use and configure Solr and co., which makes them less suitable as a foundation for ongoing (research) activities on the data, where the use case and scenario can change quickly.
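As a small, hedged illustration of that indexing advantage (standard SQL with an illustrative schema of my own):

    -- Generic SQL; hypothetical schema.
    CREATE TABLE events (
        user_id    INTEGER,
        created_at DATE,
        payload    VARCHAR(200)
    );
    -- With this composite index the engine can seek directly to user 42's
    -- rows in the requested date range instead of scanning the whole table.
    CREATE INDEX idx_events_user_time ON events (user_id, created_at);

    SELECT COUNT(*)
      FROM events
     WHERE user_id = 42
       AND created_at >= DATE '2013-01-01';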
Data sets are considered Big Data when conventional storage technology cannot handle them (capture, curation, storage, search, sharing, transfer, analysis, and visualization). Despite the advances in DB technology, Big Data sizes keep growing and the applications are getting wider. When we talk about Big Data, the concern is not only the ability to store, but goes beyond that to the ability to filter, search, analyse, and visualise these data sets.
Moreover, we should also think about dynamic data, whose relations change over time; relational databases (with pre-defined structures and relations) will not help to handle them. Semantics emerge over time, based on the way we look at (analyze) these data.