Big data is huge in both volume and dimensionality, and graph algorithms are computationally expensive. Is there any way a big-data problem could be addressed with a graph-based solution while keeping the complexity manageable?
Yes. One place to start would be to look for solutions built on GraphLab. This was a project developed to use graphs in the analysis of data sets, from which a company now called Dato (dato.com) was started. Dato builds tools for more than graphs, but this is one place that may lead you in the direction you are asking about. In addition, there are graph databases such as Neo4j and Titan, where data is organized as a graph rather than in a tabular structure.
The Bulk Synchronous Parallel (BSP) paradigm, via Hama or Pregel, and the Directed Acyclic Graph (DAG) paradigm, via Microsoft Dryad, can readily handle graph-based big data.
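To make the BSP/Pregel idea concrete, here is a toy vertex-centric computation in pure Python: every vertex repeatedly adopts the largest label seen among its neighbours, with a synchronization barrier between supersteps, which labels each connected component. This is an illustrative sketch of the paradigm, not actual Hama or Pregel API code.

```python
def max_label_propagation(edges, num_vertices):
    """Label every vertex with the largest vertex id in its component."""
    # Build an undirected adjacency list.
    adj = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = {v: v for v in range(num_vertices)}  # initial labels
    active = set(range(num_vertices))             # vertices with new info

    while active:  # one loop iteration == one BSP superstep
        # "Messages" sent this superstep: each active vertex tells its
        # neighbours its current label.
        messages = {}
        for v in active:
            for n in adj[v]:
                messages.setdefault(n, []).append(labels[v])
        # Barrier: all messages are delivered before any vertex updates.
        active = set()
        for v, incoming in messages.items():
            best = max(incoming)
            if best > labels[v]:
                labels[v] = best
                active.add(v)
    return labels

# Two components: {0, 1, 2} and {3, 4}
labels = max_label_propagation([(0, 1), (1, 2), (3, 4)], 5)
```

In a real BSP framework each vertex's update would run on a different worker, and the barrier between supersteps is what keeps the computation deterministic.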
MATLAB now has built-in MapReduce functionality to allow for analysis of data sets that are too big to fit in memory. Algorithms can be developed on a desktop and then executed on a Hadoop cluster. Using the MATLAB Compiler, developers can create applications for production Hadoop systems.
The new 64-bit version of MATLAB allows users to access more physical memory, which can load larger data sets for processing, and new functions allow the processing of data sets that are too large to fit in memory. New computing features in MATLAB allow parallel processing of data across multiple CPU cores to increase compute throughput, and GPU acceleration provides even higher levels of performance. The Parallel Computing Toolbox, together with the MATLAB Distributed Computing Server, can process data in parallel on clusters of machines numbering in the thousands; matrices and multidimensional arrays can be distributed across a cluster of computers. The new version of MATLAB also supports image processing using GPUs and multi-core computation with the Parallel Computing Toolbox.
I would also suggest using a graph database. This way you are not bound to any application/program for your data analysis. Store your data as a graph and then use any language/library to analyze it, provided that you have a client in that language for that particular graph database. If you are going to use a mainstream language like Java or Python for your analysis, then finding a client will not be a problem. Neo4j is the most established graph database; Titan, ArangoDB, and OrientDB are the promising newcomers, AFAIK.
"Graph" is used here in two senses. As a chart, a graph supports visual data mining and can present descriptive statistics on the behaviour of the data. In the graph-theoretic sense, one can construct a graph and solve problems through its many operations, which can support clustering and classification as well as association.
No. There is a big chance of reaching seemingly good conclusions that are in fact totally wrong. The question is what you need the big-data solution for: if it is meant for data management (not for drawing conclusions), it can do good, as Reshi Nawab indicated, with MapReduce.
Current big data trends imply parallelization using, for example, the MapReduce paradigm and an implementation framework such as Hadoop or Spark. Some algorithms are harder to parallelize than others, and Spark can offer an improvement with respect to Hadoop in this aspect.
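To illustrate the MapReduce paradigm on graph data, here is a minimal local sketch that computes vertex degrees from an edge list. In Hadoop or Spark the map, shuffle, and reduce phases would run distributed across workers; here they run in-process purely for illustration.

```python
from collections import defaultdict

def map_phase(edge):
    # Emit a (vertex, 1) pair for each endpoint of the edge.
    u, v = edge
    return [(u, 1), (v, 1)]

def shuffle(pairs):
    # Group all emitted values by key, as the framework's shuffle would.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for one vertex.
    return key, sum(values)

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
mapped = [pair for e in edges for pair in map_phase(e)]
degrees = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Because each edge is mapped independently and each key is reduced independently, both phases parallelize trivially; the shuffle between them is where the framework pays its communication cost.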
Big graph data represents a challenge, one could say another level of difficulty with respect to big tabular data. MapReduce needs to divide the data into chunks which can be processed independently and fuse the final results.
So, if we can locally demarcate in the graph what we are processing, for example by neighbourhood or community, then we can parallelize the processing.
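As a sketch of this idea: a per-vertex quantity that depends only on the vertex's local neighbourhood (here, the number of triangles through the vertex) can be computed chunk by chunk, so the chunks can be handed to parallel workers. This uses Python threads merely to stand in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles_at(adj, v):
    # Triangles through v: pairs of v's neighbours that are themselves adjacent.
    nbrs = list(adj[v])
    return sum(
        1
        for i in range(len(nbrs))
        for j in range(i + 1, len(nbrs))
        if nbrs[j] in adj[nbrs[i]]
    )

def process_chunk(adj, chunk):
    # Each chunk needs only local neighbourhood information.
    return {v: triangles_at(adj, v) for v in chunk}

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = build_adjacency(edges)
vertices = sorted(adj)
chunks = [vertices[i::2] for i in range(2)]  # two interleaved chunks

with ThreadPoolExecutor(max_workers=2) as pool:
    partials = pool.map(lambda c: process_chunk(adj, c), chunks)

triangles = {}
for part in partials:
    triangles.update(part)
```

The key property is that no chunk's result depends on another chunk's result, so the fusion step at the end is a plain dictionary merge.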
The average path length, however, is a global computation, so we could say it is not well suited to naive parallelization (a simple average of per-chunk averages would be biased).
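The average-path-length point can still be handled in parallel if each worker returns a partial (sum, count) pair rather than a local average; combining plain averages is only safe when every part contributes the same number of paths. A small sketch on a path graph, with BFS standing in for the per-source shortest-path computation:

```python
from collections import deque

def bfs_distances(adj, source):
    # Unweighted shortest-path distances from one source vertex.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in adj[v]:
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return dist

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # the path graph 0-1-2-3

# Each "worker" handles some source vertices and reports (sum, count),
# not an average, so the global combination stays exact.
partials = []
for sources in ([0, 1], [2, 3]):
    s = c = 0
    for src in sources:
        for target, d in bfs_distances(adj, src).items():
            if target != src:
                s, c = s + d, c + 1
    partials.append((s, c))

total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
average_path_length = total / count
```

So the computation is global, but its aggregation is associative once the right partial statistic is chosen; it is the per-source BFS work, quadratic in the number of vertices, that remains expensive at big-data scale.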
Another option is to map the graph data into a lower dimensional space and process from there.
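One concrete way to do such a mapping is a spectral embedding built from eigenvectors of the graph Laplacian (L = D - A). The dense NumPy sketch below is only illustrative; at big-data scale one would use sparse, iterative eigensolvers instead.

```python
import numpy as np

def spectral_embedding(edges, num_vertices, dim=2):
    # Dense adjacency matrix of the undirected graph.
    A = np.zeros((num_vertices, num_vertices))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A        # combinatorial Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the trivial constant eigenvector; keep the next `dim` ones.
    return eigvecs[:, 1:1 + dim]

# Two triangles joined by a bridge: the first embedding coordinate
# (the Fiedler vector) separates the two communities by sign.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
coords = spectral_embedding(edges, 6)
```

Once the vertices live in a low-dimensional vector space, standard tabular big-data tooling (clustering, nearest neighbours, etc.) applies to them directly.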
Ma D., Sandberg M., and Jiang B. (2015), Characterizing the heterogeneity of the OpenStreetMap data and community, ISPRS International Journal of Geo-Information, 4(2), 535-550.
We can assume nothing about the content of a big-data set, and we are likely to stay ignorant even after all the statistics and trial-and-error attempts. So, as I see it, a graph-based approach seems better than the zero option, as long as we trust the conclusions drawn with it not to be altogether wrong...
Graphs and graph theory offer an efficient and compelling approach to visual data mining; the question is how best to optimize memory and time complexity. Big data can be reduced with subspace and other dimensionality-reduction techniques, and the corresponding graphs may affect the quality of the solution.