Big data is huge in both volume and dimensionality, and graph algorithms are computationally expensive. Is there any way a big-data problem could be addressed with a graph-based solution while keeping the complexity manageable?
Yes. One place to start would be to look for solutions built on GraphLab. This was a project developed to use graphs in the analysis of data sets, from which a company now called Dato (dato.com) was started. Dato builds tools for more than graphs, but this is one place that may lead you in the direction you are asking about. In addition, there are graph databases such as Neo4j and Titan, where data is organized as a graph rather than in a tabular structure.
The Bulk Synchronous Parallel (BSP) paradigm, via Hama or Pregel, and the Directed Acyclic Graph (DAG) paradigm, via Microsoft Dryad, can readily handle graph-based big data.
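To make the BSP/Pregel idea concrete, here is a toy vertex-centric computation in pure Python: every vertex repeatedly adopts the largest label seen among its neighbours, with a synchronization barrier between supersteps, which labels each connected component. This is an illustrative sketch of the paradigm, not actual Hama or Pregel API code.

```python
def max_label_propagation(edges, num_vertices):
    """Label every vertex with the largest vertex id in its component."""
    # Build an undirected adjacency list.
    adj = {v: [] for v in range(num_vertices)}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)

    labels = {v: v for v in range(num_vertices)}  # initial labels
    active = set(range(num_vertices))             # vertices with new info

    while active:  # one loop iteration == one BSP superstep
        # "Messages" sent this superstep: each active vertex tells its
        # neighbours its current label.
        messages = {}
        for v in active:
            for n in adj[v]:
                messages.setdefault(n, []).append(labels[v])
        # Barrier: all messages are delivered before any vertex updates.
        active = set()
        for v, incoming in messages.items():
            best = max(incoming)
            if best > labels[v]:
                labels[v] = best
                active.add(v)
    return labels

# Two components: {0, 1, 2} and {3, 4}
labels = max_label_propagation([(0, 1), (1, 2), (3, 4)], 5)
```

In a real BSP framework each vertex's update would run on a different worker, and the barrier between supersteps is what keeps the computation deterministic.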
MATLAB now has built-in MapReduce functionality to allow for analysis of data sets that are too big to fit in memory. Algorithms can be developed on a desktop and then executed on a Hadoop cluster. Using the MATLAB Compiler, developers can create applications for production Hadoop systems.
The new 64-bit version of MATLAB allows users to access more physical memory, which can load larger data sets for processing, and new functions allow the processing of data sets that are too large to fit in memory. New computing features in MATLAB allow parallel processing of data across multiple CPU cores to increase compute throughput, and GPU acceleration provides even higher levels of performance. The Parallel Computing Toolbox, together with the MATLAB Distributed Computing Server, can process data in parallel on clusters of machines numbering in the thousands; matrices and multidimensional arrays can be distributed across a cluster of computers. The new version of MATLAB also supports image processing using GPUs and multi-core computation with the Parallel Computing Toolbox.
I would also suggest using a graph database. This way you are not bound to any application/program for your data analysis. Store your data as a graph and then use any language/library to analyze it, provided that you have a client in that language for that particular graph database. If you are going to use a mainstream language like Java or Python for your analysis, then finding a client will not be a problem. Neo4j is the most established graph database; Titan, ArangoDB, and OrientDB are the promising newcomers, AFAIK.
"Graph" is used here in two senses. As a chart, a graph supports visual data mining and can present descriptive statistics on the behaviour of the data. In the graph-theoretic sense, one can construct a graph and solve problems through its many operations, which can support clustering and classification as well as association.
No. There is a big chance of reaching seemingly good conclusions that are in fact totally wrong. The question is what you need the big-data solution for: if it is meant for data management (not for drawing conclusions), it can do good, as Reshi Nawab indicated, with MapReduce.
Current big data trends imply parallelization using, for example, the MapReduce paradigm and an implementation framework such as Hadoop or Spark. Some algorithms are harder to parallelize than others, and Spark can offer an improvement with respect to Hadoop in this aspect.
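To illustrate the MapReduce paradigm on graph data, here is a minimal local sketch that computes vertex degrees from an edge list. In Hadoop or Spark the map, shuffle, and reduce phases would run distributed across workers; here they run in-process purely for illustration.

```python
from collections import defaultdict

def map_phase(edge):
    # Emit a (vertex, 1) pair for each endpoint of the edge.
    u, v = edge
    return [(u, 1), (v, 1)]

def shuffle(pairs):
    # Group all emitted values by key, as the framework's shuffle would.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    # Sum the counts for one vertex.
    return key, sum(values)

edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
mapped = [pair for e in edges for pair in map_phase(e)]
degrees = dict(reduce_phase(k, vs) for k, vs in shuffle(mapped).items())
```

Because each edge is mapped independently and each key is reduced independently, both phases parallelize trivially; the shuffle between them is where the framework pays its communication cost.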
Big graph data represents a challenge, one could say another level of difficulty with respect to big tabular data. MapReduce needs to divide the data into chunks which can be processed independently and fuse the final results.
So, if we can locally demarcate in the graph what we are processing, for example by neighbourhood or community, then we can parallelize the processing.
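As a sketch of this idea: a per-vertex quantity that depends only on the vertex's local neighbourhood (here, the number of triangles through the vertex) can be computed chunk by chunk, so the chunks can be handed to parallel workers. This uses Python threads merely to stand in for distributed workers.

```python
from concurrent.futures import ThreadPoolExecutor

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangles_at(adj, v):
    # Triangles through v: pairs of v's neighbours that are themselves adjacent.
    nbrs = list(adj[v])
    return sum(
        1
        for i in range(len(nbrs))
        for j in range(i + 1, len(nbrs))
        if nbrs[j] in adj[nbrs[i]]
    )

def process_chunk(adj, chunk):
    # Each chunk needs only local neighbourhood information.
    return {v: triangles_at(adj, v) for v in chunk}

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
adj = build_adjacency(edges)
vertices = sorted(adj)
chunks = [vertices[i::2] for i in range(2)]  # two interleaved chunks

with ThreadPoolExecutor(max_workers=2) as pool:
    partials = pool.map(lambda c: process_chunk(adj, c), chunks)

triangles = {}
for part in partials:
    triangles.update(part)
```

The key property is that no chunk's result depends on another chunk's result, so the fusion step at the end is a plain dictionary merge.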
The average path length, however, is a global computation, so we could say it is not well suited to naive parallelization (a simple average of per-chunk averages would be biased).
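The average-path-length point can still be handled in parallel if each worker returns a partial (sum, count) pair rather than a local average; combining plain averages is only safe when every part contributes the same number of paths. A small sketch on a path graph, with BFS standing in for the per-source shortest-path computation:

```python
from collections import deque

def bfs_distances(adj, source):
    # Unweighted shortest-path distances from one source vertex.
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for n in adj[v]:
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return dist

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # the path graph 0-1-2-3

# Each "worker" handles some source vertices and reports (sum, count),
# not an average, so the global combination stays exact.
partials = []
for sources in ([0, 1], [2, 3]):
    s = c = 0
    for src in sources:
        for target, d in bfs_distances(adj, src).items():
            if target != src:
                s, c = s + d, c + 1
    partials.append((s, c))

total = sum(s for s, _ in partials)
count = sum(c for _, c in partials)
average_path_length = total / count
```

So the computation is global, but its aggregation is associative once the right partial statistic is chosen; it is the per-source BFS work, quadratic in the number of vertices, that remains expensive at big-data scale.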
Another option is to map the graph data into a lower dimensional space and process from there.
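One concrete way to do such a mapping is a spectral embedding built from eigenvectors of the graph Laplacian (L = D - A). The dense NumPy sketch below is only illustrative; at big-data scale one would use sparse, iterative eigensolvers instead.

```python
import numpy as np

def spectral_embedding(edges, num_vertices, dim=2):
    # Dense adjacency matrix of the undirected graph.
    A = np.zeros((num_vertices, num_vertices))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    L = np.diag(A.sum(axis=1)) - A        # combinatorial Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    # Skip the trivial constant eigenvector; keep the next `dim` ones.
    return eigvecs[:, 1:1 + dim]

# Two triangles joined by a bridge: the first embedding coordinate
# (the Fiedler vector) separates the two communities by sign.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
coords = spectral_embedding(edges, 6)
```

Once the vertices live in a low-dimensional vector space, standard tabular big-data tooling (clustering, nearest neighbours, etc.) applies to them directly.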
Ma D., Sandberg M., and Jiang B. (2015), Characterizing the heterogeneity of the OpenStreetMap data and community, ISPRS International Journal of Geo-Information, 4(2), 535-550.
We can assume nothing about the content of a big-data set, and we are likely to stay ignorant even after all the statistics and trial-and-error attempts. So, as I see it, a graph-based approach seems better than the zero option, as long as we trust the conclusions drawn with it not to be altogether wrong...
Graphs and graph theory offer an efficient and compelling approach to visual data mining; the question is how best to optimize memory and time complexity. Big data can be reduced with subspace and other dimensionality-reduction techniques, and the corresponding graphs may affect the quality of the solution.