Which is the best software for mining BIG data?

Philipp Upravitelev

Of course, you should use R with the data.table package (for large datasets).

Reisel González Pérez

Dear Mr Gajendra, in my humble opinion there is no single best software; each package has its advantages and disadvantages. Sometimes combining independent functions from various software packages gives a better result. I suggest you take a look at the following:

Weka (http://www.cs.waikato.ac.nz/ml/weka/)
R (http://cran.r-project.org/)
Orange (http://orange.biolab.si/)
Knime (https://www.knime.org/)
RapidMiner Community Edition (https://rapidminer.com)

Best regards.

Azian Azamimi Abdullah

R, Python or Java. I prefer R for data analysis; it is open-source software.

Anurag Chaturvedi

I think the choice depends mainly on the type of data. For example, if you have data from next-generation sequencing machines, Python may help. If the data are metabolomics data or networks, consider graph databases such as Neo4j. R is definitely a good option for medium-sized datasets, but it can only be used after preprocessing the data coming from a high-throughput technology. If you want to model data and use machine learning approaches, WEKA is the most preferred choice. You may also have a look at HDF5 (http://www.hdfgroup.org/HDF5/).

Gajendra Pal Singh Raghava

Thanks for the suggestions. Our group is already using R, Weka, SVMlight, SNNS and RapidMiner for developing prediction methods. One of the major problems is speed: if I build a model on a large number of patterns, techniques like SVM take a huge amount of time. BIG data is a new term that we heard of only recently. I would like to know whether there are software tools that can mine large data in reasonable time, i.e., tools specifically designed to develop models on huge data, in the way that Hadoop was specially developed for managing BIG data.
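One reason linear SVMs scale to many patterns where kernel solvers struggle is that they can be trained by stochastic sub-gradient descent, touching one example per step. As an illustration only, here is a minimal pure-Python sketch of the Pegasos method for a linear SVM; Pegasos is a swapped-in technique not mentioned in this thread, and the function names are hypothetical. It assumes labels in {-1, +1}, no bias term (append a constant feature if you need one), and omits Pegasos' optional projection step.

```python
import random
from typing import List, Sequence


def pegasos_train(X: Sequence[Sequence[float]], y: Sequence[int],
                  lam: float = 0.01, epochs: int = 20, seed: int = 0) -> List[float]:
    """Train a linear SVM (no bias term) with Pegasos stochastic sub-gradient steps.

    X: feature vectors; y: labels in {-1, +1}; lam: L2 regularisation strength.
    Each step uses a single example, so one epoch costs O(n * d).
    """
    rng = random.Random(seed)
    d = len(X[0])
    w = [0.0] * d
    t = 0
    indices = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(indices)
        for i in indices:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size 1 / (lambda * t)
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Shrink w: this is the gradient step for the L2 regulariser.
            scale = 1.0 - eta * lam
            w = [wj * scale for wj in w]
            if margin < 1.0:
                # Hinge loss is active: move w towards y_i * x_i.
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Return the predicted label (+1 or -1) for one feature vector."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1
```

The cost per epoch grows linearly with the number of patterns, which is the property that matters for large training sets; dedicated tools (e.g. the GPU-accelerated LIBSVM mentioned below) trade this simplicity for speed and kernel support.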

Muddsair Sharif

I have two years of experience analysing big datasets, and I would recommend R for it!

Ahmed Allali

A recommended open-source program for big data is LIBSVM accelerated with GPUs using the CUDA framework.

Keerthi, C. M.

R is the best and most user-friendly software...

David Arroyo

H2O is another good option:

http://0xdata.com/h2o/

David Ashbrook

I would suggest R as well, since it has a range of different packages and a good community behind it.

Jayaraman Thangappan

I guess Hadoop would be more useful, with the help of Cloudera's CDH3 software (Cloudera's Hadoop distribution).

Martin Lurie

Here are a number of research papers indexed at NIH (PubMed) on Hadoop in bioinformatics:

http://www.ncbi.nlm.nih.gov/pubmed?term=hadoop

Here is another good source:

http://abhishek-tiwari.com/post/mapreduce-and-hadoop-algorithms-in-bioinformatics-papers

Don't rule out Python, MLlib in Spark, MADlib, Mahout, SAS, etc. It all depends on the problem you are trying to solve...
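To make the MapReduce model behind Hadoop (and the Hadoop-based tools mentioned here) more concrete, here is a minimal single-process Python sketch of the map, shuffle and reduce phases, counting k-mers in DNA reads. It only simulates the programming model in memory; it does not use Hadoop, HDFS or any of the listed libraries, and all function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(read: str, k: int) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (k-mer, 1) for every k-mer in one sequencing read."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the counts collected for one k-mer."""
    return key, sum(values)


def count_kmers(reads: Iterable[str], k: int = 3) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over all reads in a single process."""
    mapped = chain.from_iterable(map_phase(r, k) for r in reads)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, vals) for key, vals in grouped.items())
```

For example, `count_kmers(["ACGTA", "CGTAC"], k=3)` returns `{'ACG': 1, 'CGT': 2, 'GTA': 2, 'TAC': 1}`. In a real cluster the mappers and reducers run on different machines and the shuffle moves data over the network, which is why the map and reduce functions must be independent of each other.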

Noha Elprince

Currently, the best statistical analysis and mining package is R, and the best infrastructure that can handle big data is Hadoop. Combining the two can help make sense of big data! I suggest using the "RHadoop" package.

You can read about how to install it at this link:

https://bighadoop.wordpress.com/2013/02/25/r-and-hadoop-data-analysis-rhadoop/

"RHadoop" uses the streaming feature recently embedded in Hadoop (MR2) and implicitly turns R code into efficient MapReduce code that runs easily on HDFS.

Olivier Parisot

Dear Gajendra,

You can consider 'stream mining' software like MOA: http://moa.cms.waikato.ac.nz/

Kind regards,

Olivier PARISOT
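Stream-mining tools like MOA see each instance once and keep only bounded memory. As a small illustration of that single-pass, constant-memory pattern (not MOA's API, which is Java), here is a Python sketch of Welford's online algorithm for a running mean and variance; the class name is hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class RunningStats:
    """One-pass (streaming) mean and variance using Welford's algorithm.

    Each value is seen exactly once and only three numbers are stored,
    so memory stays constant no matter how long the stream is.
    """
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the current mean

    def update(self, x: float) -> None:
        """Fold one new observation into the running statistics."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); 0.0 until two values are seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def std(self) -> float:
        """Sample standard deviation."""
        return math.sqrt(self.variance)
```

Feeding the stream 2, 4, 4, 4, 5, 5, 7, 9 through `update` should give a mean of 5 and a sample variance of 32/7 (about 4.57), up to floating-point rounding, without ever holding the whole stream in memory.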

Mahesh Chaudhari

If you are looking for batch-level processing of the data, then the Hadoop stack is your best choice. However, if you are looking for more real-time data mining, then Spark or DataStax (http://www.datastax.com/) is a better route.

Thanks everyone for your excellent suggestions.

Bin Jiang

Try this if your data are heavy-tailed: http://en.wikipedia.org/wiki/Head/tail_Breaks
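Head/tail breaks classifies heavy-tailed data by splitting the values at their mean into a head (values above the mean) and a tail, then repeating on the head for as long as the head remains a minority. Here is a minimal Python sketch; the 40% head limit is a commonly used threshold and is exposed as a parameter, and the function name is hypothetical.

```python
from typing import List, Sequence


def head_tail_breaks(values: Sequence[float], head_limit: float = 0.4) -> List[float]:
    """Return the class break values (successive means) for head/tail breaks.

    At each step the current data are split at their mean; the mean is
    recorded as a break, and the process continues on the head (values
    above the mean) only while the head is a minority, i.e. its share of
    the current data is below head_limit, and has more than one value.
    """
    breaks: List[float] = []
    data = list(values)
    while data:
        mean = sum(data) / len(data)
        breaks.append(mean)
        head = [v for v in data if v > mean]
        # Stop once the head is too small to split or no longer a clear minority.
        if len(head) <= 1 or len(head) / len(data) >= head_limit:
            break
        data = head
    return breaks
```

For example, for sixteen 1s, three 8s and one 100, the function returns `[7.0, 31.0]`, i.e. three classes (up to 7, from 7 to 31, and above 31). For data that are not heavy-tailed, such as `[1, 2, 3, 4]`, the first head is not a minority and only one break is produced.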

Tarek Abd El-Hafeez

Use Python.