Choosing the right language for data analysis can be almost as complicated as actually learning the language. For many reasons, R and Python are two of the most popular: R is often praised for its great features for data visualization, as it was developed with statisticians in mind; plenty of programmers love multi-purpose Python for its so-simple-a-child-could-do-it syntax.
The R programming language, with its strength in statistics, has seen a resurgence in use concurrent with the Big Data boom, climbing the ranks in language popularity indices.
Python, meanwhile, "is rapidly gaining mainstream appeal as a hybrid of R's fast, sophisticated data mining capability, and a more practical language to build products," according to an article on Fast Company. "Python is intuitive and easier to learn than R, and its ecosystem has grown dramatically in recent years, making it more capable of the statistical analysis previously reserved for R."
Java, of course, is a de facto standard for working with Hadoop, since Hadoop itself is written in Java. It was also just named "Programming Language of the Year" for 2015 by TIOBE Software.
One poll suggested that these two open source languages (Python, R) were between them used in nearly 85% of all Big Data projects.
Python: Python is one of the most popular open source (free) languages for working with the large and complicated datasets needed for Big Data. It has become very popular in recent years because it is both flexible and relatively easy to learn. Like most popular open source software it also has a large and active community dedicated to improving the product and making it popular with new users.
R: R is also hugely popular and supported by a large and helpful community. Where Python excels in simplicity and ease of use, R stands out for its raw number-crunching power. Its widespread adoption means you are probably running code written in R every day, as it has reportedly been used to build algorithms behind Google, Facebook, Twitter and many other services.
Dear Shafagat, for this domain and several others, there may be several options that each programmer will find useful. Normally, I do not suggest a 'best'. Whatever is good for the individual is likely to be the 'best'. Thanks.
R is a better data analysis language. Python is also preferred for data analysis because of its SciPy, NumPy and pandas libraries. However, there are other efficient programming languages that programmers use depending on their preference, application area and convenience.
Coding is one of the primary skills in a data scientist’s toolbox. Some incredibly powerful applications have successfully done away with the need to code in some data-science contexts, but you’re never going to be able to use those applications for custom analysis and visualization. For advanced tasks, you’re going to have to code things up for yourself, using either the Python programming language or the R programming language.
Your database system decides which programming language you have to choose. SQL is used for setting up very large internet database systems. I studied SQL 10 years ago. As I remember it, SQL is based on Visual Basic and Visual C. Microsoft Access is an application for Enterprise Edition database systems. Its language is Visual Basic.
As for Java, it is often used in applications like Macromedia Flash and Director.
With an ever-growing number of businesses turning to Big Data and analytics to generate insights, there is a greater need than ever for people with the technical skills to apply analytics to real-world problems. Computer programming is still at the core of the skillset needed to create algorithms that can crunch through whatever structured or unstructured data is thrown at them. Certain languages have proven themselves better at this task than others.
R has simple and obvious appeal. Through R, you can sift through complex data sets, manipulate data through sophisticated modelling functions, and create sleek graphics to represent the numbers, in just a few lines of code. It's likened to a hyperactive version of Excel. The most popular languages continue to be R (used by 61% of KDnuggets readers), Python (39%), and SQL (37%). SAS is stable at around 20%. The highest growth was for Pig/Hive/Hadoop-based languages, R, and SQL.
In my opinion, Matlab is very successful in big data fields, especially because it lets programmers write algorithms in a few lines of code (LOC) that would need many more LOC in other languages. It also lets you run repeated operations without writing explicit loop statements (in most cases as matrix operations), which gives high execution speed and saves a lot of time, and it comes with many statistical and analytical tools that help to analyse and visualize results professionally. In my personal experience working on image retrieval systems, which is one of the big data fields, Matlab gave me more flexibility than any other language (I think). Anyway, the general rule remains: "nothing is perfect". You can see also:
In the Department of Informatics at the University of Minho, a data calculus called SETS was developed, and the programming language used was Haskell, a functional programming language.
There are many software packages that can solve your problem, depending on your idea. If you want to write your own program, many choices will arise. I agree with the others and would add Minitab, for your information, and Matlab as well.
All are very useful and well-supported answers, and for that reason I up-voted them all. In my experience R is excellent software with many good characteristics, mainly because it is open source and there are a lot of developers solving highly specialized problems, which any newcomer will appreciate. R also has a lot of forums and blogs where anyone can find good information and solutions by asking the right questions well. Based on my own experience and the previous answers, the winner is R.
Shafagat I sincerely hope you will have a great experience learning from big data with R!
Manuel
P.S.: If you want to know how to ask the right question well in a specialized forum, please read this first:
Eric Raymond & Rick Moen's "How To Ask Questions The Smart Way". You can get it at:
On the web, you can find many figures comparing the adoption and popularity of R and Python. While these figures often give a good indication of how the two languages are evolving in the overall ecosystem of computer science, it's hard to compare them side by side. The main reason is that you will find R only in a data science environment; Python, on the other hand, is a general-purpose language that is widely used in many fields, such as web development. This often biases the ranking results in favor of Python, while the salaries are affected somewhat negatively.
For more information you may refer to the following link
In my opinion the answer is that it depends (for most programmers the language is a hammer and every problem is a nail). My experience with big data (I define big data as multi-terabyte or above) is that for processing of files, the level of control needed is best handled by programs that allow control of the read/write buffer, so it can be optimized for efficient disk read/write operations. Lack of control over this buffer has given me headaches when logging procedures to recover from failures (crucial if you are handling real big data projects). Efficient memory use is also crucial. Wasteful space utilization by unoptimized language interpreters means less space for data in memory.
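To make the point about buffer control concrete, here is a minimal Python sketch. It is only an illustration under my own assumptions, not the poster's actual setup: the function names `make_sample_file` and `count_bytes_chunked` and the default 1 MiB chunk size are made up for the example. The `buffering` argument of the built-in `open()` sets the size of the read buffer, and the loop reads the file in chunks of the same size so that memory use stays bounded regardless of file size.

```python
import os
import tempfile


def make_sample_file(size_bytes):
    """Create a temporary file of the given size and return its path (demo helper)."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(b"x" * size_bytes)
    return path


def count_bytes_chunked(path, chunk_size=1024 * 1024):
    """Read a file in fixed-size chunks; return (total_bytes, chunks_read)."""
    total = 0
    chunks = 0
    # buffering=chunk_size asks Python to use a read buffer of that size,
    # so the buffer and the chunks requested below are matched.
    with open(path, "rb", buffering=chunk_size) as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty bytes object means end of file
                break
            total += len(chunk)
            chunks += 1
    return total, chunks
```

In a real pipeline the chunk size would be tuned against the disk and file system in use; the right value is an empirical question rather than something this sketch can settle.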
For analysis of data, double precision floating point is necessary if accuracy is needed. Native support for it is preferred to avoid latency issues with library loading (you will notice the difference if the processing takes on the order of weeks or more). For text mining it is imperative that the regex engine is optimized for efficient use.
All in all, whether it is for loading data, processing numbers or text, or even visualizing, no programming language is king in all of them. Use the tool that is best suited for the particular stage of the process. When the data is big and processing takes months, shaving time through efficiency makes all the difference.
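A small Python sketch of why double precision matters when many values are accumulated. This is my own illustration, not something from the poster's projects: the helper names `to_single` and `accumulate` are hypothetical, and single precision is emulated by rounding the running total through the standard `struct` module after every addition, which approximates (but is not identical to) true 32-bit arithmetic.

```python
import math
import struct


def to_single(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(values, single=False):
    """Sum values left to right, optionally rounding the running total to single precision."""
    total = 0.0
    for v in values:
        total += v
        if single:
            total = to_single(total)
    return total


values = [0.1] * 1_000_000

double_sum = accumulate(values)               # plain double-precision running sum
single_sum = accumulate(values, single=True)  # emulated single-precision running sum
exact_sum = math.fsum(values)                 # correctly rounded sum of the inputs

print("double error:", abs(double_sum - exact_sum))
print("single error:", abs(single_sum - exact_sum))
```

The single-precision total drifts far from the reference while the double-precision total stays close to it, and the gap widens as the number of values grows.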
Matlab is catching up with the functionality of R; however, some features are still clumsy. For data analysis Matlab requires multiple steps where R can do the same in one step. Creating multiple temporary variables (or files) makes the code and the analysis environment unnecessarily crowded.
If you are looking for a single name then I would say "Java". I think it's the best compromise between readability, scalability, speed, evolution and portability. Java fits well with all big data frameworks. It's evolving in a fancy way (especially with the introduction of lambdas).
If you are looking for a more comprehensive answer then I think Philippe gave it to you.
These are the languages that are used in Data Science and Big Data:
Python
Java
R
Julia
SAS
SQL
MATLAB
Scala
Out of these, the most popular and efficient are Python and R. You can learn the others when needed, but mainly learn at least one of Python and R. Python is gaining popularity in Data Science, so start learning Python with this basic-to-advanced Python tutorial series - https://data-flair.training/blogs/category/python/
Big data unfortunately is a nebulous term IMHO, and has come to mean different things to different people. If you are talking about statistical analyses, then SAS, R, Stata, and SPSS are all popular statistical software packages.
I recall a TIMS meeting in Miami some 30 years ago, where some proponents of so-called Big Data analyses were presenting, but unfortunately they did not have time to stick around to answer audience questions (of which there were many). Unfortunately some people find the rigour of statistical analyses inconvenient.