A Review Study of Apache Spark in Big Data Processing
V Srinivas Jonnalagadda, P Srikanth, Krishnamachari Thumati, Sri Hari Nallamala
International Journal of Computer Science Trends and Technology (IJCST) – Volume 4 Issue 3, May - Jun 2016
"Apache Spark is a powerful open source processing engine built around speed, ease of use, and sophisticated analytics. Since its release, Apache Spark has seen rapid adoption by enterprises across a wide range of industries. Internet powerhouses such as Yahoo, Baidu, Airbnb, eBay, and Tencent have eagerly deployed Spark at massive scale, collectively processing multiple petabytes of data on clusters of over 8,000 nodes. Spark has quickly become the largest open source community in big data, with over 1,000 contributors from more than 250 organizations. It provides a simple way to parallelize data-processing applications across clusters while hiding the complexity of distributed systems programming, network communication, and fault tolerance. At the same time, it gives developers enough control to monitor, inspect, and tune their applications while still letting them implement common tasks quickly. The modular nature of the API, which is based on passing distributed collections of objects, makes it easy to factor work into reusable libraries and test it locally."
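To make the last claim concrete, here is a minimal sketch of factoring Spark work into a reusable function and exercising it locally. It assumes PySpark is installed; the function names `tokenize`, `word_counts`, and `run_local_demo` are hypothetical and exist only for this illustration. The distributed collection passed around is an RDD, and the `"local[2]"` master URL runs Spark in-process with two worker threads instead of on a cluster.

```python
"""Sketch: a reusable Spark transformation tested on a local, in-process Spark.

Assumes PySpark is installed. All function names here are hypothetical.
"""
from operator import add


def tokenize(line):
    """Split a line of text into lowercase words (plain Python, no Spark needed)."""
    return line.lower().split()


def word_counts(lines_rdd):
    """Hypothetical library function: take an RDD of text lines and return an
    RDD of (word, count) pairs. It only chains RDD transformations, so the same
    code can run against a local master or a cluster."""
    return (
        lines_rdd.flatMap(tokenize)      # one record per word
        .map(lambda word: (word, 1))     # pair each word with a count of 1
        .reduceByKey(add)                # sum the counts for each word across partitions
    )


def run_local_demo():
    """Run word_counts on a tiny in-memory dataset using local mode."""
    # Imported here so the helpers above can be defined without Spark installed.
    from pyspark import SparkContext

    # "local[2]": run Spark inside this process with two worker threads.
    sc = SparkContext("local[2]", "word-count-sketch")
    try:
        lines = sc.parallelize(["Spark is fast", "spark is easy to use"])
        # collect() brings the (word, count) pairs back to the driver as a list.
        return sorted(word_counts(lines).collect())
    finally:
        sc.stop()  # release the local Spark resources
```

With PySpark available, calling `run_local_demo()` should return `[('easy', 1), ('fast', 1), ('is', 2), ('spark', 2), ('to', 1), ('use', 1)]`. Because `word_counts` depends only on the RDD it receives, moving it from a laptop test to a cluster would mean changing the master URL passed to `SparkContext`, not the function itself.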