Thank you for your reply. I meant: how should the processing be done? For example, the naive solution is to collect all the data into one data center and then process it there. Are there any other methods or techniques for solving the problem?
- Using Apache Hadoop: an open-source software framework, written in Java, for distributed storage and distributed processing of very large data sets on clusters built from commodity hardware. Tutorials for Hive and Impala, two SQL engines in that ecosystem: https://www.tutorialspoint.com/hive/hive_tutorial.pdf https://www.tutorialspoint.com/impala/impala_tutorial.pdf
You can use Apache NiFi or Sqoop to import the data from the disparate data islands, and then process it with Apache Hive, Apache Pig, Python (Pandas or NumPy), Spark, etc., as in the sketch below.
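As a rough illustration of the processing step, here is a minimal PySpark sketch that aggregates data already landed on HDFS by NiFi or Sqoop. The path `hdfs:///data/landing/events` and the `region` column are hypothetical placeholders, not part of the original answer:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("process-landed-data").getOrCreate()

# Hypothetical landing directory written by NiFi or Sqoop.
events = spark.read.option("header", True).csv("hdfs:///data/landing/events")

# Example aggregation: count events per region across the combined data.
counts = events.groupBy("region").agg(F.count("*").alias("event_count"))
counts.show()
```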
If you don't want to import the data from the different sources into the Hadoop ecosystem (HDFS), then you have to create a federation layer that queries each source in place, using tools such as Impala, Kudu, or Drill.
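Impala and Drill expose that federation through SQL; as a loose stand-in, here is a hedged PySpark sketch of the same query-in-place idea, reading a table over JDBC without copying it into HDFS first. The connection URL, table name, and credentials are made-up placeholders:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("federated-query").getOrCreate()

# Hypothetical JDBC source; the rows stay in the remote database
# and are fetched on demand rather than imported into HDFS.
orders = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://dbhost:5432/sales")  # placeholder
          .option("dbtable", "public.orders")                    # placeholder
          .option("user", "reader")
          .option("password", "REPLACE_ME")
          .option("driver", "org.postgresql.Driver")
          .load())

orders.createOrReplaceTempView("orders")
spark.sql("SELECT region, SUM(amount) AS total FROM orders GROUP BY region").show()
```

A real deployment would point Impala or Drill at the sources directly; the Spark JDBC reader is only meant to show the shape of querying data without ingesting it.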