I need to process several CSV files, one per subject, using a sliding-window technique.
The current 3.2 version of the pyspark package for Python provides a pandas-compatible API through the import "import pyspark.pandas", which is a good starting point. However, I have noticed that pyspark takes a considerable amount of time to load a single CSV file with the pyspark.pandas.read_csv function. Since I have several CSV files, as mentioned above, I am considering saving one single CSV file that concatenates all of them. After that, I need to segment the dataset into sliding windows and extract features.
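To make the windowing step concrete, here is a minimal sketch of what I have in mind. It uses plain Python rather than pyspark, only to illustrate the logic. The column names "subject" and "value", the window size, the step, and the sample data are all assumptions made up for this example. The point is that once the files are concatenated, the windows are built per subject, so that no window spans two subjects.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, pstdev


def sliding_windows(values, size, step):
    """Yield consecutive windows of `size` items, advancing by `step`."""
    for start in range(0, len(values) - size + 1, step):
        yield values[start:start + size]


def window_features(rows, size, step):
    """Group rows by subject, then compute mean and std for each window."""
    by_subject = defaultdict(list)
    for row in rows:
        # "subject" and "value" are hypothetical column names.
        by_subject[row["subject"]].append(float(row["value"]))

    features = []
    for subject, values in by_subject.items():
        # Windows are taken within one subject only.
        for window in sliding_windows(values, size, step):
            features.append(
                {"subject": subject, "mean": mean(window), "std": pstdev(window)}
            )
    return features


# Hypothetical concatenated data: two subjects stored in one CSV.
combined_csv = io.StringIO(
    "subject,value\n"
    "s1,1\ns1,2\ns1,3\ns1,4\n"
    "s2,10\ns2,20\ns2,30\n"
)
features = window_features(csv.DictReader(combined_csv), size=2, step=1)
```

With a window size of 2 and a step of 1, the first subject (4 samples) yields 3 windows and the second (3 samples) yields 2, and each feature row keeps the subject it came from.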
Is this the best way to proceed?
Thanks in advance.