Hello, dear RG community.
Personally, I have found Xarray to be excruciatingly slow, especially for big datasets and nonstandard operations (like a custom filtering function). The only suggestion for speeding things up that I found online was to use numpy. When I adjusted my code accordingly (i.e., used numpy), I laughed so hard, because I had to convert almost every single piece of Xarray-based code to numpy-based code. Even then, the remnants of the Xarray-based code kept slowing me down. I went ahead and wrote a crazy piece of code combining Dask, Xarray, and numpy and finally got the speed up to an acceptable level. That was such a pain.
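For readers who haven't tried this: the usual pattern is to pull the raw array out of an Xarray object (DataArrays expose it via the `.values` attribute), run the custom operation in pure numpy, and only wrap the result back at the end. Here is a minimal sketch with a hypothetical custom filter (zeroing out values more than two standard deviations from the mean — the specific filter is my own illustration, not the one from my project):

```python
import numpy as np

def custom_filter(data: np.ndarray) -> np.ndarray:
    """Hypothetical custom filter: zero out values beyond 2 sigma of the mean.

    Operating on a plain numpy array avoids per-operation Xarray overhead;
    in an Xarray workflow you would extract the array first, e.g.
    raw = dataarray.values, and rebuild the DataArray afterwards.
    """
    mu = data.mean()
    sigma = data.std()
    mask = np.abs(data - mu) <= 2 * sigma
    return np.where(mask, data, 0.0)

rng = np.random.default_rng(0)
arr = rng.normal(size=(1000, 1000))   # stand-in for dataarray.values
filtered = custom_filter(arr)
```

For keeping the Dask parallelism without dropping down to raw arrays everywhere, Xarray's `apply_ufunc` can apply a numpy function like this across a Dask-backed DataArray, though in my experience getting it right for nontrivial operations is exactly the painful part.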
Pandas is, of course, essentially the same speed-wise. And I couldn't find anything else to handle labeled arrays in Python other than Xarray or Pandas (I work with multidimensional arrays, so I need Xarray anyway).
I read the docs for Xarray. The authors say the reason for Xarray is to be able to work with multidimensional arrays. I can't fully comprehend that. Why not just add this functionality to Pandas? I could understand starting such a big project for some big idea, but "add multidimensional functionality that would have been better added to Pandas, to spare users the time of learning two different libraries" does not seem like a good justification to me. To say nothing of the fact that Xarray has ended up being as slow as Pandas.
I think that a good justification for starting a new data-handling library for Python is to make it really fast, first and foremost. A new project should follow numpy's example: write the core in lightning-fast C/C++ and add Python wrappers on top of that.
I am wondering if anybody is aware of such an effort. If so, when should we expect the release?
Thank you in advance.
Ivan