Is it worth designing a database equivalent of the Unix shell?

15 March 2021

I'm looking for a topic for my master's thesis. I read the widely recommended book "Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems" by Martin Kleppmann, hoping it would inspire me. The last chapter of the book is dedicated to yet-to-be-solved problems. One of them is designing an unbundled-database equivalent of the Unix shell. This is what the author has to say on the topic (p. 525):

The tools for composing data systems are getting better, but I think one major part is missing: we don’t yet have the unbundled-database equivalent of the Unix shell (i.e., a high-level language for composing storage and processing systems in a simple and declarative way). For example, I would love it if we could simply declare `mysql | elasticsearch`, by analogy to Unix pipes [22], which would be the unbundled equivalent of `CREATE INDEX`: it would take all the documents in a MySQL database and index them in an Elasticsearch cluster. It would then continually capture all the changes made to the database and automatically apply them to the search index, without us having to write custom application code. This kind of integration should be possible with almost any kind of storage or indexing system.
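To make the quoted idea concrete, here is a minimal sketch of what a `source | sink` pipe might do: copy an initial snapshot, then keep applying captured changes. Everything in it is an assumption for illustration: `ToySource`, `ToyIndex`, and `Pipe` are hypothetical in-memory stand-ins, not MySQL or Elasticsearch APIs, and a real system would read the database's replication log rather than a Python list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A change event: (operation, primary key, row or None for deletes).
Change = Tuple[str, int, Optional[dict]]


@dataclass
class ToySource:
    """Hypothetical stand-in for a table that logs every write."""
    rows: Dict[int, dict] = field(default_factory=dict)
    log: List[Change] = field(default_factory=list)

    def upsert(self, key: int, row: dict) -> None:
        self.rows[key] = row
        self.log.append(("upsert", key, row))

    def delete(self, key: int) -> None:
        self.rows.pop(key, None)
        self.log.append(("delete", key, None))


@dataclass
class ToyIndex:
    """Hypothetical stand-in for a search index; searches by scanning titles."""
    docs: Dict[int, dict] = field(default_factory=dict)

    def apply(self, change: Change) -> None:
        op, key, row = change
        if op == "upsert":
            self.docs[key] = row
        else:
            self.docs.pop(key, None)

    def search(self, word: str) -> List[int]:
        return sorted(
            key for key, row in self.docs.items()
            if word in str(row.get("title", "")).lower().split()
        )


class Pipe:
    """Sketch of `source | sink`: snapshot first, then replay new changes."""

    def __init__(self, source: ToySource, sink: ToyIndex) -> None:
        self.source = source
        self.sink = sink
        # Initial load: index every row that already exists in the source.
        for key, row in source.rows.items():
            sink.apply(("upsert", key, row))
        # Remember how much of the change log the snapshot already covers.
        self.offset = len(source.log)

    def sync(self) -> int:
        """Apply changes logged since the last sync; return how many were applied."""
        pending = self.source.log[self.offset:]
        for change in pending:
            self.sink.apply(change)
        self.offset += len(pending)
        return len(pending)


source = ToySource()
source.upsert(1, {"title": "Unix pipes"})
index = ToyIndex()
pipe = Pipe(source, index)          # snapshot: row 1 is indexed

source.upsert(2, {"title": "Dataflow pipes"})
source.delete(1)
pipe.sync()                         # replays the two later changes
print(index.search("pipes"))        # [2]
```

The sketch calls `sync()` by hand; the "continually capture" part of the quote would correspond to running it in a loop or triggering it from the source's change stream.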

The only research I've found concerns timely dataflow and differential dataflow. I wonder whether there is more research that I'm missing, or whether there isn't any because the problem is unimportant or was solved long ago. I'd appreciate your thoughts on the idea of building such a system. Is it needed? Would it fill a niche?

Mark Sitkowski

All the smart research is going into in-memory databases which, having no disks to read/write, are orders of magnitude faster.

Dashamir Hoxha

Are you talking about a standard ETL (Extract-Transform-Load) interface, similar to a UNIX shell? I am not an ETL expert, but I don't think such a standard exists. So, it could be worth exploring the possibility of defining such a standard. In order to do this, first of all you should become familiar with a few ETL tools, so that you can come up with some abstractions that can be implemented by all of them. For example:

- https://blog.panoply.io/17-great-etl-tools-and-the-case-for-saying-no-to-etl

- https://hevodata.com/learn/8-best-data-transformation-tools/

Mark Sitkowski

This might interest you, too:

https://www.theregister.com/2021/03/18/oracle_cloud_data_warehouse/

Victor Fedko

I think research in this direction is relevant to the comparison of SQL Server on Windows vs. Linux.

Mark Sitkowski

It only takes about an hour to learn to write SQL, so is there actually a market for all this?

Victor Fedko

When it comes to SQL Server: Windows vs Linux, comparisons of query performance, resilience, security, availability, etc. are of interest.
