PostgreSQL is not natively distributed. Whether it is scalable depends on your needs.
Typically, "distributed" means a database that runs as a group of several nodes (instances, servers, etc.) working together. Most relational databases are not built for this architecture and instead focus on being a single-instance installation that runs on one server.
For scalability, PostgreSQL (like other relational databases) is designed to scale vertically: you run it on bigger and faster servers when you need more performance. Starting with version 9.6, it supports parallel query execution, so a single query can use multiple cores on one machine instead of running in a single process. This can substantially improve performance for large scans and aggregates, since memory and storage keep getting faster while CPUs are gaining cores rather than raw clock speed.
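As a rough illustration of the idea behind a parallel aggregate, here is a toy Python sketch: each worker aggregates its own slice of the rows, and a final step combines the partial results. This is not PostgreSQL code. The function names and the worker count are made up for the example, and Python threads are used only to show the split-and-combine shape, not to get real multi-core speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(chunk):
    """Work done by one worker: aggregate only its slice of the rows."""
    return sum(chunk)


def parallel_sum(rows, workers=4):
    """Split rows into chunks, aggregate each chunk in a worker, then combine."""
    if not rows:
        return 0
    size = -(-len(rows) // workers)  # ceiling division: rows per worker
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)  # final step: combine the partial aggregates


rows = list(range(1, 1001))
print(parallel_sum(rows))  # 500500
```

The same pattern only works for aggregates whose partial results can be merged (sums, counts, min/max), which is one reason not every query benefits from extra cores.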
To scale horizontally, Postgres has decent built-in streaming replication, so you can create multiple replicas that serve reads (not writes), but it does not offer any automatic sharding. This is a middle ground: you can increase read capacity by making extra copies of your data, but you can’t automatically spread the data itself across several instances.
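With read replicas, the application (or a proxy in front of the database) has to decide where each statement goes. A minimal sketch of that routing, assuming hypothetical host names and a deliberately naive "starts with SELECT" rule, might look like this:

```python
import itertools


class ReadWriteRouter:
    """Choose a connection string per statement: writes go to the primary,
    reads rotate across the read-only replicas (round-robin)."""

    # Statement keywords treated as read-only in this sketch. This is an
    # assumption: SELECT ... FOR UPDATE or functions with side effects
    # would still need to run on the primary.
    READ_KEYWORDS = ("select", "show")

    def __init__(self, primary_dsn, replica_dsns):
        self.primary_dsn = primary_dsn
        # Fall back to the primary when no replicas are configured.
        self._replicas = itertools.cycle(replica_dsns or [primary_dsn])

    def dsn_for(self, sql):
        first_word = sql.lstrip().split(None, 1)[0].lower() if sql.strip() else ""
        if first_word in self.READ_KEYWORDS:
            return next(self._replicas)
        return self.primary_dsn


router = ReadWriteRouter(
    "host=pg-primary dbname=app",  # hypothetical hosts
    ["host=pg-replica-1 dbname=app", "host=pg-replica-2 dbname=app"],
)
print(router.dsn_for("SELECT * FROM orders"))           # host=pg-replica-1 dbname=app
print(router.dsn_for("INSERT INTO orders VALUES (1)"))  # host=pg-primary dbname=app
```

Keep in mind that replication is asynchronous by default, so a read sent to a replica can be slightly behind the primary. Reads that must see a write that was just made are usually sent to the primary.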
If you still want a distributed version of PostgreSQL, where data is spread across several nodes and those nodes can also use replication for high availability, there are some third-party options:
Citus Data - the best option in my view. It is now offered as an extension that can be loaded into Postgres to enable a distributed architecture.
Postgres-XL - a fork of Postgres designed to be distributed, with additional features such as MPP (massively parallel processing).
It’s important to note that both options above come with tradeoffs and other issues that you have to keep in mind when designing your data model. Most queries should work, but distributed SQL is very complicated.
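To give a feel for where those tradeoffs come from, here is a toy sketch of hash-based sharding. It is not how Citus or Postgres-XL are implemented. The node names, shard count, and hash function are all assumptions made for illustration.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical worker nodes
SHARD_COUNT = 8                         # illustrative; fixed up front


def shard_for(key):
    """Map a distribution-key value to a shard id using a stable hash."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % SHARD_COUNT


def node_for_shard(shard_id):
    """Place shards on nodes round-robin."""
    return NODES[shard_id % len(NODES)]


def nodes_for_query(distribution_key=None):
    """A query filtered on the distribution key touches one node;
    any other query has to fan out to every node and merge the results."""
    if distribution_key is None:
        return list(NODES)
    return [node_for_shard(shard_for(distribution_key))]


print(nodes_for_query(distribution_key=42))  # a single node
print(nodes_for_query())                     # every node
```

Lookups by the distribution key stay cheap because they hit one node. Joins, aggregates, and transactions that span keys have to coordinate across nodes, which is why the choice of distribution column matters so much when you design your tables.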
There are also other distributed relational databases like MemSQL, VoltDB, CockroachDB and Google Cloud Spanner that are worth a look.
For my Distributed Databases PG course, labs were conducted using Postgres-XL, which provides a distributed environment. The data partitioning experiments can also be designed using row- and column-based tools. If you are also covering distributed graph-based tools, you can use Dgraph for such experiments.
I would recommend distributed Postgres using the TimescaleDB extension. No database-as-a-service (DaaS) provider offers this because the licensing prohibits it; however, technology enablers such as Full Stack Engine, my firm, provide this as an implicit operator for Kubernetes. You wind up launching stacks like this:
You can do so in our cloud, which is free for life with few special conditions:
* Your research contributes to our open collective non-profit if plausible/possible
* You operate only stacks we approve of for the sake of security & performance on shared systems
It's pretty lax in terms of who can use it. I haven't even tapped into its full capacity, and it already has plenty of tenants. That could change, but for now it remains free. Just ping me with a GitHub username and I can set you up for free.
Ping @large.systems on Discord: https://lmg.systems/discord