Imagine a simple big-data issue: I have billions of entries and need to access them by a predefined key.

In the use case I have in mind, search is not important. What matters is that I can ask the DB "does this entry already exist?" and receive a yes/no answer really fast. And the number of entries is going to be enormous.

I figured out that key/value databases are optimized for exactly this type of use case. However, when I search for the scaling behaviour of the underlying algorithm, I get a huge number of hits on scaling NoSQL DBs by adding new computer nodes, and nothing on how access time scales with the number of entries.

At first I thought that the lookup time should scale as O(log(n)), as with a tree or sorted index, but apparently it can be better than that: according to some results it could even be O(1), presumably via hashing. Then again, I don't quite believe in O(1) scaling for billions of entries. Something is bound to break, sooner or later.
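To make the access pattern concrete, here is a minimal sketch of what I mean by a pure existence check, using an in-memory hash set (the key format is just a made-up example). Hash-based structures like this are exactly the reason O(1) average-case lookup is claimed: the lookup cost depends on hashing the key, not on how many keys are stored.

```python
# Sketch of the access pattern: insert keys once, then answer
# "does this key exist?" queries. A hash set gives average-case O(1)
# membership checks, independent of the number of stored entries.
entries = set()

# Populate with some example keys (hypothetical key format).
for i in range(1_000_000):
    entries.add(f"key-{i}")

# Existence queries: yes/no answers, no search needed.
print("key-500000" in entries)   # True
print("key-9999999" in entries)  # False
```

Of course, an in-memory `set` sidesteps the real question: at billions of entries the table no longer fits in RAM, and the constant factors (hash collisions, resizing, disk or network round trips) are where the O(1) claim starts to strain.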
