On Databases and Memory
How to avoid an expensive migration and know the next step in your tech strategy
RedBeardLab is an engineering consulting company; this piece shows how we would approach a real-world scenario.
A redditor asked for feedback about moving from PostgreSQL to MongoDB.
They described a difficult use case: a few bad queries with a lot of conditions, long IN lists, and self-referencing foreign keys.
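Concretely, such a query might look roughly like the sketch below. All table and column names are invented for illustration; only the shape (many conditions, a long IN list, a self-referencing foreign key) matches the description.

```sql
-- Hypothetical shape of the problematic queries: many conditions,
-- a long IN list, and a self-referencing foreign key (parent_id).
SELECT c.id, c.name, p.name AS parent_name
FROM categories AS c
JOIN categories AS p ON p.id = c.parent_id  -- self-referencing FK
WHERE c.status = 'active'
  AND c.region IN ('eu-west', 'eu-east', 'us-east' /* ...hundreds more values... */)
  AND c.created_at > now() - interval '30 days'
  AND c.score > 0.5;
```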
Their current setup was too slow, handling only 500 requests per second (RPS) when they were aiming for 3500 RPS.
They were hoping that switching to MongoDB would simplify the problem and automatically fix their performance issues.
Unfortunately, of course, this migration would NOT help them.
You will be able to reach the same conclusion in two different ways:

1. Actually doing the migration
2. Knowing how databases work and where performance comes from
Actually doing the migration, however, is ridiculously slower and much more expensive.
Businesses would rather NOT do the migration at all and have someone tell them that the migration won't work, and why.
This is where a great consultant comes in and solves the problem in a cost-effective way. And this is what we do with our consulting service.
However, it is also important to communicate well and make sure that all the stakeholders are confident that the migration would not be successful.
The reasons are pretty technical and are best discussed in the engineering department.
In short, it comes down to how memory, disks, databases, and performance interact on Linux.
During query operations, a naive database would fetch the data it needs directly from the disk. This would be prohibitively slow.
In order to speed up operations, databases use different layers of cache so that data can be read from memory, which is much faster.
Databases use custom cache layers as well as the default ones provided by the OS (usually Linux).
The difference in performance between a custom cache layer and the OS cache layer is oftentimes negligible. As long as the cache system is well implemented, performance will be similar.
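On the PostgreSQL side, the built-in statistics views make it easy to check how well the custom cache layer (the shared buffers) is doing. A minimal sketch, using the standard `pg_stat_database` view; note that it only measures PostgreSQL's own cache, not the OS page cache underneath it:

```sql
-- Fraction of block requests served from PostgreSQL's buffer cache
-- rather than handed down to the OS / disk.
SELECT datname,
       blks_hit,
       blks_read,
       round(blks_hit::numeric / nullif(blks_hit + blks_read, 0), 4) AS cache_hit_ratio
FROM pg_stat_database
WHERE datname = current_database();
```

A healthy OLTP workload usually shows a ratio well above 0.99; a much lower value suggests the working set does not fit in the cache.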
Since we know that there are no significant differences between the cache implementations of PostgreSQL and MongoDB, we can already be skeptical of the idea.
Both databases will read some data from disk and some from the cache, then perform some iterations and comparisons, and return the result of the query.
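PostgreSQL lets us observe exactly that mix for a single query: `EXPLAIN (ANALYZE, BUFFERS)` reports, per plan node, how many blocks were served from the buffer cache (`shared hit`) and how many were fetched from the OS or disk (`read`). A sketch, reusing the hypothetical table from above:

```sql
-- The "Buffers: shared hit=... read=..." lines in the output show
-- the cache-vs-disk split for this execution.
EXPLAIN (ANALYZE, BUFFERS)
SELECT c.id
FROM categories AS c
WHERE c.status = 'active';
```

(MongoDB exposes comparable information through its explain output and cache statistics.)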
This is all we need to know to be skeptical about the migration and to suggest that the business run a very strong PoC before investing in the real migration.
But what can be done then?
When working with databases, there are not many tricks:

1. Get bigger caches - usually it is as simple as using machines with more RAM.
2. Get faster disks - caches still cover only a small percentage of the database's working set, so faster disks go a very long way.
3. Use better index strategies (see the SQL sketch after this list).
4. Denormalize the data - put the data you know will be needed together, so that it is faster to fetch (also sketched below).
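Here is a minimal SQL sketch of three of these tricks on the PostgreSQL side, reusing the hypothetical `categories` table from above. Names and values are illustrative assumptions, not recommendations:

```sql
-- 1. Bigger caches: give PostgreSQL more of the machine's RAM
--    (takes effect after a restart; the value is an example).
ALTER SYSTEM SET shared_buffers = '8GB';

-- 3. Better index strategy: a composite index matching the WHERE
--    clause, so the long IN list is answered by index lookups.
CREATE INDEX idx_categories_status_region
    ON categories (status, region, created_at);

-- 4. Denormalization: copy the parent's name onto each child row,
--    trading write-time work and storage for a JOIN-free read path.
ALTER TABLE categories ADD COLUMN parent_name text;

UPDATE categories AS c
SET    parent_name = p.name
FROM   categories AS p
WHERE  p.id = c.parent_id;
```

MongoDB offers direct equivalents for the last two (`createIndex` and embedded documents), which is exactly why the migration itself buys nothing here.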
All of these strategies can be implemented with both PostgreSQL and MongoDB; there is no need for an expensive migration.


