MySQL vs Postgres

Pros and Cons of MySQL over Postgres

MySQL Postgres

– more popular and widely used
– more learning resources
– uses threads instead of processes for connections
– better indexing. secondary indexes point to primary index.
– read Uber’s blog why they switched from Postgres to MySQL.
– Postgres has no show create table command that comes with MySQL
– Postgres has no unsigned int – materialized views (although they don’t auto-refresh)
– multiple index types (B-Tree, Hash, BRIN, GIN, GIST)
– custom types
– arrays
– extensions (plugins)
– stored procedures can be written in multiple languages
– generally accepted to have more bells and whistles than MySQL
– developer community seems to be more friendly.
– most loved and wanted database on SO
– do what you want with it license. no strings attached.

Performance-wise I think both are close. There are popular tools like sysbench to compare performance but I feel the test cases are a bit contrived and don’t reflect real-world usage. Also the winner might change from year to year like the car rankings in consumer reports.

There is lots of good stuff in above article. Even though the underlying architecture between MySQL vs Postgres is very different, yet they give similar levels of performance. This is because as Nasser says in his article, no solution is without its problems. What we gain in one area is sacrificed by what we lose in another. However, MySQL’s design of using threads is undoubtedly better than Postgres’s design of using processes. You can have many more threads running on a system than processes. And inter-process communication is much more costly than communication between threads (refer this for some more details). I even like the idea of using a clustered index (MySQL) vs. heap (Postgres) to store the data. It leads to less write-amplification as Uber noted in their blog. And when it comes to concurrency, I believe that under very high contention 2PL (MySQL) would perform better than SSI (Postgres) as it would lead to less CPU wastage. Greg Kemnitz, Staff Database Engineer at Google writes: “The one huge advantage of MySQL over Postgres is in the InnoDB (and XtraDB) storage engine, which is extremely good for 24/7 OLTP environments of the sort you’d have in a large production website or customer-facing application. Postgres’ need for a storage garbage collector (VACUUM) basically kills it for these types of applications, which is the biggest reason why Uber switched from Postgres to MySQL“. Its interesting that from a theoretical perspective, MySQL seems to be the better database. Yet when it comes to practice, more people love Postgres than MySQL. Try doing a web search on terms like “I love MySQL” or “MySQL sucks” or “I love Postgres”, “Postgres sucks”, “Why MySQL is better than Postgres”, “Why Postgres is better than MySQL” etc. Getting answers to questions (think of this as customer support) is also easier with Postgres with their user-friendly mailing list instead of a clumsy web-only forum with MySQL that requires Oracle login. And the best part about Postgres is that you can do whatever you want with it. You don’t have to ask for any permission or obtain a license. That is why it is becoming more popular and companies like Microsoft (with Citus acquisition) are now actively contributing in a big way to the project.

What do you think? Which one do you prefer? The elephant or the dolphin? Let me know.

Addendum: Comparing MySQL and Postgres indexes to Chapters 10 and 9 of File Structures by Michael Folk et. al.

Recently I came across this excellent book. AFAIU, Postgres index corresponds to the index covered in Chapter 9 of this book (referred to as a B tree by the authors) whereas MySQL index corresponds to Chapter 10 (referred to as B+ tree by the authors). The authors write (Ch 9, p. 401):

In this book, we use the term B+ tree to refer to a somewhat more complex situation in which the data file is not entry sequenced (an entry sequenced file = Postgres’ heap in which elements appear in order in which they are inserted into the database) but is organized into a linked list of sorted blocks of records (this is exactly like MySQL’s clustered index AFAIU).

The great advantage of the B+ tree organization is that both indexed access and sequential access are optimized.
https://sunilwanjarisvpcet.files.wordpress.com/2020/03/file_structures_an_object-oriented_approch.pdf

Same sentiment is echoed in Chapter 1 p.4:

B trees provided excellent access performance, but there was a cost: no longer could a file be accessed sequentially with efficiency. Fortunately, this problem was solved almost immediately by adding a linked list structure at the bottom level of the tree.

Now I understand what Nasser is referring to when he writes:

I think MySQL is the winner here for range queries on primary key index, with a single lookup we find the first key and we walk the B+Tree linked leaf pages to find the nearby keys as we walk we find the full rows.

Postgres struggles here I think, sure the secondary index lookup will do the same B+Tree walk on leaf pages and it will find the keys however it will only collect tids and pages. Its work is not finished. Postgres still need to do random reads on the heap to fetch the full rows, and those rows might be all over the heap and not nicely tucked together, especially if the rows were updated.
https://medium.com/@hnasr/postgres-vs-mysql-5fa3c588a94e

With SSDs that are common nowadays, the difference in performance might be less pronounced as I read that for a SSD, random access is no that much different in terms of cost than sequential access [1]