SQL Antipatterns: Then and Now

I am an absolute fan of Bill Karwin’s SQL Antipatterns book. I rate it amongst my top 3 database books together with Martin Kleppmann’s Designing Data-Intensive Applications. However, its good to think twice about some of the advice in the book before blindly following it:

Store Images and other BLOB data in the Database

This is what Bill recommends but I think nobody does that in practice these days. Bill’s argument in favor of storing images in the database is that everything is stored in one place, there is only one backup to take and restore process is simple. I agree with all of that. Bill tries to keep things simple for the developer. But we have to pay the price in terms of performance. I think maybe this was a good practice pre-2007s when the book was written, but as data volumes have grown storing blob data in the database is an antipattern. Relational database is meant to store structured data, not unstructured data.

From High Performance MySQL 4th ed.:

In the past, it was not uncommon for some applications to accept uploaded images and store them as BLOB data in a MySQL database. This method was convenient for keeping the data for an application together; however as the size of the data grew, operations like schema changes got slower and slower due to the size of that BLOB data.

If you can avoid it, don’t store data like images in a database. Instead, write them to a separate object data store and use the table to track the location or filename for the image.

Foreign Keys

Bill is a fan of FKs. He writes:

Foreign keys are easy to use, improve performance, and help you maintain consistent referential integrity during any data change, both simple and complex.

Performance is often used as a justification for cutting corners, but it usually creates more problems than it solves – including performance problems.

Contrast this with the advice given in High Performance MySQL (3rd ed.):

We think foreign keys are a nice-to-have feature for ensuring system integrity, but they’re a luxury for applications that need extremely high performance; most people don’t use them when performance is a concern, preferring instead to trust the application code.

Foreign keys can add significant overhead. We don’t have any benchmarks to share, but we have seen many cases where server profiling revealed that foreign key constraint checks were the performance problem, and removing the foreign keys improved performance greatly.

This theme is also echo’ed here:

Avoid foreign key constraints, unless you are in a department that specifically requires you use them. MySQL allows you to define foreign key constraints. They come at a price…

Again, as with the blob dilemma the question is one of simplicity vs. performance. Which one would you rather have? Bill sides on making the developer’s life simple and is willing to sacrifice performance in favor of data integrity.

Using NUMERIC to store financial data

This one is less controversial. Bill recommends using the NUMERIC data type to store financial data to protect oneself from rounding off errors caused by floating point arithmetic. I think the advice holds today and most people will agree with it but fwiw, I found the double precision offered in Postgres 14 is good enough for 15 digits precision and there is no loss of precision for the numbers I tried. The double precision only takes 8 bytes whereas NUMERIC can end up taking much more. Depending on your application and performance trade-offs, using NUMERIC might be overkill.

This entry was posted in Computers, programming, Software and tagged , . Bookmark the permalink.

Leave a comment