Databases are painful

Guess I have been fairly fortunate in my career by avoiding the complexity of databases. I've always generally used flat files, increasingly stored as JSON (NoSQL?) and built my only little CRUD functions to use them. Of course I also have been careful to structure my data paths /user/name/2019/ and perhaps cheated a little bit with symlinks /2019/user/name/ from time to time.

Bad analogy.

Relational databases are almost always the right tool for a data storage job. If you're going to use something else you had better have a really confident reason for doing so.

Anticipating multiple terrabytes of unstructured log data for example.

— Simon Willison (@simonw) January 27, 2019

Pain point #1 - treating a RDMS like a shared database

I always imagined a database as a shared resource like 12 factor App implies.. but it seems that several competing microservices interacting with the same data is my fantasy.

My colleague has introduced me to Service Orientated Architecture where the Application and Database are tightly coupled. Now I cannot access the company's data store without going through a primitive RestFUL API.

A service should own its own data and associated operations since only the Application knows the intent of the actions being performed on that data.

My thought: Microservices needing canonical data via a RestFUL interface will suck imo.

Pain point #2 - Where does the business logic go again?

I agree that code can probably model intent the best, though perhaps a DBA would disagree?

When I see SQL features like:

I'm often left thinking, what should be in the code and what should be in the database schema? Should code just be CRUD? Should code dictate the data structure and the database is just a store?

Truth is that sometimes it's a mix, i.e. a real mess. Take deletions for example. If I delete a Unit, don't you think it's related data should also be removed? This can be modelled in SQL with a "cascade delete". Or maybe the record should NOT be deleted at all for compliance reasons, and moved to a deleted table? I have no idea what's best practice. I do know recovering data from AWS RDS takes hours.

My thought: If basic logic could be modelled in the database, might that approach be better for microservices consuming the data? It was so easy in my previous life when you could just model your data structure in a struct and just dump to a data store.

Pain point #3 - How do I iterate out of this mess?

it's always best to design properly to begin with

I believe you iterate towards the solution. You can't possibly design right from the start. However what you can do, is make sure your development iterations are short and that you can migrate / update your system painlessly ... to reach a better design.

I'm totally confounded by the complexity of schema updates. Outside Rail's rake migrate, this seems really hard to do confidently. I've yet to find a good Golang based solution.

Things like an online schema change without down time, seems like a black art.

My thought: Am I missing some tools or reading matter to help me manage a complex database schema like Bugzilla's?

Pain point #4 - Tooling and managed services

error=Error 1062: Duplicate entry '32767' for key 'PRIMARY'

How do you check a table is full or about to be full? Write your own tools!?!

Databases have been around for about 45 years, almost as long as Unix, and I'm just confounded on how bad the tooling is. Nevermind the gotchas of using the Golang client driver. Perhaps I need to familiarise myself with the Percona toolkit. Gosh things are complex. It feels extra stressful when pretty much all the business value is inside it.

Does a DBA design or administer? I'm saddled with maintaining AWS RDS. With its hundreds of parameters and nuances regarding how they applied. I've had to write a tool, just to cope with a "managed" RDMS solution. Yikes.

Pain point #5 - Too easy for things to go horribly wrong

Had a torrid day with a database today. Doing a mysqldump brought it to its knees. And then people were saying there was data loss and I could find know evidence of that. What a nightmare.

— Kai Hendry πŸ‡ΈπŸ‡¬ (@kaihendry) February 14, 2019

When things do go wrong, unlike flat files is near impossible to quickly check the integrity of a large database.

It's also non-trivial to do incremental backups, i.e. take full dumps and replay binary logs.


Here are some thoughts on the matter.
Bottom line: RDBMS are flexible repositories for your data, provided that you are comfortable with how they are organised, and you see SQL as an ally, rather than an enemy.

— Giuseppe Maxia (@datacharmer) January 30, 2019
Found any of my content interesting or useful?