Business Logic != Database Logic
Nov7
Databases are good for one thing: storing data. But so are spreadsheets, PDF files, and the like. What separates out a good database from other data storage facilities is the ability to retrieve and store records in the database efficiently.
There are a lot of table models that seem to be more worthy of a spreadsheet than a database. For example, consider this scaled down employees table (my apologies for the dots, but it makes the formatting work):
+----+------+------------+
| id | name | department |
+----+------+------------+
| 1. | Bob. | Sales..... |
| 2. | Mark | Shipping.. |
+----+------+------------+
So, once this table is filled in with all sorts of grand information about the employees we have a database to work with. We can insert employees, and delete them, and update their information. Each of these operations mirrors a database operation of the same name.
But, when we do that we lose historical data. If Bob quits all of a sudden, and he’s deleted from the database, then from that point one it’s like he never existed!
So we work around that. We make some kind of currently_employed boolean column to denote which employees are active and which are not. We also probably add a date of hire and a date of termination.
Thus, we went from database CRUD to CRU.
What happens, if instead of Bob quitting, that he gets transferred over to Mark in the shipping department?
Someone will have to go in and update the table to reflect that Bob’s department is now Shipping. Easily done. However, after that point there’s no history to show that Bob was ever in sales. Maybe his hire_date column gets changed instead, which then makes it look like he’s only work for the company a few days instead of all of those years he put in at the sales department.
Or, we could model the data structure so that it’s more flowing. Get rid of the updates and deletes, and stick with inserts and selects only.
Consider this new data model:
+----+-----------+------+------------+------------+
| id | person_id | name | department | hire_date. |
+----+-----------+------+------------+------------+
| 1. | 1........ | Bob. | Sales..... | 2002.01.01 |
| 2. | 2........ | Mark | Shipping.. | 2002.02.02 |
+----+-----------+------+------------+------------+
and see what happens now when Bob moves to Shipping:
+----+-----------+------+------------+------------+
| id | person_id | name | department | hire_date. |
+----+-----------+------+------------+------------+
| 1. | 1........ | Bob. | Sales..... | 2002.01.01 |
| 2. | 2........ | Mark | Shipping.. | 2002.02.02 |
| 3. | 1........ | NULL | Shipping.. | 2005.11.11 |
+----+-----------+------+------------+------------+
Every time something new happens we do an insert instead of an update. This keeps all of the historical data intact.
Now, all we have to do is update our software to understand this logic. When we go to query employee information, what we want to happen is that it merges all of the data together down the database based on person_id. NULL values are simply ignored.
I have implemented this scheme in a number of non-Rails applications here. The most visible place where this is handy for us is device calibrations. Instead of just loading in new values into the database for new device calibrations, we insert new rows into the database and let the logic pick out the latest and greatest calibration. But it allows us to go back after the fact and see the calibration history for each of the items, when it was calibrated, where, and by whom.
I use a Rails site to access some of this information, but it’s not seamless to make it work. I had to add some views to the database, and put some custom code in both the model and controllers. My thought here is that I could achieve this seamlessly by overriding some of the ActiveRecord::Base methods. That is my next step, time permitting.
Surely, though, I didn’t invent a new model or pattern did I? I can’t be the only one who wants to store their data in this fashion.
7 Comments
Sorry, the comment form is closed at this time.
12:27 pm on November 11th, 2005
You could use something like
acts_as_versionedto handle this. It basically keeps a mirror of your table and keeps track changes. That way you would just interact with the main table and the mirroed table would track historical changes.Also, I’m not sure why you’d store ‘Shipping’ as a text field in the table. Might make more sense to have a departments table which a person would be long to.
2:11 pm on November 11th, 2005
This is a *very* nice idea. I’ve often had the need for something like this but never pursued it to the end result you have here. I really like it. If you start packaging this into a plugin for ActiveRecord, I’d like to be involved.
Take the author of this comment and add @gmail.com to contact me.
5:47 pm on November 11th, 2005
Hmm… seems to me that you’re polluting/diluting your data semantics. “hire_date” and “name” now have different/special meanings depending on which record you’re looking at for person_id=1.
Two alternative approaches for handling this:
1. Have an archive table that stores the historic versions of the data (as Robby suggested).
2. Normalization. You’ve really got a one-many relationship here between an employee and their work history. Factor-out any fields that could change into a separate table, with person_id as a foreign key. Select for the max hire_date (or perhaps it should be called start_date) when joining to the work history table.
Making such changes to your data model may add some pain at the time, but in my experience, keeping your data semantics from being diluted will benefit you in the long run.
Just my 2 cents.
5:57 pm on November 11th, 2005
This solution makes it more difficult to read the current version. A more standard approch is to fill in every field, and mark the out of date versions obsolete. You can even create a view that excludes the out of date versions so you can simply query the current values.
We need to be able to back date changes so that they apply from some point in the past. I think of these changes as transations against the entity, and the entities current state as analogous to a balance. This approach would work nicely for this if you applied the records in date order up until the observation date for the query. We don’t plan to use the approach you suggest because in our case it is a many valued field so we will just add records.
5:55 am on November 12th, 2005
I agree with Kunz and would normalize this into an explicit one-to-many realtion with two tables. A database shouldn’t need data to be understandable.
11:46 am on November 12th, 2005
Abstracting all unchangable data, along with a person_id into a second table with a one-to-many relationship makes the most sense to me. Unfortunately, when it comes to a person, there isn’t always a lot of data that isn’t subject to change - person_id, SSN, and DOB maybe.
11:55 am on November 14th, 2005
I like Ken’s approach - with a bit more detail.
I would create a table to hold people and all necessary details that describe that person. Then create a separate table to hold all HR activity details (i.e. HR_Activities) - initial hiring, department changes, termination, etc. You would also need a table that holds corresponding HR Activity codes.
In this model, you would be able to easily determine all history with a given person without ever deleting the person. Granted, the person table would grow forever. You could eventually purge it (or offload it) by checking for persons who have had no HR Activity for a given period (X years).
Just my 2 cents