To Delete A Database Record Or Mark As Deleted? - A Discussion
By Angsuman Chakraborty, Gaea News NetworkWednesday, November 8, 2006
Yesterday I received an often asked question from a close friend of mine and someone who I very much respect. In his words:
My new company has an existing system that I am rewriting. The system
has a database that follows the philosophy of never really deleting a
record - just set a flag marking it deleted. What do you think of this?In general, I think it is a bad idea. I think it just creates work
for you in other areas. But I could be naive.I’m using “ruby on rails”, which has this guideline of DRY (Don’t
Repeat Yourself). Rails puts things that are traditionally in the
database into the object-relational mapping layer, such as rules on
referential integrity. I guess referential integrity is normally in 2
places: application code (business logic) and database meta data.
Anyway, database constraints are rarely used in rails.The system is a community site, with discussion boards and lots of
other shared content that is contributed by members. Members may
leave and can be deleted or disabled.I’m tempted to delete members if they should be deleted. I’m tempted
to delete content if it should be deleted. I’m tempted to leave
content from former members unless the content itself is
inappropriate, though I guess content that is missing an associated
member record would be sort of an orphan. (I’m just thinking out loud
here).Do you have any general recommendations?
I would hate to add to every single sql statement “when (deleted is
null or deleted=0)”. And I think there could be other issues, like
unique key collisions with records that are marked ‘deleted’.On a project I created a schema that had sort of a “weak”
relationship in it. This was the goal planning system. The goal table
had a username column. This username column could be used to find the
user in the user table. But if the user was deleted, the goals still
displayed just fine. I guess if a user joined later, reusing a
previously used username they would inherit some existing goals.
Actually this feature was motivation for the schema — I started with
a set of goals from a big spread sheet and I wanted them to exist and
I wanted to be able to add users and them take ownership of goals
already in the system.Finally, if I delete records, do you recommend saving the deleted
record in another table? Is that a common practice?
Here is my take on it:
Never deleting a record is a practice primarily followed in financial industry and some drug companies vying for FDA approval. Whenever you are dealing with financial or other sensitive data it is not a bad idea to archive it in this way. It is reasonably easy to manage such data. You can periodically move the records marked deleted to one or more secondary databases, which is only queried when you want such data, which is rare. So it is ok to have slower response times when you query deleted records.
This way your primary database remains lean and fast.
Today a third need is emerging in social networking and community sites (for never deleting a record). Often terrorist networks (as in Yahoo groups) or drug dealers (as in Orkut) use such sites to communicate with their geographically distributed members. Archiving their communication data may later help law enforcement authorities in tracking them down. The flipside is that it can be abused by the same authorities to pry into the private lives of normal citizens. But that is a story for another day
> I’m tempted to delete members if they should be deleted. I’m tempted
to delete content if it should be deleted. I’m tempted to leave
content from former members unless the content itself is
inappropriate, though I guess content that is missing an associated
member record would be sort of an orphan. (I’m just thinking out loud
here).
I can see additional value you can get from these records, like finding rate of attrition (people leaving the site or being deleted) over time, their content etc. Overall you reduce much of headache over orphan records or referential integrity problems (though I don’t think RoR depend on that) down the road. As it is hardware is cheap these days.
So to summarize I think:
a. there are several benefits in keeping deleted records and simply marking them as deleted
b. there isn’t much cost or performance overhead associated with such a decision even if your site becomes the next MySpace as you can always periodically move them over to secondary databases.
Most sql queries will simply look for records which doesn’t have the deleted field set. For queries involving the deleted records, you will have to first look in the primary database and then look in the secondary deleted records database, if you do decide to use a secondary database.
> like unique key collisions with records that are marked ‘deleted’.
That wouldn’t be an issue. If you are expecting heavy load then you may want to use long integer for index. Normal unsigned integer will give you only 4GB.
> Finally, if I delete records, do you recommend saving the deleted record in another table? Is that a common practice?
If you are simply marking them as deleted then you can simply periodically move them to a different database. I wouldn’t suggest moving them to a different table as that is likely to create more confusion and wouldn’t help you as much wrt. performance. It is reasonably common practice to save deleted records. I have worked with such systems in past. I think you will find that they (marking records them as deleted) solve more problems than they create in the long run.
Let me know, my blog readers, what is your take on this issue?
December 15, 2008: 1:50 pm
>This isn’t much of an issue as most modern orm layers or frameworks (like Ruby on Rails or Hibernate) do prefer to handle much of database functionality themselves. How can this be achieved through hibernate? |
November 10, 2006: 8:20 am
In general there is a case where you can have fields which need to be unique for non-deleted entries. In that case it should be marked as non-unique and then the logic handled in application code. This doesn’t add much to the complexity as you can easily see. |
Ivan |
November 10, 2006: 4:07 am
Hi, yes, you’re right if I don’t have unique keys this isn’t a problem and Person wasn’t a good example. But think about financial instruments, securities. In more general I see a trade off with mark as deleted. I mean you can’t rely on certain database features (unique and foreign key constraints). For example if there is a many-to-one relationship between A and B, and if I try to delete B, then the database fk constraint doesn’t allows it. On the other side I can mark it as deleted without any problem, and later I can load A with references to a deleted B. I’m sure this can be solved if I mark the A entities deleted too, but this must be written by me. And finally your code will contain a lot of “db logic” code which will be less tested than a db code. thx, Ivan |
November 9, 2006: 8:56 pm
@Ivan Normally it is solved by not using name as an unique id. I have seen lots of people with identical names, even for names as uncommon as mine. Any table should have a id field which should be the primary (and hence unique) key. This solves the problem you posed cleanly. |
November 9, 2006: 8:52 pm
@Mikael |
Ivan |
November 9, 2006: 5:11 pm
Hi, I’m facing similar problem, so I’m very happy to find your article, thx. What do you think? Thx, |
November 9, 2006: 11:21 am
There’s another aspect of this and that is that if you are ever likely to enable some sort of synchronization you need to know what records have been deleted. There is a big difference in “deleted” and “missing” in a database. You can solve this in a number of ways though. A separate table with deleted id:s would be doable. It depends of the ratio of deleted/created which way to go. Cheers, |
Paul