To Delete A Database Record Or Mark As Deleted? - A Discussion

By Angsuman Chakraborty, Gaea News Network
Wednesday, November 8, 2006

Yesterday I received an often asked question from a close friend of mine and someone who I very much respect. In his words:

My new company has an existing system that I am rewriting. The system
has a database that follows the philosophy of never really deleting a
record - just set a flag marking it deleted. What do you think of this?

In general, I think it is a bad idea. I think it just creates work
for you in other areas. But I could be naive.

I’m using “ruby on rails”, which has this guideline of DRY (Don’t
Repeat Yourself). Rails puts things that are traditionally in the
database into the object-relational mapping layer, such as rules on
referential integrity. I guess referential integrity is normally in 2
places: application code (business logic) and database meta data.
Anyway, database constraints are rarely used in rails.

The system is a community site, with discussion boards and lots of
other shared content that is contributed by members. Members may
leave and can be deleted or disabled.

I’m tempted to delete members if they should be deleted. I’m tempted
to delete content if it should be deleted. I’m tempted to leave
content from former members unless the content itself is
inappropriate, though I guess content that is missing an associated
member record would be sort of an orphan. (I’m just thinking out loud
here).

Do you have any general recommendations?

I would hate to add to every single sql statement “when (deleted is
null or deleted=0)”. And I think there could be other issues, like
unique key collisions with records that are marked ‘deleted’.

On a project I created a schema that had sort of a “weak”
relationship in it. This was the goal planning system. The goal table
had a username column. This username column could be used to find the
user in the user table. But if the user was deleted, the goals still
displayed just fine. I guess if a user joined later, reusing a
previously used username they would inherit some existing goals.
Actually this feature was motivation for the schema — I started with
a set of goals from a big spread sheet and I wanted them to exist and
I wanted to be able to add users and them take ownership of goals
already in the system.

Finally, if I delete records, do you recommend saving the deleted
record in another table? Is that a common practice?


Here is my take on it:

Never deleting a record is a practice primarily followed in financial industry and some drug companies vying for FDA approval. Whenever you are dealing with financial or other sensitive data it is not a bad idea to archive it in this way. It is reasonably easy to manage such data. You can periodically move the records marked deleted to one or more secondary databases, which is only queried when you want such data, which is rare. So it is ok to have slower response times when you query deleted records.
This way your primary database remains lean and fast.

Today a third need is emerging in social networking and community sites (for never deleting a record). Often terrorist networks (as in Yahoo groups) or drug dealers (as in Orkut) use such sites to communicate with their geographically distributed members. Archiving their communication data may later help law enforcement authorities in tracking them down. The flipside is that it can be abused by the same authorities to pry into the private lives of normal citizens. But that is a story for another day :)

> I’m tempted to delete members if they should be deleted. I’m tempted
to delete content if it should be deleted. I’m tempted to leave
content from former members unless the content itself is
inappropriate, though I guess content that is missing an associated
member record would be sort of an orphan. (I’m just thinking out loud
here).

I can see additional value you can get from these records, like finding rate of attrition (people leaving the site or being deleted) over time, their content etc. Overall you reduce much of headache over orphan records or referential integrity problems (though I don’t think RoR depend on that) down the road. As it is hardware is cheap these days.

So to summarize I think:
a. there are several benefits in keeping deleted records and simply marking them as deleted
b. there isn’t much cost or performance overhead associated with such a decision even if your site becomes the next MySpace as you can always periodically move them over to secondary databases.

Most sql queries will simply look for records which doesn’t have the deleted field set. For queries involving the deleted records, you will have to first look in the primary database and then look in the secondary deleted records database, if you do decide to use a secondary database.

> like unique key collisions with records that are marked ‘deleted’.
That wouldn’t be an issue. If you are expecting heavy load then you may want to use long integer for index. Normal unsigned integer will give you only 4GB.

> Finally, if I delete records, do you recommend saving the deleted record in another table? Is that a common practice?

If you are simply marking them as deleted then you can simply periodically move them to a different database. I wouldn’t suggest moving them to a different table as that is likely to create more confusion and wouldn’t help you as much wrt. performance. It is reasonably common practice to save deleted records. I have worked with such systems in past. I think you will find that they (marking records them as deleted) solve more problems than they create in the long run.

Let me know, my blog readers, what is your take on this issue?

Discussion

Paul
December 15, 2008: 1:50 pm

>This isn’t much of an issue as most modern orm layers or frameworks (like Ruby on Rails or Hibernate) do prefer to handle much of database functionality themselves.

How can this be achieved through hibernate?

November 10, 2006: 8:20 am

In general there is a case where you can have fields which need to be unique for non-deleted entries. In that case it should be marked as non-unique and then the logic handled in application code. This doesn’t add much to the complexity as you can easily see.
This isn’t much of an issue as most modern orm layers or frameworks (like Ruby on Rails or Hibernate) do prefer to handle much of database functionality themselves.


Ivan
November 10, 2006: 4:07 am

Hi,

yes, you’re right if I don’t have unique keys this isn’t a problem and Person wasn’t a good example. But think about financial instruments, securities.
They have primary keys but at the same time you can’t have 2 securities with the same name or ISIN code. They have to be unique. (Maybe this can be solved if the deleted mark is an integer: if mark=0, then it is not deleted, if mark != 0, it is deleted. So if you mark an entity deleted, you set mark = max(mark) + 1. Then you can put an unique key on (name, mark).).

In more general I see a trade off with mark as deleted. I mean you can’t rely on certain database features (unique and foreign key constraints). For example if there is a many-to-one relationship between A and B, and if I try to delete B, then the database fk constraint doesn’t allows it. On the other side I can mark it as deleted without any problem, and later I can load A with references to a deleted B. I’m sure this can be solved if I mark the A entities deleted too, but this must be written by me. And finally your code will contain a lot of “db logic” code which will be less tested than a db code.
I’m very confused… I’m looking for a good solution, which is simple and safe at the same time.

thx, Ivan

November 9, 2006: 8:56 pm

@Ivan
> If I have a Person table with uq key on the NAME field, then after I delete (mark) a person named Peter, I can’t add a new person with the same name, right?

Normally it is solved by not using name as an unique id. I have seen lots of people with identical names, even for names as uncommon as mine.

Any table should have a id field which should be the primary (and hence unique) key. This solves the problem you posed cleanly.

November 9, 2006: 8:52 pm

@Mikael
There is simply a field called deleted, which is by default false. When a record is deleted, it is marked as true. So all you have to do is fetch all records where deleted is true.


Ivan
November 9, 2006: 5:11 pm

Hi,

I’m facing similar problem, so I’m very happy to find your article, thx.
My problem with “marking as deleted” is the unique key constraint which is mentioned in the article, and I don’t understand your answer. If I have a Person table with uq key on the NAME field, then after I delete (mark) a person named Peter, I can’t add a new person with the same name, right?
I think this can be solved by a secondary db which seems to me a good idea. But I think this raises another problem - how can I manage the relationships between entities?
I mean if the entity (A) which I want do delete has a reference to another entity (B) (which I don’t want to delete), should I save B to the secondary database? Or should I delete this relationship and build it again up in appcode, when I load the A entity from the secondary database?
In extreme situation when you have a very complex object mapping and relationships, the first case can lead to a secondary db which is almost the same as the primary db.

What do you think?

Thx,
Ivan

November 9, 2006: 11:21 am

There’s another aspect of this and that is that if you are ever likely to enable some sort of synchronization you need to know what records have been deleted. There is a big difference in “deleted” and “missing” in a database.

You can solve this in a number of ways though. A separate table with deleted id:s would be doable. It depends of the ratio of deleted/created which way to go.

Cheers,
Mikael Grev

YOUR VIEW POINT
NAME : (REQUIRED)
MAIL : (REQUIRED)
will not be displayed
WEBSITE : (OPTIONAL)
YOUR
COMMENT :