5 Tips for Bulk Data Processing Programming
By Angsuman Chakraborty, Gaea News Network
Wednesday, May 30, 2007
We are currently processing a huge amount of sensitive corporate data for a Fortune 500 company as the first phase of a project. You have to be very careful in data processing, much more so than in any standard programming effort. Here are a few tips you may find useful when programming to process sensitive data in bulk. Get your best (wo)men on the job.
Institute a policy of random manual checks. It may not be feasible to manually verify all or even most of the data. However, you must rigorously check a significant random subset of the data from every batch. You will be surprised how much you can discover about the data, as well as about any errors, with this simple step.
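To make the random check concrete, here is a minimal sketch of how a review sample might be drawn from a batch. The class name BatchSampler, the method sample, and the idea of passing in a seed are my assumptions for illustration, not something any particular project prescribes.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical helper: draws a reproducible random subset of a batch for manual review. */
final class BatchSampler {

    private BatchSampler() {}

    /**
     * Returns sampleSize records chosen uniformly at random from the batch.
     * The same seed always yields the same sample, so a reviewer's findings
     * can be traced back to the exact records that were checked.
     */
    static <T> List<T> sample(List<T> batch, int sampleSize, long seed) {
        if (sampleSize < 0 || sampleSize > batch.size()) {
            throw new IllegalArgumentException(
                "sampleSize must be between 0 and " + batch.size() + ", got " + sampleSize);
        }
        // Shuffle a copy so the order of the real batch is never disturbed.
        List<T> shuffled = new ArrayList<>(batch);
        Collections.shuffle(shuffled, new Random(seed));
        return new ArrayList<>(shuffled.subList(0, sampleSize));
    }
}
```

If you log the seed together with the batch identifier, you can re-draw exactly the same sample later when someone questions a result.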
Program safely, not optimally. This is not the time to think about optimizations. Data accuracy is your primary concern; performance isn’t normally an issue. Name the variables clearly and accurately to help with code review.
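As an illustration of what safe rather than optimal can look like, here is a small sketch that parses a monetary amount and fails loudly on anything suspicious instead of guessing. The class AmountParser, its parameters, and the two-decimal-place rule are all assumptions made for the example; your data will have its own rules.

```java
import java.math.BigDecimal;

/** Hypothetical example: strict parsing that rejects bad input instead of defaulting. */
final class AmountParser {

    private AmountParser() {}

    /** Parses an amount exactly; sourceLineNumber is only used to make errors traceable. */
    static BigDecimal parseAmount(String rawAmountText, long sourceLineNumber) {
        if (rawAmountText == null || rawAmountText.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "Line " + sourceLineNumber + ": amount is missing");
        }
        String trimmedAmountText = rawAmountText.trim();
        BigDecimal amount;
        try {
            // BigDecimal keeps the value exact; double would silently round it.
            amount = new BigDecimal(trimmedAmountText);
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException(
                "Line " + sourceLineNumber + ": not a valid amount: '" + trimmedAmountText + "'", e);
        }
        // Assumed business rule for this sketch: at most two decimal places.
        if (amount.scale() > 2) {
            throw new IllegalArgumentException(
                "Line " + sourceLineNumber + ": more than two decimal places: " + trimmedAmountText);
        }
        return amount;
    }
}
```

The long variable names and the explicit checks cost a little speed and a few lines, which is exactly the trade you want here.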
Write down your logic in pseudo-code. Review it yourself at least twice and get at least one other person to review it in detail. It is very easy to miss little details while coding. Finding such errors is easy in normal application development. Finding little logical errors in a huge amount of data is next to impossible.
Once you are done, thoroughly review your final code with one or more senior programmers.
Test extensively with a small subset of the data. Repeat the process with two or more such subsets.
Get your data experts to manually review the generated data. They can smell a problem faster than anyone else.
I cannot overstress the importance of writing quality unit tests for such projects. However, you should also write tests that independently verify the generated or uploaded data. Get input for such tests from the domain experts. Do not compromise at all on testing.
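One simple form of independent verification is reconciliation: compare record counts and control totals between the source and what was actually loaded. The sketch below assumes both sides can be reduced to a list of amounts; BatchReconciler and reconcile are hypothetical names, and your domain experts will know which totals are worth comparing.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check that compares the source batch with the loaded result. */
final class BatchReconciler {

    private BatchReconciler() {}

    /** Returns a list of human-readable discrepancies; an empty list means the batch reconciles. */
    static List<String> reconcile(List<BigDecimal> sourceAmounts, List<BigDecimal> loadedAmounts) {
        List<String> discrepancies = new ArrayList<>();

        if (sourceAmounts.size() != loadedAmounts.size()) {
            discrepancies.add("record count differs: source=" + sourceAmounts.size()
                + ", loaded=" + loadedAmounts.size());
        }

        BigDecimal sourceTotal = total(sourceAmounts);
        BigDecimal loadedTotal = total(loadedAmounts);
        // compareTo ignores trailing zeros, so 15.250 and 15.25 count as equal.
        if (sourceTotal.compareTo(loadedTotal) != 0) {
            discrepancies.add("control total differs: source=" + sourceTotal
                + ", loaded=" + loadedTotal);
        }
        return discrepancies;
    }

    private static BigDecimal total(List<BigDecimal> amounts) {
        BigDecimal sum = BigDecimal.ZERO;
        for (BigDecimal amount : amounts) {
            sum = sum.add(amount);
        }
        return sum;
    }
}
```

Matching counts and totals do not prove the load is correct, but a mismatch is a cheap and reliable alarm. Ideally the loaded side is read back from the target system, not from the variables your processing code already holds.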
Use a strongly typed language like Java.
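Here is a small sketch of how the type system can catch a class of data mix-ups before the code ever runs. CustomerId, InvoiceId, and InvoiceLookup are made-up names for illustration; the point is only that two identifiers which are both strings underneath get distinct types.

```java
/** Hypothetical wrapper type: a customer identifier that cannot be confused with other strings. */
final class CustomerId {
    private final String value;

    CustomerId(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("customer id is required");
        }
        this.value = value;
    }

    String value() {
        return value;
    }
}

/** Hypothetical wrapper type: an invoice identifier. */
final class InvoiceId {
    private final String value;

    InvoiceId(String value) {
        if (value == null || value.isEmpty()) {
            throw new IllegalArgumentException("invoice id is required");
        }
        this.value = value;
    }

    String value() {
        return value;
    }
}

final class InvoiceLookup {

    private InvoiceLookup() {}

    // Swapping the two arguments is a compile error here; with two plain
    // String parameters it would compile and quietly corrupt the output.
    static String describe(CustomerId customerId, InvoiceId invoiceId) {
        return "customer=" + customerId.value() + ", invoice=" + invoiceId.value();
    }
}
```

Calling InvoiceLookup.describe(new InvoiceId("I-9"), new CustomerId("C-1")) will not compile, which is the kind of mistake you would otherwise have to find by staring at the data.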
Last but not least, you should get your most experienced developers on the job. Bulk data processing and mining is a different ball game from standard application development.
Tags: Quality Assurance