5 Tips for Bulk Data Processing Programming
By Angsuman Chakraborty, Gaea News Network, Wednesday, May 30, 2007
We are currently processing a huge amount of sensitive corporate data for a Fortune 500 company as the first phase of a project. You have to be much more careful in data processing than in any standard programming effort. Here are a few tips you may find useful when programming to process sensitive data in bulk.
Institute a policy of random manual checks. It may not be feasible to manually verify all, or even most, of the data. However, you must rigorously check a significant random subset of the data from every batch. You will be surprised how much this simple step reveals about the data, and how many errors it uncovers.
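The random spot-check step above can be sketched in Java. The batch-of-strings representation and the fixed seed are illustrative assumptions; the seed simply makes the pick reproducible, so reviewers can regenerate exactly the same subset later for an audit trail.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

public class SpotCheck {
    // Draw a reproducible random sample of `sampleSize` records from a batch
    // for manual verification. Shuffling a copy leaves the batch untouched.
    static List<String> sample(List<String> batch, int sampleSize, long seed) {
        List<String> copy = new ArrayList<>(batch);
        Collections.shuffle(copy, new Random(seed));
        return copy.subList(0, Math.min(sampleSize, copy.size()));
    }

    public static void main(String[] args) {
        List<String> batch = List.of("rec-1", "rec-2", "rec-3", "rec-4", "rec-5");
        // Same seed, same subset: the pick can be re-derived during review.
        System.out.println(sample(batch, 2, 42L));
    }
}
```

Because the sample is seeded, the "random" subset is still verifiable after the fact, which matters when the data is sensitive and reviews must be repeatable.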
Program safely, not optimally. This is not the time to think about optimizations; data accuracy is your primary concern, and performance is not normally an issue. Name your variables clearly and accurately to ease code review.
Write down your logic in pseudo-code. Review it yourself at least twice and get at least one other person to review it in detail. It is very easy to miss little details while coding. Finding such errors is easy in normal application development; finding little logical errors in a huge amount of data is next to impossible.
When you are done, thoroughly review the final code with one or more senior programmers.
Extensively test with a small subset of the data. Repeat the process with two or more such subsets.
Get your data experts to manually review the generated data. They can smell problems faster than anyone else.
I cannot overstress the importance of writing quality unit tests for such projects. However, you should also write tests that independently verify the generated or uploaded data. Get input for these tests from your domain experts. Do not compromise at all on testing.
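An independent verification pass over the generated data might look like the sketch below. The specific invariants (expected row count, no blank record IDs, amounts within a plausible range) are hypothetical examples; the real checks should come from your domain experts, and the verifier should be written separately from the code that produced the data.

```java
import java.util.ArrayList;
import java.util.List;

public class OutputVerifier {
    // Check generated rows (here: {recordId, amount} pairs) against
    // domain-supplied invariants. Returns a list of failure messages
    // rather than stopping at the first problem, so a whole batch can
    // be triaged in one pass.
    static List<String> verify(List<String[]> rows, int expectedCount) {
        List<String> failures = new ArrayList<>();
        if (rows.size() != expectedCount) {
            failures.add("row count " + rows.size() + " != expected " + expectedCount);
        }
        for (String[] row : rows) {
            if (row[0] == null || row[0].isBlank()) {
                failures.add("blank record id in row: " + String.join(",", row));
            }
            try {
                double amount = Double.parseDouble(row[1]);
                // Range bound is an illustrative assumption, not a real rule.
                if (amount < 0 || amount > 1_000_000) {
                    failures.add("amount out of expected range: " + amount);
                }
            } catch (NumberFormatException e) {
                failures.add("non-numeric amount: " + row[1]);
            }
        }
        return failures;
    }
}
```

Keeping the verifier independent of the generator means a shared bug cannot silently pass its own output.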
Use a strongly typed language like Java.
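One concrete benefit of strong typing in this setting: malformed input fails loudly at parse time instead of flowing silently through the pipeline as a string. A minimal sketch, where the field layout (id, quantity) is a hypothetical example:

```java
public class TypedRecord {
    // Parsing each line into a typed record forces every field through
    // a type check up front; a bad quantity throws immediately rather
    // than surfacing thousands of rows later as corrupt output.
    record Order(String id, int quantity) {}

    static Order parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected 2 fields: " + line);
        }
        return new Order(parts[0].trim(), Integer.parseInt(parts[1].trim()));
    }
}
```

With a dynamically typed representation, the bad value would often travel all the way to the output before anyone noticed.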
Last but not least, get your best and most experienced developers on the job. Bulk data processing and mining is a different ball game from standard application development.
Tags: Quality Assurance