5 Tips for Bulk Data Processing Programming

By Angsuman Chakraborty, Gaea News Network
Wednesday, May 30, 2007

We are currently processing a huge amount of sensitive corporate data for a Fortune 500 company as the first phase of a project. Data processing demands far more care than a standard programming effort. Here are a few tips you may find useful when programming to process sensitive data in bulk. Get your best (wo)men on the job.

Institute a policy of random manual checks. It may not be feasible to manually verify all or even most of the data. However, you must rigorously check a significant random subset of data from every batch. You will be surprised how much you can discover about the data, as well as any errors, by this simple step.
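As a rough illustration of the random check, here is a minimal Python sketch for pulling a review sample out of each batch. The function name, the 5 percent default, the minimum of 20 records and the record layout are all assumptions made for the example; pick numbers that suit your own data.

```python
import math
import random


def pick_for_manual_review(batch, percent=5, minimum=20, seed=None):
    """Return a random subset of `batch` for a person to check by hand.

    `percent`, `minimum` and `seed` are illustrative parameters, not a standard.
    """
    if not batch:
        return []
    # Review at least `minimum` records, but never more than the batch holds.
    wanted = max(minimum, math.ceil(len(batch) * percent / 100))
    size = min(len(batch), wanted)
    # A seeded generator lets you reproduce the same sample for an audit.
    rng = random.Random(seed)
    return rng.sample(batch, size)


# Hypothetical batch of 1000 records.
batch = [{"id": i, "amount": i * 10} for i in range(1000)]
to_review = pick_for_manual_review(batch, percent=5, minimum=20, seed=42)
print(len(to_review))  # 50
```

Recording the seed with each batch means a reviewer can later regenerate exactly the records that were checked.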

Program safely, not optimally. You must program safely; this is not the time to think about optimizations. Data accuracy is your primary concern. Performance isn’t normally an issue. Name the variables clearly and accurately to help with code review.
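One way to read "safely" is to reject anything doubtful instead of guessing. The Python sketch below assumes hypothetical rows with `account_id` and `amount` fields; the names `BadRecord`, `parse_amount` and `process` are invented for the example. It accepts only plainly formatted amounts and keeps every rejected row for a person to look at, rather than silently skipping or "fixing" it.

```python
import re
from decimal import Decimal

# Accept only plain amounts such as "12", "-3.5" or "100.00" (an assumed format).
AMOUNT_PATTERN = re.compile(r"-?[0-9]+(\.[0-9]+)?")


class BadRecord(ValueError):
    """Raised when a record cannot be processed with certainty."""


def parse_amount(raw_amount):
    """Convert a text amount to Decimal, rejecting anything ambiguous."""
    text = raw_amount.strip()
    if not AMOUNT_PATTERN.fullmatch(text):
        raise BadRecord(f"amount is not a plain number: {raw_amount!r}")
    # Decimal avoids the rounding surprises of binary floating point.
    return Decimal(text)


def process(rows):
    """Split rows into accepted results and rejected rows with reasons."""
    accepted, rejected = [], []
    for line_number, row in enumerate(rows, start=1):
        try:
            accepted.append((row["account_id"], parse_amount(row["amount"])))
        except BadRecord as error:
            rejected.append((line_number, row, str(error)))
    return accepted, rejected


rows = [
    {"account_id": "A1", "amount": "12.50"},
    {"account_id": "A2", "amount": "1,000"},  # ambiguous: thousands or decimal comma?
]
accepted, rejected = process(rows)
print(len(accepted), len(rejected))  # 1 1
```

This is slower and more verbose than a one-line conversion, which is the point: clarity and accuracy come before speed here.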

Write down your logic in pseudo-code. Review your own code at least twice and get at least one other person to review it in detail. It is very easy to miss little details while coding. Finding such errors is easy in normal application development. Finding little logical errors in a huge amount of data is next to impossible.
Once you are done, thoroughly review your final code with one or more senior programmers.

Test extensively with a small subset of data. Repeat the process with two or more such subsets.

Get your data experts to manually review the generated data. They can spot a bad smell faster than anyone else.

I cannot over-stress the importance of writing quality unit tests for such projects. However, you should also write tests to independently verify the generated / uploaded data. Get input for such tests from the domain experts. Do not compromise at all on testing.
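An independent check of the generated data might look something like the Python sketch below. It assumes, purely for illustration, that the processing should preserve the number of rows, the set of record ids and the total of an `amount` field; the real rules must come from your domain experts. The function name `reconcile` and the field names are hypothetical.

```python
from decimal import Decimal


def reconcile(source_rows, output_rows, key="id", amount="amount"):
    """Compare source and generated data; return a list of discrepancies."""
    problems = []

    # Check 1: no rows lost or duplicated.
    if len(source_rows) != len(output_rows):
        problems.append(
            f"row count: source={len(source_rows)} output={len(output_rows)}"
        )

    # Check 2: the same record ids appear on both sides.
    source_ids = {row[key] for row in source_rows}
    output_ids = {row[key] for row in output_rows}
    for missing in sorted(source_ids - output_ids):
        problems.append(f"missing from output: {missing}")
    for extra in sorted(output_ids - source_ids):
        problems.append(f"unexpected in output: {extra}")

    # Check 3: control totals agree (Decimal keeps the sums exact).
    source_total = sum((Decimal(str(r[amount])) for r in source_rows), Decimal(0))
    output_total = sum((Decimal(str(r[amount])) for r in output_rows), Decimal(0))
    if source_total != output_total:
        problems.append(f"amount total: source={source_total} output={output_total}")

    return problems


source = [{"id": 1, "amount": "10.00"}, {"id": 2, "amount": "5.50"}]
output = [{"id": 1, "amount": "10.00"}]  # record 2 was dropped by mistake
for problem in reconcile(source, output):
    print(problem)
```

Because this check reads only the source and the output, it does not share any logic with the processing code and so will not repeat its mistakes.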

Use a strongly typed language like Java.

Last but not least, you should get your most experienced developers on the job. Bulk data processing and mining is a different ball game from standard application development.


April 10, 2010: 12:05 am

My name is Rahul Patel, and I am from Ahmedabad. I have run a cafe with a setup of 10 PCs for the last 14 months. I have been looking for a good project for offline data entry, data conversion, bulk mailing, etc. I have read your advertisement and request further details of your project: what the work actually is, rate per entry, contract period, payment guarantee, terms and conditions, investment, and other requirements. You can send us a ‘snapshot’ and ‘demo’ if available, so that we can read and analyze the type of work.

We hope you will consider our mail and send us a prompt and positive response.

July 16, 2009: 10:11 am

Hello Sir/Madam,

We are a data entry service provider in Tamilnadu, India. We have almost five years of experience in this field and have completed many kinds of projects. Our focus categories are form processing, medical billing form processing, text data typing, data conversion work, etc.
Right now we are looking for work related to our services. We are ready to provide our data entry service to companies that require it. We will deliver 100% quality work.
Our objective is to deliver completed projects to buyers within their budget and time. We are hard-working professionals and quick learners.

Thanking you,

Best Regards,

May 30, 2007: 7:42 pm

Nice contextual advert, but I will let it through this time :)

May 30, 2007: 11:19 am

In addition to safety, a complementary tip is to use a high-throughput package like GreenTea (https://www.GreenTeaTech.com) to speed up the data processing. We’ve used it extensively to speed up the processing of huge financial data sets. We’ve found it quite useful and would like to share it with your readers.

