Naive Bayesian is a failure against SPAM

By Angsuman Chakraborty, Gaea News Network
Monday, March 14, 2005

Every day I receive several thousand spam emails about debt consolidation, mortgages, and enhancement advice or pills for organs I do not possess. Unknown Nigerians have offered me millions, strangers have offered services I do not care about, and some messages carried images bad enough to spoil my day!

I manage to filter most of them daily using SpamBayes, a naive Bayesian filter closely integrated with Outlook.
However, spammers have adapted. They systematically send spam designed to poison the filter's database, and they employ a wide variety of other tricks. As a result, around 100 spams land in my inbox every day, and the number is steadily increasing.
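To make the poisoning problem concrete, here is a minimal Python sketch of a word-count naive Bayes filter. It is a toy under stated assumptions, not SpamBayes' own scoring scheme: the class name, the training messages and the "poisoned" message are all made up for illustration. It shows how spam stuffed with innocent-looking words, once trained as spam, drags the score of a legitimate message upward.

```python
import math
from collections import Counter


class ToyNaiveBayes:
    """Hypothetical word-count naive Bayes filter (not SpamBayes itself)."""

    def __init__(self):
        self.words = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def train(self, text, label):
        self.words[label].update(text.lower().split())
        self.docs[label] += 1

    def spam_probability(self, text):
        # Assumes at least one spam and one ham message have been trained.
        vocab = len(set(self.words["spam"]) | set(self.words["ham"]))
        total_docs = self.docs["spam"] + self.docs["ham"]
        log_score = {}
        for label in ("spam", "ham"):
            total_words = sum(self.words[label].values())
            # Class prior, then one Laplace-smoothed likelihood per token.
            score = math.log(self.docs[label] / total_docs)
            for token in text.lower().split():
                count = self.words[label][token]
                score += math.log((count + 1) / (total_words + vocab))
            log_score[label] = score
        # Turn the two log scores into P(spam | text).
        return 1 / (1 + math.exp(log_score["ham"] - log_score["spam"]))


nb = ToyNaiveBayes()
nb.train("cheap pills debt consolidation offer", "spam")
nb.train("mortgage offer cheap debt", "spam")
nb.train("meeting agenda project invoice", "ham")
nb.train("lunch meeting project schedule", "ham")

legit = "project meeting schedule"
before = nb.spam_probability(legit)

# Poisoning: spam padded with ordinary business words gets trained as spam.
for _ in range(5):
    nb.train("cheap pills project meeting schedule agenda", "spam")

after = nb.spam_probability(legit)
print(f"before poisoning: {before:.3f}, after poisoning: {after:.3f}")
```

In this toy setup the same legitimate message moves from a low spam score to a high one purely because of what the filter was fed, which is the weakness described above.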

The SpamBayes database is very large and filtering takes a lot of time. Adding email to the filter also takes time.

A Bayesian filter is not intelligent enough to realize that an email from a CEO (even if it is a forwarded joke) is not spam, especially when he is a close friend (and one who can give me business). The filter is too dumb (naive?) to be a long-term solution.

By and large, naive Bayesian filters have lost against spammers as the sole anti-spam solution. What we need today is an array of filters, conveniently packaged, which can be applied at will to weed out spam. We need to adapt too.
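As a rough illustration of what such an array of filters could look like, here is a minimal Python sketch. Everything in it is an assumption made for the example: the message shape, the filter names, the addresses and the stand-in scoring function are hypothetical, and no existing product is being described. Each filter returns a verdict or passes, and the first verdict wins, so a whitelist can protect a trusted sender before a keyword rule or a Bayesian score ever sees the message.

```python
from typing import Callable, Optional

Message = dict  # hypothetical shape: {"sender": str, "body": str}
Filter = Callable[[Message], Optional[str]]


def whitelist_filter(trusted: set) -> Filter:
    """Accept mail from trusted senders outright."""
    def check(msg):
        return "ham" if msg["sender"].lower() in trusted else None
    return check


def keyword_filter(banned: set) -> Filter:
    """Reject mail whose body contains any banned word."""
    def check(msg):
        words = set(msg["body"].lower().split())
        return "spam" if words & banned else None
    return check


def score_filter(score: Callable[[str], float], threshold: float) -> Filter:
    """Wrap any statistical scorer (for example a Bayesian one)."""
    def check(msg):
        return "spam" if score(msg["body"]) >= threshold else None
    return check


def classify(msg: Message, filters: list, default: str = "ham") -> str:
    # Filters run in order; the first one with an opinion decides.
    for f in filters:
        verdict = f(msg)
        if verdict is not None:
            return verdict
    return default


filters = [
    whitelist_filter({"ceo@example.com"}),
    keyword_filter({"consolidation", "mortgage"}),
    score_filter(lambda body: 0.0, threshold=0.9),  # stand-in scorer
]

joke = {"sender": "ceo@example.com", "body": "fwd: mortgage joke"}
pitch = {"sender": "stranger@example.net", "body": "cheap mortgage rates"}
lunch = {"sender": "friend@example.org", "body": "see you at lunch"}

results = [classify(m, filters) for m in (joke, pitch, lunch)]
print(results)
```

The ordering is the design choice that matters here: putting the whitelist first is what keeps a friend's forwarded joke out of the spam folder even when its wording looks spammy.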

Filed under: Spam Watch, Web

Tags: Debt, Lost

Discussion

achmad

May 23, 2008: 10:48 pm

How can I get the code for the SpamBayes algorithms?

Angsuman

August 13, 2005: 8:03 pm

@Seth
Thanks for the valuable insight about using only the last month’s email in the corpus.

Seth Woolley

August 13, 2005: 12:07 am

My SpamBayes runs through procmail, which autofilters into Maildir folders for mutt, and it is retrained every night at 2 AM on viewed hams and spams (unviewed ones are left out) that have been touched in the last month (old messages are auto-archived after a month, actually). I set the spam threshold to 0.01% spam, and mutt is able to display the individual spam score in the spam folder index.

One in maybe five thousand spams gets through, and I get 9 times as much ham as spam. On average, 1400 spam messages per month get filtered. I do have a percentage of false positives due to the low threshold; however, 90% of the time I don’t want to see them anyway. The rest are from corporate sales people (fewer than a dozen a month) whom I mostly expect messages from already, which I suspect training on my sent mail would help fix.

I really don’t see naive Bayesian filters as a failure. Training on only the last month’s email really helps, as the spam corpus changes frequently, and retraining nightly helps spot spam evolution by learning the latest 80-90% spam scorers (instead of 100%, which 80% of spam registers as).
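For readers who want to try the rolling one-month retraining described in this comment, here is a minimal Python sketch of the selection step only. It is an illustration under assumptions, not Seth's actual procmail setup: the message fields (`viewed`, `touched`, `label`), the function names and the sample mailbox are hypothetical, and the threshold is read as "flag anything scoring at or above 0.01% spam probability".

```python
from datetime import datetime, timedelta


def training_window(messages, now, days=30):
    """Keep only viewed messages touched within the last `days` days."""
    cutoff = now - timedelta(days=days)
    return [m for m in messages if m["viewed"] and m["touched"] >= cutoff]


def is_spam(spam_probability, threshold=0.0001):
    """Very low cutoff: 0.01% expressed as a probability (assumed reading)."""
    return spam_probability >= threshold


now = datetime(2005, 8, 13, 2, 0)  # hypothetical nightly run at 2 AM
mailbox = [
    {"id": 1, "label": "spam", "viewed": True, "touched": now - timedelta(days=3)},
    {"id": 2, "label": "ham", "viewed": False, "touched": now - timedelta(days=1)},
    {"id": 3, "label": "ham", "viewed": True, "touched": now - timedelta(days=45)},
    {"id": 4, "label": "ham", "viewed": True, "touched": now - timedelta(days=10)},
]

# Unviewed message 2 and stale message 3 are left out of the training corpus.
corpus = training_window(mailbox, now)
kept_ids = [m["id"] for m in corpus]
print(kept_ids)
```

The filter would then be rebuilt from `corpus` alone each night, so old spam vocabulary ages out of the model instead of accumulating.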

angsuman

March 19, 2005: 10:45 pm

I will definitely try that. I have now reinstalled SpamBayes as a POP3 filter and am retraining it from scratch. Let’s see if it does better than last time.

Already it is complaining that my spam-to-ham ratio is too high (10 to 1) and that SpamBayes doesn’t give good results in this scenario.

MathiasW

March 19, 2005: 1:46 am

You should all try POPFile, freely available, including Perl source code, at SourceForge.net. It will run on Windows, Linux and maybe many other OSes that have a Perl interpreter available. It has POP3, SMTP, NNTP and even IMAP support and a nice web interface for configuration and training.

I don’t know if it will scale seamlessly to handle 5000 spams daily, but since it’s open source and supports MySQL databases, there is a good chance that such an amount won’t be a problem at all, at least after making some modifications to it.

Personally, I’ve been using this piece of software for about three months now, and while receiving about 2500-3000 spams in a period of 30 days, it achieves an accuracy of 99.58%. This means only 12 spams got through and there was only 1 false positive, while 2,832 spams were blocked successfully.

Simple Thoughts

March 18, 2005: 2:34 pm

How naive bayesian classifier can be made ineffective

A discussion on a failure vector of naive bayesian classifiers…

angsuman

March 18, 2005: 1:03 am

Thanks for the ideas, Glenn. I did retrain once about a year ago. Looks like it’s time again. What thresholds do you use?

Changing my email address is unfortunately not an option because too many people, including my clients, friends etc., have it.

I have been using it for 5-6 years now, maybe more. I still get emails at the old Hotmail address.

glenn

March 18, 2005: 12:48 am

Sounds like you need to re-train your filter or change the threshold. I use SpamBayes to filter out about 1800-2500 spams per day and have been doing so since last spring. So I’m just under your levels. It does suck if you have to restart Outlook all the time, but it’s not so bad once it is up. I used to train it on all the spam that I got, but I’ve gotten a little more particular over time.

Maybe it is time to change email addresses?

angsuman

March 17, 2005: 5:56 pm

For your amount of spam I would say SpamBayes is good enough. It is free.
I get anywhere between 2500 and 5000 spams every day. It is just not scalable enough to handle this huge load.

mark

March 14, 2005: 8:37 am

For Outlook I use Matador. It cost around $30 but was well worth it. I can tell you that I get about 300 spams over a weekend. Matador catches about 95% of that. I’m still looking for a good one that is free. -Mark

Copyright© 2010 The Gaea Times