Return-Path: guido@python.org
Delivery-Date: Fri Sep  6 16:05:22 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 06 Sep 2002 11:05:22 -0400
Subject: [Spambayes] Deployment
In-Reply-To: Your message of "Fri, 06 Sep 2002 11:02:11 EDT."
             <3D788B92.22739.1D9E0FD1@localhost> 
References: "Your message of Fri, 06 Sep 2002 10:39:48 EDT."
	<3D788653.9143.1D8992DA@localhost>  
	<3D788B92.22739.1D9E0FD1@localhost> 
Message-ID: <200209061505.g86F5MM14762@pcp02138704pcs.reston01.va.comcast.net>

> > What's an auto-ham?
> 
> Automatically marking something as ham after a given
> timeout.. regardless of how long that timeout is, someone is going
> to forget to submit the message back as spam.

OK, here's a refinement.  Assuming very little spam comes through, we
only need to pick a small percentage of the ham received as new training
ham to match the new training spam.  The program could randomly select
a sufficient number of saved non-spam messages and ask the user to
validate this selection.  You could do this once a day or once a week
(config parameter).
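
A rough sketch of that selection step, in Python.  This is only an
illustration of the idea above, not existing Spambayes code: the names
pick_ham_for_review and confirm_ham, and the is_ham callback standing
in for the user's answer, are all made up for the example.

```python
import random


def pick_ham_for_review(saved_ham, n_new_spam, rng=None):
    """Randomly pick as many saved ham messages as there are new spam.

    saved_ham  -- list of messages that were not flagged as spam
    n_new_spam -- number of new training spam we want to balance
    rng        -- optional random.Random, handy for reproducible runs
    """
    if rng is None:
        rng = random.Random()
    # Never ask for more messages than we actually have saved.
    k = min(n_new_spam, len(saved_ham))
    return rng.sample(saved_ham, k)


def confirm_ham(candidates, is_ham):
    """Keep only the candidates the user confirms as ham.

    is_ham is a hypothetical callback that asks the user about one
    message and returns True if it really is ham.
    """
    return [msg for msg in candidates if is_ham(msg)]
```

Only the messages that survive confirm_ham would be added to the
training ham, so a spam that slipped into the saved folder gets a
chance to be caught before it is trained on.  How often this runs
would be the config parameter mentioned above.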

> How many spams-as-hams can be accepted before the f-n rate gets
> unacceptable?

Config parameter.

> I view IMAP as a stop-gap measure until tighter integration with
> various email clients can be achieved.
> 
> I still feel it's better to require classification feedback from the
> recipient, rather than make any assumptions after some period of
> time passes. But this is an end-user issue and we're still at the
> algorithm stage.. ;-)

I'm trying to think about the end-user issues because I have nothing
to contribute to the algorithm at this point.  For deployment we need
both!

--Guido van Rossum (home page: http://www.python.org/~guido/)