Return-Path: skip@pobox.com
Delivery-Date: Mon Sep  9 17:31:12 2002
From: skip@pobox.com (Skip Montanaro)
Date: Mon, 9 Sep 2002 11:31:12 -0500
Subject: [Spambayes] deleting "duplicate" spam before training?  good idea or
	bad?
Message-ID: <15740.52432.861148.597750@12-248-11-90.client.attbi.com>


Because I get mail through several different email addresses, I frequently
get duplicates (or triplicates or more-plicates) of various spam messages.
In saving spam for later analysis I haven't always been careful to avoid
saving such duplicates.

I wrote a script some time ago to try to minimize the duplicates I see by
calculating a loose checksum, but I still have some duplicates.  Should I
delete the duplicates before training or not?  Would people be interested in
the script?  I'd be happy to extricate it from my local modules and check it
into CVS.
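
For anyone unsure what a "loose checksum" means here, the following is a
rough sketch of the idea, not the script mentioned above. It assumes that
duplicates differ only in headers, letter case, digits, punctuation, and
spacing. The names loose_checksum and drop_duplicates, the choice of MD5,
and the normalization rules are all illustrative assumptions.

```python
import hashlib
import re
from email import message_from_string


def loose_checksum(raw_message):
    """Return a hex digest of the body that ignores case, digits,
    punctuation, and whitespace differences (hypothetical helper)."""
    msg = message_from_string(raw_message)
    # Headers (To, Received, Message-ID) differ between copies of the
    # same spam, so only the first non-multipart body part is hashed.
    body = ""
    for part in msg.walk():
        if not part.is_multipart():
            body = part.get_payload()
            break
    # Assumed normalization: keep only runs of letters, one space apart.
    body = re.sub(r"[^a-z]+", " ", body.lower()).strip()
    return hashlib.md5(body.encode("ascii")).hexdigest()


def drop_duplicates(raw_messages):
    """Keep the first message seen for each loose checksum."""
    seen = set()
    unique = []
    for raw in raw_messages:
        key = loose_checksum(raw)
        if key not in seen:
            seen.add(key)
            unique.append(raw)
    return unique
```

A normalization this aggressive will also merge messages that differ only
in numbers (prices, dates), so it may be too loose for some corpora.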

Skip