From: Anthony Howe
Date: 2004-10-08 04:50:29 -0400
Subject: Re: HAM and milter-spamc
More information..: http://www.milter.info/#Support
April Lorenzen wrote:
> Hi Tudor,
> I wanted to feed my users outbound mail as ham to teach spam assassin also.
> We have a dedicated outbound server, which is postfix and offers a BCC
> everything option. I set that to email@example.com. Then set up crons and
> scripts to remove certain things from the resulting mailbox, then
> periodically run the learning / training command and then deleted the
Just yesterday I installed a similar setup for false-negatives on an SMTP
/ POP / IMAP / WebMail machine. I set up an IMAP #shared/Spam mailbox that
users of the machine can subscribe to and simply move / file spam
messages into this community folder. A cronjob then retrains SA every so
often and empties the box.
A similar thing can be done for ham.
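For illustration, the cron side of this could look roughly like the
sketch below. It assumes the shared folder is stored as a single mbox
file and that sa-learn is on the PATH; a Maildir-based IMAP server would
need sa-learn's directory mode instead. The function names and the path
in the usage note are made up for the example.

```python
import subprocess
from pathlib import Path


def build_learn_command(mailbox, kind):
    """Return the sa-learn argv that trains on an mbox file as 'spam' or 'ham'."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", str(mailbox)]


def retrain_and_empty(mailbox, kind, run=subprocess.run):
    """Train on the mailbox, then empty it only if training succeeded."""
    mailbox = Path(mailbox)
    if not mailbox.exists() or mailbox.stat().st_size == 0:
        return False  # nothing to learn from
    # check=True raises if sa-learn fails, so the box is not emptied.
    run(build_learn_command(mailbox, kind), check=True)
    # Truncate rather than delete so ownership and permissions are kept.
    # Assumption: no locking is done here; a real job should lock the
    # mailbox against the IMAP server while it runs.
    mailbox.write_bytes(b"")
    return True
```

A cron entry would then call something like
retrain_and_empty("/var/mail/shared/Spam", "spam"), with a second call
using "ham" for a community ham folder.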
The above solution works fine for an all-in-one mail server such as a
private user or moderate business might set up.
The problems with the above solution are:
a) SquirrelMail web interface has no direct means to subscribe to IMAP
b) Only works well with IMAP profiles. POP profiles have no solution.
c) Outlook Repress does not appear to support IMAP namespaces; Opera
(free edition) does not support IMAP namespaces, though I'm told the
paid-for version does.
>>Is it any way to use milter-spamc to feed ham to spamassassin from the
>>analysis of email messages originating from my internal users ?
No. And not likely to either. There are several problems with this:
i) Most importantly, the SPAMD protocol does not have a means of
retraining. It is designed only for testing. This would mean invoking yet
another external Perl process, sa-learn, to retrain for ham/spam.
ii) It would probably be easier to set up two Sendmail aliases (ham/spam)
that invoke sa-learn for an inbound message. This has to be protected
from outside influences. You don't want a spammer to try and seed SA
with their message as ham before a spam run.
iii) The other issue with forwarding to a ham/spam retraining address
is that when you forward from Outlook Redress, the default
method does NOT include all the mail headers of the original
message, so many tests in SA will not be correctly applied. You have to
"teach" your users to "forward as attachment" in Outlook Suppress.
iv) Also when you forward mail from any mail client, you would have to
extract the original message attachment, essentially strip off the
forwarder's message wrapping so that you only retrain on the original
message and don't include the forwarder's address and headers.
v) Another major factor is most users are too lazy or technophobic to
bother forwarding spam to a retraining address. A postmaster that
collects spam in a junk box and wants to reprocess false-positives can
sort those messages into another mail box (using IMAP) and then use
a script to retrain SA for ham and use contrib/resend.pl to reintroduce
the mail into the system. We did this for a year, eye-balling about 8000
message subjects every two or three days. This consumed enormous amounts
of time and the number of messages we had to go through just kept
increasing. We gave up and switched to outright rejections of spammy
mail (-r 0), preferring to bounce something to the sender to deal with.
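Points ii) through iv) can be sketched as a small filter that an alias
could pipe mail through before sa-learn. This is only an illustration:
it assumes users "forward as attachment" so the original arrives as a
message/rfc822 part, and it uses a From: domain allow-list as a first
check. The function names and the local_domains parameter are
hypothetical.

```python
from email import message_from_bytes, policy
from email.utils import parseaddr


def sender_is_local(wrapper, local_domains):
    """Check the forwarder's From: domain against an allow-list.

    Assumption: this is only a first filter. From: is trivially forged,
    so the alias should also be unreachable from outside the site.
    """
    _, addr = parseaddr(str(wrapper.get("From", "")))
    domain = addr.rpartition("@")[2].lower()
    return domain in local_domains


def extract_original(wrapper):
    """Return the first message/rfc822 attachment as bytes, or None."""
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-item list
            # holding the original message, headers included.
            return part.get_payload(0).as_bytes()
    return None


def message_to_learn(raw, local_domains):
    """Return the original message to hand to sa-learn, or None to discard."""
    wrapper = message_from_bytes(raw, policy=policy.default)
    if not sender_is_local(wrapper, local_domains):
        return None
    return extract_original(wrapper)
```

An inline forward (no attachment) returns None here and is discarded,
which is the safer default given the missing headers described in iii).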
Retraining any statistically based solution that is hosted on a
different mail server from the recipient's server or workstation is very
difficult and convoluted.
One project I've been considering for some time now would be a milter
that logs rejections and sends notices at regular intervals to the
recipients so that users can at the very least know who was rejected (and
maybe the subject). April Lorenzen has reminded me of the idea on many
occasions, but it's non-trivial to implement because sendmail does not
report the final delivery status of a message to the milters. It has to
be deduced. Building that only into milter-spamc would only benefit
rejects by that milter. If you have multiple milters or sendmail rules
that could reject a message, then they would not be reported. The trick
is to identify all the rejects, without resorting to mail log scanning.
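The reporting half of such a milter is the easy part and might look
like the sketch below. The record layout and function names are
hypothetical, and the hard part described above, deducing which
messages were finally rejected, is assumed to have happened already.

```python
from collections import defaultdict


def group_rejections(records):
    """Group (recipient, sender, subject) rejection records by recipient."""
    by_recipient = defaultdict(list)
    for recipient, sender, subject in records:
        # Lower-case the key so Bob@ and bob@ share one notice.
        by_recipient[recipient.lower()].append((sender, subject))
    return dict(by_recipient)


def format_notice(recipient, rejected):
    """Build the plain-text body of one recipient's periodic notice."""
    lines = ["Mail rejected for %s since the last notice:" % recipient, ""]
    for sender, subject in rejected:
        lines.append("  from %s: %s" % (sender, subject or "(no subject)"))
    return "\n".join(lines) + "\n"
```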
Bayes is nice when it works, but a real pain in the arse to keep
properly tuned either for in-duh-viduals or site wide. Either another
technique is required, or a protocol (and mail client UI) change, to make
retraining a server based solution easy to do. So far Mozilla has the
best personal Bayes identifier and interface solution for
in-duh-viduals, but this doesn't help with MX based solutions.
Anthony C Howe +33 6 11 89 73 78
7116561 AIM: Sir Wumpus
"Once...we were here." - Last of The Mohicans