[milters] Archive

Article: 135
From: Anthony Howe
Date: 2004-10-08 04:50:29 -0400
Subject: Re: HAM and milter-spamc

April Lorenzen wrote:
> Hi Tudor,
> 
> I wanted to feed my users outbound mail as ham to teach spam assassin also.
> 
> We have a dedicated outbound server, which is postfix and offers a BCC
> everything option. I set that to ham@myinbound.com. Then set up crons and
> scripts to remove certain things from the resulting mailbox, then
> periodically run the learning / training command and then deleted the
> mailbox.

Just yesterday I installed a similar setup for false negatives on an SMTP 
/ POP / IMAP / WebMail machine. I set up an IMAP #shared/Spam mailbox that 
users of the machine can subscribe to and simply move / file spam 
messages into this community folder. A cron job then retrains SA every so 
often and empties the box.

A similar thing can be done for ham.
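
As a rough illustration of the cron job described above, here is a
minimal Python sketch. It assumes the shared folder is stored as a
single mbox file (many IMAP servers use Maildir instead, in which case
sa-learn would be pointed at the directory rather than given --mbox),
and it ignores mailbox locking. The function names and the path in the
usage note are hypothetical, not part of any existing setup.

```python
import subprocess
from pathlib import Path


def build_sa_learn_command(mbox_path, kind):
    """Return the sa-learn argument list for an mbox of spam or ham."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["sa-learn", "--" + kind, "--mbox", str(mbox_path)]


def retrain_and_empty(mbox_path, kind, runner=subprocess.run):
    """Feed the mailbox to sa-learn, then truncate it.

    Returns False without doing anything when the mailbox is missing or
    empty. No locking is done here (an assumption of this sketch), so a
    message filed between the sa-learn run and the truncation would be
    lost; a real job should lock the mailbox first.
    """
    path = Path(mbox_path)
    if not path.exists() or path.stat().st_size == 0:
        return False
    # check=True raises CalledProcessError if sa-learn fails, so the
    # mailbox is only emptied after a successful training run.
    runner(build_sa_learn_command(path, kind), check=True)
    path.write_bytes(b"")
    return True
```

A cron entry would call something like
retrain_and_empty("/path/to/shared/Spam", "spam"), and the ham variant
differs only in the second argument.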

>  ...

The above solution works fine for an all-in-one mail server such as a 
private user or moderate-sized business might set up.

The problems with the above solution are:

a) SquirrelMail web interface has no direct means to subscribe to IMAP 
namespaces.

b) Only works well with IMAP profiles. POP profiles have no solution.

c) Outlook Repress does not appear to support IMAP namespaces; Opera 
(free edition) does not support IMAP namespaces, though I'm told the 
paid-for version does.

>>Is it any way to use milter-spamc to feed ham to spamassassin from the
>>analysis of email messages originating from my internal users ?

No. And not likely to either. There are several problems with this:

i) Most importantly, the SPAMD protocol does not have a means of 
retraining. It's designed only for testing. This would mean invoking yet 
another external Perl process, sa-learn, to retrain for ham/spam.

ii) It would probably be easier to set up two Sendmail aliases (ham/spam) 
that invoke sa-learn on an inbound message. These have to be protected 
from outside influences. You don't want a spammer to seed SA with their 
message as ham before a spam run.

iii) The other issue with the technique of forwarding to a ham/spam 
retraining address is that when you forward from Outlook Redress, the 
default method does NOT include all the mail headers of the original 
message, so many tests in SA will not be correctly applied. You have to 
"teach" your users to "forward as attachment" in Outlook Suppress.

iv) Also when you forward mail from any mail client, you would have to
extract the original message attachment, essentially strip off the 
forwarder's message wrapping so that you only retrain on the original 
message and don't include the forwarder's address and headers.

v) Another major factor is that most users are too lazy or technophobic 
to bother forwarding spam to a retraining address. A postmaster who 
collects spam in a junk box and wants to reprocess false positives can 
sort those messages into another mailbox (using IMAP), then use a 
script to retrain SA for ham and use contrib/resend.pl to reintroduce 
the mail into the system. We did this for a year, eyeballing about 8000 
message subjects every two or three days. This consumed enormous amounts 
of time and the number of messages we had to go through just kept 
increasing. We gave up and switched to outright rejection of spammy 
mail (-r 0), preferring to bounce something to the sender to deal with.
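
To make point iv concrete, here is a minimal Python sketch of stripping
the forwarder's wrapper from a "forward as attachment" message, using
only the standard email package. The function name extract_original is
hypothetical, and the sketch assumes the client attached the original as
a message/rfc822 part. Inline forwards, which have already lost the
original headers as noted in point iii, are not handled and yield None.

```python
from email import message_from_bytes, policy


def extract_original(forwarded_bytes):
    """Return the first attached message/rfc822 part as bytes, or None.

    The result is re-serialized by the email package, so it may not be
    byte-for-byte identical to what the user originally received (for
    example, very long header lines can be refolded).
    """
    wrapper = message_from_bytes(forwarded_bytes, policy=policy.default)
    for part in wrapper.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element
            # list holding the embedded message object.
            inner = part.get_payload(0)
            return inner.as_bytes()
    return None
```

The bytes returned contain only the original sender's headers and body,
so they could be handed to sa-learn without also training on the
forwarder's address and headers.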

Retraining any statistically based solution that is hosted on a 
different mail server from the recipient's server or workstation is very 
difficult and convoluted.

One project I've been considering for some time now would be a milter 
that logs rejections and sends notices at regular intervals to the 
recipients so that users can at the very least know who was rejected (and 
maybe the subject). April Lorenzen has reminded me of the idea on many 
occasions, but it's non-trivial to implement because sendmail does not 
report the final delivery status of a message to the milters. It has to 
be deduced. Building that into milter-spamc alone would only cover 
rejects by that milter. If you have multiple milters or sendmail rules 
that could reject a message, then those would not be reported. The trick 
is to identify all the rejects, without resorting to mail log scanning.
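
The notification half of that idea is the easy part and can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions: the Reject record and build_digests function are
hypothetical names, and the sketch assumes some other component has
already collected the rejects, which is exactly the hard problem
described above and is not solved here.

```python
from collections import defaultdict
from typing import NamedTuple


class Reject(NamedTuple):
    """One rejected delivery attempt (hypothetical record layout)."""
    recipient: str
    sender: str
    subject: str


def build_digests(rejects):
    """Group rejects by recipient and return {recipient: notice text}."""
    by_recipient = defaultdict(list)
    for reject in rejects:
        # Lower-case the address so differently cased spellings of the
        # same recipient end up in a single digest.
        by_recipient[reject.recipient.lower()].append(reject)

    digests = {}
    for recipient, items in by_recipient.items():
        lines = ["%d message(s) addressed to you were rejected:" % len(items)]
        for item in items:
            subject = item.subject or "(no subject)"
            lines.append("  from %s: %s" % (item.sender, subject))
        digests[recipient] = "\n".join(lines)
    return digests
```

Each value in the returned dictionary is the body of one periodic notice
listing who was rejected and, where known, the subject.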

Bayes is nice when it works, but a real pain in the arse to keep 
properly tuned, either for in-duh-viduals or site wide. Either another 
technique is required, or a protocol (and mail client UI) change to make 
retraining a server-based solution easy to do. So far Mozilla has the 
best personal Bayes identifier and interface solution for 
in-duh-viduals, but this doesn't help with MX-based solutions.

-- 
Anthony C Howe                                 +33 6 11 89 73 78
http://www.snert.com/       ICQ: 7116561         AIM: Sir Wumpus

            "Once...we were here."  - Last of The Mohicans

