[milters] Archive

Article: 1348
From: Michael Elliott
Date: 2006-12-04 17:33:46 -0500
Subject: new milter idea: milter-random-named-file

Hello Anthony.  I have a request for a new milter to put on your 
"whenever I get to it" pile.  Just let me know if that would be January
or September.  ;-)  It is intended to go after the pump-and-dump 
stock scams.  That is the only real class of problem email I am having
today at the ISP level.  This will allow the first few messages through
as it auto learns.  Then it will block the rest of the garbage.

The specs would be the following:

1) Message body filter, so the only possible outcomes are accept or a
   550 reject.

2) a) if there is an attachment, 
   b) and the file attachment type is an image, 
   c) compute the file_length and crc32 of the image.
   Note: a crc32 of the mime attachment data segment is sufficient 
   and does not require mime decoding.  Skip the mime headers, as
   they are randomized.

3) a) Store filename:file_length:crc32 in a database with timestamp of 
      first encounter, last occurrence, and number of occurrences as the 
      data value.
   b) Store file_length:crc32 in the database with 
      first_occurrence:last_occurrence:number as a second database entry. 
      We are omitting the filename here so we can count total views 
      versus filenames.

4) update 3a and 3b database entries.
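
As a rough illustration of steps 2 through 4, here is a minimal Python
sketch.  It is not part of the request itself: the function names
(image_fingerprints, learn), the in-memory dicts standing in for the
database, and the use of the encoded segment's length as file_length are
all assumptions made for the example.

```python
import time
import zlib
from email import message_from_string


def image_fingerprints(raw_message):
    """Return (filename, length, crc32) for each image part of a message.

    The CRC is taken over the attachment data segment exactly as it
    appears on the wire (still base64 text); no MIME decoding is done
    and the part's own MIME headers are not included.
    """
    msg = message_from_string(raw_message)
    found = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        # decode=False keeps the transfer-encoded text untouched.
        data = part.get_payload(decode=False).encode("ascii", "replace")
        found.append((part.get_filename() or "", len(data), zlib.crc32(data)))
    return found


def record(db, key, now):
    """Create or update one entry: (first seen, last seen, count)."""
    first, _last, count = db.get(key, (now, now, 0))
    db[key] = (first, now, count + 1)


def learn(by_name, by_crc, fingerprint, now=None):
    """Step 4: update both the 3a (with filename) and 3b (without) entries."""
    now = time.time() if now is None else now
    filename, length, crc = fingerprint
    record(by_name, f"{filename}:{length}:{crc:08x}", now)  # 3a key
    record(by_crc, f"{length}:{crc:08x}", now)              # 3b key
```

Two messages carrying the same image under different random filenames
would then produce two 3a entries with a count of 1 each, and a single
3b entry with a count of 2.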

Then, as each new email comes in, if we get a match on 3a and a match
on 3b, compare the number of occurrences for each.  If they are the
same, or within an order of magnitude, PASS the message, because it is
most likely a footer image from the likes of yahoo or msn.

If we do not get a match on 3a, but do get a database lookup match on 3b,
perform step 4, and 550 Fail the message if it is over a predefined
threshold, say 10.

If we get a match on 3a that has a low count number, and a match on 3b
with a high count number, on the order of 10x, then 550 Fail the message,
as the filename was repeated, but the source file was used under at least
10 different names.

No match in 3a or 3b, pass the message.  Consider that a new, clean,
one time picture to grandma.
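
The four cases above could be sketched roughly as follows, in Python.
This is only one reading of the rules: the function name decide, the
plain dicts mapping keys to counts, and the choice to judge on the
counts as they stood before the current message is recorded are
assumptions, and the defaults are borrowed from the config options
below.

```python
def decide(by_name, by_crc, filename, length, crc,
           crc_threshold=5, crc_vs_filename_ratio=5):
    """Return "pass" or "reject" for one image attachment.

    by_name maps (filename, length, crc) to a count  -> the 3a entries
    by_crc  maps (length, crc) to a count            -> the 3b entries
    """
    name_count = by_name.get((filename, length, crc))
    crc_count = by_crc.get((length, crc))

    # No match in 3a or 3b: a new, clean, one-time picture.
    if crc_count is None:
        return "pass"

    # No 3a match but a 3b match: same image under a new filename.
    if name_count is None:
        return "reject" if crc_count > crc_threshold else "pass"

    # Both match: a low 3a count against a high 3b count means the
    # image has been sent under many different names.
    if crc_count >= crc_vs_filename_ratio * name_count:
        return "reject"

    # Counts are the same or close: most likely a footer image.
    return "pass"
```

A footer image seen 100 times under one name passes, while an image
seen 40 times in total but only twice under the current name is
rejected.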

config options: 
1) skip_multiple_images: skip message if more than one image exists as
   an attachment.  This stops the filling of pics to grandma into the 
   database.  optional on or off, and not expected to really affect the 
   usefulness of the filter.  As the enemy gets smarter, we will have to
   turn this off as sysadmin.

2) crc_threshold: how many times do we have to see the message in 3b without
   a 3a entry before we start rejecting email.  5 default

3) crc_vs_filename_ratio: 3b/3a how many filenames do we have to see
   the image as before we start rejecting email.  5 default

4) gc_3a_flush_stale: When to flush old entries based on last occurrence 
   in 3a entries.  suggest 3 weeks

5) gc_3b_flush_stale: When to flush old entries based on last occurrence 
   in 3b entries.  suggest 3 weeks, I generally see the same file for about
   a week.

6) reject_message: "550: The message was rejected because it contains the image 
   %filename which has been seen with too many different filenames."
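
To make the options concrete, here is a small Python sketch of how they
might be held and used.  The Config class, flush_stale, and
format_reject are hypothetical names; the ages being in seconds and
skip_multiple_images defaulting to on are also assumptions, since the
list above only says "on or off" and "3 weeks".

```python
from dataclasses import dataclass

THREE_WEEKS = 21 * 24 * 60 * 60  # seconds


@dataclass
class Config:
    skip_multiple_images: bool = True      # assumed default
    crc_threshold: int = 5
    crc_vs_filename_ratio: int = 5
    gc_3a_flush_stale: int = THREE_WEEKS
    gc_3b_flush_stale: int = THREE_WEEKS
    reject_message: str = (
        "550: The message was rejected because it contains the image "
        "%filename which has been seen with too many different filenames."
    )


def flush_stale(db, max_age, now):
    """Drop entries whose last occurrence is older than max_age seconds.

    db maps a key to (first seen, last seen, count); returns how many
    entries were removed.
    """
    stale = [key for key, (_first, last, _count) in db.items()
             if now - last > max_age]
    for key in stale:
        del db[key]
    return len(stale)


def format_reject(cfg, filename):
    """Fill the %filename placeholder in the reject message."""
    return cfg.reject_message.replace("%filename", filename)
```

The garbage collection would be run separately on the 3a and 3b
databases, each with its own gc_*_flush_stale value.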

This idea should work to kill a lot of what is floating around today, and
will be defeated when the enemy starts using image creation tools to 
modify the outgoing image for every message.  Today, while it is possible,
they are not spending that much cpu power to generate their message.
They are randomizing text sections currently, but not the image files other
than filename.  I admit this will only give us 6 months of use in the 
arms race, but it would be an effective weapon.  It will also pick off the 
occasional virus if you leave the attachment types a little more open.

-Mike Elliott
