[milters] Archive


Article: 1352
From: Anthony Howe
Date: 2006-12-05 17:25:57 -0500
Subject: Re: new milter idea: milter-random-named-file

Michael Elliott wrote:
> 1) Message Body filter, so only accept or 550 reject possible

As mentioned, tagging is an option.

> 2) a) if there is an attachment, 
>    b) and the file attachment type is an image, 
>    c) compute the file_length and crc32 of the image.
>    Note: a crc32 of the mime attachment data segment is sufficient 
>    and does not require mime decoding.  Skip the mime headers, as
>    they are randomized.
> 3) a) Store filename:file_length:crc32 in a database with timestamp of 
>       first encounter, last occurrence, and number of occurrences as the 
>       data value.

This is problematic. Consider that most pump & dump image-based spam uses 
a cid: reference, so there are no filenames to speak of, and the cid: will 
probably change between runs, or even with every message (you don't care 
about CPU when you're using someone else's computer).

So I would try to express your logic without requiring two different 
cache entries, just one. Instead of crc32 I would use MD5, as collisions 
are less likely than with, say, the POSIX crc32 formula.
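
As a rough illustration of the length + MD5 idea, here is a minimal Python
sketch that hashes each image part's still-encoded data segment, skipping
the MIME headers and doing no MIME decoding. The function name image_keys
is made up for this example, and a real milter would work on body chunks
as they arrive rather than on a whole message in memory.

```python
import hashlib
from email import message_from_bytes


def image_keys(raw_message: bytes):
    """Return a (length, md5_hex) pair for every image part.

    The hash covers the encoded data segment only (for example the
    base64 text), so randomized MIME headers such as the filename or
    Content-ID do not change the key.
    """
    msg = message_from_bytes(raw_message)
    keys = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        body = part.get_payload(decode=False)  # still encoded text
        data = body.encode("utf-8", "surrogateescape")
        keys.append((len(data), hashlib.md5(data).hexdigest()))
    return keys
```

Two messages that carry the same encoded image under different cid: values
would produce the same key with this approach.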

Matter of fact you could just replace the filename with the sender 
address used. Bet that gets changed more often.

There might be some use too in noting the recipients, particularly if a 
message with image spam is addressed to non-existent recipients, because 
then that's almost a sure sign it's spam. If the same sender sends a 
message to N different recipients in different domains, odds are it will 
be spam.
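
The recipient checks could be sketched roughly as below. This is only an
illustration: the names distinct_domains and recipient_score, the is_valid
callback and the default of 3 domains are all assumptions, since no value
for N is given here.

```python
def distinct_domains(recipients):
    """Return the set of lower-cased recipient domains."""
    return {r.rsplit("@", 1)[-1].lower() for r in recipients if "@" in r}


def recipient_score(recipients, is_valid, max_domains=3):
    """Count warning signs (0 to 2) for one sender's recipient list.

    is_valid is a caller-supplied function that says whether a
    recipient exists; max_domains stands in for the unspecified N.
    """
    has_unknown = any(not is_valid(r) for r in recipients)
    wide_spread = len(distinct_domains(recipients)) >= max_domains
    return int(has_unknown) + int(wide_spread)
```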

> 1) skip_multiple_images: skip message if more than one image exists as
>    an attachment.  This stops the filling of pics to grandma into the 
>    database.  optional on or off, and not expected to really affect the 
>    usefulness of the filter.  As the enemy gets smarter, we will have to
>    turn this off as sysadmin.

You don't really want this. Pump & dump image spam this summer started 
slicing the image into multiple parts and displaying them in a table or 
with CSS to defeat OCR methods in SA at the time.

I think you want to skip attached images only if the message has no 
text/html part or they are not referenced by a cid:. Then again, it's very 
unlikely that grandma will get 10x the same picture.
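
One way to read that rule, as a hedged Python sketch: keep only the images
whose Content-ID is referenced as cid: from a text/html part, and skip
everything else. The function name cid_images is hypothetical, and the
substring match on the HTML is a simplification of real HTML parsing.

```python
from email import message_from_bytes


def cid_images(raw_message: bytes):
    """Return the Content-IDs of images referenced from an HTML part.

    An empty list means the message has no text/html part or none of
    its images are used via cid:, so the images would be skipped.
    """
    msg = message_from_bytes(raw_message)
    html = ""
    for part in msg.walk():
        if part.get_content_type() == "text/html":
            html += part.get_payload(decode=True).decode("latin-1")
    if not html:
        return []
    found = []
    for part in msg.walk():
        if part.get_content_maintype() != "image":
            continue
        cid = str(part.get("Content-ID") or "").strip().strip("<>")
        if cid and ("cid:" + cid) in html:
            found.append(cid)
    return found
```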

You could combine this method with the SpamAssassin ImageInfo technique, 
which reads the image headers to extract image dimensions and compute 
surface area coverage. That plugin excludes screen/image sizes that are 
commonly generated by digital cameras: (320,240) (640,480) (800,600) 
(1024,768) ...
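
To show what reading dimensions from the image headers involves, here is a
small Python sketch for GIF and PNG only. It is not the ImageInfo plugin's
code; the names image_size and is_camera_size are made up, and the size
set holds just the four sizes listed above.

```python
import struct

# Only the sizes named in this message; a real list would be longer.
CAMERA_SIZES = {(320, 240), (640, 480), (800, 600), (1024, 768)}


def image_size(data: bytes):
    """Return (width, height) from a decoded GIF or PNG header, else None."""
    if data[:6] in (b"GIF87a", b"GIF89a") and len(data) >= 10:
        # GIF stores width and height as little-endian 16-bit values.
        return struct.unpack("<HH", data[6:10])
    if (data[:8] == b"\x89PNG\r\n\x1a\n" and len(data) >= 24
            and data[12:16] == b"IHDR"):
        # PNG stores them as big-endian 32-bit values in the IHDR chunk.
        return struct.unpack(">II", data[16:24])
    return None


def is_camera_size(size):
    """True if the size matches one of the common camera/screen sizes."""
    return size in CAMERA_SIZES
```

Unlike the length + MD5 key, this needs the decoded image bytes, so it
costs a base64 decode of at least the first few lines of the attachment.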

> 2) crc_threshold: how many times do we have to see the message in 3b without
>    a 3a entry before we start rejecting email.  5 default
> 3) crc_vs_filename_ratio: 3b/3a how many filenames do we have to see
>    the image as before we start rejecting email.  5 default

Again, most image spam has no filename, just cid: references.

Now if you keyed 3a on { sender, length, md5 } and 3b on { length, md5 } 
you could probably even lower this number to two. Of course there are 
always those silly chain letters that people insist on passing around, 
but I wouldn't mind killing those off ;-)
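
A minimal in-memory Python sketch of that keying follows. It takes one
possible reading of the threshold (the number of distinct senders seen for
the same image); the class name ImageCache and its method are invented for
the example, and a real milter would use a persistent database.

```python
class ImageCache:
    """Track images by (sender, length, md5) and by (length, md5)."""

    def __init__(self, sender_threshold=2):
        self.sender_threshold = sender_threshold
        self.hits_3a = {}     # (sender, length, md5) -> times seen
        self.senders_3b = {}  # (length, md5) -> distinct senders seen

    def observe(self, sender, length, md5):
        """Record one sighting; return True if the image looks like spam."""
        key_a = (sender.lower(), length, md5)
        key_b = (length, md5)
        if key_a not in self.hits_3a:
            # First time this sender used this image: one more 3a entry
            # exists for the 3b key.
            self.senders_3b[key_b] = self.senders_3b.get(key_b, 0) + 1
        self.hits_3a[key_a] = self.hits_3a.get(key_a, 0) + 1
        return self.senders_3b[key_b] >= self.sender_threshold
```

With the threshold at two, the second distinct sender of the same image
trips the check, while repeats from a single sender do not.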

> 4) gc_3a_flush_stale: When to flush old entries based on last occurrence 
>    in 3a entries.  suggest 3 weeks
> 5) gc_3b_flush_stale: When to flush old entries based on last occurrence 
>    in 3b entries.  suggest 3 weeks, I generally see the same file for about
>    a week.

Sounds excessive to me. Most runs will change within days, so you 
probably don't need more than a week's retention.
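
For what it's worth, a one-week flush could look like this Python sketch.
It assumes the cache is a plain dict mapping each key to its last-seen Unix
timestamp; the name flush_stale and that layout are assumptions for the
example.

```python
import time

WEEK = 7 * 24 * 3600  # one week of retention, in seconds


def flush_stale(last_seen, max_age=WEEK, now=None):
    """Delete entries not seen within max_age seconds; return the count."""
    now = time.time() if now is None else now
    stale = [key for key, ts in last_seen.items() if now - ts > max_age]
    for key in stale:
        del last_seen[key]
    return len(stale)
```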

> 6) reject_message: "550: The message was rejected because it contains the image 
>    %filename which has been seen with too many different filenames."
> This idea should work to kill a lot of what is floating around today, and
> will be defeated when the enemy starts using image creation tools to 
> modify the outgoing image for every message.  Today, while it is possible,

I think they have already started to do this with botnets, since they 
don't need to worry about the cpu usage of a zombie machine, only that 
the messages go out.

> they are not spending that much cpu power to generate their message.
> They are randomizing text sections currently, but not the image files other

I've seen image spam that plays with similar colours, streaks, random 
dots, fuzzy backgrounds, etc. in order to defeat OCR.

> than filename.  I admit this will only give us 6 months of use in the 
> arms war, but it would be an effective weapon.  It will also pick off the 
> occasional virus if you leave the attachment types a little more open.

Actually for picking off viruses it might be more useful long term, 
since you can reduce your dependence and load on signature based scanners.

Anthony C Howe          Skype: SirWumpus                    SnertSoft
+33 6 11 89 73 78         AIM: SirWumpus    Sendmail Milter Solutions
http://www.snert.com/     ICQ: 7116561
