[milters] Archive


Article: 670
From: Anthony Howe
Date: 2005-07-29 07:49:38 -0400
Subject: Re: White list pattern matching...

Taylor, Grant wrote:
>> Two possible methods:
>> 
>> a) Create a separate file containing a list of regex/value pairs to
>> test. This means restarting the milter whenever this file changes, to
>> reload the patterns into memory. I dislike this solution for some
>> reason that escapes me now.
> 
> 
> This does sound like a kludge.  Seeing as the patterns that I
> would have to match are a series of numeric strings seven, five, and
> ten digits long and very unpredictable, it would be VERY inefficient
> to try to cover all possible permutations in the included file.  Thus
> this solution is really not much of one at all.
> 
> 
>> b) More of a kludge, but keeps everything central in access.db. I
>> have considered a special lookup like the following, where the
>> right-hand side would be handled specially:
>>
>> milter-NAME-from:domain.com /pattern/action /pattern/action ... [default-action]
>>
>> or maybe
>>
>> milter-NAME-TAG:sender@ /pattern/action /pattern/action ... [default-action]
>> 
>> For example:
>> 
>> milter-sender-from:yahoogroups.com /goof-[0-9]+-ball@/REJECT /tosser-[a-z0-9]+-user@/REJECT OK
>> 
>> milter-sender-connect:192.168		/.0.[0-9]+$/OK REJECT
> 
> 
> This may be the "kludge" that I'm needing.  I would think that you
> would want to implement something like this:
> 
> milter-date-From:/sender-[0-9]+-[0-9]+-[0-9]+-gtaylor=riverviewtech.net@returns.groups.yahoo.com/OK REJECT
> 
> Or possibly something more like:
> 
> milter-date-RegEx-From:sender-[0-9]+-[0-9]+-[0-9]+-gtaylor=riverviewtech.net@returns.groups.yahoo.com OK

No. You can NOT specify a regex on the left-hand side (the key). As I 
said before:

	Using key/value pairs there is no way to specify any form of
	wildcard matching in the key. It's simply not possible to derive
	such a key from the email address. You need to know the patterns
	before the keys can be generated [and then the access.db lookup
	done for the corresponding value].

Just think about it. How do you perform the access.db lookup using a 
regex on the LHS, eh? If it were possible, Claus would have implemented 
it in Sendmail already.
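To make the point concrete, here is a minimal C sketch of how exact-match lookup keys are derived from an address. It is hypothetical: the function name `make_keys`, the `milter-sender-from:` prefix, and the lookup order (full address, then the domain and each parent domain, then `local-part@`) are assumptions for illustration, not code taken from any milter. Every key is a literal string built from the address, so a key written as a regex could never equal any of them.

```c
#include <stdio.h>
#include <string.h>

#define MAX_KEYS 16
#define KEY_LEN  256

/* Hypothetical sketch: build the exact-match keys a milter might probe in
 * access.db for one address, most specific first. The tag prefix and the
 * lookup order are assumptions for illustration only.
 * Returns the number of keys written, or 0 if `addr` has no '@'. */
static int make_keys(const char *prefix, const char *addr,
                     char keys[MAX_KEYS][KEY_LEN])
{
    const char *at = strchr(addr, '@');
    const char *d;
    int n = 0;

    if (at == NULL)
        return 0;

    /* 1. The full address, e.g. milter-sender-from:user@sub.example.com */
    snprintf(keys[n++], KEY_LEN, "%s%s", prefix, addr);

    /* 2. The domain, then each parent: sub.example.com, example.com, com */
    for (d = at + 1; d != NULL && *d != '\0' && n < MAX_KEYS - 1; ) {
        snprintf(keys[n++], KEY_LEN, "%s%s", prefix, d);
        d = strchr(d, '.');
        if (d != NULL)
            d++;                    /* step past the dot */
    }

    /* 3. The local part alone, e.g. milter-sender-from:user@ */
    snprintf(keys[n++], KEY_LEN, "%s%.*s@", prefix, (int)(at - addr), addr);
    return n;
}
```

A lookup routine would probe the database with each key in turn and stop at the first hit; none of these strings could ever match a key such as `sender-[0-9]+-...@returns.groups.yahoo.com`.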

> This way you would have a different key in the AccessDB to indicate
> that you needed to run a regular expression / pattern match on the
> user-address / user part / domain part of the email address.  I think

Won't work. The regex is a value you need to find first based on a
domain and/or local-part, which is then tested and the action applied.

> I would also not have a default action on the filter, just what would
> be expected as the action to take if this pattern did match like the
> rest of your present AccessDB entries.

With my proposed syntax, you need the optional default action, because 
you still need to support the old syntax:

	milter-NAME-TAG:KEY       ACTION
	
would become

	milter-NAME-TAG:KEY	  /PATTERN/ACTION ... DEFAULT-ACTION
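A minimal sketch of how such a right-hand side might be evaluated once the value has been found by the normal key lookup, assuming POSIX extended regexes (`regcomp`/`regexec` from `<regex.h>`) and whitespace-separated tokens. The name `eval_rhs` and its return convention are hypothetical. Note how the old `KEY ACTION` syntax falls out naturally: a value with no `/PATTERN/` tokens is just a default action.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical sketch: evaluate the proposed right-hand side
 *     /PATTERN/ACTION /PATTERN/ACTION ... [DEFAULT-ACTION]
 * against `subject` (e.g. the sender address). Patterns are POSIX extended
 * regexes and the first one that matches wins. A plain word is the default
 * action, so an old-style value such as "OK" still works unchanged.
 * Returns 1 and copies the chosen action into `action`, 0 if nothing
 * matched and there is no default, or -1 on a bad or oversized token. */
static int eval_rhs(const char *rhs, const char *subject,
                    char *action, size_t size)
{
    char tok[512];
    char fallback[64] = "";
    const char *p = rhs;

    for (;;) {
        size_t len;
        char *end;
        regex_t re;
        int rc;

        p += strspn(p, " \t");              /* skip separators */
        if (*p == '\0')
            break;
        len = strcspn(p, " \t");            /* length of this token */
        if (len >= sizeof tok)
            return -1;
        memcpy(tok, p, len);
        tok[len] = '\0';
        p += len;

        end = strrchr(tok, '/');
        if (tok[0] != '/' || end == tok) {  /* no /PATTERN/ part */
            if (len >= sizeof fallback)
                return -1;
            memcpy(fallback, tok, len + 1);
            continue;
        }
        *end = '\0';                        /* split PATTERN from ACTION */
        if (regcomp(&re, tok + 1, REG_EXTENDED | REG_NOSUB) != 0)
            return -1;
        rc = regexec(&re, subject, 0, NULL, 0);
        regfree(&re);
        if (rc == 0) {
            snprintf(action, size, "%s", end + 1);
            return 1;
        }
    }
    if (fallback[0] == '\0')
        return 0;
    snprintf(action, size, "%s", fallback);
    return 1;
}
```

For the yahoogroups.com example quoted above, the value `/goof-[0-9]+-ball@/REJECT /tosser-[a-z0-9]+-user@/REJECT OK` tested against `goof-42-ball@yahoogroups.com` yields REJECT, and against any other sender yields OK. Because tokens are split on whitespace, a pattern cannot itself contain spaces in this sketch.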

> Another option that you might consider doing would be to implement
> something similar to milter-cli to allow all milters (milter-date in
> this case) to call a command line to decide whether or not to process
> the email that is coming through.  Something similar to
> "milter-date-cli-From:<milter-cli parameters> OK", where I could write
> an external program that returns whether or not milter-date needs to
> process the email, would be great.  This would allow me to augment
> milter-date (and others) to handle the special addresses.

Ugh. That is even more of a Kludge (with a capital K).

>> Essentially I could implement some form of wildcard matching
>> (either regex or simple glob-like) by selecting a pattern set based
>> on one of the existing key lookups for address-local-part,
>> address-domain, client-ip, client-domain.
> 
> 
> For what it is worth I would rather see regular expression support
> versus simple (f)grep style pattern matching.  However, having said
> that, (f)grep pattern matching would be better than nothing and I
> don't think it would be too difficult to implement (versus regexes).

Well, a simple glob, i.e. a generic wildcard like the asterisk (*) and 
maybe the single-character wildcard question mark (?), is easy to 
implement and faster to process, and does not introduce new library 
dependencies and version issues (regex vs pcre).

This would probably be my first choice as an immediate solution to test 
the syntax and prove the concept and/or its usefulness.

For example, I could support both methods, glob and regex,

   milter-NAME-TAG:KEY  /REGEX/ACTION !GLOB!ACTION ... DEFAULT-ACTION

by simply using a different pattern delimiter.	
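A minimal sketch of the delimiter idea, assuming a token whose first character selects the matcher (`/` for regex, `!` for glob) and whose last occurrence of that character separates the pattern from the action. `struct rule`, `parse_rule`, and the buffer sizes are hypothetical.

```c
#include <string.h>

enum pat_kind { PAT_DEFAULT, PAT_REGEX, PAT_GLOB };

struct rule {
    enum pat_kind kind;
    char pattern[256];
    char action[64];
};

/* Hypothetical sketch: classify one right-hand-side token by its leading
 * delimiter, "/REGEX/ACTION" or "!GLOB!ACTION"; anything else is taken as
 * the default action. The closing delimiter is the last occurrence of the
 * opening one, so the pattern itself may contain that character.
 * Returns 0 on success, -1 if the token is malformed or too long. */
static int parse_rule(const char *tok, struct rule *r)
{
    char delim = tok[0];
    const char *end;
    size_t plen;

    if (delim != '/' && delim != '!') {
        if (strlen(tok) >= sizeof r->action)
            return -1;
        r->kind = PAT_DEFAULT;
        r->pattern[0] = '\0';
        strcpy(r->action, tok);
        return 0;
    }
    end = strrchr(tok, delim);
    if (end == tok)
        return -1;                      /* no closing delimiter */
    plen = (size_t)(end - tok - 1);
    if (plen >= sizeof r->pattern || strlen(end + 1) >= sizeof r->action)
        return -1;
    r->kind = (delim == '/') ? PAT_REGEX : PAT_GLOB;
    memcpy(r->pattern, tok + 1, plen);
    r->pattern[plen] = '\0';
    strcpy(r->action, end + 1);
    return 0;
}
```

The matching step would then branch on `kind`, calling `regcomp`/`regexec` for PAT_REGEX and a simple wildcard matcher for PAT_GLOB, so the two syntaxes can be mixed freely on one line.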

>> The actual syntax I've not really considered in depth, but it has
>> been a rough idea I've been toying with. However, there would be a
>> performance overhead in doing this, which is one of the reasons I've
>> put the idea off.
> 
> 
> I think having a different tag that the milters would look for in the
> AccessDB would help reduce the overhead, in that it would not be
> called for any emails that did not specifically need it.

I suggest you turn on

	-v database

debugging and look at how many BDB lookups are done for each tag based 
on domain, IP, and email address (it would also give you a better idea 
of how the lookups work and why you cannot have a regex for the LHS). 
There are A LOT, so adding a new tag doesn't help with generating the 
lookup key in order to find a value; it would just add more overhead.

-- 
Anthony C Howe                                 +33 6 11 89 73 78
http://www.snert.com/       ICQ: 7116561         AIM: Sir Wumpus

Sendmail Anti-Spam Solutions           http://www.snertsoft.com/
                                             We Serve Your Server
