Kevin Spencer said:
Okay, let me start out by saying that I am a programmer, and that I
have been developing software for over a dozen years in half a dozen
languages, using virtually all of the technologies that exist until
very recently (including networking software that employs Pipes,
Sockets, TCP, UDP, FTP, HTTP, SMTP, NNTP, and one or 2 others I don't
recall).
Then you should be quite familiar with the RFCs that define e-mail systems,
like SMTP, such as RFC 2822 which defines Internet Message Format (and
the headers that are contained within the *data* created by the user and
sent during the DATA command). You should even be familiar enough with
the RFCs to know where you can go look them up to verify or disprove my
claim that the headers you mention are completely OPTIONAL.
The RFCs do not define what constitutes SPAM.
Boy, did you go off on a tangent. I didn't say the RFCs define spam. I
said that they state that the To, Cc, Bcc, and Subject headers are
OPTIONAL. That still has nothing to do with spam! They are optional,
period! Their absence does NOT constitute a definition of spam (as
being unsolicited BULK mailings).
YOU are the one claiming that blank To, Cc, and Subject headers qualify
an e-mail as spam. Not true as those headers are optional. Those
headers are also not used in routing the mail but are *data* sent in the
DATA command to the SMTP server. The aggregated list of recipients is
used by the e-mail client to issue RCPT TO commands to the SMTP server
and *those* are what get used to specify the recipients.
Wikipedia has one of the best definitions of SPAM that I know of:
And which never mentions that spam is defined by the absence of the
OPTIONAL headers for Internet messages as defined by RFC 2822. Get a
grip, buddy. YOUR definition of spam as having blank To, Cc, and
Subject headers is NOT a valid definition of spam - unless you can prove
that MANY recipients got the same message from the same source but then
it isn't the headers that defined the mail as spam but its bulk mailing
that makes it spam.
The actual routing of the email is indeed included in the message
headers. I am not referring to the headers that are immediately
visible when viewing an email in Outlook, but the actual Internet
message headers, which one can see by selecting "View|Options" when
the email is opened, and not in the preview pane. These are the
headers I copied into my post.
Nope. All those headers that are added by the sender's e-mail client
are *data*. That is why spammers can use modified or customized e-mail
clients to insert whatever headers they want and even try to insert
bogus Received headers (which will sit below the genuine Received
headers that the mail hosts prepend). Those headers are NOT ever sent to
the mail server to route your mails. They are in the DATA command and
are not used for routing. Your e-mail client aggregates all recipients
listed in the To, Cc, and Bcc *fields* shown in your e-mail client into
a list of RCPT TO commands that get sent to your sending mail server,
and it is THOSE commands that are received by your sending mail server
that are used to route your mails to the recipients. Notice I say
*fields* in your e-mail client because they are simply part of the UI
presented to you in which to specify the recipients, but they could be
called anything (and can also be called anything within the content sent
within the DATA command). That they happen to match the names of the
headers that receive the field values is just a convenience to the user.
The "To" *field* in the UI could've been called "Recipients" and the "Cc"
*field* could've been called "Carbon-Copied Recipients" and your e-mail
client could then produce To and Cc *headers* (that are still data) in the
content of your mail, use X-headers, or even not add any headers with
the list of those recipients - because those *headers* are optional in
the mail content.
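
To make the envelope-versus-content split concrete, here is a rough sketch in Python. The helper name build_smtp_session and its parameters are made up for illustration; it talks to no server and only lists the lines a client might send. It assumes the client leaves Bcc recipients out of the content, and it skips the Date header just to keep the output predictable.

```python
def build_smtp_session(sender, to=(), cc=(), bcc=(), subject=None, body="",
                       include_headers=True):
    """Return the lines a client might send to its SMTP server.

    Hypothetical helper for illustration only; no network traffic happens.
    """
    # Envelope: every address from every UI field becomes one RCPT TO.
    # These commands are what the server uses to route the mail.
    recipients = list(to) + list(cc) + list(bcc)
    commands = ["MAIL FROM:<%s>" % sender]
    commands += ["RCPT TO:<%s>" % addr for addr in recipients]
    commands.append("DATA")

    # Content: everything from here on is data, headers included.
    # (A real message would also carry a Date header; omitted here.)
    data = ["From: %s" % sender]
    if include_headers:
        if to:
            data.append("To: " + ", ".join(to))
        if cc:
            data.append("Cc: " + ", ".join(cc))
        # Assumption: Bcc addresses are kept out of the content.
        if subject is not None:
            data.append("Subject: " + subject)
    data.append("")    # blank line separates the headers from the body
    data.append(body)
    data.append(".")   # a lone dot ends the DATA section
    return commands + data


if __name__ == "__main__":
    for line in build_smtp_session("me@example.com",
                                   to=["you@example.org"],
                                   bcc=["hidden@example.net"],
                                   subject="hello", body="Hi there.",
                                   include_headers=False):
        print(line)
```

With include_headers=False the RCPT TO lines are unchanged, so both recipients are still named in the envelope even though the content carries no To, Cc, or Subject header at all.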
Now, while it is true that I am a single recipient of the email, I own
my own domain, and the simple fact is that I get dozens of these a
day.
Then you do have some proof that the sender is spewing bulk mailings.
Well, at least, many of them to you, that is. My guess is that it is
some misconfigured trojan mailer daemon running on an infected host that
is leaving the header *data* blank or missing and it can't find its
payload (spam) to put into the body (the part after the first blank line
that follows the header data section).
In addition, we have the simple fact that such emails are worthless.
Unless it is a malcontent or opponent that wants to nuisance you. Not
likely, however. Not all spammers are wizards, so lots of newbies
stealing the spammer's tools don't know how to use them, communicate
with their army of zombied hosts, or they are misconfigured or there is
interference on the zombied host that prevents "proper" functioning of
the mailer trojan on the infected host.
In the meantime, the question remains: Barring any logical reason to
prevent the filtering of such "empty" emails, and being the developers
of the most popular email client in the world, and having oodles of
development money and resources to develop the most popular email
client in the world, why has Microsoft not implemented this simple
filter?
Already mentioned. Define a NEGATIVE rule by using the exception
clause. A Google search would show several posts by me, MVPs, and other
regulars mentioning how to define the rule. Rather than defining a rule
that tests on a condition and commits an action on a positive result,
you define a rule that commits the action EXCEPT for a negative of the
condition. So rather than testing on a NUL string for the value of a
header (or for the absence of the header altogether), you test on the
existence of the characters that you deem constitute a non-blank string.
You could define a rule to delete all messages *except* those that have
A-Z, 0-9, and the other characters in the header(s) but Microsoft has
never permitted the use of regular expressions (so you cannot define
ranges of characters). You would end up defining a rule to "delete all
mails except those that have a, b, c, ..., x, y, z, 0, 1, ..., 8, 9"
(and probably don't need to test on non-alphanumeric characters, like $,
#, &, etc.). However, an e-mail with a subject of "lk.9dr4--TJK"
probably won't be one you want, either. You'll probably expect your
good mails to be in English, and also the header isn't all numbers, so
"delete all mails except if the <header> contains a, e, i, o, u".
I've used negative rules (where the condition is tested in the "except"
clause) for a long time to get rid of blank mails. However, the rules
are not reliable when testing against strings within the body of the
mail. That is, a rule that "deletes mail except if body has a, e, i, o,
u" might not trigger and you still end up with mails that have a blank
body, especially for HTML-formatted mails. Rules that test for strings
within the body of the message haven't been reliable in Outlook (from
2002 and before, that is).
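
For anyone who finds the negative logic hard to picture, here is a minimal sketch in Python of the "delete all mails except if the <header> contains a, e, i, o, u" idea. It is not how Outlook evaluates rules internally; keep_message and apply_negative_rule are hypothetical names, messages are plain dicts of header values, and the case-insensitive match is an assumption.

```python
VOWELS = "aeiou"


def keep_message(headers, header_name="Subject", required_chars=VOWELS):
    """The 'except' clause: True if the header holds any required character."""
    # A missing header is treated the same as a blank one.
    value = headers.get(header_name, "").lower()
    return any(ch in value for ch in required_chars)


def apply_negative_rule(messages):
    """Delete everything EXCEPT the messages that pass keep_message."""
    kept, deleted = [], []
    for headers in messages:
        if keep_message(headers):
            kept.append(headers)
        else:
            deleted.append(headers)
    return kept, deleted


if __name__ == "__main__":
    inbox = [
        {"Subject": "Lunch on Friday?"},
        {"Subject": ""},               # blank subject
        {},                            # subject header missing entirely
        {"Subject": "lk.9dr4--TJK"},   # gibberish with no vowels
    ]
    kept, deleted = apply_negative_rule(inbox)
    print(len(kept), "kept,", len(deleted), "deleted")
```

Because the test is on what must be present rather than on what is absent, a blank header and a missing header are caught by the same rule.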
At any rate, I will implement a Rule such as you have described, using
negative logic, and appreciate the suggestion. Still, my suggestion
remains. And my assertion that such a filter should be created
remains.
Some e-mail and NNTP clients do have the filter that you mention, where
they can test on a NUL string (but some of them will fail to trigger
that rule when the optional header isn't even there). The biggest
problem with most users is that they have no understanding of Boolean
logic so defining rules is a bit of a mystery to them. Many aren't even
aware that clauses are AND'ed and rules are OR'ed (unless the
stop-clause is used to short-circuit the logic). Defining negative
rules is even harder for them to understand (and sometimes it isn't
obvious).
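
The AND/OR/stop behavior can be sketched in a few lines of Python. This is a toy rule engine under my own assumptions, not Outlook's implementation: run_rules, the dict layout, and the example rules are all hypothetical.

```python
def run_rules(message, rules):
    """Apply rules in order and return the actions that fired."""
    actions = []
    for rule in rules:
        # Clauses inside one rule are AND'ed: all of them must hold.
        if all(cond(message) for cond in rule["conditions"]):
            actions.append(rule["action"])
            # The stop clause short-circuits: later rules never run.
            if rule.get("stop"):
                break
        # Rules are OR'ed: a miss here just moves on to the next rule.
    return actions


EXAMPLE_RULES = [
    {
        "conditions": [
            lambda m: "sale" in m["subject"].lower(),
            lambda m: m["from"].lower().endswith("@example.com"),
        ],
        "action": "move to Junk",
        "stop": True,
    },
    {
        "conditions": [lambda m: "sale" in m["subject"].lower()],
        "action": "flag for follow-up",
    },
]


if __name__ == "__main__":
    msg = {"subject": "Big SALE today", "from": "promo@example.com"}
    print(run_rules(msg, EXAMPLE_RULES))
```

In the example, a message that satisfies both clauses of the first rule is moved and never reaches the second rule, while a message that satisfies only the subject clause falls through and gets flagged instead.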
In fact, I find the Junk Email filtering tools in Outlook to be
primitive and hardly acceptable overall.
Ditto. I use SpamPal. It uses DNSBLs (DNS blacklist), Bayesian (which
seems to be the crux of OL2003's junk filtering but isn't nearly as
configurable), HTML weighting, logging, and other handy anti-spam
functions.
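
For the curious, the general DNSBL convention (not anything specific to SpamPal, whose internals I haven't looked at) is to reverse the octets of the sending host's IPv4 address, append the blacklist's zone, and do a DNS lookup on the result; a listed address resolves and an unlisted one does not. A minimal Python sketch of just the name-building step follows. The function name and the zone dnsbl.example.org are placeholders, and no lookup is performed.

```python
import ipaddress


def dnsbl_query_name(ip, zone="dnsbl.example.org"):
    """Build the hostname a DNSBL check would look up for an IPv4 address."""
    # IPv4Address() rejects malformed input and normalizes the text form.
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Octets go in reverse order, followed by the blacklist's zone.
    return ".".join(reversed(octets)) + "." + zone


if __name__ == "__main__":
    print(dnsbl_query_name("192.0.2.99"))
```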
Why must one include the '@' character to indicate a domain name?
If Microsoft were to include support for PCRE, the user could specify
exactly where in the URI to match on the [sub]string. I doubt regular
expressions will ever show up in Outlook. It's too Unix-like for
Microsoft.
An email address already has one (after the user
name) to distinguish it from a mere domain name. Why can't one use
wild cards or regular expressions to block by domain names? And why
can one not specify IP addresses that are in the Internet headers and
return path (which are made difficult to find), but only in the From
header, which is the most likely (by virtue of being the easiest)
header to be faked? I could certainly understand why Microsoft might
make this sort of configuration a bit difficult to find for typical
users, but I have found after much research that it is simply
*impossible* to configure these sorts of filters in Outlook.
And why I use SpamPal. Although I have yet to need it (because the
other spam filtering works so well), it also has a RegEx plug-in to let
you define regular expressions (but not PCRE syntax) that will let you
search on strings in any header. Outlook was never geared to be a
spam-filtering e-mail client and the junk filtering is a tacked on
feature that doesn't come close to 3rd party solutions (but then it
really wasn't meant to, any more than Paint was meant to compete with
Adobe Photoshop).
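
As an idea of what regular expressions would buy you here, below is a small sketch using Python's re module. It is Python's syntax, not the syntax of SpamPal's RegEx plug-in, and the function names, the example.com domain, and the sample headers are all made up. One pattern blocks a domain and its subdomains in a From header; the other pulls IP addresses out of the Received headers.

```python
import re

# Matches example.com or any subdomain of it, but not notexample.com
# or example.com.evil.net.
BLOCKED_DOMAIN = re.compile(r"@(?:[\w-]+\.)*example\.com(?![\w.-])",
                            re.IGNORECASE)

# Loose dotted-quad match; it does not check that each part is 0-255.
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def from_blocked_domain(from_header):
    """True if the From header value uses the blocked domain."""
    return BLOCKED_DOMAIN.search(from_header) is not None


def received_ips(raw_headers):
    """Collect IPv4 addresses found in the Received headers."""
    # Unfold continuation lines (those that start with a space or tab).
    unfolded = re.sub(r"\r?\n[ \t]+", " ", raw_headers)
    ips = []
    for line in unfolded.splitlines():
        if line.lower().startswith("received:"):
            ips.extend(IPV4.findall(line))
    return ips


if __name__ == "__main__":
    sample = ("Received: from mx.example.net ([192.0.2.10])\r\n"
              "\tby mail.example.org with SMTP\r\n"
              "Received: from unknown (HELO x) (198.51.100.7)\r\n"
              "From: Bob <bob@mail.example.com>\r\n")
    print(received_ips(sample))
    print(from_blocked_domain("Bob <bob@mail.example.com>"))
```

Keep in mind that only the Received headers prepended by hosts you trust are worth testing, since the rest are data the sender could have made up.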
If I had been working on this software for the past 30 years, I most
certainly would have done better by now.
I know of no software produced by anyone where someone else doesn't have
complaints with it or suggestions for improvement. Um, 30 years? Just
how long do you think Outlook has been around? PCs (personal computers)
came out in 1982. I bought my first one in 1984. Forget about anything
about e-mail since that communication milieu had yet to be invented. I
forget from whom Microsoft bought Outlook or what was its original name.
According to
http://en.wikipedia.org/wiki/Microsoft_Outlook, the first
version (after Microsoft got it) was called Outlook 97 so it was
probably released sometime during or after 1996. That makes it 10
years, not 30, that Outlook has been around. As I recall, e-mail got
added to Outlook (i.e., e-mail was not part of the original PIM
product).
Outlook is NOT geared to the consumer market as a personal e-mail
client. It is oriented to the corporate customers where it is expected
that spam filtering is done upstream of the e-mail clients used by end
users on their desktop hosts. In fact, this is still the model used in
corporate environments: spam filtering *should* be performed upstream on
the corporate mail host (i.e., server-side spam filtering gets used),
and any client-side filtering is considered optional and superfluous to a degree
but the corporate end users may want it to get rid of what leaks by the
server-side filtering, plus the end users get to customize further how
to filter or organize their mails beyond the basic or global settings
implemented against all end users of that organization. I'm not
surprised that spam filtering has been something of a Johnny-come-lately
feature in Outlook because the primary customer of Outlook is the
corporate customer.
If it weren't for me already having Office which included Outlook, I
wouldn't bother buying Outlook just to do e-mail. I would use
Outlook Express, Pegasus, or some other *personal* e-mail program AND
also add some good anti-spam software in the mix which gave me the extra
control that personal e-mail clients rarely provide. If you want to see
a really impotent rules set, go look at Thunderbird. I use Outlook
because it has a decent rules set (not great, could be improved, be nice
to have PCRE, but still good). Outlook Express' rules set is far less
capable than Outlook's. Thunderbird's rules suck worse than OE's,
and I consider OE's rules set to be weak. If I were to lose Outlook, I
wouldn't go to OE or Thunderbird but find something with much more
potent rules (well, as good as Outlook's or better). I hear Pegasus
Mail has decent rules but I haven't bothered trialing it yet. However,
because I have SpamPal, I'd probably first delve into its RegEx plug-in
to augment the rules set in OE.