Emerson Development LLC
A White Paper on "The Notary™" Spam Solution
A Simple Enhancement To The Email Protocols That Stops Junk Mail At The Source
The Emerson Development Company is pleased to announce "The Notary™", the first real solution to the problem of junk email. The basis for "The Notary™" is that there is no validation for information passing from server to server in the current Simple Mail Transfer Protocol (SMTP) mail transfer process. Spammers exploit this weakness to falsify the sender information -- virtually every piece of spam has a forged sender email address. Similar to having a document notarized, our patent-pending solution introduces a new SMTP step in which the receiving server asks the email server of the user shown as the sender to affirm that the user actually sent the message. If a spammer has forged the sender's email address, the message will be revealed as fraudulent and will be rejected. While some approaches to stopping spam are extremely complex, our protocol enhancement is an elegant solution to the fundamental weakness in the email protocols, and it is straightforward and simple to implement -- it adds one new SMTP command and one step in the process of receiving email by a server.
The Notary™ will be implemented by vendors of email server software, such as Microsoft, IBM, Oracle, Sendmail, Qmail, and many others. Email operators such as ISPs and corporations will obtain the new features by upgrading their email server software to a new version. End user software does not need to be changed – implementation of The Notary™ is transparent to end users.
The Scope of the Problem
Reliable sources such as IDC estimate email traffic at about 30 billion messages per day, and rising rapidly. Both MSN Hotmail and AOL are reported to have stopped over 2 billion junk mail messages in a single day. Insiders at ATT.net, one of the largest ISPs, tally spam at 70% of all incoming mail, and that counts only the spam they are able to stop. Based on ATT.net's number, it is reasonable to estimate that the actual percentage of spam is more like 80% or 90%. This, in turn, suggests that spam traffic is about 25 billion or more messages per day (80% to 90% of 30 billion is 24 to 27 billion). While the actual amount or percentage can certainly be a point of discussion, we can all agree that it is a problem of large scale.
IDC has estimated that spam costs the country $9 billion per year in software (for spam blockers), hardware (for all the extra servers to handle the load, plus to process the spam blocking software), and manpower (helpdesks, technicians, etc.). IDC estimates that this total cost breaks down to about $2-3 per month per user account. And, while this dollar amount per user also can be a point of discussion, it puts an order of magnitude to the problem.
A critical point for this discussion is that any solution to the problem of spam must not just block junk mail from delivery, it must prevent junk mail from being sent. If not, it still costs industry $9 billion a year to process.
Two Types of Junk Mail
Understanding the solution to spam requires understanding how spammers work, and the characteristics of spam. To begin, it is important to understand that there are two kinds of junk email, and to note the distinctions between them:
- SPAM -- Email with falsified sender information: Spammers disguise their identity using forged email addresses and other tricks as discussed in this document. Spammers frequently lie about the recipient "opting-in" to receive mailings. Spam constitutes more than 99% of junk email, and is solved by The Notary™.
- Email that is legitimate, but undesired by some recipients: Sometimes called UCE (Unsolicited Commercial Email) or UBE (Unsolicited Bulk Email), this type of email is characterized by a sender who represents themselves as who they really are (e.g. CitiBank wants you to have a new Visa card). Because UCE uses the sender's real email address, most UCE is benign, although the sender may lie about the recipient opting-in. UCE, less than one percent of junk mail, is manageable by conventional approaches. For example: (i) a recipient could simply click on "remove" to be deleted from the mailing list; and (ii) since the sender uses their real email address it is simple to blacklist them.
Understanding The Spammer
Most spammers are individuals or small organizations trying to make fast money. Spammers have a single focus – they want to drive traffic to a web site. Directly or indirectly, that web site will make money. Many of these web sites are e-commerce sites selling products, but some of them make money in other ways, such as by collecting a referral fee for initiating credit card or loan applications. Sometimes the spammer is the web site owner/operator, although frequently the spammer works for a commission based on the amount of traffic they generate.
Regardless of the size of the spammer organization, they have to have an ISP account like everyone else. There are exceptions, but most ISPs will shut off a spammer's account as soon as they learn of spamming activities (which is usually from complaints). Spammers use off-the-shelf bulk email software that emulates an email server. This software enables the spammer to connect directly to the recipient's email server, and to bypass their own ISP's email server in the process. Bypassing the ISP enables them to avoid detection by the ISP, and to disguise themselves by using false identities and email headers.
Spammers buy email addresses from businesses and individuals who are in the business of collecting them. They buy your information from Internet organizations willing to sell it, and use special software that acts like a browser to read every web site, forum, and chat room it can find, searching for email addresses. Spammers can buy a million email addresses for a couple hundred dollars, and can get as many as 20 million on a CD.
Email Headers
Email headers are a complex subject that we can treat with a broad brush. Generally, discussions of email headers include the following:
· “To:” (pseudo-header, supplied by sender)
· “From:” (pseudo-header, supplied by sender)
· “Subject:” (supplied by sender)
· “Date:” (date/time to the second, normally assigned by the sending email server)
· “Message-ID:” (unique identifier normally assigned by the sending email server, may include date/time, a serial number, plus other information)
· “Received: from” (sometimes added by a server when it receives an email)
Most email client software (e.g., Microsoft Outlook, Qualcomm Eudora) has a menu selection allowing the user to read the headers of received mail. There is a complex relationship between email headers and the email protocol transaction between two email servers – for purposes of this document it is unnecessary to explore this topic in depth. There are only three important points to be made here: 1) spammers can and do forge all of these headers; 2) the Message-ID is created by the sending email server to uniquely identify each email; and 3) email servers keep logs of sent mail which include at least the To, From, and Message-ID.
The last of these points should be indelible: when you send an email, your email server assigns a serial number to it and records that serial number along with your email address and the email address of the person you are sending it to.
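To make the three logged fields concrete, here is a minimal Python sketch that reads them from a made-up message using the standard library's email module. The message text, the addresses, and the helper name log_record are assumptions for illustration only; real email servers keep their logs in their own formats.

```python
from email import message_from_string

# A made-up message; every address and identifier below is hypothetical.
RAW_MESSAGE = """\
Received: from mail.bob.example by mx.you.example; Mon, 6 Jan 2003 10:15:02 -0500
Message-ID: <20030106101500.0001@mail.bob.example>
Date: Mon, 6 Jan 2003 10:15:00 -0500
From: bob@bob.example
To: you@you.example
Subject: Lunch on Friday?

Are you free at noon?
"""


def log_record(raw_message):
    """Return the (Message-ID, From, To) triple a sending server might log."""
    msg = message_from_string(raw_message)
    return (msg["Message-ID"], msg["From"], msg["To"])


if __name__ == "__main__":
    print(log_record(RAW_MESSAGE))
```

The triple returned here is the kind of record a sending server could later be asked about.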
When a spammer sends email they make up these headers (except, obviously, the "To"). Suppose you receive an email about a sex site that shows a user named Bob as the sender. The Notary™ process works because if Bob had actually sent you that email, his email server would have a record of it. In The Notary™ process, the receiving server passes a copy of the headers to Bob's email server, and asks if Bob actually sent that message. Since we know that spammers lie about the sender's name to disguise their identity, we know that the notarizing challenge will reveal the email to be fraudulent: Bob's email server will have no record of that email.
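The notarizing challenge can be sketched in a few lines of Python. This is only an illustration of the idea, not the actual protocol: this section does not define the new SMTP command or the log format, so the class SentMailLog, the method names record and affirm, and all addresses are hypothetical, and the exchange between the two servers is reduced to a direct function call.

```python
class SentMailLog:
    """Hypothetical in-memory stand-in for a sending server's log of sent mail."""

    def __init__(self):
        self._records = set()

    def record(self, message_id, sender, recipient):
        # Called when the server really sends a message for one of its users.
        self._records.add((message_id, sender.lower(), recipient.lower()))

    def affirm(self, message_id, sender, recipient):
        # The notarizing challenge: does the log hold this exact message?
        return (message_id, sender.lower(), recipient.lower()) in self._records


def accept_message(headers, sender_server_log):
    """Receiving side: accept only if the purported sender's server affirms."""
    return sender_server_log.affirm(
        headers["Message-ID"], headers["From"], headers["To"]
    )


if __name__ == "__main__":
    bobs_server = SentMailLog()
    bobs_server.record("<0001@bob.example>", "bob@bob.example", "you@you.example")

    genuine = {"Message-ID": "<0001@bob.example>",
               "From": "bob@bob.example", "To": "you@you.example"}
    forged = {"Message-ID": "<made-up@bob.example>",
              "From": "bob@bob.example", "To": "you@you.example"}

    print(accept_message(genuine, bobs_server))  # True
    print(accept_message(forged, bobs_server))   # False
```

The forged message names Bob as the sender, but because Bob's server never logged that Message-ID, the challenge fails and the message is rejected.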
Basics of SMTP (Simple Mail Transfer Protocol)
The following table explores the five basic SMTP commands used between a sending and receiving server.
Sending System / Receiving System
(opens TCP/IP connection) / 220 ("service ready", standard numerical response to connection request)
HELO – I'm imadeitup.com / 250 Welcome imadeitup.com
MAIL From: < / 250 Sender
RCPT To: < / 250 Recipient ok
DATA / 354 Give it to me
fake headers & the email itself / 250 Got it. ok
QUIT / 221 (closing connection) Goodbye
The above portrays a simple, stylized example. The transaction uses five SMTP commands (by convention each is four characters, upper case) which are issued by the sending system. The receiving system responds with numerical progress, acceptance, or failure codes. Standard positive reply codes are shown for simplicity. Normally a textual response, such as those shown, accompanies each numerical code.
- HELO (or the enhanced version EHLO): Sending system initiates the conversation with this command, followed by its own domain name. Receiving system echoes that domain name.
- MAIL From: Sending system supplies the email address of the sender in the standard format of sender @ domain name. Receiving system echoes the sender email address.
- RCPT To: Sending system supplies the recipient's email address, echoed by the receiving system.
- DATA: Sending system issues the DATA command. The receiving system responds with a numerical code indicating "go ahead".
Sending system then sends the message headers and the message itself. The message is terminated by a period on a line by itself. After receiving the termination period, the receiving system processes the received mail. If the mail is accepted for delivery to the recipient, the receiving system acknowledges "ok" with a reply code of 250.
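- QUIT: Sending system issues the QUIT command, and the receiving system responds with a numerical code (221 in the example above) and closes the connection. Acceptance or rejection of the message itself is signaled earlier, in the reply that follows the end of the DATA transmission.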
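The five-command exchange described above can be imitated offline with a small Python sketch of the receiving side. It is deliberately simplified: it does not open a network connection, does not check that commands arrive in the proper order, and always returns the positive reply codes. The function name receive_session and the two addresses are made up for illustration.

```python
def receive_session(lines):
    """Stylized receiving system: return the replies to a list of sender lines."""
    replies = ["220 service ready"]  # greeting sent when the connection opens
    in_data = False
    for line in lines:
        if in_data:
            if line == ".":  # a period on a line by itself ends the message
                in_data = False
                replies.append("250 ok")
            continue  # message headers and body lines get no reply
        verb = line[:4].upper()
        if verb in ("HELO", "EHLO"):
            replies.append("250 Welcome " + line[5:].strip())
        elif verb == "MAIL":
            replies.append("250 Sender ok")
        elif verb == "RCPT":
            replies.append("250 Recipient ok")
        elif verb == "DATA":
            in_data = True
            replies.append("354 go ahead")
        elif verb == "QUIT":
            replies.append("221 closing connection")
            break
        else:
            replies.append("500 command not recognized")
    return replies


# Hypothetical session; nothing here is checked against reality, which is
# exactly the weakness of the unvalidated protocol.
SESSION = [
    "HELO imadeitup.com",
    "MAIL FROM:<someone@imadeitup.com>",
    "RCPT TO:<recipient@example.org>",
    "DATA",
    "Subject: hello",
    "",
    "message body",
    ".",
    "QUIT",
]

if __name__ == "__main__":
    for reply in receive_session(SESSION):
        print(reply)
```

Note that the sketch, like the stylized example, accepts whatever domain and sender address it is given.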
SMTP Fundamental Weakness – No Validation
The reason spammers can send email with falsified sender information is that there is no validation in the email protocol for any of the information in the SMTP transaction. As shown above, the spammer identifies itself as having domain name "imadeitup.com". The only verification available to a receiving server is to do a DNS (Domain Name System) query to see if the supplied domain name is registered and assigned to a server. There are over 40 million working domains, so spammers have an endless choice of domain names to use. As long as the spammer supplies a real domain name, the receiving system is none the wiser.
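A minimal Python sketch of that domain check shows how little it proves. The function name domain_resolves is an assumption, and so is the fake resolver used in the demonstration so that the example runs without network access; by default the function would call the standard library's socket.gethostbyname.

```python
import socket


def domain_resolves(domain, resolver=socket.gethostbyname):
    """Return True if the domain name resolves to an address."""
    try:
        resolver(domain)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


def fake_resolver(domain):
    # Hypothetical stand-in for DNS; 192.0.2.10 is a documentation-only address.
    known = {"imadeitup.com": "192.0.2.10"}
    if domain not in known:
        raise socket.gaierror("name not known")
    return known[domain]


if __name__ == "__main__":
    print(domain_resolves("imadeitup.com", fake_resolver))
    print(domain_resolves("no-such-domain.invalid", fake_resolver))
```

The check succeeds for any registered domain, whoever is actually on the other end of the connection, so it says nothing about whether the sender is genuine.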
A couple of years ago spammers would use a valid domain name, and then make up a user name at that domain. To counter this, there was a period of time when spam blocking attempted to verify that the sender name was real using the SMTP VRFY command, which one server can send to another to attempt to verify a user's email address ("please verify if this email address is valid"). But spammers figured this out and used VRFY to purify their lists, so now few servers accept a VRFY request.
The routine practice for spammers now is to first verify the domains in their email list by compiling a list of all domains in the list, and then performing DNS queries on each domain. User email addresses from non-existent domains are deleted from the list. There is no attempt to verify each individual email account from the list. Next, spammers begin to send emails, using the next email address in the list as the sender name for the current email. Using this round-robin approach, every email has a different sender name, and every sender name is from a valid domain. A receiving server can only validate the domain name, which is always real. This technique also foils the spam-blocking technique of counting the number of emails sent by the same user, since each email is shown as coming from a different person.
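Seen from the receiving server, the failure of per-sender counting can be illustrated with a short Python sketch. The function name senders_over_limit, the limit of 10, and the generated addresses are all assumptions chosen for the example.

```python
from collections import Counter


def senders_over_limit(sender_addresses, limit):
    """Return the set of sender addresses seen more than `limit` times."""
    counts = Counter(addr.lower() for addr in sender_addresses)
    return {addr for addr, n in counts.items() if n > limit}


if __name__ == "__main__":
    # 100 messages that all claim the same sender: the counter catches it.
    same_sender = ["bulk@sender.example"] * 100
    print(senders_over_limit(same_sender, limit=10))

    # 100 messages that each claim a different sender: nothing is flagged.
    rotating = [f"user{i}@domain{i % 10}.example" for i in range(100)]
    print(senders_over_limit(rotating, limit=10))
```

Because each forged message carries a different sender name, every count stays at one and the limit is never reached.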
The Standard SMTP Email Process
The following illustration depicts the standard email process. In this arrangement, our typical user, MsInnocent, has an Internet account with ISP niceguys.com. MsInnocent uses a standard desktop PC email program such as Microsoft Outlook to create an email addressed to the recipient, Preteen, at domain helpless.com, which is Preteen's ISP. MsInnocent clicks on the send button, and at step 2 her email program sends the message to her ISP's email server. This step uses SMTP; retrieving mail from the server uses a companion protocol, POP3 (Post Office Protocol), which requires a login and password.
There are two other client software arrangements in use today. One of them, MAPI (Messaging Application Programming Interface), maintains all of the user email files and folders on the email server rather than on the PC. Microsoft Exchange is such an email arrangement. The third arrangement is web mail, in which the user interacts with a browser application that enables the user to create, send, read, and delete messages. In all three of these arrangements, the basic functionality is the same, which is a) to enable users to get their mail to and from the email server, and b) to enable email servers to send and receive email from each other.