Technical Architecture of the Milter API
The sendmail program is actually composed of several parts, including programs, files, directories, and the services it provides. Its foundation is a configuration file that defines the location and behavior of these other parts and contains rules for rewriting addresses. A queue directory holds mail until it can be delivered. An aliases file allows alternative names for users and creation of mailing lists.
2.1.1 The Configuration File
The configuration file contains all the information sendmail needs to do its job. Within it you provide information, such as file locations, permissions, and modes of operation.
Rewriting rules and rule sets also appear in the configuration file. They transform a mail address into another form that may be required for delivery. They are perhaps the single most confusing aspect of the configuration file. Because the configuration file is designed to be fast for sendmail to read and parse, rules can look cryptic to humans:
R$+@$+ $:$1<@$2> focus on domain
R$+<$+@$+> $1$2<@$3> move gaze right
But what appears to be complex is really just succinct. The R at the beginning of each line, for example, labels a rewrite rule. And the $+ expressions mean to match one or more parts of an address. With experience, such expressions (and indeed the configuration file as a whole) soon become meaningful.
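As a rough analogy only, the first rule above can be imitated with a regular expression. This is not how sendmail works internally (sendmail matches tokens, not characters), and the function name `focus_on_domain` is made up for this illustration. The sketch only shows that `$+` behaves like "one or more" and that `$1` and `$2` on the right-hand side refer back to what the two `$+` expressions matched.

```python
import re

# Character-level stand-in for the left-hand side "$+@$+": one or more
# characters, an "@", then one or more characters.
LHS = re.compile(r"(.+?)@(.+)")

def focus_on_domain(address):
    """Imitate 'R$+@$+  $:$1<@$2>' on a plain string (hypothetical helper)."""
    match = LHS.fullmatch(address)
    if match is None:
        return address               # the rule does not apply; leave unchanged
    user, domain = match.groups()
    return f"{user}<@{domain}>"      # right-hand side: $1<@$2>

print(focus_on_domain("you@Here.US.EDU"))
print(focus_on_domain("you"))
```

An address without an `@` is returned unchanged, just as a rewrite rule whose pattern does not match leaves the address alone.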
2.1.2 The Queue
Not all mail messages can be delivered immediately. When delivery is delayed, sendmail must be able to save it for later transmission. The sendmail queue is a directory that holds mail until it can be delivered. A mail message may be queued:
a) When the destination machine is unreachable or down. The mail message will be delivered when the destination machine returns to service.
b) When a mail message has many recipients. Some mail messages may be successfully delivered, and others may not. Those that fail are queued for later delivery.
c) When a mail message is expensive. Expensive mail (such as mail sent over a long-distance phone line) can be queued for delivery when rates are lower.
d) When safety is of concern. The sendmail program can be configured to queue all mail messages, thus minimizing the risk of loss should the machine crash.
2.1.3 Aliases and Mailing Lists
Aliases allow mail that is sent to one address to be redirected to another address. They also allow mail to be appended to files or piped through programs, and they form the basis of mailing lists. The heart of aliasing is the aliases(5) file (often stored in database format for swifter lookups). Aliasing is also available to the individual user via a file called ‘.forward’ in the user's home directory.
2.1.4 Run sendmail by Hand
Most users do not run sendmail directly. Instead, they use one of many mail user agents (MUAs) to compose a mail message. Those programs invisibly pass the mail message to sendmail, creating the appearance of instantaneous transmission. The sendmail program then takes care of delivery in its own seemingly mysterious fashion.
Although most users don't run sendmail directly, it is perfectly legal to do so. You, like many system managers, may need to do so to track down and solve mail problems.
Here's a demonstration of one way to run sendmail by hand. First create a file named sendstuff with the following contents:
This is a one line message.
Second, mail this file to yourself with the following command line, where you is your login name:
% /usr/lib/sendmail you <sendstuff
Here, you run sendmail directly by specifying its full pathname. [1] When you run sendmail, any command-line arguments that do not begin with a - character are considered to be the names of the people to whom you are sending the mail message.
[1] That path may be different on your system. If so, substitute the correct pathname in all the examples that follow. For example, try looking for sendmail in /usr/sbin or /usr/ucblib.
The <sendstuff sequence causes the contents of the file that you have created (sendstuff) to be redirected into the sendmail program. The sendmail program treats everything it reads from its standard input (up to the end of the file) as the mail message to transmit.
Now view the mail message that you just sent. How you do this will vary. Many users just type mail to view their mail. Others use the mh(1) package and type inc to receive and show to view their mail. No matter how you normally view your mail, save the mail message that you just received to a file. It will look something like this:
From you@Here.US.EDU Fri Dec 13 08:11:44 1996
Received: (from you@localhost) by Here.US.EDU (8.8.4/8.8.4)
        id AA04599 for you; Fri, 13 Dec 96 08:11:44 -0700
Date: Fri, 13 Dec 96 08:11:43
From: you@Here.US.EDU (Your Full Name)
Message-Id: <9631121611.AA02124@Here.US.EDU>
To: you                             <- may be Apparently-To:

This is a one line message.
The first thing to note is that this file begins with seven lines of text that were not in your original message. Those lines were added by sendmail and your local delivery program and are called the header.
The last line of the file is the original line from your sendstuff file. It is separated from the header by one blank line. The body of a mail message comes after the header and consists of everything that follows the first blank line (see Fig 1.1).
Ordinarily, when you send mail with your MUA, the MUA adds a header and feeds both the header and the body to sendmail. This time, however, you ran sendmail directly and supplied only a body; the header was added by sendmail.
Notice that most header lines start with a word followed by a colon. Each word tells what kind of information the rest of the line contains. There are many types of header lines that can appear in a mail message. Some are mandatory, some are optional, and some may appear many times. Those that appeared in the message that you mailed to yourself were all mandatory. That's why sendmail added them to your message. The line starting with the five characters "From " (the fifth character is a space) is added by some programs (such as /bin/mail) but not by others (such as mh).
A Received: line is added each time a machine receives the mail message. (If there are too many such lines, the mail message will bounce and be returned to the sender as failed mail.) The indented line is a continuation of the line above, the Received: line.
The Date: line gives the date and time when the message was originally sent. The From: line lists the email address and the full name of the sender. The Message-ID: line is like a serial number in that it is guaranteed to uniquely identify the mail message.
And the To: [2] line shows a list of one or more recipients. (Multiple recipients would be separated with commas.)
[2] Depending on how the NoRecipientAction option was set, this could be an Apparently-To: header, a Bcc: header, or even a To: header followed by an "undisclosed-recipients:;".
A complete list of all header lines that are of importance to sendmail is presented in Headers. The important concept here is that the header precedes, and is separate from, the body in all mail messages.
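The structure of the header described above (a name, a colon, a value, and indented continuation lines) can be sketched in a few lines of Python. This is a simplified illustration, not sendmail's parser; the function name `parse_header` is an assumption of this sketch, and the "From " line that some delivery programs add is simply skipped.

```python
def parse_header(lines):
    """Turn raw header lines into (name, value) pairs.

    A line that starts with a space or tab continues the previous field.
    The "From " line (five characters, the fifth a space) is not a
    "name: value" header, so this sketch ignores it.
    """
    fields = []
    for line in lines:
        if line[:1] in (" ", "\t") and fields:
            name, value = fields[-1]
            fields[-1] = (name, value + " " + line.strip())
        elif ":" in line and not line.startswith("From "):
            name, _, value = line.partition(":")
            fields.append((name, value.strip()))
    return fields

sample = [
    "From you@Here.US.EDU Fri Dec 13 08:11:44 1996",
    "Received: (from you@localhost) by Here.US.EDU (8.8.4/8.8.4)",
    "\tid AA04599 for you; Fri, 13 Dec 96 08:11:44 -0700",
    "Date: Fri, 13 Dec 96 08:11:43",
    "To: you",
]
for name, value in parse_header(sample):
    print(name, "->", value)
```

The indented "id ..." line ends up as part of the Received: value, which is what "continuation of the line above" means in practice.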
The body of a mail message consists of everything following the first blank line to the end of the file. When you sent your sendstuff file, it contained only a body. Now edit the file sendstuff and add a small header.
Subject: a test

This is a one line message.
The Subject: header line is an optional one. The sendmail program passes it through as is. Here, the Subject: line is followed by a blank line and then the message text, forming a header and a body. Note that a blank line must be truly blank. If you put space or tab characters in it, thus forming an "empty-looking" line, the header will not be separated from the body as intended.
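The "truly blank" rule is easy to demonstrate. The following is a minimal sketch, assuming a message held in a single string; `split_message` is a name invented here, and treating a message with no separator as "all header" is a simplification made for this sketch rather than sendmail's actual behavior.

```python
def split_message(text):
    """Split a message into (header_lines, body) at the first empty line."""
    lines = text.split("\n")
    for index, line in enumerate(lines):
        if line == "":                       # truly blank: no spaces, no tabs
            return lines[:index], "\n".join(lines[index + 1:])
    return lines, ""                         # no separator was found

good = "Subject: a test\n\nThis is a one line message."
bad = "Subject: a test\n \nThis is a one line message."   # the line holds a space

print(split_message(good))
print(split_message(bad))
```

With `good`, the Subject: line is the header and the message text is the body. With `bad`, the line containing a single space is not recognized as a separator, so no body is found.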
Send this file to yourself again, running sendmail by hand as you did before:
% /usr/lib/sendmail you <sendstuff
Notice that our Subject: header line was carried through without change:
From you@Here.US.EDU Fri Dec 13 08:11:44 1996
Return-Path: you@Here.US.EDU
Received: (from you@localhost) by Here.US.EDU (8.8.4/8.8.4)
        id AA04599 for you; Fri, 13 Dec 96 08:11:44 -0700
Date: Fri, 13 Dec 96 08:11:43
From: you@Here.US.EDU (Your Full Name)
Message-Id: <9631121611.AA02124@Here.US.EDU>
Subject: a test                     <- note
To: you

This is a one line message.
To handle delivery to diverse recipients, the sendmail program uses the concept of an envelope. This envelope is analogous to the physical envelopes that are used for post office mail. Imagine that you want to send two copies of a document: one to your friend in the office next to yours and one to a friend across the country:
To: friend1, friend2@remote
After you photocopy the document, you stuff each copy into a separate envelope. You hand one envelope to a clerk, who carries it next door and hands it to friend1 in the next office. This is like delivery on your local machine. The clerk drops the other copy in the slot at the corner mailbox, and the post office forwards that envelope across the country to friend2@remote. This is like sendmail transporting a mail message to a remote machine.
To illustrate what an envelope is, consider one way in which sendmail might run /bin/mail, a program that performs local delivery:
/bin/mail -d friend1
  /bin/mail   the program that sendmail runs
  -d          deliver to friend1's mailbox
  friend1     the envelope recipient
Here sendmail runs /bin/mail with a -d, which tells /bin/mail to append the mail message to friend1's mailbox.
Information that describes the sender or recipient, but is not part of the message header, is considered envelope information. The two may or may not contain the same information (a point we'll gloss over for now). In the case of /bin/mail, the email message showed two recipients in its header:
To: friend1, friend2@remote         <- the header
But the envelope information that is given to /bin/mail showed only one (the one appropriate to local delivery):
-d friend1                          <- specifies the envelope
Now consider the envelope of a message transported over the network. When sending network mail, sendmail must give the remote site a list of sender and recipients separate from and before it sends the mail message (header and body). Figure 1.2 shows this in a greatly simplified conversation between the local sendmail and the remote machine's sendmail.
The local sendmail tells the remote machine's sendmail that there is mail from you (the sender) and for friend2@remote. It conveys this sender and recipient information separate from and before it transmits the mail message that contains the header. Because this information is conveyed separately from the message header, it is called the envelope.
There is only one recipient listed in the envelope, whereas two were listed in the message header:
To: friend1, friend2@remote
The remote machine does not need to know about the local user, friend1, so that bit of recipient information is excluded from the envelope.
A given mail message can be sent by using many different envelopes (like the two here), but the header will be common to them all.
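The relationship between one header and several envelopes can be sketched as follows. This is an illustrative model, not sendmail's data structures: the function names, the dictionary layout, and the "(local)" placeholder are assumptions of this sketch. The `MAIL FROM:` and `RCPT TO:` lines are the standard SMTP commands that carry the envelope ahead of the message.

```python
from collections import defaultdict

LOCAL = "(local)"   # placeholder for "this machine" in this sketch

def build_envelopes(sender, recipients):
    """Group recipients by destination; each group becomes one envelope."""
    by_host = defaultdict(list)
    for rcpt in recipients:
        _, at, host = rcpt.partition("@")
        by_host[host if at else LOCAL].append(rcpt)
    return [{"sender": sender, "host": host, "recipients": rcpts}
            for host, rcpts in by_host.items()]

def smtp_envelope_commands(envelope):
    """SMTP commands that convey the envelope before the message is sent."""
    commands = [f"MAIL FROM:<{envelope['sender']}>"]
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in envelope["recipients"]]
    return commands

# The header ("To: friend1, friend2@remote") is the same in every copy;
# only the envelopes differ.
envelopes = build_envelopes("you@Here.US.EDU", ["friend1", "friend2@remote"])
for env in envelopes:
    print(env["host"], env["recipients"])

remote = [env for env in envelopes if env["host"] == "remote"][0]
print(smtp_envelope_commands(remote))
```

The remote envelope names only friend2@remote, even though the header of the message it accompanies still lists both recipients.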
The Sendmail Content Management API (Milter) provides an interface for third-party software to validate and modify messages as they pass through the mail transport system. Filters can process messages' connection (IP) information, envelope protocol elements, message headers, and/or message body contents, and modify a message's recipients, headers, and body. The MTA configuration file specifies which filters are to be applied, and in what order, allowing an administrator to combine multiple independently-developed filters.
We expect to see both vendor-supplied, configurable mail filtering applications and a multiplicity of script-like filters designed by and for MTA administrators. A certain degree of coding sophistication and domain knowledge on the part of the filter provider is assumed. This allows filters to exercise fine-grained control at the SMTP level. However, as will be seen in the example, many filtering applications can be written with relatively little protocol knowledge.
Given these expectations, the API is designed to achieve the following goals:
Milter is designed to allow a server administrator to combine third-party filters to implement a desired mail filtering policy. For example, if a site wished to scan incoming mail for viruses on several platforms, eliminate unsolicited commercial email, and append a mandated footer to selected incoming messages, the administrator could configure the MTA to filter messages first through a server-based anti-virus engine, then via a large-scale spam-catching service, and finally append the desired footer if the message still met requisite criteria. Any of these filters could be added or changed independently.
Thus the site administrator, not the filter writer, controls the overall mail filtering environment. In particular, he/she must decide which filters are run, in what order they are run, and how they communicate with the MTA. These parameters, as well as the actions to be taken if a filter becomes unavailable, are selectable during MTA configuration. Further details are available later in this document.
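The idea of an administrator-controlled, ordered chain of independent filters can be modelled in a few lines. This is only a conceptual sketch: the three toy filters, their matching rules, and the `run_filters` helper are invented here and bear no relation to how real filters or the MTA configuration are written.

```python
CONTINUE, REJECT = "continue", "reject"     # simplified verdicts for this sketch

def antivirus(message):                     # hypothetical filter
    return REJECT if "virus" in message["body"] else CONTINUE

def antispam(message):                      # hypothetical filter
    return REJECT if message["subject"].startswith("BUY NOW") else CONTINUE

def add_footer(message):                    # hypothetical filter
    message["body"] += "\n-- mandated footer --"
    return CONTINUE

def run_filters(filters, message):
    """Apply filters in the configured order; stop at the first rejection."""
    for name, check in filters:
        if check(message) == REJECT:
            return f"rejected by {name}"
    return "accepted"

# The order of this list plays the role of the order in the MTA configuration.
configured = [("antivirus", antivirus), ("antispam", antispam), ("footer", add_footer)]

msg = {"subject": "a test", "body": "This is a one line message."}
print(run_filters(configured, msg))
print(msg["body"])
```

Because each filter is a separate entry in the list, any one of them can be replaced or removed without touching the others, which is the property the text attributes to the administrator's configuration.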
Filters run as separate processes, outside of the sendmail address space. The benefits of this are threefold:
Each filter may communicate with multiple MTAs at the same time over local or remote connections, using multiple threads of execution. Figure 2.3 illustrates a possible network of communication channels between a site's filters, its MTAs, and other MTAs on the network:
Figure 2.3: A set of MTAs interacting with a set of filters.
The Milter library (libmilter) implements the communication protocol. It accepts connections from various MTAs, passes the relevant data to the filter through callbacks, then makes appropriate responses based on return codes. A filter may also send data to the MTA as a result of library calls. Figure 2.4 shows a single filter process processing messages from two MTAs:
Figure 2.4: A filter handling simultaneous requests from two MTAs.
Before handing control to libmilter (by calling smfi_main), a filter may call the following functions to set libmilter parameters. In particular, the filter must call smfi_register to register its callbacks. Each function will return either MI_SUCCESS or MI_FAILURE to indicate the status of the operation.
None of these functions communicate with the MTA. All alter the library's state, some of which is communicated to the MTA inside smfi_main.
Function | Description
smfi_register | Register a filter.
smfi_setconn | Specify socket to use.
smfi_settimeout | Set timeout.
smfi_main | Hand control to libmilter.
The following functions may be called from within the filter-defined callbacks to access information about the current connection or message.
Function | Description
smfi_getsymval | Return the value of a symbol.
smfi_getpriv | Get the private data pointer.
smfi_setpriv | Set the private data pointer.
smfi_setreply | Set the specific reply code to be used.
The following functions change a message's contents and attributes. They may only be called in xxfi_eom. All of these functions may invoke additional communication with the MTA. They will return either MI_SUCCESS or MI_FAILURE to indicate the status of the operation.
A filter must have set the appropriate flag (listed below) in the description passed to smfi_register to call any message modification function. Failure to do so will cause the MTA to treat a call to the function as a failure of the filter, terminating its connection.
Note that the status returned indicates only whether or not the filter's message was successfully sent to the MTA, not whether or not the MTA performed the requested operation. For example, smfi_addheader, when called with an illegal header name, will return MI_SUCCESS even though the MTA may later refuse to add the illegal header.
Function | Description | SMFIF_* flag
smfi_addheader | Add a header to the message. | SMFIF_ADDHDRS
smfi_chgheader | Change or delete a header. | SMFIF_CHGHDRS
smfi_addrcpt | Add a recipient to the envelope. | SMFIF_ADDRCPT
smfi_delrcpt | Delete a recipient from the envelope. | SMFIF_DELRCPT
smfi_replacebody | Replace the body of the message. | SMFIF_CHGBODY
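The pairing of modification functions with flags can be pictured as a bitmask check. The sketch below is a Python model, not libmilter itself: the flag and function names follow the libmilter API, but the numeric values should be treated as illustrative, and `may_call` is a helper invented for this example.

```python
from enum import IntFlag

class Flags(IntFlag):
    # Numeric values are illustrative; only the names matter here.
    SMFIF_ADDHDRS = 0x01
    SMFIF_CHGBODY = 0x02
    SMFIF_ADDRCPT = 0x04
    SMFIF_DELRCPT = 0x08
    SMFIF_CHGHDRS = 0x10

# Which flag each message modification function requires.
REQUIRED = {
    "smfi_addheader": Flags.SMFIF_ADDHDRS,
    "smfi_chgheader": Flags.SMFIF_CHGHDRS,
    "smfi_addrcpt": Flags.SMFIF_ADDRCPT,
    "smfi_delrcpt": Flags.SMFIF_DELRCPT,
    "smfi_replacebody": Flags.SMFIF_CHGBODY,
}

def may_call(function, registered):
    """True if the filter registered the flag that `function` requires."""
    return bool(REQUIRED[function] & registered)

# A filter that declared it will add headers and recipients, nothing else.
registered = Flags.SMFIF_ADDHDRS | Flags.SMFIF_ADDRCPT
print(may_call("smfi_addheader", registered))
print(may_call("smfi_replacebody", registered))
```

In this model the second call is the case the text warns about: using a modification function whose flag was never declared at registration time.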
The filter should implement one or more of the following callbacks, which are registered via smfi_register:
Function | Description
xxfi_connect | connection info
xxfi_helo | SMTP HELO/EHLO command
xxfi_envfrom | envelope sender
xxfi_envrcpt | envelope recipient
xxfi_header | header
xxfi_eoh | end of header
xxfi_body | body block
xxfi_eom | end of message
xxfi_abort | message aborted
xxfi_close | connection cleanup
The above callbacks should all return one of the following return values, having the indicated meanings. Any return other than one of the below values constitutes an error, and will cause sendmail to terminate its connection to the offending filter.
Milter distinguishes between recipient-, message-, and connection-oriented routines. Recipient-oriented callbacks may affect the processing of a single message recipient; message-oriented callbacks, a single message; connection-oriented callbacks, an entire connection (during which multiple messages may be delivered to multiple sets of recipients). xxfi_envrcpt is recipient-oriented. xxfi_connect, xxfi_helo and xxfi_close are connection-oriented. All other callbacks are message-oriented.
Return value | Description
SMFIS_CONTINUE | Continue processing the current connection, message, or recipient.
SMFIS_REJECT | For a connection-oriented routine, reject this connection; call xxfi_close.
SMFIS_DISCARD | For a message- or recipient-oriented routine, accept this message, but silently discard it.
SMFIS_ACCEPT | For a connection-oriented routine, accept this connection without further filter processing; call xxfi_close.
SMFIS_TEMPFAIL | Return a temporary failure, i.e., the corresponding SMTP command will return an appropriate 4xx status code. For a message-oriented routine (except xxfi_envfrom), fail for this message.
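The classification of callbacks into connection-, message-, and recipient-oriented routines determines what a return value applies to. A small lookup, sketched here with invented helper names, makes that classification explicit; it restates the rule from the text and does not model what the MTA actually does with each value.

```python
CONNECTION_ORIENTED = {"xxfi_connect", "xxfi_helo", "xxfi_close"}
RECIPIENT_ORIENTED = {"xxfi_envrcpt"}

def orientation(callback):
    """Classify a callback as described in the text."""
    if callback in RECIPIENT_ORIENTED:
        return "recipient"
    if callback in CONNECTION_ORIENTED:
        return "connection"
    return "message"            # every other callback is message-oriented

def describe(callback, verdict):
    """Say what a return value applies to (hypothetical helper)."""
    return f"{verdict} from {callback} applies to the current {orientation(callback)}"

print(describe("xxfi_helo", "SMFIS_REJECT"))
print(describe("xxfi_envrcpt", "SMFIS_TEMPFAIL"))
print(describe("xxfi_body", "SMFIS_DISCARD"))
```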
In addition to its own initialization, libmilter expects a filter to initialize several parameters before calling smfi_main:
If the filter fails to initialize libmilter, or if one or more of the parameters it has passed are invalid, a subsequent call to smfi_main will fail.
The following pseudocode describes the filtering process from the perspective of a set of N MTAs, each corresponding to a connection. Callbacks are shown beside the processing stages in which they are invoked; if no callbacks are defined for a particular stage, that stage may be bypassed. Though it is not shown, processing may be aborted at any time during a message, in which case the xxfi_abort callback is invoked and control returns to MESSAGE.
For each of N connections
{
    For each filter
        process connection/helo (xxfi_connect, xxfi_helo)
MESSAGE:
    For each message in this connection (sequentially)
    {
        For each filter
            process sender (xxfi_envfrom)
        For each recipient
        {
            For each filter
                process recipient (xxfi_envrcpt)
        }
        For each filter
        {
            For each header
                process header (xxfi_header)
            process end of headers (xxfi_eoh)
            For each body block
                process this body block (xxfi_body)
            process end of message (xxfi_eom)
        }
    }
    For each filter
        process end of connection (xxfi_close)
}
Note: Filters are contacted in the order defined in the configuration file.
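The pseudocode can be turned into a small runnable trace generator. The sketch below follows the same loop structure for a single connection; the message layout (a dictionary with "recipients", "headers", and "body_blocks") and the function name are assumptions made for this illustration, and aborts are left out, as they are in the pseudocode.

```python
def run_connection(filters, messages):
    """Walk one connection through every stage, recording each callback."""
    trace = []
    for f in filters:
        trace.append((f, "xxfi_connect"))
        trace.append((f, "xxfi_helo"))
    for msg in messages:                        # messages are handled in sequence
        for f in filters:
            trace.append((f, "xxfi_envfrom"))
        for _rcpt in msg["recipients"]:
            for f in filters:
                trace.append((f, "xxfi_envrcpt"))
        for f in filters:
            for _header in msg["headers"]:
                trace.append((f, "xxfi_header"))
            trace.append((f, "xxfi_eoh"))
            for _block in msg["body_blocks"]:
                trace.append((f, "xxfi_body"))
            trace.append((f, "xxfi_eom"))
    for f in filters:
        trace.append((f, "xxfi_close"))
    return trace

message = {
    "recipients": ["friend1", "friend2@remote"],
    "headers": ["Subject: a test"],
    "body_blocks": ["This is a one line message."],
}
for filter_name, callback in run_connection(["filter1"], [message]):
    print(filter_name, callback)
```

With one filter and this one message, xxfi_envrcpt appears twice (once per recipient) and every other callback appears once.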
To write a filter, a vendor supplies callbacks to process relevant parts of a message transaction. The library then controls all sequencing, threading, and protocol exchange with the MTA. Figure 2.5 outlines control flow for a filter process, showing where different callbacks are invoked.
SMTP Commands | Milter Callbacks
(open SMTP connection) | xxfi_connect
HELO ... | xxfi_helo
MAIL From: ... | xxfi_envfrom
RCPT To: ... | xxfi_envrcpt
[more RCPTs] | [xxfi_envrcpt]
DATA |
Header: ... | xxfi_header
[more headers] | [xxfi_header]
 | xxfi_eoh
body... | xxfi_body
[more body...] | [xxfi_body]
. | xxfi_eom
QUIT | xxfi_close
(close SMTP connection) |
Figure 2.5: Milter callbacks related to an SMTP transaction.
Note that although only a single message is shown above, multiple messages may be sent in a single connection. Note also that a message and/or connection may be aborted by either the remote host or the MTA at any point during the SMTP transaction. If this occurs during a message (between the MAIL command and the final "."), the filter's xxfi_abort routine will be called. xxfi_close is called any time the connection closes.
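The abort rule can be made concrete with a small state tracker. In this sketch the event names ("CONNECT", "DOT", "DROP", and so on) are simplified labels invented for illustration, not protocol tokens; the mapping itself follows the table of SMTP commands and callbacks.

```python
CALLBACK_FOR = {
    "CONNECT": "xxfi_connect",
    "HELO": "xxfi_helo",
    "MAIL": "xxfi_envfrom",
    "RCPT": "xxfi_envrcpt",
    "HEADER": "xxfi_header",
    "EOH": "xxfi_eoh",
    "BODY": "xxfi_body",
    "DOT": "xxfi_eom",
}

def callbacks_for(events):
    """Map simplified SMTP events to the callbacks a filter would see.

    "QUIT" and "DROP" both end the connection. If that happens between
    MAIL and the final ".", xxfi_abort is called before xxfi_close.
    """
    calls, in_message = [], False
    for event in events:
        if event in ("QUIT", "DROP"):
            if in_message:
                calls.append("xxfi_abort")
            calls.append("xxfi_close")
            break
        if event == "DATA":
            continue                     # DATA itself triggers no callback
        calls.append(CALLBACK_FOR[event])
        if event == "MAIL":
            in_message = True
        elif event == "DOT":
            in_message = False
    return calls

print(callbacks_for(["CONNECT", "HELO", "MAIL", "RCPT", "DROP"]))
```

Here the connection is lost after the first recipient, so the filter sees xxfi_abort followed by xxfi_close instead of the header and body callbacks.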
A single filter process may handle any number of connections simultaneously. All filtering callbacks must therefore be reentrant, and use some appropriate external synchronization methods to access global data. Furthermore, since there is not a one-to-one correspondence between threads and connections (N connections mapped onto M threads, M <= N), connection-specific data must be accessed through the handles provided by the Milter library. The programmer cannot rely on library-supplied thread-specific data blocks (e.g. pthread_getspecific()) to store connection-specific data. See the API documentation for smfi_setpriv and smfi_getpriv for details.
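The reason for keeping connection data behind the library's handle, rather than in thread-specific storage, can be illustrated with a thread pool. Everything below is a Python stand-in: `Context`, `setpriv`, and `getpriv` merely imitate the roles of the connection handle, smfi_setpriv, and smfi_getpriv, and the pool sizes are arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor

class Context:
    """Stand-in for the per-connection handle passed to every callback."""
    def __init__(self):
        self._priv = None

def setpriv(ctx, data):          # plays the role of smfi_setpriv
    ctx._priv = data

def getpriv(ctx):                # plays the role of smfi_getpriv
    return ctx._priv

def on_connect(ctx, conn_id):
    setpriv(ctx, {"conn": conn_id, "events": 0})

def on_event(ctx):
    # Whichever worker thread runs this, the state is found via the handle.
    getpriv(ctx)["events"] += 1

connections = [Context() for _ in range(6)]           # N = 6 connections
for conn_id, ctx in enumerate(connections):
    on_connect(ctx, conn_id)

with ThreadPoolExecutor(max_workers=2) as pool:       # M = 2 threads
    for _round in range(3):
        # Finish each round before starting the next, so a connection is
        # never handled by two threads at the same moment.
        list(pool.map(on_event, connections))

print([getpriv(ctx)["events"] for ctx in connections])
```

Each connection ends up with its own count of three events, regardless of which of the two threads happened to serve it. Data stored per thread would instead have been shared among whatever connections that thread handled.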
Since filters are likely to be long-lived, and to handle many connections, proper deallocation of per-connection resources is important. The lifetime of a connection is bracketed by calls to the callbacks xxfi_connect and xxfi_close. Therefore connection-specific resources (accessed via smfi_getpriv and smfi_setpriv) may be allocated in xxfi_connect, and should be freed in xxfi_close. For further information see the discussion of message- versus connection-oriented routines. In particular, note that there is only one connection-specific data pointer per connection.
Each message is bracketed by calls to xxfi_envfrom and xxfi_eom (or xxfi_abort), implying that message-specific resources can be allocated and reclaimed in these routines. Since the messages in a connection are processed sequentially by each filter, there will be only one active message associated with a given connection and filter (and connection-private data block). These resources must still be accessed through smfi_getpriv and smfi_setpriv, and must be reclaimed in xxfi_abort.
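The two lifetimes described above (connection and message) can share the single private pointer by nesting one inside the other. The following is a Python model with simplified callback signatures, not the libmilter prototypes; the `Context` class and the dictionary layout are assumptions of this sketch.

```python
class Context:
    def __init__(self):
        self.priv = None        # the one private data pointer per connection

def xxfi_connect(ctx, peer):
    # Connection-level state, with a slot for the current message.
    ctx.priv = {"peer": peer, "message": None}

def xxfi_envfrom(ctx, sender):
    # A new message starts: allocate message-level state.
    ctx.priv["message"] = {"sender": sender, "recipients": []}

def xxfi_envrcpt(ctx, rcpt):
    ctx.priv["message"]["recipients"].append(rcpt)

def xxfi_eom(ctx):
    ctx.priv["message"] = None  # message finished: release message state

def xxfi_abort(ctx):
    ctx.priv["message"] = None  # message aborted: release it here as well

def xxfi_close(ctx):
    ctx.priv = None             # connection over: release everything

ctx = Context()
xxfi_connect(ctx, "192.0.2.1")
xxfi_envfrom(ctx, "you@Here.US.EDU")
xxfi_envrcpt(ctx, "friend1")
xxfi_abort(ctx)
print(ctx.priv)                 # connection state survives the abort
xxfi_close(ctx)
print(ctx.priv)
```

After the abort only the message slot is cleared, so a second message on the same connection can start cleanly; the connection-level state is released only in the close callback.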