Technical Architecture of Milter APIs

2.1 Sendmail

The sendmail program is actually composed of several parts, including programs, files, directories, and the services it provides. Its foundation is a configuration file that defines the location and behavior of these other parts and contains rules for rewriting addresses. A queue directory holds mail until it can be delivered. An aliases file allows alternative names for users and creation of mailing lists.

2.1.1 The Configuration File

The configuration file contains all the information sendmail needs to do its job. Within it you provide information, such as file locations, permissions, and modes of operation.

Rewriting rules and rule sets also appear in the configuration file. They transform a mail address into another form that may be required for delivery. They are perhaps the single most confusing aspect of the configuration file. Because the configuration file is designed to be fast for sendmail to read and parse, rules can look cryptic to humans:

R$+@$+          $:$1<@$2>      focus on domain

R$+<$+@$+>      $1$2<@$3>      move gaze right

But what appears to be complex is really just succinct. The R at the beginning of each line, for example, labels a rewrite rule. And the $+ expressions mean to match one or more parts of an address. With experience, such expressions (and indeed the configuration file as a whole) soon become meaningful.
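As a rough illustration of what the first rule above does, the following Python sketch approximates it with a regular expression. It is only an analogy: sendmail matches tokens rather than characters, and the function name focus_on_domain is made up for this example.

```python
import re

# Rough regex analogue of the rule  R$+@$+  $:$1<@$2>.
# Assumption: "$+" (one or more tokens) is approximated here by
# "one or more characters"; real sendmail tokenizes the address first.
_RULE = re.compile(r"^(.+?)@(.+)$")


def focus_on_domain(address):
    """Return user<@domain> if the address matches, else the address unchanged."""
    match = _RULE.match(address)
    if match is None:
        return address  # a rule that does not match leaves the address alone
    user, domain = match.groups()
    return f"{user}<@{domain}>"
```

Calling focus_on_domain("you@here.us.edu") wraps the domain part in angle brackets, which is the "focus" the rule's comment refers to.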

2.1.2 The Queue

Not all mail messages can be delivered immediately. When delivery is delayed, sendmail must be able to save the message for later transmission. The sendmail queue is a directory that holds mail until it can be delivered. A mail message may be queued:

a)      When the destination machine is unreachable or down. The mail message will be delivered when the destination machine returns to service.

b)      When a mail message has many recipients. Some mail messages may be successfully delivered, and others may not. Those that fail are queued for later delivery.

c)      When a mail message is expensive. Expensive mail (such as mail sent over a long-distance phone line) can be queued for delivery when rates are lower.

d)    When safety is of concern. The sendmail program can be configured to queue all mail messages, thus minimizing the risk of loss should the machine crash.

2.1.3 Aliases and Mailing Lists

Aliases allow mail that is sent to one address to be redirected to another address. They also allow mail to be appended to files or piped through programs, and they form the basis of mailing lists. The heart of aliasing is the aliases(5) file (often stored in database format for swifter lookups). Aliasing is also available to the individual user via a file called ‘.forward’ in the user's home directory.
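The redirection idea can be sketched in a few lines of Python. This is a simplified model, not sendmail's algorithm: it assumes one "name: target, target" entry per line, ignores continuation lines, files, and pipes, and the function names parse_aliases and expand are hypothetical.

```python
def parse_aliases(text):
    """Parse 'name: target1, target2' lines; lines starting with '#' are comments."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, targets = line.partition(":")
        aliases[name.strip().lower()] = [t.strip() for t in targets.split(",") if t.strip()]
    return aliases


def expand(address, aliases, _seen=None):
    """Expand an address recursively until only non-alias names remain."""
    seen = set() if _seen is None else _seen
    key = address.lower()
    if key in seen:              # guard against alias loops
        return []
    if key not in aliases:
        return [address]         # not an alias: deliver to this name
    seen.add(key)
    result = []
    for target in aliases[key]:
        for final in expand(target, aliases, seen):
            if final not in result:
                result.append(final)
    return result
```

A mailing list is then simply an alias with several targets, for example "staff: you, friend1".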

2.1.4 Run sendmail by Hand

Most users do not run sendmail directly. Instead, they use one of many mail user agents (MUAs) to compose a mail message. Those programs invisibly pass the mail message to sendmail, creating the appearance of instantaneous transmission. The sendmail program then takes care of delivery in its own seemingly mysterious fashion.

Although most users don't run sendmail directly, it is perfectly legal to do so. You, like many system managers, may need to do so to track down and solve mail problems.

Here's a demonstration of one way to run sendmail by hand. First create a file named sendstuff with the following contents:

This is a one line message.

Second, mail this file to yourself with the following command line, where you is your login name:

% /usr/lib/sendmail you <sendstuff

Here, you run sendmail directly by specifying its full pathname. [1] When you run sendmail, any command-line arguments that do not begin with a - character are considered to be the names of the people to whom you are sending the mail message.

[1] That path may be different on your system. If so, substitute the correct pathname in all the examples that follow. For example, try looking for sendmail in /usr/sbin or /usr/ucblib.

The <sendstuff sequence causes the contents of the file that you have created (sendstuff) to be redirected into the sendmail program. The sendmail program treats everything it reads from its standard input (up to the end of the file) as the mail message to transmit.
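The same invocation can be scripted. The sketch below, in Python, builds the argument list and feeds a file to sendmail on standard input; the helper names sendmail_argv and send_file are made up for this example, and the default path is the one used in the text, which may differ on your system.

```python
import subprocess


def sendmail_argv(recipients, sendmail_path="/usr/lib/sendmail"):
    """Build the argument list: program path followed by recipient names."""
    for name in recipients:
        if name.startswith("-"):
            # arguments starting with "-" would not be taken as recipient names
            raise ValueError(f"not a recipient name: {name!r}")
    return [sendmail_path, *recipients]


def send_file(message_path, recipients, sendmail_path="/usr/lib/sendmail"):
    """Feed the file to sendmail on standard input; return its exit status."""
    with open(message_path, "rb") as message:
        completed = subprocess.run(sendmail_argv(recipients, sendmail_path), stdin=message)
    return completed.returncode
```

Calling send_file("sendstuff", ["you"]) corresponds to the command line shown earlier, provided sendmail is installed at the given path.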

Now view the mail message that you just sent. How you do this will vary. Many users just type mail to view their mail. Others use the mh(1) package and type inc to receive and show to view their mail. No matter how you normally view your mail, save the mail message that you just received to a file. It will look something like this:

From you@Here.US.EDU  Fri Dec 13 08:11:44 1996

Received: (from you@localhost) by Here.US.EDU (8.8.4/8.8.4)

       id AA04599 for you; Fri, 13 Dec 96 08:11:44 -0700

Date: Fri, 13 Dec 96 08:11:43

From: you@Here.US.EDU (Your Full Name)

Message-Id: <9631121611.AA02124@Here.US.EDU>

To: you                                                    may be Apparently-To:

This is a one line message.

The first thing to note is that this file begins with seven lines of text that were not in your original message. Those lines were added by sendmail and your local delivery program and are called the header.

The last line of the file is the original line from your sendstuff file. It is separated from the header by one blank line. The body of a mail message comes after the header and consists of everything that follows the first blank line (see Figure 2.1).

Ordinarily, when you send mail with your MUA, the MUA adds a header and feeds both the header and the body to sendmail. This time, however, you ran sendmail directly and supplied only a body; the header was added by sendmail.

Figure 2.1: Every mail message is composed of a header and a body


2.1.5 The Header

Let's examine the header in more detail.

From you@Here.US.EDU Fri Dec 13 08:11:44 1996

Received: (from you@localhost) by Here.US.EDU (8.8.4/8.8.4)

       id AA04599 for you; Fri, 13 Dec 96 08:11:44 -0700

Date: Fri, 13 Dec 96 08:11:43

From: you@Here.US.EDU (Your Full Name)

Message-Id: <9631121611.AA02124@Here.US.EDU>

To: you                                            may be something else (see Section 34.8.43, NoRecipientAction)

Notice that most header lines start with a word followed by a colon. Each word tells what kind of information the rest of the line contains. There are many types of header lines that can appear in a mail message. Some are mandatory, some are optional, and some may appear many times. Those that appeared in the message that you mailed to yourself were all mandatory. That's why sendmail added them to your message. The line starting with the five characters "From " (the fifth character is a space) is added by some programs (such as /bin/mail) but not by others (such as mh).

A Received: line is added each time a machine receives the mail message. (If there are too many such lines, the mail message will bounce and be returned to the sender as failed mail.) The indented line is a continuation of the line above, the Received: line.

The Date: line gives the date and time when the message was originally sent. The From: line lists the email address and the full name of the sender. The Message-ID: line is like a serial number in that it is guaranteed to uniquely identify the mail message.

And the To: [2] line shows a list of one or more recipients. (Multiple recipients would be separated with commas.)

[2] Depending on how the NoRecipientAction option was set, this could be an Apparently-To: header, a Bcc: header, or even a To: header followed by an "undisclosed-recipients:;".

A complete list of all header lines that are of importance to sendmail is presented in Headers. The important concept here is that the header precedes, and is separate from, the body in all mail messages.
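To make the structure of the header concrete, here is a small Python sketch that reads the sample header shown above. It is an illustration only, not sendmail's parser; the function name parse_header is hypothetical.

```python
SAMPLE_HEADER = (
    "From you@Here.US.EDU Fri Dec 13 08:11:44 1996\n"
    "Received: (from you@localhost) by Here.US.EDU (8.8.4/8.8.4)\n"
    "       id AA04599 for you; Fri, 13 Dec 96 08:11:44 -0700\n"
    "Date: Fri, 13 Dec 96 08:11:43\n"
    "From: you@Here.US.EDU (Your Full Name)\n"
    "Message-Id: <9631121611.AA02124@Here.US.EDU>\n"
    "To: you\n"
)


def parse_header(header_text):
    """Return (from_line, fields); fields is a list of (name, value) pairs."""
    from_line = None
    fields = []
    for line in header_text.splitlines():
        if not line:
            continue
        if line.startswith("From ") and from_line is None and not fields:
            from_line = line                  # the five-character "From " line
        elif line[0] in " \t" and fields:
            name, value = fields[-1]          # indented line continues the field above
            fields[-1] = (name, value + " " + line.strip())
        else:
            name, _, value = line.partition(":")
            fields.append((name, value.strip()))
    return from_line, fields
```

With the sample above, the indented "id AA04599 ..." line ends up as part of the Received: value rather than as a field of its own.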

2.1.6 The Body

The body of a mail message consists of everything following the first blank line to the end of the file. When you sent your sendstuff file, it contained only a body. Now edit the file sendstuff and add a small header.

Subject: a test              (add this line)
                             (add a blank line here)
This is a one line message.

The Subject: header line is an optional one. The sendmail program passes it through as is. Here, the Subject: line is followed by a blank line and then the message text, forming a header and a body. Note that a blank line must be truly blank. If you put space or tab characters in it, thus forming an "empty-looking" line, the header will not be separated from the body as intended.
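The "truly blank" rule can be shown with a short Python sketch. It follows the description in the text rather than sendmail's source code, and the function name split_message is made up. When no empty line is found, the sketch simply reports an empty body; how sendmail itself treats such input is outside this example.

```python
def split_message(text):
    """Split at the first line that is truly empty (no spaces, no tabs)."""
    lines = text.splitlines()
    for index, line in enumerate(lines):
        if line == "":
            header = "\n".join(lines[:index])
            body = "\n".join(lines[index + 1:])
            return header, body
    # No truly empty line: nothing separates a header from a body.
    return "\n".join(lines), ""
```

A line containing only a space or a tab is not equal to the empty string, so it does not act as a separator in this sketch.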

Send this file to yourself again, running sendmail by hand as you did before:

% /usr/lib/sendmail you <sendstuff

Notice that our Subject: header line was carried through without change:

From you@Here.US.EDU  Fri Dec 13 08:11:44 1996

Return-Path: you@Here.US.EDU

Received: (from you@localhost) by Here.US.EDU (8.8.4/8.8.4)

       id AA04599 for you; Fri, 13 Dec 96 08:11:44 -0700

Date: Fri, 13 Dec 96 08:11:43

From: you@Here.US.EDU (Your Full Name)

Message-Id: <9631121611.AA02124@Here.US.EDU>

Subject: a test                                  note

To: you

This is a one line message.

2.1.7 The Envelope

To handle delivery to diverse recipients, the sendmail program uses the concept of an envelope. This envelope is analogous to the physical envelopes that are used for post office mail. Imagine that you want to send two copies of a document: one to your friend in the office next to yours and one to a friend across the country:

To: friend1, friend2@remote

After you photocopy the document, you stuff each copy into a separate envelope. You hand one envelope to a clerk, who carries it next door and hands it to friend1 in the next office. This is like delivery on your local machine. The clerk drops the other copy in the slot at the corner mailbox, and the post office forwards that envelope across the country to friend2@remote. This is like sendmail transporting a mail message to a remote machine.

To illustrate what an envelope is, consider one way in which sendmail might run /bin/mail, a program that performs local delivery:

/bin/mail -d friend1          the command that sendmail runs
          -d                  deliver to friend1's mailbox
             friend1          the envelope recipient

Here sendmail runs /bin/mail with a -d, which tells /bin/mail to append the mail message to friend1's mailbox.

Information that describes the sender or recipient, but is not part of the message header, is considered envelope information. The two may or may not contain the same information (a point we'll gloss over for now). In the case of /bin/mail, the email message showed two recipients in its header:

To: friend1, friend2@remote         the header

But the envelope information that is given to /bin/mail showed only one (the one appropriate to local delivery):

-d friend1                     specifies the envelope

Now consider the envelope of a message transported over the network. When sending network mail, sendmail must give the remote site a list of sender and recipients separate from and before it sends the mail message (header and body). Figure 2.2 shows this in a greatly simplified conversation between the local sendmail and the remote machine's sendmail.

Figure 2.2: A simplified conversation


The local sendmail tells the remote machine's sendmail that there is mail from you (the sender) and for friend2@remote. It conveys this sender and recipient information separate from and before it transmits the mail message that contains the header. Because this information is conveyed separately from the message header, it is called the envelope.
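In SMTP terms, the envelope travels in the MAIL and RCPT commands, which precede the DATA command that introduces the header and body. The Python sketch below lists those commands for a given envelope; it does not reproduce the figure's exact wording, and the function name envelope_commands is hypothetical.

```python
def envelope_commands(sender, recipients):
    """SMTP commands that carry the envelope, sent before the message itself."""
    commands = [f"MAIL FROM:<{sender}>"]                       # envelope sender
    commands += [f"RCPT TO:<{rcpt}>" for rcpt in recipients]   # envelope recipients
    commands.append("DATA")   # header and body are transmitted only after this
    return commands
```

For the example in the text, envelope_commands("you@here.us.edu", ["friend2@remote"]) yields one MAIL command, one RCPT command, and DATA; friend1 never appears.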

There is only one recipient listed in the envelope, whereas two were listed in the message header:

To: friend1, friend2@remote

The remote machine does not need to know about the local user, friend1, so that bit of recipient information is excluded from the envelope.

A given mail message can be sent by using many different envelopes (like the two here), but the header will be common to them all.
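A minimal Python sketch of this idea follows. It groups the header recipients by destination and produces one envelope per destination, while the header itself is left untouched. The function name build_envelopes, the dictionary layout, and the "localhost" default are assumptions made for illustration.

```python
def build_envelopes(sender, header_recipients, local_host="localhost"):
    """Return one envelope per destination host, in order of first appearance."""
    groups = {}
    for rcpt in header_recipients:
        _user, at, host = rcpt.partition("@")
        destination = host if at else local_host   # no "@": a local user
        groups.setdefault(destination, []).append(rcpt)
    return [
        {"sender": sender, "destination": destination, "recipients": rcpts}
        for destination, rcpts in groups.items()
    ]
```

For "To: friend1, friend2@remote" this produces two envelopes, one for local delivery to friend1 and one for the remote machine, matching the two envelopes described above.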

2.2 Milter Architecture

Design Goals
Implementing Filtering Policies
MTA - Filter Communication

2.2.1 Goals

The Sendmail Content Management API (Milter) provides an interface for third-party software to validate and modify messages as they pass through the mail transport system. Filters can process messages' connection (IP) information, envelope protocol elements, message headers, and/or message body contents, and modify a message's recipients, headers, and body. The MTA configuration file specifies which filters are to be applied, and in what order, allowing an administrator to combine multiple independently-developed filters.

We expect to see both vendor-supplied, configurable mail filtering applications and a multiplicity of script-like filters designed by and for MTA administrators. A certain degree of coding sophistication and domain knowledge on the part of the filter provider is assumed. This allows filters to exercise fine-grained control at the SMTP level. However, as will be seen in the example, many filtering applications can be written with relatively little protocol knowledge.

Given these expectations, the API is designed to achieve the following goals:

a)      Safety/security. Filter processes should not need to run as root (of course, they can if required, but that is a local issue); this will simplify coding and limit the impact of security flaws in the filter program.

b)      Reliability. Coding failures in a Milter process that cause that process to hang or core-dump should not stop mail delivery. Faced with such a failure, sendmail should use a default mechanism, either behaving as if the filter were not present or as if a required resource were unavailable. The latter failure mode will generally have sendmail return a 4xx SMTP code (although in later phases of the SMTP protocol it may cause the mail to be queued for later processing).

c)      Simplicity. The API should make implementation of a new filter no more difficult than absolutely necessary. Subgoals include:

        i)      Encourage good thread practice by defining thread-clean interfaces including local data hooks.

        ii)     Provide all interfaces required while avoiding unnecessary pedanticism.

d)      Performance. Simple filters should not seriously impact overall MTA performance.

2.2.2 Implementing Filtering Policies

Milter is designed to allow a server administrator to combine third-party filters to implement a desired mail filtering policy. For example, if a site wished to scan incoming mail for viruses on several platforms, eliminate unsolicited commercial email, and append a mandated footer to selected incoming messages, the administrator could configure the MTA to filter messages first through a server-based anti-virus engine, then via a large-scale spam-catching service, and finally append the desired footer if the message still met requisite criteria. Any of these filters could be added or changed independently.

Thus the site administrator, not the filter writer, controls the overall mail filtering environment. In particular, the administrator must decide which filters are run, in what order they are run, and how they communicate with the MTA. These parameters, as well as the actions to be taken if a filter becomes unavailable, are selectable during MTA configuration. Further details are available later in this document.

2.2.3 MTA - Filter communication

Filters run as separate processes, outside of the sendmail address space. The benefits of this are threefold:

a)      The filter need not run with "root" permissions, thereby avoiding a large family of potential security problems.

b)      Failures in a particular filter will not affect the MTA or other filters.

c)      The filter can potentially have higher performance because of the parallelism inherent in multiple processes.

Each filter may communicate with multiple MTAs at the same time over local or remote connections, using multiple threads of execution. Figure 2.3 illustrates a possible network of communication channels between a site's filters, its MTAs, and other MTAs on the network:

Figure 2.3: A set of MTAs interacting with a set of filters.

The Milter library (libmilter) implements the communication protocol. It accepts connections from various MTAs, passes the relevant data to the filter through callbacks, then makes appropriate responses based on return codes. A filter may also send data to the MTA as a result of library calls. Figure 2.4 shows a single filter process processing messages from two MTAs:

Figure 2.4: A filter handling simultaneous requests from two MTAs.

2.3 Milter API

2.3.1 Library Control Functions

Before handing control to libmilter (by calling smfi_main), a filter may call the following functions to set libmilter parameters. In particular, the filter must call smfi_register to register its callbacks. Each function will return either MI_SUCCESS or MI_FAILURE to indicate the status of the operation.

None of these functions communicate with the MTA. All alter the library's state, some of which is communicated to the MTA inside smfi_main.

Function	Description
smfi_register	Register a filter.
smfi_setconn	Specify socket to use.
smfi_settimeout	Set timeout.
smfi_main	Hand control to libmilter.

2.3.2 Data Access Functions

The following functions may be called from within the filter-defined callbacks to access information about the current connection or message.

Function	Description
smfi_getsymval	Return the value of a symbol.
smfi_getpriv	Get the private data pointer.
smfi_setpriv	Set the private data pointer.
smfi_setreply	Set the specific reply code to be used.

2.3.3 Message Modification Functions

The following functions change a message's contents and attributes. They may only be called in xxfi_eom. All of these functions may invoke additional communication with the MTA. They will return either MI_SUCCESS or MI_FAILURE to indicate the status of the operation.

A filter must have set the appropriate flag (listed below) in the description passed to smfi_register to call any message modification function. Failure to do so will cause the MTA to treat a call to the function as a failure of the filter, terminating its connection.

Note that the status returned indicates only whether or not the filter's message was successfully sent to the MTA, not whether or not the MTA performed the requested operation. For example, smfi_addheader, when called with an illegal header name, will return MI_SUCCESS even though the MTA may later refuse to add the illegal header.

Function	Description	SMFIF_ flag
smfi_addheader	Add a header to the message.	SMFIF_ADDHDRS
smfi_chgheader	Change or delete a header.	SMFIF_CHGHDRS
smfi_addrcpt	Add a recipient to the envelope.	SMFIF_ADDRCPT
smfi_delrcpt	Delete a recipient from the envelope.	SMFIF_DELRCPT
smfi_replacebody	Replace the body of the message.	SMFIF_CHGBODY
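The two preconditions stated above (the call happens in xxfi_eom, and the matching flag was registered) can be modeled in a few lines of Python. This is a toy model of the rule, not libmilter code: the flag names come from the table, but their numeric values are generated by the enum and are not the constants from the C header, and the function name may_call is hypothetical.

```python
import enum


class Flag(enum.Flag):
    # Names mirror the SMFIF_ flags in the table above. The values produced
    # by auto() are placeholders, NOT the constants defined by libmilter.
    SMFIF_ADDHDRS = enum.auto()
    SMFIF_CHGHDRS = enum.auto()
    SMFIF_ADDRCPT = enum.auto()
    SMFIF_DELRCPT = enum.auto()
    SMFIF_CHGBODY = enum.auto()


# Which flag each message modification function needs, per the table.
REQUIRED_FLAG = {
    "smfi_addheader": Flag.SMFIF_ADDHDRS,
    "smfi_chgheader": Flag.SMFIF_CHGHDRS,
    "smfi_addrcpt": Flag.SMFIF_ADDRCPT,
    "smfi_delrcpt": Flag.SMFIF_DELRCPT,
    "smfi_replacebody": Flag.SMFIF_CHGBODY,
}


def may_call(function_name, registered_flags, current_callback):
    """True if the call is permitted under the two rules described above."""
    if current_callback != "xxfi_eom":
        return False                      # only allowed at end of message
    return bool(REQUIRED_FLAG[function_name] & registered_flags)
```

A filter that registered only SMFIF_ADDHDRS and SMFIF_CHGBODY could therefore add headers and replace the body in xxfi_eom, but could not delete recipients.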

2.3.4 Callbacks

The filter should implement one or more of the following callbacks, which are registered via smfi_register:

Function	Description
xxfi_connect	connection info
xxfi_helo	SMTP HELO/EHLO command
xxfi_envfrom	envelope sender
xxfi_envrcpt	envelope recipient
xxfi_header	header
xxfi_eoh	end of header
xxfi_body	body block
xxfi_eom	end of message
xxfi_abort	message aborted
xxfi_close	connection cleanup

The above callbacks should all return one of the following return values, having the indicated meanings. Any return other than one of the below values constitutes an error, and will cause sendmail to terminate its connection to the offending filter.

Milter distinguishes between recipient-, message-, and connection-oriented routines. Recipient-oriented callbacks may affect the processing of a single message recipient; message-oriented callbacks, a single message; connection-oriented callbacks, an entire connection (during which multiple messages may be delivered to multiple sets of recipients). xxfi_envrcpt is recipient-oriented. xxfi_connect, xxfi_helo and xxfi_close are connection-oriented. All other callbacks are message-oriented.

Return value	Description
SMFIS_CONTINUE	Continue processing the current connection, message, or recipient.
SMFIS_REJECT	For a connection-oriented routine, reject this connection; call xxfi_close. For a message-oriented routine (except xxfi_eom or xxfi_abort), reject this message. For a recipient-oriented routine, reject the current recipient (but continue processing the current message).
SMFIS_DISCARD	For a message- or recipient-oriented routine, accept this message, but silently discard it. SMFIS_DISCARD should not be returned by a connection-oriented routine.
SMFIS_ACCEPT	For a connection-oriented routine, accept this connection without further filter processing; call xxfi_close. For a message- or recipient-oriented routine, accept this message without further filtering.
SMFIS_TEMPFAIL	Return a temporary failure, i.e., the corresponding SMTP command will return an appropriate 4xx status code. For a message-oriented routine (except xxfi_envfrom), fail for this message. For a connection-oriented routine, fail for this connection; call xxfi_close. For a recipient-oriented routine, only fail for the current recipient; continue message processing.
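The table can be read as a function from (callback, return value) to the unit of work that is affected. The Python sketch below encodes that reading. It is an interpretation of the table, not library behavior: the function name scope_of is hypothetical, and the cases the table explicitly excepts return None because the table does not describe them.

```python
ORIENTATION = {
    "xxfi_connect": "connection", "xxfi_helo": "connection", "xxfi_close": "connection",
    "xxfi_envrcpt": "recipient",
    "xxfi_envfrom": "message", "xxfi_header": "message", "xxfi_eoh": "message",
    "xxfi_body": "message", "xxfi_eom": "message", "xxfi_abort": "message",
}

# Combinations the table names as exceptions without describing the outcome.
NOT_DESCRIBED = {
    ("SMFIS_REJECT", "xxfi_eom"),
    ("SMFIS_REJECT", "xxfi_abort"),
    ("SMFIS_TEMPFAIL", "xxfi_envfrom"),
}


def scope_of(callback, status):
    """Return 'recipient', 'message', or 'connection': what the status applies to."""
    orientation = ORIENTATION[callback]
    if status == "SMFIS_CONTINUE":
        return orientation
    if status == "SMFIS_DISCARD":
        if orientation == "connection":
            raise ValueError("SMFIS_DISCARD should not be returned here")
        return "message"                  # the whole message is discarded
    if status == "SMFIS_ACCEPT":
        return "connection" if orientation == "connection" else "message"
    if status in ("SMFIS_REJECT", "SMFIS_TEMPFAIL"):
        if (status, callback) in NOT_DESCRIBED:
            return None
        return orientation
    raise ValueError(f"not a valid return value: {status}")
```

Note the asymmetry this makes visible: rejecting in xxfi_envrcpt affects only that recipient, while discarding in the same callback affects the whole message.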

2.4 Technical Overview of Milter APIs

Initialization
Control flow
Multithreading
Resource Management

2.4.1 Initialization

In addition to its own initialization, libmilter expects a filter to initialize several parameters before calling smfi_main:

The callbacks the filter wishes to be called, and the types of message modification it intends to perform (required, see smfi_register).
The socket address to be used when communicating with the MTA (required, see smfi_setconn).
The number of seconds to wait for MTA connections before timing out (optional, see smfi_settimeout).

If the filter fails to initialize libmilter, or if one or more of the parameters it has passed are invalid, a subsequent call to smfi_main will fail.
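The ordering constraint can be illustrated with a toy model in Python. This is not a binding to libmilter: the class ToyLibrary, its method names, and the string stand-ins for MI_SUCCESS and MI_FAILURE are all invented for this sketch, and the socket string is passed through without validation.

```python
MI_SUCCESS, MI_FAILURE = "MI_SUCCESS", "MI_FAILURE"   # stand-ins, not the C values


class ToyLibrary:
    """Toy model of the library state described above; not a libmilter binding."""

    def __init__(self):
        self.callbacks = None    # required before main()
        self.socket = None       # required before main()
        self.timeout = None      # optional

    def register(self, callbacks):      # plays the role of smfi_register
        if not callbacks:
            return MI_FAILURE
        self.callbacks = dict(callbacks)
        return MI_SUCCESS

    def setconn(self, address):         # plays the role of smfi_setconn
        if not address:
            return MI_FAILURE
        self.socket = address
        return MI_SUCCESS

    def settimeout(self, seconds):      # plays the role of smfi_settimeout
        self.timeout = seconds
        return MI_SUCCESS

    def main(self):                     # plays the role of smfi_main
        if self.callbacks is None or self.socket is None:
            return MI_FAILURE           # a required parameter was never set
        return MI_SUCCESS
```

In this model, as in the description above, main() fails until both the callbacks and the socket address have been supplied, while the timeout may be left unset.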

2.4.2 Control flow

The following pseudocode describes the filtering process from the perspective of a set of N MTAs, each corresponding to a connection. Callbacks are shown beside the processing stages in which they are invoked; if no callbacks are defined for a particular stage, that stage may be bypassed. Though it is not shown, processing may be aborted at any time during a message, in which case the xxfi_abort callback is invoked and control returns to MESSAGE.

For each of N connections
        For each filter
                process connection/helo (xxfi_connect, xxfi_helo)
MESSAGE:For each message in this connection (sequentially)
                For each filter
                        process sender (xxfi_envfrom)
                For each recipient
                        For each filter
                                process recipient (xxfi_envrcpt)
                For each filter
                        For each header
                                process header (xxfi_header)
                        process end of headers (xxfi_eoh)
                        For each body block
                                process this body block (xxfi_body)
                        process end of message (xxfi_eom)
        For each filter
                process end of connection (xxfi_close)

Note: Filters are contacted in the order defined in the configuration file.

To write a filter, a vendor supplies callbacks to process relevant parts of a message transaction. The library then controls all sequencing, threading, and protocol exchange with the MTA. Figure 2.5 outlines control flow for a filter process, showing where different callbacks are invoked.

SMTP Commands	Milter Callbacks
(open SMTP connection)	xxfi_connect
HELO ...	xxfi_helo
MAIL From: ...	xxfi_envfrom
RCPT To: ...	xxfi_envrcpt
[more RCPTs]	[xxfi_envrcpt]
DATA
Header: ...	xxfi_header
[more headers]	[xxfi_header]
	xxfi_eoh
body...	xxfi_body
[more body...]	[xxfi_body]
.	xxfi_eom
QUIT	xxfi_close
(close SMTP connection)

Figure 2.5: Milter callbacks related to an SMTP transaction.

Note that although only a single message is shown above, multiple messages may be sent in a single connection. Note also that a message and/or connection may be aborted by either the remote host or the MTA at any point during the SMTP transaction. If this occurs during a message (between the MAIL command and the final "."), the filter's xxfi_abort routine will be called. xxfi_close is called any time the connection closes.
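For a connection that completes normally, the order of callbacks seen by a single filter can be written down as a short Python function. This is a sketch of the ordering only; the function name callback_sequence and the dictionary keys are made up, and aborted messages are not modeled.

```python
def callback_sequence(messages):
    """List the callbacks one filter sees for one connection, in order.

    Each message is a dict with 'recipients', 'headers', and 'body_blocks'
    lists (hypothetical keys used only by this sketch).
    """
    calls = ["xxfi_connect", "xxfi_helo"]
    for message in messages:
        calls.append("xxfi_envfrom")
        calls += ["xxfi_envrcpt"] * len(message["recipients"])
        calls += ["xxfi_header"] * len(message["headers"])
        calls.append("xxfi_eoh")
        calls += ["xxfi_body"] * len(message["body_blocks"])
        calls.append("xxfi_eom")
    calls.append("xxfi_close")          # once per connection, after all messages
    return calls
```

Passing two messages shows that xxfi_connect, xxfi_helo, and xxfi_close occur once per connection, while the message-oriented callbacks repeat for every message.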

2.4.3 Multithreading

A single filter process may handle any number of connections simultaneously. All filtering callbacks must therefore be reentrant, and use some appropriate external synchronization methods to access global data. Furthermore, since there is not a one-to-one correspondence between threads and connections (N connections mapped onto M threads, M <= N), connection-specific data must be accessed through the handles provided by the Milter library. The programmer cannot rely on library-supplied thread-specific data blocks (e.g. pthread_getspecific()) to store connection-specific data. See the API documentation for smfi_setpriv and smfi_getpriv for details.
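The point about threads and connections can be demonstrated with a small Python simulation. It is only an analogy for the C API: the Context class stands in for the connection handle, set_priv and get_priv stand in for smfi_setpriv and smfi_getpriv, and on_event stands in for an arbitrary callback. Four connections are served by two worker threads, so state kept per thread would be mixed up, whereas state kept in the handle is not.

```python
from concurrent.futures import ThreadPoolExecutor


class Context:
    """Stand-in for the per-connection handle passed to every callback."""

    def __init__(self):
        self._priv = None

    def set_priv(self, data):       # analogous in spirit to smfi_setpriv
        self._priv = data

    def get_priv(self):             # analogous in spirit to smfi_getpriv
        return self._priv


def on_event(ctx):
    """A callback that keeps its state in the handle, not in the thread."""
    state = ctx.get_priv()
    if state is None:
        state = {"events": 0}
        ctx.set_priv(state)
    state["events"] += 1


def run(connections=4, threads=2, events=3):
    contexts = [Context() for _ in range(connections)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        for _ in range(events):
            # One event per connection per round; any worker may serve any
            # connection, and list() waits for the round to finish.
            list(pool.map(on_event, contexts))
    return [ctx.get_priv()["events"] for ctx in contexts]
```

Each connection ends up with its own count regardless of which worker thread happened to run its callbacks.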

2.4.4 Resource management

Since filters are likely to be long-lived, and to handle many connections, proper deallocation of per-connection resources is important. The lifetime of a connection is bracketed by calls to the callbacks xxfi_connect and xxfi_close. Therefore connection-specific resources (accessed via smfi_getpriv and smfi_setpriv) may be allocated in xxfi_connect, and should be freed in xxfi_close. For further information see the discussion of message- versus connection-oriented routines. In particular, note that there is only one connection-specific data pointer per connection.

Each message is bracketed by calls to xxfi_envfrom and xxfi_eom (or xxfi_abort), implying that message-specific resources can be allocated and reclaimed in these routines. Since the messages in a connection are processed sequentially by each filter, there will be only one active message associated with a given connection and filter (and connection-private data block). These resources must still be accessed through smfi_getpriv and smfi_setpriv, and must also be reclaimed in xxfi_abort, because an aborted message ends there rather than in xxfi_eom.
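The bracketing described above can be sketched as a set of toy callbacks in Python. These are not the real callback signatures (the C callbacks take a context pointer and return a status code); the Context class and the dictionary layout are assumptions made for this illustration. Message state hangs off the single per-connection data block and is released in both the end-of-message and the abort paths.

```python
class Context:
    """Stand-in for the per-connection handle; holds one private data pointer."""

    def __init__(self):
        self._priv = None

    def set_priv(self, data):
        self._priv = data

    def get_priv(self):
        return self._priv


def xxfi_connect(ctx):
    ctx.set_priv({"message": None, "messages_seen": 0})   # per-connection block


def xxfi_envfrom(ctx, sender):
    ctx.get_priv()["message"] = {"sender": sender, "body": []}  # per-message state


def xxfi_body(ctx, block):
    ctx.get_priv()["message"]["body"].append(block)


def _release_message(ctx):
    conn = ctx.get_priv()
    if conn is not None:
        conn["message"] = None        # reclaim message-specific resources


def xxfi_eom(ctx):
    ctx.get_priv()["messages_seen"] += 1
    _release_message(ctx)


def xxfi_abort(ctx):
    _release_message(ctx)             # aborted messages are reclaimed here


def xxfi_close(ctx):
    ctx.set_priv(None)                # free the connection-specific block
```

Because there is only one data pointer per connection, the message state is stored inside the connection block rather than in a second pointer.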
