News directories

The tools in the Internet Utilities store Usenet news spools in a standard format. Each individual news message is a file, with newsgroups stored as directories containing indexes that reference those files. At the top level of the directory structure, there are three subdirectories:

Messages/
The subdirectory where messages are stored.
Groups/
The subdirectory where newsgroup directories are stored.
Temp/
The subdirectory where temporary files, used during message posting and transport, are stored.

Messages

A standard format Usenet news spool contains one file per message. Messages are stored in a two-level directory structure. The first level is the Messages/ subdirectory. The second level is a directory named after the domain portion of the message's unique message ID. The message itself is stored in a file named after the local portion of the message's unique message ID. So a message with unique message ID <local@domain> will be stored in the file Messages/domain/local. Domains and local parts are encoded with the same scheme that the standard mail directory format uses for characters that are illegal in actual directory and file names.
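The mapping from message ID to file name can be sketched as follows. This is a minimal illustration only: the function names are hypothetical, and percent-encoding is used purely as a stand-in, since the actual encoding scheme is the one defined by the standard mail directory format and is not reproduced here.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def encode_component(text: str) -> str:
    # ASSUMPTION: percent-encoding is only a placeholder for the real
    # mail directory encoding scheme, which this sketch does not implement.
    return quote(text, safe="")


def message_path(message_id: str) -> PurePosixPath:
    """Map <local@domain> to Messages/domain/local."""
    if not (message_id.startswith("<") and message_id.endswith(">")):
        raise ValueError("message ID must be enclosed in angle brackets")
    # Split at the last "@" so that the domain is everything after it.
    local, at, domain = message_id[1:-1].rpartition("@")
    if not at or not local or not domain:
        raise ValueError("message ID must have the form <local@domain>")
    return PurePosixPath("Messages") / encode_component(domain) / encode_component(local)
```

Because the path is derived from the message ID alone, two messages with the same ID necessarily map to the same file.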

Note: The Internet Utilities are thus intentionally incapable of storing multiple different messages that have the same unique message ID, or messages without unique message IDs.

In normal operation, message files are never written to or updated once posted. Moreover, messages are created in partial form elsewhere (usually in the Work/ subdirectory), and renamed only once they have been fully written to file and processed by the posting program. This guarantees that no file under the Messages/ subdirectory is a partially transferred or partially posted message. No program may open message files for write access, and programs opening message files should use the "deny write" file sharing mode to help enforce this.
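The write-elsewhere-then-rename approach can be sketched as below. The function name and parameters are hypothetical, the domain and local parts are assumed to be already encoded, and the existence check before the rename is a simplification that is not race-free.

```python
import os


def publish(work_path: str, spool_root: str, domain: str, local: str) -> str:
    """Move a fully written work file to Messages/domain/local.

    ASSUMPTION: domain and local are already encoded for the filesystem,
    and work_path is on the same volume as the Messages/ subdirectory.
    """
    target_dir = os.path.join(spool_root, "Messages", domain)
    os.makedirs(target_dir, exist_ok=True)
    target = os.path.join(target_dir, local)
    # Simplification: checking and then renaming leaves a small window in
    # which another process could create the same target.
    if os.path.exists(target):
        raise FileExistsError(target)
    # The message only becomes visible under Messages/ at this point,
    # by which time it is complete.
    os.rename(work_path, target)
    return target
```

Readers of the Messages/ subdirectory therefore see either no file or a whole message, never a partial one.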

The creation timestamp of a message file is the "arrival date" on the system, for purposes such as NNTP service. It is required that the last modification timestamp of the directory containing the message file reflect the time that the latest message file was renamed into it, since programs, such as the NNTP dæmon, will make use of that date in locating where new messages have arrived since a given point in time. (If the last modification timestamp of the directory is before that point in time, the program will skip the entire directory without searching it.) Some filesystem drivers will update the last modification timestamp of a directory automatically when a new file is renamed into it. With those filesystem drivers that do not, it is the responsibility of posting programs to do so.
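The directory-skipping search described above might look like the following sketch. The function name is hypothetical, and the last modification time of each file is used as a portable stand-in for its creation timestamp, which is an assumption of this example rather than part of the format.

```python
import os


def new_messages_since(messages_root: str, since: float) -> list[str]:
    """Return paths of message files that arrived at or after `since`."""
    found = []
    with os.scandir(messages_root) as domains:
        for domain in domains:
            if not domain.is_dir():
                continue
            # A directory last modified before `since` is skipped whole,
            # without examining any of the files inside it.
            if domain.stat().st_mtime < since:
                continue
            with os.scandir(domain.path) as files:
                for entry in files:
                    # ASSUMPTION: st_mtime stands in for the creation
                    # timestamp, which is not available portably.
                    if entry.is_file() and entry.stat().st_mtime >= since:
                        found.append(entry.path)
    return sorted(found)
```

A new message in a directory whose timestamp was not updated would be missed by such a search, which is why posting programs must update it where the filesystem driver does not.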

Newsgroups

A standard format Usenet news spool contains one directory per newsgroup. Newsgroups are stored in a two-level directory structure. The first level is the Groups/ subdirectory. The second level is a directory named after the newsgroup name. So a newsgroup named group will be stored as the directory Groups/group/. Newsgroup names are encoded with the same scheme that the standard mail directory format uses for characters that are illegal in actual directory and file names. (As of 2009, no Usenet newsgroup name in any of the "big 7" hierarchies actually requires this. But it is done for futureproofing.)

Within each directory are stored several files:

ByNumber
An index that maps the article number within a newsgroup to a unique message ID.
HWM
The largest article number ever used in the newsgroup.
Carried
A flag file signifying that the newsgroup is carried locally.
Moderator
The moderation status for the newsgroup.

The ByNumber index is an ISAM database, allowing article numbers to be accessed randomly or sequentially in order. Article numbers are the numbers seen via commands and responses in the Network News Transfer Protocol (NNTP), and may be sparse. Obtaining a new article number involves taking the value from the HWM, and incrementing it until it has a value that is not mapped to a message ID by the ByNumber index. Finding a message by its article number involves looking the message ID up in the index, and then retrieving the article by its message ID.
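Article number allocation can be sketched as below. An in-memory dictionary stands in for the ISAM database, the function name is hypothetical, and incrementing at least once (so that the HWM value itself is never reused) is this example's reading of the procedure.

```python
def allocate_article_number(hwm: int, by_number: dict[int, str], message_id: str) -> int:
    """Pick the next free article number above the high water mark.

    ASSUMPTION: `by_number` is a plain dict standing in for the ByNumber
    index; the caller stores the returned value as the new HWM.
    """
    number = hwm + 1
    # Article numbers may be sparse, so step past any already in use.
    while number in by_number:
        number += 1
    by_number[number] = message_id
    return number
```

For example, with numbers 1, 2 and 4 already mapped and an HWM of 2, successive allocations yield 3 and then 5.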

Note: An index entry that references a message ID for a non-existent message is a spool database consistency error that should not occur in normal operation unless the whole system happens to crash without flushing written data to disc. If it does occur, it is treated the same as an entry that is not in the index at all.
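A lookup that treats dangling index entries as absent could be sketched as follows. The dictionary again stands in for the ByNumber index, and the `message_exists` callback is a hypothetical hook for checking the Messages/ subdirectory.

```python
from typing import Callable, Optional


def find_message_id(
    by_number: dict[int, str],
    number: int,
    message_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the message ID for an article number, or None if there is none."""
    message_id = by_number.get(number)
    # An entry that points at a non-existent message is handled exactly
    # like a number that is not in the index at all.
    if message_id is None or not message_exists(message_id):
        return None
    return message_id
```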

Unlike other news spool formats, per-newsgroup article numbers are not represented in the messages themselves. There are no local XRef: headers added to messages, and any such headers are ignored. To obtain a message's per-newsgroup article numbers starting from the message itself, one looks at the message file's extended attributes. The .KEYWORDS extended attribute is an MVMT extended attribute, containing multiple ASCII attributes, each comprising a single "newsgroup:number" pair encoded as it would be in an XRef: header.
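Interpreting the individual "newsgroup:number" values can be sketched as below. This assumes that the MVMT attribute has already been read and unpacked into a list of strings by some other means; reading extended attributes is not shown, and the function name is hypothetical.

```python
def parse_article_numbers(entries: list[str]) -> dict[str, int]:
    """Turn ["group:number", ...] into {group: number}.

    ASSUMPTION: `entries` holds the already-unpacked ASCII values of the
    .KEYWORDS attribute. Malformed entries are skipped in this sketch.
    """
    numbers = {}
    for entry in entries:
        # Split at the last colon; the number is everything after it.
        group, colon, number = entry.rpartition(":")
        if not colon or not group:
            continue
        if not (number.isascii() and number.isdigit()):
            continue
        numbers[group] = int(number)
    return numbers
```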

Newsgroups can be locally carried or not. Newsgroup directories are created on demand, as messages are added to the news spool. Every newsgroup referenced by a message will have a newsgroup directory. Such automatically created newsgroups are not marked as being locally carried. Only explicitly created newsgroups, created manually by news spool administrators, are marked as being locally carried by the presence of a Carried file.

If the Moderator file exists, the newsgroup is moderated, and the file's contents give the name of the mailbox to which posted messages are submitted. Otherwise the newsgroup is unmoderated. A Moderator file that cannot be read still signifies a moderated newsgroup.
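The Carried and Moderator checks can be sketched as follows. The function names are hypothetical, and the example assumes that the Moderator file holds the mailbox name as plain text, which is an assumption about the file's layout.

```python
import os
from typing import Optional


def is_carried(group_dir: str) -> bool:
    # Only the presence of the flag file matters, not its contents.
    return os.path.lexists(os.path.join(group_dir, "Carried"))


def moderation_status(group_dir: str) -> tuple[bool, Optional[str]]:
    """Return (moderated, mailbox); mailbox is None when it is unknown."""
    path = os.path.join(group_dir, "Moderator")
    if not os.path.lexists(path):
        return (False, None)
    try:
        # ASSUMPTION: the file contains the mailbox name as plain text.
        with open(path, encoding="ascii", errors="replace") as f:
            mailbox = f.read().strip()
    except OSError:
        # The file exists but cannot be read: still a moderated newsgroup.
        return (True, None)
    return (True, mailbox or None)
```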

Note: It is not the job of the news spool format to prevent newsgroups from being created, to implement moderation policy, or to implement controls on posting to non-carried newsgroups. Those are the responsibility of a posting program, such as the NNTP dæmon. It is such programs' responsibility to reject or otherwise divert any messages listing newsgroups that local news policy forbids the presence of, or listing only auto-created newsgroups, or messages to moderated newsgroups, before adding them to the news spool.

These are, after all, but some of many rejection criteria that posting programs will usually have, all of which are a function of the posting program and its implementation of local news policy, not a function of the actual news spool directory format itself.

Ancillaries

The news spool contains various ancillaries.

Posting programs store spooled messages in the Work/ subdirectory before they are posted or canonicalized. Messages are usually renamed from some file named Work/name to Messages/domain/local when they are posted. The name portion of a working file is created to be unique so that a posting program will not overwrite other work files, but otherwise no interpretation is placed upon it. Moreover, not all work files will even end up posted as messages in the first place. Posting programs generally use three work files: one for the original message contents submitted by the client (which may be a so-called "proto-article"), one for an intermediate canonicalized message (the "proto-article" converted into canonicalized form), and one for the final message contents (with the trace message headers added).

There is no naming convention for files in the Work/ subdirectory. Programs use the OS/2 system API capability of failing if an attempt is made to create a file that already exists.
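The fail-if-exists approach can be illustrated with Python's exclusive-creation mode, used here as an analogue of the OS/2 capability rather than as the actual call that the programs make. The function name and the use of random names are assumptions of this sketch.

```python
import os
import uuid
from typing import BinaryIO


def create_work_file(work_dir: str) -> tuple[str, BinaryIO]:
    """Create a new, uniquely named work file and return (path, open file)."""
    while True:
        # ASSUMPTION: any name will do; a random one is used here.
        path = os.path.join(work_dir, uuid.uuid4().hex)
        try:
            # Mode "x" fails if the file already exists, so an existing
            # work file is never overwritten.
            return path, open(path, "xb")
        except FileExistsError:
            continue  # name collision: try another name
```

Because creation itself fails on a collision, no separate existence check is needed and two posting programs cannot end up sharing a work file.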

The History/ directory is structured much like the Messages/ directory, except that all files have zero length. Each file represents an expired message, recording its message ID in the filename, allowing news transport programs to recognize messages that have already been transferred but have since expired.
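Recording and checking expired messages could be sketched as below. The function names are hypothetical, and the layout History/domain/local is an assumption based on the structure being "much like" that of Messages/; the domain and local parts are taken to be already encoded.

```python
import os


def record_expiry(spool_root: str, domain: str, local: str) -> None:
    """Create the zero-length History file for an expired message."""
    # ASSUMPTION: History/ mirrors the Messages/domain/local layout.
    history_dir = os.path.join(spool_root, "History", domain)
    os.makedirs(history_dir, exist_ok=True)
    # Only the file name carries information; the file stays empty.
    with open(os.path.join(history_dir, local), "wb"):
        pass


def is_expired(spool_root: str, domain: str, local: str) -> bool:
    """Report whether a message ID is recorded as having expired."""
    return os.path.exists(os.path.join(spool_root, "History", domain, local))
```

A news transport program could consult such a check before accepting an offered message, so that expired messages are not transferred again.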


The Internet Utilities are © Copyright Jonathan de Boyne Pollard. "Moral" rights are asserted.