The tools in the Internet Utilities store Usenet news spools in a standard format. Each individual news message is a file, with newsgroups stored as directories containing indexes that reference those files. At the top level of the directory structure, there are three subdirectories:
Messages/
Groups/
Temp/
A standard format Usenet news spool contains one file per message.
Messages are stored in a two-level directory structure. The first level
is the Messages/ subdirectory. The second level is a
directory named after the domain portion of the message's unique message
ID. The message itself is stored in a file named after the local portion
of the message's unique message ID. So a message with unique message ID
<local@domain> will be stored in the file Messages/domain/local. Domains
and local parts are encoded, using the same encoding scheme as is used in
the standard mail directory format, to replace characters in the domains
and local parts that are illegal in actual directory and file names.
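The mapping from message ID to file name can be sketched as follows. This is only an illustration: the function names are hypothetical, and percent-encoding is used as a stand-in for the real encoding scheme, which is defined by the standard mail directory format and is not reproduced here.

```python
from pathlib import PurePosixPath
from urllib.parse import quote


def encode_component(text: str) -> str:
    # ASSUMPTION: percent-encoding is a placeholder for the encoding scheme
    # of the standard mail directory format, which this sketch does not know.
    return quote(text, safe="")


def message_path(message_id: str) -> PurePosixPath:
    """Map '<local@domain>' to Messages/domain/local (relative to the spool)."""
    text = message_id.strip()
    if not (text.startswith("<") and text.endswith(">")):
        raise ValueError("message ID must be enclosed in angle brackets")
    # Split at the last '@': everything before it is the local part.
    local, separator, domain = text[1:-1].rpartition("@")
    if not separator or not local or not domain:
        raise ValueError("message ID must have the form <local@domain>")
    return PurePosixPath("Messages") / encode_component(domain) / encode_component(local)
```

For example, message_path("<abc123@example.com>") yields Messages/example.com/abc123. A message without a usable message ID raises an error, since such a message has no place in the spool.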
Note: The Internet Utilities are thus intentionally incapable of storing multiple different messages that have the same unique message ID, or messages without unique message IDs.
Once posted, message files are never written to or updated, in normal
operation. Moreover, messages are created in partial form elsewhere
(usually in the Work/ subdirectory), and renamed once they
are fully written to file and processed by the posting program,
guaranteeing that no posted message file under the Messages/
subdirectory is a partially transferred or posted message.
No program may open message files for write access, and programs
opening message files should use the "deny write" file sharing mode to
help enforce this.
The creation timestamp of a message file is the "arrival date" on the
system, for purposes such as NNTP service. It is required that the last
modification timestamp of the directory containing the message file
reflect the time that the latest message file was renamed into it, since
programs, such as the NNTP dæmon, will make use of that date in locating
where new messages have arrived
since a given point in time. (If the last modification timestamp of the
directory is before that point in time, the program will skip the entire
directory without searching it.) Some filesystem drivers will update
the last modification timestamp of a directory automatically when a new
file is renamed into it. With those filesystem drivers that do not, it
is the responsibility of posting programs to do so.
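A minimal sketch of both sides of this arrangement follows, assuming a POSIX-like Python environment rather than the OS/2 API that the actual tools use. The function names are hypothetical. The posting side renames a finished work file into place and touches the directory explicitly; the reading side skips any directory whose last modification timestamp is before the point in time of interest.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def post_message(work_file: Path, destination: Path) -> None:
    """Rename a fully written work file into place under Messages/."""
    destination.parent.mkdir(parents=True, exist_ok=True)
    os.rename(work_file, destination)
    # Touch the directory explicitly, in case the filesystem driver does not
    # update its last modification timestamp when a file is renamed into it.
    os.utime(destination.parent, None)


def directories_changed_since(messages_root: Path, since: float) -> List[Path]:
    """List domain directories that may hold messages newer than `since`."""
    return sorted(
        d for d in messages_root.iterdir()
        if d.is_dir() and d.stat().st_mtime >= since
    )


# Demonstration in a throwaway spool.
with tempfile.TemporaryDirectory() as tmp:
    spool = Path(tmp)
    (spool / "Work").mkdir()
    work = spool / "Work" / "w1"
    work.write_bytes(b"Subject: test\r\n\r\nbody\r\n")
    dest = spool / "Messages" / "example.com" / "abc123"
    post_message(work, dest)
    posted = dest.is_file() and not work.exists()
    fresh = directories_changed_since(spool / "Messages", 0.0) == [dest.parent]
    os.utime(dest.parent, (1000, 1000))  # pretend the directory is very old
    stale = directories_changed_since(spool / "Messages", 2000.0)
```

Note that os.rename silently replaces an existing destination on POSIX systems, so a real posting program would need to check for an existing message with the same message ID first.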
A standard format Usenet news spool contains one directory per newsgroup.
Newsgroups are stored in a two-level directory structure. The first level
is the Groups/ subdirectory. The second level is a directory
named after the newsgroup name. So a newsgroup group will be
stored as the directory Groups/group/.
Newsgroup names are encoded, using the same encoding scheme as is used in
the standard mail directory format to encode characters in the names that
are illegal in actual directory and file names. (As of 2009, no Usenet
newsgroup name in any of the "big 7" hierarchies actually requires this.
But it is done for futureproofing.)
Within each directory are stored several files:
ByNumber
HWM
Carried
Moderator
The ByNumber index is an ISAM database, allowing article
numbers to be accessed randomly or sequentially in order. Article numbers
are the numbers seen via commands and responses in the NetNews Transfer
Protocol, and may be sparse. Obtaining a new article number involves
taking the value from the HWM, and incrementing it until it
has a value that is not mapped to a message ID by the
ByNumber index. Finding a message by its article number
involves looking the article number up in the index to obtain the message
ID, and then retrieving the article by its message ID.
Note: An index entry that references a message ID for a non-existent message is a spool database consistency error that should not occur in normal operation unless the whole system happens to crash without flushing written data to disc. If it does occur, it is treated the same as an entry that is not in the index at all.
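The allocation and lookup rules can be sketched as below. A plain dictionary stands in for the ISAM ByNumber index and an integer for the contents of the HWM file; neither reflects the real on-disc formats, and the function names are hypothetical. The sketch reads "taking the value from the HWM, and incrementing it" literally, starting the search at the HWM value itself, which is an interpretation.

```python
from typing import Callable, Dict, Optional


def allocate_article_number(hwm: int, by_number: Dict[int, str]) -> int:
    """Start from the HWM value and increment until the number is unmapped."""
    number = hwm
    while number in by_number:
        number += 1
    return number


def find_message_id(
    number: int,
    by_number: Dict[int, str],
    message_exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the message ID for an article number, or None if there is none.

    An entry that references a non-existent message is treated the same as
    an entry that is not in the index at all.
    """
    message_id = by_number.get(number)
    if message_id is None or not message_exists(message_id):
        return None
    return message_id
```

Here message_exists is a caller-supplied check, for instance one that tests whether the corresponding file under Messages/ is present.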
Unlike other news spool formats, per-newsgroup article numbers are not
represented in the messages themselves. There are no local
XRef: headers added to messages, and any such headers are
ignored. To obtain the per-newsgroup article numbers of a message when
starting from the message itself, one looks at the message file's extended
attributes. The .KEYWORDS extended attribute is an MVMT (multi-valued,
multi-typed) extended attribute, containing multiple ASCII values, each
comprising a single "newsgroup:number" pair encoded as it would be in an
XRef: header.
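Interpreting those pairs is straightforward once the attribute values are in hand. The sketch below assumes the ASCII values of the .KEYWORDS extended attribute have already been read into a list of strings; reading OS/2 extended attributes is not shown, and the function name is hypothetical.

```python
from typing import Dict, List


def parse_keywords(values: List[str]) -> Dict[str, int]:
    """Turn 'newsgroup:number' strings into a newsgroup -> article number map."""
    result: Dict[str, int] = {}
    for value in values:
        # Split at the last ':' so that the number is always the final field.
        group, separator, number = value.rpartition(":")
        if not separator or not group:
            continue  # not a "newsgroup:number" pair
        if not (number.isascii() and number.isdigit()):
            continue  # article numbers are plain decimal digits
        result[group] = int(number)
    return result
```

Values that do not look like a "newsgroup:number" pair are skipped rather than treated as errors, which is a choice made for this sketch only.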
Newsgroups can be locally carried or not. Newsgroup directories are
created on demand, as messages are added to the news spool. Every
newsgroup referenced by a message will have a newsgroup directory. Such
automatically created newsgroups are not marked as being locally carried.
Only explicitly created newsgroups, created manually by news spool
administrators, are marked as being locally carried by the presence of a
Carried file.
If the Moderator file exists, the newsgroup is moderated, and
its contents give the name of the mailbox to which posted messages are
submitted. Otherwise the newsgroup is unmoderated. A
Moderator file that cannot be read still signifies a
moderated newsgroup.
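A posting program might determine a newsgroup's status along the following lines. This is a sketch only: the function name is hypothetical, and it assumes that the Moderator file holds the mailbox name as a single line of ASCII text, which the description above does not actually specify.

```python
import tempfile
from pathlib import Path
from typing import Optional, Tuple


def newsgroup_status(group_dir: Path) -> Tuple[bool, bool, Optional[str]]:
    """Return (carried, moderated, moderator_mailbox) for a newsgroup directory."""
    carried = (group_dir / "Carried").exists()
    moderator_file = group_dir / "Moderator"
    moderated = moderator_file.exists()
    mailbox = None
    if moderated:
        try:
            # ASSUMPTION: the mailbox name is stored as plain ASCII text.
            mailbox = moderator_file.read_text(encoding="ascii").strip()
        except (OSError, UnicodeDecodeError):
            mailbox = None  # unreadable, but the newsgroup is still moderated
    return carried, moderated, mailbox


# Demonstration with two throwaway newsgroup directories.
with tempfile.TemporaryDirectory() as tmp:
    groups = Path(tmp) / "Groups"
    moderated_dir = groups / "comp.example.moderated"
    carried_dir = groups / "comp.example.misc"
    moderated_dir.mkdir(parents=True)
    carried_dir.mkdir(parents=True)
    (moderated_dir / "Moderator").write_text("moderator@example.org\n", encoding="ascii")
    (carried_dir / "Carried").touch()
    moderated_status = newsgroup_status(moderated_dir)
    carried_status = newsgroup_status(carried_dir)
```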
Note: It is not the job of the news spool format to prevent newsgroups from
being created, to implement moderation policy, or to implement controls on
posting to non-carried newsgroups. Those are the responsibility of a
posting program, such as the NNTP dæmon.
It is such programs' responsibility to reject or otherwise divert any
messages listing newsgroups that local news policy forbids the presence
of, or listing only auto-created newsgroups, or messages to moderated
newsgroups, before adding them to the news spool.
These are, after all, but some of many rejection criteria that posting programs will usually have, all of which are a function of the posting program and its implementation of local news policy, not a function of the actual news spool directory format itself.
The news spool contains various ancillaries.
Posting programs store spooled messages before they are posted, or
canonicalized, in the Work/ subdirectory. Messages are
usually renamed from some file named Work/name to
Messages/domain/local when they
are posted. The name portion of a working file is created to be
unique so that a posting program will not overwrite other work files, but
otherwise no interpretation is placed upon it. Moreover, not all work
files will even end up posted as messages in the first place. Posting
programs generally use three work files, one for the original message
contents submitted by the client (which may be a so-called
"proto-article"), one for an intermediate canonicalized message (the
"proto-article" converted into canonicalized form), and one for the final
message contents (with the trace message headers added).
There is no naming convention for files in the Work/
subdirectory. To avoid collisions, programs rely on the OS/2 system API's
ability to fail an attempt to create a file that already exists.
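The same create-or-fail technique is available outside OS/2. As a sketch, Python's exclusive-creation mode "x" fails with FileExistsError if the file already exists, so a program can simply retry with another name. The random hexadecimal names used here are an arbitrary choice for illustration, and the function name is hypothetical.

```python
import tempfile
import uuid
from pathlib import Path
from typing import BinaryIO, Tuple


def create_work_file(work_dir: Path) -> Tuple[Path, BinaryIO]:
    """Create a new, uniquely named work file, never overwriting another."""
    while True:
        path = work_dir / uuid.uuid4().hex  # arbitrary name; no meaning implied
        try:
            # Mode "x" makes creation fail if the file already exists.
            return path, open(path, "xb")
        except FileExistsError:
            continue  # name already taken: try another one


# Demonstration in a throwaway Work/ directory.
with tempfile.TemporaryDirectory() as tmp:
    work_dir = Path(tmp) / "Work"
    work_dir.mkdir()
    first_path, first_file = create_work_file(work_dir)
    second_path, second_file = create_work_file(work_dir)
    first_file.close()
    second_file.close()
    distinct = first_path != second_path
    try:
        open(first_path, "xb").close()
        collision_detected = False
    except FileExistsError:
        collision_detected = True
```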
The History/ directory is structured much like the
Messages/ directory, except that all files have zero length.
Each file represents an expired message, recording its message ID in the
filename, allowing news transport programs to recognize messages that have
already been transferred but have since expired.
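A news transport program's duplicate check might therefore look like the sketch below. It assumes the domain and local parts have already been encoded into legal file names, and the function name is hypothetical.

```python
import tempfile
from pathlib import Path


def already_seen(spool: Path, domain: str, local: str) -> bool:
    """True if the message is currently stored or is recorded as expired."""
    stored = (spool / "Messages" / domain / local).is_file()
    expired = (spool / "History" / domain / local).is_file()
    return stored or expired


# Demonstration in a throwaway spool.
with tempfile.TemporaryDirectory() as tmp:
    spool = Path(tmp)
    (spool / "Messages" / "example.com").mkdir(parents=True)
    (spool / "History" / "example.org").mkdir(parents=True)
    (spool / "Messages" / "example.com" / "live1").write_bytes(b"Subject: x\r\n\r\n")
    (spool / "History" / "example.org" / "old1").touch()  # zero-length marker
    results = (
        already_seen(spool, "example.com", "live1"),
        already_seen(spool, "example.org", "old1"),
        already_seen(spool, "example.net", "new1"),
    )
```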