Search and substitution wildcards

There are two different sorts of wildcards: A search specification denotes a set of names matching a given pattern. A substitution specification controls how a given name is transformed into a different name.

Some commands allow multiple files to be specified using search wildcard specifications. Search wildcard specifications comprise two parts: a directory prefix and a basename. The directory prefix specifies the top-level directory that is to be scanned for matching filenames. The basename specifies the pattern that filenames must match, and can either be basic or extended.

How wildcard filenames are split into two parts

Everything up to and including the last slash in the search wildcard specification is taken to be the directory prefix. If there is no slash, but there is a drive letter and a colon, then that is taken to be the directory prefix. If there is no basename (i.e. the search wildcard ends with a trailing slash) the default is a single asterisk "*".

Note: If the /OLDSRCDIR standard option has been used, and a search wildcard specification matches the name of an existing directory, the directory prefix is taken to be the entire specification, and the basename is implicitly a single asterisk "*".

Examples of how search wildcards are split

Wildcard                  Directory prefix      Basename
E:\FILENAME               E:\                   FILENAME
E:*.BAK                   E:                    *.BAK
E:\*.BAK                  E:\                   *.BAK
E:\SUBDIR                 E:\                   SUBDIR
E:\SUBDIR\                E:\SUBDIR\            *
E:\SUBDIR\*.BAK           E:\SUBDIR\            *.BAK
\\SRV\PUB\SUBDIR\*.BAK    \\SRV\PUB\SUBDIR\     *.BAK
\\SRV\PUB\*.BAK           \\SRV\PUB\            *.BAK
\\SRV\PUB\SUBDIR\         \\SRV\PUB\SUBDIR\     *
\\SRV\PUB\SUBDIR          \\SRV\PUB\            SUBDIR
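The splitting rule can be sketched in a few lines of Python. This is an illustration of the rule as described above, not the command interpreter's own code: `split_wildcard` is a hypothetical name, treating a forward slash the same as a backslash is an assumption, and the optional `old_src_dir`/`is_dir` parameters are a hypothetical way of modelling the /OLDSRCDIR behaviour.

```python
import os


def split_wildcard(spec: str, old_src_dir: bool = False, is_dir=os.path.isdir) -> tuple[str, str]:
    """Split a search wildcard into (directory prefix, basename)."""
    # Hypothetical model of /OLDSRCDIR: an existing directory name becomes
    # the whole prefix, and the basename is implicitly "*".
    if old_src_dir and is_dir(spec):
        return spec, "*"
    # Everything up to and including the last slash is the directory prefix.
    # (Accepting "/" as well as "\" is an assumption of this sketch.)
    cut = max(spec.rfind("\\"), spec.rfind("/"))
    if cut >= 0:
        prefix, base = spec[: cut + 1], spec[cut + 1 :]
    elif len(spec) >= 2 and spec[1] == ":" and spec[0].isalpha():
        # No slash, but a drive letter and colon: "E:*.BAK" -> ("E:", "*.BAK")
        prefix, base = spec[:2], spec[2:]
    else:
        prefix, base = "", spec
    # An empty basename (trailing slash) defaults to a single asterisk.
    return prefix, base or "*"


# Each row of the table above: (wildcard, directory prefix, basename).
for wildcard, directory, basename in [
    ("E:*.BAK", "E:", "*.BAK"),
    ("E:\\SUBDIR\\", "E:\\SUBDIR\\", "*"),
    ("\\\\SRV\\PUB\\SUBDIR", "\\\\SRV\\PUB\\", "SUBDIR"),
]:
    print(wildcard, "->", split_wildcard(wildcard))
```

Note that a specification such as E:\SUBDIR is split into E:\ and SUBDIR even when SUBDIR is a directory; only the /OLDSRCDIR behaviour treats it as a directory to be scanned.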

Directory prefixes

If there is no directory prefix at all, the default is the current drive and directory. If the directory prefix is in Universal Naming Convention (UNC) format, such as \\SRV\PUB\, no default drive is applied. If the directory prefix is not in UNC format and does not contain a drive letter and a colon, the given directory is taken to be on the current drive. If the part of the directory prefix that follows any drive letter and colon does not begin with a backslash, the path is relative to the current directory rather than to the root directory.
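These defaults can be illustrated with a short Python sketch. It is not the command interpreter's implementation: the function name `resolve_prefix` and the `cwd` mapping are hypothetical, and resolving a drive-relative prefix such as E: against drive E's own current directory is an assumption of the sketch.

```python
def resolve_prefix(prefix: str, current_drive: str, cwd: dict[str, str]) -> str:
    """Turn a directory prefix into a fully qualified directory.

    current_drive is a letter such as "C". cwd (hypothetical) maps each drive
    letter to its current directory written from the root, e.g. {"C": "\\OS2"}.
    """
    if prefix.startswith("\\\\"):
        return prefix  # UNC format: no default drive is applied
    if len(prefix) >= 2 and prefix[1] == ":" and prefix[0].isalpha():
        drive, rest = prefix[0].upper(), prefix[2:]
    else:
        drive, rest = current_drive.upper(), prefix  # default: current drive
    if rest.startswith("\\"):
        return drive + ":" + rest  # rooted path on that drive
    # Empty or relative path: start from that drive's current directory
    # (assumed to be the root if the mapping has no entry for the drive).
    base = cwd.get(drive, "\\")
    if not base.endswith("\\"):
        base += "\\"
    return drive + ":" + base + rest


print(resolve_prefix("", "C", {"C": "\\OS2"}))         # C:\OS2\
print(resolve_prefix("SUBDIR\\", "C", {"C": "\\OS2"}))  # C:\OS2\SUBDIR\
```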

Basenames

A basename comprises a mixture of literal characters and metacharacters. Literal characters in the basename must match the filename exactly. Search wildcards fall into two categories, basic and extended. Basic search wildcards are used by most commands, and comprise the wildcard functionality supplied in the operating system itself. Extended wildcards are an extension to this syntax that is implemented within a number of commands. The DIR command accepts extended wildcards in the DIR_COLOURS environment variable, for example.

Basic search wildcards

Basic search wildcard functionality is provided by the operating system itself. It is inconsistent, varies with the particular filesystem driver in use, and contains odd special cases for certain patterns.

The metacharacters available in basic wildcards are as follows:

*
An asterisk matches zero or more characters.

* will match full stops ('.') in filenames. This means that *.exe will match all files that end in the four characters '.', 'E', 'X', and 'E', irrespective of what occurs before. This is useful for processing files where one extension has been appended to another. The wildcard *.gz, for example, will match all GZIP files, including any *.TAR.GZ files.

?
A question mark matches a single character, or the end of the filename.

? will not match full stops ('.') in filenames.

Multiple trailing ? characters at the end of a wildcard may match the end of the filename, meaning, for example, that wildcards such as ??? will match all names that are one, two, or three characters long.

It should be noted that a full stop is not a metacharacter in a basic search wildcard. Since on OS/2 filenames can contain an arbitrary number of full stops, it is important to use full stops correctly in wildcard specifications, because they will only match literal full stops. The filesystem drivers in IBM OS/2 silently treat *.* as if it were simply *, but no other wildcards involving full stops are given this special case treatment.

For example, the wildcard specification *.*. will only match files with a full stop at the end of their name. In practice it will not match anything at all, because the HPFS filesystem driver in IBM OS/2 silently strips trailing full stops from file and directory names when they are created, so it is impossible to create a name with a trailing full stop in the first place.

For best results, try to avoid using full stops in wildcards unless they are actually required. Instead of *.* use simply *. Only use a full stop when explicitly trying to match a full stop.

Examples of basic search wildcards

*
All files.
c*s
All filenames starting with the letter 'C' and ending with the letter 'S'.
*on*
All filenames containing the two-letter sequence 'ON' somewhere.
???
All filenames that are up to three characters long.
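The basic rules above can be approximated with Python's re module. This sketch models only the behaviour documented here, not the OS/2 filesystem drivers' actual code (which, as noted, varies from driver to driver). The name `basic_match` is hypothetical, and case-insensitive matching is an assumption based on the examples.

```python
import re


def basic_match(pattern: str, name: str) -> bool:
    """Approximate the documented basic-wildcard rules (not the real drivers)."""
    if pattern == "*.*":
        pattern = "*"  # the special case applied by IBM's filesystem drivers
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")  # any run of characters, full stops included
        elif ch == "?":
            parts.append(r"(?:[^.]|\Z)")  # one non-full-stop char, or the end
        else:
            parts.append(re.escape(ch))  # literal; '.' only matches '.'
    return re.fullmatch("".join(parts), name, re.IGNORECASE) is not None


print(basic_match("*.gz", "src.tar.gz"))  # True
print(basic_match("???", "a.b"))          # False: ? does not match '.'
```

Translating each ? into "one non-full-stop character, or the end of the name" is what lets trailing question marks match shorter names, as in the ??? example.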

Extended search wildcards

Extended search wildcard functionality is provided by application library code. It has no dependence upon filesystem drivers, and has no odd special casing.

The metacharacters available in extended wildcards are as follows:

*
An asterisk matches zero or more characters.
?
A question mark matches a single character.
.
A full stop matches another full stop '.' or the end of the filename.
[abcd]
Character sets delimited with brackets match exactly one character from that set.
[0-9]
Character ranges delimited with brackets match exactly one character within the range.
{}
Braces enclose a comma-delimited list of strings, any one of which may match that part of the name.

It should be noted that both * and ? will match full stops ('.') in filenames. Full stops are not regarded as special in OS/2, and on HPFS volumes filenames can contain many full stops. This means that *.exe will match all files that end in the four characters '.', 'E', 'X', and 'E', irrespective of what occurs before. This is useful for processing files where one extension has been appended to another. The wildcard *.gz, for example, will match all GZIP files, including any *.TAR.GZ files.

Examples of extended search wildcards

*
All files.
*.*
All files. (This is for backwards compatibility only, and is highly discouraged. The full stop '.' character has had to be made into a metacharacter in order to allow this usage. Strictly speaking, *.* should match only those files that have at least one full stop '.' character in their name. * should be used instead of *.* to mean all files.)
c*s
All filenames starting with the letter 'C' and ending with the letter 'S'.
*on*
All filenames containing the two-letter sequence 'ON' somewhere.
???
All filenames that are exactly three characters long.
[a-z]*
All filenames beginning with a letter.
*[0-9]*
All filenames containing a digit somewhere.
*.{htm,html,gif,jpg,jpeg}
All HTML, GIF, and JPEG files.
*.{su,mo,tu,we,th,fr,sa}[0-9]
All Fidonet compressed ARCmail packets.
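One way to see how these metacharacters combine is to translate an extended wildcard into a Python regular expression. This is a sketch under stated assumptions, not the library code the commands actually use: `extended_to_regex` and `extended_match` are hypothetical names, matching is assumed to be case-insensitive, strings inside braces are treated as literals, and nesting or negated sets are not handled.

```python
import re


def extended_to_regex(pattern: str) -> str:
    """Translate an extended search wildcard into a Python regex (sketch)."""
    out, i = [], 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "*":
            out.append(".*")  # zero or more characters, '.' included
        elif ch == "?":
            out.append(".")  # exactly one character, '.' included
        elif ch == ".":
            out.append(r"(?:\.|\Z)")  # a full stop, or the end of the name
        elif ch == "[":
            end = pattern.index("]", i + 1)  # raises ValueError if unclosed
            body = pattern[i + 1 : end]
            # Keep '-' for ranges such as [0-9]; escape everything else.
            out.append("[" + "".join(c if c == "-" else re.escape(c) for c in body) + "]")
            i = end
        elif ch == "{":
            end = pattern.index("}", i + 1)  # raises ValueError if unclosed
            alternatives = pattern[i + 1 : end].split(",")
            out.append("(?:" + "|".join(re.escape(a) for a in alternatives) + ")")
            i = end
        else:
            out.append(re.escape(ch))
        i += 1
    return "".join(out)


def extended_match(pattern: str, name: str) -> bool:
    return re.fullmatch(extended_to_regex(pattern), name, re.IGNORECASE) is not None


print(extended_match("*.{htm,html,gif,jpg,jpeg}", "index.html"))  # True
print(extended_match("*.*", "README"))                            # True
```

Because the full stop may also match the end of the name, *.* matches README here even though README contains no full stop, which is the backwards-compatibility behaviour described for *.* above.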

Wildcard substitutions

Some commands allow "destination" filenames to be constructed from "source" filenames according to a given pattern. A substitution specification comprises an optional directory prefix, which is used as it stands, and a base filename that contains the pattern to apply to the "source" base name, comprising literal characters and metacharacters.

Note: If the /OLDDESTDIR standard option has been used, and a substitution wildcard specification matches the name of an existing directory, the directory prefix is taken to be the entire specification, and the basename is implicitly a single asterisk "*".

The following metacharacters are defined:

*
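An asterisk copies characters from the source to the destination up to, but not including, the first source character that matches the next character of the pattern. If the asterisk is the last character of the pattern, it copies the rest of the source.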
?
A question mark substitutes a single character from the source to the destination, unless that character is a full stop.
.
A full stop substitutes a full stop into the destination and skips all characters in the source from the current position up to and including the next full stop, or to the end of the string if there is no further full stop.

Literal characters in the pattern are substituted into the destination directly, and a single character from the source is skipped, unless that character is a full stop.

Examples

Source                  Pattern       Result
attrib.exe              *.com         attrib.com
attrib.exe              *com          attrib.execom
comp.exe                *com          com
comp.exe                attrib.*      attrib.exe
comp.exe                attrib*       attrib.exe
sendmail.8.6.9.tar.gz   *.8.6.10.*    sendmail.8.6.10.tar.gz
sendmail.8.6.9.tar.gz   *8.6.10*      sendmail.8.6.10.tar.gz
sendmail.8.6.9.tar.gz   zmailer.*     zmailer.8.6.9.tar.gz
sendmail.8.6.9.tar.gz   *.*.*.*.F*    sendmail.8.6.9.Far.gz
sendmail.8.6.9.tar.gz   *.*.*.*.F.*   sendmail.8.6.9.F.gz

The 32-bit Command Interpreter is © Copyright Jonathan de Boyne Pollard. "Moral" rights are asserted.