Written text
Scripts with case
Several scripts
Target scripts
Conclusions
In this document the term 'element' has a different sense from that used in the previous documents.
It is convenient to analyze each actual example of text in a given script
into its elements, variant forms being distinguished. These elements will be
called 'graphic tokens' of abstract 'graphemic types'.
(Since the term 'grapheme' is not used here, the notation <...> for
graphemes will be used for graphemic types.)
    many    )
    graphic )---> one graphemic type
    tokens  )

    [written forms]   [an abstract form]
Ex. In Latin script, roman a and italic a are two distinct graphemic
types.
Ex. Devanagari has two forms of the isolated vowel [a] (where [ ]
encloses the approximate phonetic value), giving the graphemic types
<a-1>, <a-2> for this isolated vowel. A similar
analysis applies to the two Devanagari forms of [jha] and of
[n_dot-b a] (with retroflex nasal).
Ex. If the virama is denoted by \, there exist the graphemic types
<ka> (with inherent vowel), <ka\> (with virama),
<kta>, etc. In this analysis, the k in
<kta>, for instance, does not produce a separate graphemic
type, because it never occurs alone.
Ex. Bengali script has two distinct graphemic types, <ta\>
and <t>.
Graphemic types with the same significance map to one abstract
'script character', which may then be analyzed into a string of one or more
'script generators' corresponding to abstract letters and other marks
used in the particular script. The following diagram applies to scripts
without case (see below for scripts with case).
    graphemic    )                               ( one
    types of     )---> one script character ---> ( or more
    same signif. )                               ( script generators
Some examples will clarify this (where s_dot-b is the retroflex
sibilant):
    Script  Graphemic types  Script character  Script generators
    ------  ---------------  ----------------  -----------------
    Dev     <a-1> )          a                 {a|Dev}
            <a-2> )
    Dev     <ks_dot-b a>     ks_dot-b a        {k|Dev}
                                               {s_dot-b|Dev}
                                               {a|Dev}
    Dev     <ka>             ka                ( {k|Dev}
                                               ( {a|Dev}
            <ka\>            k                 {k|Dev}
    Ben     <ta\> )          t                 {t|Ben}
            <t>   )
The Indic scripts under consideration do not have any other elements than
those mentioned, and a particular script may be said to consist of the set
of all its script characters (which in practice is a finite set).
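
The chain from graphemic types to script characters to script generators can be pictured as two lookup tables. The following is only a minimal Python sketch: the names GRAPHEMIC_TO_CHARACTER, CHARACTER_TO_GENERATORS and generators_for are hypothetical, and the entries simply restate some of the examples given in the table.

```python
# Illustrative data only: entries restate examples from the table in the text.
# Keys are (script, graphemic type); values are script characters.
GRAPHEMIC_TO_CHARACTER = {
    ("Dev", "<a-1>"): "a",
    ("Dev", "<a-2>"): "a",
    ("Dev", "<ka>"): "ka",
    ("Dev", "<ka\\>"): "k",
    ("Ben", "<ta\\>"): "t",
    ("Ben", "<t>"): "t",
}

# Each script character is analyzed into one or more script generators.
CHARACTER_TO_GENERATORS = {
    ("Dev", "a"): ["{a|Dev}"],
    ("Dev", "ka"): ["{k|Dev}", "{a|Dev}"],
    ("Dev", "k"): ["{k|Dev}"],
    ("Ben", "t"): ["{t|Ben}"],
}


def generators_for(script, graphemic_type):
    """Return the script generators behind one graphemic type."""
    character = GRAPHEMIC_TO_CHARACTER[(script, graphemic_type)]
    return CHARACTER_TO_GENERATORS[(script, character)]


# Both forms of the Devanagari isolated vowel lead to the same generator.
print(generators_for("Dev", "<a-1>"))
print(generators_for("Dev", "<a-2>"))
print(generators_for("Dev", "<ka>"))
```

The many-to-one step is the first table; the one-to-many step is the second.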
Latin script, unlike the Brahmic scripts, has upper and lower case letters, but no case distinction for numerals and other signs. For scripts with case it is useful to distinguish case in script characters, and introduce a category between script characters and script generators, the 'sub-generators'. {A|Lat} and {a|Lat} are different sub-generators of the script generator {a|Lat}. Scripts without case have no sub-generators.
    Script  Graphemic     Script      Sub-          Script
            types         characters  generators    generator
    ------  ------------  ----------  ------------  ---------
    Lat     a (roman)  )  a           {a|Lat} )
            a (italic) )                      )     {a|Lat}
                                              )
            A (roman)  )  A           {A|Lat} )
            A (italic) )
Because of its
sub-generators, Latin includes the script characters æ and Æ.
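
The relation between sub-generators and script generators in a script with case might be sketched as follows. This is an assumption-laden illustration: Python's built-in str.lower and str.upper stand in for a real case table of Latin script, the function names are hypothetical, and treating signs without a case distinction as having no sub-generators is an assumption made for the sketch.

```python
def script_generator(sub_generator):
    """Map a sub-generator (a case form) to its script generator.

    Assumption: the script generator is labeled by the lower-case form.
    """
    return sub_generator.lower()


def sub_generators(generator):
    """Return the case forms of a Latin script generator.

    Assumption: a sign with no case distinction (numeral, punctuation)
    is given no sub-generators here.
    """
    lower, upper = generator.lower(), generator.upper()
    if lower == upper:
        return []
    return [lower, upper]


print(sub_generators("a"))
print(sub_generators("æ"))
print(sub_generators("7"))
```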
Considering two scripts, comparative linguistics shows that most script
generators in one script have the same meaning as a script generator of the
other script. Applying this to all the scripts under consideration, we obtain
a minimal set of 'inter-script generators'. A transliteration scheme
for these inter-script generators constitutes a 'uniform
transliteration' for this group of scripts.
    Script:       1            2            ...   Inter-script
                                                  generators
    -------------------------------------------------------------
    Generators:   {g-1 | 1}    {g-1 | 2}    ...   {g-1}
                  {g-2 | 1}    {g-2 | 2}    ...   {g-2}
                  -            {g-3 | 2}    ...   {g-3}
                  ...          ...          ...   ...
Ex. Other than the Tamil script, Brahmic scripts have a script generator
{jh | scriptname}. There is therefore an inter-script generator
{jh}, which is transliterated jh.
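
As a rough sketch of how a set of inter-script generators could be assembled, assume that generators with the same meaning have already been given the same label in every script. The inventories below are tiny hypothetical excerpts (not complete scripts), and the names GENERATORS, inter_script_generators and scripts_lacking are illustrative only.

```python
# Hypothetical excerpts of generator inventories, keyed by script.
GENERATORS = {
    "Dev": {"k", "jh", "n"},
    "Mal": {"k", "jh", "n"},
    "Tam": {"k", "n", "n_macr-b"},
}


def inter_script_generators(inventories):
    """Collect every generator label that occurs in at least one script."""
    result = set()
    for labels in inventories.values():
        result |= labels
    return result


def scripts_lacking(inventories, label):
    """List the scripts that have no generator with the given label."""
    return sorted(s for s, labels in inventories.items() if label not in labels)


print(sorted(inter_script_generators(GENERATORS)))
print(scripts_lacking(GENERATORS, "jh"))
```

A gap in one script, such as the missing jh in Tamil, does not remove the inter-script generator; it only leaves a blank cell in that script's column.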
Ex. Hindi pronunciation generally drops a final inherent a, and
also certain cases of medial inherent a, although this is not shown
when writing. Transliteration must show the inherent a, if it is not
to become transcription. If final a is omitted in transliteration,
there is no way of knowing whether the original had the a or not, i.e.
whether or not virama was used. So reversibility is lost, and the scheme
cannot be used to print out text in the original script.
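
The loss of reversibility can be shown with a small sketch. The spellings below are hypothetical: each list holds the script characters of one written form, with the inherent a already spelled out, and drop_final_a is an illustrative stand-in for a pronunciation-based rendering.

```python
def transliterate(characters):
    """Keep every inherent a: join the script characters unchanged."""
    return "".join(characters)


def drop_final_a(characters):
    """Pronunciation-based rendering that omits a final inherent a."""
    text = "".join(characters)
    return text[:-1] if text.endswith("a") else text


# Two different written forms (hypothetical spellings):
without_virama = ["ka", "ma", "la"]  # final consonant keeps its inherent a
with_virama = ["ka", "ma", "l"]      # final consonant written with virama

# Transliteration keeps the two forms apart ...
print(transliterate(without_virama), transliterate(with_virama))
# ... but dropping the final a maps both to the same string.
print(drop_final_a(without_virama), drop_final_a(with_virama))
```

Once two distinct written forms share one rendering, no inverse mapping can restore the original.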
Ex. Where Tamil has the script generators {n | Tam} for the dental
and {n_macr-b | Tam} for the alveolar nasal, Malayalam only has
{n | Mal}, although the alveolar nasal occurs in Malayalam phonetics.
A word common to Tamil and Malayalam which contains the combination
n_macr-b r_macr-b when written in Tamil (where r_macr-b
represents the alveolar flap) is written with nr_macr-b in Malayalam.
This n is pronounced as an alveolar, but any further indication of
this would turn transliteration into transcription.
The script generators of Latin script include the 26 letters
of the alphabet, letters with diacritical marks, punctuation marks, and
various other signs (ampersand, #, *, etc.). Its script characters are these
plus a few ligatures (a + e, f + i, etc.). Note that there are a number of
transliteration schemes (termed 'case-sensitive' schemes) which
distinguish between lower case and upper case Latin letters.
It has always been found
appropriate to use more than one Latin letter in transliterating some of the
script generators of Brahmic scripts (as in au, kh, etc.).
Occasionally it is
necessary to call on the target script to indicate things such as vowel
hiatus or word breaks, by the use of what may be called 'meta-signs'.
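
The need for a meta-sign arises as soon as some generators are transliterated with more than one letter. The sketch below assumes a hypothetical subset of generators and a hypothetical meta-sign ":"; it only checks adjacent pairs, which is a simplification.

```python
# Hypothetical subset of transliterated generators, some of them digraphs.
GENERATORS = {"a", "i", "u", "ai", "au", "k", "h", "kh"}
META = ":"  # hypothetical meta-sign marking a boundary between generators


def transliterate(generators):
    """Join generators, inserting META where two neighbours would
    otherwise read as a single multi-letter generator."""
    out = ""
    previous = None
    for g in generators:
        if previous is not None and previous + g in GENERATORS:
            out += META
        out += g
        previous = g
    return out


print(transliterate(["ai"]))           # the single generator ai
print(transliterate(["a", "i"]))       # vowel hiatus: a followed by i
print(transliterate(["k", "a", "u"]))  # hiatus after a consonant
```

Without the meta-sign, the sequence a, i and the single generator ai would both appear as ai.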
I therefore suggest that:
'Transliteration' is a one-to-one mapping from the script generators of the source script to a subset of either the script generators or the sub-generators of the target script, singly or in ordered combinations, together with the use of the necessary meta-signs.
'Case-insensitive transliteration' uses a subset of the script generators of the target script.
'Case-sensitive transliteration' uses a subset of the sub-generators of the target script.
'Uniform transliteration' is a one-to-one mapping from the inter-script generators of the source scripts to a subset of either the script generators or the sub-generators of the target script, singly or in ordered combinations, together with the use of the necessary meta-signs.
Uniform transliteration is 'case-sensitive' or 'case-insensitive' as defined above.
It follows that reversing the transliteration will not distinguish between graphemic types of the same meaning in the original source script.
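
A minimal sketch of these definitions, under stated assumptions: SCHEME is a hypothetical fragment of a transliteration scheme, TYPE_TO_GENERATORS restates the Devanagari isolated-vowel example, and text is handled as a list of units so that no parsing of Latin strings is needed.

```python
# Hypothetical fragment of a scheme: script generator -> Latin letters.
SCHEME = {"{a}": "a", "{k}": "k", "{kh}": "kh", "{jh}": "jh"}

# Two graphemic types of the same significance share one generator.
TYPE_TO_GENERATORS = {"<a-1>": ["{a}"], "<a-2>": ["{a}"]}


def is_one_to_one(scheme):
    """True if no two generators share the same transliteration."""
    return len(set(scheme.values())) == len(scheme)


def invert(scheme):
    """Build the reverse mapping; only valid for a one-to-one scheme."""
    if not is_one_to_one(scheme):
        raise ValueError("scheme is not one-to-one, so it cannot be reversed")
    return {latin: generator for generator, latin in scheme.items()}


def transliterate(graphemic_type):
    """Transliterate the generators behind one graphemic type."""
    return [SCHEME[g] for g in TYPE_TO_GENERATORS[graphemic_type]]


def reverse(latin_units):
    """Recover script generators (not graphemic types) from Latin units."""
    inverse = invert(SCHEME)
    return [inverse[unit] for unit in latin_units]


print(transliterate("<a-1>"), transliterate("<a-2>"))
print(reverse(transliterate("<a-1>")))
```

Reversal returns the generator {a} in both cases; which of the two written forms stood in the source is not recorded.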