
Detailed Analysis of Scripts

with special reference to Indic scripts

Contents: 1. Written text | 2. Scripts with case | 3. Several scripts | 4. Target scripts | 5. Conclusions


In this document the term 'element' has a different sense from that used in the previous documents.


1. Analysis of Written Text

It is convenient to analyze each actual example of text in a given script into the following elements:



variant forms being distinguished. These elements will be called 'graphic tokens' of abstract 'graphemic types'. (Since the term 'grapheme' is not used here, the notation <...>, usually reserved for graphemes, will be used for graphemic types.)


                 many )
              graphic )---> one graphemic type
               tokens )

        [written forms]     [an abstract form]   

Ex. In Latin script, roman a and italic a are two distinct graphemic types.

Ex. Devanagari has two forms of the isolated vowel [a] (where [ ] encloses the approximate phonetic value), giving the graphemic types <a-1>, <a-2> for this isolated vowel. A similar analysis applies to the two Devanagari forms of [jha] and of [n_dot-b a] (with retroflex nasal).

Ex. If the virama is denoted by \, there exist the graphemic types <ka> (with inherent vowel), <ka\> (with virama), <kta>, etc. In this analysis, the k in <kta>, for instance, does not produce a separate graphemic type, because it never occurs alone.

Ex. Bengali script has two distinct graphemic types, <ta\> (ta with virama) and <t> (the separate letter khanda ta).

Graphemic types with the same significance map to one abstract 'script character', which may then be analyzed into a string of one or more 'script generators' corresponding to abstract letters and other marks used in the particular script. The following diagram applies to scripts without case (see below for scripts with case).

          
         graphemic  )                               (one 
          types of  )---> one script character ---> (or more
       same signif. )                               (script generators

Some examples will clarify this (Dev = Devanagari, Ben = Bengali; s_dot-b denotes the retroflex sibilant):


   Script  Graphemic types  Script character  Script generators
   ------  ---------------  ----------------  -----------------
   Dev     <a-1> )          a                 {a|Dev}
           <a-2> )

   Dev     <ks_dot-b a>     ks_dot-b a        {k|Dev}
                                              {s_dot-b|Dev}
                                              {a|Dev}

   Dev     <ka>             ka              ( {k|Dev}
                                            ( {a|Dev}
           <ka\>            k                 {k|Dev}

   Ben     <ta\> )          t                 {t|Ben}
           <t>   )

The Indic scripts under consideration have no elements other than those mentioned, so a particular script may be said to consist of the set of all its script characters (in practice a finite set).
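The chain of mappings illustrated in the table (several graphemic types to one script character, then a script character to an ordered string of script generators) can be sketched as two lookup tables. This is a minimal Python sketch, not part of any published scheme: the names `GRAPHEMIC_TO_CHAR`, `CHAR_TO_GENERATORS` and `generators_of` are hypothetical, and only the example rows above are covered, in the same ASCII notation (s_dot-b, with \ for virama).

```python
# Hypothetical sketch: graphemic types -> script character -> script generators.
# Covers only the example rows above (ASCII notation; "\" marks virama).

# Graphemic types with the same significance map to one script character.
GRAPHEMIC_TO_CHAR = {
    ("Dev", "<a-1>"): "a",
    ("Dev", "<a-2>"): "a",
    ("Dev", "<ks_dot-b a>"): "ks_dot-b a",
    ("Dev", "<ka>"): "ka",
    ("Dev", "<ka\\>"): "k",
    ("Ben", "<ta\\>"): "t",
    ("Ben", "<t>"): "t",
}

# Each script character is analyzed into an ordered string of script generators.
CHAR_TO_GENERATORS = {
    ("Dev", "a"): ["a"],
    ("Dev", "ks_dot-b a"): ["k", "s_dot-b", "a"],
    ("Dev", "ka"): ["k", "a"],
    ("Dev", "k"): ["k"],
    ("Ben", "t"): ["t"],
}


def generators_of(script: str, graphemic_type: str) -> list[str]:
    """Return the script generators, written {g|Script}, for one graphemic type."""
    char = GRAPHEMIC_TO_CHAR[(script, graphemic_type)]
    return [f"{{{g}|{script}}}" for g in CHAR_TO_GENERATORS[(script, char)]]


print(generators_of("Dev", "<ks_dot-b a>"))  # ['{k|Dev}', '{s_dot-b|Dev}', '{a|Dev}']
print(generators_of("Ben", "<t>") == generators_of("Ben", "<ta\\>"))  # True
```

Once <a-1> and <a-2> (or Bengali <ta\> and <t>) have been mapped to a script character, nothing further down the chain can tell them apart.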



2. Scripts with Case

Latin script, unlike the Brahmic scripts, has upper and lower case letters, but no case distinction for numerals and other signs. For scripts with case it is useful to distinguish case among script characters and to introduce a further category, the 'sub-generators', between script characters and script generators. {A|Lat} and {a|Lat} are different sub-generators of the script generator {a|Lat}. Scripts without case have no sub-generators.


   Script  Graphemic     Script       Sub-          Script
           types         characters   generators    generator
   ------  ------------  ----------   ----------    ---------
   Lat     a (roman)  )  a            {a|Lat} )
           a (italic) )                       )     {a|Lat}
                                              )
           A (roman)  )  A            {A|Lat} )
           A (italic) )

 _ Because Latin has sub-generators, it includes both the script characters æ and Æ, analyzed into the sub-generators {a|Lat}{e|Lat} and {A|Lat}{E|Lat} respectively.
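The extra level for scripts with case can be sketched in the same lookup-table style. In this hypothetical Python sketch the names `CHAR_TO_SUBGENERATORS`, `sub_generators` and `script_generators` are illustrative, and folding case with `str.lower()` stands in for the mapping from sub-generators to script generators; that shortcut is an assumption that only holds for simple Latin letters like these.

```python
# Hypothetical Latin fragment: script character -> ordered sub-generators.
CHAR_TO_SUBGENERATORS = {
    "a": ["a"], "A": ["A"],
    "e": ["e"], "E": ["E"],
    "æ": ["a", "e"],  # ligature script character, lower case
    "Æ": ["A", "E"],  # ligature script character, upper case
}


def sub_generators(char: str) -> list[str]:
    """Case-preserving analysis: {A|Lat} and {a|Lat} stay distinct."""
    return CHAR_TO_SUBGENERATORS[char]


def script_generators(char: str) -> list[str]:
    """Case-free analysis: {A|Lat} and {a|Lat} both belong to generator {a|Lat}.

    Assumption: str.lower() stands in for sub-generator -> script generator.
    """
    return [s.lower() for s in sub_generators(char)]


print(sub_generators("Æ"))                               # ['A', 'E']
print(script_generators("Æ"))                            # ['a', 'e']
print(script_generators("æ") == script_generators("Æ"))  # True
```

A case-sensitive scheme would work from the output of `sub_generators`, a case-insensitive one only from `script_generators`.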



3. Several Scripts

When two scripts are compared, comparative linguistics shows that most script generators of one script have the same meaning as some script generator of the other. Applying this to all the scripts under consideration yields a minimal set of 'inter-script generators'. A transliteration scheme for these inter-script generators constitutes a 'uniform transliteration' for this group of scripts.


   Script:            1           2         ...    Inter-script
                                                   generators
   -------------------------------------------------------------
   Generators:    {g-1 | 1}   {g-1 | 2}     ...    {g-1}
                  {g-2 | 1}   {g-2 | 2}     ...    {g-2}
                      -       {g-3 | 2}     ...    {g-3}

                     ...         ...        ...     ...

Ex. Brahmic scripts other than Tamil have a script generator {jh | scriptname}. There is therefore an inter-script generator {jh}, which is transliterated jh.
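The construction of inter-script generators can be sketched as pooling the generator inventories of the individual scripts. In this hypothetical Python sketch the inventories are tiny illustrative subsets, and the comparative step is assumed to have been done already, so that generators with the same meaning in different scripts already share a name (`jh`, `n`, and so on).

```python
# Hypothetical sketch: pool per-script generator inventories into
# inter-script generators. Inventories are illustrative subsets only; a
# shared name is assumed to mean "same meaning" across scripts.

SCRIPT_GENERATORS = {
    "Dev": {"k", "jh", "n"},
    "Mal": {"k", "jh", "n"},
    "Tam": {"k", "n", "n_macr-b"},
}


def inter_script_generators(scripts: dict[str, set[str]]) -> dict[str, set[str]]:
    """Map each inter-script generator to the set of scripts that have it."""
    table: dict[str, set[str]] = {}
    for script, gens in scripts.items():
        for g in gens:
            table.setdefault(g, set()).add(script)
    return table


table = inter_script_generators(SCRIPT_GENERATORS)
for g in sorted(table):
    print(f"{{{g}}}", sorted(table[g]))
# {jh} ['Dev', 'Mal']
# {k} ['Dev', 'Mal', 'Tam']
# {n} ['Dev', 'Mal', 'Tam']
# {n_macr-b} ['Tam']
```

A uniform transliteration then needs exactly one target string per key of `table` (for example jh for {jh}), whichever script the text comes from; a script that lacks a generator, as Tamil lacks {jh}, simply never uses that entry.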

Ex. Hindi pronunciation generally drops a final inherent a, and in certain cases a medial inherent a, although this is not shown in writing. Transliteration must show the inherent a if it is not to become transcription. If final a is omitted in transliteration, there is no way of knowing whether the original had the a or not, i.e. whether or not virama was used. Reversibility is then lost, and the scheme cannot be used to print out text in the original script.
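The loss of reversibility can be shown with a toy model. This Python sketch is hypothetical throughout: a word is a list of (consonant, virama) pairs, only single-letter consonants and the inherent vowel a are modelled, and the sequence ka ma la is just an illustrative string of letters. Dropping the final inherent a sends two different source spellings to the same Latin string, while keeping it lets a simple decoder restore the original.

```python
# Hypothetical toy model: a word is a list of (consonant, virama) pairs;
# virama=False means the consonant carries the inherent vowel a.
Word = list[tuple[str, bool]]


def translit_faithful(word: Word) -> str:
    # Show every inherent a; a consonant with virama is written bare.
    return "".join(c if virama else c + "a" for c, virama in word)


def translit_dropping_final_a(word: Word) -> str:
    # Mimics pronunciation: a final inherent a is left out.
    text = translit_faithful(word)
    if word and not word[-1][1]:
        text = text[:-1]
    return text


def reverse_faithful(text: str) -> Word:
    # Assumes one-letter consonants and no independent vowels, for brevity.
    word: Word = []
    i = 0
    while i < len(text):
        if i + 1 < len(text) and text[i + 1] == "a":
            word.append((text[i], False))  # consonant + inherent a
            i += 2
        else:
            word.append((text[i], True))   # consonant + virama
            i += 1
    return word


with_a = [("k", False), ("m", False), ("l", False)]      # ka ma la
with_virama = [("k", False), ("m", False), ("l", True)]  # ka ma l\

print(translit_faithful(with_a), translit_faithful(with_virama))  # kamala kamal
print(translit_dropping_final_a(with_a), translit_dropping_final_a(with_virama))  # kamal kamal
print(reverse_faithful("kamala") == with_a)  # True
```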

Ex. Tamil has the script generators {n | Tam} for the dental nasal and {n_macr-b | Tam} for the alveolar nasal, whereas Malayalam has only {n | Mal}, although the alveolar nasal occurs in Malayalam pronunciation. A word common to Tamil and Malayalam that is written with the combination n_macr-b r_macr-b in Tamil (where r_macr-b represents the alveolar flap) is written with n r_macr-b in Malayalam. This n is pronounced as an alveolar, but any further indication of this would turn transliteration into transcription.



4. Target Scripts

The script generators of Latin script include the 26 letters of the alphabet, letters with diacritical marks, punctuation marks, and various other signs (ampersand, #, *, etc.). Its script characters are these plus a few ligatures (a + e, f + i, etc.). Note that there are a number of transliteration schemes (termed 'case-sensitive' schemes) which distinguish between lower case and upper case Latin letters.
 _ It has always been found appropriate to use more than one Latin letter in transliterating some of the script generators of Brahmic scripts (as in au, kh, etc.).
 _ Occasionally it is necessary to call on the target script to indicate things such as vowel hiatus or word breaks, by the use of what may be called 'meta-signs'.
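The interaction between multi-letter targets such as kh and au and a meta-sign can be sketched as a small encoder and a greedy longest-match decoder. Everything here is an assumption for illustration: the eight-generator scheme, the {...} generator labels, and the use of ':' as the meta-sign. Without the meta-sign, k + a + u (vowel hiatus) and k + au (diphthong) would both come out as kau.

```python
# Hypothetical scheme: generator label -> Latin target string.
SCHEME = {
    "{k}": "k", "{h}": "h", "{kh}": "kh",
    "{a}": "a", "{i}": "i", "{u}": "u", "{ai}": "ai", "{au}": "au",
}
META = ":"  # assumed meta-sign marking a break between targets
TARGETS = {t: g for g, t in SCHEME.items()}  # reverse table; SCHEME is one-to-one
LONGEST = max(len(t) for t in TARGETS)


def encode(gens: list[str]) -> str:
    out, prev = "", ""
    for g in gens:
        t = SCHEME[g]
        # If prev followed by t would start with a longer target string
        # (k + h -> kh, a + u -> au), separate them with the meta-sign.
        if prev and any(len(x) > len(prev) and (prev + t).startswith(x) for x in TARGETS):
            out += META
        out += t
        prev = t
    return out


def decode(text: str) -> list[str]:
    gens, i = [], 0
    while i < len(text):
        if text[i] == META:  # the meta-sign stands for no generator
            i += 1
            continue
        # Greedy longest match against the target strings.
        for n in range(min(LONGEST, len(text) - i), 0, -1):
            if text[i:i + n] in TARGETS:
                gens.append(TARGETS[text[i:i + n]])
                i += n
                break
        else:
            raise ValueError(f"cannot decode {text!r} at position {i}")
    return gens


hiatus = ["{k}", "{a}", "{u}"]  # k + a + u, vowels in hiatus
diphthong = ["{k}", "{au}"]     # k + au
print("".join(SCHEME[g] for g in hiatus))  # kau  (ambiguous without a meta-sign)
print(encode(hiatus), encode(diphthong))   # ka:u kau
print(decode("ka:u") == hiatus, decode("kau") == diphthong)  # True True
```

The encoder inserts the meta-sign only where the previous target followed by the next one would begin with a longer target string. That is enough for greedy decoding here because no target is longer than two letters; a scheme with longer targets would need a more careful check.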

5. Conclusions

I therefore suggest the following definitions.

'Transliteration' is a one-to-one mapping from the script generators of the source script to a subset of either the script generators or the sub-generators of the target script, singly or in ordered combinations, together with the use of the necessary meta-signs.

'Case-insensitive transliteration' uses a subset of the script generators of the target script.

'Case-sensitive transliteration' uses a subset of the sub-generators of the target script.

'Uniform transliteration' is a one-to-one mapping from the inter-script generators of the source scripts to a subset of either the script generators or the sub-generators of the target script, singly or in ordered combinations, together with the use of the necessary meta-signs.

Uniform transliteration is 'case-sensitive' or 'case-insensitive' as defined above.

It follows that reversing the transliteration will not distinguish between graphemic types of the same meaning in the original source script.
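The definitions above can be turned into simple checks on a scheme's mapping table. In this hypothetical Python sketch a scheme is a dictionary from generator labels to Latin strings; the sample tables are illustrative rather than any published scheme, and treating a target string as case-insensitive when it equals its own lower-case form is an assumption.

```python
# Hypothetical checks against the definitions above. A scheme is modelled as a
# dict from source (or inter-script) generator labels to Latin target strings.


def is_one_to_one(scheme: dict[str, str]) -> bool:
    """True if no two source generators share the same target string."""
    return len(set(scheme.values())) == len(scheme)


def is_case_insensitive(scheme: dict[str, str]) -> bool:
    """True if no target string uses upper case.

    Assumption: a target uses only script generators (no case distinction)
    when it equals its own lower-case form.
    """
    return all(t == t.lower() for t in scheme.values())


# Illustrative tables only, not taken from any published scheme.
lower_only = {"{t}": "t", "{t_dot-b}": ".t", "{s}": "s", "{s_dot-b}": ".s"}
mixed_case = {"{t}": "t", "{t_dot-b}": "T", "{s}": "s", "{s_dot-b}": "S"}
merged_nasals = {"{n}": "n", "{n_macr-b}": "n"}  # loses the Tamil distinction

print(is_one_to_one(lower_only), is_case_insensitive(lower_only))  # True True
print(is_one_to_one(mixed_case), is_case_insensitive(mixed_case))  # True False
print(is_one_to_one(merged_nasals))  # False
```

A one-to-one table is necessary but not sufficient: when multi-letter targets are concatenated, the meta-signs are what keep the output decodable. And, as just stated, even a fully reversible scheme recovers only script characters, never the particular graphemic type.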


Copyright (C) Anthony P. Stone 1996, 1997. This material may be freely used, provided the author is acknowledged.
Last updated: 10 June 2002