Vivian Cook   SLA site      Writing System site    Linguistics Glossary

Languages List

A checklist of some features of languages, based on Vivian Cook (1997), Inside Language, Arnold.


The information for each language consists of:
- its name. Languages often have alternative names, the difference sometimes having political overtones, for example Persian/Farsi, Bahasa Malaysia/Bahasa Melayu, Inuit/Eskimo.
- where it is spoken (unless obvious from the name)
- numbers of speakers. General information mostly comes from the series B. Comrie, The Major Languages of East and South Asia/Western Europe/The Soviet Union, Croom Helm; G. Campbell (19XX), Compendium of the World’s Languages, XXXX; D. Crystal (1987), The Cambridge Encyclopedia of Language, CUP. Figures were checked and updated from Ethnologue (SIL International), Feb 99 version.
- the usual word order of S (Subject), V (Verb), and O (Object), whether SOV, SVO, VSO or another order. Some languages with a freer word order are hard to assign a single canonical order.
- whether it is pro-drop (PD), i.e. permits null subject sentences (Italian parla), or non-pro-drop (NPD), i.e. requires a subject (English he speaks). For many languages the information is not readily available.
- the language family to which it belongs
- its writing system. For convenience, the default writing system is taken to be the Roman alphabet written left to right (l>r), so only other possibilities are specified. Writing systems sometimes have political importance, for example in North and South Korea. See Scripts Statistics for figures on writing systems.

Numbers of speakers are approximate and taken from a number of sources; they usually refer to the total number of speakers in the whole world, not just in one country, which is particularly relevant for languages such as French, English, and Spanish.
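The fields listed above could be held in a small record type. The following is only a minimal sketch in Python, not part of the original checklist: the class name LanguageEntry, its field names and the summary() helper are all hypothetical, chosen for illustration. It assumes the stated defaults (Roman alphabet, left to right) and treats unrecorded pro-drop status as unknown. The two sample entries copy their values from the Italian and Arabic entries in the list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageEntry:
    """One entry in the checklist (hypothetical structure, for illustration)."""
    name: str
    family: str
    where: Optional[str] = None        # omitted when obvious from the name
    speakers: Optional[int] = None     # approximate worldwide total
    word_order: Optional[str] = None   # e.g. "SVO", "SOV", "VSO"
    pro_drop: Optional[bool] = None    # True = PD, False = NPD, None = not recorded
    script: str = "Roman"              # default assumed by the checklist
    direction: str = "l>r"             # default assumed by the checklist

    def summary(self) -> str:
        """Build the feature line, naming the script only when it is not the default."""
        parts = []
        if self.word_order:
            parts.append(self.word_order)
        if self.pro_drop is not None:
            parts.append("PD" if self.pro_drop else "NPD")
        parts.append(self.family)
        if (self.script, self.direction) != ("Roman", "l>r"):
            parts.append(f"{self.script} ({self.direction})")
        return "  ".join(parts)


italian = LanguageEntry(
    name="Italian",
    family="Indo-European (Romance)",
    speakers=37_000_000,
    word_order="SVO",
    pro_drop=True,
)

arabic = LanguageEntry(
    name="Arabic",
    family="Semitic",
    speakers=150_000_000,
    word_order="VSO",
    pro_drop=True,
    script="Arabic alphabet",
    direction="r>l",
)

print(italian.summary())
print(arabic.summary())
```

Leaving pro_drop as None rather than False keeps "not recorded" distinct from NPD, which matters because the checklist notes that this information is often unavailable.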


Albanian
        (Albania and adjacent countries)    53 million
        SVO   Indo-European   Roman [since 1908]

Arabic
        (North African countries such as Morocco, Middle
        Eastern countries such as Jordan, etc; widespread
        religious use)
        150 million
        VSO   PD   Semitic   Arabic alphabet (r>l, vowel-less)

Bahasa Melayu/Bahasa Malaysia/Malay
        (Malaysia)   17 million
        SVO  PD Austronesian    Roman (since 19th century),
        differing from Indonesian (Bahasa Indonesia) in
        vocabulary

Basque
        (Spanish/French border area)  500 thousand
        SOV no known family

Bengali
        (Bengal, i.e. Bangladesh and West Bengal)
        189 million
        SOV   PD Indo-European (Indic)  Bengali script

Berber
        (name for several closely related languages such as
        Kabyle and Shawia used in Morocco, Algeria and
        neighbouring countries)  12 million
        VSO Afro-Asiatic   formerly Berber alphabet (r>l)

Burmese
        SOV  PD Sino-Tibetan   Burmese alphabet (based on
        circles)   22 million

Chinese
        (China, Taiwan, Singapore, etc)    885 million
        8 main dialects (alias languages)
        SVO    PD  Sino-Tibetan   character script

Chinook
        VSO  Penutian

Cocama/Kokama
        (Peru, Colombia, Brazil)  10 thousand
        SVO  Andean-Equatorial

Czech
        SVO PD   Indo-European (Slavic)   Roman (adapted)
        12 million

Dutch
        (Netherlands, Belgium, Suriname)     20 million
        SVO  NPD   Indo-European (Germanic)

English
        SVO    NPD  Indo-European (Germanic)  322 million

Fijian
        Verb-initial  Austronesian  300 thousand

Finnish
        (Finland, parts of Russia and Sweden)  5 million
        SVO   Finno-Ugric    Roman (only 21 letters)

French
        (Francophone Africa, Quebec, France, French
         colonies, etc)  200 million???
        SVO NPD Indo-European (Romance)

German
        (Switzerland, Austria, Germany, etc)   98 million
        SOV NPD  Indo-European (Germanic)  Roman
        (previously Fraktur Roman)

Greek
        (Cyprus, Greece) 12 million
        SVO  PD   Indo-European  Greek alphabet

Hawaiian
        VSO Polynesian

Hebrew
        (Israel, widespread religious use)  4 million
        VSO (SVO)  PD   Semitic  Hebrew alphabet (r>l,
        vowel-less)

Hindi
        (India)  182 million
        SOV Indo-European (Indic)  Devanagari alphabet

Hungarian
        SOV  Finno-Ugric  14 million

Indonesian
        SVO   Austronesian   close to Bahasa Malaysia except
         for some vocabulary 170 million

Irish Gaelic
        VSO  Indo-European (Celtic)  100 thousand

Italian
        SVO   PD   Indo-European (Romance)  37 million

Japanese
        SOV   PD   Altaic?   Kanji character scripts plus kana
        syllabaries    125 million

Kabardian           
        SOV Caucasian   Cyrillic alphabet (plus ????)
        350 thousand

Korean          
        SOV (Verb Final)   PD   Altaic?         75 million
        Han’gul sound-based script (l>r) and characters (r>l)

Latin
        (dead language, formerly widespread, more latterly
        for religious use)
        SOV   PD  Indo-European

Maori
        (New Zealand)             100 thousand
        VSO Austronesian

Persian/Farsi
        (Iran, parts of Afghanistan, Tadzhik republic)
        38 million
        SOV Indo-European   Persian alphabet (derived
        from Arabic, r>l)

Polish
        Indo-European (Slavic)  44 million
        Roman (without q v x but plus many
        diacritics)

Portuguese
        (Brazil, Portugal)      170 million
        SVO   PD   Indo-European (Romance)

Punjabi/Panjabi
        (Punjab border of India/Pakistan)   30 million
        SOV   Indo-European   Gurmukhi alphabet

Romansch
        (pockets of eastern Switzerland and northern Italy)
        50 thousand
        Indo-European (Romance)

Russian
        SVO  Indo-European (Slavic)  Cyrillic  170 million

Samoan
        VSO (V-initial)   Austronesian  200 thousand

Scots Gaelic
        VSO Indo-European (Celtic) 80 thousand

Seneca (New York State)
        Iroquoian  Roman alphabet (reduced to
        12 letters plus “?”)

Serbian
        SVO   Indo-European (Slavic)   Serbian (adapted
        Cyrillic alphabet)  18 million

Slovak
        Indo-European (Slavic)   Roman (adapted)  18 million

Spanish
        (Latin America, Spain, etc)     332 million
        SVO  PD  Indo-European (Romance)

Sranan
        (Suriname, the Netherlands)          100 thousand
        Creole (English plus Portuguese and African
        elements)

Swahili/Kiswahili
        (East Africa inc Tanzania, Kenya etc)   4 million
        SVO  Niger-Congo (Bantu)  Roman alphabet without
        c q x

Swedish          
        SVO  Indo-European (Germanic, North)  8 million

Tagalog
        (Philippines)   50 million
        VOS Malayo-Polynesian

Tahitian
        VSO   Austronesian  70 thousand

Tamil
        (Southern India, Sri Lanka, Malaysia, etc)   63 million
        SOV  Dravidian   Tamil syllabary

Thai
        SVO Sino-Tibetan   Thai alphabet   20 million

Tok Pisin         
        (Papua New Guinea)  1.5 million
        SVO   pidgin/creole

Tongan  
        V-initial   Austronesian   80 thousand

Turkish
        (Turkey plus groups in adjacent countries)   59 million
        SOV   Altaic   Roman alphabet since 1928
        (with no q w x)

Ukrainian
        (Ukraine and neighbouring areas)     60 million
        Indo-European (Slavic)   Cyrillic alphabet (plus ?, i
        and ??)

Urdu
        (Pakistan, India)       58 million
        SOV   Indo-European   Urdu script (Arabic-derived,
        r>l)

Vietnamese         
        SVO Austro Asiatic  Roman (with many diacritics) 
        67 million

Welsh
        VSO  Indo-European (Celtic)  500 thousand

Yoruba
        (Nigeria)  16 million  SVO Niger-Congo

Zulu
        (South Africa)  3.5 million SVO Niger-Congo (Bantu)

Xhosa
        (South Africa, Transkei)  5 million SVO Niger-Congo
 
