Vivian Cook
What is a word?
It may seem
perfectly obvious what a word is. ‘beer’ is a word, ‘reeb’ is not.
‘beer’ has a meaning, ‘reeb’ does not. So a word is a sequence of
letters (or sounds) with meaning. Is that all there is to it?
For over a
hundred years researchers looking at language meaning have used an element
called a morpheme – the smallest element of a sentence
that has a meaning. Sometimes this may be a word: ‘beer’ is a morpheme as it
has a meaning, ‘reeb’ is not as it has no meaning.
Sometimes a morpheme may be smaller than a
word; ‘beers’ and ‘beery’ have different meanings from ‘beer’, so the
elements ‘s’ and ‘y’ are also morphemes.
- units able to occur on their own
What makes ‘beer’ a word is partly that it can stand on its own –
‘What’ll you have?’ ‘Beer’ – but ‘s’ or ‘y’ have to be added
to something else, ‘beers/beery’, ‘flour/floury’ etc. Nobody could ask a
question to which ‘s’ or ‘y’ is the answer (except of course in
technical discussions of language). One working definition of a word is then the
smallest unit of language that can be said on its own – the minimal free form.
Since ‘s’ and ‘y’ are bound to a preceding word, just like ‘-ish’ in
‘beerish’ or ‘-ly’ in ‘beerily’, they are bound morphemes, not free
morphemes. If you were compiling a dictionary, it would, however, be highly
laborious to hunt for occasions on which every word in the dictionary had
occurred on its own. In practice this definition is not terribly useful. And it
does not cover the important category of function words like ‘of’ and
‘my’ (see page 000), which can hardly be said to occur on their own yet we
feel are words.
- units that can’t be split up
In English you can add to the beginning of a word, ‘happy > unhappy’, or
to the end, ‘happy > happiness’, but you can’t usually add
anything to the middle; a word is uninterruptible. The only exceptions are
certain exclamatory remarks such as ‘absobloominglutely’, where
‘blooming’ has appeared in the middle of ‘absolutely’ – add your
favourite swear-word inside ‘fan…….tastic’ and ‘in……credible’.
Doug Coupland coined some words in this way such as ‘emallgration’; Homer
Simpson talks of a ‘saxamophone’.
- items listed in a dictionary
A word is also something that can be listed in a dictionary: you can look
up ‘beer’ but you can’t look up ‘s’ (though it may depend on the
dictionary, some of which will certainly have an entry for the meanings of
‘-y’). When we talk about the words of a language, it is usually this list
in a dictionary that we have in mind.
There are still problems. Dictionaries actually have ‘entries’ rather
than words. ‘man’ is one entry, hence it is one word. But this does not mean
that it has one meaning; the Oxford English Dictionary (OED) has seventeen main
meanings for ‘man’ as a noun, including the expected ‘a human being
(irrespective of sex or age)’ but also ‘one of the pieces used in chess’ and
‘a cairn or pile of stones marking a summit or prominent point of a mountain’.
If the principle is one word = one meaning, how many words ‘man’ are there?
Sometimes people treat these multiple meanings as ‘homonyms’, different
words with the same sounds or letters that have different meanings, like
‘bank’ (of a river) versus ‘bank’ (type of business). Sometimes they see
them as extensions from a ‘central meaning’: in some subtle way the meaning of
‘human being’ extends to anything that looks like a
human being, such as a cairn of stones. So the list of separate entries in
the dictionary may only be a rough guide to the number of words.
- chunks divided by spaces or sounds
A word is also a chunk of language that can be chopped out from a
sentence. It is of course possible to speak with clear pauses between the words:
‘We – shall – overcome’. But normally this style is reserved for
beginning readers and Daleks. If you listen to someone speaking, it is hard to
detect pauses in between the words – ‘weshallovercome’; listen to a
foreign language and you probably have no idea where words begin and end. In
speech, pauses are used to show grammatical divisions, hesitations and so on,
rarely divisions between words.
Writing of course is very different. Words stand out because they have
spaces in between them; we know that there are three words in ‘We shall
overcome’ because there are two spaces: a word is ‘a sequence of letters
without any spaces’. This definition works admirably for writing and
distinguishes between words and bound morphemes which are not separated by
spaces. Probably this is the working definition most people use: if it has a
space before and a space after, it’s a word.
The snag is that the convention of putting spaces between words only came
in around the 8th century AD; English had got along without it for some time
already. Nor does it apply to other languages; Chinese has spaces between
characters, not between words; Thai and Inuit are alphabetic scripts that do not
separate words with spaces. If words need spaces, then none of these languages
have words. Nor can this definition apply to the spoken language, where
the equivalent pauses are potential rather than invariable; defining the units
of the spoken language through units of the written language is, according to
most, putting the cart before the horse. Pauses may indeed potentially isolate
words, but it would take a lifetime of listening to catch the appropriate pauses
for each word in the language.
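The space-based definition, and its snag, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `orthographic_words` is invented here, and treating punctuation at the edges of a chunk as not part of the word is a choice made for the sketch, not something the definition itself settles.

```python
import string

# Assumption for this sketch: punctuation and quotation marks at the
# edges of a chunk do not belong to the word itself.
EDGE_CHARS = string.punctuation + "‘’“”"


def orthographic_words(text):
    """Return the chunks of text that have a space before and after."""
    chunks = text.split()  # split on any run of whitespace
    stripped = (chunk.strip(EDGE_CHARS) for chunk in chunks)
    return [chunk for chunk in stripped if chunk]


print(orthographic_words("We shall overcome"))  # ['We', 'shall', 'overcome']
print(orthographic_words("weshallovercome"))    # ['weshallovercome']
```

Two spaces give three words, as expected. Take the spaces away, as in speech or in a script that does not use them, and the same definition finds a single ‘word’.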
Having sorted out where words start and end, the other problem, still in
English, is what counts as a word. OK, ‘carpet’ is a word and fits all the above
definitions. So is ‘carpets’ a different word? Is ‘to carpet someone’ a
different word? A ‘carpeter’? Are ‘carpeting’ and ‘carpeted’ still
different words? But what about ‘recarpet’ and ‘uncarpeted’?
‘carpet-bag’ and ‘red-carpeted’? If we are counting the number of
occurrences of ‘carpet’ in English texts, do we have to add up all of
these nouns, verbs and adjectives? Do we draw the line at compounds like
‘red-carpet’? Or are we even stricter and exclude forms that have suffixes
like ‘carpeter’ or prefixes like ‘recarpet’? Indeed I thought I had
made these two forms up – the spell-checker in Word rejected them – till I
checked on Google and found 9,000 web-pages with ‘carpeter’ and 20,000 for
‘recarpet’. This is then why it is so difficult to set a figure on how many
words someone knows or how many words there are in a language: it all depends on
what you count as a word.
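How far the total shifts with each decision can be seen in a small Python sketch. Everything in it is invented for illustration: the sample sentence, the three sets of forms (`EXACT`, `INFLECTED`, `AFFIXED`) and the function names are assumptions, not a recognised counting scheme.

```python
import re

# Made-up sample text, purely for illustration.
SAMPLE = (
    "The carpet was new. They had to recarpet the hall after the carpeter "
    "left, so both carpets were carpeted again, and the red-carpeted stairs "
    "matched."
)

# Three hypothetical policies, from strictest to loosest.
EXACT = {"carpet"}
INFLECTED = EXACT | {"carpets", "carpeted", "carpeting"}
AFFIXED = INFLECTED | {"carpeter", "recarpet", "uncarpeted"}


def tokens(text):
    """Lowercased runs of letters; hyphenated compounds stay in one piece."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def count_forms(text, forms, include_compounds=False):
    """Count the tokens that qualify as 'carpet' under the given policy."""
    total = 0
    for token in tokens(text):
        if token in forms:
            total += 1
        elif include_compounds and any(part in forms for part in token.split("-")):
            # A compound counts if any hyphen-separated part is an accepted form.
            total += 1
    return total


print(count_forms(SAMPLE, EXACT))                            # 1
print(count_forms(SAMPLE, INFLECTED))                        # 3
print(count_forms(SAMPLE, AFFIXED))                          # 5
print(count_forms(SAMPLE, AFFIXED, include_compounds=True))  # 6
```

The same sentence contains one, three, five or six occurrences of ‘carpet’, depending entirely on the policy chosen.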
One solution is not to count words but word families. A word family is
defined by Paul Nation as ‘a headword, its inflected forms, and its closely
related derived forms’. So included with the headword ‘book’ as a noun are
the inflected form ‘books’ and the derived form ‘bookish’; ‘book’
as a verb has the inflected forms ‘books’, ‘booked’, ‘booking’ and
the derived forms ‘bookable’ and ‘booker’. So there are two ‘book’
families with a limited number of relations.
But is a ‘booklet’ part of the noun family ‘book’? A ‘bookie’
part of the verb family ‘book’? Each person comes to a
different decision – before you even start thinking about whether elements in
a word family have to share a common meaning. Is the book which is a
‘printed treatise’ the same family as the book
that is ‘a telephone directory’ or ‘words to which a musical is set’ or
‘the total of charges against a person’ or ‘a record of bets’? And the meaning
affects which of the related forms can be used: ‘bookish’ can’t be used
for musicals; ‘bookies’ take bets; ‘bookers’ book tickets, etc.
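The two ‘book’ families can be written down as a small lookup table in Python. This is a sketch only: the table layout and the function `families_for` are invented here, and the members listed are just those given above, with the disputed ‘booklet’ and ‘bookie’ deliberately left out.

```python
# The two 'book' families as listed in the text. 'booklet' and 'bookie'
# are left out on purpose, since their membership is exactly what is disputed.
WORD_FAMILIES = {
    ("book", "noun"): {
        "inflected": {"books"},
        "derived": {"bookish"},
    },
    ("book", "verb"): {
        "inflected": {"books", "booked", "booking"},
        "derived": {"bookable", "booker"},
    },
}


def families_for(form):
    """Return every (headword, part of speech) family that lists this form."""
    form = form.lower()
    matches = []
    for (headword, pos), members in WORD_FAMILIES.items():
        if form == headword or form in members["inflected"] or form in members["derived"]:
            matches.append((headword, pos))
    return matches


print(families_for("bookish"))  # [('book', 'noun')]
print(families_for("books"))    # [('book', 'noun'), ('book', 'verb')]
print(families_for("booklet"))  # []
```

Even this tiny table shows the difficulty. ‘books’ belongs to both families, so the form alone cannot say which one to count it under, and ‘booklet’ belongs to neither until somebody decides that it should.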