Any exposure to texts results in vocabulary growth, also in adults. Image © Dreamstime.com
By the time they reach adulthood, people know the words of their mother tongue. This is what allows us to communicate, right?
Not necessarily. While vocabulary size tests do suggest that word learning stops in early adulthood, there are two things about the word frequency distribution that make such testing mathematically impossible: the distribution is skewed and bursty.
Starting from high counts for the most common words like “the”, “and” and “in”, word frequency falls off sharply towards the long tail of very infrequent words.
At the dawn of corpus linguistics, researchers were hoping to cover the whole vocabulary of their language by including more and more text in the corpus. Somewhat surprisingly, this hasn’t happened. Regardless of advances in corpus sizes, first to millions, then to billions and now to trillions of word tokens¹, about half of the word types occur just once in any corpus.
This also applies to the body of text encountered by a person over their lifetime. For word learning, the consequence is that if you double the amount of text you have encountered, you also double the number of words you had never come across before (and probably never will again).
Such exposure to rare words continues over the whole lifetime of a person, so that older adults will be familiar with many words that they had never even heard when they were younger.
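The counting behind these claims can be sketched in a few lines of Python. This is only an illustration: the first part counts tokens, types and once-only types for the footnote's phrase, and the second part draws words from an assumed Zipf-like law (probability proportional to 1/rank), which is a common idealisation and not something measured in this article. The function names, the vocabulary size and the seed are arbitrary choices.

```python
import random
from collections import Counter


def type_token_stats(tokens):
    """Return (number of tokens, number of types, number of types seen exactly once)."""
    counts = Counter(tokens)
    once = sum(1 for c in counts.values() if c == 1)
    return len(tokens), len(counts), once


# The footnote's phrase: 6 tokens, 4 types, 2 of which occur only once.
print(type_token_stats("to be or not to be".split()))  # (6, 4, 2)


def sample_zipf_tokens(n_tokens, vocab_size=50_000, seed=0):
    """Draw word ids with probability proportional to 1/rank.

    The 1/rank law and the vocabulary size are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    ranks = range(1, vocab_size + 1)
    weights = [1.0 / r for r in ranks]
    return rng.choices(ranks, weights=weights, k=n_tokens)


# Each larger sample extends the smaller one (same seed), mimicking more reading.
for n in (10_000, 20_000, 40_000):
    tokens, types, once = type_token_stats(sample_zipf_tokens(n))
    print(f"tokens={tokens}  types={types}  seen once={once}")
```

Under these assumptions, new word types keep turning up each time the sample grows, and a large share of the types seen so far has been seen only once. The exact proportions depend on the assumed distribution.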
Rare words are not distributed evenly across texts. If you take a particular rare word, say corpus, then most texts will not contain it at all. However, if you do find a text that contains it, like this one, then it will probably contain it multiple times, like a burst of occurrences of this word. The same is true for people’s experience with words. Most people are not familiar with the terminology of corpus linguistics at all, but if you find a person who is, then these words will be very familiar to them.
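Burstiness can be made concrete by comparing how many texts contain a word with how often it occurs in the texts that do contain it. The sketch below uses four tiny texts invented purely for illustration; `burst_profile` is a hypothetical helper, not an established measure.

```python
from collections import Counter

# Hypothetical toy texts, invented for illustration only.
texts = [
    "the corpus is large and the corpus is tagged so the corpus is useful",
    "the match ended in a draw and the fans went home",
    "the scarf is knitted in the round and the yarn is soft",
    "the beam of the collider is tuned and the detector is ready",
]


def burst_profile(word, texts):
    """Return (texts containing the word, total occurrences,
    mean occurrences per containing text)."""
    per_text = [Counter(t.split())[word] for t in texts]
    containing = [c for c in per_text if c > 0]
    total = sum(containing)
    mean = total / len(containing) if containing else 0.0
    return len(containing), total, mean


print(burst_profile("corpus", texts))  # (1, 3, 3.0)
print(burst_profile("the", texts))
```

In this toy data the rare word appears in a single text but three times within it, while the common word is spread across every text.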
Now, if we know that a person is familiar with linguistics terminology, does this say anything about their knowledge of terms in football, knitting or particle physics? Of course not. As life experience increases, people’s learning, including word learning, shifts more and more into the specialised domains that are part of their lives either as a profession or a hobby.
Vocabulary size tests only measure common vocabulary from the most common 20 thousand words or so – even learner’s dictionaries contain many more words. Since adult learning happens increasingly outside of this common vocabulary, it is obvious why such tests systematically fail to detect vocabulary growth in adults.
Another surprising consequence is that while the vocabulary size of each person grows with experience, the experience of each person is different, ensuring that the vocabularies are also increasingly different. A computer simulation of word learning has shown that for each individual, only about half of their vocabulary is shared with the whole group. Think what that means for the idea of knowing your mother tongue.
¹ Simplifying a bit, text is made up of word tokens that belong to word types. For instance, the phrase “To be or not to be” contains 6 tokens of 4 types, two of which occur twice each.