You’re standing in line at the airport, inadvertently listening in as the woman ahead of you speaks into a cellphone in a language you don’t understand. You hear a barrage of syllables, followed by a long pause as she listens, and then the fusillade resumes. You know, of course, that you’re hearing words, but they don’t really sound like words, except when a proper name like Obama or a phrase of English like air traffic control sneaks in. Everything just runs together, and if you had to pick where one word ends and another begins, you’d be doing so almost at random. Listen to a language you know, though, and such boundaries are easily discernible, even obvious.
What gives? If you studied the waveforms that make up speech, you’d note that it’s your intuitions about unfamiliar languages that are spot on. Human speech is a nearly continuous stream of noise. This poses an interesting puzzle for psychologists: How can infants start learning what the words in their language mean when they don’t know where words begin and end? Psychologists call this the “segmentation” problem.
Though speech streams do not contain many convenient pauses, they do have something infants are surprisingly well-equipped to exploit: transitional probabilities. Syllables within a word have a greater probability of occurring in succession than syllables that span a word boundary. Take the phrase air traffic control. The phrase could also be parsed as airtra ficon troll. But, with the exception of troll (whose popularity after the heyday of those eminently tufted dolls has perhaps mercifully diminished), parsing it this way doesn’t make much sense: in English at large, air is only rarely followed by tra, whereas tra is almost always followed by fic. A parse that fuses the weak air-to-tra transition into a single word while splitting the strong tra-to-fic transition across a boundary gets the statistics exactly backwards; a listener tracking those statistics would place word boundaries where the transitional probabilities dip.
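If you want to see the arithmetic behind that intuition, here is a minimal sketch in Python. The function implements the standard definition (the probability that syllable Y follows syllable X is the number of times the pair XY occurs divided by the number of times X occurs); the toy corpus and its syllable divisions are invented for illustration, not drawn from any real speech data.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """TP(y | x) = count of the pair (x, y) / count of x."""
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(x, y): n / first_counts[x] for (x, y), n in pair_counts.items()}

# An invented toy corpus in which these syllables recombine, so that
# 'air' is not always followed by 'tra' but 'tra' is always followed by 'fic'.
corpus = ("air tra fic con trol air port con trol tower "
          "fresh air tra fic jam").split()

tps = transitional_probabilities(corpus)
print(tps[("tra", "fic")])  # 1.0  -- within-word transition, very reliable
print(tps[("air", "tra")])  # ~0.67 -- word-boundary transition, weaker
```

Even on a corpus this tiny, the within-word transition outscores the boundary transition, which is the gap a statistical learner can exploit.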
In one of the most famous sets of studies in developmental psychology, researchers Jenny Saffran, Dick Aslin, and Elissa Newport (all at the University of Rochester at the time) created a miniature artificial language consisting of four three-syllable nonsense words (e.g., lapitu, donegi) repeated over and over in random order, without any pauses separating them (e.g., lapitudonegi). They presented the miniature language to eight-month-olds and then tested them on what they’d learned. Testing an infant, you’d correctly think, is not without its difficulties. But infants are at least reliable creatures: they will stop paying attention to something as soon as it bores them.

When the researchers played both novel words (composed of the same syllables but in different orders, e.g., tupila, nedogi) and words from the miniature language, infants demonstrated that they could discriminate between the two types: they listened longer to the novel words, finding them significantly less boring. In a second experiment, the researchers tested a more difficult discrimination, between the syllable strings that made up the words of the miniature language and strings that spanned word boundaries (and so had actually occurred in the stream, e.g., pitudo, which straddles the end of lapitu and the beginning of donegi). Again, the infants discriminated. All of this is especially impressive given how much exposure they had to the artificial language. A day? An hour? Try two minutes.
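To get a feel for the statistical structure the infants were tracking, here is a small simulation sketch of such a stream. The words lapitu and donegi come from the example above; the other two words are placeholders I made up, and for simplicity each next word is drawn uniformly at random (repeats allowed), so a boundary transition runs at about 0.25 rather than matching the study’s exact design.

```python
import random
from collections import Counter

# Four three-syllable nonsense words. 'lapitu' and 'donegi' appear in the
# text above; 'bakose' and 'murafo' are hypothetical stand-ins.
WORDS = [("la", "pi", "tu"), ("do", "ne", "gi"),
         ("ba", "ko", "se"), ("mu", "ra", "fo")]

random.seed(0)
# A pauseless stream: 2,000 words in random order, flattened into syllables.
stream = [syl for _ in range(2000) for syl in random.choice(WORDS)]

pair_counts = Counter(zip(stream, stream[1:]))
first_counts = Counter(stream[:-1])

def tp(x, y):
    """Transitional probability that syllable y follows syllable x."""
    return pair_counts[(x, y)] / first_counts[x]

print(tp("la", "pi"))  # within lapitu: 1.0, 'la' is always followed by 'pi'
print(tp("pi", "tu"))  # within lapitu: 1.0
print(tp("tu", "do"))  # across a word boundary: ~0.25, any word may follow
```

The dips in transitional probability fall exactly at the word boundaries, which is the cue the infants appear to have exploited.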
All of this happens subconsciously, the infants’ brains carrying out statistical computations the likes of which none of us could attempt by hand. You do it, too, even when you’re focused on something else, like not letting the opportunist behind you cut the line. So if your wait is more than two minutes long (quite likely) and the woman on the phone happens to be speaking an artificial language consisting of four three-syllable words (a bit less likely), know this: your wait isn’t a total waste of time.