The Art of Word Learning

By Jess Love | July 5, 2012

The art of learning words is not, it would seem, hard to master. Six-year-olds, for whom hand-eye coordination is still iffy and toy-sharing a contentious proposition, already possess a sophisticated vocabulary: around 9,000 to 14,000 words, according to Harvard University researcher Susan Carey. And yet many scholars, including the philosopher Willard Van Orman Quine, have noted that word learning should be extraordinarily difficult, since it involves mapping labels to events in the world, and real-world events are infinitely complex. If someone at a zoo points to a leopard and says “leopard,” is she referring to the entire creature? The spots? The slinky gait or the faux savannah backdrop? Or perhaps felines, or large animals, more generally?

But thankfully, we don’t just hear “leopard” once. We hear it again and again, when the leopard is close, or too far away for its spots to be visible. We hear “leopard” whether the creature is stalking or sleeping, while visiting the zoo or while watching an Attenborough special. We are likelier to hear “leopard” in the presence of leopards than in the presence of elephants, flamingos, or lions. That is, because every word (with very few exceptions) occurs in a slightly different set of contexts from every other word, we can use this entire set of occurrences to constrain our hypotheses about how to map word to world. Researchers call this “cross-situational learning,” and at least under optimal conditions we are pretty good at it.
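For readers who like to see the idea run, here is a minimal sketch of cross-situational learning as simple co-occurrence counting. It is a toy under stated assumptions, not the method of any study discussed here: the function name `learn_mappings`, the four invented situations, and the "pick the most frequent companion" rule are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each situation pairs the words heard with
# the things that happen to be present in the scene at the time.
SITUATIONS = [
    (["leopard"], ["leopard", "spots", "savannah backdrop"]),
    (["leopard"], ["leopard", "television"]),
    (["leopard", "elephant"], ["leopard", "elephant", "savannah backdrop"]),
    (["elephant"], ["elephant", "television"]),
]

def learn_mappings(situations):
    """Guess one referent per word by counting co-occurrences.

    Assumption: the best guess for a word is whatever was present
    most often when that word was heard. Every scene must contain
    at least one candidate referent.
    """
    counts = defaultdict(Counter)  # word -> referent -> times seen together
    for words, scene in situations:
        for word in words:
            counts[word].update(scene)
    # For each word, keep the referent with the highest count.
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

print(learn_mappings(SITUATIONS))
```

After a single exposure, “leopard” is equally compatible with the animal, its spots, and the backdrop. Only across the full set of situations does the animal pull ahead of the other candidates, which is the constraint described above.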

But sometimes what a set of situations has in common isn’t obvious at all. Verbs, which describe often abstract relationships between objects, pose an especially difficult problem. We give presents at Christmas, directions at gas stations, kisses at dances, birth in hospitals, and advice all the time, and none of these events superficially looks much like the others.

In 1999, University of Pennsylvania researchers Jane Gillette, Henry Gleitman, Lila Gleitman, and Anne Lederer (now Annie Duke of professional poker fame) presented undergraduates with short clips taken from videotaped interactions between parents and toddlers. For each of the 24 most frequent nouns and 24 most frequent verbs spoken by the parents during these interactions, researchers selected six representative instances. They created clips containing the target nouns or verbs, as well as the context in which the target word appeared (approximately 30 seconds before the word and 10 seconds after). These clips were then presented to participants silently, with a beep indicating exactly when in the clip the target word was uttered. After viewing a clip, participants had to guess the target word, revising their guesses after each subsequent clip until they had seen all six.

How did they do in this simulation of cross-situational learning? For nouns, reasonably well: the participants correctly identified 45 percent of the nouns after viewing all six clips. But participants correctly identified just 15 percent of verbs. A full third of the verbs—including quite common ones like “say,” “think,” and “make”—were never correctly identified by any of the study’s participants.

In subsequent studies, participants did show improved verb learning. When participants were shown an alphabetized list of the nouns spoken during each segment in addition to the clip itself (e.g., “me, tower” for a clip of a mother saying “Make me a tower”), successful guesses hit 29 percent. And when participants were presented with the complete sentences (with only the verb omitted, e.g., “VERB me a tower”), they made correct guesses 90 percent of the time. But this additional information was linguistic in nature; in other words, it was the very sort of information that the youngest, most naïve word learners presumably don’t have much of.

So how do word learners do it? Well, six is an arbitrary number. Perhaps 10 or 50 clips (or even six much longer clips) would do the trick. In addition, in real conversations, social information such as a partner’s eye gaze can help constrain where we look. But it remains the case that learning any word more abstract than “leopard” shouldn’t be nearly as painless as it is. And so people like me conjure up experiments in hopes of determining what additional capacities (be they “language instincts” or something less specific) allow us to succeed.

Permission required for reprinting, reproducing, or other uses.

Jess Love is the senior director for the Ryan Institute on Complexity at Northwestern University’s Kellogg School of Management.

published by phi beta kappa
