The scene is only too easy to imagine: two women are talking at a noisy cocktail party, one sharing a secret with the other, in a voice that just barely rises above the music and laughter. Then, at the exact moment she reveals her juiciest morsel, as if on cue, the crowd hushes. Her words ring clearly throughout the room, “ … And my pants split!” Heads turn. If this were a movie, the camera would then pan to our mortified lead.
With this much routinely at stake, one has to wonder why we attend cocktail parties at all. In general, though, we are quite good at monitoring our own speech—for volume, yes, as well as for much subtler qualities. Over the course of a day, we’ll constantly make slight adjustments to the way we speak until what we actually produce matches what we wanted to produce. Shira Katseff, now a postdoctoral researcher at the University of Canterbury, and her colleagues John Houde at UC San Francisco and Keith Johnson at UC Berkeley set out to explore just how such monitoring works.
The researchers asked participants to sit in a soundproof booth. Words flashed on a computer screen, and the participants read them aloud into a microphone. As each participant spoke, her own speech was relayed back to her through headphones. But there was, of course, a catch: unbeknownst to her, a computer had altered the incoming speech signal so that the vowel in the word replayed to her was ever so slightly different from the vowel she’d produced. Participants who had said “head” might have heard a replay that sounded a tiny bit closer to “had.” What the researchers wanted to know was what participants would do the next time around: Would they alter their speech in response to the feedback?
It turns out that the speakers did compensate—they shifted their own vowels in the opposite direction (pronouncing “head” slightly more like “hid”) such that, with the altered feedback, “head” actually sounded the way it should. Note that such compensation was not conscious. The feedback was extremely fast—the alteration took an undetectable 12 milliseconds (it takes roughly 20 times that long just to silently read a word)—and participants never suspected that what they heard was anything other than their own unaltered speech.
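To see why the compensation runs in the opposite direction, it can help to put rough numbers on it. The short Python sketch below uses invented first-formant (F1) values for the vowels in “hid,” “head,” and “had,” plus a made-up compensation fraction (none of these figures come from the study), simply to show how nudging production toward “hid” makes the altered playback land back on “head.”

```python
# Illustrative first-formant (F1) targets in Hz -- invented round numbers,
# not values from Katseff and colleagues' experiment.
F1_HID, F1_HEAD, F1_HAD = 450, 580, 700  # "hid" < "head" < "had" in F1

perturbation = 50.0  # Hz covertly added by the computer, pushing feedback toward "had"

def perceived(produced_f1, shift=perturbation):
    """What the participant hears: her own vowel plus the covert shift."""
    return produced_f1 + shift

# Early trial: she aims for "head" and produces it faithfully.
produced = float(F1_HEAD)
print(perceived(produced))        # 630 Hz -- sounds a bit like "had"

# Later trials: she compensates in the OPPOSITE direction (toward "hid"),
# here by a hypothetical 80% of the perturbation.
compensation_fraction = 0.8
produced = F1_HEAD - compensation_fraction * perturbation
print(produced)                   # 540 Hz -- slightly "hid"-like in her mouth
print(perceived(produced))        # 590 Hz -- close to the 580 Hz "head" target
```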
But the researchers were expecting some compensation. What’s more surprising is that when Katseff and her colleagues systematically varied the amount by which the vowels were adjusted, they found that larger adjustments led to less compensation. This suggests that when the match is too “off”—when auditory feedback is too unreliable—we’ll just stop listening to it as much.
Instead, we’ll rely more on another aspect of speaking that we monitor: our muscle memory, or what we expect our tongue and vocal tract to do in order to produce a given sound (researchers call this somatosensory feedback). And it goes both ways. Other studies suggest that under local anesthesia of the mouth and tongue (which makes our somatosensory feedback less reliable), we’ll increase our reliance on auditory feedback. We therefore not only monitor the way our speech sounds and feels—we also monitor what to monitor.
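One simple way to picture that trade-off is as a reliability-weighted blend of the two error signals: the less trustworthy a channel currently seems, the less say it gets. The toy model below is only an illustration of that idea, not the analysis Katseff, Houde, and Johnson actually ran; the weighting function, the 60 Hz “tolerance” value, and the perturbation sizes are all invented. Still, it reproduces the qualitative pattern: as the perturbation grows, the weight on auditory feedback shrinks, and so does the predicted compensation.

```python
# Toy model of "monitoring what to monitor": weight each feedback channel
# by how reliable it currently seems, then compensate for the weighted error.
# All numbers and functional forms here are invented for illustration.

def auditory_weight(perturbation_hz, tolerance_hz=60.0):
    """Hypothetical reliability of auditory feedback: trusted for small
    mismatches, discounted as the mismatch grows past `tolerance_hz`."""
    return 1.0 / (1.0 + (perturbation_hz / tolerance_hz) ** 2)

def predicted_compensation(perturbation_hz):
    """Compensation (in Hz) if the speaker corrects only the portion of the
    error carried by the still-trusted auditory channel."""
    w_aud = auditory_weight(perturbation_hz)
    # Somatosensory feedback reports no error (the tongue did what was asked),
    # so the weighted error is just the auditory share of the perturbation.
    return w_aud * perturbation_hz

for shift in (10, 25, 50, 100, 200):
    comp = predicted_compensation(shift)
    print(f"perturbation {shift:>3} Hz -> compensate {comp:5.1f} Hz "
          f"({100 * comp / shift:4.1f}% of the shift)")
```

In this sketch, a 10 Hz nudge is almost fully corrected, while a 200 Hz one is largely ignored, which is the qualitative shape of the finding: the further the feedback strays from what the mouth reports, the less the ear is believed.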