The Music in Speech


This episode was written & produced by Katy Daily.

The way you speak has rhythm, timbre, and pitch. It’s more like music than you might think. We chat with The Allusionist host Helen Zaltzman, Martin Zaltz Austwick from Song by Song, Music Psychologist Dr. Ani Patel of Tufts University, and Drum Composer David Dockery on how musical our speech really is.


Weightless (instrumental) - Prague
Home - Blake Ewing
Le 15 Decembre - Brique a Braq
Everything is Moving, but not the Sky - Dario Lupo
Lights Out - Utah
No Sun - Steven Gutheinz
Years - Steven Gutheinz

20K is made out of the studios of Defacto Sound and hosted by Dallas Taylor.

Follow the show on Twitter & Facebook. Our website is 20k.org.

Consider supporting the show at donate.20k.org.

Get $50 off select Casper mattresses at casper.com/20k.


You’re listening to Twenty Thousand Hertz. I’m Dallas Taylor.

The way we talk is really interesting… and lately, it’s something I’ve been thinking about a lot. See, apparently, when you start a podcast, you suddenly find out about every weird little quirk in your voice. Anyway, what makes someone’s voice interesting to listen to?

Well, recently, I came across a video on YouTube that completely changed the way I think about speech. Basically, it’s a drummer and a bassist playing along to one of the most famous scenes in Willy Wonka.

[David Dockery Drumming clip]

After hearing this, my mind tuned in on just how musical speech really is. Our voice isn’t an instrument only when we’re singing; it’s an instrument all the time.

[music in]

Everyday speech has a rhythm, a timbre, and tonality. ...and without even thinking about it, your speech patterns are communicating a lot of underlying meaning.

I talked with a few other podcasting friends about this. I wondered whether they think about these things when they’re tracking. Here’s Helen Zaltzman from “The Allusionist”.

[music out]

Helen: I have to think about it consciously, I consider that a big part of my job, because I want to convey some emotion, and some mood, and some tone, all in a couple of sentences.

...and this is Martin Zaltz Austwick from “Song by Song”.

Martin: I think everyone has to think about it. So I think it's happening intuitively, rather than in a more, you're thinking consciously about pitch and tone, and rhythm in the way that you would a musical composition.

Just like different instruments, every voice is unique. Helen and Martin told me what other podcast hosts they love listening to.

Helen: I love hearing Phoebe Judge's voice. That, to me, is like hearing a really low woodwind instrument or something.

[Phoebe Judge clip with woodwind underneath]

She sounds like her thought processes are very clear, and she enunciates things in such a way…

Martin: She's not working stuff out in public.

Helen: Generally not, no. And maybe again, it's that kind of idea that a low voice and a slow voice is confident, and therefore something you can trust, and you should listen to.

[music in]

By that analogy, what would Helen’s voice be if it was an instrument?

Helen: Synthesizer.

Here’s a clip from Helen’s show.

[Helen Zaltzman clip]

How about the host of 99% Invisible, Roman Mars?

Martin: He’s like a kind of John Carpenter, Moog synth.

[clip from 99% Invisible with Moog synth underneath]

Ira Glass, host of This American Life?

Martin: Viola? I think a stringed instrument.

Helen: Yeah I think stringed.

[clip from This American Life with viola underneath]

What about host of The Memory Palace, Nate DiMeo?

Helen: I think Nate DiMeo might be a violin… Like a slow, mournful violin.

[clip from The Memory Palace with violin underneath. violin ends with verb out, brief pause to end segment, start narration]

Any time we speak, we’re singing. We unconsciously vary our rhythm and tonality to create our own unique songs. And with enough practice – you can tune this performance to have more meaning.

[music in]

Dr. Ani Patel: Music and speech are two primary forms of communicating with each other in kind of rich and nuanced and complex ways.

That’s Dr. Ani Patel, a music psychologist at Tufts University. He’s been studying how music and human speech overlap in our brains.

Dr. Ani Patel: One of the interesting things is that they sound very different. No one would ever confuse the sound of a cello playing a solo with the sound of a person talking. [Cello morphing into talking voices SFX.] Yet, what an increasing amount of research is showing is that within our brains and our minds, there's more overlap than you might think in how we process those two types of signals; whether it's the rhythm, or the melody, or the structure.

Take for example this next clip. It’s a famous speech that’s been turned into musical data and played by a piano. [music out] See if you can guess the speech.

[play clip: Kennedy piano example]

Here it is again, with a hint.

[play clip: Kennedy piano + original speech example]

...and here’s that same clip played through a digital whistle...and to be clear, the original speech file is not playing along with this. This is all data.

[play clip: Kennedy whistle example]

[music in]

Dr. Ani Patel: That's the power of our internal models. When we have an expectation of what we're hearing and a pattern somewhat resembles that expectation, you can then perceive that thing.

When we speak with each other, we're using a very complicated sound that has many frequencies. Even a single vowel has a whole bunch of different frequencies in it. They have certain patterns where certain frequencies are emphasized more than others.

What that piano piece was doing was it's essentially re-creating that sort of palette of frequencies and energies through piano sounds. It can't capture the way exactly a human voice does, because a piano works in a very different way. It's using the pitches and frequencies that a piano can produce to try and recreate this energy shape of a speech sound.

You program like a player piano to go through these frequency shapes in a really rapid succession in the way that a voice does. Especially if you know what words to listen for, it's amazing how you can pick them out of this sound that sounds nothing like a human voice.

[music out]

It’s still not fully understood why our brains blur the line between music and speech, but there are lots of ways to trick our minds. Take, for instance, the speech-to-song illusion. Psychologist Diana Deutsch found that certain phrases, when taken out of a passage and played in a loop, begin to sound like they’re being sung.

Dr. Ani Patel: It's such a powerful illusion that if you began to hear a phrase as sung, and then you go back and listen to the passage from which that phrase was excerpted, the rest of it will sound like speech. When you come to that phrase, it will just sound like that one phrase is sung [repeated phrase to create a musical pattern]. When you come to that phrase, it will just sound like that one phrase is sung and then it goes back to speech again. It's really wild.

[music in]

Dr. Ani Patel: What makes it interesting is that this doesn't happen for any phrase. You can also find phrases that if you take them out of context and loop them, they don't sound sung at all.

So something about certain sequences of words leads them to transform in this way.

How we choose to sing our words is powerful. It adds a whole new level of human connection. The majority of the time, this is a totally subconscious act, but there are some professions where it can’t be… it has to be thought about and practiced. We’ll hear more about that, in a moment.

[music out]


[music in]

Recording your voice for a podcast, a radio show, or really anything very much feels like a performance. There’s this unconscious grey area between speech and music. I asked Helen and Martin how their hosting voices are different from their everyday voices.

Helen: I think conversational voice is higher, and you are also inviting a particular response from the other person.

Martin: Reading off a script is a completely different and hard skill from extemporizing, which is also a difficult and hard skill. It’s like, to bring a script to life is hard. I can’t read, I can’t sight read, for example. I just have to say something, like five times, until it’s like, “Okay, that’s sort of what the words should be.”

Helen: I think it’s common, because often you’re trying to find these cadences, and sometimes it’s almost like scoring your spoken script. And sometimes there’s an unexpected cadence that you don’t work out until you’ve been through it a few times.

[music out]

Of course, podcast hosts aren’t the only ones who need to think about this stuff. Politicians, actors, and especially comedians have to master rhythm. Take for example this clip from King of the Hill.

[King of the Hill clip]

David: My name is David Dockery and I compose drum scores to famous TV and movie scenes on YouTube where I synchronize the drum beats with the actor's words.

I started it more or less just to kind of push myself to work on my timing, work on my musical phrasing and approach. I just saw it as a challenge more than anything else because it was bound to be timing-wise really complex because there's no meter to it. There's no pulse. So, I said if I could do that, it'll surely be good for my tempo and timing.

But what he discovered is that there was tempo and timing in a lot of these scenes. It’s just not as obvious. Once this rhythm was uncovered, it really highlighted the talent of these actors.

David: I was thinking of kind of iconic scenes that I’d seen.

[play clip: Willy Wonka scene with drumming]

When people get more emotional, the rhythm becomes so much more pronounced. You can actually measure it right here, because that is the point in the scene where I, as a drummer, start having more fun, I suppose.

[play clip: Willy Wonka scene with drumming]

I have so much more to work with in terms of Gene Wilder's delivery, those lines.

It's just perfectly in rhythm, I didn't have to do much with that at all. I just played a drum beat along with it.

[play clip: Willy Wonka scene with drumming]

I think the reason that those scenes work, where people are kind of at their most heated is just because they raise their voices and that always means it suits drums more because they are just by nature a really loud, obnoxious instrument. When people get really heated up about stuff I think they tend to employ more rhythm in their voice.

[music in]

Dr. Ani Patel: Excited speech is faster. It's more variable in its timing ... and dynamics.

Again, that’s Dr. Ani Patel.

Dr. Ani Patel: Sad speech is quiet. It's slow. It's got less pitch variation.

When we talk about rhythm in speech, one thing that's important to realize is that we're not talking about a steady beat you could tap your foot to. That's kind of obvious in some sense, right? We don't dance to ordinary speech.

But linguists will tell you that speech has rhythm. The way that the syllables are patterned in time, the way accents are put on words, the way phrases are created, all have a characteristic pattern in a given language. That's rhythm.

Same with melody. Where we put the pitch accent, how many words tend to get emphasized using pitch. When you put those two together, you end up with a very characteristic sound.

[music out]

The starkest contrast is between a happy voice and a sad voice.

[Parks and rec clip]

You can hear it in their voice. It's fast. It's a lot of pitch variability. The voice has a bright kind of timbre to it.

[Parks and rec clip]

What are you hearing that lets you read that emotion? Well, their voice is quieter. It's slower. There's much less pitch variation. It has a darker kind of timbre to it.

[music in]

The music of our speech works in tandem with our words. Together, they raise communication to another level, and arguably a more natural level. But, lately, our world has been moving away from this. We now tend to text, email, or comment on social media largely without our voices...and sometimes it actually feels kinda weird to just call someone. What are we losing when we don’t communicate with our voice?

Helen: What is the best way to convey how we speak, but in a readable form? I really don't know. You almost need the words, and then like a little graph of emotion to overlay it, so with the intonation.

Martin: I just think there's so much in the English language, where you can completely change the meaning of a sentence, just by the way that you say it.

Dr. Ani Patel: We have to remember that humans, over many hundreds of thousands of years of evolution have become extremely attuned to the sounds of each others' voices. And pulling out nuances, and reading these kinds of signals that we give each other through our voices. And when we communicate through texts or through email, we're just not using that. And so, cutting off that rich part of how we read each other's emotions, feelings, intentions, thoughts, moods, and so on.

I think part of that is this emotional connection that happens when you hear a voice, as opposed to just reading a silent message.


Twenty Thousand Hertz is produced out of the studios of Defacto Sound, a sound design and mix team that supports ad agencies, filmmakers, television networks and video game publishers. If you work in these fields, be sure to drop us a note at hi@defactosound.com.

This episode was written and produced by Katy Daily...and me, Dallas Taylor. With help from Sam Schneble. It was sound designed, edited, and mixed by Colin DeVarney.

Thanks to Helen Zaltzman from The Allusionist, and Martin Zaltz Austwick from Song by Song. You should immediately go subscribe to both of those podcasts! Martin also makes music under the name Pale Bird. Check out his music on Bandcamp or at martinzaltzaustwick.com. Also thanks to Dr. Ani Patel of Tufts University and David Dockery. You can find more videos of David drumming to film and TV scenes by searching “David Dockery” on YouTube. The clip of the drumming and bass guitar you heard at the top of the episode was from Fabiano Mexicano’s YouTube channel.

The music in this episode is from our friends at Musicbed. They represent more than 650 great artists, ranging from indie rock and hip-hop, to classical and electronic. Head over to music.20k.org to hear our exclusive playlist.

You can find us all over the internet by searching Twenty Thousand Hertz. That’s Twenty Thousand Hertz all spelled out.

We’d also love to hear from you! Especially your actual voice! If you want to tell us something, record it as a voice memo and email it to hi@20k.org.

Thanks for listening.

[music out]
