Like many creative breakthroughs, this one started with an annoyance. I had just upgraded my copy of Spell Catcher, the brilliant system-wide spell-checking program. Spell Catcher watches what you’re typing in every program and then either corrects or announces your spelling mistakes as you make them. Instead of teaching each program on your computer that your name is spelled correctly, you do it once, and Spell Catcher watches your back from then on.

Spell Catcher (which runs on Mac and Windows) can play a sound when you make spelling and grammar mistakes. It’s terrific for annoying co-workers, especially with the cruddy built-in sounds.
Spell Catcher’s auto-correction feature is also invaluable. Whenever I type “adr,” for example, the program dutifully types out my address. When I type “wb,” it spits out the complete code for a web page, gnarly doctypes and all.
The problem was that the alert sounds that came with the program were poorly recorded and grating. And because I’m a clumsy typist, I heard them a lot. Listen to the hiss, microphone thumps, and lackluster performance in these default Spell Catcher alerts, which I suspect were originally 8-bit files:
Misspelling (8KB)
Capitalization (12KB)
Punctuation (8KB)
Curious (8KB)
There had to be a better way. That’s when I thought of speech synthesis.
I’ve been a fan of speech synthesizers for years. Sometimes when I wanted to remember something I needed to do at the end of the day, I’d render an audio file and put it in my Shutdown Items folder, so it would play when I turned off the computer. I’ve also written scripts that command the Mac’s speech synthesizer to speak alerts:
This script speaks a more detailed version of the alert dialog, and then gives spoken feedback (“Cool!”) as well. I spelled “your” phonetically to make it sound clearer.
Here’s the speech synthesizer’s output, generated with the Macintosh Whisper voice:
Hello Dave (44KB)
Surfing over to AT&T’s online speech synthesis demo, I quickly created high-quality AIFF files to replace Spell Catcher’s gritty alerts. The demo offers a variety of voices. I settled on the James Earl Jones-esque “Rich” voice for several Spell Catcher alerts:
Rich: double-word error (12KB)
Rich: spelling error (12KB)
Despite using creative phonetic spelling, I couldn’t get Rich to say “punctuation” with enough conviction, so I tapped the “Mike” voice instead:
Rich: punctuation (12KB)
Mike: punctuation (12KB)
Then, with a nod to AOL, I decided to replace my e-mail alert. To make mail call more special, I picked a friendly female voice:
Lauren: You’ve got some mail (16KB)
For more urgent mail, I chose Rich:
Rich: Mail Alert (24KB)
The AT&T Labs online text-to-speech demo generates downloadable audio files. A related page offers slightly higher-quality versions and some different voices.
According to the AT&T site, the files it generates are for your own amusement, not commercial use. But there’s a ton of amusement potential once you start making the foreign voices speak English phrases. I don’t feel so bad about making capitalization errors when I’m reprimanded by a French maid:
Juliette: Capitalization error (16KB)
And feeding lines of gibberish to the vocal robots can produce some wonderful sounds. Here’s one I like to add to music mixes (at an almost subliminal volume) for an otherworldly effect:
Latina murmur (20KB)
I also liked this British fembot’s pronouncement, though not its timing:
UK Boop-Bop (48KB)
To make the delivery more rhythmic, I imported the “Boop-Bop” file into Ableton Live and adjusted the timing of individual syllables. While I was happily immersed in Live, I received an e-mail from a friend complaining that his fancy new digital recorder had generated hideous digital noise during a recent jam session. He included an excerpt.
Rising to the challenge, I imported the noise into Live and chopped it into a groove by applying rhythmic volume envelopes. Then I added a drum loop from the BT Breakz from the Nu Skool sampling CD, a drum fill I’d created in Propellerhead ReBirth, an Access Virus synthesizer loop, and—for good measure—a sampled politician yell. Here’s the happy result:
Boop-Bop Noise Jam (576KB)
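In Live, the volume envelopes that chopped the noise into a groove were drawn by hand. The same idea can be sketched as a rhythmic gate: multiply the signal by a gain curve that switches on and off on a sixteenth-note grid, with short fades so the cuts don't click. This is a minimal Python sketch, not how Live implements envelopes. It assumes mono float samples at 44.1 kHz, a 120 BPM tempo, and a hypothetical 16-step on/off pattern, and it rounds the step length to whole samples.

```python
import math
import random

SAMPLE_RATE = 44100  # assumed sample rate in Hz


def gate_envelope(num_samples, bpm, pattern, sample_rate=SAMPLE_RATE, ramp_ms=5.0):
    """Return one gain value (0.0 to 1.0) per sample for a sixteenth-note gate.

    pattern is a hypothetical list of 1s and 0s, one entry per sixteenth note,
    repeated for as long as the audio runs.
    """
    step_len = int(round(sample_rate * 60.0 / bpm / 4))  # samples per 16th note
    ramp = max(1, int(sample_rate * ramp_ms / 1000.0))   # fade length in samples
    env = []
    for n in range(num_samples):
        step = (n // step_len) % len(pattern)
        if not pattern[step]:
            env.append(0.0)  # this sixteenth is muted
            continue
        pos = n % step_len  # position inside the current step
        # Linear fade in at the start of the step and fade out at its end,
        # so the gate opens and closes without clicks.
        gain = min(1.0, pos / ramp, (step_len - 1 - pos) / ramp)
        env.append(max(0.0, gain))
    return env


def apply_gate(samples, bpm, pattern):
    """Multiply a mono signal by the rhythmic gate envelope."""
    env = gate_envelope(len(samples), bpm, pattern)
    return [s * g for s, g in zip(samples, env)]


if __name__ == "__main__":
    random.seed(0)
    # Two seconds of white noise stands in for the recorder's digital noise.
    noise = [random.uniform(-1.0, 1.0) for _ in range(SAMPLE_RATE * 2)]
    groove = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 1, 0, 1, 1, 0, 0]
    gated = apply_gate(noise, bpm=120, pattern=groove)
    print(len(gated))
```

Changing the pattern list changes the groove; a longer ramp gives a softer, more pumping feel.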
There’s something oddly compelling about hearing computers speak. In an interview for The Art of Digital Music, I asked Ableton designer Robert Henke (a.k.a. Monolake) if he had any signature sounds. He replied, “I have a few. I once made a sample of the Macintosh speech synthesizer’s whispering voice, completely deconstructed it using some granular [synthesis] techniques, and then made a loop. That’s an all-time favorite. I can always use it on stage. If I don’t know what to do next, I just turn up that loop, and then I can think for a while.”
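Henke doesn't say exactly how he deconstructed the whisper, so the following is only a generic illustration of granular resynthesis, not his technique. It cuts a recording into short, overlapping, windowed grains, scrambles their order, and overlap-adds them back into a new sound that could then be looped. The grain length, hop size, and seed are hypothetical choices, and the input is assumed to be a mono list of float samples. A periodic Hann window is used because, at a hop of exactly half the grain length, neighbouring windows sum to a constant gain.

```python
import math
import random


def hann(length):
    """Periodic Hann window; tapers each grain to zero at its start."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def granulate(samples, grain_len=2048, hop=1024, seed=None):
    """Chop samples into windowed grains, shuffle them, and overlap-add the result.

    grain_len and hop are in samples; hop = grain_len / 2 gives 50% overlap.
    Returns an empty list if the input is shorter than one grain.
    """
    window = hann(grain_len)
    starts = list(range(0, len(samples) - grain_len + 1, hop))
    if not starts:
        return []
    rng = random.Random(seed)
    rng.shuffle(starts)  # scramble the order in which grains are played back

    out = [0.0] * (len(starts) * hop + grain_len - hop)
    for i, src in enumerate(starts):
        dst = i * hop  # grains are laid back down at a steady hop
        for k in range(grain_len):
            out[dst + k] += samples[src + k] * window[k]
    return out
```

Smaller grains and heavier shuffling push the result further from the original voice, toward the kind of texture Henke describes.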
One of the highest compliments people pay to an instrumental performance is to compare it to the human voice. When we describe expressive instruments, we talk about screaming guitars, or singing sustain, or wailing saxophones. Try adding some synthetic vocals to your next musical production. You just may find they make it more human.
“Voices from the Machine”: An introduction to speech synthesizers from a musical perspective.
KAE Labs VocalWriter: An integrated singing synth and sequencer for Mac OS 9.
Yamaha Vocaloid: A state-of-the-art singing synthesizer. Check out the copious MP3 demos.
Copyright © 2009 O'Reilly Media, Inc.