Friday, December 26, 2008

Text to Speech Just Got a Lot Easier

A new program called Textcast has just made it a lot easier for Mac users to produce audible speech from text files.

If you are a geek like me, your reaction to this is probably, "Dude, we've been able to do this for years."

If you're not a geek, then your response is more like, "Dude, who cares?"

Valid points, both. Let me first suggest why a non-geek would care about this technology. In a nutshell, text-to-speech software is part of my anti-boredom survival kit. Although I am endlessly capable of entertaining myself, my capacity for self-amusement is pretty limited when I'm driving.

I spend at least an hour per day in my car, commuting or running errands. This interstitial part of my life can be pretty darned boring. To keep from going nuts while driving, I listen voraciously to all sorts of audio files, including NPR broadcasts, podcasts and audio books.

However, for all the stuff that I can listen to, there are a zillion other things I'd love to hear that just aren't available in audio format. For example, I'd love to be able to listen to newspaper stories and radiology journal articles in my car. This is where text-to-speech programs come in.

Text-to-speech software has come a long way. In the old days (5 years ago), most of these programs sounded about as lifelike as Gorgo the Space Robot. Fortunately, things are getting better. Nowadays this software is not only inexpensive but also produces fairly decent speech [1].

For the past few years I've been using several "voices" by Cepstral, which cost me a whopping $30 apiece for the premium editions. Depending on your tastes, they will sell you male or female voices speaking with not only American but also British, German, Scottish, Italian, French and Spanish accents.

Besides the vocal quality, another thing I like about Cepstral voices is that I can control them via the command line. If I want my computer to chat about its urinary habits in a British accent, I just type:
swift -n Lawrence -o ipee3.wav -p speech/rate=150,speech/pitch/shift=0.9 "I pee three times a day"

With several similar commands, I can automate the following process: 
  1. grab a page of text from my web browser and save a copy in a folder on my computer
  2. convert a whole folder of these text files into speech
  3. shove these speech files into iTunes
  4. move all of these speech files to my iPhone
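
For the geeks, here is a minimal sketch of step 2 as a pair of shell functions. It assumes the Mac's built-in `say` command and its "Alex" voice rather than Cepstral; `tts_file` and `convert_folder` are hypothetical names, and steps 3 and 4 (iTunes and the iPhone) are left out.

```shell
# Sketch only: assumes macOS's built-in `say` command and the "Alex" voice.
# tts_file and convert_folder are hypothetical helper names.

# Speak one text file ($1) into one audio file ($2); with a .aiff name,
# `say -o` writes an AIFF file.
tts_file() {
  say -v Alex -o "$2" -f "$1"
}

# Convert every .txt file in folder $1, writing NAME.aiff next to NAME.txt.
convert_folder() {
  for txt in "$1"/*.txt; do
    [ -e "$txt" ] || continue        # the glob matched nothing
    out="${txt%.txt}.aiff"
    [ -e "$out" ] && continue        # already converted on an earlier run
    tts_file "$txt" "$out" || echo "failed: $txt" >&2
  done
}
```

Usage: `convert_folder ~/Desktop/ToRead` (the folder name is just an example). To use a Cepstral voice instead, you could redefine `tts_file` around the `swift` command shown above and change the `.aiff` extension to `.wav`.
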
If you are a non-geek, this process may leave you a bit cold. In fact, you'd probably rather take out your own spleen through your nose than write a shell script. For you, there is salvation, courtesy of BitMaki Software: Textcast will do all of this heavy lifting for you for $25.

I haven't yet figured out how to get Textcast to use any of my Cepstral voices, but it will use any of the built-in voices on my Mac, including a rather nice voice called "Alex". To my ear, Alex sounds at least as good as any of my Cepstral voices.

Some Clean-up May Be Necessary...

1. Convert the print version of a webpage

Textcast generally does a fine job of converting webpages to speech. However, it will convert every frakking bit of text on the page into speech. This includes headers, navigation bars, URLs, and other cruft. One easy way to clear a lot of this stuff off the page is to select a "print" version of that page, if available. For example, just about every article in the online version of the New York Times has a link to display that article with the headers, footers and nav bars stripped off.

2. Connecticut and Maryland must die!!

It's the odd radiology journal article that doesn't have the abbreviations "CT", "MD" and "MR" sprayed liberally throughout. Unless you want to hear "Connecticut", "Maryland" and "mister" in a lot of unexpected places, replace these abbreviations with "computed tomography", "em dee" and "magnetic resonance", respectively, before converting your articles to speech.

3. Do some quick hand-editing before conversion

Before you point Textcast or some other program at a text file, consider doing a bit more judicious hand-editing. It's usually easy to spot large piles of cruft and delete them. Unfortunately, this can quickly become waaaaaaay tedious. If you find certain recurrent patterns of cruft (e.g., tables of contents) scattered throughout your text, a quick global search and replace may be just the thing. For more complex patterns, geeks will want to trot out their grep tool on the command line to properly flense their files.
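
Here's a minimal sketch of that kind of grep pass, wrapped in a hypothetical shell function called `strip_cruft`. The two patterns (lines containing a URL, and lines holding nothing but a page number) are only examples; you'd tune them to whatever cruft your particular source sprays around.

```shell
# Sketch only: strip_cruft is a hypothetical helper and its patterns are
# examples. It reads stdin and drops (-v) any line that contains a URL
# or consists of nothing but a number, such as a stray page number.
strip_cruft() {
  grep -v -E \
    -e 'https?://|www\.' \
    -e '^[[:space:]]*[0-9]+[[:space:]]*$'
}
```

Usage: `strip_cruft < raw.txt > clean.txt`.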

4. Use a medical phonetic dictionary

Medical and other technical journals contain a boatload of jargon. Thus, for your average AJR article, Textcast or Cepstral will mispronounce these words as badly as a TV doctor. Cepstral can help prevent this by letting you create a custom phonetic dictionary of your own particular jargon. Alex can likewise be taught certain words using Apple's VoiceOver utility.

5. Use a voice with a foreign accent

Here's one final tactic to make your text-to-speech sound less like a robot: use a voice with a foreign accent. That way, when a word is oddly pronounced, the listener's brain will chalk it up to the accent rather than to a speech defect. Don't tell your friends that you used a computer to produce these files; they'll just assume that the speaker is Texan...

[1] I would even call it lifelike, albeit like a live person with a mild speech defect.


Tor Arne said...

Very cool! :)

Also, don't think we didn't notice the use of the word "frakking", no, Sir.

Ron Starc said...

I think Text Speaker is one of the best text-to-speech programs. You can use it for reading all your documents, emails, ebooks, and more. It also has a good selection of the most natural-sounding voices.