The Null Device
Posts matching tags 'speech synthesis'
Another online speech synthesizer demo; this one (ScanSoft's rVoice), however, has multiple accents, including British (i.e., RP), Scottish, Australian (only in sheila, though, and not bloke), Spanish, and not only American but also Valley Girl (more formally known as "Southern California").
Which is rather nifty; it's good to be able to get synthesized speech that doesn't sound either generic-American or (occasionally) RP-British (which some call the BBC accent, except for the fact that nobody on the BBC talks like that these days).
Apparently one of their markets is call centres and voice-response systems (and some of the voices have normal and call-centre modes of diction). Which could explain the presence of a Scottish accent; apparently, studies in Britain found that Scottish accents are considered the most soothing/least aggravating to call centre callers.
The NYTimes has a piece on Vocaloid, the new singing voice-synthesis program that could automate the last part of music performance still done by humans. Vocaloid is interesting because voices are stored as interchangeable "fonts" of vast numbers of samples and articulation data. The first fonts coming out (from British samplemongers Zero-G) are a pair of soul-singer voices, Leon and Lola:
In the case of Leon and Lola, session singers were hired to record what Mr. Stratton calls "generic soul-singing voices." The decision to start with soul was purely a marketing calculation: Mr. Stratton figured that the most common use of Vocaloid, at least in its early stages, would be to serve as background singers. With a soulful sound, the company could target a commercial market that ranges from Justin Timberlake to Jay-Z.
(Bugger soul singers, I say, just give me Liz Fraser. Or Ian Curtis. A generic French-accented female voice could also be useful for all the post-Stereolab acts.)
The process, of course, could be exploited for mischief, as described below. Though doing so would require a vast amount of raw data, work and expertise to prepare the voice font, something beyond the reach of casual pranksters.
What's to stop dilettantes from creating their own fonts? Could it be long before falsified but entirely convincing clips of Britney Spears begging for Justin's forgiveness circulate on the Web, to say nothing of George Bush conspiring with Tony Blair about weapons of mass destruction?
The major market will be celebrity voices, undoubtedly priced beyond the reach of mere mortals, and giving Fortune 500 corporations that touch of class that comes with having Frank Sinatra sing the company song:
Licensing Elvis for Vocaloid would be a different matter, though, says Gary Hovey, vice-president of entertainment for Elvis Presley Enterprises. "If someone came to us and said, `We want Elvis to sing this new song,' we'd have a lot to contemplate," he said. "We tried to retain the integrity of his original song with the remixes. Now you're talking about a whole new vocal performance of a song he never sang or knew? How do we know he'd want to sing it?" "Believe me, that would go all the way to Lisa," he added, referring to Elvis's daughter, Lisa Marie Presley, who owns Elvis's estate.
Once a full palette of vocal fonts is available (or once Yamaha allows users to create their own), the possibilities become mind-boggling: a chorus of Billie Holiday, Louis Armstrong and Frank Sinatra; Marilyn Manson singing show tunes and Barbra Streisand covering Iron Maiden. And how long before a band takes the stage with no human at the mike, but boasting an amazing voice, regardless?
The article then points out that, with this in place, the entire process of song production could be automated. Lyrics could be pieced together from a database of stock phrases or using a narrative engine (though, then again, given how songs can succeed without the lyrics making sense (look at any 90s Eurodance hit), that may not be necessary); instruments can be synthesised (this includes guitars; I have in my collection a program named Virtual Guitarist which does just that, passably if inflexibly in places, though certainly well enough for pop songs), and the mixing can be automated. Finally, the hit quality of the finished product can be mathematically assessed using the Hit Song Science algorithm, and a genetic algorithm used to evolve the catchiest song. All stages of the process (from instrumentation/lyrical content to final scoring) could be tweaked using market research ("Electroclash is out, booty bass is coming back ironically, chip tunes are the dog's bollocks, and 90s grunge retro is due any day now"). And then we may all end up living in a Greg Egan story.
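For the curious, the "evolve the catchiest song" step can be sketched in a few lines. This is only a toy under loud assumptions: the `score` function below is a made-up stand-in (distance from an arbitrary "hook" of my own invention), not the Hit Song Science algorithm, whose internals aren't described in the article, and the names `TARGET_HOOK`, `mutate`, `crossover` and `evolve` are all hypothetical.

```python
import random

# Made-up "catchy" contour in semitones above the root. Purely an assumption
# for illustration; a real system would plug in an actual hit-prediction score.
TARGET_HOOK = [0, 4, 7, 4, 0, 4, 7, 12]


def score(melody):
    """Toy fitness: 0 is best, more negative means further from the hook."""
    return -sum(abs(a - b) for a, b in zip(melody, TARGET_HOOK))


def mutate(melody, rng, rate=0.2):
    """Randomly replace each note with probability `rate`."""
    return [rng.randint(0, 12) if rng.random() < rate else note for note in melody]


def crossover(a, b, rng):
    """Splice the start of one melody onto the end of another."""
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def evolve(generations=200, pop_size=30, seed=0):
    """Return the best-scoring melody found after the given number of generations."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 12) for _ in TARGET_HOOK] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # the best half survives unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    return max(population, key=score)


if __name__ == "__main__":
    best = evolve()
    print(best, score(best))
```

Swap the toy `score` for market-research data and you have the tweakable pipeline described above, at least in caricature.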
If you can read this, then we're back. A routine machine relocation didn't go quite to plan, but it's all fixed now (hopefully).
And below is the backlog of blog items that didn't get posted to The Null Device over the past few days:
- Your tax dollars at work: A US spy agency has been monitoring webcams at an Islay distillery, just in case they were making chemical weapons instead of whisky. Defense Threat Reduction Agency officials stressed that monitoring Scottish distilleries was not a high priority, but stated that it would take just a "tweak" to modify the whisky-making process to produce chemical weapons. (Hmmm; that suggests some interesting near-future scenarios for potential flashpoints between the United States of America and Britain and a rogue People's Republic of Scotland.)
- An interesting paper on the design of the Google File System, a custom file system optimised for storing huge (multi-gigabyte) files on large farms of fault-prone hardware. (via bOING bOING)
- The latest fad in baby naming in the U.S. involves naming your children after your favourite brands of consumer goods. Looks like Max Barry wasn't all that far off: (via Techdirt)
"His daddy insisted on it because Timberlands were the pride of his wardrobe. The alternative was Reebok," said the 32-year-old nurse, who is now divorced. "I wanted Kevin."
This is only the latest chapter in the boom of giving children unique names.
According to the most recent census, at least 10,000 different names are now in use, two-thirds of which were largely unknown before World War II.
- "We're Gonna Get You After School!" Gibson's Law applies to playground mob psychology, with kids setting up websites and blogs to call their classmates names. This way, technology may be said to have democratised bullying, as it's no longer the musclebound alpha-jocks and the popular rich girls who have a monopoly on making others' lives miserable. (via Techdirt)
One 12-year-old blogger, writing on the popular Angelfire Web site, recently announced she would devote her page to "anyone and everyone i hate and why." She minced no words. "erin used to be aka miss perfect. too bad now u r a train face. hahaha. god did that to u since u r such a b -- . ashley stop acting like a slut wannabe. lauren u fat b -- can't even go out at night w/ ur friends. . . . and laurinda u suck u god damn flat, weird voice, skinny as a stick b -- ."
The author of the article calls for the use of "parental control devices" to stamp out "social cruelty", much in the way that filters have been used to stop pornography. Which sounds more like it would strip those kids put upon by the alpha-jocks/princesses of their online support networks of fellow outsiders.
- More on the internet's impact on human interaction: Internet chat addiction can stunt social skills in introverted adolescents, says a researcher in "social administration". Dr. Mubarak Rahamathulla says that research suggests that chat rooms have contributed to some teenagers fearing conventional social interaction, and becoming more dependent on anonymity or pseudonymity. However, he says, webcams may be a safe, healthy way for them to explore their sexuality. Perhaps the future belongs to asocial chatroom onanists, who are into anything as long as it doesn't involve actual human contact?
- The AT&T text-to-speech demo site now has two British voices; the male one sounds somewhat deranged, as if having at some time in the past eaten some BSE-contaminated beef. (via kineticfactory)
- A company is now selling licensed arcade ROMs for MAME. StarROMs currently have a few dozen titles, all from Atari, but plan to have more; games cost between US$2 and US$6 per title, and all are unencrypted ROM images suitable for MAME, with no DRM chicanery to be seen. Let's hope this idea catches on.
- Transcosmopolitan, or Spider Jerusalem's stint as features writer for a women's lifestyle magazine. (via Warren Ellis' LiveJournal comments)
First there were pocket-sized USB flash disks, then USB flash disks with built-in MP3 players (for those whose music collections fit in 128MB), and now, if an ad on the front page of the Computer Trader (a cheaply printed monthly paper of classifieds and price lists) is to be believed, there are USB flash disks with text-to-speech. It doesn't say exactly how it works, but I presume that you copy text files to it and it reads them to you while you drive/jog/catch the bus. Which could be useful, depending on other things (i.e., how listenable the voice used is, how easy it is to navigate through texts, and what file formats it can read (plain text? MS Word? Unicode?)).
Yamaha have developed a program for synthesizing sung vocals. Named Vocaloid, the program uses libraries of vocal fragments and articulation algorithms to synthesise realistic singing. It currently comes with a "Soul Vocalist" data set, for all your throaty dance vocal needs. Windows-only, I'm afraid, and no word of VST compatibility; there's a screenshot here. (via Found)
A company has developed speech synthesis with user-selectable accents, including an Australian accent and a Scottish brogue. Wonder on which platforms this technology will be available.