

Re: [Access-activists] Re: [Accessibility] Call to Arms

From: Eric S. Johansson
Subject: Re: [Access-activists] Re: [Accessibility] Call to Arms
Date: Wed, 28 Jul 2010 13:29:44 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20100713 Thunderbird/3.1.1

 On 7/28/2010 12:58 PM, Christian Hofstader wrote:

rms: However, you say that the free software speech recognition programs
are so far behind that you consider them unusable.  If that is the
case, then we simply cannot recommend ANY program that works with

cdh: Has anyone actually done an objective study of the FLOSS speech reco engines with an eye on comparing them to DNS, IBM ViaVoice/ETI Eloquence, the dictation built into MS Windows Vista and 7? We all seem to be working under an assumption that DNS is superior to all others but have we tried a real world comparison?

Command and control                 5% WER (word error rate)
Medium vocabulary                  15% WER
Large vocabulary                   30% WER
Large vocabulary, short utterances 50% WER

If you have noisy audio or an accent, multiply these numbers by 2.

That's pretty frightening. With NaturallySpeaking, I get one error in 10 words to one error in 20 words, which works out to a word error rate of 5% to 10%. Remember also that a large vocabulary for Sphinx is about 20,000 to 30,000 words, and at that vocabulary size it does not respond in real time. NaturallySpeaking has a 120,000-word vocabulary and is still not real time, but it is close enough.
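Turning "one error every N words" into a word error rate is just 1/N, which makes it easy to line the NaturallySpeaking experience up against the quoted figures. This is a minimal sketch that assumes the table above is read as Sphinx numbers and applies the rough "multiply by 2" rule for noisy audio or accents; the names `QUOTED_WER`, `wer_from_interval`, and `degraded` are illustrative only.

```python
# Figures quoted above (percent WER), read here as Sphinx numbers (assumption).
QUOTED_WER = {
    "command and control": 5.0,
    "medium vocabulary": 15.0,
    "large vocabulary": 30.0,
    "large vocabulary, short utterances": 50.0,
}


def wer_from_interval(words_per_error: float) -> float:
    """Percent WER when one error occurs every `words_per_error` words."""
    return 100.0 / words_per_error


def degraded(wer_percent: float, factor: float = 2.0) -> float:
    """Apply the rough noisy-audio/accent multiplier from the quoted rule of thumb."""
    return wer_percent * factor


if __name__ == "__main__":
    # NaturallySpeaking, as reported: one error in 10 to one error in 20 words.
    print(f"DNS: {wer_from_interval(20):.0f}% to {wer_from_interval(10):.0f}% WER")
    for task, wer in QUOTED_WER.items():
        print(f"{task}: {wer:.0f}% clean, {degraded(wer):.0f}% noisy/accented")
```

Even the best quoted case for clean audio (5%) only matches the good end of the NaturallySpeaking range, and large-vocabulary dictation is three to six times worse.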

Also, after we get going on our corpus collection project and train the FLOSS engines, we should do another compare-and-contrast of the currently existing engines. If that process shows us that the libre engines, after retraining, work reasonably well, we will have an acceptable alternative to DNS. Of course, if we don't know how well the different engines work relative to each other today, we have no baseline from which to measure improvement or decay.

You should build a test framework so that you can run it against NaturallySpeaking as well as against our tools.
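The core of such a test framework is a scorer that compares each engine's transcript of the same recording with the reference text. A minimal sketch, assuming plain-text transcripts and the standard word error rate, WER = (substitutions + deletions + insertions) / reference word count, computed with a word-level edit distance. The engine names and sample outputs below are hypothetical, not measured results.

```python
import re
from typing import Dict, List


def normalize(text: str) -> List[str]:
    """Lowercase and drop punctuation so formatting differences are not scored as errors."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = normalize(reference)
    hyp = normalize(hypothesis)
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def score_engines(reference: str, transcripts: Dict[str, str]) -> Dict[str, float]:
    """Score every engine's transcript against the same reference text."""
    return {engine: word_error_rate(reference, text) for engine, text in transcripts.items()}


if __name__ == "__main__":
    reference = "Ask not what your country can do for you."
    transcripts = {  # hypothetical engine outputs, for illustration only
        "engine_a": "ask not what your country can do for you",
        "engine_b": "ask not what you're country can do for",
    }
    for engine, wer in sorted(score_engines(reference, transcripts).items()):
        print(f"{engine}: {wer:.1%}")
```

Because NaturallySpeaking and the FLOSS engines would all be scored by the same function against the same recordings, the numbers stay comparable as the engines are retrained.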

cdh: Can someone volunteer to read some standard bit of modern English, perhaps a chapter from Harry Potter or some other relatively simple vocabulary set, into a bunch of different FLOSS and proprietary engines and publish the results for each of us to check out?

I think we should be using something from Project Gutenberg or something with a Creative Commons license. I use JFK's speech when training.

Have you selected a microphone and audio input device? If you're willing to take just anything, I have a VXI B100 microphone and its Siamese-twin USB audio pod.

cdh: Also, I've never tested this, but I've heard that DNS does poorly with command and control tasks because it prefers streams of speech rather than one or two words at a time. If this is true, it may be a really poor choice for programming by voice since, in this modality, single terms will be needed more often than continuous speech.

That's more of a rumor than reality. Lots of people have created macros with DNS itself, Unimacro, Vocola, and Dragonfly, and they perform quite nicely.

Also, don't think single words for command and control. We don't want to add to people's damage. Speak as your brain wants you to; this will be easiest on the mind and easiest on the voice. Speaking the keyboard (i.e., short or cryptic phrases) is counterproductive. I remember one guy who used ultra-short phrases like grunts and squeaks to speed up his recognition, and I've noticed he's no longer on the net or visible in the speech recognition world. There may not be a connection, but...
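To make "speak as your brain wants you to" concrete, here is a pure-Python sketch of a command table in which every entry is a whole, natural phrase, optionally with a number slot, mapped to a key sequence. It only illustrates the idea; it is not the actual API of DNS, Unimacro, Vocola, or Dragonfly, and the phrases, key names, and helper functions are all hypothetical.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical number words a recognizer might return instead of digits.
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
NUM = r"(?P<n>\d+|" + "|".join(NUMBER_WORDS) + r")"


def to_int(token: str) -> int:
    """Convert a spoken count ("3" or "three") to an integer."""
    return int(token) if token.isdigit() else NUMBER_WORDS[token]


# Hypothetical command table: full spoken phrases (optional words in (?:...)?)
# mapped to functions that return a list of key names.
COMMANDS: Dict[str, Callable[..., List[str]]] = {
    r"go to (?:the )?end of (?:the )?line": lambda: ["End"],
    r"go to (?:the )?start of (?:the )?line": lambda: ["Home"],
    rf"move down {NUM} lines?": lambda n: ["Down"] * to_int(n),
    rf"move up {NUM} lines?": lambda n: ["Up"] * to_int(n),
}


def interpret(utterance: str) -> Optional[List[str]]:
    """Return the key sequence for a recognized phrase, or None if nothing matches."""
    text = " ".join(utterance.lower().split())
    for pattern, action in COMMANDS.items():
        match = re.fullmatch(pattern, text)
        if match:
            return action(**match.groupdict())
    return None


if __name__ == "__main__":
    print(interpret("Go to the end of the line"))
    print(interpret("move down three lines"))
```

Matching the whole phrase, with small optional words like "the", lets a user say the command the way it comes out naturally instead of memorizing clipped grunts.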

