
Re: [Accessibility] Call to Arms

From: Eric S. Johansson
Subject: Re: [Accessibility] Call to Arms
Date: Mon, 26 Jul 2010 14:44:25 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20100713 Thunderbird/3.1.1

On 7/25/2010 10:52 PM, Richard Stallman wrote:

        I was speaking shorthand. It's not an add-on to
        NaturallySpeaking. It is an add-on to the communications framework
        between recognition application and user application.

    Something like that might be independent enough of the recognizer
    to be a valid project.  But there ARE free software packages for
    speech recognition.  So people should develop it to work with them.
    If users can also run it with NaturallySpeaking, that is ok,
    as long as we don't suggest it.

Sorry, I really have to correct this and correct it hard.

***There are no usable large-vocabulary continuous speech recognition engines out there today.*** From what I can tell, Simon is the closest, and it's pretty far off. Sphinx is a great tool for keeping grad students busy. To keep accuracy high enough, you need to keep your recognition vocabulary in the 1000-word range. I spoke with the Sphinx-4 developer about using it when I was part of the open-source speech recognition initiative, and he told us it's IVR only: don't even think about using it for dictation.

When we did a survey of all the available packages, the closest one we found was the MIT dugout package. But its creator admitted it was missing all of the language modeling, acoustic modeling, etc. that it needed, and it was still better than all the alternatives.

The first step for this whole process should be collecting a corpus for training and experimenting with different recognition parameters. You need to have one before you can ship a working recognition system. Hopefully you can get it done in a couple of years. Dragon Systems took something on the order of a year or two, with a heavy interview/recording schedule, for the baseline, and then kept gradually improving it.
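To make that concrete, here is a minimal sketch of how a training corpus is typically organized: each recorded utterance is paired with its exact transcript in a manifest that an acoustic-model trainer can consume. The file layout and manifest format here are hypothetical, invented for illustration; real toolkits each define their own.

```python
# Sketch of a training-corpus manifest builder (hypothetical format).
# The core idea: every audio file gets an utterance id and an exact
# transcript, so the trainer can align audio against text.

def build_manifest(prompts):
    """Pair each prompt with a synthetic utterance id and audio path."""
    manifest = []
    for i, text in enumerate(prompts):
        utt_id = f"utt{i:04d}"
        manifest.append((utt_id, f"wav/{utt_id}.wav", text.lower()))
    return manifest

prompts = ["Call to arms", "Free software speech recognition"]
for utt_id, path, text in build_manifest(prompts):
    print(f"{utt_id}\t{path}\t{text}")
```

The hard part, of course, is not the bookkeeping but recording enough speakers, accents, and microphones to make the acoustic model generalize.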

    However, I think we should not include such things in THIS project,
    because we need to focus energy on the goal of making those free
    recognizers better.  For us, replacing important proprietary software
    takes priority over advancing the capabilities of software.
Richard, you use the language of someone who has no clue about how difficult this problem is. I've been friends with speech recognition developers and living with speech recognition for 15 years. I have an idea of the problems they've encountered. I have no idea how to solve them, nor do I fully understand them, but I've learned enough to have a clue. I'm not saying you don't know anything about the problem, but the language and expectations you express frighten me, because expectations like the ones I'm seeing have been responsible for the failure of more than one speech recognition tools program, something much less complex than the recognizer / language model / acoustic model / audio processing / predictive search engines / correction systems / training... that all go into a full recognition system.

And we still haven't started talking about how badly screwed up the Linux audio sound system is. If you want good speech recognition, you may need to rewrite the entire audio system to make it work well for speech recognition. This really is a big chunk of work you are biting off.

Ask yourself this question: why was NaturallySpeaking the only large-vocabulary continuous speech recognition product on the market? (Hint: it's really f-n hard, and it's a small market.)

I want this to succeed, but it's got to have realistic expectations and, most importantly, serve the needs of the users, because unlike any other project you've ever been on, the users are the most important thing. Yes, I know this runs counter to the Free Software Foundation philosophy, but being injured and working with other injured people, I can't see myself looking at this project in any way but compassionately. Doing otherwise is just wrong according to my spiritual/ethical/moral/greedy self-interest foundations.

I really apologize for being blunt. You have been one of my heroes for a long time, but I am willing to kick even my heroes in the shins if I think they are going really wrong, and I think you are going really wrong. If it would help any, I could come down for lunch and talk about some of these issues the next time you're in Boston. If I remember the location correctly, we could probably ask the guy (gs) to the left of your office to join us and act as a moderator/referee :-). He knows me through ATMoB.

For the meantime, I'm going to have to drop this, but it is extremely important. I am willing to help out with requirements for the toolset and basic thinking about what the user needs to do, until we get to the point where I can write code using speech recognition. Are you okay with that?

--- eric

PS: not sure where to fit this in, but it's an example to think about. VR-mode is a bridge between Emacs and NaturallySpeaking. It gives full voice control and editing capabilities in Emacs, like NaturallySpeaking gives to proprietary programs. If it worked more consistently, I would be using Emacs instead of the proprietary programs that currently work better with speech recognition.

Here's another thing: if I had VR-mode working, I would be able to write a moderate amount of Python code with a bare-bones recognition system.
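As a rough illustration of what "bare-bones" could mean, here is a hypothetical sketch of a small command grammar: a fixed vocabulary of spoken phrases expanded into Python source text. A limited-vocabulary recognizer only has to distinguish these fixed phrases, which is exactly the ~1000-word regime where accuracy stays high. The phrases and templates below are invented for illustration, not taken from any real grammar.

```python
# Hypothetical command grammar: map a few fixed spoken phrases to
# Python code templates. A small-vocabulary recognizer only needs to
# match these phrases; expansion into source text happens here.

TEMPLATES = {
    "define function": "def {name}():\n    pass\n",
    "for loop": "for {name} in range(10):\n    pass\n",
    "print it": "print({name})\n",
}

def expand(phrase, name="value"):
    """Expand a recognized phrase into a Python snippet.

    Raises KeyError for phrases outside the grammar, which is how a
    restricted-vocabulary system rejects out-of-grammar input.
    """
    return TEMPLATES[phrase].format(name=name)

print(expand("define function", name="hello"), end="")
```

Even a toy grammar like this would cover a surprising amount of boilerplate; the real work is in correction and editing commands, which are far harder than initial dictation.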

This is the kind of incremental migration to a freer software environment that I'm hoping for. First you modify the applications; then, with a proper bridge design, you pull out the evil proprietary stuff and replace it with good free stuff. Right now, it's proprietary software all the way. I have no choice if I want to work or play. I really hate it, and I want to be working with free software that works well for disabled users.
