
Re: [Accessibility] Can you help write a free version of HTK?


From: Eric S. Johansson
Subject: Re: [Accessibility] Can you help write a free version of HTK?
Date: Mon, 12 Jul 2010 13:03:59 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100608 Thunderbird/3.1

On 7/12/2010 4:24 AM, Bill Cox wrote:
> Hi, Eric.  You make some good points below.

And I'm glad we are both in agreement on the chunks of this problem. I'll pick away at responding to these questions because they all deserve careful answers.

> On Fri, Jul 9, 2010 at 2:06 PM, Eric S. Johansson <address@hidden> wrote:
>> The consensus of this august body was that all of the speech recognition
>> toolkits out there (Julius, HTK, Sphinx) were designed to keep graduate
>> students busy but not designed for use in the real world. I did take a look
>> at Simon; it looks like it's the closest of the bunch, but I estimate it is
>> somewhere between 5 and 8 years away from being useful (i.e., on par with
>> NaturallySpeaking).
> I agree that these tools are grad-student research oriented.  I also
> agree that a major rewrite may be required to compete with Naturally
> Speaking.  However, I disagree that we have to be competitive with
> Naturally Speaking to be productive programming by voice.  I used
> Dragon Dictate in 1996, and later Naturally Speaking.  I found that
> Dragon Dictate was just about the same as Naturally Speaking in terms
> of productivity for writing code.  The main problem was that Naturally
> Speaking would make me pause between commands, like Dragon Dictate,
> and only did continuous recognition for dictating text.

I apologize, but I'm probably going to go on at length about issues in programming by voice. I will harp on not damaging a person's throat, and after that on vocabulary and name structure.

My rationale for requiring the equivalent of NaturallySpeaking is twofold: first, programmatic control over the environment, and second, vocal load. NaturallySpeaking has far better control over its environment, through a toolkit like dragonfly, than DragonDictate ever did. I believe that's because of the environments (Windows versus DOS).

Having used DragonDictate, I found it incredibly wearing on my throat. I had to go to a throat specialist for strain caused by speech recognition, and then had to lay off the product for a few weeks while the muscles recovered. This is not an uncommon story; almost every single person on the voice coder mailing list has had this kind of event happen. Discrete-utterance speech recognition really does nasty things to your throat, as does the non-natural sentence construction in commands with NaturallySpeaking. You find yourself wanting to say more, but you have to tighten up your throat and stop because the grammar doesn't match the way your brain wants to speak.

I know this is random, but remember it. One of the things Susan Cragin did in OSSRI was to get the dougout recognizer relicensed; I think it's under an MIT/X11 license. I'll CC her on this message (hey Susan, sign up and say hi to everybody).

In any case, there's one thing to remember about programming by voice: you need the same vocabulary size for coding as you do for writing. There are two reasons for that, which I will let you think about; I will address them later.
> Simon's approach can reduce the active vocabulary at any time to
> perhaps a couple of hundred words or less, apparently enabling high
> accuracy.  If we could have continuous command recognition, we could
> easily beat my old productivity.  I read that there's a newer tool
> called Vocola, which enables continuous command recognition with
> Naturally Speaking.
Vocola, Unimacro, and dragonfly all do continuous recognition macros. With raw natlink, you can even do it in Python. It's a hard problem, I agree.
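For flavor, here is a minimal sketch of what a dragonfly grammar looks like when loaded through natlink under NaturallySpeaking; the command phrases and key bindings are my own illustration, not from any shipped macro set:

    # Minimal dragonfly grammar sketch: several commands, recognized continuously.
    from dragonfly import Grammar, MappingRule, Key, Text, Dictation

    class EmacsRule(MappingRule):
        # Spoken phrase on the left, emitted keystrokes or text on the right.
        mapping = {
            "save buffer": Key("c-x, c-s"),     # C-x C-s
            "leave mark": Key("c-space"),       # set the mark at point
            "kill region": Key("c-w"),
            "insert <text>": Text("%(text)s"),  # free-form dictation
        }
        extras = [Dictation("text")]

    grammar = Grammar("emacs voice commands")
    grammar.add_rule(EmacsRule())
    grammar.load()

A mapping rule like this is also where vocal load gets managed: you pick phrases that fall out of natural speech rather than ones that make you tighten up.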

> Perhaps I'm somewhat wishful in my
> thinking, but no matter how I do the calculation, I estimate we have
> many times more potential volunteers than such a project will require.
> I think the main trick is finding advisors who do have the extensive
> knowledge about how to make good recognition engines, and effectively
> organising volunteers.  I think you probably would agree that
> Naturally Speaking is not the only good recognition engine ever sold.
> There should be experts around from failed or abandoned efforts who
> could help as advisors.  Give me one or two of those guys, a dozen
> motivated volunteer voice coders, and three years, and I think we
> could get there.

I agree that there are many good recognition engines out there. They all fail in different ways, but they can work. One of the huge challenges we will face is navigating the patent land mines of other people's technology. I think this is one of the reasons why Nuance has been on an acquisition binge; I believe they're trying to buy up as many patents as they can to protect themselves against any market intrusion. One way to defend ourselves would be through our own license acquisition. For example, look at what Cornell did with video codecs: the patent license terms said that if it was used in an open-source project, then there was no charge and no risk of being sued by Cornell for infringement. If we could build a similar patent licensing portfolio from other players, that might help us take advantage of pre-existing work instead of reinventing the wheel to get around the patents.
> I'm using Google's cloud-computing gmail service to write this e-mail.
> I typically review them with a closed-source binary TTS called voxin.
> I've been contacted by Skype twice today, and I've watched a couple
> flash videos.  I think we are in violent agreement on this point.
> People with disabilities need solutions, not a philosophy.

Wow. I have Google accounts, but I rarely use them and certainly not for anything important. :-) But yes, people with disabilities do need solutions first. We need to, as Christian philosophy says, teach them how to fish; at the same time, as Buddhist philosophy teaches us, we need a right livelihood.
> Let's look at where we are.  In the early 1990's a tiny company wrote
> Dragon Dictate, using the signal processing hardware in the sound card
> to make speech recognition on PC's useful for the first time.  Their
> market was exclusively people with physical impairments.  I discovered
> them in 1996, when I needed them to remain a programmer.  There may
> have been some new code written by the community to get around the
> crap we get from Nuance, but it seems that the tools they ship haven't
> improved programming by voice significantly in well over a decade.
> Instead, they focus on helping us write emails faster.  How nice.
> Look at where the real innovation in this area is coming from.  Is it
> from Nuance, or the user community?  For future innovation, where
> should we look?

I remember when I was first injured and a friend set me up with a 486 with 16 MB of RAM in a lunchbox case, which I would carry from customer site to customer site on a two-wheeled luggage cart. I thought it was so wonderful when I got my first laptop, which weighed 10 pounds.

Yes, I don't think NaturallySpeaking has really improved since version 6. It's a little more accurate, more stable, and doesn't make Windows puke quite as often, but I think all they have done since version 6, maybe 7, is fix bugs. However, I will suggest that writing e-mails faster is not a bad thing. I write fiction as a hobby, and if I can improve accuracy, I can write more, because editing sucks.

When you look at a piece of rough text and try to change it, you really see the lack of inventive or creative effort put into making editing easier. Because I don't use speech-recognition-enabled editors, I can't say something like "select a sentence containing "brilliance of her smile" and have that sentence placed into a dictation box for editing. And yes, I deliberately used an odd number of quote marks, because why do you need a closing quote at the end of the line in a command mode?

Also, its insistence on using the Windows selection mechanism (drag with the mouse) makes it difficult to select a small number of words if your hands are like mine. You really want something like Emacs's mark and point, so that you can use a tablet or even a mouse and say "leave mark" and "end region". Yes, I left the previous sentence uncorrected just because it was too much work to drive the mouse.


I believe innovation comes from people like us. Back in the bad old days of Dragon Systems, disabled users would be brought in occasionally to experiment with different interfaces or talk about their experience with the product. I would make some radical changes if I had sufficient hands to write the UI. For example, I would make a dictation box with filters on both the input and output, so you could modify code to look like English text, thereby enabling familiar editing patterns in the dictation box. On output, I would retranslate the text back into code. I would also want plug-ins for the dictation box to make it possible to edit other things.
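To make the filter idea concrete, here is a minimal sketch of the kind of round-trip translation I have in mind, assuming snake_case and camelCase identifiers (the function names are illustrative, not from any existing tool):

    import re

    def code_to_english(identifier):
        # Input filter: render an identifier as plain words for the dictation box.
        words = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier)  # split camelCase
        return words.replace("_", " ").lower()                      # split snake_case

    def english_to_code(phrase, style="snake"):
        # Output filter: retranslate the edited words back into an identifier.
        words = phrase.split()
        if style == "snake":
            return "_".join(words)
        return words[0] + "".join(w.capitalize() for w in words[1:])

    # code_to_english("getUserName") -> "get user name"
    # english_to_code("get user name") -> "get_user_name"

The same pattern generalizes: any structured text with a reversible mapping to plain English could be edited in the box this way.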

A great example of where this editor could help is HTML e-mail. I need to generate and receive HTML e-mails when dealing with customers. Yeah, it sucks, but it's reality. Thunderbird's editor is a stinking pile of bird poop when editing by hand, and even worse by voice. Using a dictation box model as I described above, one could translate HTML or HTML fragments into something one could edit by voice. We could do this without needing to touch the application.

> I also bought every
> microphone that seemed promising at improving recognition rates.  By
> the way, what do people feel is the best microphone nowadays?
There is no one best microphone. We do not have sufficient information to determine which microphone works best with a given voice and a given computer system and sound card. You buy microphones until you find one that works best, and then you stick with it religiously. I think I said elsewhere that VXI is the only one that works with my voice. As soon as circumstances permit, I'm going to try to get their current Bluetooth headset. The previous one was the most wonderful headset available, but unfortunately the battery charging system, the Bluetooth pairing, and I did not get along very well. Something was funky, and I had to re-pair every time I charged, which was twice a day. A serious nuisance.
> I do a ton of volunteer work for Vinux, which is Linux based on
> Ubuntu, customised for the needs of the visually impaired.  People
> often post emails saying, "Today I'm switching my main machine to
> Vinux!"  I generally suggest that dual-booting, or having Vinux on a
> virtual machine, is the way to go.  Vinux is not as productive an
> environment as either Windows with JAWS or Mac for the blind, at least
> not yet.  However, we aim to be better than either.  To get there as
> rapidly as possible, I would like volunteers to continue using what
> works best for them.  Except Sina.  He should switch to 100% Vinux
> today!

That's really cool. Unfortunately, I'm not in a position to do a whole lot of volunteering. I need to take care of fundamentals first.

> I agree.  You get flamed badly if you suggest people could be more productive
> with proprietary tools.  Frankly, it's a bit scary discussing this on
> a gnu.org list.
Heh. The way I would manage that particular problem would be to develop self-contained components that can be GPL'ed to death, while others with more generous intentions could work on the bridge.
> However, FOSS seems to be the only way that we can organise many
> volunteers from around the globe to work together to write and improve
> accessibility tools.  This isn't about ideology or politics or
> freedom.  It's about people like us who are fed up with being second
> class citizens, and tired of begging for access to new technology.
> This is about programmers like us taking control over the future of
> accessibility, because we're not going to get what we need otherwise.

In my snarky frame of mind, any collection of thoughts unified by a single purpose is an ideology. That's okay, because I think you hit the crip ideology on the head: handicap accessibility is too important to be owned. We should not put up with being second class citizens, and we should own the means of production. Unfortunately, there is a difference between accessibility tools (speech recognition, text-to-speech, etc.) and the ability to use an accessibility tool with an application or system. I haven't quite figured out a shorthand yet, but something like accessibility tools versus accessibility availability is close. We need tools, and we need access to the platforms that employers and governments use.

> Why not do both in parallel?  There are so many of us, yet each of us
> has unique gifts and skills.  Most of us should do as you suggest, and
> work at the application level to improve accessibility.  I think some
> of us should become SR and TTS experts and work on the next
> generation.  Actually, if I didn't have to work so hard with glue and
> tape to make Vinux work, SR and TTS is the sort of thing I'd probably
> do well at.

You are far more optimistic than I am. My experience trying to get Emacs updated and dtach modified for crip use has not been successful at attracting help, even though they are far more useful on day one than a new speech recognizer.

As for a pool of experts, we can try mining the OSSRI board of directors for possible candidates. That's something we'll have to talk to Susan about.


> When I do simple estimates, I just can't see how we don't have enough
> potential volunteers to do this.  I just can't believe that 99.9% of
> us with RSI injuries or visual impairments are the sort of people to
> sit on our butts and do nothing.  From what I've seen, a fair
> percentage of us happen to be decent programmers, and are the sort
> that refuse to believe we have limitations.

I can, unfortunately. Because programming by voice has been so difficult, and because of the hostility of employers toward anyone using something like speech recognition in an open office plan, many programmers, including myself, have left the field. Some migrated to completely different fields such as bicycle design; others, like myself, have become self-employed, as it's the only way to insulate oneself from corporate stupidity and the egregious workloads that injured us in the first place.

> Perhaps I have a strong voice, but I spoke non-stop to my computer for
> 10 hours a day for over three years, and found that all I had to do
> was sip water constantly.  I programmed by voice using macros,
> eventually writing over 1,600 of them, mostly to control emacs.  I
> think it was the best way to continue my career, without giving in to
> my typing limitations.

You are a very different person than I am. I was able to program in Python using Emacs with fewer than 50 macros. I could not have remembered 1,600 of them; something about RSI and its treatment messes with your memory. Most developers I've known would not be able to remember 1,600 macros on top of the entire body of code they are working with. When I have written code, I have changed how I write classes as a way of accommodating my memory deficits. I also tried to write a small number of macros that were easy on the voice. As I said before, many developers suffer vocal strain at a far lower level of effort than you have put yourself through.

Memory shortcomings are something else we will need to accommodate. I think this is the driving force behind the methods I've developed for exploring a speech interface: I can't remember what I'm supposed to say next, so the system should prompt me and give me the ability to navigate within that prompt. The great example is change directory. It's a delightful intellectual exercise as well as a demonstration of the flexibility of a discoverable speech interface.
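To make the change-directory example concrete, here is a minimal sketch of the prompting idea, assuming a plain console front end (the function names are my own illustration, not from any existing tool):

    import os

    def prompt_directories(path="."):
        # Enumerate subdirectories so the user can speak a short number
        # instead of recalling and spelling a directory name.
        subdirs = sorted(
            entry for entry in os.listdir(path)
            if os.path.isdir(os.path.join(path, entry))
        )
        for number, name in enumerate(subdirs, start=1):
            print(f"{number}: {name}")
        return subdirs

    def change_directory_by_number(choice, path="."):
        # The recognizer only has to match a tiny grammar of numbers,
        # which keeps the active vocabulary small and accuracy high.
        subdirs = prompt_directories(path)
        os.chdir(os.path.join(path, subdirs[choice - 1]))

The point is discoverability: the interface shows you what you can say next, instead of making you memorize it.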


> I am very interested in ideas like you suggest for enabling
> applications without modifications, and doing anything that reduces
> vocal and cognitive load.  We need new ideas, and I agree with your
> point about not needing another useless type-by-voice project.  Part
> of the problem is that many of these projects are funded by well
> meaning institutions, but implemented by people interested in research
> and their own careers.  I think the code we write would be far better
> focused on our own needs.

Okay, this is a conversation for when I have far more time, and possibly one message per topic. It should pop up in the next week or so.

> Sorry, but I have to ask: if you can dictate e-mail, why can't you
> write code?

That's a real good question. I think the best answer is:

       If it's too difficult to do, it's not worth doing until it's simple.

This is the classic programmer hubris, laziness, and arrogance all rolled into one. It was actually a design philosophy for me even before I was injured. If it's hard to do, you're doing something the wrong way. You don't understand the problem. You don't even know you're an idiot. Only when you sit down and answer all of the questions the back of your mind creates and manifests as "I'm not comfortable with this" should you start thinking about implementation.

Now, I did write Python code by voice. I created a Web framework with a markup language that accommodates disabled users. It works with speech recognition, but it should also, theoretically, be accessible to blind, text-to-speech users. It's simple; the current implementation is a bit of a pig, but I just wanted to prove the concept of a disabled-user-focused markup language.

It's on Launchpad under the name "akasha".

Python is the only language I've seen so far that isn't completely hostile to unenhanced speech recognition. I can't manipulate C, Java, or any other language with the same ease. I consider the whole C language family so ungodly hostile to speech recognition that it would take a huge interface layer to cross between the two.

I bet you're asking why. An overabundance of special characters with special spacing; I shouldn't have to deal with that. The environment should know enough about what I'm saying to put things in the right place. Jumbled-cap, misspelled words used for symbols; again, why should I have to spell those? I should really say the nearest English equivalent and let the tool translate. These two features alone would significantly drop the vocal load of programming by voice. They would reduce the cognitive load of trying to remember how to generate a symbol. Done right, you would be able to edit a misrecognition in the middle of a misspelled word, possibly even before you inject it into your code. With a default, simple code style, code generation will be easier on so many levels.
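As a sketch of how a translation layer could absorb the special-character problem, consider a table mapping spoken English onto symbols together with their spacing (the phrase choices here are my own illustration, not from any existing tool):

    # Hypothetical mapping from spoken phrases to code symbols with spacing.
    SPOKEN_SYMBOLS = {
        "equals": " = ",
        "double equals": " == ",
        "not equal": " != ",
        "arrow": " -> ",
        "open paren": "(",
        "close paren": ")",
    }

    def emit(tokens):
        # Translate a sequence of spoken tokens into code text,
        # letting the tool handle spacing instead of the speaker.
        return "".join(SPOKEN_SYMBOLS.get(token, token) for token in tokens)

    # emit(["x", "equals", "y", "double equals", "z"]) -> "x = y == z"

Done this way, the speaker never pronounces punctuation or worries about spacing; the table handles it.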

I could say more but I will spare you. :-)
> Anyway, you don't have to type code to contribute.  I
> would like to hear more about your models.  I want to put together
> an e-mail list to discuss programming by voice, and the direction we
> should take in implementing and improving the tools we need.  Your
> input is welcome!  Would it be better to host that e-mail list in
> vinux land, or in gnu.org land?  Regardless, I would like to work in
> Vinux to enable programming by voice at some basic level, and then I'd
> like to get lots of voice coders on board to make it better.

Models later, when I have more time; probably this coming weekend. Like I said, there is already a list, but I think I would choose the Vinux world as being more culturally and philosophically on board with what we are trying to do regarding accessibility approaches.

I'm out of time for today. I'll try to get back to the rest of this later.



