
Re: [Accessibility] Can you help write a free version of HTK?

From: Jeremy Whiting
Subject: Re: [Accessibility] Can you help write a free version of HTK?
Date: Sat, 10 Jul 2010 14:11:52 -0600

Bill, Eric,

Not sure how viable it is, but at Akademy I spoke with Peter Grasch (Simon developer) and he thinks it wouldn't be too hard to switch simon to use sphinx 4 instead of htk and julius.  I'm not sure how sphinx 4 is licensed, but I believe it's open source.  I agree with Eric however that if something works it's good enough, whether it's free as in freedom or not.  Anyway, just wanted to add that to the discussion I guess.

Best Regards,
Jeremy Whiting

On Fri, Jul 9, 2010 at 3:06 PM, Eric S. Johansson <address@hidden> wrote:
I've been reading the list for a while, and Bill's posting finally prompts me to introduce myself.

I am a former developer, injured in 1994 after an 18-year software development career. I have been a successful user of speech recognition for writing, but little else, because of fundamental mismatches between speech recognition and computer interfaces. I was involved in programming-by-voice efforts through roughly the early two-thousands, when I organized a couple of workshops for like-minded people to discuss programming-by-voice issues, as well as a training session at Dragon Systems on the use of Joel Gould's NatPython system. I've been a student of speech user interfaces and an observer of how the general dictation vocabulary market collapsed from various factors, ranging from price erosion through false competition to monopolistic acquisition of competitors.

In the mid-two-thousands, I was one of the founding members of the open-source speech recognition initiative (OSSRI, a nonprofit organization) and again observed its subsequent failure from a lack of resources (i.e., developers). In OSSRI we had some seriously high-level people involved, ranging from computational linguists to speech recognition engine designers (Sphinx 4) to cutting-edge users (ISS and Mars mission applications).

The consensus of this august body was that all of the speech recognition toolkits out there (Julius, HTK, Sphinx) were designed to keep graduate students busy, not for use in the real world. I did take a look at Simon; it looks like the closest of the bunch, but I estimate it is somewhere between 5 and 8 years away from being useful (i.e., on parity with NaturallySpeaking). Based on my experience in OSSRI, you could shorten that timeline if you had around $10-$15 million to spend on full-time developer effort, but you're not looking at anything faster than three years. Speech recognition is an unbelievably hard problem that doesn't work very well, but it works well enough to keep people trying. This is why there is little or no competition in the market (high cost, low results).

It may seem like I'm trying to drag things down, but mostly I try to keep people from making the same old mistakes I've lived through multiple times in the past. What I believe is necessary to support disabled people is not going to be pleasant for those driven by OSS ideology. For example:

Handicap accessibility trumps politics.

If a disabled person is kept from working because of ideology, then the ideology is wrong. I use NaturallySpeaking because, for a fair number of tasks, it works, and works far better than typing. I'm not even going to try an open-source equivalent, because it's still too much work that burns my hands, which I need for other tasks so I can feed myself (cooking and making money). If someone were to tell me they had a fully featured programming-by-voice package for a thousand dollars, complete with a restrictive license, I would use it without a second thought except for how to get the money. I wouldn't lose a second of sleep over the licensing as long as it let me make money to live.

From my perspective, OSS ideology blinds developers and organizations to the real problem: keeping disabled developers and others operating computers at a level equivalent to TAB usability. This tells me that any OSS accessibility interface should work from the application inward toward the accessibility tool. For example, any tools used to make applications accessible should be built first using existing core technology such as NaturallySpeaking. Developing recognition engines should come dead last, because they have the smallest impact on employability or usability.

We should be putting more effort into building appropriate speech-level user interfaces instead of replicating the same cruel mistakes and useless hacks of the past 15 to 20 years. Instead of trying to get people to speak the keyboard, or building interfaces which have been proven to destroy people's voices, we should be spending our time looking at other solutions: enabling applications without any application modifications, or solving the command discovery problem. Both of these can reduce vocal and cognitive load, which is a good thing. I've seen too many people who tried to use speech recognition in inappropriate ways (i.e., programming by voice using macros) end up doubly disabled, in both the hands and the throat. Talk about well and truly screwed.

I've worked out a few models of how to produce better speech interfaces. Given that my hands don't work well and I can't write code anymore, I have not been able to implement prototypes. I'll spare you the description and only say that I have discussed them with people involved in the speech recognition world and gotten two thumbs up on the ideas.

The current accessibility toolkits are doomed to fail because there is a 15-ish-year history of that model failing. They count on application developers to do things they have no financial interest in doing. In the speech recognition world, the number of applications explicitly integrated with NaturallySpeaking is virtually unchanged since NaturallySpeaking version 4. The number of incidentally integrated applications (through the use of the "standard edit control") has dropped, because more people are using multi-platform toolkits that don't follow standard practices or use standard edit controls. There is exactly one OSS application which was enabled for speech recognition, but it has fallen into disrepair because, I've been told, "it would encourage the use of proprietary packages". Nice way to treat the disabled.

I would like to see accessibility start focusing on the edges, the tools where people work. I use Buzzword, a Flash-based word processor, because it works better, faster, and with better recognition than any open-source word processor. I'm even considering going back to Microsoft Word, because it specifically is supported and enabled. Why not make something like OpenOffice or AbiWord work with speech recognition? That lets people make an open-source choice at a level that matters to them. All the other crap can come later, once they understand the benefits of open-source applications.

I also suggest looking to history. Look at all the things that have failed repeatedly. I can give you a very long list that's very discouraging, but the nice thing about the list is that it forces you to think differently. Don't try to impose a GUI interface on speech recognition. Build a user interface which has discoverability. Don't try to force a disabled user to work on a single machine; embrace the fact that your applications, data, etc. run on a different machine. Remember that with speech recognition, you don't just need to enter data, you also need to edit it.
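To make the discoverability point concrete: "what can I say right now?" can be answered with a simple prefix search over the active grammar, so a user stuck mid-task can ask the system rather than memorize a manual. This is purely an illustrative sketch, not any existing tool's API:

```python
def discover(commands, spoken_prefix=""):
    """Return the active grammar commands matching what the user has
    said so far; with no prefix, list everything currently available."""
    prefix = spoken_prefix.lower().strip()
    return sorted(c for c in commands if c.startswith(prefix))

# A hypothetical active grammar for an editing context.
commands = ["save file", "select line", "undo that", "new line"]
print(discover(commands, "se"))  # -> ['select line']
print(discover(commands))        # all four commands, alphabetized
```

The same lookup could be spoken aloud or shown in a transient overlay; the point is that the grammar itself is the documentation.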

Xvoice was in fact used by programmers with typing impairments up
until the day IBM stopped selling licenses to ViaVoice for Linux.
When IBM did that, those programmers lost the ability to program by
voice natively in Linux.  IBM derailed programming by voice in Linux
for a decade, and we still have not recovered.  In case you didn't
know, Microsoft owns HTK, not Cambridge University.  So, every Linux
project that depends on HTK can be killed at any time by Microsoft.

That's not exactly what happened. In the first place, programming by voice has never really been practical. Creating code by voice became more practical with the Voice Coder project: not wonderful, but better than straight dictation, except it ruins your ability to dictate comments. IBM had nothing to do with ruining your ability to program by voice. It was that we couldn't get attention from anyone in the open-source community to help us with the problem. We have a solution, and it does some really nice things, but I think the problem needs to be solved by going in a different direction.

As a person who actually tried to use the IBM product: it was a stinking pile of crap with a boatload of errors that IBM had no interest in fixing. When I posted a list of the failures, that message was censored from the list. I sent it to a bunch of people who had asked questions after seeing the list, and the second time it got through. As far as I'm concerned, it wasn't useful; it was a cruel joke that cost hundreds of hours of my life and my hands, a loss I didn't need at the time.

As for the whole HTK thing, I really don't care. I use NaturallySpeaking; if Nuance stops selling it, I can keep using my license, and if anything gets in the way, I'll go to court to get a remedy. I suspect I would not be the only disabled person working with the courts, either. If you take the same approach to HTK (i.e., mirror it in case of legal disaster), you can move on with your life and deal with the problem when it comes up. I believe the courts look favorably on innovative solutions that solve disability problems without impairing normal commercial activity.

Also, I know people don't want to hear this, but programming by voice is independent of the speech recognition engine. If you build on top of the Dragonfly SDK, you don't care whether you are using Microsoft or Nuance for your speech recognition engine. If you want to really support disabled people, help build applications using Dragonfly, and once you have solved the problem for disabled users, then go build a speech recognition engine. Remember, handicap accessibility trumps politics. If we can't work, it's bloody useless.
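The engine-independence idea can be sketched without Dragonfly itself: define the command grammar once against a small backend interface, then plug in whichever engine is available. The class names below (SpeechBackend, CommandGrammar, RecordingBackend) are my own hypothetical illustrations, not Dragonfly's actual API:

```python
# Sketch of an engine-agnostic command layer. All names here are
# hypothetical illustrations, not Dragonfly's real classes.

class SpeechBackend:
    """Adapter interface; concrete subclasses would wrap
    NaturallySpeaking, Microsoft's recognizer, or any other engine."""
    def press(self, keys):
        raise NotImplementedError

class CommandGrammar:
    """The grammar is written once; only the backend changes."""
    def __init__(self, backend):
        self.backend = backend
        self.mapping = {
            "save file": "ctrl+s",
            "undo that": "ctrl+z",
        }

    def recognize(self, utterance):
        keys = self.mapping.get(utterance)
        if keys is None:
            return False
        self.backend.press(keys)
        return True

# A fake backend demonstrates the grammar with no engine installed.
class RecordingBackend(SpeechBackend):
    def __init__(self):
        self.sent = []
    def press(self, keys):
        self.sent.append(keys)

backend = RecordingBackend()
grammar = CommandGrammar(backend)
grammar.recognize("save file")
print(backend.sent)  # -> ['ctrl+s']
```

Swapping NaturallySpeaking for another engine then means writing one new backend subclass, while every application grammar keeps working unchanged.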

Because of the HTK license, Simon is not going to be fully integrated
into Vinux, or Ubuntu which is the upstream distro we test
technologies for.  Simon built on HTK can never be included Debian, or
Fedora.  In other words, Simon is dead, because of HTK.  Typing
impaired programmers around the world will not benefit from all the
hard work of either the Julius or Simon project.  If you can't tell,
this really pisses me off.

Cool. But you're getting pissed off for the wrong reason.

Fortunately, we can freely read the HTK source code, and can learn how
it works.  We can then go rewrite it, and hopefully do a better job.
I propose we start an open-source effort to do exactly that, in order
to enable Simon and other accessibility software to be freely used to
help typing impaired people.  There is already a similar effort under
way, with a proper license:

Let me reinforce that. Typing-impaired people (a really bad nomenclature, since I'm also driving impaired, door-opening impaired, food-preparation impaired, hugging-other-people impaired...) don't care about licenses. They care about being able to participate online, work, write, etc. Full native-language dictation is the most important feature. If your speech recognition package can't be used to create a message like this e-mail, then you have failed. Completely and totally failed.

How about this: let's start with something simple, like fixing Emacs vr-mode so we can use NaturallySpeaking with Emacs on multiple platforms. If you can't get a useful tool for disabled programmers working, then something is seriously wrong and I don't believe you have the interests of disabled programmers in mind. Harsh words, but right now I can't use Emacs, and I go to a proprietary editor because that's the only choice I have if I'm going to work.

Maybe this other example might help. When the Free Software Foundation first started up, Emacs ran on a bunch of proprietary platforms. It showed people the benefits of open source. Then came a whole bunch of other components in the GNU toolchain. Eventually, thanks to Linux, a TAB was able to use a completely free system, or a broadly functional not-so-free system. Right now, we are back at the beginning. We don't even have the basic Emacs equivalent among handicap accessibility applications. Let's start with Emacs again and gradually add speech recognition enhancements throughout the entire system.

--- eric

Accessibility mailing list
