Re: [Accessibility] Priorities, Ideas?
Eric S. Johansson
Mon, 19 Jul 2010 18:35:57 -0400
On 7/19/2010 10:40 AM, Bill Cox wrote:
We've made far less progress in the typing impairment group. I
believe our top priority here should be enabling programming by voice
natively in Linux. This would allow the many excellent programmers
with RSI injuries to join FOSS efforts to improve the rest of the
The first glimmer of hope I've seen in the programming by voice area
is from the Simon tool. It depends on a non-free library to build
speech models, called HTK. I recently upset a number of people by
recommending we write a FOSS version of that library. However, some
knowledgeable people informed me that it is possible to build less
accurate voice models today using Sphinx, and it should not be too
hard to get them working with Simon. It seems to me that this is
currently the most viable short-term path to creating Debian
compatible packages for programming by voice.
Obviously I have a different opinion. I think we should concentrate on the
wrapper tools around a speech recognition core because we have one that works.
Simon might be okay. Sphinx is not only less accurate but also has a smaller
vocabulary (a couple of thousand words for good accuracy). You can't write
comments, you can't create/edit/search/... symbols, and it's so restricted it's
really only good for airline ticket reservations and weather reports.
My experience shows that if you are adapting speech recognition to programming,
you're going about it the wrong way. You really should adapt programming to speech
recognition. This experience came about as a result of my frustration with Web
frameworks. They were completely unspeakable and required weeks upon weeks of
editor modification to be able to work with a subset of the framework's
capabilities, and it wasn't worth my time.
About the same time, I discovered a minimalist Web framework called Akasha. It
used a markup language which was remarkably friendly to speech recognition. The
guy had no interest in expanding it so I took it over. It looks like this:
(Required fields in [bold bold]).
[form [id cust_info]
[cell [bold Name:]][cell [Text [name customer][size 25]]]
[cell Company:][cell [Text [name company][size 25]]]
[cell Phone number:][cell [text [name phone][size 25]]]
[cell [bold E-mail address (for bundle delivery):]][cell [text [name eaddr][size
[cell [button [name swp1] [value Get Started Now] [type submit]]]
Almost everything here is something you can speak. A couple of things aren't
because I was either careless or tired, but for the most part it's pretty close.
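To make the structure of the bracket markup concrete, here is a minimal sketch of how text in this style could be read into a nested tree. It is an illustration only, not Akasha's actual parser: the function name parse and the (tag, children) tuple layout are assumptions, and the only rule it assumes is "an opening bracket, a tag word, then text and nested brackets up to the closing bracket".

```python
def parse(text):
    """Parse bracket markup into a list of strings and (tag, children) tuples.

    Hypothetical sketch: assumes every element has the form [tag content...]
    and that content is plain text mixed with nested elements.
    """
    pos = 0

    def flush(buf, out):
        # Emit any accumulated plain text as one stripped string.
        chunk = "".join(buf).strip()
        if chunk:
            out.append(chunk)
        buf.clear()

    def items(closing):
        # Collect text and nested nodes until ']' (inside a node) or end of input.
        nonlocal pos
        out, buf = [], []
        while pos < len(text):
            ch = text[pos]
            if ch == "[":
                flush(buf, out)
                out.append(node())
            elif ch == "]":
                if not closing:
                    raise ValueError("unexpected ']' at offset %d" % pos)
                flush(buf, out)
                pos += 1
                return out
            else:
                buf.append(ch)
                pos += 1
        if closing:
            raise ValueError("missing ']' before end of input")
        flush(buf, out)
        return out

    def node():
        # Read the tag word after '[' and then the node's contents.
        nonlocal pos
        pos += 1  # skip '['
        start = pos
        while pos < len(text) and text[pos] not in " \t\n[]":
            pos += 1
        tag = text[start:pos]
        return (tag, items(closing=True))

    return items(closing=False)


tree = parse("[cell [bold Name:]][cell [text [name customer][size 25]]]")
print(tree)
```

Because every element starts with a plain word and nests with a single pair of delimiters, the whole grammar fits in one short recursive function. That same regularity is what keeps the notation easy to dictate.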
If you want to see where this code fragment is used,
click on the left side link: 201 cmr 17.00 and look for the green box on the
right-hand side. The code running the form looks like:
def __init__(self, name, reveal, do_auth, page_maker):
esjworks_fillin.__init__(self, name, reveal, do_auth, page_maker)
# def page_init(self, name, query):
# """return gather CGI information, e-mail to me if successful in sending
def handle_swp1(self, name,query):
syslog.syslog("201fillin page_init %s %s"%(name,str(query)))
cust = query.has_key('customer') and query['customer'] is not None
phone = query.has_key('phone') and query['phone'] is not None
address = query.has_key('eaddr') and query['eaddr'] is not None
syslog.syslog("cust, phone, address %s %s %s "%(cust, phone, address))
#if cust and phone and address:
if cust and address:
if not query.has_key('company'):
query['company'] = 'no company'
if not query.has_key('phone'):
query['phone'] = 'no phone'
s = smtplib.SMTP()
ajax_page = self.page_maker.fileserver.load(name+"/failsend")
ajax_page = self.page_maker.fileserver.load(name+"/oksend")
syslog.syslog("swp arg failed 1")
ajax_page = self.page_maker.fileserver.load(name+"/failarg")
#syslog.syslog("swp arg failed 2")
#syslog.syslog("swp arg failed 3")
#syslog.syslog("swp arg page %s"% http_page)
Most of this code was written using speech recognition. If a common symbol is
used that's not speech recognition friendly, it was cut and pasted because I was
too tired/cranky to think of a verbose form.
Again, this is an example of adapting my code to speech recognition which lets
me write some code without busting the time bank on the minimally productive or
I'm fairly negative on specialized grammars or macros for these kinds of
applications because they really don't work in practice. I think I spent the
first five years of being disabled and using speech recognition creating and
modifying macros to fit whatever my current situation was. I discovered that I
was using the same utterances over and over again in different contexts and
the tools couldn't cope. I discovered that I was spending as much if not more
mental energy figuring out how to get the speech recognition system to produce
the code I wanted from a relatively limited set of commands (i.e. 50 to 100) as
I was figuring out what the code should look like in the first place.
It was this experience that drove me to the conclusion that when using speech
recognition, you want to drive the user out of the application context to a
speech recognition friendly environment where they can do their work and then
return to work through a filter which converts it into a context-specific form. In
other words, convert code to plaintext, modify plaintext, convert plaintext back
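As a rough sketch of that round trip, the fragment below turns identifiers into speakable words, lets the text be edited in that form, and then maps the spoken phrases back to the original symbols. Everything here is hypothetical: the function names (spoken_form, build_symbol_table, to_plain, to_code) are invented for illustration, and it only handles snake_case and camelCase names. A real filter would also have to deal with punctuation, leading underscores, and phrases that collide with ordinary words.

```python
import re

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def spoken_form(identifier):
    """Split a snake_case or camelCase identifier into lower-case words."""
    words = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", identifier).replace("_", " ")
    return " ".join(words.lower().split())


def build_symbol_table(code):
    """Map each multi-word spoken phrase back to the symbol it came from."""
    table = {}
    for ident in IDENT.findall(code):
        spoken = spoken_form(ident)
        if spoken != ident:
            table[spoken] = ident
    return table


def to_plain(code):
    """Code -> plaintext: replace every identifier with its spoken form."""
    return IDENT.sub(lambda m: spoken_form(m.group(0)), code)


def to_code(plain, table):
    """Plaintext -> code: restore symbols, longest spoken phrase first."""
    for spoken in sorted(table, key=len, reverse=True):
        plain = re.sub(r"\b%s\b" % re.escape(spoken), table[spoken], plain)
    return plain


code = "ajax_page = self.page_maker.fileserver.load(name)"
table = build_symbol_table(code)
plain = to_plain(code)           # speakable form, edited by voice
edited = plain.replace("load", "fetch")
print(plain)
print(to_code(edited, table))
```

The symbol table is what makes the trip back possible: the speaker works with "page maker", and the filter, not the speaker, remembers that it is spelled page_maker in this file.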
Whenever I say this, inevitably some very-smart-person (TM) points out that I'm
trying to solve the natural language programming problem and everyone has failed
so far. And they are right, kind of. What I'm trying to do is use a limited
grammar using natural language words and constructs to express code. Much
smaller problem, much more easily handled and, if you do it right, editor
independent. I know some people might frown on the thought of anyone using
anything but Emacs but we all know there are a lot of perverts out there. :-)
I see what I have done with Akasha as a small-scale proof of how you can express
user-hostile markup notations using English words that can be easily spoken. I
believe you can do the same with programming.
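To give a feel for what a limited grammar of natural-language words might look like for code, here is a small sketch that turns a handful of fixed spoken phrases into Python statements. The phrases ("define function ... taking ...", "set ... to call ... with ...", "return ...") and the names symbol, args, RULES and translate are all made up for this example. It is a toy under those assumptions, not a proposal for the real vocabulary.

```python
import re


def symbol(words):
    """Join spoken words into one snake_case identifier."""
    return "_".join(words.lower().split())


def args(phrase):
    """Turn 'name and query' or 'a, b and c' into 'name, query' / 'a, b, c'."""
    parts = re.split(r"\s*(?:,|\band\b)\s*", phrase.strip())
    return ", ".join(symbol(p) for p in parts if p)


# Each rule pairs a spoken pattern with a function that builds the code text.
RULES = [
    (re.compile(r"define function (.+?) taking (.+)$"),
     lambda m: "def %s(%s):" % (symbol(m.group(1)), args(m.group(2)))),
    (re.compile(r"define function (.+)$"),
     lambda m: "def %s():" % symbol(m.group(1))),
    (re.compile(r"set (.+?) to call (.+?) with (.+)$"),
     lambda m: "%s = %s(%s)" % (symbol(m.group(1)), symbol(m.group(2)),
                                args(m.group(3)))),
    (re.compile(r"return (.+)$"),
     lambda m: "return %s" % symbol(m.group(1))),
]


def translate(utterance):
    """Translate one spoken line into one line of Python, or raise ValueError."""
    text = " ".join(utterance.lower().split())
    for pattern, build in RULES:
        match = pattern.match(text)
        if match:
            return build(match)
    raise ValueError("no rule for: %r" % utterance)


for line in ["define function handle form taking name and query",
             "set ajax page to call load page with name",
             "return ajax page"]:
    print(translate(line))
```

Nothing in the translation depends on an editor or on macros tied to a particular application. The utterances are plain words, and the conversion to symbols happens in one place after the speaking is done.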
In trying to solve this problem, I think that everyone should throw away their
keyboards and use speech recognition for everything. The pressure between what
you want to do and what you can do helps us understand just how very different a
speech driven interface is from a keyboard and/or mouse driven interface. There
are two ways of responding to the pressure, as I illustrated above. You can either
try to force the input device to work with a different UI model, or you change
the UI model to accommodate the input device. We have a long history of forcing
speech recognition to work with a GUI model and have a very small amount of success
to show for it. Taking the other approach is relatively unexplored territory. Most
of the papers I saw a decade ago concentrated on speech user interfaces for
cellular phones, i.e. Sphinx-scale, speaker-independent, small-vocabulary applications.
But even this technique of throwing away the keyboard won't work for everybody
if they don't have some awareness of how their mind and body change when they
use speech recognition. I remember when I first started using it, it felt all
wrong and what I said were the wrong words. After a few months, I realized that
the sensation I had in my mind was not an illusion but an awareness of my mind
shifting from speaking spoken speech to speaking written speech. More
specifically, from speaking continuous spoken speech to speaking discrete
utterance written speech. It's that kind of awareness that will guide us through
developing good speech user interfaces.
But that's enough rambling for today.