Re: [Accessibility] Priorities, Ideas?

From: Eric S. Johansson
Subject: Re: [Accessibility] Priorities, Ideas?
Date: Mon, 19 Jul 2010 18:35:57 -0400
User-agent: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv: Gecko/20100608 Thunderbird/3.1

 On 7/19/2010 10:40 AM, Bill Cox wrote:
> We've made far less progress in the typing impairment group.  I
> believe our top priority here should be enabling programming by voice
> natively in Linux.  This would allow the many excellent programmers
> with RSI injuries to join FOSS efforts to improve the rest of the
>
> The first glimmer of hope I've seen in the programming by voice area
> is from the Simon tool.  It depends on a non-free library to build
> speech models, called HTK.  I recently upset a number of people by
> recommending we write a FOSS version of that library.  However, some
> knowledgeable people informed me that it is possible to build less
> accurate voice models today using Sphinx, and it should not be too
> hard to get them working with Simon.  It seems to me that this is
> currently the most viable short-term path to creating Debian
> compatible packages for programming by voice.

Obviously, I have a different opinion. I think we should concentrate on the wrapper tools around a speech recognition core, because we have one that works. Simon might be okay. Sphinx is not only less accurate, it also has a smaller vocabulary (a couple of thousand words for good accuracy). You can't write comments, you can't create/edit/search/... symbols, and it's so restricted that it's really only good for airline ticket reservations and weather reports.

My experience shows that if you are adapting speech recognition to programming, you're going about it the wrong way. You really should adapt programming to speech recognition. This experience came about as a result of my frustration with Web frameworks. They were completely unspeakable and required weeks upon weeks of editor modification to be able to work with a subset of the framework's capabilities, and it wasn't worth my time.

About the same time, I discovered a minimalist Web framework called Aether. It used a markup language which was remarkably friendly to speech recognition. The guy had no interest in expanding it, so I took it over. It looks like this:

  (Required fields in [bold bold]).
[form  [id cust_info]
       [action javascript:wpsubmit(document.getElementById('custform'));]
       [method post]
[cell [bold Name:]][cell [Text [name customer][size 25]]]
[cell Company:][cell [Text [name company][size 25]]]
[cell Phone number:][cell [text [name phone][size 25]]]
[cell [bold E-mail address (for bundle delivery):]][cell [text [name eaddr][size 25]]]
[cell [button [name swp1] [value Get Started Now] [type submit]]]

Almost everything here is something you can speak. A couple of things aren't, because I was either careless or tired, but for the most part it's pretty close. If you want to see where this code fragment is used, click on the left side link: 201 cmr 17.00 and look for the green box on the right-hand side. The code running the form looks like:

class esjworks_201fillin(esjworks_fillin):
    def __init__(self, name, reveal, do_auth, page_maker):

        esjworks_fillin.__init__(self, name, reveal, do_auth, page_maker)

#    def page_init(self, name, query):
# """return gather CGI information, e-mail to me if successful in sending white paper."""

    def handle_swp1(self, name,query):

        syslog.syslog("201fillin page_init %s %s"%(name,str(query)))
        #validate inputs

        cust = query.has_key('customer') and query['customer'] is not None
        phone = query.has_key('phone') and query['phone'] is not None
        address = query.has_key('eaddr') and query['eaddr'] is not None
        syslog.syslog("cust, phone, address %s %s %s "%(cust, phone, address))
        #if cust and phone and address:
        if cust and address:
            if not query.has_key('company'):
                query['company'] = 'no company'

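            if not query.has_key('phone'):
                query['phone'] = 'no phone'

            # The try/except/else keywords and the outer else are inferred
            # from the indentation; the lines that built and sent the
            # message are not present in the archived post.
            try:
                s = smtplib.SMTP()

            except Exception:  # exact exception type is not shown in the post
                syslog.syslog("swp failed")
                ajax_page = self.page_maker.fileserver.load(name+"/failsend")
                syslog.syslog("swp failed")
                http_page = self.page_maker.make_ajax_page(ajax_page,'200',self.cookie_output)
                #return http_page

            else:
                syslog.syslog("swp passed")
                ajax_page = self.page_maker.fileserver.load(name+"/oksend")
                http_page = self.page_maker.make_ajax_page(ajax_page,'200',self.cookie_output)
                return http_page

        else: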
            syslog.syslog("swp arg failed 1")
            ajax_page = self.page_maker.fileserver.load(name+"/failarg")
            #syslog.syslog("swp arg failed 2")
            http_page = self.page_maker.make_ajax_page(ajax_page,'200',self.cookie_output)
            #syslog.syslog("swp arg failed 3")

            #syslog.syslog("swp arg page %s"% http_page)

            return http_page

Most of this code was written using speech recognition. If a common symbol is used that's not speech recognition friendly, it was cut and pasted because I was too tired/cranky to think of a verbose form. Again, this is an example of adapting my code to speech recognition, which lets me write some code without busting the time bank on minimally productive or useful macros.

I'm fairly negative on specialized grammars or macros for these kinds of applications because they really don't work in practice. I think I spent the first five years of being disabled and using speech recognition creating and modifying macros to fit whatever my current situation was. I discovered that I was using the same utterances over and over again in different contexts and the tools couldn't cope. I discovered that I was spending as much, if not more, mental energy figuring out how to get the speech recognition system to produce the code I wanted from a relatively limited set of commands (i.e. 50 to 100) as I was figuring out what the code should look like in the first place.

It was this experience that drove me to the conclusion that when using speech recognition, you want to drive the user out of the application context into a speech recognition friendly environment where they can do their work, and then return the work through a filter which converts it into a context-specific form. In other words, convert code to plaintext, modify plaintext, convert plaintext back into code.
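
To make the round trip concrete, here is a minimal sketch of one small piece of such a filter: turning code symbols into words you can speak and turning spoken words back into symbols. The function names (symbol_to_words, words_to_symbol) and the two naming styles are my own assumptions for illustration; this is not taken from any existing tool.

```python
import re


def symbol_to_words(symbol):
    """Turn a code identifier into a speakable phrase.

    Hypothetical helper: 'make_ajax_page' -> 'make ajax page'.
    """
    # snake_case: underscores become spaces.
    spaced = symbol.replace("_", " ")
    # camelCase: insert a space at each lower-to-upper boundary.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", spaced)
    return " ".join(spaced.lower().split())


def words_to_symbol(phrase, style="snake"):
    """Turn a spoken phrase back into an identifier in the given style.

    Only 'snake' and 'camel' are handled; both are assumptions made
    for this sketch.
    """
    words = phrase.lower().split()
    if not words:
        return ""
    if style == "snake":
        return "_".join(words)
    if style == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    raise ValueError("unknown style: %r" % style)


if __name__ == "__main__":
    print(symbol_to_words("getElementById"))
    print(words_to_symbol("cookie output"))
```

The conversion to words is lossy (the naming style is dropped), so a real filter would have to remember the style per file or per symbol in order to convert back correctly.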

Whenever I say this, inevitably some very-smart-person (TM) points out that I'm trying to solve the natural language programming problem and everyone has failed so far. And they are right, kind of. What I'm trying to do is use a limited grammar using natural language words and constructs to express code. Much smaller problem, much more easily handled and, if you do it right, editor independent. I know some people might frown on the thought of anyone using anything but Emacs but we all know there are a lot of perverts out there. :-)

I see what I have done with akasha as a small-scale proof of how you can express user hostile markup notations using English words that can be easily spoken. I believe you can do the same with programming.
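
As a rough illustration of the idea, and not of how the framework actually does it, here is a sketch that reads bracketed, speakable markup and emits HTML. The tag table (bold, cell) and the function names are assumptions made up for this example, and attribute-style brackets such as [name phone] are deliberately left out.

```python
import re
from html import escape

# Hypothetical mapping from speakable words to HTML elements.
TAGS = {"bold": "b", "cell": "td"}


def parse(markup):
    """Parse '[word body ...]' markup into nested lists of strings."""
    root = []
    stack = [root]
    for token in re.findall(r"\[|\]|[^\[\]]+", markup):
        if token == "[":
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif token == "]":
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            stack.pop()
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return root


def render(items):
    """Render a list of parsed items (text or nested nodes) as HTML."""
    out = []
    for item in items:
        if isinstance(item, str):
            out.append(escape(item))
        else:
            out.append(render_node(item))
    return "".join(out)


def render_node(node):
    """Render one bracketed node; its first word names the element."""
    if not node or not isinstance(node[0], str) or not node[0].split():
        raise ValueError("a node must start with a tag word")
    parts = node[0].split(None, 1)
    name = parts[0].lower()
    if name not in TAGS:
        raise ValueError("unknown tag word: %r" % name)
    children = ([parts[1]] if len(parts) > 1 else []) + node[1:]
    tag = TAGS[name]
    return "<%s>%s</%s>" % (tag, render(children), tag)


if __name__ == "__main__":
    print(render(parse("[cell [bold Name:]][cell Company:]")))
```

The point of the sketch is that the author only ever says plain words and brackets; the angle brackets and closing tags are produced by the filter.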

In trying to solve this problem, I think that everyone should throw away their keyboards and use speech recognition for everything. The pressure between what you want to do and what you can do helps us understand just how very different a speech driven interface is from a keyboard and/or mouse driven interface. There are two ways of responding to the pressure, as I illustrated above. You can either try to force the input device to work with a different UI model or you can change the UI model to accommodate the input device. We have a long history of forcing speech recognition to work with a GUI model and have a very small amount of success to show for it. Taking the other approach is relatively unexplored territory. Most of the papers I saw a decade ago concentrated on speech user interfaces for cellular phones, i.e. Sphinx-scale, speaker-independent, small-vocabulary applications.

But even this technique of throwing away the keyboard won't work for everybody if they don't have some awareness of how their mind and body change when they use speech recognition. I remember when I first started using it, it felt all wrong and what I said were the wrong words. After a few months, I realized that the sensation I had in my mind was not an illusion but an awareness of my mind shifting from speaking spoken speech to speaking written speech. More specifically, from speaking continuous spoken speech to speaking discrete utterance written speech. It's that kind of awareness that will guide us through developing good speech user interfaces.

But that's enough rambling for today.
