Re: [Accessibility] Priorities, Ideas?

From:	Eric S. Johansson
Subject:	Re: [Accessibility] Priorities, Ideas?
Date:	Mon, 19 Jul 2010 18:35:57 -0400
User-agent:	Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.4) Gecko/20100608 Thunderbird/3.1

 On 7/19/2010 10:40 AM, Bill Cox wrote:

We've made far less progress in the typing impairment group.  I
believe our top priority here should be enabling programming by voice
natively in Linux.  This would allow the many excellent programmers
with RSI injuries to join FOSS efforts to improve the rest of the
tools.

The first glimmer of hope I've seen in the programming by voice area
is from the Simon tool.  It depends on a non-free library to build
speech models, called HTK.  I recently upset a number of people by
recommending we write a FOSS version of that library.  However, some
knowledgeable people informed me that it is possible to build less
accurate voice models today using Sphinx, and it should not be too
hard to get them working with Simon.  It seems to me that this is
currently the most viable short-term path to creating Debian
compatible packages for programming by voice.

Obviously I have a different opinion. I think we should concentrate on the wrapper tools around a speech recognition core because we have one that works. Simon might be okay. Sphinx is not only less accurate but it has a smaller vocabulary (a couple of thousand words for good accuracy). You can't write comments, you can't create/edit/search/... symbols, and it's so restricted it's really only good for airline ticket reservations and weather reports.

My experience shows that if you are adapting speech recognition to programming, you're going about it the wrong way. You really should adapt programming to speech recognition. This experience came about as a result of my frustration with Web frameworks. They were completely unspeakable and required weeks upon weeks of editor modification to be able to work with a subset of the framework's capabilities, and it wasn't worth my time.

About the same time, I discovered a minimalist Web framework called Aether. It used a markup language which was remarkably friendly to speech recognition. The guy had no interest in expanding it so I took it over. It looks like this:


  (Required fields in [bold bold]).
[form  [id cust_info]
       [action javascript:wpsubmit(document.getElementById('custform'));]
       [method post]
[grid
[cell [bold Name:]][cell [Text [name customer][size 25]]]
[cell Company:][cell [Text [name company][size 25]]]
[cell Phone number:][cell [text [name phone][size 25]]]

[cell [bold E-mail address (for bundle delivery):]][cell [text [name eaddr][size 25]]]

[cell [button [name swp1] [value Get Started Now] [type submit]]]
]]]]

Almost everything here is something you can speak. A couple of things aren't because I was either careless or tired, but for the most part it's pretty close. If you want to see where this code fragment is used, go to


http://www.esjworks.com/packaged_services

Click on the left side link: 201 cmr 17.00, and look for the green box on the right-hand side. The code running the form looks like:


class esjworks_201fillin(esjworks_fillin):
    def __init__(self, name, reveal, do_auth, page_maker):

        esjworks_fillin.__init__(self, name, reveal, do_auth, page_maker)

#    def page_init(self, name, query):

#        """return gather CGI information, e-mail to me if successful in sending white paper."""


    def handle_swp1(self, name,query):

        syslog.syslog("201fillin page_init %s %s"%(name,str(query)))
        #validate inputs

        cust = query.has_key('customer') and query['customer'] is not None
        phone = query.has_key('phone') and query['phone'] is not None
        address = query.has_key('eaddr') and query['eaddr'] is not None
        syslog.syslog("cust, phone, address %s %s %s "%(cust, phone, address))
        #if cust and phone and address:
        if cust and address:
            if not query.has_key('company'):
                query['company'] = 'no company'

            if not query.has_key('phone'):
                query['phone'] = 'no phone'

            try:
                send_wp(query['eaddr'])
                s = smtplib.SMTP()
                s.connect()
                s.sendmail('address@hidden',
                           ['address@hidden',
                            'address@hidden'],
                           notify_message%query)
                s.close()
            except:

                syslog.syslog("swp failed")
                ajax_page = self.page_maker.fileserver.load(name+"/failsend")
                syslog.syslog("swp failed")

                http_page = self.page_maker.make_ajax_page(ajax_page, '200', self.cookie_output)

                raise
                #return http_page
            else:

                syslog.syslog("swp passed")
                ajax_page = self.page_maker.fileserver.load(name+"/oksend")

                http_page = self.page_maker.make_ajax_page(ajax_page, '200', self.cookie_output)

                return http_page
        else:

            syslog.syslog("swp arg failed 1")
            ajax_page = self.page_maker.fileserver.load(name+"/failarg")
            #syslog.syslog("swp arg failed 2")

            http_page = self.page_maker.make_ajax_page(ajax_page, '200', self.cookie_output)

            #syslog.syslog("swp arg failed 3")

            #syslog.syslog("swp arg page %s"% http_page)

            return http_page

Most of this code was written using speech recognition. If a common symbol is used that's not speech recognition friendly, it was cut-and-paste because I was too tired/cranky to think of a verbose form. Again, this is an example of adapting my code to speech recognition, which lets me write some code without busting the time bank on minimally productive or useful macros.

I'm fairly negative on specialized grammars or macros for these kinds of applications because they really don't work in practice. I think I spent the first five years of being disabled and using speech recognition creating and modifying macros to fit whatever my current situation was. I discovered that I was using the same utterances over and over again in different contexts and the tools couldn't cope. I discovered that I was spending as much if not more mental energy figuring out how to get the speech recognition system to produce the code I wanted from a relatively limited set of commands (i.e. 50 to 100) as I was figuring out what the code should look like in the first place.

It was this experience that drove me to the conclusion that when using speech recognition, you want to drive the user out of the application context to a speech recognition friendly environment where they can do their work and then return through a filter which converts that work into a context-specific form. In other words, convert code to plaintext, modify plaintext, convert plaintext back into code.
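
To make the round trip concrete, here is a minimal sketch of one such filter, limited to identifiers. It is only an illustration, not the tooling I actually use: the function names code_to_spoken and spoken_to_code are made up for this example, and it assumes identifiers are plain snake_case or camelCase.

```python
import re


def code_to_spoken(identifier):
    """Split a snake_case or camelCase identifier into lowercase words."""
    # Put a space at every lower-to-upper boundary, then treat
    # underscores as word separators as well.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", identifier)
    return " ".join(spaced.replace("_", " ").lower().split())


def spoken_to_code(phrase, style="snake"):
    """Join spoken words back into an identifier in the requested style."""
    words = phrase.lower().split()
    if not words:
        return ""
    if style == "snake":
        return "_".join(words)
    if style == "camel":
        return words[0] + "".join(w.capitalize() for w in words[1:])
    raise ValueError("unknown style: %s" % style)


if __name__ == "__main__":
    spoken = code_to_spoken("make_ajax_page")
    print(spoken)
    print(spoken_to_code(spoken, style="snake"))
```

The filter has to know which style to convert back into, which is the "context specific" part. The round trip is also lossy for names that do not follow either convention (runs of capitals, for example), so a real filter would need to remember the original spelling.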

Whenever I say this, inevitably some very-smart-person (TM) points out that I'm trying to solve the natural language programming problem and everyone has failed so far. And they are right, kind of. What I'm trying to do is use a limited grammar using natural language words and constructs to express code. Much smaller problem, much more easily handled and, if you do it right, editor independent. I know some people might frown on the thought of anyone using anything but Emacs but we all know there are a lot of perverts out there. :-)

I see what I have done with akasha as a small-scale proof of how you can express user-hostile markup notations using English words that can be easily spoken. I believe you can do the same with programming.

In trying to solve this problem, I think that everyone should throw away their keyboards and use speech recognition for everything. The pressure between what you want to do and what you can do helps us understand just how very different a speech driven interface is from a keyboard and/or mouse driven interface. There are two ways of responding to the pressure, as I illustrated above. You can either try to force the input device to work with a different UI model or you change the UI model to accommodate the input device. We have a long history of forcing speech recognition to work with a GUI model and have a very small amount of success to show for it. Taking the other approach is relatively unexplored territory. Most of the papers I saw a decade ago concentrated on speech user interfaces for cellular phones, i.e. Sphinx-scale, speaker-independent, small-vocabulary applications.

But even this technique of throwing away the keyboard won't work for everybody if they don't have some awareness of how their mind and body change when they use speech recognition. I remember when I first started using it, it felt all wrong and what I said were the wrong words. After a few months, I realized that the sensation I had in my mind was not an illusion but an awareness of my mind shifting from speaking spoken speech to speaking written speech. More specifically, from speaking continuous spoken speech to speaking discrete utterance written speech. It's that kind of awareness that will guide us through developing good speech user interfaces.
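
Part of why a bracketed word notation like the one above is practical is that a machine can read it with very little effort. The following is a rough sketch, not the actual akasha parser: parse_markup is a hypothetical name, and it assumes the notation is nothing more than whitespace-separated words nested in square brackets.

```python
def parse_markup(text):
    """Parse '[tag child child ...]' markup into nested Python lists."""
    root = []
    stack = [root]   # innermost open element is stack[-1]
    word = []        # characters of the word currently being read

    def flush():
        # Finish the current word, if any, and attach it to the open element.
        if word:
            stack[-1].append("".join(word))
            del word[:]

    for ch in text:
        if ch == "[":
            flush()
            node = []
            stack[-1].append(node)
            stack.append(node)
        elif ch == "]":
            flush()
            if len(stack) == 1:
                raise ValueError("unbalanced ']'")
            stack.pop()
        elif ch.isspace():
            flush()
        else:
            word.append(ch)
    flush()
    if len(stack) != 1:
        raise ValueError("unbalanced '['")
    return root


if __name__ == "__main__":
    print(parse_markup("[cell [bold Name:]][cell [Text [name customer][size 25]]]"))
```

Each bracketed element comes back as a list whose first item is the tag word, so turning the tree into HTML or anything else is a separate, ordinary step.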


But that's enough rambling for today.
