
Cre8asiteforums Internet Marketing
and Conversion Web Design


Voice Ai - What Do We Do With It?

4 replies to this topic

#1 cre8pc


    Dream Catcher Forums Founder

  • Admin - Top Level
  • 14,819 posts

Posted 19 March 2017 - 03:55 PM

There is so much in this piece that I enjoyed. I thought some of you might like to see it too.


Voice and the uncanny valley of AI


So, when I said that voice input 'works', what this means is that you can now use an audio wave-form to fill in a dialogue box - you can turn sound into text and text (from audio or, of course, from chatbots, which were last year's Next Big Thing) into a structured query, and you can work out where to send that query. The problem is that you might not actually have anywhere to send it. You can use voice to fill in a dialogue box, but the dialogue box has to exist - you need to have built it first. You have to build a flight-booking system, and a restaurant booking system, and a scheduling system, and a concert booking system - and anything else a user might want to do, before you can connect voice to them. Otherwise, if the user asks for any of those, you will accurately turn their voice into text, but not be able to do anything with it - all you have is a transcription system. And hence the problem - how many of these queries can you build? How many do you need? Can you just dump them to a web search or do you need (much) more?
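A minimal Python sketch of the point the article makes. Once speech has been turned into text, that text still has to be routed to a handler someone has already built. Anything without a handler falls through to a plain web search. The regex patterns, the handler names (`book_flight`, `book_table`) and the fallback string are all hypothetical stand-ins. A real system would use a trained intent classifier rather than regular expressions.

```python
import re
from typing import Callable, Dict

# Hypothetical handlers: each stands in for a "dialogue box" that has to be built first.
def book_flight(slots: Dict[str, str]) -> str:
    return f"flight search: {slots.get('dest', '?')}"

def book_table(slots: Dict[str, str]) -> str:
    return f"restaurant booking: {slots.get('party', '?')} people"

# Each (pattern, handler) pair turns a transcript into a structured query with named slots.
INTENTS = [
    (re.compile(r"\bfly to (?P<dest>\w+)", re.IGNORECASE), book_flight),
    (re.compile(r"\btable for (?P<party>\d+)", re.IGNORECASE), book_table),
]

def route(transcript: str) -> str:
    """Send already-transcribed text to the first handler whose pattern matches."""
    for pattern, handler in INTENTS:
        match = pattern.search(transcript)
        if match:
            return handler(match.groupdict())
    # No handler exists for this request: all we have is a transcription,
    # so the best we can do is dump it to a web search.
    return f"web search: {transcript}"
```

For example, `route("I want to fly to Paris")` reaches the flight handler. `route("play some jazz")` only ever becomes a web search, because nobody built a music handler.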


Edited by cre8pc, 19 March 2017 - 03:55 PM.

#2 iamlost


    The Wind Master

  • Site Administrators
  • 5,517 posts

Posted 19 March 2017 - 05:44 PM

Yes, non-keyboard input exists.

I've been taking my cues from the Far East (or as one of my friends puts it, the western Pacific :)) and started with OCR:
* first UPC (Universal Product Codes) and EAN (European Article Numbers) bar codes
* then QR (Quick Response Codes) and variants
* finally item/product images
Note: very very big in Asia.

Voice is simply another step along the way. A great big step, a step that can be right off a cliff...

The article is quite correct in that natural language queries are a common interim step. The advantage of a text NLQ input is that there is relatively little fuzziness; audio has a fascinatingly frustrating level of fuzziness (tone, accent, noise, etc.) that must be filtered out before the query itself can be identified (hopefully).

From my first post on the subject, albeit very broad and wishing on a star:

I have been testing a variation on an automated chat application: totally computer-generated FAQ and Glossary pages in either/both text and audio. The idea is that the user can request greater detail or simplification or clarification and drill in and out as simply as zooming on a maps app.
I especially like the idea of a web page which allows you to zoom in and out on each idea, term, graphic, etc. to the capacity of the site DB and then go searching the web via linked further references or an external search engine (SE), with the instantaneous ability to pull back and continue reading the original content. The page as content DB interface.

And the SE relegated back to a search and fetch tool.

Over the years, parts of this feature set have been implemented, if not to the full degree described. Definitely still a work in progress.

As a unilingual English comprehender with sites available in three other languages, I find text input and response fairly straightforward (hire translators!). Voice is not. Almost all the research being done, and certainly all the work I can attempt to play with/from, is in English. So, while I have got an almost viable audio search input with either/both text/audio for my English sites, that may never happen for the others. By almost viable I mean 100% correct 60% of the time and at least 80% correct 95% of the time. Not quite up to the cutting edge but well on its way: usable, but not quite ready for live site testing. For that I want 100% correct 80% of the time and 80% correct 99% of the time. Regardless, it has come a long way since that 2008 post. :)
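One way to put numbers like "100% correct 60% of the time" on a test set, sketched in Python. The post does not say how "correct" is measured. This sketch assumes it means per-query word accuracy: 1 minus the word error rate, computed with word-level edit distance. The metric and the function names (`word_accuracy`, `share_at_least`, `meets_target`) are assumptions for illustration only.

```python
from typing import Sequence, Tuple

def word_accuracy(reference: str, hypothesis: str) -> float:
    """Assumed metric: 1 - WER, using word-level Levenshtein distance, floored at 0."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j] (one row at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # deletion
                         cur[j - 1] + 1,      # insertion
                         prev[j - 1] + cost)  # substitution or match
        prev = cur
    return max(0.0, 1.0 - prev[-1] / len(ref))

def share_at_least(accuracies: Sequence[float], threshold: float) -> float:
    """Fraction of queries whose accuracy is >= threshold ("X% correct Y% of the time")."""
    if not accuracies:
        return 0.0
    return sum(a >= threshold for a in accuracies) / len(accuracies)

def meets_target(accuracies: Sequence[float],
                 targets: Sequence[Tuple[float, float]]) -> bool:
    """targets is a list of (accuracy_threshold, required_share) pairs."""
    return all(share_at_least(accuracies, t) >= s for t, s in targets)
```

Given per-query accuracies from a test run, `meets_target(accs, [(1.0, 0.80), (0.8, 0.99)])` checks the bar described above for live site testing. `[(1.0, 0.60), (0.8, 0.95)]` checks the current "almost viable" level.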

Fortunately, voice is not necessary. And often, where it might be a great convenience, it can take the form of short commands rather than full natural language queries. This abridged version is in live site testing in all four languages, but only in conjunction with the limited OCR mentioned above and/or primary navigation, and NOT in site search, for the reasons above.
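A rough Python sketch of the short-command approach. Rather than parsing a full natural language query, it matches the transcript against a small per-language vocabulary of navigation commands and gives up if nothing is close enough. The command tables, the French entries, the target paths and the 0.75 cutoff are all hypothetical. The fuzzy matching uses `difflib.get_close_matches` from the Python standard library.

```python
import difflib
from typing import Optional

# Hypothetical per-language tables: a handful of short phrases mapped to navigation targets.
COMMANDS = {
    "en": {"home": "/", "search": "/search", "glossary": "/glossary", "back": "history:back"},
    "fr": {"accueil": "/", "recherche": "/search", "glossaire": "/glossary", "retour": "history:back"},
}

def match_command(transcript: str, lang: str, cutoff: float = 0.75) -> Optional[str]:
    """Map a short spoken command to a navigation target, or None if nothing is close."""
    table = COMMANDS.get(lang, {})
    words = transcript.lower().split()
    # Try the whole utterance first, then each word, so "go home" still finds "home".
    for candidate in [" ".join(words), *words]:
        hits = difflib.get_close_matches(candidate, list(table), n=1, cutoff=cutoff)
        if hits:
            return table[hits[0]]
    # Outside the small vocabulary: refuse rather than guess.
    return None
```

A small vocabulary is what keeps this tolerable across languages. A misheard "glosary" still lands on "glossary", while anything outside the table returns `None` instead of a wrong page.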

Critical Note: with personal audio software, e.g. Dragon NaturallySpeaking, or hardware, e.g. Siri, Alexa, fine-tuning is time well spent. However, with ye olde average web site, no one is going to spend the time to calibrate their voice with the site's recognition software. So comparable results are unlikely, or at least very, very difficult to achieve. Apps are a sort of middle case, and that is where I'm doing most of my live benchmarking (in return for specials :)).

HAL is a long long long way away. But then so is a manned mission to Jupiter. :)



#3 EGOL

  • Hall Of Fame
  • 6,404 posts

Posted 19 March 2017 - 07:50 PM

I am surprised that some of my old fart friends and relatives are using voice input. If you watch them on desktop machines, which they have been using for years, they seem like noobs. But they are using voice input to search and text.

Edited by EGOL, 19 March 2017 - 07:50 PM.

#4 wiser3


    Mach 1 Member

  • 250 Posts Club
  • 282 posts

Posted 20 March 2017 - 11:47 AM

I remember when, in the '90s, computers got fast enough to record decent-quality digital sound. All the experts predicted that keyboards and mice would disappear in a couple of years as everything switched to voice command. They reasoned that since computer speed would double in 18 months, and writing the speech recognition software would take only a few months more, voice control was about to take over everything. They were clueless about the difficulties of deciphering speech, separating speech from other sounds in the room, etc.


Now, 25 years later, speech recognition is just becoming good enough to be useful.

#5 iamlost


    The Wind Master

  • Site Administrators
  • 5,517 posts

Posted 20 March 2017 - 04:03 PM

One rather critical point to keep in mind with speech recognition is that it is computationally heavy: web voice control is currently, and will be for the foreseeable future, uploaded to servers rather than handled on the client. That means that what you say will be held indefinitely by others. It will also be subject to hacking (to get past your voice access security, C3), ad personalisation, governmental whatever, etc.


I've been playing with several of the popular home products, and getting access is child's play. It will likely only get worse before it gets better. Just try not to be among the catastrophes and miseries that drive eventual change.
