

Discussing Web Design & Marketing Since 1998

  • Announcements

    • cre8pc

      Thank you! Cre8asiteforums 1998 - 2018   01/18/2018

      Internet Marketing Ninjas released many of the online forums they had acquired, such as WebmasterWorld, SEOChat, several DevShed properties and these forums, back to their founders. You will notice a new user interface for Cre8asiteforums: the software was upgraded, and it was moved to a new server. Thank you for your support as we turn 20 years old.

Voice Ai - What Do We Do With It?


There is so much in this piece that I enjoyed. I thought some of you might like to see it too.


Voice and the uncanny valley of AI


So, when I said that voice input 'works', what this means is that you can now use an audio wave-form to fill in a dialogue box - you can turn sound into text and text (from audio or, of course, from chatbots, which were last year's Next Big Thing) into a structured query, and you can work out where to send that query. The problem is that you might not actually have anywhere to send it. You can use voice to fill in a dialogue box, but the dialogue box has to exist - you need to have built it first. You have to build a flight-booking system, and a restaurant booking system, and a scheduling system, and a concert booking system - and anything else a user might want to do, before you can connect voice to them. Otherwise, if the user asks for any of those, you will accurately turn their voice into text, but not be able to do anything with it - all you have is a transcription system. And hence the problem - how many of these queries can you build? How many do you need? Can you just dump them to a web search or do you need (much) more?
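To make the article's point concrete, here is a toy sketch of the pipeline it describes: transcript, then a structured query (an intent), then a handler that someone must already have built. It is a minimal illustration assuming transcription has already happened. The intent names, keyword rules and handler functions are all hypothetical, and real systems use far better classifiers than keyword matching.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical back-end systems. Each one stands in for a "dialogue box"
# that somebody had to build before voice could be connected to it.
def book_flight(query: str) -> str:
    return "flight-booking: " + query

def book_restaurant(query: str) -> str:
    return "restaurant-booking: " + query

# Hypothetical keyword rules that turn a transcript into a structured
# query (here just an intent name). "concert" is recognised...
INTENT_KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "flight": ("flight", "fly"),
    "restaurant": ("table", "restaurant", "dinner"),
    "concert": ("concert", "gig"),
}

# ...but nobody has built a concert booking system, so it has no handler.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "flight": book_flight,
    "restaurant": book_restaurant,
}

def classify(transcript: str) -> Optional[str]:
    """Map transcribed text to an intent name, or None if nothing matches."""
    words = transcript.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(keyword in words for keyword in keywords):
            return intent
    return None

def route(transcript: str) -> str:
    """Send the query to a handler if one exists, else fall back to web search."""
    intent = classify(transcript)
    handler = HANDLERS.get(intent) if intent is not None else None
    if handler is None:
        # Accurate transcription, nowhere to send it: dump it to web search.
        return "web-search: " + transcript
    return handler(transcript)

print(route("Book a flight to Paris"))       # flight-booking: Book a flight to Paris
print(route("Two tickets for the concert"))  # web-search: Two tickets for the concert
```

The concert request is understood (it gets an intent) but still ends up as a web search, which is exactly the "all you have is a transcription system" problem: coverage depends on how many handlers exist, not on how good the speech recognition is.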


Edited by cre8pc


Yes, non-keyboard input exists.

I've been taking my cues from the Far East (or as one of my friends puts it, the western Pacific :)) and started with OCR:
* first UPC (Universal Product Codes) and EAN (European Article Numbers) bar codes
* then QR (Quick Response Codes) and variants
* finally item/product images
Note: very very big in Asia.

Voice is simply another step along the way. A great big step, a step that can be right off a cliff...

The article is quite correct in that natural language queries are a common interim step. The advantage of text NLQ input is that there is relatively little fuzziness; audio has a fascinatingly frustrating level of fuzziness (tone, accent, noise, etc.) that must be filtered out before the query itself can be identified (hopefully).

From my first post on the subject, albeit very broad and wishing on a star:

I have been testing a variation on an automated chat application - totally computer generated FAQ and Glossary pages in either/both text and audio. The idea being that the user can request greater detail or simplification or clarification and drill in and out as simply as zoom on a maps app.
I especially like the idea of a web page which allows you to zoom in and out on each idea, term, graphic, etc. to the capacity of the site DB and then go searching the web via linked further references or external SE with the instantaneous ability to pull back and continue reading the original content. The page as content DB interface.

And the SE relegated back to a search and fetch tool.

Over the years, parts of that feature set have been implemented, if not to the full degree described. Definitely still a work in progress.

I'm a unilingual English comprehender with sites available in three other languages. For those, text input and response is fairly straightforward (hire translators!). Voice is not. Almost all the research being done, and certainly all the work I can attempt to play with/from, is in English. So, while I have an almost viable audio search input with either/both text/audio for my English sites, that may never happen for the others. By almost viable I mean 100% correct 60% of the time and 80% correct 95% of the time. Not quite up to the cutting edge but well on its way; usable, but not quite ready for live site testing. For that I want 100% correct 80% of the time and 80% correct 99% of the time. Regardless, it has come a long way since that 2008 post. :)
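One way to pin down figures like "100% correct 60% of the time, 80% correct 95% of the time": score each query by word accuracy (1 minus the word error rate, clipped at zero), then ask what share of queries reach each accuracy level. This is an assumption; the post doesn't say how "correct" is measured. The function names and the sample transcripts below are made up for illustration.

```python
from typing import List, Sequence, Tuple

def word_edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over words (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 cost if words match)
            )
        prev = curr
    return prev[-1]

def word_accuracy(reference: str, hypothesis: str) -> float:
    """1 - WER for one query, clipped at 0 (WER can exceed 1 with insertions)."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - word_edit_distance(ref, hyp) / len(ref))

def share_at_least(pairs: List[Tuple[str, str]], threshold: float) -> float:
    """Fraction of (reference, recognised) pairs with accuracy >= threshold."""
    hits = sum(1 for ref, hyp in pairs if word_accuracy(ref, hyp) >= threshold)
    return hits / len(pairs)

def meets_target(pairs: List[Tuple[str, str]],
                 targets: List[Tuple[float, float]]) -> bool:
    """Each target is (accuracy level, required share of queries)."""
    return all(share_at_least(pairs, level) >= share for level, share in targets)

# Targets taken from the post; sample pairs are hypothetical.
ALMOST_VIABLE = [(1.0, 0.60), (0.80, 0.95)]
LIVE_TESTING = [(1.0, 0.80), (0.80, 0.99)]
SAMPLE_PAIRS = [
    ("find red shoes", "find red shoes"),
    ("show me the sale page for shoes", "show me the sail page for shoes"),
    ("open my cart", "open my cart"),
    ("search blue jackets", "search blue jackets"),
    ("go to checkout now", "go checkout"),
]

print(share_at_least(SAMPLE_PAIRS, 1.0), share_at_least(SAMPLE_PAIRS, 0.8))  # 0.6 0.8
```

Expressing each target as a list of (level, share) pairs makes the jump from "almost viable" to "ready for live testing" a one-line change rather than new code.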

Fortunately, voice is not necessary. And often where it might be a great convenience it can be short commands rather than full natural language queries. This abridged version is in live site testing, in all four languages. But only in conjunction with the limited OCR mentioned above and/or primary navigation and NOT in site search for reasons above.

Critical note: with personal audio software (e.g. Dragon NaturallySpeaking) or hardware (e.g. Siri, Alexa), fine-tuning is time worth spending. However, with ye olde average web site, no one is going to spend the time to calibrate their voice with the site's recognition software, so comparable results are unlikely or very, very difficult. Apps are sort of a middle case and are where I'm doing most of my live benchmarking (in return for specials :))

HAL is a long long long way away. But then so is a manned mission to Jupiter. :)



I am surprised that some of my old fart friends and relatives are using voice input. If you watch them on desktop machines, which they have been using for years, they seem like noobs. But, they are using voice input to search and text.

Edited by EGOL


I remember in the '90s when computers got fast enough to record sound digitally at decent quality. All the experts predicted that keyboards and mice would disappear within a couple of years as everything switched to voice command. They reasoned that since computer speed doubled every 18 months, and writing the speech recognition software would take only a few more months, voice control was about to take over everything. They were clueless about the difficulties of deciphering speech, separating speech from other sounds in the room, etc.


Now, 25 years later, speech recognition is just becoming good enough to be useful.


One rather critical point to keep in mind with speech recognition is that it is computationally heavy: web voice control is currently, and will be for the foreseeable future, uploaded to servers rather than handled on the client. That means that what you say will be held indefinitely by others. It will also be subject to hacking (to get at your voice access security, C3), ad personalisation, governmental whatever, etc.


I've been playing with several of the popular home products, and access is child's play. It will likely only get worse before it gets better. Just try not to be among the catastrophes and miseries that drive eventual change.
