Yes, non-keyboard input exists.
I've been taking my cues from the Far East (or, as one of my friends puts it, the western Pacific) and started with OCR:
* first UPC (Universal Product Code) and EAN (European Article Number) bar codes
* then QR (Quick Response) codes and variants
* finally item/product images
Note: very very big in Asia.
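Part of why bar codes make a low-fuzz starting point is that the code checks itself. A minimal sketch of the standard EAN-13 check digit test follows (UPC-A is the same thing with a leading zero). The function names are made up for illustration, and this only validates a string of digits that some scanner has already read; it is not a scanner.

```python
def ean13_is_valid(code: str) -> bool:
    """Return True if `code` is 13 digits with a correct EAN-13 check digit."""
    if len(code) != 13 or not (code.isascii() and code.isdigit()):
        return False
    digits = [int(ch) for ch in code]
    # Weights alternate 1, 3, 1, 3, ... starting from the leftmost digit.
    # A code is valid when the weighted sum (check digit included) ends in 0.
    total = sum(d * (3 if i % 2 else 1) for i, d in enumerate(digits))
    return total % 10 == 0


def upca_is_valid(code: str) -> bool:
    """A 12-digit UPC-A code is an EAN-13 code with a leading zero."""
    return len(code) == 12 and ean13_is_valid("0" + code)
```

A misread digit almost always breaks the check, so the site can ask for a rescan instead of guessing. Voice input has no equivalent built in.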
Voice is simply another step along the way. A great big step, a step that can be right off a cliff...
The article is quite correct in that natural language queries are a common interim step. The advantage of a text NLQ input is that there is relatively little fuzziness; audio has a fascinatingly frustrating level of fuzziness (tone, accent, noise, etc.) that must be filtered out before the query itself can be identified (hopefully).
From my first post on the subject, albeit very broad and wishing on a star:
I have been testing a variation on an automated chat application - totally computer generated FAQ and Glossary pages in either/both text and audio. The idea being that the user can request greater detail or simplification or clarification and drill in and out as simply as zoom on a maps app.
I especially like the idea of a web page which allows you to zoom in and out on each idea, term, graphic, etc. to the capacity of the site DB and then go searching the web via linked further references or external SE with the instantaneous ability to pull back and continue reading the original content. The page as content DB interface.
And the SE relegated back to a search and fetch tool.
Over the years, feature sets have been implemented, if not to the full degree mentioned. Definitely still a work in progress.
As a unilingual English speaker with sites available in three other languages, I find text input and response fairly straightforward (hire translators!). Voice is not. Almost all the research being done, and certainly all the work I can attempt to play with or build from, is in English. So, while I have an almost viable audio search input with text and/or audio for my English sites, that may never happen for the others. By almost viable I mean 100% correct 60% of the time and 80% correct 95% of the time. Not quite up to the cutting edge but well on its way: usable, but not quite ready for live site testing. For that I want 100% correct 80% of the time and 80% correct 99% of the time. Regardless, it has come a long way since that 2008 post.
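To make those targets concrete, here is a rough sketch of how such a benchmark could be scored. It assumes each test query gets a score between 0 and 1 (say, the fraction of words recognized correctly) and reads "80% correct 95% of the time" as "at least 95% of queries score 0.8 or better". Both the scoring and that reading are assumptions, as are all the names below.

```python
def share_at_least(scores, threshold):
    """Fraction of queries whose accuracy score is >= threshold."""
    if not scores:
        return 0.0
    return sum(1 for s in scores if s >= threshold) / len(scores)


def meets_targets(scores, targets):
    """True if every (accuracy, share_of_queries) target is met."""
    return all(share_at_least(scores, acc) >= share for acc, share in targets)


# (accuracy, share of queries) pairs taken from the figures in the post.
ALMOST_VIABLE = [(1.0, 0.60), (0.8, 0.95)]
LIVE_TEST_READY = [(1.0, 0.80), (0.8, 0.99)]

# Hypothetical benchmark run: 60 perfect, 35 mostly right, 5 poor.
scores = [1.0] * 60 + [0.85] * 35 + [0.5] * 5
print(meets_targets(scores, ALMOST_VIABLE))    # True
print(meets_targets(scores, LIVE_TEST_READY))  # False
```

Two thresholds rather than one average matters here: an average hides the handful of badly mangled queries that are exactly what makes users give up.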
Fortunately, voice is not necessary. And often, where it might be a great convenience, it can be short commands rather than full natural language queries. This abridged version is in live site testing in all four languages, but only in conjunction with the limited OCR mentioned above and/or primary navigation, and NOT in site search, for the reasons above.
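A short-command setup is much more forgiving than free-form queries because the recognizer's output only has to land near one of a handful of known words. A minimal sketch, assuming the speech-to-text step has already produced a transcript string, could use Python's standard `difflib` for the fuzzy match. The command list, the `nav:` action names, and the cutoff value are all hypothetical.

```python
import difflib

# Hypothetical command vocabulary mapped to hypothetical site actions.
COMMANDS = {
    "home": "nav:home",
    "back": "nav:back",
    "search": "nav:search",
    "cart": "nav:cart",
    "checkout": "nav:checkout",
}


def match_command(transcript, cutoff=0.75):
    """Map a (possibly misheard) transcript to a site action, or None."""
    word = transcript.strip().lower()
    # get_close_matches returns the best candidates scoring >= cutoff.
    hits = difflib.get_close_matches(word, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[hits[0]] if hits else None


print(match_command("Home"))
print(match_command("chekout"))
print(match_command("what is the weather like today"))
```

Anything that does not come close to a known command falls through to `None`, which is the cue to show the normal navigation rather than guess. Each additional language would need its own vocabulary table.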
Critical Note: with personal audio software, e.g. Dragon NaturallySpeaking, or hardware, e.g. Siri, Alexa, fine tuning is time worth spending. However, with ye olde average web site, no one is going to spend the time to calibrate their voice with the site's recognition software. And so comparable results are unlikely or very very difficult. Apps are sort of a middle case and where I'm doing most of my live benchmarking (in return for specials).
HAL is a long long long way away. But then so is a manned mission to Jupiter.