What Is Voice Commerce? – Perspectives and Challenges

11/03/2019 By Vasyl Tsyktor

Voice commerce, also known as V-commerce, is a way of shopping that lets customers make purchases by speaking voice commands to their mobile devices. Instead of typing keywords into a search box, users tell the system what they’re looking for on the Internet and get search results back.
Within the conversational commerce concept, V-commerce can rely on mobile devices such as tablets and smartphones, which use virtual personal assistants like Siri and Google Assistant to process voice commands. There is also dedicated hardware designed around voice control, including Amazon Echo, Google Home, and Apple HomePod.


Based on speech recognition technologies, voice commerce has great potential to change how customers make purchases online. Michael Dertouzos, a computer scientist and an early champion of the World Wide Web, predicted that speech recognition technologies would significantly expand the Internet.


In his 1997 book “What Will Be: How the New World of Information Will Change Our Lives”, Dertouzos claimed that these technologies would give people who cannot read or write access to the Internet. In his opinion, speech recognition would greatly improve the way we look for information online.

The emergence of voice devices

Long before modern voice assistants like Google Assistant, there were devices that could recognize a few spoken words and numbers. In 1961, IBM, the largest computer manufacturer at the time, introduced the Shoebox, a machine with built-in speech recognition. It could recognize 16 spoken words, including the digits from zero to nine. When someone said a number into the attached microphone, the machine lit the corresponding lamp on its housing.


The history of voice commerce began in May 2000 with the emergence of the VoiceXML markup language. At that time, the first VoiceXML version appeared at the World Wide Web Consortium (W3C). This markup language was supposed to bring the advantages of web development to interactive voice response (IVR) app development. VoiceXML allows developers to create software solutions capable of generating dynamic voice menus, with the dialogue logic described in XML documents that a server delivers to a voice browser.
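
To make the idea of a dynamic voice menu more concrete, here is a minimal sketch in Python that builds a small VoiceXML document on the server side. The `menu`, `prompt`, and `choice` elements and the namespace come from VoiceXML 2.0; the function name `build_menu`, the prompt text, and the URLs are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"  # namespace used by VoiceXML 2.0


def build_menu(prompt_text, choices):
    """Return a VoiceXML document string containing one menu.

    `choices` maps a spoken option to the URL of the next dialogue.
    Both are hypothetical values supplied by the caller.
    """
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    menu = ET.SubElement(vxml, "menu")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = prompt_text
    for spoken, next_url in choices.items():
        # Each <choice> pairs a phrase the caller may say with a target URL.
        choice = ET.SubElement(menu, "choice", {"next": next_url})
        choice.text = spoken
    return ET.tostring(vxml, encoding="unicode")


document = build_menu(
    "Say pizza or drinks.",
    {"pizza": "/menu/pizza.vxml", "drinks": "/menu/drinks.vxml"},
)
print(document)
```

Because the menu is generated from ordinary data, the same code could produce a different set of options for every caller, which is what makes the menu dynamic.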

Virtual assistants

The first voice-based virtual personal assistant that allowed users to make purchases via voice commands was the Speech Interpretation and Recognition Interface from Apple. Better known as Siri, this mobile system was introduced on October 4, 2011.

However, voice assistants only became a practical shopping channel about three years later, in 2014, when Amazon introduced the Echo smart speaker with Alexa. That year opened a new era for online purchases and can be considered the point at which the voice commerce concept took shape. In 2016, Google presented its own voice-based personal assistant, Google Assistant, designed for Android devices.

How voice commerce works

A typical purchase made with voice commands follows these steps:

  1. A mobile device picks up the user’s voice through a built-in microphone.
  2. Pre-installed software processes the voice command.
  3. The user receives a list of search results.
  4. The user places an order using voice commands.
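
The four steps above can be sketched as a small Python pipeline. This is only an illustration: the catalogue, the product names, and the helper functions are hypothetical, and the speech-to-text step is replaced by a placeholder that treats the "audio" as plain text.

```python
from dataclasses import dataclass


@dataclass
class Product:
    name: str
    price: float


# Hypothetical catalogue standing in for a real store's search backend.
CATALOGUE = [
    Product("paper towels, 6 rolls", 7.99),
    Product("paper towels, 12 rolls", 13.49),
    Product("dish soap", 3.25),
]


def transcribe(audio: bytes) -> str:
    """Step 2 placeholder: a real system would call a speech-to-text
    service here. In this sketch the 'audio' is just UTF-8 text."""
    return audio.decode("utf-8").lower()


def search(query: str) -> list[Product]:
    """Step 3: return products whose base name appears in the query."""
    return [p for p in CATALOGUE if p.name.split(",")[0] in query]


def place_order(results: list[Product], spoken_choice: str) -> Product:
    """Step 4: pick the result whose name contains the spoken choice."""
    for product in results:
        if spoken_choice.lower() in product.name:
            return product
    raise LookupError(f"no result matches {spoken_choice!r}")


captured = b"Order paper towels"          # step 1: microphone input
results = search(transcribe(captured))    # steps 2 and 3
order = place_order(results, "12 rolls")  # step 4
print(order)
```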

Let’s dive deeper into the technology behind voice recognition. It works in two main phases: converting human speech into text that a virtual assistant can interpret (speech-to-text), and generating the response as text that is then turned into human-sounding speech (text-to-speech).


How do bots understand what we tell them? This is possible thanks to speech-to-text technology: machines cannot interpret a raw voice signal directly, but they can process text. Software solutions such as Google Cloud Speech-to-Text let developers convert audio to text using neural networks and deep learning. Such systems can automatically detect the spoken language and convert both real-time streaming and prerecorded audio into text.


How do bots know what to say? They first generate answers in text form and then voice them using text-to-speech technology, sometimes provided by solutions like Amazon Polly. Special algorithms spell out all numbers as words and expand abbreviations. They then divide the text into separate phrases and read them with a natural intonation, paying attention to punctuation and idioms.
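
The text preparation described above (numbers into words, abbreviations expanded, text split into phrases) can be sketched in a few lines of Python. This is a toy version under clear assumptions: the abbreviation table is tiny, only integers from 0 to 99 are handled, and the function names are invented for this example. It is not how Amazon Polly or any specific product works internally.

```python
import re

# Tiny illustrative tables; a production system would use far larger ones.
ABBREVIATIONS = {"dr.": "doctor", "st.": "street", "approx.": "approximately"}
ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer from 0 to 99."""
    if n < 20:
        return ONES[n]
    tens, ones = divmod(n, 10)
    return TENS[tens] if ones == 0 else f"{TENS[tens]}-{ONES[ones]}"


def normalize(text: str) -> list[str]:
    """Expand abbreviations and numbers, then split into phrases."""
    words = []
    for token in text.split():
        lowered = token.lower()
        if lowered in ABBREVIATIONS:
            words.append(ABBREVIATIONS[lowered])
        elif re.fullmatch(r"\d{1,2}[.,!?]?", token):
            digits = token.rstrip(".,!?")
            # Keep any trailing punctuation attached to the spelled-out number.
            words.append(number_to_words(int(digits)) + token[len(digits):])
        else:
            words.append(token)
    # Phrase boundaries follow punctuation, which later guides intonation.
    return re.split(r"(?<=[.,!?])\s+", " ".join(words))


print(normalize("Deliver 2 pizzas to 42 Baker St. today, please."))
```

Expanding "St." before splitting matters here: otherwise the period of the abbreviation would be mistaken for the end of a phrase.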


To read the text properly, artificial intelligence algorithms compile a phonetic transcription for each word. To work out how to pronounce a certain word and where to place the stress, the system uses dictionaries. If a word is missing from the dictionary, the machine creates a transcription based on general pronunciation rules.
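
A minimal sketch of this dictionary-first lookup with a rule-based fallback might look as follows in Python. The lexicon entries, the informal phoneme symbols, and the fallback rules are all assumptions chosen for illustration; real systems use much larger dictionaries and trained models.

```python
# Hypothetical pronunciation dictionary:
# word -> (phonemes, index of the stressed syllable).
# The phoneme symbols are informal and chosen only for illustration.
LEXICON = {
    "pizza": (["p", "ee", "t", "s", "uh"], 0),
    "order": (["aw", "r", "d", "er"], 0),
}

# Naive fallback rules: try two-letter combinations first, then single letters.
DIGRAPHS = {"sh": "sh", "ch": "ch", "th": "th", "ee": "ee", "oo": "oo"}


def transcribe_word(word: str):
    """Return (phonemes, stress_index, source) for one word."""
    word = word.lower()
    if word in LEXICON:
        phonemes, stress = LEXICON[word]
        return phonemes, stress, "dictionary"
    phonemes, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            phonemes.append(DIGRAPHS[pair])
            i += 2
        else:
            phonemes.append(word[i])
            i += 1
    # Without dictionary data, default to stressing the first syllable.
    return phonemes, 0, "rules"


print(transcribe_word("pizza"))   # found in the dictionary
print(transcribe_word("cheese"))  # built from the fallback rules
```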


To choose the right intonation, the system calculates the required number of frames, each 25 ms long. Each frame has its own parameters: phoneme, position, and syllable. To read the complete text, voice-based systems use an acoustic model that maps phonemes with certain characteristics to sounds. The acoustic model contains information on how the system should pronounce each phoneme. Because it is based on machine learning, the model can improve over time: the more data it uses to read text, the better the final result.
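
As a rough illustration of the frame calculation, the Python sketch below splits phonemes into 25 ms frames and tags each frame with the three parameters mentioned above. The phoneme durations are assumed values, and the `Frame` class and `frames_for` function are hypothetical names, not part of any real text-to-speech library.

```python
import math
from dataclasses import dataclass

FRAME_MS = 25  # frame length stated in the text


@dataclass
class Frame:
    phoneme: str
    position: int  # index of this frame within its phoneme
    syllable: int  # index of the syllable the phoneme belongs to


def frames_for(phoneme: str, duration_ms: float, syllable: int) -> list[Frame]:
    """Split one phoneme of a given (assumed) duration into 25 ms frames."""
    count = math.ceil(duration_ms / FRAME_MS)
    return [Frame(phoneme, i, syllable) for i in range(count)]


# Assumed durations in milliseconds for the phonemes of one syllable.
utterance = [("p", 60, 0), ("ee", 140, 0)]
frames = [f for ph, ms, syl in utterance for f in frames_for(ph, ms, syl)]
print(len(frames))  # 3 frames for "p" plus 6 frames for "ee"
```

An acoustic model would then predict the sound for each frame from these parameters.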


Voice-based Ecommerce has been around for 10 years now. Since its emergence, it has changed significantly. Today, we can order food, call a taxi, and shop online without even touching our smartphone screens. With just an “Ok Google” voice command, you can do what used to require browsing through several pages. Let’s consider some voice commerce examples.

Ordering food

Google Home

The era of voice commerce is even closer than it seems. You might have used voice commands to search for information on the Internet. According to a 2019 survey by Stone Temple, 58% of mobile users sometimes use voice search. One of the things mobile device owners can do with voice commands is order food.


V-commerce is, in a sense, the next generation of chatbots, which provide users with a conversational interface where they can order pizza by typing what they need. Unlike text-based AI chatbots, voice control relies on bots that understand human speech instead of text. For example, Google Home supports Domino’s pizza ordering. With this device, you can place your favorite order or repeat the pizzas you ordered last time. To place an order, you just need to tell Google Home you want pizza.

Online shopping

Amazon Echo Dot

V-commerce significantly changes the way we make purchases online. It expands our typical online shopping experience and creates new opportunities for content optimization for Ecommerce businesses. According to Google, 70% of all search queries in Google Assistant sound like natural speech. That’s why customers can use ordinary phrases like “I’d like to buy shoes” instead of awkward keyword strings like “buy shoes online Minnesota”.


Amazon Echo Dot allows users to make purchases on Amazon.com with voice commands like “Alexa, order paper towels”. Alexa will search for paper towels in your order history, prepare an order, and ask you to confirm it. To buy online with Amazon Echo, you must have an Amazon account, saved payment information, and an Amazon Prime membership.

Market size

V-commerce is what Ecommerce businesses should think about right now to benefit from the emerging technology. The voice commerce market is expected to grow rapidly in the next few years. The international consulting company OC&C estimated that sales revenue from voice services would reach $40 billion in the U.S. and $5 billion in the UK by 2022. These figures would equal 6% and 3% of overall online spending respectively.


With the growing popularity of voice-based Ecommerce, various vendors are starting to implement voice control in their solutions. The international payment system Mastercard has a product called Masterpass that is widely used in chatbots. This solution is a payment gateway that enables online payments within conversational interfaces. According to Ann Cairns, vice chairman at Mastercard, the company is planning to integrate Masterpass into voice assistants like Google Assistant and Amazon Alexa.


“Voice offers a unique opportunity for business to deliver faster, easier and more convenient experiences,” says Ann Cairns, vice chairman at Mastercard, New York.

It seems that voice messages are becoming more and more popular. Stone Temple states that users prefer recording their speech to typing long text messages. Frankly, it annoys me, since I can’t listen to what someone wants to tell me when I’m outside and have left my headphones at home. Obviously, people are getting lazier because speaking is easier than typing. In any case, voice commerce has a few important advantages over text-based chatbots.


In a typical scenario, you have to use both hands to make purchases online, whether you use a mobile device or a laptop. In addition, you have to focus on the shopping process if you want to do it fast and buy exactly what you need. With V-commerce, you can buy an item on the go while walking down the street, washing up, or even driving.


According to the 2018 report “Is there anyone here? Conversation commerce gains voice” published by Mastercard, convenience is the key driver of voice-based services. Furthermore, the above-mentioned Stone Temple survey shows that nearly half of users prefer voice services because speaking is accurate and they don’t need to type.


Making purchases with chatbots requires typing. It can take up to a couple of minutes to type a long message. Dictating what you want to say is definitely much faster than texting. For example, I can type up to 40 words per minute but speak about 120-160 words in the same time. Therefore, speaking is 3-4 times faster than typing. The report by Stone Temple also supports this: about 70% of users appreciate voice services because they’re fast.
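
The comparison is simple arithmetic, and a short Python sketch makes it explicit. The typing and speaking rates are the ones quoted above; the 80-word message length is an assumed example value.

```python
TYPING_WPM = 40            # typing speed quoted in the text
SPEAKING_WPM = (120, 160)  # speaking speed range quoted in the text


def minutes_needed(words: int, wpm: int) -> float:
    """Time in minutes to produce `words` words at `wpm` words per minute."""
    return words / wpm


# How many times faster speaking is at each end of the range.
speedup = tuple(rate / TYPING_WPM for rate in SPEAKING_WPM)
print(speedup)  # (3.0, 4.0)

message = 80  # words in a hypothetical long message
print(minutes_needed(message, TYPING_WPM))       # 2.0
print(minutes_needed(message, SPEAKING_WPM[1]))  # 0.5
```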


Voice commerce is a relatively new concept that is still in its development phase. According to Stone Temple, only 15% of men and 6% of women have ever used voice commands for shopping, while nearly half of all users have used this feature for texting. The reason may be that the technology isn’t mature enough. However, there are some other challenges that can make users avoid V-commerce.

Personal data privacy

Modern Internet users are aware of conversational and voice commerce. According to the above-mentioned study conducted by Mastercard and Future Agenda, 87% of Americans have heard about voice assistants and chatbots, while 66% have used either chatbots or voice commands. In Europe, 20% of users have made a purchase using a voice or text assistant.


However, the study also showed that many customers worry about their privacy when using voice commands, especially when it comes to making payments online. The lack of data privacy can become the main obstacle to the wide adoption of V-commerce.

Lack of capabilities

Despite the existence of various voice assistants and smart speakers, the capabilities of current voice commerce are still poor. For example, with Google Home, users don’t have access to the same pizza ordering features that they have on Domino’s website. They can only buy the same pizza they ordered last time or place their most frequent order. They can’t buy something new, change the delivery address, or use a different payment method.


Amazon Echo Dot has similar disadvantages. To enter your credit card information or set where you want to receive your order, you have to use a separate mobile app on your smartphone, which still requires you to type. Like Google Home, Amazon Alexa lets users buy what they have purchased before. As an option, you can also choose among items in the Prime Now list. Unfortunately, voice assistants can’t fully replace chatbots or mobile apps as of now.

Final thoughts

Voice commerce is only starting to gain momentum in improving the customer shopping experience. Voice assistants significantly simplify our typical buying process and allow us to make frequent purchases in a fast and convenient way. You can buy your favorite pink paper towels while driving, as if you were standing in front of a shop assistant and talking to them.


However, V-commerce can’t exist on its own without mobile apps or chatbots, at least for now. You can’t yet find and buy anything you want on the Internet using only voice commands. Unfortunately, there are too many limitations, including the requirement that a particular online store or booking service support voice control. Still, this is just the beginning, and many more capabilities are likely to arrive in the next few years.