Speech recognition

Why is it crucial in the video era?

Video in marketing strategies is no longer a novelty but a necessity, given the way users prefer to consume content. However, the rapid adoption of this highly effective, multipurpose format brings considerable challenges for marketers, such as the need to provide subtitles. Subtitling a video is vital for maximising the audience it can reach, but it may involve high costs.

From this point of view, speech recognition technologies may provide valuable support for the generation of automatic subtitles, especially if integrated with a Digital Asset Management solution.


  1. The continual expansion of videos and their key role in marketing
  2. The importance of automatic subtitles: efficient processes and maximum accessibility
  3. Speech recognition: definition and rapid survey of state of the art, from Audrey to Siri
  4. AI at the service of marketers and their videos
  5. The importance of having a digital overview

1. The continual expansion of videos and their key role in marketing

First of all, we need to understand why videos are so important and thus why systems equipped with speech recognition can make a crucial difference for businesses.

According to the statistics, 99% of marketing strategies include video content¹ of some kind (promotional, corporate, tutorial, webinars) intended for a wide variety of channels (websites, e-commerce, social media, online or offline advertising campaigns). There are good reasons for this choice: in 2021 YouTube was the world’s second most visited website² and it attracts more than 2 billion active users every month³. Moreover, 84% of consumers have admitted to being persuaded to make a purchase after viewing an explanatory video about a product or service⁴. Content of this kind does not only attract large numbers of viewers on YouTube, the best known video channel. Videos are also playing an increasingly central role on the trendiest social media channels (Instagram and TikTok above all), and live streaming platforms (such as Twitch) are growing fast. What’s more, according to 59% of the marketers interviewed in a DataBox study⁵, sponsored posts on Facebook that contain videos perform better than those with images alone.

The trend is also clear within organisations themselves. Companies’ activities, from daily meetings to regular training courses, are increasingly taking place digitally. One major benefit, amongst others, is that every exchange of information can be recorded and made available later. However, the exponential growth in the number of videos in circulation has led to major challenges for businesses, and managing data storage space is only the tip of the iceberg.

1 The State of Video Marketing, 2020, Wyzowl.
2 Top Websites Ranking, 2021, Similarweb.
3 YouTube Press, 2021, YouTube.
4 What Video Marketers Should Know in 2021, According to Wyzowl Research, 2021, HubSpot.
5 Video vs Images in Facebook Ads, 2021, DataBox.


2. The importance of automatic subtitles: efficient processes and maximum accessibility

Anyone who produces videos is well aware that one of the main cost items is the creation of subtitles. Subtitling even a single hour of video may generate significant costs, estimated by Affde⁶ at about 165 dollars. In fact, subtitling involves a number of different activities: transcription, synchronisation, the various quality reviews and any translations required. However, subtitles are essential, for a number of reasons. In particular because:

  • a subtitled video is accessible to deaf people and those with hearing impairments
  • multilingual subtitles expand the potential audience for the content
  • subtitles make the content of the video searchable by keywords, just like a document or this article
  • subtitling a video is always advisable, to render it usable for anyone, anytime and in any situation, including users unable or unwilling to turn on the sound.
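
The searchability point in the list above can be illustrated with a minimal sketch. It assumes the subtitles are already available as timed cues; the `Cue` class, the `search_cues` function and the sample sentences are hypothetical names and data chosen for this example, not part of any specific product.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle cue: start and end time in seconds, plus the spoken text."""
    start: float
    end: float
    text: str


def search_cues(cues, keyword):
    """Return the cues whose text contains the keyword (case-insensitive)."""
    needle = keyword.casefold()
    return [cue for cue in cues if needle in cue.text.casefold()]


# Hypothetical cues, as a subtitle file might provide them.
cues = [
    Cue(0.0, 3.5, "Welcome to our product tutorial."),
    Cue(3.5, 8.0, "First, open the dashboard and select a campaign."),
    Cue(8.0, 12.0, "The dashboard shows every video you have published."),
]

# Print the timestamp and text of every cue that mentions the keyword.
for cue in search_cues(cues, "dashboard"):
    print(f"{cue.start:.1f}s: {cue.text}")
```

Because each match carries its start time, a search result can point the viewer to the exact moment in the video where the keyword is spoken.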

In other words, subtitles vastly improve the usability of a video but, as already mentioned, creating them by hand has major operating implications. Here, technology offers valuable assistance: specifically, artificial intelligence, which uses speech recognition to generate subtitles automatically.

6 Quanto costa fare i sottotitoli in house?, 2019, Affde.


3. Speech recognition: definition and rapid survey of state of the art, from Audrey to Siri

Speech recognition, also known as Automatic Speech Recognition (ASR), is the process by which one or more human voices are recognised and processed by an IT system. This technology is based on Natural Language Processing (NLP), a branch of artificial intelligence.

ASR technologies went mainstream in 2011 with Siri, the voice assistant of the iPhone 4S, but you may be surprised to learn that the origins of speech recognition can be traced back to the last century.

In fact, the first speech recognition system dates from 1952 and was created at Bell Laboratories, in the USA. Its name was Audrey, a contraction of Automatic Digit Recognition. This all-analogue device was only able to recognise the digits from 0 to 9.


Over time, speech recognition has progressed in giant steps, and 70 years after Audrey it offers ever higher levels of precision and ever more innovative functions. According to analysts, the speech recognition market is evolving rapidly: sales of these applications are forecast to be worth more than 27 billion dollars by 2026⁷.

7 A Market Harness: Speech Recognition Artificial Intelligence (AI), 2021, Forbes.


4. AI at the service of marketers and their videos

Today, speech recognition systems are increasingly part of our daily lives. We can activate Google Assistant, Siri, Alexa and Cortana with our own voices to ask the navigator which route we should follow when driving, to start our favourite music playlist, or to request guidance for preparation of a recipe in the kitchen.

However, artificial intelligence also supports the processes within organisations. For example, in the sales area, speech recognition can support customer care in transcription of incoming telephone calls, or in IT security it is able to strengthen authentication protocols.

Given the large increase in the number of videos in circulation, intended both to support brand communications and to promote new products and services, these technologies also need to be made available to marketing and content production staff, and this need will only grow in the future.

Thanks to speech recognition, marketers avoid incurring huge subtitling costs, and can focus their time on producing more videos and improving the quality of their output. What’s more, current technology delivers highly reliable subtitling, even in multilingual mode, extending the potential audience for every clip. These are essential benefits for a successful content creation strategy.
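
To make the last step of automatic subtitle generation more concrete, here is a minimal sketch of how timed transcript segments could be written out in the widely used SubRip (SRT) subtitle format. It assumes that a speech recognition engine has already returned segments as (start, end, text) tuples with times in seconds; the function names and the sample segments are hypothetical and chosen only for illustration.

```python
def srt_timestamp(seconds):
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments):
    """Build SRT text from an iterable of (start, end, text) tuples."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each SRT block: a counter, a time range, then the subtitle text.
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Blocks are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical output of a speech recognition step.
segments = [
    (0.0, 2.5, "Welcome to our product tutorial."),
    (2.5, 6.0, "Let's start with the basics."),
]

print(to_srt(segments))
```

The transcription itself is deliberately left out of the sketch: the point is that, once text and timings exist, producing a subtitle file is a mechanical step that needs no manual work.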



5. The importance of having a digital overview

Very briefly, therefore, speech recognition technologies relieve businesses of repetitive manual activities, enabling them to devote time to the creation of more, higher quality videos.

Videos populate the digital channels of brands, which use them to tell their story, promote their offer and explain how their products and services work. However, no video can achieve its objectives unless the process of its distribution and its use by intended users is completed efficiently.

From this point of view, it is fundamental for speech recognition functions for automatic subtitle generation to be integrated with the platform used for the management and distribution of content across all channels, such as a Digital Asset Management (DAM) system. Software of this kind is able to provide centralised control of the creation and management of many types of content, from conventional documents to images and, of course, videos.

Only the most technologically advanced DAM systems include artificial intelligence functions such as speech recognition and automatic subtitle generation. However, to exploit this tool’s full potential, it is essential to assess its adoption in the context of a system that also handles content distribution. A tool such as automatic subtitle generation is only really effective if it helps marketers and creators streamline processes, reduce costs and shorten content processing times.

It should be remembered that the same video may be distributed on the company website, on the proprietary e-commerce site or in various marketplaces, all channels which often require files with different technical specifications. A Digital Asset Management tool that provides direct, centralised distribution of video content is able to use the same content on all touchpoints without creating multiple copies, while optimising publication for the specific destination channel.

When a DAM system of this kind also includes a speech recognition technology, subtitle generation is also optimised to the full: each video can be subtitled just once, regardless of the number of publication channels.


If a solution of this kind is not available, subtitling has to be replicated for every single touchpoint, even if the content is actually the same.
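
The "subtitle once, publish everywhere" idea can be sketched as a simple data model. This is only an illustration under stated assumptions: the `VideoAsset` class, its methods and the channel names are hypothetical and do not describe the interface of any particular DAM product.

```python
from dataclasses import dataclass, field


@dataclass
class VideoAsset:
    """A single master video with one shared set of subtitle tracks."""
    title: str
    subtitles: dict = field(default_factory=dict)  # language code -> subtitle text
    channels: list = field(default_factory=list)   # publication channels

    def add_subtitles(self, language, text):
        # Subtitles are stored once on the asset, not once per channel.
        self.subtitles[language] = text

    def publish(self, channel):
        self.channels.append(channel)

    def subtitles_for(self, channel, language):
        """Every channel reads the same stored subtitle track."""
        if channel not in self.channels:
            raise ValueError(f"{self.title!r} is not published on {channel!r}")
        return self.subtitles[language]


asset = VideoAsset("Product demo")
asset.add_subtitles("en", "1\n00:00:00,000 --> 00:00:02,500\nWelcome.\n")

for channel in ("website", "e-commerce", "marketplace"):
    asset.publish(channel)

# Three channels, but still only one subtitle track to create and maintain.
print(len(asset.channels), "channels,", len(asset.subtitles), "subtitle track")
```

Without the shared asset, each of the three channels would hold its own copy of the video and would need its own subtitle file, which is the duplication described above.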

An overview of the entire digital asset creation, management and distribution process is therefore fundamental, since it combines the power of technological innovation with the operational and strategic demands of the best marketers throughout the lifecycle of any content, including videos.


