Captioning and dubbing

Posted on Feb 19, 2025 by FEED Staff

Speaking in tongues

While AI makes its mark on essentially every industry, we explore its profound impact on captioning, subtitling and dubbing

Words Katie Kasperson

If you’re watching a foreign film in the cinema, it’ll likely either have subtitles or be dubbed over with English dialogue. If you’re watching TV in public and can’t make out the audio, it might have closed captions describing in text whatever is being said and heard. These are just two examples of when captions, subtitles or dubs might be used. For some, captions are a personal preference; for others, they’re necessary for understanding – particularly for neurodivergent people or those who are hard of hearing (HOH).

Increasingly, AI and machine learning are being used to generate these audio-visual aids, with speech-to-text technology (and, conversely, text-to-speech for dubbing) cutting the time and cost of doing so manually. With companies paying growing attention to accessibility and taking a keener interest in localisation, AI captioning, subtitling and dubbing platforms are becoming a mainstay in video production.

While big-budget studios like Netflix have the financial means to train voice actors and hire professional subtitlers (think of Stranger Things Season 4’s [Eleven pants] or [tentacles squelching wetly], which were written with a human sense of humour), AI’s appeal is in its efficiency. It doesn’t remove the human element altogether; instead, it supplements it, allowing employees to spend more time on quality control and less on the tedium of transcribing by hand. As with dubbing, this technology encourages localisation of content, making it appeal to global audiences who speak any number of languages – and catering to our increasingly connected world.

[Image] While AI is making great strides, achieving truly high-quality subtitles requires a human touch

Doin’ it right

First, let’s explore the subtle yet significant differences between captions (both open and closed), subtitles and dubs, as these terms – primarily closed captions and subtitles – are often interchanged. Subtitles are transcriptions of dialogue only, and can be used to translate whatever’s being spoken into a language that audiences can read (for example, a Spanish telenovela might have English subtitles). Captions, on the other hand, describe all aspects of audio – dialogue, sound effects, background music and so on – with ‘open’ meaning always visible and ‘closed’ meaning they can be toggled on or off. Lastly, dubs are in a league of their own. Added in post-production, dubbed audio replaces the original without altering the video – a common practice when, say, creating an English-language version of a Japanese film. Dubbing can also be used in other instances, such as when James Earl Jones dubbed the voice of Darth Vader.
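The distinction above also shows up concretely in the files themselves: a subtitle cue carries dialogue only, while a caption cue adds non-speech audio, conventionally in square brackets. A minimal sketch in Python using the widely supported SubRip (SRT) cue format (the example text is invented for illustration):

```python
from datetime import timedelta

def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(timedelta(seconds=seconds).total_seconds() * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    """Build one SubRip (SRT) cue block: index, timing line, text."""
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

# A subtitle transcribes dialogue only...
subtitle = srt_cue(1, 12.0, 14.5, "Where were you last night?")
# ...while a caption also describes non-speech audio in brackets.
caption = srt_cue(1, 12.0, 14.5, "[door creaks open]\nWhere were you last night?")
```

The same cue, written both ways, is all that separates the two formats on disk; the editorial work lies in deciding what the bracketed descriptions should say.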

When captioning, subtitling or dubbing, accuracy is critical to ensure the production’s original intent is preserved rather than getting lost in translation. “An accurate transcription means that every word, tone and context is conveyed precisely, enabling audiences to understand the full meaning without ambiguity or misunderstanding,” explains Sharon Biggar, head of marketing at Happy Scribe. “AI is rapidly advancing in language processing, but it still faces challenges when it comes to translating and localising subtitles with nuanced elements like idioms, sarcasm and cultural references – a process known as adaptation.

“While AI is making great strides,” Biggar continues, “achieving truly high-quality, culturally adapted subtitles requires a human touch. At Happy Scribe, we have our own team of skilled linguists who review and refine our AI-generated content. The Scribes, as we call them, adapt jokes, idiomatic expressions and cultural references so that the full text resonates authentically with the target audience and the final output is both accurate and relevant.”

“Besides maintaining accuracy for clarity’s sake,” notes Jane Sung, COO at Cinedeck, “if AI makes mistakes in captioning, subtitling and dubbing – and these are not picked up by a media operator before the content is broadcast – this can lead to viewers disengaging or being offended. Depending on the nature of the error, this could damage a broadcaster’s brand image and, at the far end of the scale, there could also be some kind of sanction imposed if they breach regulatory requirements.”

These requirements, which include the Americans with Disabilities Act (ADA) and the Web Content Accessibility Guidelines (WCAG), are designed to accommodate all viewers, particularly those who are deaf, HOH, autistic or have ADHD. Captions and subtitles can increase comprehension for these groups, ensuring they have the same opportunity to understand the content as anyone else. “Providing top-rate captions is a must for meeting regulatory compliance as well as the high expectations of today’s quality-conscious audience,” explains Sana Afsar, management staff at Interra Systems.

Imperfections and opportunities

Besides making the occasional contextual error, AI also falters under other conditions, particularly when transcribing noisy or unclear audio – or if it hasn’t adequately learnt a certain language. At Happy Scribe, “The most common languages such as English, Spanish and German have high AI accuracy – generally 90% or above,” states Biggar. “However, there are other, less common languages for which the AI is not well trained and might have lower accuracy. For these languages, the AI can get the transcriptions wrong.” The intelligence aspect of AI comes into play here; like a human, if the tech hasn’t been exposed to a certain language, its understanding will be lower and the output may not make any sense to a native speaker.
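Accuracy figures like the ‘90% or above’ Biggar cites are typically derived from word error rate (WER): the word-level edit distance between a reference transcript and the AI’s output, divided by the reference length. A minimal self-contained sketch (real evaluations normalise punctuation and casing first, which is omitted here):

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming edit distance over words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + cost)   # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# One wrong word out of four: WER 0.25, i.e. roughly 75% word accuracy
wer = word_error_rate("the quick brown fox", "the quick brown box")
```

Publishing an ‘average accuracy by language’, as Happy Scribe does, amounts to aggregating a metric like this across many evaluation transcripts per language.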

“This is a problem if not corrected, as inaccurate AI transcriptions can result in miscommunication,” suggests Biggar. “In the media industry, it can lead to reputational damage; in fields like healthcare, errors may have more serious consequences.” To prevent an over-reliance on AI, Happy Scribe is transparent about the ‘average accuracy by language’ and offers additional ‘human review’ when that number is inadequate.

Despite its imperfections, “AI is already transforming how media companies and content producers carry out captioning,” claims Sung, touting its role in ‘making media production more efficient’. This especially benefits live broadcasts, allowing producers to add captions in real time.

Biggar echoes this optimism, adding that “nearly any audio or video content can be transcribed in minutes, enabling rapid turnaround times and significantly lowering costs compared to manual transcription. By handling time-consuming tasks, AI allows professionals to focus on higher-level work,” she concludes, presenting a key idea: when used responsibly, AI improves productivity but doesn’t necessarily replace the creative process.

New tools

AI-powered speech-to-text technology is cropping up everywhere. Apple’s Voice Memos app now offers automatic transcription as part of iOS 18. In the US, Warner Bros Discovery recently deployed Google Cloud’s Vertex AI platform to create captions, reducing both time (by 80%) and cost (by 50%) compared to manual methods. Meanwhile, AI dubbing start-ups like Metafrazo and Panjaya are entering the playing field, promising to improve dubs with better lip-syncing and more diverse voices.

While AI can encompass a number of complex technologies, Interra Systems uses automatic speech recognition (ASR) and natural language processing (NLP) to create and correct captions. “ASR transcribes spoken language to text in real time, while NLP optimises this text by interpreting nuances such as slang, tone and punctuation for greater accuracy,” describes Afsar. These technologies combine into one solution called Baton Captions, which can ‘effortlessly QC captions and subtitles, generate captions from transcribed audio, auto-correct errors, translate captions into various languages and regenerate captions for different video deliveries’. Starz is just one of several companies to leverage this tool, using it primarily to streamline its English-to-Spanish subtitling process.
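The two-stage flow Afsar describes – raw ASR output refined by an NLP pass – can be sketched as a simple pipeline. Baton Captions’ internals are not public, so both stages below are placeholders: the ASR call is mocked, and the ‘NLP’ pass is reduced to toy punctuation and capitalisation fixes.

```python
import re

def mock_asr(audio_chunk: bytes) -> str:
    """Placeholder for a real ASR engine: raw, unpunctuated lowercase text."""
    return "good evening heres the news from the studio"

def nlp_refine(raw: str) -> str:
    """Toy NLP pass: restore a contraction apostrophe, capitalise the
    sentence and add terminal punctuation."""
    text = re.sub(r"\bheres\b", "here's", raw)
    text = text[0].upper() + text[1:]
    if not text.endswith((".", "?", "!")):
        text += "."
    return text

# ASR produces the words; NLP makes them readable as a caption
caption = nlp_refine(mock_asr(b""))
```

Production systems replace both stages with trained models, but the division of labour is the same: recognition first, linguistic clean-up second.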

Happy Scribe’s AI solutions – which cover transcribing, subtitling, translating and dubbing – promise high-quality results, combining ‘advanced AI technology with human expertise and refinement’ to make this happen. “As a technology-driven company, we invest heavily in training our model and continuously enhancing AI accuracy through daily evaluations,” shares Biggar. “However, we understand that AI still has its limitations. That is why we include humans as part of the mix, to ensure our customers benefit from the speed and cost-effectiveness of AI without compromising on quality.” Happy Scribe has seen strong adoption across the media industry, working with the BBC, 3Cat and Skydance to reduce localisation times and produce subtitles in over 40 language pairs.

Finally, Cinedeck – which offers various solutions across the entire video workflow – recognised the resource-heavy nature of captioning, calling it a ‘no-brainer’ to incorporate AI technology into its all-in-one platform, Cloudflow Hub. “By integrating real-time AI captions and translations within the Cloudflow Hub solution, we’re delivering an efficient way for providers and broadcasters to make their content more accessible and ultimately reach a wider audience,” Sung summarises.

[Image] AI is already transforming how media companies and content producers carry out captioning

Human touch

Despite AI’s abilities, humans are still in the picture (for now). Primarily involved at the QC stage – as is the case with Happy Scribe – a conscious mind (or two) is still the best tool when spotting mistakes. Put more succinctly, “While AI accelerates the process and handles much of the automation, human oversight is still integral to maintain quality, cultural relevance and contextual accuracy,” explains Afsar.

There may come a point at which AI replaces the human element, but experts largely agree that the technology is not quite there yet. “There’s still some way to go before it can fully capture the complex nuance in human language and always get it right, particularly with sensitive issues,” argues Sung.

Returning to the Stranger Things example, captioning, subtitling and dubbing can be considered an art form – something beyond a bot’s capabilities. Trained linguists tend to offer a broader vocabulary shaped by whatever’s happening on screen, such as ‘sinister’, ‘ferocious’ and ‘dissonant’ – words all used to describe the audio in Season 4. In Squid Game – originally recorded in Korean – Netflix hired professional voice actors to dub the hit show in over 20 languages, including English, French, Spanish, German, Russian, Italian and Turkish. By using genuine human voices, the dubs arguably maintain the show’s authenticity as much as possible – though watching in the original Korean remains the most reliable option.

That said, “AI capabilities are improving at a remarkable and exponential rate,” notes Biggar. “With continuous improvements driven by machine learning, it may only be a matter of time before AI reaches or even exceeds human-level quality.”

Sung echoes this cautious prophesying: “Human language and cultures are so complex that it’s hard to imagine that AI will entirely replace humans in this process, but technology is evolving so quickly that it’s hard to say what will happen in the future!”

This feature was first published in the Winter 2024 issue of FEED
