The future of media localisation
FEED explores how AI is transforming content localisation in broadcasting, enabling faster, more cost-effective dubbing, subtitling and transcription

According to Shawn Zeng, AI strategy director at Accedo, the process of content localisation has been completely transformed by AI. What was previously a laborious manual process can now happen more intelligently and efficiently, bringing several key benefits.
These benefits include speed and scale, cost efficiency, consistency, workflow integration and enhancing existing workflows. “Because AI enables real-time or near-real-time translation and subtitling, content can be delivered instantaneously, reaching global audiences simultaneously in any number of different languages,” begins Zeng. “This is particularly crucial for live events and breaking news.”
AI significantly reduces manual translation costs and makes it possible to quickly and cost-efficiently localise content that would previously have been unprofitable to localise. “This means that video providers can easily expand into more languages and markets, so even the smallest provider can more easily compete in the global environment,” adds Zeng.
Zeng also notes that it can be particularly challenging for human teams alone to ensure consistency of terminology and brand voice across localised versions. “AI makes that possible by processing thousands of hours of content in a very short timeframe and ultimately ensuring that video providers can create unified experiences across regions.”
Perhaps most importantly for Zeng, AI is enhancing rather than replacing existing workflows. “For low-priority content, AI enables rapid, large-scale localisation that was previously impossible (content that would never have been localised due to cost constraints can now reach global audiences).”
For high-priority content, AI dramatically improves speed, scale, efficiency and accuracy, while human participation remains crucial for quality assurance, cultural nuance and creative decisions. “This hybrid approach, which combines AI automation with human expertise, is where the real value lies. The real transformation with AI should never be about replacing human experts,” says Zeng.
Jacob Arends, senior product manager of playback & AI at Bitmovin, agrees that AI is making content localisation easier and more accessible for all services to reach their users in a number of ways. “Time to market is significantly reduced because fewer human resources (effort, cost and time) are required,” he says. “It’s also easier to extend localisation efforts across whole existing content libraries as well as new content, which can drive higher user engagement.”
Additionally, AI localisation is simple to combine with existing systems, such as Bitmovin’s AI Scene Analysis, extending the usability of content for personalisation, monetisation and content discovery.
Key technologies
A number of key technologies are continuing to drive this shift and reshape production workflows. Zeng identifies several converging technologies:
Large language models: “Both general-purpose models and specialised translation models are now capable of context-aware translation that understands not just words, but intent, tone and cultural context,” says Zeng.
Flexible platform architecture: “This model-agnostic approach ensures you’re not locked into a single provider. It allows you to optimise for quality, cost and performance as the technology landscape evolves.”
Agentic AI orchestration: “There has been a lot of talk about Agentic AI recently and not surprisingly, because this is a real game changer. In this approach, the orchestration platforms coordinate multiple specialised AI agents through intelligent workflows.”
Real-time processing infrastructure: “Streaming APIs and event-driven architectures enable localisation to happen in parallel with content production, rather than as a sequential step.”
Integration and automation: “Modern platforms integrate directly with CMS, video editing tools and distribution systems. They automatically trigger localisation tasks based on content metadata, release schedules and market priorities. This eliminates manual handoffs and reduces time-to-market.”
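The metadata-triggered automation described above can be sketched in a few lines. This is purely illustrative, not Accedo's implementation: the `ContentEvent` schema, field names and step labels are all hypothetical, and a real platform would dispatch these tasks to AI agents over a message bus rather than return them as a list.

```python
from dataclasses import dataclass

@dataclass
class ContentEvent:
    """Hypothetical event emitted when a new asset lands in the CMS."""
    asset_id: str
    source_language: str
    target_markets: list
    priority: str = "low"  # high-priority assets also get human review

def plan_localisation_tasks(event: ContentEvent) -> list:
    """Turn one content event into per-language localisation tasks.

    High-priority content adds a human QA step, mirroring the hybrid
    AI-plus-human workflow the article describes.
    """
    tasks = []
    for lang in event.target_markets:
        if lang == event.source_language:
            continue  # no need to localise into the source language
        steps = ["transcribe", "translate", "subtitle"]
        if event.priority == "high":
            steps.append("human_qa")
        tasks.append({"asset": event.asset_id, "language": lang, "steps": steps})
    return tasks
```

Because the trigger is the event itself, localisation runs in parallel with content production instead of waiting as a sequential step, which is the time-to-market gain Zeng points to.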
Beyond translation itself, other AI tools are improving content analysis and adaptation. Arends notes that speech-to-text and other natural language processing systems are advancing rapidly. “Although it’s important to use dedicated models for more accurate translation, most foundational models are now able to provide great results for localisation out of the box.”
However, there is more than just translation to factor in. “Cultural references are also important for localisation, so tools like Bitmovin’s AI Scene Analysis look to extract as much detail from content as possible, giving streaming services the ability to re-use it depending on the cultural relevance of their audience,” says Arends.
Common errors
The most common errors in AI-generated translations and subtitles typically stem from insufficient context and cultural blind spots, although Zeng notes that human translators don’t necessarily outperform AI in cultural nuance either.
Context misunderstanding is arguably the most frequent issue. “The problem isn’t necessarily that AI can’t understand context,” says Zeng. “It’s that we often provide too little context. When you ask an AI to translate a single sentence or paragraph in isolation it’s bound to make mistakes.”
Arends also identifies context as the biggest cause of errors in AI translation; this, he believes, is why human translators will ultimately be required to maintain complete integrity. “Often idioms or slang are misused if the cultural context isn’t understood, so it’s important to use dedicated translation models to keep translations accurate,” he says.
Cultural blind spots are another common challenge, and Zeng argues that human translators also struggle with this. “What makes AI different is that these blind spots can be systematically addressed through workflow design,” he says. “By incorporating cultural validation agents and maintaining updated knowledge bases about cultural sensitivities, regional variations and historical contexts, we can significantly reduce these errors.”
Zeng says the key to solving these problems is a composable framework that allows you to flexibly, dynamically and iteratively maintain a set of resources (terminology databases, cultural knowledge bases, translation memories, style guides) for specific contexts and verticals.
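The composable resource set Zeng describes can be illustrated with a small sketch. The function name and data shapes here are hypothetical assumptions (glossary and translation memory as plain dictionaries); a production system would use fuzzy translation-memory matching and far richer style resources.

```python
def build_translation_context(segment, glossary, style_guide, memory):
    """Assemble the context handed to a translation model (illustrative only).

    glossary:    term -> required target-language rendering
    memory:      previously approved source -> target segment pairs
    style_guide: free-text brand-voice instructions
    """
    # Exact translation-memory hits can bypass the model entirely,
    # which is both cheaper and guarantees consistency.
    if segment in memory:
        return {"reuse": memory[segment]}
    # Otherwise attach only the glossary entries relevant to this segment,
    # keeping the model's context focused.
    relevant_terms = {term: target for term, target in glossary.items()
                      if term.lower() in segment.lower()}
    return {"segment": segment, "glossary": relevant_terms, "style": style_guide}
```

The point of keeping these resources outside the model, as Zeng argues, is that they can be updated iteratively per vertical without retraining or switching providers.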
A lack of cultural nuance can compromise quality. “I think it’s important to put this in perspective,” says Zeng. “We’re seeing rapid progress in general-purpose models for translation, which has actually led to a significant decline in the traditional translation industry. Many universities have even cancelled their translation programmes. Given the pace of development, it’s largely a matter of time before these cultural nuance challenges are addressed at a high level.”
In Zeng’s view, AI translation has already reached an above-average human level of quality overall, and in terms of cultural nuance it is rapidly approaching human-level performance.
What’s crucial in choosing a solution is selecting a platform that can dynamically incorporate your own knowledge base and even automatically update with the latest relevant knowledge. “Context is the key to success or failure,” admits Zeng. “How you manage it, the efficiency and frequency of updates and how you evaluate whether it’s high quality through workflows and methodologies, these are all critical factors.”
Zeng stresses that it is vital to never lock yourself into a single model or product. “That increases your risk of falling behind and creates significant switching costs. A powerful new model or a new hybrid workflow for cross-validation might emerge suddenly, but if your current product or model can’t adapt to it (either because of potential competition or because they can’t update their products quickly enough), you’re stuck.”
For Arends, the main question for a service to ask is how much localisation accuracy impacts the viewer experience, based on viewer expectations. “For example, a summary on a social media post can be overlooked for translation inaccuracies, while a premium streaming service will expect colloquial accuracy. However, if customers gain access to content they wouldn’t normally get, translation inaccuracies may be overlooked,” he says.
Future-proofing AI workflows
Zeng emphasises that the future of AI in media localisation isn’t about replacing humans. It’s about combining human expertise and cultural intelligence with the speed and scale of AI to deliver more intelligent, integrated orchestration. The real transformation in localisation isn’t just better translation models; it’s this smarter orchestration of the whole workflow.
Accedo is developing Accedo Compose, an AI-native orchestration platform specifically designed for OTT and media operations. Accedo Compose enables organisations to maintain their own curated resources (terminology databases, cultural knowledge bases, translation memories, style guides) for specific contexts and verticals, and combine these with AI models to create unique competitive advantages.
Bitmovin is currently working on a sign language avatar project to improve accessibility for the deaf community. Generating sign language interpretations of content is a major challenge for broadcasters, requiring extensive manual work from interpreters as well as significant production effort.
Sign language is incredibly complex and nuanced; previous studies have shown strong dissatisfaction among adults with the accuracy of AI-generated sign language. “That said, a study in the Netherlands showed that children were delighted, even with limited (Dutch assisted) sign language, because they were able to understand their favourite characters for the first time,” explains Arends.
Data sets are also incredibly scarce, making it nearly impossible to train an AI model to generate sign language. “Amazon did a project using MIT ASL datasets, but these don’t exist for 300+ sign languages across the world,” he continues. “For the same reason, interpreters are also scarce to even create the databases.”
There are also ethical implications to consider. “The interpreter community is relatively small, and if a signer’s work is used to build a model that eventually replaces, or partly replaces, the need for their role, they should be fairly compensated,” concludes Arends. “Also, those who use sign language often come to trust specific sign interpreters, so creating a ‘digital twin’ that may produce lower quality signing could undermine the signer’s integrity and reputation.”
These developments are paving the way for AI localisation that is faster, more scalable and far more culturally and ethically aware.

