Unlocking hidden treasures with AI librarians

No longer the preserve of media and entertainment, video is a universal communication tool spanning sectors from education to architecture to health. Within the marketing sector alone, 43% of marketers say they would create more video content if it weren’t for barriers like time, resources or budget. That’s where artificial intelligence and machine learning could pay dividends.

“While AI is now table stakes, fast transfer technologies [like IBM Aspera] are able to move any size or type of media content at maximum speed to enable near real-time media asset management workflows,” says David Kulczar, manager of IBM Watson Media, which provides AI-powered video services, including automatic closed captioning and highlight clipping. “As a result, AIs can be engaged from any distance without additional investments in satellite uplinks, dedicated fibre or on-site equipment or manpower.”

AI/ML tools are being integrated into editing and production asset management systems to create more efficient workflows and to extract and curate vast quantities of metadata.

Recently released media asset management software includes a sub-$4000 appliance from axle ai that integrates face and object recognition and low-res proxy collaboration, plus a panel for Premiere Pro CC. Prime Focus Technologies has inked a partnership with metadata specialist GrayMeta. Storage vendor Elements has teamed up with Veritone for automatic speech-to-text transcription. And Dalet has added a Content Discovery module to its asset management software for journalists in the newsroom.

However, the rise of AI tools is undercut by the lack of a standard for describing, searching and finding assets in the first place. Each AI provider has different APIs and access mechanisms for the analyses it performs. Each media asset management provider will offer a different way of integrating an AI, and each organisation – whether museum, sports league or TV producer – will have a bespoke means of describing its own metadata.

“For many years the MAM industry has been starved of metadata, including the ability to harvest technical metadata from media, add intelligence from file and folder naming, and offer manual tagging,” says Dave Clack, CEO of Square Box Systems. “Now, with AI tools, there is a huge volume of metadata becoming available, often of low quality. Strong MAM players can manage this metadata explosion by curating metadata quality and providing a strong user experience to display it well and unlock its potential.”

A global standard for metadata is probably unfeasible, given the granularity required for each application. The more prescriptive a standard, the more restrictive it becomes. However, a flexible exchange format can bring metadata from an AI into a MAM in a usable way.
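
To make that idea concrete, here is a minimal, purely illustrative sketch in Python of what such an exchange layer might do: map the differently shaped output of two hypothetical analysis services into one timecoded tag record that a MAM could ingest. The field names are invented for illustration and are not any vendor’s real API schema.

```python
# Hypothetical exchange layer: normalise tags from different AI services
# into one record shape. All field names below are invented examples.
from dataclasses import dataclass

@dataclass
class Tag:
    label: str          # e.g. "sky" or "white sneaker"
    start_s: float      # timecode in, seconds
    end_s: float        # timecode out, seconds
    confidence: float   # 0.0 - 1.0, as reported by the engine
    source: str         # which AI engine produced the tag

def from_provider_a(item: dict) -> Tag:
    """Map one record from a hypothetical provider A response."""
    return Tag(label=item["name"],
               start_s=item["segment"]["start"],
               end_s=item["segment"]["end"],
               confidence=item["score"],
               source="provider_a")

def from_provider_b(item: dict) -> Tag:
    """Map one record from a hypothetical provider B response."""
    return Tag(label=item["label"],
               start_s=item["start_ms"] / 1000.0,
               end_s=item["end_ms"] / 1000.0,
               confidence=item["confidence_pct"] / 100.0,
               source="provider_b")
```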

Cantemo’s cloud hub for managing video, iconik, comes with a framework capable of integrating with AI platforms. Iconik currently supports Amazon Rekognition and Google Cloud Video Intelligence but can also integrate custom-trained video analysis solutions, adapted to content and workflow.

“Iconik can be set up to analyse all or specific assets via its AI framework, with each entity, object or occurrence tagged with metadata,” explains chief executive Parham Azimi. “The suggested tags from the AI/ML are timecode-based and come with a confidence level representing accuracy. Different workflows can be applied to each tag depending on the confidence level. For example, if a user wants to be shown every clip or sub-clip with a shot of the sky, they need only search for the phrase ‘sky’.”
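
As a rough illustration of the ‘sky’ example, the sketch below (Python, and not iconik’s actual API, just the general pattern) filters an assumed collection of timecoded tag records like those sketched earlier and returns the matching sub-clip ranges.

```python
def find_subclips(tags, term, min_confidence=0.5):
    """Return (asset_id, start_s, end_s) ranges whose tag matches the term.

    `tags` is assumed to be an iterable of (asset_id, Tag) pairs in the
    normalised shape sketched earlier; the default threshold is illustrative.
    """
    term = term.lower()
    return [(asset_id, t.start_s, t.end_s)
            for asset_id, t in tags
            if term in t.label.lower() and t.confidence >= min_confidence]

# e.g. every clip or sub-clip tagged with a shot of the sky:
# hits = find_subclips(all_tags, "sky")
```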

This form of support allows users to tap into any AI solution. “For example,” says Azimi, “a custom-trained AI capable of recognising exact content or specific products – perhaps ‘Nike Air Zoom Pegasus 34iD’, rather than simply ‘white sneaker’.”

There are potential conflicts when working with metadata originated under different AI and asset management protocols. One such flashpoint lies in assuming that the value of metadata in a file will be the same regardless of the AI-driven data set that generated it.

“As the area is still maturing, I would not assume that, for example, face recognition from one system would be equal to face recognition from another,” says David Schleifer, COO, Primestream. “In fact, one system may focus on known famous people while the other might be a learning algorithm to build collections – therefore, different data used in different ways built on similar technology.”

Nonetheless, the ability of AI/ML engines to auto-metatag based on speech-to-text, natural language understanding, sentiment and tone analysis, and semantic scene detection is perceived as an increasingly significant benefit.

“The next two years will be full of things found in archives that we never thought we had,” says Nick Pearce-Tomenius, sales and marketing director at storage specialist Object Matrix. “One example where we will see this being useful is the trend for consuming old TV series. AI can help find programmes from a specific series, even if it was made 20 years ago, and it can potentially help find other programmes featuring the same actors. This suddenly makes it easy for broadcasters to monetise their legacy content.”

For AI to extract the most value from these assets, they need not be in the cloud. They could just as well be stored on-premises or, increasingly commonly, in a hybrid arrangement of owned hardware linked to low-res versions, back-up copies or other data held off-site.

“Many on-premises asset management systems can export proxies to cloud storage to gather additional metadata, applying these tags to high-res media maintained on on-prem primary storage,” says Jim McKenna, VP sales and marketing at Facilis.
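
A loose sketch of that round trip might look like the following. The analyse callable, catalogue object and helper names are assumptions made for illustration, not Facilis’s or any other vendor’s API.

```python
def tag_from_proxy(asset_id, proxy_uri, analyse, catalogue):
    """Analyse a cloud-hosted proxy and apply the tags to the on-prem master.

    `analyse` is any callable returning normalised tag records for a media
    URI; `catalogue` is whatever index maps an asset ID to its high-res
    location. Both are stand-ins, not a real product interface.
    """
    tags = analyse(proxy_uri)            # run AI analysis on the low-res proxy
    master = catalogue.lookup(asset_id)  # high-res media stays on-prem
    catalogue.attach_tags(master, tags)  # metadata applied to the master asset
    return tags
```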

Facilis has FastTracker, an asset tracking application that can index both on-premises and cloud-based assets when they are made available to the server as a file system (WebDAV or similar). In this way users can search for assets that exist in cloud locations as well as on their on-premises storage.

“Once the needed asset is located in a cloud location, it can be dragged to an active high-speed partition and made available to editorial staff,” explains McKenna.

Storage vendor Quantum has also integrated AI into its products by partnering with Veritone, a developer of an AI operating system and orchestration layer for cognitive engines.

“aiWARE for Xcellis is a storage appliance for on-premise or hybrid cloud AI-based workflows,” explains Jason Coari, director of scale-out storage solutions at Quantum.

“There are currently 175 AI engines within the service, 60% of which are transcription-based. Transcription, object recognition and facial recognition AI engines can all be fully deployed on-premise. aiWARE for Xcellis is different in that it searches for the best cognitive engine for the task at hand, rather than locking a customer into a specific engine that may only be useful for 20% of their workflows.”

Automating sports

The big value of AI is its ability to help humans scale their efforts. For example, video production teams have been clipping highlights for decades, but it is a time- and labour-intensive process. Now, with AI capabilities, production teams can create highlight reels at scale, quickly identifying key moments from thousands of hours of video at gigantic events such as Wimbledon or the Olympics.

“If, for example, a user wanted to post a short clip of Zlatan Ibrahimovic’s two goals scored during the 2017 EFL Cup final against Southampton, it would take only minutes to find the exact clips and push these to social platforms using iconik,” says Cantemo’s Azimi.

In April, IBM partnered with the Masters to bring cognitive highlights to the golf tournament. IBM’s AI identified key highlights based on cheering, high fives, commentary and TV graphics within specific video frames. As a result, video editors were able to package and distribute highlight reels in near real time.
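
IBM has not published the internals of that system, so the following is only a hedged sketch of the general approach: blend per-segment excitement cues such as crowd noise, gestures, commentary tone and on-screen graphics into a single score, then keep the top-scoring segments. The signal names and weights are invented.

```python
def excitement_score(segment, weights=None):
    """Blend per-segment cues, each assumed to be normalised to 0-1."""
    weights = weights or {"crowd_cheer": 0.4, "commentary_tone": 0.3,
                          "gesture": 0.2, "graphics": 0.1}
    return sum(w * segment.get(cue, 0.0) for cue, w in weights.items())

def top_highlights(segments, n=10):
    """Return the n highest-scoring candidate segments."""
    return sorted(segments, key=excitement_score, reverse=True)[:n]
```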

The Recording Academy also selected IBM’s AI for the 60th annual Grammy Awards to streamline the processing of more than five hours of live red carpet coverage and more than 100,000 images, while also providing lyrical and fashion analysis for fans.

In a more traditional news setting, AI frameworks make it possible to immediately identify and call up clips relating to a specific topic within breaking news. In the case of a natural disaster, a broadcaster might search within a content library and use AI to identify legacy footage of what an area looked like before the disaster, or search for past incidents in the same area.

“This has been possible in the past, but in this case every piece of footage would have been tagged manually, requiring hours of labour,” explains Azimi. “Broadcast archives are traditionally massive in scale, and the more content that exists, the more difficult it is to manually tag content. Automated workflows, such as AI tagging, are therefore more important than ever.”

Automated distribution

The next step after automated production is the automatic publication of personalised media experiences. MAM vendor Tedial claims we are already there. Its sports event tool Smartlive uses ‘AI-enhanced’ logging to automatically create highlight clips and pitch them to social media.

“This extends the abilities of current live production operators to manage hundreds of automatically created stories,” says Jay Batista, general manager at Tedial (US).

“Applications are being developed, especially in reality television production, where auto-sensing cameras follow motion, and AI tools such as facial recognition augment the media logging function for faster edit decisions as well as automatic social media deliveries.”

At the moment, AI is not accurate enough to completely remove the need for human intervention. In the case of iconik’s AI framework, the user can set rules so that any metadata tag with a confidence level below, say, 50% is discarded, anything above 75% is automatically approved, and anything else is sent for human approval.
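
A minimal sketch of that kind of rule, using the thresholds from the example above (the function and record shape are assumptions consistent with the earlier sketches, not any product’s API):

```python
def route_tag(tag, discard_below=0.50, approve_above=0.75):
    """Route a suggested tag according to its confidence level."""
    if tag.confidence < discard_below:
        return "discard"            # too unreliable to keep
    if tag.confidence > approve_above:
        return "approve"            # trusted enough to apply automatically
    return "human_review"           # borderline: queue for manual approval
```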

Created becomes creator

IBM suggests that AI is intended to be a resource rather than a replacement, and Amazon prefers the term ‘assisted intelligence’, emphasising that, at the end of the day, humans are still in control.

“AI-based technology will make mistakes, but the best thing about it is that it will learn from them and become more accurate over time,” says Azimi.

Eventually, in addition to technical discovery, curation and assembly, AI will create content – and sooner rather than later. IBM Watson’s automated selection of clips for the assembly of a trailer for the film Morgan is one step on the road to automating the production of scripts, storyboards, video streams and soundtracks. At first, the machine might assist producers, directors and artists in producing content but, as in many industries, the machine could progressively assume an increasingly central role, forcing humans to redefine their roles and invent new ones.

“The risk and challenge is not in our ability to move certain types of programming to an automated process, but rather the loss of editorial judgement that can change based on external factors,” suggests Schleifer.

“Systems that produce content in this manner will adhere to specific rules and as a result will produce consistent content that will never challenge us to get out of our comfort zone.

“The challenge will be to figure out how a system like this can continue to push the envelope and challenge us,” he says. “After all, media as a form of communication is focused on surprising, challenging and helping us grow.”

The risk in sectors like news is that growing polarisation in our culture could be exacerbated by automatically generated clips delivered to viewers based on their preferences and social media profiles.

“While the process could be tuned to give more balanced content, the money will be in giving people what they want, which will lead to reinforcing an opinion rather than informing,” says Schleifer.

Can content create itself?

If the answer to the question is yes, the follow-up should be: at what value? There are art installations where sound is generated as people walk through the exhibition. Based on pre-chosen notes, it can create a mood and entertain people, but does it approach the emotional impact of music created with purpose?

“We have access to much more sophisticated methods to create content these days, but the ultimate question is likely to be ‘to what end?’” ponders Schleifer. “Is content just a babysitter for people who have too much leisure time? Or are we purposefully informing, challenging, entertaining and educating people?”
