How agentic AI is redefining media workflows

Fred Petitpont, CTO and co-founder of Moments Lab, explains how the media industry could be entering a new era–one where humans and machines collaborate to build better and more powerful stories than ever before

With executives anticipating an eightfold surge in AI-enabled workflows by the end of 2025, the shift towards agentic AI is no longer hypothetical. While generative AI captured the world’s attention with its ability to produce new content such as text, images, and code in response to prompts, agentic AI can use that generated content, along with external tools and resources, to complete complex tasks with limited supervision. In other words, generative AI excels at content creation, while agentic AI focuses on the intelligent, autonomous processing and use of information to achieve specific goals, which makes it particularly relevant for applications like video content management and discovery.

Frederic Petitpont

According to McKinsey research, agentic AI systems could contribute $80-$130 billion annually to the media and entertainment industry, highlighting just how significant this shift could be. To better understand what AI agents are and how they can be used in video management, this article will outline how much heavy lifting this technology will soon be able to do.

A great AI agent starts with great data

The core capabilities of agentic AI, such as its autonomy, smart reasoning, and task-driven nature, hold incredible potential for changing the way we interact with video. This technology is on the verge of redefining human-machine interaction, and rapid advancements in large language models (LLMs) indicate that agents will likely become the default interface for even complex tasks like video library management.

However, there’s a catch: The quality of the output is directly proportional to the quality of the input data.

Traditional indexing assumes that the researcher knows how a video was tagged—if it was tagged at all. Many editors and producers still rely on their team’s memory to find the content they need, like recalling the best shots from a show or finding particular information from an interview. This very manual process slows down production, adds cost, and limits creativity.

The sheer volume of video content being generated daily makes a keyword-only approach increasingly inefficient and often ineffective. It’s like searching for a needle in a haystack, but only the colour of the needle–not its shape or material–is known. This inefficiency is a significant drain, with reports indicating that poor data quality alone costs organisations an average of $12.9 million annually.

On the other hand, multimodal AI-powered video indexing transforms and accelerates the process. It acts as a media library expert: it analyses a video, breaks it down into meaningful scenes, recognises who’s in them, what’s happening, where it’s taking place, and even what kinds of shots are used, then feeds the rich metadata and timecoded, human-like descriptions into the AI agent. The agent can converse with the user and answer natural-language queries using contextual understanding, supplementing this with internet searches where needed.
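To make this concrete, here is a minimal sketch of what a timecoded index entry might look like and how an agent could match a natural-language query against it. The field names and the naive keyword matcher are illustrative assumptions, not any particular product’s schema; a production system would use semantic (embedding-based) search.

```python
# Illustrative sketch of timecoded scene metadata produced by a
# multimodal indexing pass. Field names are hypothetical, not any
# vendor's actual schema.
library = [
    {
        "video_id": "ep_104",
        "tc_in": "00:12:41:05",   # scene start, HH:MM:SS:FF
        "tc_out": "00:12:58:19",  # scene end
        "description": "Close-up of the goalkeeper diving to save a penalty kick.",
        "people": ["goalkeeper"],
        "shot_type": "close-up",
    },
    # ...one entry per detected scene
]

def find_moments(query: str, scenes: list[dict]) -> list[dict]:
    """Naive keyword match over scene descriptions; a real agent would
    use semantic search plus its own contextual reasoning."""
    terms = query.lower().split()
    return [s for s in scenes
            if all(t in s["description"].lower() for t in terms)]

print(find_moments("penalty save", library))  # returns the timecoded scene above
```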

A new way of interacting with media libraries

Building on AI-powered video understanding and indexing, AI agents can dramatically improve the management, navigation, repurposing, and monetisation of video content. When AI automatically sorts, describes, and manages media libraries, users can simply type in what they want to find and retrieve it in seconds. It’s like having a teammate who can instantly respond to queries and carry out repetitive, time-consuming tasks, freeing up teams to focus on higher-value creative projects.

The benefits go further. Once the agent understands the video’s context and key moments through indexing, users can prompt it to suggest short summaries, create compelling highlight reels, and even find the right clips for different social media platforms or specific audiences. This drastically speeds up content repurposing, making the adaptation of videos for different uses smarter and faster, enabling media companies to reach specific viewers or quickly react to trending news.
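As a sketch of what that repurposing step could look like, an agent might assemble a platform-specific highlight list from already-indexed moments. The per-platform duration caps and the scoring field below are assumptions for illustration, not real platform rules.

```python
# Hypothetical sketch: turning indexed, scored moments into a clip list
# that fits a target platform. Duration caps are illustrative only.
PLATFORM_MAX_SECONDS = {"shorts": 60, "reels": 90}

def build_highlights(moments: list[dict], platform: str) -> list[dict]:
    """Greedily pick the highest-scoring moments that fit the time cap."""
    budget = PLATFORM_MAX_SECONDS[platform]
    reel, used = [], 0.0
    for m in sorted(moments, key=lambda m: m["score"], reverse=True):
        if used + m["seconds"] <= budget:
            reel.append(m)
            used += m["seconds"]
    return reel

moments = [
    {"tc_in": "00:12:41", "seconds": 18.0, "score": 0.94},
    {"tc_in": "00:31:02", "seconds": 25.0, "score": 0.88},
    {"tc_in": "00:47:10", "seconds": 40.0, "score": 0.71},
]
print(build_highlights(moments, "shorts"))  # picks the 18s and 25s moments (43s < 60s)
```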

What’s the best strategy for working with AI agents?

The implications for broadcasters, sports organisations, and content platforms are significant. For instance, widely used communication platforms like Slack or Teams can become conduits for integrating AI agents into regular workflows. Instead of manually navigating software interfaces, users could initiate tasks by conversing with an AI agent within these familiar environments, effectively making direct software interaction a secondary step.

One practical example involves an AI agent autonomously generating social media snippets from a live sports broadcast. The agent, fed with real-time metadata and event markers, could identify key plays, select the most impactful visual moments, and even draft accompanying text, significantly speeding up content delivery to various platforms. 
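A rough sketch of that flow might look like the following; the event-marker fields, the set of “key” events, and the caption template standing in for a language-model call are all assumptions for illustration.

```python
# Hypothetical agent loop reacting to live event markers. Marker fields
# and event names are illustrative, not a real broadcast data feed.
KEY_EVENTS = {"goal", "red_card", "penalty_save"}

def draft_caption(marker: dict) -> str:
    # Placeholder for a language-model call that writes platform copy.
    return f"{marker['player']} with a {marker['type'].replace('_', ' ')} at {marker['clock']}!"

def on_marker(marker: dict) -> None:
    """Turn a key play into a ready-to-publish social snippet."""
    if marker["type"] not in KEY_EVENTS:
        return  # ignore routine events
    snippet = {
        "tc_in": marker["tc_in"],
        "tc_out": marker["tc_out"],
        "caption": draft_caption(marker),
    }
    print("queued for publishing:", snippet)

on_marker({
    "type": "goal", "player": "No. 9", "clock": "54:12",
    "tc_in": "00:54:01:00", "tc_out": "00:54:25:00",
})
```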

But it’s not just about humans talking to AI: agents can also talk to other computer systems. They can quickly read the instructions for how different software works, whether APIs or other agents, provided they support standards such as MCP (Model Context Protocol) and A2A (Agent2Agent), and then write their own code to make those systems talk to each other. It’s as if they can understand the instruction manuals and then build the connections themselves.
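For instance, a media library could expose its search function as a tool that any compliant agent can discover and call. The sketch below assumes the official MCP Python SDK (the `mcp` package) and a stand-in search backend rather than a real video index.

```python
# Sketch of exposing a media-library search as an MCP tool, assuming
# the official MCP Python SDK's FastMCP helper. The tool body is a
# hypothetical stand-in for a real video index.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("media-library")

@mcp.tool()
def find_moments(query: str, limit: int = 5) -> list[dict]:
    """Return timecoded moments matching a natural-language query."""
    # A real server would query the indexed library here.
    results = [{"video_id": "ep_104", "tc_in": "00:12:41:05",
                "description": f"Moment matching '{query}'"}]
    return results[:limit]

if __name__ == "__main__":
    mcp.run()  # serve over stdio so agents can discover and call the tool
```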

AI agents can also talk to each other in plain language, just as humans do. In one widely shared demo, two AI agents worked together to book a hotel room for a user. Even though computers usually communicate in code, natural language turns out to be a remarkably good and flexible medium for AI systems to understand one another.

While all this sounds exciting, there is one big underlying condition: AI agents are only as good as the quality of the metadata fed into them. Without timestamped metadata and an adequate search engine, an agent has no way of returning specific moments from a video library. So, before agents are welcomed into every media library, users need to start indexing properly!

Lastly, current agentic AI, while impressive, has limitations, particularly in areas that demand creativity, nuanced understanding of human emotions and intentions, and complex, abstract reasoning. AI agents can analyse, reason, and act upon video content data with great accuracy, but they are conformists: true creativity relies on human input, and AI cannot supply that alone. This could mark the beginning of a new era, one where humans and machines collaborate to build better and more powerful stories than ever before.