Artificial intelligence is the next ‘cloud’. Its impact across all industries and society will be seismic. However, there is a balance to be struck between what’s possible (the hype) and what’s achievable now (the reality).
The broad adoption of AI continues apace, and AI technologies are embedded in everyday life as invisibly as cloud, from AI directly on chips in mobile phones to analytics driving your next online shop. As with cloud, the media industry has very specific requirements which are shaping its adoption of AI (and of cloud, which is widely considered an essential infrastructure component for large-scale AI deployment in businesses). The current use of AI can be broadly categorised as ‘off-the-shelf’ and ‘built’.
Numerous off-the-shelf AI services are now available. These are generally tied to SaaS, although not exclusively so, and some adopters may choose on-premises infrastructure. In most cases, when building AI, simplicity is still the best way to get to a commercially beneficial solution, using what is sometimes loosely termed ‘good old-fashioned AI’ (or GOFAI), which here is largely limited to machine learning (ML) and, to some extent, deep learning (DL) systems. Stretching beyond this is still risky for many organisations, who find themselves struggling to get past the pilot stage and into operational solutions. Nevertheless, as well as products with AI built in off the shelf, there is also a plethora of off-the-shelf ML and DL frameworks and tools that allow developer teams to start building custom AI and embedding it into their software stacks.
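To make the ‘built’ category concrete, the sketch below shows how little code a classical ML pipeline can involve when assembled from off-the-shelf components. It assumes scikit-learn; the tiny inline dataset of programme synopses and genres is purely illustrative.

```python
# A minimal sketch of the 'good old-fashioned' approach: a classical ML
# pipeline built from off-the-shelf components (scikit-learn assumed),
# classifying programme synopses by genre.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Illustrative training data; a real system would use a labelled catalogue
synopses = [
    "A detective hunts a serial killer through rainy city streets",
    "Stand-up comedians compete in a panel game of quick-fire jokes",
    "A documentary crew follows migrating whales across the Pacific",
    "Two rival chefs battle it out in the grand final",
]
genres = ["drama", "comedy", "documentary", "entertainment"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(synopses, genres)

# Classify an unseen synopsis
print(model.predict(["A crime thriller set in a small coastal town"]))
```

The point is not the toy task but the shape of the solution: well-understood components, small amounts of glue code, and a result that can be embedded into an existing software stack.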
The general outlook for AI in media in the next few years is that improvements in off-the-shelf AI services and ML/DL building blocks will lead to decreased friction for adoption and lower cost of use. Edge AI (already available in mobile devices, for instance) is likely to be an increasing trend, operating at a number of levels. AI at the consumer edge will improve audience analytics and increase the speed of response to micro and macro trends in behaviour, which will start to open up opportunities for hyper-localised and personalised content (explored later in this article). On-chip technology will both enhance edge networks and reduce centralised costs as the consumer bears the load with embedded deep neural network chips and, further into the future, neuromorphic chips in everyday devices. AI-driven knowledge graphs are also likely to emerge as a near-horizon trend in analytics. Knowledge graphs help visualise complex data sets in an easy-to-interpret way, and are likely to become a common tool in business decision making, particularly for larger media organisations that combine multiple phases of the content lifecycle from inception to delivery.
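As a flavour of what a content knowledge graph looks like in practice, the sketch below builds a toy graph of titles, cast and locations, then answers a simple editorial question as a graph query. The networkx library and all node names are assumptions for illustration.

```python
# A minimal knowledge-graph sketch, assuming networkx. Nodes are assets,
# people and places; edges carry the relationship, so questions like
# "which titles share cast with Title A?" become graph traversals.
import networkx as nx

g = nx.Graph()
g.add_edge("Title A", "Actor X", relation="cast")
g.add_edge("Title B", "Actor X", relation="cast")
g.add_edge("Title A", "London", relation="location")
g.add_edge("Title B", "Paris", relation="location")

# Titles that share a cast member with Title A
shared = {title
          for person in g.neighbors("Title A")
          if g.edges["Title A", person]["relation"] == "cast"
          for title in g.neighbors(person)
          if title != "Title A"}
print(shared)  # {'Title B'}
```

Scaled up to a full catalogue, the same structure underpins recommendation, rights analysis and commissioning decisions across the content lifecycle.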
AI in media now
Specific areas that currently dominate the AI landscape in media are machine vision solutions, audio analysis and big data analytics. AI is starting to make its way into media workflows, and adoption here, particularly in creative areas, is likely to grow quickly.
Metadata generation from media assets is one of the widest areas of adoption, with numerous off-the-shelf and home-grown solutions deployed and in regular use across content libraries. Automated analysis of image, audio and text is rapidly transforming the ability of content owners to generate deep datasets about their catalogues, which are then leveraged to create better audience experiences, improved recommendation engines and sounder business decisions on commissioning and acquisition. Typical core technologies employed here generate frame-accurate cast/character information, object and branding information, location, languages (spoken and written), scene descriptions, sentiment and emotion analysis, and compliance information. These are heavily commercialised tools, so expect near-term advances in accuracy and improvements in speed and cost.
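The building blocks behind such tools are now commodity items. The sketch below, assuming torchvision and an extracted frame saved as "frame.jpg", tags a single video frame with candidate object labels using a pretrained vision model; a production system would run this per shot and write frame-accurate metadata into an asset database.

```python
# A minimal metadata-generation sketch: tag one video frame with object
# labels from an ImageNet-pretrained model (torchvision assumed).
import torch
from PIL import Image
from torchvision.models import resnet50, ResNet50_Weights

weights = ResNet50_Weights.DEFAULT          # pretrained weights + metadata
model = resnet50(weights=weights).eval()    # inference mode
preprocess = weights.transforms()           # matching resize/normalisation

frame = Image.open("frame.jpg").convert("RGB")
with torch.no_grad():
    logits = model(preprocess(frame).unsqueeze(0))
probs = logits.softmax(dim=1)[0]

# Keep the top-5 labels as candidate metadata tags for this frame
top = probs.topk(5)
tags = [(weights.meta["categories"][i], float(p))
        for p, i in zip(top.values, top.indices)]
print(tags)
```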
Task automation is an area of both opportunity and controversy. Where is the ethical line drawn on replacing human manual tasks with AI processing, particularly where AI shows a clear advantage in performance? In media, this is already well in progress with the automation of simple manual tasks in ‘content factory’ type operations, such as the aforementioned metadata generation. Only a few years ago this was a task carried out by teams of operators armed with spreadsheets; it is now largely handled by AI solutions.
In the near term, we should expect to see this extend into the automation of mastering, versioning, localisation, translation, dubbing, timed text and audio description (linked to computer vision and audio analysis; see the sketch below). More human-intensive creative processes, including editing, colour, sound mixing, VFX design, world building and so on, will more likely benefit from AI in an ‘assist’ role, where the more boring and repetitive tasks are performed quickly and automatically to help creative teams focus on their artistic input rather than the legwork that goes with it. There have been demonstrations of AI-driven editors, for instance creating trailers and cutting multi-camera live coverage; expect much more in this space soon. There is, however, a lack of discourse regarding the ethical implications of these technologies; common interest groups such as industry bodies will need to address the long-term concerns about the impact on existing and future workforces.
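Timed text is a good example of how close this automation already is. The sketch below, assuming the open-source openai-whisper package and a local recording named "dialogue.wav", turns speech-to-text segments into an SRT subtitle file; a production pipeline would add human QC, reading-speed checks and house-style formatting on top.

```python
# A minimal timed-text sketch: transcribe dialogue and write an SRT file.
# Assumes the openai-whisper package; file names are illustrative.
import whisper

def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, e.g. 00:01:02,500."""
    ms = int(round(seconds * 1000))
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

model = whisper.load_model("base")
result = model.transcribe("dialogue.wav")

with open("dialogue.srt", "w", encoding="utf-8") as f:
    for i, seg in enumerate(result["segments"], start=1):
        f.write(f"{i}\n"
                f"{srt_time(seg['start'])} --> {srt_time(seg['end'])}\n"
                f"{seg['text'].strip()}\n\n")
```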
Another near-horizon advance is likely to centre on hyper-localised and personalised content. Image localisation is already well established, with services for the replacement of branding and objects in media. This can be extended considerably using AI-driven technologies that can automatically replace objects and actors, re-voice dialogue, change the time of day, and so on. Such technologies will feed the ability to dynamically localise and personalise content, for instance inserting your favourite personality into an advert for goods targeted specifically at you. The same technologies employed in so-called deep-fake videos are commercially viable for such tasks now, and although they still sometimes suffer ‘uncanny valley’ issues, the automated replacement of voices and faces offers numerous applications in visual effects, localisation, personalisation, language dubbing/re-voicing and more. Many of these tools will increasingly be available as edge services, and over the longer term will very likely be integrated with real-time feedback on the customer experience (see emotional experience, later in the article).
Future advances in AI
There are numerous new early-stage AI technologies which may prove to have a big impact in the future, but equally may be quickly superseded or fall by the wayside. I have picked out a few which, in my opinion, offer significant advances for the media industry.
Small data is a blend of machine learning with human expertise that allows AIs to be trained to a sufficient level to be effective with only minimal data sets. This also incorporates the transfer of learning between tasks, so that a trained AI can be applied across a wide variety of related tasks without being explicitly trained on each. This could, for instance, lead to virtual creative assistants honed to a particular artist’s style, or to quickly applying AI to a new, specific task that is only useful on a particular project. Furthermore, the elimination of the need for huge training data sets will help democratise access to AI tools, putting them more easily within reach of much smaller organisations or individuals.
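One widely used route to this today is transfer learning: reuse a backbone trained on a huge generic data set and fit only a small task-specific head to a handful of examples. The sketch below assumes torchvision; the two-class task and the random stand-in batch are purely illustrative.

```python
# A minimal transfer-learning sketch: freeze a pretrained backbone and
# train only a new classification head on a tiny data set (torchvision
# assumed; the data here is a random stand-in for a small labelled batch).
import torch
import torch.nn as nn
from torchvision.models import resnet18, ResNet18_Weights

model = resnet18(weights=ResNet18_Weights.DEFAULT)
for p in model.parameters():
    p.requires_grad = False                      # freeze the backbone
model.fc = nn.Linear(model.fc.in_features, 2)    # new trainable head

optimiser = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in for a tiny labelled set: 8 images, 2 classes
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

for _ in range(5):                               # a few fine-tuning steps
    optimiser.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimiser.step()
print(float(loss))
```

Because only the small head is trained, a few dozen labelled examples can be enough to adapt the model to a project-specific task.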
Things as customers: essentially SaaS services making buying decisions from other SaaS services autonomously. This will initially be via predetermined rules, but will increasingly become an AI-driven buying and selling marketplace. This obviously leads to the intriguing notion of AI-driven service marketing with proactive pricing, bundling of services, discounts and so on. We are already seeing early implementations of such behaviour in cloud orchestration software, where automatic rules-based engines determine which public cloud to buy processing time from on a real-time, variable-cost basis (for instance Amazon AWS Spot Instances: dynamically demand-priced compute resource).
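A rules-based version of this ‘machine customer’ can be very simple. The sketch below, assuming boto3 with valid AWS credentials, queries current spot prices and decides whether to buy compute against a price ceiling; the region, instance type and ceiling are illustrative.

```python
# A minimal rules-based 'machine customer' sketch: buy spot compute only
# when the market price is under our ceiling (boto3 assumed).
import boto3

PRICE_CEILING = 0.05  # USD/hour we are willing to pay (illustrative)

ec2 = boto3.client("ec2", region_name="eu-west-1")
history = ec2.describe_spot_price_history(
    InstanceTypes=["m5.large"],
    ProductDescriptions=["Linux/UNIX"],
    MaxResults=20,
)["SpotPriceHistory"]

# Pick the cheapest recent quote across availability zones
cheapest = min(history, key=lambda h: float(h["SpotPrice"]))
if float(cheapest["SpotPrice"]) <= PRICE_CEILING:
    print(f"Buy in {cheapest['AvailabilityZone']} "
          f"at ${cheapest['SpotPrice']}/hr")
else:
    print("Hold: spot market above our ceiling")
```

Replace the fixed ceiling with a model that predicts price and demand, and the predetermined rule becomes the AI-driven marketplace behaviour described above.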
Emotional experience is the evaluation of an individual’s feelings, outward behaviour and expressions to make decisions. This will come from biometrics and passive data collection (such as video and audio of the individual) that can then be used to hyper-personalise content. This could also be linked to multimodal medical analysis (bio-hacking) that uses the same types of passive data collection to detect early signs of underlying health issues. In the case of content, this could lead to active therapy, for instance combating mental health issues by curating content appropriate to the state of mind of the individual. This is much further out, and would require official bodies to certify such technology, as well as organisations willing to pursue it commercially. Early-stage applications of emotional experience technology will almost inevitably be used to target adverts based on your current emotions.
The future of AI applications in media is wide open, and numerous applications are being developed, many not discussed here, such as story design, automated news reporting and standards conversion, to name only a few. With such broad application areas, widely available development frameworks and huge amounts of exploration underway, it is inevitable that AI-driven tools will continue to be brought to market. With such a pace and volume of innovation, AI may quickly come to dominate the application space, driving automation deeper into workflows and inevitably impacting jobs. With this in mind, it will be just as important for the media industry to consider not only what can be done with AI, but what should be done.