With consumer expectations of the viewing experience set by tier 1 sports productions, sports programmers are searching for solutions that deliver an engaging experience at a cost that fits the budgets available.
Artificial Intelligence-driven sports production solutions have been available for some time. To date, however, combining accurate tracking of the sporting action with visual clarity has been a challenge.
This paper describes how a panoramic camera system can be implemented to monitor the entire field of play and feed an Artificial Intelligence (AI) system that identifies the sporting action. It further describes how the AI engine can drive a high-quality pan-tilt-zoom (PTZ) camera that follows the action and delivers the program feed.
The paper also discusses how the AI engine can point the PTZ camera accurately by taking video and AI processing delays into account, how those processing latencies can be measured, and how directional vectors can be forward-predicted so that the program feed camera is pointed correctly irrespective of the action's distance from the camera and the resulting variation in angular velocity.
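As a rough illustration of the latency compensation described above, the following minimal sketch extrapolates a tracked position over the pipeline delay and converts it to a pan angle. It is not the authors' implementation. It assumes a camera at the origin, a flat field in metres, a constant-velocity target, and a latency that has already been measured; the function names `predict_pan_angle` and `pan_rate_deg_per_s` are hypothetical.

```python
import math


def predict_pan_angle(pos, vel, latency_s):
    """Pan angle (degrees) toward where the target will be after latency_s.

    Assumptions (not from the paper): camera at (0, 0), pan measured from
    the +y axis, target moving at constant velocity during the delay.
    """
    x, y = pos
    vx, vy = vel
    # Constant-velocity extrapolation over the measured pipeline latency.
    px = x + vx * latency_s
    py = y + vy * latency_s
    return math.degrees(math.atan2(px, py))


def pan_rate_deg_per_s(pos, vel):
    """Instantaneous angular velocity of the target as seen from the camera."""
    x, y = pos
    vx, vy = vel
    r2 = x * x + y * y
    # d/dt atan2(x, y) = (y*vx - x*vy) / (x^2 + y^2)
    return math.degrees((y * vx - x * vy) / r2)


# A player crossing at 8 m/s, 10 m from the camera, with 250 ms total delay.
lead_angle = predict_pan_angle((0.0, 10.0), (8.0, 0.0), 0.25)

# The same linear speed produces very different pan rates near and far.
near_rate = pan_rate_deg_per_s((0.0, 10.0), (8.0, 0.0))
far_rate = pan_rate_deg_per_s((0.0, 80.0), (8.0, 0.0))
```

In this sketch the same 8 m/s movement needs a pan rate eight times higher at 10 m than at 80 m, which is why the prediction has to hold across the range of angular velocities rather than assume a fixed one.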
David Edwards | Vislink | Colchester, Essex, United Kingdom
Siddhi Imming | Vislink | Colchester, Essex, United Kingdom
“Generative AI” is the process of using machine learning algorithms to create new content based on “intelligence” gathered through training on a large corpus of text (even code!), images, audio, and video files. The broad category of Generative AI covers many different types of models, including Large Language Models (LLMs). An LLM is an AI model trained on a large corpus of text to predict the likelihood of a given sequence of words using statistical techniques. Companies such as OpenAI and Midjourney, and open-source communities such as the one around Stable Diffusion, have invested millions of dollars in productizing groundbreaking AI papers and pre-training models on large datasets available on the Internet. Once these large, pre-trained models are available, individuals and companies can apply them to domain-specific problems through transfer learning, without incurring the significant computational overhead of training from scratch.
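To make "predicting the likelihood of a sequence of words" concrete, here is a toy sketch using a bigram count model. This is an illustrative stand-in only: real LLMs are large neural networks, not count tables, and the corpus and the function names `train_bigram` and `sequence_probability` are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count how often each word follows the previous one."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = ["<s>"] + sentence.lower().split()  # <s> marks sentence start
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def sequence_probability(counts, sentence):
    """Probability of a word sequence as a product of bigram probabilities."""
    words = ["<s>"] + sentence.lower().split()
    prob = 1.0
    for prev, nxt in zip(words, words[1:]):
        total = sum(counts[prev].values())
        if total == 0:
            return 0.0  # the previous word never had a successor in training
        prob *= counts[prev][nxt] / total
    return prob


corpus = [
    "the camera follows the ball",
    "the camera follows the player",
    "the ball crosses the line",
]
model = train_bigram(corpus)
p_seen = sequence_probability(model, "the camera follows the ball")
p_unseen = sequence_probability(model, "the line follows the ball")
```

A sequence that resembles the training text receives a higher probability than one that does not. An LLM applies the same principle with a far richer model of context.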
This paper describes some of the early applications of Generative AI and LLMs in media broadcasting and streaming. One of the demonstrated applications is the ability to generate actionable video descriptors/tags, such as identifying a scene in which a woman is standing in front of the Eiffel Tower or on a moving train.