Business Daily Media


PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand video content, AI models therefore need not only to identify the objects present but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they made reference to a human-like process of video understanding, and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the information accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
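The role-based workflow described above can be sketched in a few lines of Python. This is a toy illustration only, not VideoMind's actual code: the `Moment` class, the word-overlap scoring and every function name are assumptions made for this example, and text captions stand in for visual content.

```python
from dataclasses import dataclass


@dataclass
class Moment:
    start: float   # seconds
    end: float     # seconds
    caption: str   # toy stand-in for the visual content of the segment


def overlap(query: str, caption: str) -> int:
    """Count the words shared by the query and a caption (toy relevance score)."""
    return len(set(query.lower().split()) & set(caption.lower().split()))


def planner(query: str) -> list[str]:
    """Coordinate the other roles; in this sketch every query uses the full chain."""
    return ["grounder", "verifier", "answerer"]


def grounder(query: str, video: list[Moment]) -> list[Moment]:
    """Localise and retrieve candidate moments relevant to the query."""
    return [m for m in video if overlap(query, m.caption) > 0]


def verifier(query: str, candidates: list[Moment]):
    """Select the most reliable candidate, or None if nothing was retrieved."""
    return max(candidates, key=lambda m: overlap(query, m.caption), default=None)


def answerer(query: str, moment) -> str:
    """Generate a query-aware answer from the verified moment."""
    if moment is None:
        return "no relevant moment found"
    return f"{moment.caption} ({moment.start:.0f}s-{moment.end:.0f}s)"


def answer_query(query: str, video: list[Moment]) -> str:
    """Run the roles in the order chosen by the planner."""
    candidates, best = video, None
    for role in planner(query):
        if role == "grounder":
            candidates = grounder(query, video)
        elif role == "verifier":
            best = verifier(query, candidates)
        elif role == "answerer":
            return answerer(query, best)
    return "no answer produced"


video = [
    Moment(0, 60, "a man opens the door"),
    Moment(60, 120, "a dog runs in the park"),
    Moment(120, 180, "the dog catches a ball in the park"),
]
print(answer_query("dog catches ball", video))
```

The point of the sketch is the control flow: the answer is produced only after relevant moments have been retrieved and checked, rather than from the whole video at once.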

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team applies four lightweight LoRA adapters to a unified model, each designed for a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling and switch seamlessly among the roles. This eliminates the need for, and the cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
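The idea of one shared model with swappable role adapters can be illustrated with a minimal sketch. It is not VideoMind's implementation: `SharedModel`, `add_adapter`, `activate` and the tiny 2x2 weights are hypothetical names and values chosen for this example, and the scaling factor that LoRA normally applies to the update is omitted. Each adapter stores two small matrices, B and A, whose product is added to the output of the frozen base weight.

```python
def matvec(m, v):
    """Multiply a matrix (a list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedModel:
    """One frozen base weight plus several swappable low-rank adapters."""

    def __init__(self, base):
        self.base = base      # frozen d x k weight shared by every role
        self.adapters = {}    # role name -> (B: d x r, A: r x k)
        self.active = None    # role whose adapter is currently applied

    def add_adapter(self, role, B, A):
        self.adapters[role] = (B, A)

    def activate(self, role):
        if role not in self.adapters:
            raise KeyError(role)
        self.active = role

    def forward(self, x):
        y = matvec(self.base, x)
        if self.active is not None:
            B, A = self.adapters[self.active]
            # Compute B(Ax) so the full d x k update is never materialised.
            delta = matvec(B, matvec(A, x))
            y = [yi + di for yi, di in zip(y, delta)]
        return y


def adapter_params(d, k, r):
    """Number of trainable values in one rank-r adapter for a d x k weight."""
    return r * (d + k)


model = SharedModel(base=[[1, 0], [0, 1]])            # toy identity weight
model.add_adapter("planner", B=[[1], [0]], A=[[0, 1]])
model.add_adapter("grounder", B=[[0], [1]], A=[[1, 0]])

for role in ("planner", "grounder"):
    model.activate(role)                               # switch roles, same base weight
    print(role, model.forward([1, 2]))

# Size of one rank-8 adapter versus a full 4096 x 4096 weight matrix.
print(adapter_params(4096, 4096, 8), 4096 * 4096)
```

Switching roles only changes which small pair of matrices is applied, so the large base weight is stored once no matter how many roles the model plays.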

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that VideoMind's grounding accuracy surpassed that of all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one built on a smaller, 2 billion (2B) parameter model, and another on a larger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind delivered performance comparable to that of many 7B models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans, while leveraging the Chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework lowers the technological cost and the threshold for deployment, offering a feasible way to ease the power consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
