Business Daily Media


PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle to understand long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating the way humans think.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand video content, AI models therefore need not only to identify the objects present but also to track how they change throughout the video. Because the visual content of a video occupies a large number of tokens, video understanding demands vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.
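To make the token problem concrete, the back-of-envelope Python sketch below estimates how many visual tokens a video produces. The sampling rate (1 frame per second) and the figure of 256 tokens per frame are hypothetical placeholders chosen for illustration; real models differ, and the announcement does not state VideoMind's settings. Under these assumptions, a 27-minute video already yields more than 400,000 visual tokens, and the score matrix of full self-attention grows with the square of that count.

```python
# Back-of-envelope sketch: how many visual tokens a long video produces.
# Both constants are illustrative assumptions, not VideoMind settings.
SAMPLE_FPS = 1.0          # hypothetical frame-sampling rate (frames per second)
TOKENS_PER_FRAME = 256    # hypothetical number of visual tokens per frame


def visual_tokens(duration_min: float,
                  fps: float = SAMPLE_FPS,
                  tokens_per_frame: int = TOKENS_PER_FRAME) -> int:
    """Estimate the visual-token count for a video of the given length."""
    frames = int(duration_min * 60 * fps)
    return frames * tokens_per_frame


def attention_entries(num_tokens: int) -> int:
    """Entries in one full self-attention score matrix (grows as n^2)."""
    return num_tokens * num_tokens


if __name__ == "__main__":
    for minutes in (1, 15, 27):
        n = visual_tokens(minutes)
        print(f"{minutes:>2} min -> {n:,} tokens, "
              f"{attention_entries(n):.2e} attention entries per head per layer")
```

Doubling the video length doubles the token count but quadruples the attention matrix, which is why memory becomes the limiting factor well before the frame count looks large.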

Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they drew on a human-like process of video understanding and introduced a role-based workflow with four roles: the Planner, which coordinates all other roles for each query; the Grounder, which localises and retrieves relevant moments; the Verifier, which validates the accuracy of the retrieved moments and selects the most reliable one; and the Answerer, which generates the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
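The role-based workflow can be illustrated with a toy Python pipeline. This is a hypothetical sketch, not VideoMind's code: the video is reduced to captioned time segments, simple keyword matching stands in for the model calls each role would make, and every name (`planner`, `grounder`, `verifier`, `answerer`, `min_coverage`, the stopword list) is invented for illustration. It shows only the division of labour: the Grounder recalls candidates broadly, the Verifier filters them more strictly, and the Answerer works from the single moment that survives.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for a video: (start_sec, end_sec, caption) segments.
Segment = Tuple[float, float, str]

# Words ignored when matching a query against captions (toy heuristic).
STOPWORDS = {"when", "does", "did", "the", "a", "an", "is", "what", "to", "in", "of"}


@dataclass
class Moment:
    start: float
    end: float
    caption: str
    score: int  # number of query keywords found in the caption


def keywords(text: str) -> set:
    """Lower-cased words with punctuation and stopwords removed."""
    return {w.strip("?.,!").lower() for w in text.split()} - STOPWORDS - {""}


def planner(query: str) -> List[str]:
    """Decide which roles to call for this query (toy rule)."""
    if query.lower().startswith("when"):
        return ["grounder", "verifier", "answerer"]
    return ["answerer"]


def grounder(query: str, video: List[Segment], k: int = 3) -> List[Moment]:
    """Propose up to k candidate moments ranked by keyword overlap."""
    q = keywords(query)
    scored = [Moment(s, e, c, len(q & keywords(c))) for s, e, c in video]
    return sorted((m for m in scored if m.score > 0),
                  key=lambda m: m.score, reverse=True)[:k]


def verifier(query: str, candidates: List[Moment],
             min_coverage: float = 0.6) -> Optional[Moment]:
    """Reject weakly supported candidates and return the most reliable one."""
    q = keywords(query)
    if not q:
        return None
    valid = [m for m in candidates if m.score / len(q) >= min_coverage]
    return max(valid, key=lambda m: m.score, default=None)


def answerer(moment: Optional[Moment]) -> str:
    """Produce a (toy) answer from the selected moment."""
    if moment is None:
        return "No verified moment; answering from the full video."
    return f"{moment.start:.0f}-{moment.end:.0f}s: {moment.caption}"


def run(query: str, video: List[Segment]) -> str:
    """Planner-coordinated pipeline: Grounder -> Verifier -> Answerer."""
    roles = planner(query)
    candidates = grounder(query, video) if "grounder" in roles else []
    chosen = verifier(query, candidates) if "verifier" in roles else None
    return answerer(chosen)
```

For example, with segments `(0, 40, "a chef chops onions")`, `(40, 95, "the chef adds onions to the pan")` and `(95, 150, "the chef plates the dish")`, the query "When does the chef add onions to the pan?" returns all three as Grounder candidates, but only the 40-95 s segment passes the Verifier.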

Another core innovation of the VideoMind framework is its Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years; it adapts AI models for specific uses without full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team applies four lightweight LoRA adapters within a single unified model, each dedicated to one role. With this strategy, the model can dynamically activate the relevant role-specific adapter during inference via self-calling and switch seamlessly among the roles. This removes the need, and the cost, of deploying multiple models while enhancing the efficiency and flexibility of the single model.
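The idea of several role-specific adapters sharing one frozen base model can be sketched in NumPy at the level of a single linear layer. This is an assumption-laden illustration, not VideoMind's implementation: the class name `ChainOfLoRALinear`, the rank and the scaling factor `alpha` are hypothetical, and "self-calling" is represented simply by `set_role`. The forward pass follows the standard LoRA form h = W x + (alpha / r) B A x, where only the active role's A and B are used and W is never modified.

```python
import numpy as np

ROLES = ("planner", "grounder", "verifier", "answerer")


class ChainOfLoRALinear:
    """One frozen linear layer shared by all roles, plus one LoRA adapter per role.

    Hypothetical sketch: forward computes h = W x + (alpha / r) * B (A x)
    using only the adapter of the currently active role.
    """

    def __init__(self, d_in: int, d_out: int, rank: int = 4,
                 alpha: float = 8.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in))  # frozen base weight
        self.scale = alpha / rank
        self.adapters = {
            role: {
                "A": rng.standard_normal((rank, d_in)) * 0.01,
                "B": np.zeros((d_out, rank)),  # B = 0: each adapter starts as a no-op
            }
            for role in ROLES
        }
        self.active = None  # no adapter active until a role is selected

    def set_role(self, role: str) -> None:
        """Switch to another role by swapping the active adapter; W is untouched."""
        if role not in self.adapters:
            raise KeyError(f"unknown role: {role}")
        self.active = role

    def __call__(self, x: np.ndarray) -> np.ndarray:
        h = self.W @ x
        if self.active is not None:
            a = self.adapters[self.active]
            h = h + self.scale * (a["B"] @ (a["A"] @ x))
        return h

    def param_counts(self) -> tuple:
        """(base weight parameters, total parameters across all four adapters)."""
        base = self.W.size
        lora = sum(a["A"].size + a["B"].size for a in self.adapters.values())
        return base, lora
```

The saving comes from the adapter shapes. For a hypothetical 4096 x 4096 layer with rank 8, the base weight holds 16,777,216 parameters, while four adapters add 4 x (8 x 4096 + 4096 x 8) = 262,144, about 1.6% extra. Deploying four separate full copies would instead need four times the base weight.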

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. When comparing VideoMind with state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that VideoMind's grounding accuracy surpassed that of all competitors on challenging tasks involving videos with an average duration of 27 minutes. Notably, the team tested two versions of VideoMind: a smaller model with 2 billion (2B) parameters and a larger one with 7 billion (7B) parameters. The results showed that, even at the 2B size, VideoMind delivered performance comparable to that of many other 7B models.
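The announcement does not say how "grounding accuracy" was scored. One common convention in moment-retrieval benchmarks is temporal intersection-over-union (IoU) between a predicted time span and the annotated one, summarised as the share of queries whose top prediction clears an IoU threshold. The Python sketch below shows that convention for illustration only; it is not presented as VideoMind's evaluation protocol, and the intervals in it are made-up toy values rather than reported results.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_sec, end_sec)


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection-over-union of two time intervals, in [0, 1]."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Span], gts: List[Span],
                  threshold: float = 0.5) -> float:
    """Share of queries whose top-1 predicted span reaches the IoU threshold."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)


if __name__ == "__main__":
    # Toy intervals for illustration only; not VideoMind results.
    preds = [(10, 20), (0, 10), (5, 15)]
    gts = [(15, 25), (20, 30), (5, 15)]
    print(recall_at_iou(preds, gts, threshold=0.3))  # 2 of 3 queries hit
```

A stricter threshold, such as 0.5 or 0.7, rewards spans that line up closely with the annotation, which matters more as videos grow longer and a loosely placed span can miss the relevant moment entirely.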

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, about a million times less than a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans do, while leveraging the chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. However, the advancement of AI models is constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework has lowered the technological cost and the threshold for deployment, offering a feasible way to ease the power-consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
