Business Daily Media

PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.
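To make the scale concrete, the rough arithmetic can be sketched as follows. The sampling rate and the tokens-per-frame figure are illustrative assumptions, not values reported by the PolyU team; the point is only that the token count grows linearly with video duration.

```python
def estimate_visual_tokens(duration_minutes: float,
                           frames_per_second: float,
                           tokens_per_frame: int) -> int:
    """Rough count of visual tokens for a video sampled at a fixed rate."""
    total_frames = int(duration_minutes * 60 * frames_per_second)
    return total_frames * tokens_per_frame


# Assumed for illustration: 1 sampled frame per second, 256 tokens per frame.
short_clip = estimate_visual_tokens(1, 1.0, 256)    # a 1-minute clip
long_video = estimate_visual_tokens(27, 1.0, 256)   # a 27-minute video

print(short_clip, long_video)
```

Under these assumed settings, a 27-minute video needs 27 times as many visual tokens as a 1-minute clip, which is why long videos quickly strain memory and compute.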

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they modelled the framework on the way humans understand videos and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
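The role-based workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the team's implementation: the `Roles` container, the `answer_query` function and the stub roles are all hypothetical names, and each role is a plain function here rather than a video-language model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Moment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class Roles:
    """Hypothetical container for the four roles; each is a plain callable."""
    planner: Callable[[str], List[str]]              # decides which roles to call
    grounder: Callable[[str], List[Moment]]          # proposes candidate moments
    verifier: Callable[[str, List[Moment]], Moment]  # picks the most reliable one
    answerer: Callable[[str, Moment], str]           # answers from the chosen moment


def answer_query(query: str, video_length: float, roles: Roles) -> str:
    plan = roles.planner(query)
    # Default to the whole video if the plan skips grounding.
    candidates: List[Moment] = [(0.0, video_length)]
    moment = candidates[0]
    for step in plan:
        if step == "grounder":
            candidates = roles.grounder(query)
            moment = candidates[0]
        elif step == "verifier":
            moment = roles.verifier(query, candidates)
    return roles.answerer(query, moment)


# Toy stand-ins for the real roles (assumptions for illustration only).
toy_roles = Roles(
    planner=lambda q: ["grounder", "verifier"],
    grounder=lambda q: [(10.0, 20.0), (95.0, 110.0)],
    verifier=lambda q, cands: max(cands, key=lambda m: m[1] - m[0]),
    answerer=lambda q, m: f"Answer based on {m[0]:.0f}s-{m[1]:.0f}s",
)

result = answer_query("What happens after the goal?", 1620.0, toy_roles)
print(result)
```

The design point is that the Planner returns a plan rather than an answer, so simple queries can skip grounding and verification while harder ones go through every stage.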

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without performing full-parameter retraining. The chain-of-LoRA strategy pioneered by the team involves applying four lightweight LoRA adapters in a unified model, each designed for calling a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to switch seamlessly among these roles, eliminating the need for, and cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
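The idea of one shared model with switchable adapters can be illustrated with a toy example. The sketch below uses a single 2x2 weight matrix and rank-1 adapters in pure Python; the class names, the `activate` method and the numbers are hypothetical, and only two roles are shown for brevity. In standard LoRA, the adapted output is the frozen base output plus a low-rank update, that is, W x + B (A x).

```python
from typing import Dict, List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRAAdapter:
    """Low-rank update B (A x): `down` is A (r x d_in), `up` is B (d_out x r)."""

    def __init__(self, down: Matrix, up: Matrix) -> None:
        self.down = down
        self.up = up

    def delta(self, x: Vector) -> Vector:
        return matvec(self.up, matvec(self.down, x))


class SharedModel:
    """One frozen base weight shared by all roles, plus named adapters."""

    def __init__(self, base_weight: Matrix, adapters: Dict[str, LoRAAdapter]) -> None:
        self.base_weight = base_weight
        self.adapters = adapters
        self.active: Optional[str] = None

    def activate(self, role: Optional[str]) -> None:
        if role is not None and role not in self.adapters:
            raise KeyError(f"unknown role: {role}")
        self.active = role

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.base_weight, x)  # base weights are never modified
        if self.active is not None:
            d = self.adapters[self.active].delta(x)
            y = [a + b for a, b in zip(y, d)]
        return y


model = SharedModel(
    base_weight=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "grounder": LoRAAdapter(down=[[1.0, 0.0]], up=[[0.0], [1.0]]),
        "verifier": LoRAAdapter(down=[[0.0, 1.0]], up=[[1.0], [0.0]]),
    },
)

x = [1.0, 2.0]
base_out = model.forward(x)
model.activate("grounder")
grounder_out = model.forward(x)
model.activate("verifier")
verifier_out = model.forward(x)
print(base_out, grounder_out, verifier_out)
```

Switching roles only changes which small adapter is added to the output; the base weights are stored once, which is the source of the memory saving compared with deploying a separate model per role.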

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with some state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that the grounding accuracy of VideoMind outperformed all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one with a smaller, 2 billion (2B) parameter model, and another with a bigger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind still yielded performance comparable with many of the other 7B size models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like a human, while leveraging the chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon a unified, open-source model, Qwen2-VL, and augmented with additional optimisation tools, the VideoMind framework has lowered the technological cost and the threshold for deployment, offering a feasible way to ease the power consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
