Business Daily Media

PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.
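To give a rough sense of scale, the token load of a long video can be estimated with simple arithmetic. The sketch below is illustrative only: the sampling rate of one frame per second and the figure of 256 tokens per frame are assumptions chosen for the example, not values reported by the PolyU team.

```python
def estimate_visual_tokens(duration_minutes: float,
                           frames_per_second: float,
                           tokens_per_frame: int) -> int:
    """Estimate how many visual tokens a video contributes to a model's input.

    All three inputs are illustrative assumptions; real models differ in how
    they sample frames and how many tokens each frame occupies.
    """
    sampled_frames = int(duration_minutes * 60 * frames_per_second)
    return sampled_frames * tokens_per_frame


# Hypothetical settings: sample 1 frame per second, 256 tokens per frame.
tokens_15_min = estimate_visual_tokens(15, 1.0, 256)
tokens_27_min = estimate_visual_tokens(27, 1.0, 256)
print(tokens_15_min, tokens_27_min)
```

Under these assumed settings, even sparse sampling of a 15-minute video yields hundreds of thousands of visual tokens, which is why localising the relevant moments before answering can matter.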

Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they drew on the human process of video understanding and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
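The division of labour among the four roles can be sketched in a few lines of Python. This is a minimal illustration of the workflow as described above, not the VideoMind code: the `Roles` container, the `run_workflow` function, the step names and the toy lambdas are all hypothetical stand-ins for the actual model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

Moment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class Roles:
    """Hypothetical stand-ins for the four role-specific model calls."""
    plan: Callable[[str], List[str]]                 # Planner: which roles to call
    ground: Callable[[str], List[Moment]]            # Grounder: candidate moments
    verify: Callable[[str, Moment], float]           # Verifier: confidence per moment
    answer: Callable[[str, Optional[Moment]], str]   # Answerer: final answer


def run_workflow(question: str, roles: Roles) -> str:
    """Run the roles in the order chosen by the Planner."""
    candidates: List[Moment] = []
    best: Optional[Moment] = None
    for step in roles.plan(question):
        if step == "ground":
            candidates = roles.ground(question)
        elif step == "verify" and candidates:
            # Keep the candidate moment the Verifier scores highest.
            best = max(candidates, key=lambda m: roles.verify(question, m))
        elif step == "answer":
            break
    if best is None and candidates:
        best = candidates[0]  # No verification requested: fall back to the first moment.
    return roles.answer(question, best)


# Toy roles with made-up outputs, for illustration only.
toy = Roles(
    plan=lambda q: ["ground", "verify", "answer"],
    ground=lambda q: [(10.0, 20.0), (300.0, 330.0)],
    verify=lambda q, m: 0.9 if m[0] >= 300 else 0.2,
    answer=lambda q, m: f"answer based on {m}",
)
result = run_workflow("When does the goal happen?", toy)
print(result)
```

Letting the Planner return a list of steps means a simple question can skip grounding and verification entirely, while a harder one passes through all four roles.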

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without performing full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team applies four lightweight LoRA adapters to a unified model, each designed for a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to switch seamlessly among these roles, eliminating the need for, and the cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
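The idea of switching adapters over one shared model can be illustrated with a single linear layer. In LoRA, a frozen weight matrix W is supplemented by a low-rank update, so the output becomes Wx + scale * B(Ax), where A and B are small matrices. The sketch below is a toy, pure-Python illustration of that general mechanism under assumed names (`SwitchableLoRALinear`, `activate`) and made-up matrices; it is not the team's implementation.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SwitchableLoRALinear:
    """One frozen base weight shared by several role-specific low-rank adapters."""

    def __init__(self, weight: Matrix,
                 adapters: Dict[str, Tuple[Matrix, Matrix]],
                 scale: float = 1.0) -> None:
        self.weight = weight      # shared base weights, never modified
        self.adapters = adapters  # role name -> (A, B) low-rank factors
        self.scale = scale
        self.active: Optional[str] = None

    def activate(self, role: Optional[str]) -> None:
        """Select which adapter to apply; None means the base model only."""
        if role is not None and role not in self.adapters:
            raise KeyError(f"unknown role: {role}")
        self.active = role

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)
        if self.active is None:
            return y
        a, b = self.adapters[self.active]
        delta = matvec(b, matvec(a, x))  # low-rank update B(Ax)
        return [yi + self.scale * di for yi, di in zip(y, delta)]


# Toy example: 2x2 identity base weight with two rank-1 adapters (made-up values).
layer = SwitchableLoRALinear(
    weight=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "planner": ([[1.0, 0.0]], [[1.0], [0.0]]),
        "grounder": ([[0.0, 1.0]], [[0.0], [2.0]]),
    },
)
x = [1.0, 1.0]
base_out = layer.forward(x)
layer.activate("planner")
planner_out = layer.forward(x)
layer.activate("grounder")
grounder_out = layer.forward(x)
print(base_out, planner_out, grounder_out)
```

Because only the small A and B factors differ between roles, the large base weights are stored once, which is the source of the memory saving compared with deploying a separate model per role.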

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that VideoMind outperformed all competitors in grounding accuracy on challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one built on a smaller, 2 billion (2B) parameter model, and another on a larger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind delivered performance comparable to that of many of the 7B-size models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans, while leveraging the Chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework has lowered the technological cost and the threshold for deployment, offering a feasible way to ease the power-consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
