Xiangyu Zeng

VideoChat-R1.5: visual test-time scaling to reinforce multimodal reasoning by iterative perception

Oct 11, 2025

StreamForest: efficient online video understanding with persistent event memory

Oct 11, 2025

Make your training flexible: towards deployment-efficient video models

Aug 12, 2025

Task preference optimization: improving multimodal large language models with vision task alignment

Apr 20, 2025

Online video understanding: a comprehensive benchmark and memory-augmented method

Apr 20, 2025

TimeSuite: improving MLLMs for long video understanding via grounded tuning

Apr 15, 2025