Xiangyu Zeng

VideoChat-R1.5: visual test-time scaling to reinforce multimodal reasoning by iterative perception

October 11, 2025

StreamForest: efficient online video understanding with persistent event memory

October 11, 2025

Make your training flexible: towards deployment-efficient video models

August 12, 2025

Task preference optimization: improving multimodal large language models with vision task alignment

April 20, 2025

Online video understanding: a comprehensive benchmark and memory-augmented method

April 20, 2025

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

April 15, 2025