Paper-Conference

VideoChat-r1.5: visual test-time scaling to reinforce multimodal reasoning by iterative perception

Oct 11, 2025

StreamForest: efficient online video understanding with persistent event memory

Oct 11, 2025

MotionRAG: motion retrieval-augmented image-to-video generation

Oct 11, 2025

Loquetier: a virtualized multi-LoRA framework for unified LLM fine-tuning and serving

Oct 11, 2025

LongVPO: from anchored cues to self-reasoning for long-form video preference optimization

Oct 11, 2025

Gated integration of low-rank adaptation for continual learning of language models

Oct 11, 2025

EgoExoBench: a benchmark for first- and third-person view video understanding in MLLMs

Oct 11, 2025

Eagle 2.5: boosting long-context post-training for frontier vision-language models

Oct 11, 2025

3D interaction geometric pre-training for molecular relational learning

Oct 11, 2025

VRBench: a benchmark for multi-step reasoning in long narrative videos

Aug 12, 2025