AI Research Roundup: The Frontier of Diffusion Models (End of 2025)
Key Research Highlights
- Stream-DiffVSR (Hau-Shiang Shiu et al.): Introduces a low-latency video super-resolution method using a causal conditional diffusion framework. By utilizing an Auto-Regressive Temporal Guidance (ARTG) module and a lightweight decoder, it achieves a 130x reduction in latency, making it the first diffusion-based method suitable for online deployment.
- Diffusion Knows Transparency (Shaocong Xu et al.): Leverages video diffusion models for depth and normal estimation of transparent objects. Using lightweight LoRA adapters, this method achieves zero-shot SOTA performance on the TransPhy3D dataset, with successful applications in robotic grasping.
- Improving Reasoning for Diffusion Language Models (Kevin Rojas et al.): Proposes the Group Diffusion Policy Optimization (GDPO) algorithm. By employing semi-deterministic Monte Carlo sampling to reduce variance in ELBO estimation, it outperforms existing baselines in mathematical and code-generation tasks.
- OpenPBR (Jamie Portsmouth et al.): Details the physical rendering theory behind the OpenPBR material model, providing a standardized framework for metals, dielectrics, subsurface scattering, and thin-film interference.
- Symbolic recursion method (Igor Ermakov et al.): Introduces a symbolic recursion method for studying strongly correlated fermions, confirming the universal scaling of operator growth and the scaling law for charge diffusion constants.
- RoboPerform (Zhe Li et al.): Presents the first unified framework for music- and speech-driven humanoid dancing. It combines a ResMoE teacher policy with a diffusion-based student policy to ensure physical plausibility and audio alignment.
- RoboMirror (Zhe Li et al.): Enables retargeting-free humanoid locomotion based on video understanding. By using vision-language models to drive diffusion-based policies, it significantly reduces latency for telepresence applications.
- Memorization in 3D Shape Generation (Shu Pu et al.): Quantifies memorization in 3D generative models, showing that it increases with data diversity and conditioning granularity, and proposes mitigation strategies.
- Learning to Refocus (SaiKiran Tedla et al.): Proposes a novel method for realistic post-capture refocusing using video diffusion models, enabling interactive refocusing from single blurred images.
- LiveTalk (Ethan Chern et al.): Builds a real-time multimodal interactive video generation framework. Through improved on-policy distillation, it resolves visual artifacts and outperforms models such as Sora2 and Veo3 in multi-turn interactions.
- ThinkGen (Siyu Jiao et al.): The first Chain-of-Thought (CoT) based visual generation framework, utilizing a decoupled MLLM-DiT architecture to achieve generalized generation across diverse scenes.
- PurifyGen (Zongsheng Cao et al.): A training-free safety framework that employs semantic distance evaluation and dual-space projection to purify harmful concepts in text-to-image generation.
- AnyMS (Binhe Yu et al.): A training-free multi-subject customization framework using bottom-up attention decoupling to balance text alignment, identity preservation, and layout control.
- HY-Motion 1.0 (Yuxin Wen et al.): A large-scale text-to-3D motion generation model. With a billion-parameter DiT flow-matching architecture, it demonstrates superior motion coverage and precision in text-to-motion alignment.
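Several entries above, HY-Motion 1.0 among them, build on flow matching. As a rough illustration only, the sketch below shows the common straight-line variant: a network regresses the velocity `x1 - x0` along the path from noise `x0` to data `x1`, and sampling integrates that velocity with Euler steps. The function names and the single-data-point `make_oracle` field are assumptions made for this example, not code from any of the papers listed.

```python
def interpolate(x0, x1, t):
    """Point on the straight-line path from noise x0 to data x1 at time t in [0, 1]."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target for the velocity model along that path: x1 - x0."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(predict_velocity, x0, x1, t):
    """Mean squared error between predicted and target velocity at (x_t, t)."""
    x_t = interpolate(x0, x1, t)
    v_pred = predict_velocity(x_t, t)
    v_true = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(v_pred, v_true)) / len(v_true)


def euler_sample(predict_velocity, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 towards t = 1 with fixed-step Euler."""
    x = list(x0)
    dt = 1.0 / steps
    for i in range(steps):
        v = predict_velocity(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def make_oracle(x1):
    """Hypothetical stand-in for a trained model when the dataset is one point x1.

    For a single target the exact velocity field is (x1 - x) / (1 - t), valid for t < 1.
    """
    return lambda x, t: [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


if __name__ == "__main__":
    noise, data = [0.5, -1.0], [2.0, 3.0]
    oracle = make_oracle(data)
    print("loss at t=0.3:", flow_matching_loss(oracle, noise, data, 0.3))
    print("sample:", euler_sample(oracle, noise, steps=10))
```

In a real system `predict_velocity` would be a trained network (a DiT in the HY-Motion entry), and the loss would be averaged over random noise, data, and time samples.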
Primary Research Directions
- Video Processing: Focusing on low-latency super-resolution, depth estimation, and dynamic scene generation (e.g., Stream-DiffVSR, DriveGen3D).
- Diffusion Language Models: Improving reasoning capabilities and generation quality through reinforcement learning strategies (e.g., GDPO).
- 3D Generation & Robotics: Integrating diffusion models for physically plausible humanoid control and 3D content synthesis (e.g., RoboPerform, RoboMirror, HY-Motion).
- Safety & Optimization: Developing training-free safety purification and efficient preference alignment algorithms (e.g., PurifyGen, DDSPO).
- Physical Simulation: Applying diffusion models to solve complex problems in fluid dynamics and high-dimensional PDEs (e.g., Fokker-Planck equations).
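The GDPO entry refers to semi-deterministic Monte Carlo sampling for lower-variance ELBO estimates. One plausible reading is that part of the randomness (here, the time variable) is replaced by a fixed grid while the rest stays random. The toy below illustrates only that general idea: the integrand, the midpoint grid, and all function names are assumptions for this example and do not reproduce the paper's estimator.

```python
import random
import statistics


def integrand(t, rng):
    """Toy stand-in for a per-timestep loss term: smooth in t plus sampling noise."""
    return t * t + rng.gauss(0.0, 0.1)


def full_mc_estimate(n, rng):
    """Fully random estimator: both the time t and the inner noise are sampled."""
    return sum(integrand(rng.random(), rng) for _ in range(n)) / n


def semi_deterministic_estimate(n, rng):
    """Fix t on a midpoint grid over [0, 1]; only the inner noise stays random."""
    grid = [(k + 0.5) / n for k in range(n)]
    return sum(integrand(t, rng) for t in grid) / n


def empirical_variance(estimator, n, trials, seed):
    """Variance of the estimator across repeated independent runs."""
    rng = random.Random(seed)
    return statistics.pvariance([estimator(n, rng) for _ in range(trials)])


if __name__ == "__main__":
    n, trials = 4, 2000
    print("full MC variance:          ", empirical_variance(full_mc_estimate, n, trials, seed=0))
    print("semi-deterministic variance:", empirical_variance(semi_deterministic_estimate, n, trials, seed=0))
```

The grid version trades a small deterministic quadrature error for the removal of the variance contributed by random time sampling, which is the kind of trade-off a lower-variance training signal depends on.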
Research Trends Analysis
As of late 2025, diffusion-model research is shifting toward cross-disciplinary integration. The field is moving beyond image generation toward low-latency online deployment, implicit learning of physical laws, and tighter integration with the reasoning capabilities of large language models. On the technical side, researchers increasingly rely on attention decoupling, flow matching, training-free methods, and lightweight LoRA adapters to make generation more efficient and safer. With the rise of embodied AI and scientific computing, diffusion models are evolving from “visual tools” into foundational engines for intelligent decision-making and simulation.
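For readers unfamiliar with LoRA, the sketch below shows the standard low-rank update in plain Python: a frozen weight `W` is adjusted by `(alpha / r) * B @ A`, where `A` and `B` are small trainable matrices of rank `r`. The helper names and the tiny matrices are illustrative assumptions; practical implementations use a tensor library and apply this to selected layers of a pretrained model.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Effective weight W + (alpha / r) * B @ A.

    w: d_out x d_in frozen weight, a: r x d_in, b: d_out x r.
    """
    r = len(a)
    scale = alpha / r
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


def lora_param_count(d_out, d_in, r):
    """Number of trainable values in the adapter (A and B together)."""
    return r * (d_in + d_out)


if __name__ == "__main__":
    w = [[1, 2, 3], [4, 5, 6]]   # frozen 2 x 3 weight
    a = [[1, 0, -1]]             # rank-1 down-projection (r = 1)
    b_zero = [[0], [0]]          # B starts at zero, so the adapter is a no-op
    print(lora_weight(w, a, b_zero, alpha=2))
    b = [[1], [2]]               # after some hypothetical training steps
    print(lora_weight(w, a, b, alpha=2))
    print("adapter params:", lora_param_count(2, 3, 1), "vs full:", 2 * 3)
```

Initializing `B` to zero means fine-tuning starts from the unmodified pretrained model, and the parameter saving grows with layer size because the adapter scales with `r * (d_in + d_out)` rather than `d_in * d_out`.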