Inception3D
Get updates on new publications and videos by following us on Google Scholar or via our research blog.

2025

Easi3R: Estimating Disentangled Motion from DUSt3R Without Training
X. Chen, Y. Chen, Y. Xiu, A. Geiger, A. Chen
arXiv, 2025
Abstract: Recent advances in DUSt3R have enabled robust estimation of dense point clouds and camera parameters of static scenes, leveraging Transformer network architectures and direct supervision on large-scale 3D datasets. In contrast, the limited scale and diversity of available 4D datasets present a major bottleneck for training a highly generalizable 4D model. This constraint has driven conventional 4D methods to fine-tune 3D models on scalable dynamic video data with additional geometric priors such as optical flow and depths. In this work, we take an opposite path and introduce Easi3R, a simple yet efficient training-free method for 4D reconstruction. Our approach applies attention adaptation during inference, eliminating the need for from-scratch pre-training or network fine-tuning. We find that the attention layers in DUSt3R inherently encode rich information about camera and object motion. By carefully disentangling these attention maps, we achieve accurate dynamic region segmentation, camera pose estimation, and 4D dense point map reconstruction. Extensive experiments on real-world dynamic videos demonstrate that our lightweight attention adaptation significantly outperforms previous state-of-the-art methods that are trained or finetuned on extensive dynamic datasets.
LaTeX BibTeX Citation:
@inproceedings{Chen2025Easi3r,
  author = {Xingyu Chen and Yue Chen and Yuliang Xiu and Andreas Geiger and Anpei Chen},
  title = {Easi3R: Estimating Disentangled Motion from DUSt3R Without Training},
  booktitle = {arXiv},
  year = {2025}
}
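
As a rough illustration of the idea behind Easi3R (not the authors' code), the sketch below aggregates hypothetical cross-attention maps into a per-pixel dynamicness score, masks out the dynamic pixels, and rigidly aligns the remaining static point maps with a standard Kabsch fit to recover a relative camera pose. All array shapes, the threshold, and the pose solver are simplifying assumptions for illustration.

# Minimal sketch: attention-derived dynamic masking + static-only pose fit.
import numpy as np

def dynamic_mask_from_attention(attn_maps, threshold=0.5):
    """attn_maps: (L, H, W) aggregated cross-attention weights per layer.
    Returns a boolean mask that is True where the scene is likely dynamic."""
    a = attn_maps.mean(axis=0)
    a = (a - a.min()) / (a.max() - a.min() + 1e-8)   # normalise to [0, 1]
    dynamicness = 1.0 - a            # low cross-view attention -> likely moving
    return dynamicness > threshold

def masked_pose_estimate(pts_src, pts_tgt, static_mask):
    """Rigidly align two dense point maps using only static pixels (Kabsch).
    pts_*: (H, W, 3), static_mask: (H, W) boolean. Returns R (3, 3), t (3,)."""
    X = pts_src[static_mask]         # (M, 3) static points in the source frame
    Y = pts_tgt[static_mask]         # (M, 3) the same pixels in the target frame
    mu_x, mu_y = X.mean(axis=0), Y.mean(axis=0)
    U, _, Vt = np.linalg.svd((X - mu_x).T @ (Y - mu_y))
    d = np.sign(np.linalg.det(Vt.T @ U.T))           # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = mu_y - R @ mu_x
    return R, t

# Usage with random stand-in data (no real attention maps or point maps here).
attn = np.random.rand(12, 64, 64)                 # stand-in attention maps
dynamic = dynamic_mask_from_attention(attn)
pts0 = np.random.rand(64, 64, 3)
pts1 = pts0 + np.array([0.1, 0.0, 0.0])           # translated copy of the scene
R, t = masked_pose_estimate(pts0, pts1, ~dynamic)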
GenFusion: Closing the Loop between Reconstruction and Generation via Videos
S. Wu, C. Xu, B. Huang, A. Geiger, A. Chen
Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Abstract: Recently, 3D reconstruction and generation have demonstrated impressive novel view synthesis results, achieving high fidelity and efficiency. However, a notable conditioning gap can be observed between these two fields, e.g., scalable 3D scene reconstruction often requires densely captured views, whereas 3D generation typically relies on a single or no input view, which significantly limits their applications. We found that the source of this phenomenon lies in the misalignment between 3D constraints and generative priors. To address this problem, we propose a reconstruction-driven video diffusion model that learns to condition video frames on artifact-prone RGB-D renderings. Moreover, we propose a cyclical fusion pipeline that iteratively adds restoration frames from the generative model to the training set, enabling progressive expansion and addressing the viewpoint saturation limitations seen in previous reconstruction and generation pipelines. Our evaluation, including view synthesis from sparse view and masked input, validates the effectiveness of our approach.
LaTeX BibTeX Citation:
@inproceedings{Wu2025GenFusion,
  author = {Sibo Wu and Congrong Xu and Binbin Huang and Andreas Geiger and Anpei Chen},
  title = {GenFusion: Closing the Loop between Reconstruction and Generation via Videos},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2025}
}
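
The loop below is a schematic sketch of the cyclical fusion idea described in the abstract, not the released pipeline: render artifact-prone RGB-D frames from the current reconstruction along a novel trajectory, restore them with a generative video model, and add the restored frames back into the training set before refitting. The callables (reconstruct, render_rgbd, restore_video, sample_novel_trajectory) are hypothetical placeholders, and the paper's conditioning details are not modeled.

# Schematic sketch of a reconstruction/generation loop with placeholder callables.
def cyclical_fusion(train_views, reconstruct, render_rgbd, restore_video,
                    sample_novel_trajectory, num_cycles=3):
    training_set = list(train_views)
    model = reconstruct(training_set)                    # fit the initial scene
    for _ in range(num_cycles):
        trajectory = sample_novel_trajectory(model)      # unseen camera path
        degraded = [render_rgbd(model, cam) for cam in trajectory]   # artifact-prone RGB-D
        restored = restore_video(degraded)               # generative restoration
        training_set += list(zip(trajectory, restored))  # progressive expansion
        model = reconstruct(training_set)                # close the loop
    return model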
Ref-GS: Directional Factorization for 2D Gaussian Splatting
Y. Zhang, A. Chen, Y. Wan, Z. Song, J. Yu, Y. Luo, W. Yang
Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Abstract: In this paper, we introduce Ref-GS, a novel approach for directional light factorization in 2D Gaussian splatting, which enables photorealistic view-dependent appearance rendering and precise geometry recovery. Ref-GS builds upon the deferred rendering of Gaussian splatting and applies directional encoding to the deferred-rendered surface, effectively reducing the ambiguity between orientation and viewing angle. Next, we introduce a spherical mip-grid to capture varying levels of surface roughness, enabling roughness-aware Gaussian shading. Additionally, we propose a simple yet efficient geometry-lighting factorization that connects geometry and lighting via the vector outer product, significantly reducing renderer overhead when integrating volumetric attributes. Our method achieves superior photorealistic rendering for a range of open-world scenes while also accurately recovering geometry.
LaTeX BibTeX Citation:
@inproceedings{Zhang2025Refgs,
  author = {Youjia Zhang and Anpei Chen and Yumin Wan and Zikai Song and Junqing Yu and Yawei Luo and Wei Yang},
  title = {Ref-GS: Directional Factorization for 2D Gaussian Splatting},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2025}
}
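
The toy example below illustrates one way to read the geometry-lighting factorization via a vector outer product mentioned in the abstract: per-pixel geometry features from a deferred buffer are combined with a low-order spherical-harmonics encoding of the view direction, and a single linear head maps the outer product to RGB. The feature dimensions, the degree-1 SH encoding, and the linear head are assumptions for illustration, not the paper's renderer.

# Toy sketch: outer-product combination of geometry and lighting features.
import numpy as np

def spherical_harmonics_l1(dirs):
    """Degree-1 SH encoding of unit view directions, dirs: (N, 3) -> (N, 4)."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    return np.stack([np.full_like(x, 0.2821), 0.4886 * y, 0.4886 * z, 0.4886 * x], axis=1)

def factorized_shading(geo_feat, view_dirs, weight):
    """geo_feat: (N, G) deferred geometry features, view_dirs: (N, 3) unit vectors,
    weight: (G * 4, 3) linear head. Returns (N, 3) RGB in [0, 1]."""
    light_feat = spherical_harmonics_l1(view_dirs)             # (N, 4) lighting features
    outer = np.einsum('ng,nl->ngl', geo_feat, light_feat)      # (N, G, 4) outer product
    rgb = outer.reshape(len(geo_feat), -1) @ weight            # (N, 3) linear head
    return 1.0 / (1.0 + np.exp(-rgb))                          # sigmoid to [0, 1]

# Usage with random stand-in buffers.
N, G = 8, 16
rng = np.random.default_rng(0)
geo = rng.normal(size=(N, G))
dirs = rng.normal(size=(N, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
rgb = factorized_shading(geo, dirs, rng.normal(scale=0.1, size=(G * 4, 3)))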
Feat2GS: Probing Visual Foundation Models with Gaussian Splatting
Y. Chen, X. Chen, A. Chen, G. Pons-Moll, Y. Xiu
Conference on Computer Vision and Pattern Recognition (CVPR), 2025
Abstract: Given that visual foundation models (VFMs) are trained on extensive datasets but often limited to 2D images, a natural question arises: how well do they understand the 3D world? Given the differences in architecture and training protocols (i.e., objectives, proxy tasks), a unified framework to fairly and comprehensively probe their 3D awareness is urgently needed. Existing works on 3D probing suggest single-view 2.5D estimation (e.g., depth and normal) or two-view sparse 2D correspondence (e.g., matching and tracking). Unfortunately, these tasks ignore texture awareness and require 3D data as ground truth, which limits the scale and diversity of their evaluation set. To address these issues, we introduce Feat2GS, which reads out 3D Gaussian attributes from VFM features extracted from unposed images. This allows us to probe 3D awareness for geometry and texture via novel view synthesis, without requiring 3D data. Additionally, the disentanglement of 3DGS parameters - geometry (x,α,Σ) and texture (c) - enables separate analysis of texture and geometry awareness. Under Feat2GS, we conduct extensive experiments to probe the 3D awareness of several VFMs, and investigate the ingredients that lead to a 3D-aware VFM. Building on these findings, we develop several variants that achieve state-of-the-art results across diverse datasets. This makes Feat2GS useful both for probing VFMs and as a simple yet effective baseline for novel-view synthesis.
LaTeX BibTeX Citation:
@inproceedings{Chen2025Probing,
  author = {Yue Chen and Xingyu Chen and Anpei Chen and Gerard Pons-Moll and Yuliang Xiu},
  title = {Feat2GS: Probing Visual Foundation Models with Gaussian Splatting},
  booktitle = {Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = {2025}
}
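
The sketch below shows a minimal readout head in the spirit of Feat2GS (an illustrative assumption, not the released code): a single linear layer maps per-pixel visual-foundation-model features to 3D Gaussian attributes, grouped into geometry (position offset, opacity, scale, rotation) and texture (color) so the two groups can be probed separately. The feature dimension and activations are arbitrary choices for the example.

# Minimal sketch: linear readout from VFM features to 3D Gaussian attributes.
import numpy as np

FEAT_DIM = 768                       # e.g. a ViT feature dimension (assumption)
GEO_DIM = 3 + 1 + 3 + 4              # xyz offset, opacity, log-scale, quaternion
TEX_DIM = 3                          # RGB

rng = np.random.default_rng(0)
W = rng.normal(scale=0.01, size=(FEAT_DIM, GEO_DIM + TEX_DIM))   # readout weights

def readout_gaussians(feats):
    """feats: (N, FEAT_DIM) per-pixel VFM features -> dict of Gaussian attributes."""
    out = feats @ W
    geo, tex = out[:, :GEO_DIM], out[:, GEO_DIM:]
    q = geo[:, 7:11]
    return {
        "xyz_offset": geo[:, 0:3],                                 # geometry: position
        "opacity": 1.0 / (1.0 + np.exp(-geo[:, 3])),               # geometry: alpha in (0, 1)
        "scale": np.exp(geo[:, 4:7]),                              # geometry: positive scales
        "rotation": q / np.linalg.norm(q, axis=1, keepdims=True),  # geometry: unit quaternion
        "color": 1.0 / (1.0 + np.exp(-tex)),                       # texture: RGB in (0, 1)
    }

gaussians = readout_gaussians(rng.normal(size=(4, FEAT_DIM)))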

