Novel View Pose Synthesis with Geometry-Aware Regularization for Enhanced 3D Gaussian Splatting

1Ewha Womans University, 2POSTECH
†Project Lead

Project Overview



This study proposes a method to enhance the quality of indoor 3D reconstruction based on 3D Gaussian Splatting (3DGS) on scenes captured with Polycam. The approach generates novel view camera poses by interpolating the training poses, refines the images rendered from them with DIFIX, a diffusion-based enhancement model, and incorporates geometry-aware loss terms to further improve reconstruction quality. The geometry-aware losses comprise a perceptual (LPIPS) loss applied only to novel views, plus depth smoothness and normal consistency losses applied to all views. Together, these components improve the accuracy of geometry reconstruction, strengthen multi-view consistency, and reduce artifacts in the reconstructed scenes. Experimental results show that the proposed method increases PSNR from 20.423 to 21.675 and SSIM from 0.856 to 0.862 compared to the original 3DGS.



Dataset

We captured an indoor space using the Polycam application and downloaded the raw dataset required for 3D reconstruction.

[iPhone 13 Pro]


Raw data

```
capture-folder/
├── raw.glb
├── thumbnail.jpg
├── polycam.mp4
├── mesh_info.json
└── keyframes/
    ├── images/
    ├── corrected_images/
    ├── cameras/
    ├── corrected_cameras/
    ├── depth/
    └── confidence/
```

raw.glb, corrected_images/, and corrected_cameras/ were used for this project.

Pipeline

pipeline1


pipeline2

Method

Novel View Camera Pose

We propose a method to generate novel view camera poses by interpolating training camera poses.



Intermediate positions along the paths between the original viewpoints are calculated to create new viewpoints. This approach enables smooth camera transitions, making rendering changes appear more natural.



① Translation — Linear Interpolation

$$\mathbf{p}(t) = (1 - t)\,\mathbf{p}_a + t\,\mathbf{p}_b,\quad t \in [0,1]$$



② Rotation — Spherical Linear Interpolation

Spherical Linear Interpolation (SLERP) illustration

$$\mathbf{q}(t)=\frac{\sin((1-t)\theta)}{\sin\theta}\,\mathbf{q}_a +\frac{\sin(t\theta)}{\sin\theta}\,\mathbf{q}_b$$

where $\theta = \cos^{-1}(\mathbf{q}_a \cdot \mathbf{q}_b)$.
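A minimal NumPy sketch of the two interpolation steps above (function and variable names are illustrative, not taken from the project code):

```python
import numpy as np

def lerp_position(p_a, p_b, t):
    """Linear interpolation of camera positions: p(t) = (1 - t) p_a + t p_b."""
    return (1.0 - t) * p_a + t * p_b

def slerp_rotation(q_a, q_b, t):
    """Spherical linear interpolation of unit quaternions (w, x, y, z)."""
    q_a, q_b = q_a / np.linalg.norm(q_a), q_b / np.linalg.norm(q_b)
    dot = float(np.dot(q_a, q_b))
    if dot < 0.0:                      # q and -q encode the same rotation;
        q_b, dot = -q_b, -dot          # take the shorter arc
    if dot > 0.9995:                   # nearly parallel: fall back to nlerp
        q = (1.0 - t) * q_a + t * q_b
        return q / np.linalg.norm(q)
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    return (np.sin((1.0 - t) * theta) * q_a + np.sin(t * theta) * q_b) / np.sin(theta)

# Example: an intermediate pose halfway between two training cameras.
p0, p1 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.5, 0.0])
q0, q1 = np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.924, 0.0, 0.383, 0.0])
p_mid, q_mid = lerp_position(p0, p1, 0.5), slerp_rotation(q0, q1, 0.5)
```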



Novel view camera pose frustum visualization





Renderings from novel view camera poses often contain artifacts; therefore, we enhance their quality using DIFIX, a diffusion-based model. The enhanced novel views are then used as inputs when retraining 3DGS.




Novel view camera pose vs. DIFIX enhancement


Loss Terms


① Perceptual Loss

The perceptual loss (LPIPS) is applied only to novel views. While DIFIX removes artifacts from novel view images, it can also introduce smoothing that erases fine details. LPIPS encourages structural similarity between the rendered novel view and the DIFIX-enhanced target image.

$$\mathcal{L}_{\text{LPIPS}} = \lambda_{\text{LPIPS}} \cdot \text{LPIPS}(I, \hat{I})$$

$$\text{LPIPS}(x, \hat{x}) = \sum_l \frac{1}{H_l W_l} \sum_{h=1}^{H_l} \sum_{w=1}^{W_l} \| w_l \odot (f_l(x)_{h,w} - f_l(\hat{x})_{h,w}) \|_2^2$$
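A rough sketch of how this term could be computed in PyTorch with the public `lpips` package (the weight value and variable names are assumptions, not the project's settings):

```python
import torch
import lpips

lpips_fn = lpips.LPIPS(net='vgg')  # frozen VGG feature extractor
lambda_lpips = 0.2                 # assumed weight, not taken from this project

def perceptual_loss(rendered, target, is_novel_view):
    """LPIPS between the rendering and its DIFIX-enhanced target,
    applied only when the current training view is a novel view."""
    if not is_novel_view:
        return rendered.new_zeros(())
    # lpips expects NCHW tensors scaled to [-1, 1].
    return lambda_lpips * lpips_fn(rendered * 2 - 1, target * 2 - 1).mean()
```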



② Depth Smoothness Loss

Encourages the depth map to vary smoothly, producing more consistent surfaces for objects and scenes.

Depth Smoothness Loss illustration

$$\mathcal{L}^w_{\text{smooth}} = \lambda_{\text{smooth}} \left[ \frac{1}{N_x} \sum_{i,j} w_{i,j} \cdot \left| D_{i,j} - D_{i,j+1} \right| + \frac{1}{N_y} \sum_{i,j} w_{i,j} \cdot \left| D_{i,j} - D_{i+1,j} \right| \right]$$
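A minimal PyTorch sketch of this weighted first-difference penalty, assuming an (H, W) depth map and the per-pixel weight $w_{i,j}$ defined in the Depth & Normal Loss Weight subsection below (names and the lambda value are illustrative):

```python
import torch

def depth_smoothness_loss(depth, weight, lambda_smooth=0.1):
    # Horizontal term: w_{i,j} * |D_{i,j} - D_{i,j+1}|, averaged over the N_x pairs.
    dx = ((depth[:, :-1] - depth[:, 1:]).abs() * weight[:, :-1]).mean()
    # Vertical term: w_{i,j} * |D_{i,j} - D_{i+1,j}|, averaged over the N_y pairs.
    dy = ((depth[:-1, :] - depth[1:, :]).abs() * weight[:-1, :]).mean()
    return lambda_smooth * (dx + dy)
```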



③ Normal Consistency Loss

Normal Consistency Loss illustration

$$ \mathcal{L}^w_{\text{normal}} = \lambda_{\text{normal}} \left[ \frac{1}{N_x} \sum_{i,j} w_{i,j} \cdot \left\| \mathbf{n}_{i,j} - \mathbf{n}_{i,j+1} \right\|_1 + \frac{1}{N_y} \sum_{i,j} w_{i,j} \cdot \left\| \mathbf{n}_{i,j} - \mathbf{n}_{i+1,j} \right\|_1 \right] $$
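The normal term follows the same neighbor-difference pattern, with an L1 norm over the 3-vector normals; a sketch under the same assumptions:

```python
import torch

def normal_consistency_loss(normals, weight, lambda_normal=0.05):
    """normals: (H, W, 3) unit normal map; weight: (H, W) per-pixel w_{i,j}."""
    dx = ((normals[:, :-1] - normals[:, 1:]).abs().sum(-1) * weight[:, :-1]).mean()
    dy = ((normals[:-1, :] - normals[1:, :]).abs().sum(-1) * weight[:-1, :]).mean()
    return lambda_normal * (dx + dy)
```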



Depth & Normal Loss Weight

1. Inverse Depth Normalization

$$ \tilde{D}_{i,j} = \frac{D_{i,j}}{\mathrm{median}(D) + 10^{-6}} $$

2. Near Weight

$$ w_{\mathrm{near},i,j} = \frac{\tilde{D}_{i,j}}{1 + \tilde{D}_{i,j}} $$

3. Far Weight

$$ w_{\mathrm{far},i,j} = \frac{1}{1 + \tilde{D}_{i,j}} $$

4. Edge-Aware Gating (RGB): Gradient

$$ g_{i,j} = \frac{\left\| I_{:,i,j+1} - I_{:,i,j} \right\|_1 + \left\| I_{:,i+1,j} - I_{:,i,j} \right\|_1}{2} $$

5. Edge Gate

$$ \text{edge\_gate}_{i,j} = \max \left( 0.1, e^{-\gamma g_{i,j}} \right) $$

6. Final Weight

$$ w_{i,j} \leftarrow w_{i,j} \cdot \text{edge\_gate}_{i,j} $$
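Putting the six steps together, a sketch of the per-pixel weight computation (gamma, the border handling, and the near/far switch are assumptions about how the pieces are wired up):

```python
import torch

def edge_aware_weight(depth, image, gamma=10.0, near=True):
    """depth: (H, W); image: (3, H, W) RGB in [0, 1]; returns (H, W) weights."""
    # 1. Normalize depth by its median.
    d = depth / (depth.median() + 1e-6)
    # 2./3. Near or far weight from the normalized depth.
    w = d / (1.0 + d) if near else 1.0 / (1.0 + d)
    # 4. Mean L1 RGB gradient toward the right and bottom neighbors
    #    (assumed zero at the borders, where no neighbor exists).
    g = torch.zeros_like(depth)
    g[:, :-1] += (image[:, :, 1:] - image[:, :, :-1]).abs().sum(0)
    g[:-1, :] += (image[:, 1:, :] - image[:, :-1, :]).abs().sum(0)
    g = g / 2.0
    # 5. Edge gate: suppress the loss on strong RGB edges, floored at 0.1.
    gate = torch.clamp(torch.exp(-gamma * g), min=0.1)
    # 6. Final per-pixel weight.
    return w * gate
```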


Final Loss

$$ \mathcal{L}_{\text{total}} = (1 - \lambda_{\text{SSIM}}) \cdot \mathcal{L}_1 + \lambda_{\text{SSIM}} \cdot \mathcal{L}_{D\text{-}SSIM} + \mathbf{1}_{\text{novel}} \cdot \mathcal{L}_{\text{LPIPS}} + \mathcal{L}^w_{\text{smooth}} + \mathcal{L}^w_{\text{normal}} $$

Here $\mathbf{1}_{\text{novel}}$ activates the perceptual term only for novel views; the weights $\lambda_{\text{LPIPS}}$, $\lambda_{\text{smooth}}$, and $\lambda_{\text{normal}}$ are already folded into each term's definition above.
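For context, one plausible way the terms combine in a training step, reusing the sketches above (the dummy tensors, the D-SSIM implementation, and all lambda values are assumptions):

```python
import torch
from pytorch_msssim import ssim  # assumed SSIM implementation

# Dummy tensors standing in for one real training iteration.
render = torch.rand(1, 3, 64, 64, requires_grad=True)  # 3DGS rendering
gt = torch.rand(1, 3, 64, 64)        # training photo or DIFIX-enhanced novel view
depth = torch.rand(64, 64)           # rendered depth map
normals = torch.rand(64, 64, 3)      # rendered normal map
is_novel_view = True
lambda_ssim = 0.2

l1 = (render - gt).abs().mean()
d_ssim = 1.0 - ssim(render, gt, data_range=1.0)
loss = (1.0 - lambda_ssim) * l1 + lambda_ssim * d_ssim
loss = loss + perceptual_loss(render, gt, is_novel_view)  # novel views only
w = edge_aware_weight(depth, gt[0])                       # shared w_{i,j}
loss = loss + depth_smoothness_loss(depth, w) + normal_consistency_loss(normals, w)
loss.backward()
```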



Results

Evaluation Metrics

| Method | Initial points | PSNR↑ | SSIM↑ | Training time | Frames |
|--------|---------------:|------:|------:|---------------|-------:|
| 3DGS | 100,000 | 20.423 | 0.856 | 2h 13m | 168 |
| 2DGS | 100,000 | 19.219 | 0.828 | 2h 1m | 168 |
| 2DGS_novel | 100,000 | 20.375 | 0.842 | 1h 59m | 208 |
| Ours_novel | 100,000 | 21.605 | 0.861 | 2h 6m | 208 |
| Ours_novel_loss | 100,000 | 21.675 | 0.862 | 3h 55m | 208 |


3D Reconstruction Results




Scene Rendering

Render