This study proposes a method to enhance the quality of indoor 3D reconstruction based on 3D Gaussian Splatting (3DGS) using data captured with Polycam. The approach generates novel view camera poses by interpolating the training poses, refines the renders from those poses with DIFIX, and incorporates geometry-aware loss terms to further improve reconstruction quality. The geometry-aware loss consists of a perceptual loss applied only to novel views, plus normal and depth consistency losses applied to all views. Together, these improvements increase the accuracy of the reconstructed geometry, strengthen multi-view consistency, and reduce artifacts in the reconstructed scenes. Experimental results show that the proposed method increases PSNR from 20.423 to 21.675 and SSIM from 0.856 to 0.862 compared to the original 3DGS.
We captured the indoor space with the Polycam application on an iPhone 13 Pro and downloaded the raw dataset required for 3D reconstruction. From the exported raw data, `raw.glb`, `corrected_images/`, and `corrected_cameras/` were used for this project.
We propose a method to generate novel view camera poses by interpolating between the training camera poses. Intermediate positions along the paths between the original viewpoints are computed to create new viewpoints, enabling smooth camera transitions so that rendered view changes appear more natural. Positions are interpolated linearly, while orientations (unit quaternions) are interpolated with SLERP:
$$\mathbf{p}(t) = (1 - t)\,\mathbf{p}_a + t\,\mathbf{p}_b,\quad t \in [0,1]$$
$$\mathbf{q}(t)=\frac{\sin((1-t)\theta)}{\sin\theta}\,\mathbf{q}_a +\frac{\sin(t\theta)}{\sin\theta}\,\mathbf{q}_b$$
$$\text{where } \theta=\cos^{-1}(\mathbf{q}_a \cdot \mathbf{q}_b).$$
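The two interpolation formulas above can be sketched in NumPy as follows. This is a minimal illustration, not the project's actual code; the function names and the shorter-arc sign flip (a standard SLERP precaution, since $\mathbf{q}$ and $-\mathbf{q}$ encode the same rotation) are our own additions.

```python
import numpy as np

def lerp(p_a, p_b, t):
    """Linear interpolation of camera positions: p(t) = (1-t) p_a + t p_b."""
    return (1.0 - t) * p_a + t * p_b

def slerp(q_a, q_b, t, eps=1e-8):
    """Spherical linear interpolation of unit quaternions (orientations)."""
    q_a = q_a / np.linalg.norm(q_a)
    q_b = q_b / np.linalg.norm(q_b)
    dot = np.dot(q_a, q_b)
    # Take the shorter arc: q and -q represent the same rotation.
    if dot < 0.0:
        q_b, dot = -q_b, -dot
    theta = np.arccos(np.clip(dot, -1.0, 1.0))
    if theta < eps:
        # Nearly identical orientations: fall back to linear interpolation.
        q = (1.0 - t) * q_a + t * q_b
    else:
        q = (np.sin((1.0 - t) * theta) * q_a + np.sin(t * theta) * q_b) / np.sin(theta)
    return q / np.linalg.norm(q)
```

Sampling several values of `t` in $[0,1]$ between each pair of adjacent training poses yields the intermediate novel viewpoints.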
Renders from novel view camera poses often contain artifacts; therefore, we enhance their quality using DIFIX, a diffusion-based model. The improved novel views are then used as additional inputs when retraining 3DGS.
The perceptual loss (LPIPS) is applied only to novel views. While DIFIX removes artifacts from novel view images, it can also smooth away fine details. LPIPS encourages perceptual similarity between the rendered novel view and the DIFIX-enhanced target image.
$$\mathcal{L}_{\text{LPIPS}} = \lambda_{\text{LPIPS}} \cdot \text{LPIPS}(I, \hat{I})$$
$$\text{LPIPS}(I, \hat{I}) = \sum_l \frac{1}{H_l W_l} \sum_{h=1}^{H_l} \sum_{w=1}^{W_l} \| w_l \odot (f_l(I)_{h,w} - f_l(\hat{I})_{h,w}) \|_2^2$$
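The mechanics of the LPIPS formula can be illustrated with a small NumPy sketch. Real LPIPS uses unit-normalized features $f_l$ from a pretrained network (e.g. VGG or AlexNet) with learned channel weights $w_l$; here the feature maps and weights are passed in as plain arrays, so this only demonstrates the distance computation, not the full metric.

```python
import numpy as np

def unit_normalize(feat, eps=1e-10):
    """Channel-wise unit normalization of a (C, H, W) feature map."""
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True)) + eps
    return feat / norm

def lpips_distance(feats_x, feats_y, layer_weights):
    """Sum over layers l of the spatial average of channel-weighted
    squared differences between unit-normalized feature maps."""
    total = 0.0
    for f_x, f_y, w_l in zip(feats_x, feats_y, layer_weights):
        diff = unit_normalize(f_x) - unit_normalize(f_y)     # (C_l, H_l, W_l)
        weighted = (w_l[:, None, None] * diff) ** 2          # w_l ⊙ (f_l(x) - f_l(x̂))
        total += weighted.sum(axis=0).mean()                 # 1/(H_l W_l) Σ_{h,w} ||·||²
    return total
```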
The depth smoothness loss encourages the depth map to vary smoothly, producing more consistent surfaces for objects and scenes:
$$\mathcal{L}^w_{\text{smooth}} = \lambda_{\text{smooth}} \left[ \frac{1}{N_x} \sum_{i,j} w_{i,j} \cdot \left| D_{i,j} - D_{i,j+1} \right| + \frac{1}{N_y} \sum_{i,j} w_{i,j} \cdot \left| D_{i,j} - D_{i+1,j} \right| \right]$$
The normal consistency loss analogously penalizes differences between neighboring surface normals:

$$ \mathcal{L}^w_{\text{normal}} = \lambda_{\text{normal}} \left[ \frac{1}{N_x} \sum_{i,j} w_{i,j} \cdot \left\| \mathbf{n}_{i,j} - \mathbf{n}_{i,j+1} \right\|_1 + \frac{1}{N_y} \sum_{i,j} w_{i,j} \cdot \left\| \mathbf{n}_{i,j} - \mathbf{n}_{i+1,j} \right\|_1 \right] $$
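The two weighted losses above can be sketched in NumPy as follows, assuming a precomputed per-pixel weight map `w` (its construction is described in the next section). The function names and the choice of `mean()` for the $1/N_x$, $1/N_y$ normalization are our own illustrative assumptions.

```python
import numpy as np

def weighted_depth_smoothness(D, w, lam=1.0):
    """L_smooth: weighted L1 penalty on horizontal/vertical depth differences.
    D and w have shape (H, W)."""
    dx = np.abs(D[:, :-1] - D[:, 1:])   # |D_{i,j} - D_{i,j+1}|
    dy = np.abs(D[:-1, :] - D[1:, :])   # |D_{i,j} - D_{i+1,j}|
    return lam * ((w[:, :-1] * dx).mean() + (w[:-1, :] * dy).mean())

def weighted_normal_consistency(n, w, lam=1.0):
    """L_normal: weighted L1 penalty on neighboring surface-normal differences.
    n has shape (3, H, W); w has shape (H, W)."""
    dx = np.abs(n[:, :, :-1] - n[:, :, 1:]).sum(axis=0)  # ||n_{i,j} - n_{i,j+1}||_1
    dy = np.abs(n[:, :-1, :] - n[:, 1:, :]).sum(axis=0)  # ||n_{i,j} - n_{i+1,j}||_1
    return lam * ((w[:, :-1] * dx).mean() + (w[:-1, :] * dy).mean())
```

A perfectly flat depth map or a field of identical normals incurs zero loss, so only genuine depth or orientation changes are penalized.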
1. Inverse Depth Normalization
$$ \tilde{D}_{i,j} = \frac{D_{i,j}}{\mathrm{median}(D) + 10^{-6}} $$
2. Near Weight
$$ w_{\mathrm{near},i,j} = \frac{\tilde{D}_{i,j}}{1 + \tilde{D}_{i,j}} $$
3. Far Weight
$$ w_{\mathrm{far},i,j} = \frac{1}{1 + \tilde{D}_{i,j}} $$
4. Edge-Aware Gating (RGB): Gradient
$$ g_{i,j} = \frac{\left\| I_{:,i,j+1} - I_{:,i,j} \right\|_1 + \left\| I_{:,i+1,j} - I_{:,i,j} \right\|_1}{2} $$
5. Edge Gate
$$ \text{edge\_gate}_{i,j} = \max \left( 0.1,\ e^{-\gamma g_{i,j}} \right) $$
6. Final Weight
$$ w_{i,j} \leftarrow w_{i,j} \cdot \text{edge\_gate}_{i,j} $$

The total training loss combines the photometric, perceptual, and geometric terms:

$$ \mathcal{L}_{\text{total}} = (1 - \lambda) \cdot \mathcal{L}_1 + \lambda \cdot \mathcal{L}_{D\text{-}SSIM} + \mathbf{1}_{\text{novel}} \cdot \mathcal{L}_{\text{LPIPS}} + \mathcal{L}_{\text{smooth}} + \mathcal{L}_{\text{normal}} $$

where $\mathbf{1}_{\text{novel}}$ activates the perceptual term only on novel views, and the LPIPS, smoothness, and normal terms already carry their weights $\lambda_{\text{LPIPS}}$, $\lambda_{\text{smooth}}$, and $\lambda_{\text{normal}}$ from their definitions above.
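Steps 1-6 above can be combined into one weight-map computation. The sketch below is a NumPy illustration under our own assumptions: a `near` flag selects between the near and far weight variants, `gamma` is the gating sharpness $\gamma$, and forward differences are zero-padded at the image border.

```python
import numpy as np

def compute_pixel_weights(D, I, near=True, gamma=10.0):
    """Per-pixel loss weights from depth D (H, W) and RGB image I (3, H, W):
    inverse-depth normalization, near/far weighting, and edge-aware gating."""
    # 1. Normalize depth by its median (robust to scale).
    D_tilde = D / (np.median(D) + 1e-6)
    # 2-3. Near pixels weighted by D~/(1+D~); far pixels by 1/(1+D~).
    w = D_tilde / (1.0 + D_tilde) if near else 1.0 / (1.0 + D_tilde)
    # 4. Average L1 RGB gradient (forward differences, zero at the border).
    gx = np.zeros(D.shape)
    gy = np.zeros(D.shape)
    gx[:, :-1] = np.abs(I[:, :, 1:] - I[:, :, :-1]).sum(axis=0)
    gy[:-1, :] = np.abs(I[:, 1:, :] - I[:, :-1, :]).sum(axis=0)
    g = 0.5 * (gx + gy)
    # 5. Edge gate: down-weight high-gradient pixels, floored at 0.1.
    edge_gate = np.maximum(0.1, np.exp(-gamma * g))
    # 6. Final weight.
    return w * edge_gate
```

The floor of 0.1 in the edge gate keeps some supervision even at strong RGB edges, so textured regions are de-emphasized rather than ignored entirely.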
| Method | Initial points | PSNR↑ | SSIM↑ | Training time | Frames |
|---|---|---|---|---|---|
| 3DGS | 100000 | 20.423 | 0.856 | 2h 13m | 168 |
| 2DGS | 100000 | 19.219 | 0.828 | 2h 1m | 168 |
| 2DGS_novel | 100000 | 20.375 | 0.842 | 1h 59m | 208 |
| Ours_novel | 100000 | 21.605 | 0.861 | 2h 6m | 208 |
| Ours_novel_loss | 100000 | 21.675 | 0.862 | 3h 55m | 208 |