Memory-Efficient Voxelized Renderable Neural 3D Spatial Representation for Vision-Based Robotics

Overview

We introduce 3DSR, a memory-efficient, voxelized, renderable neural 3D spatial representation built on 3D Gaussian splatting. 3DSR combines the strengths of its two components: voxelization for memory efficiency and 3D Gaussian splatting for high-quality image reconstruction.

Method Overview

The primary objective of 3DSR is to produce a memory-efficient 3D Gaussian splatting representation suitable for robotic applications. The method comprises two main steps: creating the base 3D Gaussians and upsampling the rendered image. We assume that the original dense 3D Gaussian splatting representation is given beforehand. First, the dense 3D Gaussians, treated as a point cloud, are downsampled through voxelization, which significantly reduces memory usage. The resulting voxelized representation, referred to as the ‘base 3D Gaussians’, serves as the foundation for rendering coarse images. Given a 6-DoF pose, a coarse image is rendered from the base 3D Gaussians. The coarse image is then restored to the original resolution and quality by an upsampling network that is trained offline.
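The voxelization step can be illustrated with a minimal sketch. It is not the 3DSR implementation: it assumes, purely for illustration, that each Gaussian is reduced to its center and that all centers falling in the same voxel are merged by averaging. The names `voxel_downsample` and `voxel_size` are hypothetical, and the handling of covariance, opacity, and color is omitted.

```python
from collections import defaultdict
from math import floor
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
VoxelKey = Tuple[int, int, int]


def voxel_downsample(centers: Sequence[Point], voxel_size: float) -> List[Point]:
    """Merge all centers that share a voxel into one averaged center.

    Assumption for illustration only: merging by the mean of the centers.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")

    # Group the centers by the integer index of the voxel that contains them.
    buckets: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for x, y, z in centers:
        key = (floor(x / voxel_size), floor(y / voxel_size), floor(z / voxel_size))
        buckets[key].append((x, y, z))

    # Emit one representative center per occupied voxel, in sorted voxel order.
    merged: List[Point] = []
    for key in sorted(buckets):
        pts = buckets[key]
        n = len(pts)
        merged.append((
            sum(p[0] for p in pts) / n,
            sum(p[1] for p in pts) / n,
            sum(p[2] for p in pts) / n,
        ))
    return merged
```

In this sketch, a larger `voxel_size` merges more centers per voxel, so fewer points are stored and the resulting representation is coarser.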

Upsampling Comparisons

Rendering novel-view images from the base 3D Gaussians produces only a coarse depiction of the scene, because the base 3D Gaussians lack the detailed 3D Gaussian points needed for high-quality rendering. To recover this detail, we add an upsampling neural network that reconstructs high-quality images. Existing super-resolution methods typically target blurred or low-resolution images, whereas our inputs are low-quality rendered images that retain only coarse detail. We therefore build our own dataset, which pairs low-quality images rendered from the voxelized base 3D Gaussians M_b with their corresponding high-quality ground-truth images, and train the network specifically for this purpose. As illustrated, the low-quality images exhibit sphere-like artifacts due to the coarse representation of the base 3D Gaussians. Additionally, these images are rendered at a lower resolution than the ground-truth images.
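The pairing of coarse renders with ground-truth images can be sketched as follows. This is an illustrative assumption, not the authors' data pipeline: images are plain nested lists of grayscale values, the integer `scale` between the two resolutions is a hypothetical parameter (the actual factor is not stated here), and `upsample_nearest` is a naive baseline that only stands in for the learned upsampling network without replicating it.

```python
from dataclasses import dataclass
from typing import List

Image = List[List[float]]  # grayscale image as rows of pixel values


@dataclass
class TrainingPair:
    coarse: Image  # low-resolution render from the base 3D Gaussians
    target: Image  # full-resolution ground-truth image


def make_pair(coarse: Image, target: Image, scale: int) -> TrainingPair:
    """Pair a coarse render with its ground truth after checking the sizes."""
    if scale < 1:
        raise ValueError("scale must be a positive integer")
    if not coarse or not coarse[0]:
        raise ValueError("coarse image must be non-empty")
    height, width = len(coarse), len(coarse[0])
    size_ok = len(target) == height * scale and all(
        len(row) == width * scale for row in target
    )
    if not size_ok:
        raise ValueError("target size must equal coarse size times scale")
    return TrainingPair(coarse=coarse, target=target)


def upsample_nearest(image: Image, scale: int) -> Image:
    """Naive baseline: repeat every pixel `scale` times along both axes."""
    return [
        [px for px in row for _ in range(scale)]
        for row in image
        for _ in range(scale)
    ]
```

A nearest-neighbor baseline like this restores only the resolution. It cannot remove the sphere-like artifacts, which is the part a network trained on such pairs is meant to address.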

[Image comparison: four pairs of renderings, before and after upsampling]

BibTeX

@article{jun20253dsr,
    author    = {Jun, Howoong and Ha, Seongbo and Lee, Jaewon and Yu, Hyeonwoo and Oh, Songhwai},
    title     = {Memory Efficient Voxelized Renderable Neural 3D Spatial Representation for Vision-Based Robotics},
    journal   = {IEEE Robotics and Automation Letters},
    year      = {2025},
    publisher = {IEEE}
}

RA-L 2026