Paper Abstracts
Neural Stereo (Rafael)
- RomniStereo: Recurrent Omnidirectional Stereo Matching [Helvipad]
Omnidirectional stereo matching (OSM) is an essential and reliable
means for 360° depth sensing. However, prior methods rely on a
3D encoder-decoder to regularize the cost volume.
Recently, the Recurrent All-pairs Field Transforms (RAFT)
approach has efficiently improved image-matching tasks
such as optical flow and stereo matching.
To bridge the gap between OSM and RAFT, we propose an adaptive
weighting scheme to transform the outputs of spherical sweeping
of OSM into the inputs for the recurrent update, thus creating a
recurrent omnidirectional stereo matching (RomniStereo) algorithm.
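
A minimal sketch of the bridging step, assuming a simple softmax blend (the shapes, the two-view blend, and the learned scores are illustrative assumptions, not the paper's exact formulation): spherical sweeping yields, for each depth hypothesis, features sampled from the two views, and the adaptive weighting turns them into the reference volume that a RAFT-style recurrent (GRU) update iterates on.

    import numpy as np

    def adaptive_blend(feat_a, feat_b, logits):
        # feat_a, feat_b: (D, H, W, C) features sampled at D spherical depth
        # hypotheses from the two cameras; logits: (D, H, W, 2) hypothetical
        # learned blending scores.
        w = np.exp(logits - logits.max(axis=-1, keepdims=True))
        w = w / w.sum(axis=-1, keepdims=True)            # softmax over the two views
        return w[..., :1] * feat_a + w[..., 1:] * feat_b

    D, H, W, C = 8, 16, 32, 4                            # toy equirectangular grid
    fa = np.random.rand(D, H, W, C)
    fb = np.random.rand(D, H, W, C)
    ref = adaptive_blend(fa, fb, np.random.rand(D, H, W, 2))
    print(ref.shape)  # (8, 16, 32, 4): input to a RAFT-style recurrent (GRU) update
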
Neural Splat (Daniel / Davi)
- Splatter-360: Generalizable 360° Gaussian Splatting for Wide-baseline Panoramic Images
Wide-baseline panoramic images are frequently used in
applications like VR and simulations. However, synthesizing
novel views from these panoramic images in real time
remains a significant challenge.
Although existing 3D Gaussian splatting (3DGS) methods can
produce photo-realistic views under narrow baselines, they
often overfit the training views when dealing with wide-baseline
panoramic images due to the difficulty in learning
precise geometry from sparse 360° views. This paper presents
Splatter-360, a novel end-to-end generalizable 3DGS framework
designed to handle wide-baseline panoramic images.
Splatter-360 performs multi-view matching directly in the spherical
domain by constructing a spherical cost volume through a
spherical sweep algorithm, enhancing the network’s depth
perception and geometry estimation.
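
A minimal sketch of a spherical-sweep cost volume under simplifying assumptions (equirectangular panoramas, pure translation between views, nearest-neighbour sampling, dot-product cost); this is not Splatter-360's implementation, only the matching-in-the-spherical-domain idea: each reference pixel is lifted to a hypothesised radius, re-projected into the source panorama, and correlated.

    import numpy as np

    def pix_to_dir(H, W):
        # unit ray direction for every equirectangular pixel centre
        v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        lon = (u + 0.5) / W * 2 * np.pi - np.pi
        lat = np.pi / 2 - (v + 0.5) / H * np.pi
        return np.stack([np.cos(lat) * np.sin(lon),
                         np.sin(lat),
                         np.cos(lat) * np.cos(lon)], axis=-1)        # (H, W, 3)

    def dir_to_pix(d, H, W):
        # inverse mapping, with nearest-neighbour rounding and clamping
        lon = np.arctan2(d[..., 0], d[..., 2])
        lat = np.arcsin(np.clip(d[..., 1] / np.linalg.norm(d, axis=-1), -1.0, 1.0))
        u = np.clip(np.round((lon + np.pi) / (2 * np.pi) * W - 0.5).astype(int), 0, W - 1)
        v = np.clip(np.round((np.pi / 2 - lat) / np.pi * H - 0.5).astype(int), 0, H - 1)
        return v, u

    def spherical_cost_volume(feat_ref, feat_src, t_src, radii):
        H, W, _ = feat_ref.shape
        dirs = pix_to_dir(H, W)
        costs = []
        for r in radii:                              # sweep over hypothesised radii
            p = r * dirs - t_src                     # lifted point in the source frame
            vs, us = dir_to_pix(p, H, W)
            costs.append((feat_ref * feat_src[vs, us]).sum(-1))  # correlation cost
        return np.stack(costs, axis=0)               # (D, H, W)

    H, W, C = 32, 64, 8
    vol = spherical_cost_volume(np.random.rand(H, W, C), np.random.rand(H, W, C),
                                np.array([0.2, 0.0, 0.0]), np.linspace(0.5, 5.0, 16))
    print(vol.shape)  # (16, 32, 64)
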
- 3DGUT: Enabling Distorted Cameras and Secondary Rays in Gaussian Splatting
3D Gaussian Splatting (3DGS) enables efficient scene reconstruction and
real-time rendering. However, due to its rasterization-based formulation, 3DGS is
constrained to ideal pinhole cameras and lacks support for secondary
lighting effects. Recent methods address these limitations by tracing
the particles instead, at the cost of significantly slower rendering.
In this work, we propose 3D Gaussian Unscented Transform (3DGUT),
replacing the EWA splatting formulation with the Unscented Transform
that approximates the particles through sigma points, which can be
projected exactly under any nonlinear projection function.
This modification enables trivial support of distorted cameras
with time-dependent effects such as rolling shutter, while
retaining the efficiency of rasterization. Additionally, our
rendering formulation enables secondary ray tracing required
to represent phenomena such as reflections and refraction within
the same 3D representation.
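
A minimal sketch of the unscented-transform idea, using the standard sigma-point formulas and a hypothetical equidistant-fisheye projection (not NVIDIA's implementation): instead of linearising the projection as EWA splatting does, each 3D Gaussian is represented by sigma points that are pushed exactly through the nonlinear camera model, and the 2D mean and covariance are re-estimated from the projected points.

    import numpy as np

    def sigma_points(mu, cov, kappa=1.0):
        # standard unscented-transform sigma points and weights
        n = mu.size
        L = np.linalg.cholesky((n + kappa) * cov)
        pts = [mu] + [mu + L[:, i] for i in range(n)] + [mu - L[:, i] for i in range(n)]
        w = np.full(2 * n + 1, 1.0 / (2 * (n + kappa)))
        w[0] = kappa / (n + kappa)
        return np.array(pts), w

    def fisheye_project(p, f=300.0):
        # hypothetical equidistant model: image radius proportional to view angle
        theta = np.arctan2(np.linalg.norm(p[..., :2], axis=-1), p[..., 2])
        phi = np.arctan2(p[..., 1], p[..., 0])
        return np.stack([f * theta * np.cos(phi), f * theta * np.sin(phi)], axis=-1)

    def unscented_project(mu3d, cov3d):
        pts, w = sigma_points(mu3d, cov3d)
        proj = fisheye_project(pts)                  # exact nonlinear projection
        mu2d = (w[:, None] * proj).sum(0)
        d = proj - mu2d
        cov2d = (w[:, None, None] * d[:, :, None] * d[:, None, :]).sum(0)
        return mu2d, cov2d                           # 2D Gaussian to rasterize

    mu2d, cov2d = unscented_project(np.array([0.3, -0.2, 2.0]),
                                    np.diag([0.02, 0.02, 0.05]))
    print(mu2d, cov2d)
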
Neural CNNs (Leo)
- DarSwin-Unet: Distortion Aware Encoder-Decoder Architecture
Wide-angle fisheye images are becoming increasingly
common for perception tasks in applications such as robotics,
security, and mobility (e.g., drones, avionics). However, current
models often either ignore the distortions in wide-angle
images or are not suitable to perform pixel-level tasks.
In this paper, we present an encoder-decoder model based on a
radial transformer architecture that adapts to distortions in
wide-angle lenses by leveraging the physical characteristics
defined by the radial distortion profile. We introduce a U-Net
architecture, DarSwin-Unet, designed for pixel-level tasks.
We demonstrate its performance on depth estimation and show
that DarSwin-Unet can perform zero-shot adaptation to unseen
distortions of different wide-angle lenses.
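
A minimal sketch of distortion-aware radial sampling, assuming a hypothetical polynomial lens profile r(theta) (not the paper's exact parameterisation): sampling locations are spaced uniformly in incidence angle and mapped to image radii through the lens profile, so the token layout follows the lens distortion rather than the pixel grid.

    import numpy as np

    def radial_profile(theta, k=(1.0, -0.05, 0.001)):
        # hypothetical lens projection polynomial r(theta)
        return k[0] * theta + k[1] * theta**3 + k[2] * theta**5

    def radial_sample_grid(n_radial, n_azimuth, fov=np.pi, image_radius=256.0):
        thetas = np.linspace(0.0, fov / 2, n_radial + 1)[1:]   # uniform in angle
        radii = radial_profile(thetas)
        radii = radii / radial_profile(np.array([fov / 2])) * image_radius
        phis = np.linspace(0.0, 2 * np.pi, n_azimuth, endpoint=False)
        x = radii[:, None] * np.cos(phis)[None, :]
        y = radii[:, None] * np.sin(phis)[None, :]
        return np.stack([x, y], axis=-1)             # (n_radial, n_azimuth, 2)

    grid = radial_sample_grid(8, 16)
    print(grid.shape)  # sampling locations that follow the lens distortion
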
Neural Interiors (Bruno)
- Pano2Room: Novel View Synthesis from a Single Indoor Panorama
Recent single-view 3D generative methods have made significant
advancements by leveraging knowledge distilled from extensive 3D
object datasets. However, challenges persist in the synthesis of 3D
scenes from a single view. In this paper, we introduce a novel approach,
Pano2Room, designed to automatically reconstruct high-quality 3D
indoor scenes from a single panoramic image. These panoramic images
can be easily generated using a panoramic RGBD inpainter from captures
at a single location with any camera. The key idea is to initially
construct a preliminary mesh from the input panorama, and iteratively
refine this mesh using a panoramic RGBD inpainter while collecting
photo-realistic 3D-consistent pseudo novel views. Finally, the refined
mesh is converted into a 3D Gaussian Splatting field and trained with
the collected pseudo novel views.
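
A minimal sketch of the first step only (lifting an RGBD panorama to a preliminary mesh), assuming an equirectangular layout, a given per-pixel radial depth, and a simple quad-split triangulation; the iterative inpainting-based refinement and the conversion to 3D Gaussian Splatting are not shown.

    import numpy as np

    def panorama_to_mesh(depth):
        # depth: (H, W) per-pixel radial depth of an equirectangular panorama
        H, W = depth.shape
        v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
        lon = (u + 0.5) / W * 2 * np.pi - np.pi
        lat = np.pi / 2 - (v + 0.5) / H * np.pi
        dirs = np.stack([np.cos(lat) * np.sin(lon),
                         np.sin(lat),
                         np.cos(lat) * np.cos(lon)], axis=-1)
        verts = (depth[..., None] * dirs).reshape(-1, 3)   # back-projected vertices
        idx = np.arange(H * W).reshape(H, W)
        a, b, c, d = idx[:-1, :-1], idx[:-1, 1:], idx[1:, :-1], idx[1:, 1:]
        faces = np.concatenate([np.stack([a, b, c], -1).reshape(-1, 3),
                                np.stack([b, d, c], -1).reshape(-1, 3)])
        return verts, faces

    verts, faces = panorama_to_mesh(np.full((64, 128), 2.0))  # toy constant-depth scene
    print(verts.shape, faces.shape)
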
Neural Diffusion (Enylton)
- Matrix-3D: Omnidirectional Explorable 3D World Generation
Explorable 3D world generation from a single image or text prompt forms
a cornerstone of spatial intelligence. Recent works utilize video to achieve
wide-scope and generalizable 3D world generation. However, existing approaches
often suffer from a limited scope in the generated scenes. Here, we propose
Matrix-3D, a framework that utilizes a panoramic representation for wide-coverage,
omnidirectional explorable 3D world generation, combining conditional video
generation and panoramic 3D reconstruction. We first train a trajectory-guided
panoramic video diffusion model that employs scene mesh renders as conditioning to
enable high-quality and geometrically consistent scene video generation. To lift
the panoramic scene video to a 3D world, we propose two separate methods: (1) a
feed-forward large panorama reconstruction model for rapid 3D scene reconstruction
and (2) an optimization-based pipeline for accurate and detailed 3D scene
reconstruction.
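
A minimal orchestration sketch of the two-stage pipeline as described above; every function name below is a hypothetical stub standing in for the paper's actual modules (the trajectory-guided panoramic video diffusion model and the two lifting paths) and only illustrates the data flow, not the real implementation.

    import numpy as np

    def render_scene_mesh(trajectory):
        # hypothetical stand-in for the coarse scene-mesh renders used as condition
        return [np.zeros((256, 512, 3)) for _ in trajectory]

    def panoramic_video_diffusion(prompt, condition_renders):
        # hypothetical stand-in for the trajectory-guided panoramic video diffusion model
        return [np.zeros_like(c) for c in condition_renders]

    def feed_forward_reconstruction(video):
        # hypothetical stand-in for the rapid feed-forward panorama reconstruction model
        return {"gaussians": np.zeros((1000, 14))}

    def optimization_reconstruction(video):
        # hypothetical stand-in for the slower optimization-based reconstruction pipeline
        return {"gaussians": np.zeros((100000, 14))}

    trajectory = [np.eye(4) for _ in range(16)]          # 16 camera poses along a path
    condition = render_scene_mesh(trajectory)
    video = panoramic_video_diffusion("a cozy living room", condition)
    scene_fast = feed_forward_reconstruction(video)      # path (1): speed
    scene_detailed = optimization_reconstruction(video)  # path (2): detail
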
Copyright ©
Luiz Velho