dc.contributor.advisor: Mansinghka, Vikash K.
dc.contributor.advisor: Tenenbaum, Joshua B.
dc.contributor.author: Gothoskar, Nishad
dc.date.accessioned: 2025-11-25T19:37:28Z
dc.date.available: 2025-11-25T19:37:28Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-08-14T19:38:13.360Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164030
dc.description.abstract: Understanding and interpreting the 3D structure of the world is a central challenge in artificial intelligence. Our physical world is 3D, yet our AI systems often “see” that world through pixels and images. To build truly intelligent AI systems, we must go beyond pixels and images and build 3D vision systems that construct meaningful and useful 3D representations of the world. This is the problem of 3D scene perception: how do we transform raw visual input into 3D representations of the world? 3D scene perception has numerous applications, from robotics to augmented reality. Despite advances over the last decade, 3D perception remains a major bottleneck in real-world robotics applications. The challenge stems from the immense variability of real-world conditions (e.g., lighting, color, viewpoint, camera properties, and object appearance), the incompleteness of visual data due to limited resolution, noise, and occlusions, and the approximations in our models of visual data. Developing more robust and generalizable 3D perception systems would be an important step toward more general-purpose robotics. In this thesis, we explore a probabilistic architecture for 3D perception based on structured generative models and probabilistic programs. We begin with 3DP3, the first iteration of our approach, which infers 3D scene graphs from real-world depth image data. 3DP3 demonstrates that our approach works on real-world benchmarks and can correct commonsense errors made by deep learning systems. Building on this foundation, we develop Bayes3D, which scales up these ideas using a GPU-accelerated image likelihood and generative model alongside a parallel coarse-to-fine inference algorithm. Next, we explore two approaches for incorporating RGB image data into generative 3D graphics programs, expanding their applicability. We then introduce DurableVS, which extends inverse-graphics techniques to model scenes involving a robot and multiple cameras, enabling precise robotic control. Finally, we present Gen3D, which integrates the key ideas from this thesis into a real-time 3D perception system. Gen3D uses multi-resolution probabilistic models of 3D matter to enable real-time tracking that is competitive with vision transformers and 3D Gaussian splatting, state-of-the-art methods in computer vision and computer graphics.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Scaling 3D Scene Perception via Probabilistic Programming
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

