dc.contributor.advisor: Mansinghka, Vikash K.
dc.contributor.advisor: Tenenbaum, Joshua B.
dc.contributor.author: Gothoskar, Nishad
dc.date.accessioned: 2025-11-25T19:37:28Z
dc.date.available: 2025-11-25T19:37:28Z
dc.date.issued: 2025-05
dc.date.submitted: 2025-08-14T19:38:13.360Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164030
dc.description.abstract: Understanding and interpreting the 3D structure of the world is a central challenge in artificial intelligence. Our physical world is 3D, yet our AI systems often “see” that world through pixels and images. To build truly intelligent AI systems, we must go beyond pixels and images and build 3D vision systems that construct meaningful and useful 3D representations of the world. This is the problem of 3D scene perception: how do we transform raw visual input into 3D representations of the world? 3D scene perception has numerous applications, from robotics to augmented reality. Despite advances over the last decade, 3D perception remains a major bottleneck in real-world robotics applications. The challenge stems from the immense variability of real-world conditions (e.g., lighting, color, viewpoint, camera properties, and object appearance), the incompleteness of visual data due to limited resolution, noise, and occlusions, and the approximations in our models of visual data. Developing more robust and generalizable 3D perception systems would be an important step toward more general-purpose robotics. In this thesis, we explore a probabilistic architecture for 3D perception based on structured generative models and probabilistic programs. We begin with 3DP3, the first iteration of our approach, which infers 3D scene graphs from real-world depth image data. 3DP3 demonstrates that our approach works on real-world benchmarks and can correct commonsense errors made by deep learning systems. Building on this foundation, we develop Bayes3D, which scales up these ideas using a GPU-accelerated image likelihood and generative model alongside a parallel coarse-to-fine inference algorithm. Next, we explore two approaches for incorporating RGB image data into generative 3D graphics programs, expanding their applicability. We then introduce DurableVS, which extends inverse-graphics techniques to model scenes involving a robot and multiple cameras, enabling precise robotic control. Finally, we present Gen3D, which integrates the key ideas from this thesis into a real-time 3D perception system. Gen3D uses multi-resolution probabilistic models of 3D matter to enable real-time tracking that is competitive with vision transformers and 3D Gaussian splatting, state-of-the-art methods in computer vision and computer graphics.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Scaling 3D Scene Perception via Probabilistic Programming
dc.type: Thesis
dc.description.degree: Ph.D.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Doctoral
thesis.degree.name: Doctor of Philosophy

