
dc.contributor.advisor: Lieberman, Zachary
dc.contributor.author: Gosalia, Mehek
dc.date.accessioned: 2026-01-29T15:05:53Z
dc.date.available: 2026-01-29T15:05:53Z
dc.date.issued: 2025-09
dc.date.submitted: 2025-09-15T14:56:24.960Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164649
dc.description.abstract: This work introduces a novel pipeline for scene reconstruction that jointly prioritizes semantic accuracy and visual fidelity, addressing a gap in current approaches. Prior pipelines often emphasize either semantic analysis or photorealistic rendering, but rarely both. This method combines scene analysis, segmentation, and retexturing to yield reconstructions that preserve structural semantics while convincingly reflecting the visual qualities of the original image. The motivation lies in the limitations of existing systems. Existing database-assisted approaches either depend on proprietary datasets that restrict stylistic diversity or rely on in-the-wild assets. This constrains expressiveness and often produces results that are visually misaligned. Conversely, pipelines optimized for visual realism neglect semantic correctness, generating outputs that may appear plausible but lack categorical or structural grounding. Our framework addresses this by first enforcing semantic accuracy through the selection of database assets, then editing those assets to be stylistically faithful to the reference, producing reconstructions that are both interpretable and expressive. We begin with database-assisted scene analysis, using an open-source asset database containing chairs, lamps, sofas, tables, and benches. Input images are depth-mapped, segmented, and parsed into object masks, which are matched to database assets based on semantic labels and visual correspondence. Each asset is broken into semantic segments and rescaled per-component using vision-language model predictions to better match the reference object. Finally, the asset is retextured based on the image mask of the reference object in the input image. Evaluation on six diverse scenes—both photographs and artworks—shows the pipeline produces semantically grounded, visually accurate reconstructions under non-research conditions.
Future work will focus on expanding the asset database, reducing reliance on proprietary texturing, and releasing an open-source implementation to broaden accessibility.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Visually Accurate Database-Enabled Reconstructions of Scenes (VADERS)
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science

