Show simple item record

dc.contributor.advisor: Jegelka, Stefanie
dc.contributor.author: Gupta, Sharut
dc.date.accessioned: 2025-03-27T17:01:08Z
dc.date.available: 2025-03-27T17:01:08Z
dc.date.issued: 2025-02
dc.date.submitted: 2025-03-04T17:28:01.837Z
dc.identifier.uri: https://hdl.handle.net/1721.1/158966
dc.description.abstract: The central promise of deep learning is to learn a map 𝑓 : 𝒳 → ℝᵈ that transforms objects 𝑥 ∈ 𝒳—represented in their raw perceptual forms, such as images or molecular strings—into a representation space ℝᵈ where everything that is hard to do with raw perceptual data becomes easy. For instance, measuring the similarity between two objects 𝑥₁, 𝑥₂ ∈ 𝒳 expressed as tensors of pixel intensities is non-trivial in their raw form, but becomes straightforward if 𝑓 maps these objects to a space where simple Euclidean distances, ‖𝑓(𝑥₁) − 𝑓(𝑥₂)‖₂, are meaningful measures of similarity. While this simple recipe has shown standout success in a range of tasks, certain applications require representations that encode richer structural relationships beyond pairwise similarity. For instance, tasks that involve relational information—such as “𝑋 is a parent of 𝑌” or “𝐴 is a treatment for 𝐵”—require embedding spaces that capture such structure. In this thesis, we explore what 𝑓 should encode in order to be useful for a range of unknown downstream tasks, from the point of view of the geometric structure of the representation space. We investigate this question in the context of self-supervised learning, a paradigm that extracts meaningful representations by leveraging the structure of the data itself, without relying on explicit labels. Specifically, we propose adding geometric structure to the embedding space by enforcing that transformations of the input space correspond to simple (i.e., linear) transformations of the embedding space. To this end, we introduce an equivariance objective and theoretically prove that its minima force transformations on the input space to correspond to rotations on the spherical embedding space. Our proposed method significantly improves performance on downstream tasks and ensures sensitivity of the embedding space to important variations in data (e.g., color, rotation) that existing contrastive methods fail to capture.
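To make the geometric idea in the abstract concrete, here is a minimal sketch, assuming PyTorch, of one way such an equivariance penalty could be written: given a batch x and a batch x_aug produced by applying the same input transformation, it fits the single best rotation between the two sets of unit-sphere embeddings (via orthogonal Procrustes) and penalizes the residual. The encoder, the Procrustes fit, and the squared-residual form are illustrative assumptions, not the thesis's exact objective.

```python
# Hypothetical sketch of an equivariance penalty: a shared input
# transformation should act as one rotation on spherical embeddings.
# Assumes PyTorch; not the thesis's exact loss.
import torch
import torch.nn.functional as F

def fit_rotation(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Orthogonal Procrustes: rotation R minimizing ||z1 @ R - z2||_F."""
    m = z1.T @ z2                       # (d, d) cross-covariance
    u, _, vh = torch.linalg.svd(m)
    r = u @ vh
    if torch.det(r) < 0:                # flip one axis so det(R) = +1 (proper rotation)
        u = torch.cat([u[:, :-1], -u[:, -1:]], dim=1)
        r = u @ vh
    return r

def equivariance_loss(encoder, x, x_aug):
    """Penalize how far the augmentation's action on the unit-sphere
    embeddings is from being a single rotation."""
    z1 = F.normalize(encoder(x), dim=-1)        # embeddings on the unit sphere
    z2 = F.normalize(encoder(x_aug), dim=-1)
    r = fit_rotation(z1.detach(), z2.detach())  # fit R without backprop through SVD
    return ((z1 @ r - z2) ** 2).sum(dim=-1).mean()
```

In practice a term like this would be combined with a standard contrastive (invariance) objective; fitting the rotation per batch is one plausible realization of "transformations of the input space correspond to rotations on the spherical embedding space."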
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Structuring Representation Geometry in Self-Supervised Learning
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science