Show simple item record

dc.contributor.advisor: Jegelka, Stefanie
dc.contributor.author: Gupta, Sharut
dc.date.accessioned: 2025-03-27T17:01:08Z
dc.date.available: 2025-03-27T17:01:08Z
dc.date.issued: 2025-02
dc.date.submitted: 2025-03-04T17:28:01.837Z
dc.identifier.uri: https://hdl.handle.net/1721.1/158966
dc.description.abstract: The central promise of deep learning is to learn a map 𝑓 : 𝒳 → ℝᵈ that transforms objects 𝑥 ∈ 𝒳—represented in their raw perceptual forms, such as images or molecular strings—into a representation space ℝᵈ where everything that is hard to do with raw perceptual data becomes easy. For instance, measuring the similarity between two objects 𝑥₁, 𝑥₂ ∈ 𝒳 expressed as tensors of pixel intensities is non-trivial in their raw form, but becomes straightforward if 𝑓 maps these objects to a space where simple Euclidean distances, ‖𝑓(𝑥₁) − 𝑓(𝑥₂)‖₂, are meaningful measures of similarity. While this simple recipe has shown standout success in a range of tasks, certain applications require representations that encode richer structural relationships beyond pairwise similarity. For instance, tasks that involve relational information—such as “𝑋 is a parent of 𝑌” or “𝐴 is a treatment for 𝐵”—require embedding spaces that capture such structure. In this thesis, we explore what 𝑓 should encode in order to be useful for a range of unknown downstream tasks, from the point of view of the geometric structure of the representation space. We investigate this question in the context of self-supervised learning, a paradigm that extracts meaningful representations by leveraging the structure of the data itself, without relying on explicit labels. Specifically, we propose adding geometric structure to the embedding space by enforcing that transformations of the input space correspond to simple (i.e., linear) transformations of the embedding space. To this end, we introduce an equivariance objective and theoretically prove that its minima force transformations on the input space to correspond to rotations on the spherical embedding space. Our proposed method significantly improves performance on downstream tasks and ensures sensitivity of the embedding space to important variations in data (e.g., color, rotation) that existing contrastive methods fail to capture.
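To make the geometric idea in the abstract concrete, here is a minimal sketch, assuming PyTorch, of one way such an equivariance penalty could be written: given a batch x and a batch x_aug produced by applying the same input transformation, it fits the single best rotation between the two sets of unit-sphere embeddings (via orthogonal Procrustes) and penalizes the residual. The encoder, the Procrustes fit, and the squared-residual form are illustrative assumptions, not the thesis's exact objective.

```python
# Hypothetical sketch of an equivariance penalty: a shared input
# transformation should act as one rotation on spherical embeddings.
# Assumes PyTorch; not the thesis's exact loss.
import torch
import torch.nn.functional as F

def fit_rotation(z1: torch.Tensor, z2: torch.Tensor) -> torch.Tensor:
    """Orthogonal Procrustes: rotation R minimizing ||z1 @ R - z2||_F."""
    m = z1.T @ z2                       # (d, d) cross-covariance
    u, _, vh = torch.linalg.svd(m)
    r = u @ vh
    if torch.det(r) < 0:                # flip one axis so det(R) = +1 (proper rotation)
        u = torch.cat([u[:, :-1], -u[:, -1:]], dim=1)
        r = u @ vh
    return r

def equivariance_loss(encoder, x, x_aug):
    """Penalize how far the augmentation's action on the unit-sphere
    embeddings is from being a single rotation."""
    z1 = F.normalize(encoder(x), dim=-1)        # embeddings on the unit sphere
    z2 = F.normalize(encoder(x_aug), dim=-1)
    r = fit_rotation(z1.detach(), z2.detach())  # fit R without backprop through SVD
    return ((z1 @ r - z2) ** 2).sum(dim=-1).mean()
```

In practice a term like this would be combined with a standard contrastive (invariance) objective; fitting the rotation per batch is one plausible realization of "transformations of the input space correspond to rotations on the spherical embedding space."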
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright retained by author(s)
dc.rights.uri: https://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Structuring Representation Geometry in Self-Supervised Learning
dc.type: Thesis
dc.description.degree: S.M.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Science in Electrical Engineering and Computer Science