Efficient Systems for Large-Scale Graph Representation Learning

Author(s)
Huang, Tianhao
Download
Thesis PDF (2.792 MB)
Advisor
Devadas, Srinivas
Terms of use
In Copyright - Educational Use Permitted Copyright retained by author(s) https://rightsstatements.org/page/InC-EDU/1.0/
Abstract
Graph representation learning has gained significant traction in critical domains including finance, social networks, and transportation systems due to its successful application to graph-structured data. Graph neural networks (GNNs), which integrate the power of deep learning with graph structures, have emerged as the leading methods in this field, delivering superior performance across diverse graph-related tasks. However, training graph neural networks on large-scale datasets encounters scalability challenges on current system architectures. First, the sparse, non-localized structures of real-world graphs lead to inefficiencies in data sampling and movement. This characteristic heavily stresses system input/output (I/O), particularly burdening the peripheral buses during the sampling phase of GNN training. Second, the suboptimal mapping of the training procedure to GPU kernels leads to compute inefficiencies, including substantial kernel orchestration overhead and redundant computations. Addressing these challenges requires a comprehensive, full-stack optimization approach that fully leverages hardware capabilities. This thesis presents two complementary works toward this goal. The first work, Hanoi, removes the data-loading bottleneck in out-of-core GNN training by co-designing the sampling algorithms to align with the hierarchical memory organization of commodity hardware. Hanoi drastically reduces I/O traffic to external storage, delivering up to 4.2× speedup over strong baselines with negligible impact on model quality. Notably, Hanoi attains performance competitive with in-memory training while using only a fraction of the memory. Building on this foundation, the second work, Joestar, introduces a unified framework for optimized GNN training on GPUs. Joestar adapts the multistage sampling approach from Hanoi to in-memory training, freeing CPUs from heavy data-loading work. Joestar also identifies novel kernel fusion opportunities and formulates better execution schedules by jointly considering the sampling and compute stages. Combined with compiler infrastructure in PyTorch, Joestar achieves state-of-the-art GNN training throughput for billion-edge graph datasets on a single GPU.
Date issued
2025-05
URI
https://hdl.handle.net/1721.1/164035
Department
Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
Publisher
Massachusetts Institute of Technology

Collections
  • Doctoral Theses
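
The abstract above describes the standard sampling-then-compute pipeline for mini-batch GNN training that Hanoi and Joestar optimize. For reference, the following is a minimal sketch of that generic pipeline using off-the-shelf PyTorch Geometric; it is not code from the thesis, and the dataset, model, fan-outs, and hyperparameters are illustrative assumptions only.

# Minimal sketch of neighbor-sampled mini-batch GNN training -- the generic
# sampling-then-compute pipeline discussed in the abstract, written with
# off-the-shelf PyTorch Geometric. Dataset, fan-outs, and hyperparameters
# are illustrative; this is not Hanoi or Joestar.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Reddit
from torch_geometric.loader import NeighborLoader
from torch_geometric.nn import SAGEConv

class SAGE(torch.nn.Module):
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.conv1 = SAGEConv(in_dim, hidden_dim)
        self.conv2 = SAGEConv(hidden_dim, out_dim)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

data = Reddit(root="data/Reddit")[0]          # a standard benchmark graph
loader = NeighborLoader(
    data,
    num_neighbors=[10, 10],                   # per-hop fan-out: the sampling stage
    batch_size=1024,
    input_nodes=data.train_mask,
    shuffle=True,
)

device = "cuda" if torch.cuda.is_available() else "cpu"
num_classes = int(data.y.max()) + 1
model = SAGE(data.num_features, 256, num_classes).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-3)

model.train()
for batch in loader:                          # CPU-side sampling feeds GPU-side compute
    batch = batch.to(device)
    optimizer.zero_grad()
    out = model(batch.x, batch.edge_index)
    # Only the first `batch_size` nodes in each mini-batch are seed nodes;
    # the remainder are sampled neighbors kept for message passing.
    loss = F.cross_entropy(out[:batch.batch_size], batch.y[:batch.batch_size])
    loss.backward()
    optimizer.step()

In this baseline, sampling runs on the CPU and the sampled subgraphs cross the peripheral bus on every iteration, which is the I/O pressure the abstract identifies; the thesis's contributions restructure the sampling stage and the GPU kernel schedule around that bottleneck.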
