| dc.description.abstract | Graph representation learning has gained significant traction in critical domains including finance, social networks, and transportation systems due to its successful application to graph-structured data. Graph neural networks (GNNs), which combine the power of deep learning with graph structures, have emerged as the leading methods in this field, delivering superior performance across diverse graph-related tasks. However, training GNNs on large-scale datasets faces scalability challenges on current system architectures. First, the sparse, non-localized structure of real-world graphs leads to inefficiencies in data sampling and movement. This characteristic heavily stresses system input/output (I/O), particularly burdening the peripheral buses during the sampling phase of GNN training. Second, the suboptimal mapping of the training procedure to GPU kernels causes compute inefficiencies, including substantial kernel orchestration overhead and redundant computation. Addressing these challenges requires a comprehensive, full-stack optimization approach that fully exploits hardware capabilities. This thesis presents two complementary works toward that goal. The first work, Hanoi, removes the data-loading bottleneck in out-of-core GNN training by co-designing the sampling algorithms to align with the hierarchical memory organization of commodity hardware. Hanoi drastically reduces I/O traffic to external storage, delivering up to 4.2× speedup over strong baselines with negligible impact on model quality. Notably, Hanoi achieves performance competitive with in-memory training while requiring only a fraction of the memory. Building on this foundation, the second work, Joestar, introduces a unified framework for optimized GNN training on GPUs. Joestar adapts the multi-stage sampling approach from Hanoi to in-memory training, freeing CPUs from heavy data-loading workloads. Joestar also identifies novel kernel fusion opportunities and formulates better execution schedules by jointly considering the sampling and compute stages. Combined with the compiler infrastructure in PyTorch, Joestar achieves state-of-the-art GNN training throughput for billion-edge graph datasets on a single GPU. | |