Show simple item record

dc.contributor.advisor: Leiserson, Charles E.
dc.contributor.advisor: Kaler, Timothy
dc.contributor.advisor: Iliopoulos, Alexandros-Stavros
dc.contributor.author: Alkhatib, Obada
dc.date.accessioned: 2026-01-12T19:40:26Z
dc.date.available: 2026-01-12T19:40:26Z
dc.date.issued: 2022-09
dc.date.submitted: 2022-09-16T20:24:03.030Z
dc.identifier.uri: https://hdl.handle.net/1721.1/164495
dc.description.abstract: Graph neural networks (GNNs) have become a commonly used class of machine learning models that achieve state-of-the-art performance in various applications. A prevalent and effective approach for applying GNNs to large datasets involves mini-batch training with sampled neighborhoods. Numerous sampling algorithms have emerged, some tailored to specific GNN applications. In this thesis, I explore ways to improve the efficiency and expressivity of existing and emerging sampling schemes. First, I explore system solutions that facilitate the development of fast implementations of different sampling methods. I introduce FlexSample, a system for efficiently incorporating custom sampling algorithms into GNN training. FlexSample leverages the kinds of performance optimizations found in SALIENT, a state-of-the-art system for fast training of GNNs with node-wise sampling. In experiments with four GNN models that use layer-wise and subgraph sampling, FlexSample achieves up to a 1.3× speed-up in end-to-end training over PyTorch Geometric with the same sampling code. Furthermore, FlexSample extends SALIENT with highly optimized C++ implementations of FastGCN and LADIES layer-wise sampling, which achieve 2×–5× speed-ups over their respective Python implementations. Second, I introduce a novel framework for learning neighbor sampling distributions as part of GNN training. The key components of this framework, which I name PertinenceSample, are: (i) a differentiable approximation of node-wise sampling for GNNs; and (ii) a parametrization of node sampling distributions as node- or edge-wise weights of attention-like GNN layers. I present an initial exploration of the potential of PertinenceSample for improving node classification accuracy in the presence of noisy edges. Specifically, in two synthetic experiments where roughly half of a node's neighbors may have similar features but different labels, I demonstrate that extending a GraphSAGE model with a 2-layer perceptron for learning the PertinenceSample weights can improve classification accuracy from 50%–75% to (nearly) 100%.
dc.publisher: Massachusetts Institute of Technology
dc.rights: In Copyright - Educational Use Permitted
dc.rights: Copyright MIT
dc.rights.uri: http://rightsstatements.org/page/InC-EDU/1.0/
dc.title: Sampling Methods for Fast and Versatile GNN Training
dc.type: Thesis
dc.description.degree: M.Eng.
dc.contributor.department: Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree: Master
thesis.degree.name: Master of Engineering in Electrical Engineering and Computer Science
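
For readers unfamiliar with the node-wise (GraphSAGE-style) neighborhood sampling that the abstract builds on, the sketch below illustrates the basic idea: for each GNN layer, a fixed number of neighbors (the fanout) is sampled for every node in the current frontier. This is a minimal, self-contained Python illustration, not code from the thesis or from SALIENT/FlexSample; the names sample_neighborhood, adj, and fanouts are hypothetical.

import random

def sample_neighborhood(adj, batch_nodes, fanouts, seed=None):
    """Node-wise neighbor sampling for mini-batch GNN training.

    adj: dict mapping node -> list of neighbor nodes
    batch_nodes: seed nodes of the mini-batch
    fanouts: per-layer sample sizes, e.g. [25, 10]
    Returns one sampled (dst, src) edge list per GNN layer,
    ordered from the hop farthest from the seeds inward.
    """
    rng = random.Random(seed)
    layers = []
    frontier = list(batch_nodes)
    for fanout in fanouts:
        edges = []
        next_frontier = set()
        for dst in frontier:
            nbrs = adj.get(dst, [])
            # Sample without replacement, capped at the fanout.
            chosen = nbrs if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
            for src in chosen:
                edges.append((dst, src))
                next_frontier.add(src)
        layers.append(edges)
        frontier = list(next_frontier)
    # Reverse so message passing runs from the outermost hop inward.
    return list(reversed(layers))

# Toy usage: a 2-layer GNN with fanouts (2, 2) on a small graph.
adj = {0: [1, 2, 3], 1: [0, 2], 2: [0, 1, 3], 3: [0, 2]}
print(sample_neighborhood(adj, batch_nodes=[0], fanouts=[2, 2], seed=42))

The per-layer fanouts mirror the sample sizes used in GraphSAGE-style mini-batch training; layer-wise schemes such as FastGCN and LADIES instead sample a fixed budget of nodes per layer for the whole batch, which is what FlexSample's optimized C++ implementations target.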

