Sampling Methods for Fast and Versatile GNN Training

Alkhatib, Obada

dc.contributor.advisor	Leiserson, Charles E.
dc.contributor.advisor	Kaler, Timothy
dc.contributor.advisor	Iliopoulos, Alexandros-Stavros
dc.contributor.author	Alkhatib, Obada
dc.date.accessioned	2026-01-12T19:40:26Z
dc.date.available	2026-01-12T19:40:26Z
dc.date.issued	2022-09
dc.date.submitted	2022-09-16T20:24:03.030Z
dc.identifier.uri	https://hdl.handle.net/1721.1/164495
dc.description.abstract	Graph neural networks (GNNs) have become a commonly used class of machine learning models that achieve state-of-the-art performance in various applications. A prevalent and effective approach for applying GNNs to large datasets involves mini-batch training with sampled neighborhoods. Numerous sampling algorithms have emerged, some tailored for specific GNN applications. In this thesis, I explore ways to improve the efficiency and expressivity of existing and emerging sampling schemes. First, I explore system solutions to facilitate the development of fast implementations of different sampling methods. I introduce FlexSample, a system for efficiently incorporating custom sampling algorithms into GNN training. FlexSample leverages the types of performance optimizations found in SALIENT, a state-of-the-art system for fast training of GNNs with node-wise sampling. In experiments with four GNN models that use layer-wise and subgraph sampling, FlexSample achieves up to 1.3× speed-up for end-to-end training over PyTorch Geometric with the same sampling code. Furthermore, FlexSample extends SALIENT with highly optimized C++ implementations of FastGCN and LADIES layer-wise sampling, which achieve 2×–5× speed-up over their respective Python implementations. Second, I introduce a novel framework for learning neighbor sampling distributions as part of GNN training. Key components of this framework, which I name PertinenceSample, are: (i) a differentiable approximation of node-wise sampling for GNNs; and (ii) a parametrization of node sampling distributions as node- or edge-wise weights of attention-like GNN layers. I present an initial exploration of the potential of PertinenceSample for improving node classification accuracy in the presence of noisy edges.
Specifically, in two synthetic experiments where roughly half of a node’s neighbors may have similar features but different labels, I demonstrate that extending a GraphSAGE model with a 2-layer perceptron for learning the PertinenceSample weights can improve classification accuracy from 50%–75% to (nearly) 100%.
dc.publisher	Massachusetts Institute of Technology
dc.rights	In Copyright - Educational Use Permitted
dc.rights	Copyright MIT
dc.rights.uri	http://rightsstatements.org/page/InC-EDU/1.0/
dc.title	Sampling Methods for Fast and Versatile GNN Training
dc.type	Thesis
dc.description.degree	M.Eng.
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science
mit.thesis.degree	Master
thesis.degree.name	Master of Engineering in Electrical Engineering and Computer Science

Files in this item

Name: Alkhatib-obada-meng-eecs-2022- ...
Size: 3.723Mb
Format: PDF
Description: Thesis PDF

This item appears in the following Collection(s)

Graduate Theses
