When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark

Lin, Junhong; Guo, Xiaojie; Zhang, Shuaicheng; Zhu, Yada; Shun, Julian

dc.contributor.author	Lin, Junhong
dc.contributor.author	Guo, Xiaojie
dc.contributor.author	Zhang, Shuaicheng
dc.contributor.author	Zhu, Yada
dc.contributor.author	Shun, Julian
dc.date.accessioned	2025-09-09T19:51:25Z
dc.date.available	2025-09-09T19:51:25Z
dc.date.issued	2025-08-03
dc.identifier.isbn	979-8-4007-1454-2
dc.identifier.uri	https://hdl.handle.net/1721.1/162620
dc.description	KDD ’25, Toronto, ON, Canada	en_US
dc.description.abstract	Graph mining has become crucial in fields such as social science, finance, and cybersecurity. Many large-scale real-world networks exhibit both heterogeneity, where multiple node and edge types exist in the graph, and heterophily, where connected nodes may have dissimilar labels and attributes. However, existing benchmarks primarily focus on either heterophilic homogeneous graphs or homophilic heterogeneous graphs, leaving a significant gap in understanding how models perform on graphs with both heterogeneity and heterophily. To bridge this gap, we introduce H2GB, a large-scale node-classification graph benchmark that brings together the complexities of both the heterophily and heterogeneity properties of real-world graphs. H2GB encompasses 9 real-world datasets spanning 5 diverse domains, 28 baseline models, and a unified benchmarking library with a standardized data loader, evaluator, unified modeling framework, and an extensible framework for reproducibility. We establish a standardized workflow supporting both model selection and development, enabling researchers to easily benchmark graph learning methods. Extensive experiments across 28 baselines reveal that current methods struggle with heterophilic and heterogeneous graphs, underscoring the need for improved approaches. Finally, we present a new variant of the model, H2G-former, developed following our standardized workflow, that excels at this challenging benchmark. Both the benchmark and the framework are publicly available on GitHub and PyPI, with documentation hosted at https://junhongmit.github.io/H2GB.	en_US
dc.publisher	ACM|Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2	en_US
dc.relation.isversionof	https://doi.org/10.1145/3711896.3737421	en_US
dc.rights	Creative Commons Attribution	en_US
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en_US
dc.source	Association for Computing Machinery	en_US
dc.title	When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark	en_US
dc.type	Article	en_US
dc.identifier.citation	Junhong Lin, Xiaojie Guo, Shuaicheng Zhang, Yada Zhu, and Julian Shun. 2025. When Heterophily Meets Heterogeneity: Challenges and a New Large-Scale Graph Benchmark. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '25). Association for Computing Machinery, New York, NY, USA, 5607–5618.	en_US
dc.contributor.department	Massachusetts Institute of Technology. Department of Electrical Engineering and Computer Science	en_US
dc.identifier.mitlicense	PUBLISHER_POLICY
dc.eprint.version	Final published version	en_US
dc.type.uri	http://purl.org/eprint/type/ConferencePaper	en_US
eprint.status	http://purl.org/eprint/status/NonPeerReviewed	en_US
dc.date.updated	2025-09-01T07:51:39Z
dc.language.rfc3066	en
dc.rights.holder	The author(s)
dspace.date.submission	2025-09-01T07:51:40Z
mit.license	PUBLISHER_CC
mit.metadata.status	Authority Work and Publication Information Needed	en_US

Files in this item

Name: 3711896.3737421.pdf
Size: 6.445Mb
Format: PDF

This item appears in the following Collection(s)

MIT Open Access Articles
