Always be Pre-Training: Representation Learning for Network Intrusion Detection with GNNs

Zhengyao Gu1, Diego Lopez1, Lilas Alrahis2, Ozgur Sinanoglu2
1New York University, 2New York University Abu Dhabi


Graph Neural Network (GNN)-based Network Intrusion Detection Systems (NIDS) have recently demonstrated state-of-the-art performance on benchmark datasets. However, these methods rely on target encoding for data pre-processing, which requires annotated labels, a cost-prohibitive requirement that limits widespread adoption. In this work, we first summarize related work on GNN-based NIDS and discuss its limitations. We then propose a solution that combines in-context pre-training with dense representations for categorical features to overcome the label-dependency limitation. Our approach is remarkably data-efficient, achieving over 98% of the performance of the supervised state-of-the-art with less than 4% labeled data on the NF-UQ-NIDS-V2 dataset. Finally, we outline avenues for future research in this direction.
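
To make the label-dependency issue concrete, the following minimal sketch contrasts target encoding, which cannot be computed without annotations, with a dense embedding lookup, which needs none. It is an illustration under stated assumptions rather than the method of this work: the toy flows, the choice of destination port as the categorical feature, and the names `target_encode` and `DenseEmbedding` are all hypothetical, and the embedding table is only randomly initialized here, whereas in practice it would be trained.

```python
import numpy as np

def target_encode(categories, labels):
    """Map each category to the mean label observed for it (requires labels)."""
    cats = np.asarray(categories)
    labels = np.asarray(labels, dtype=float)
    return {c: float(labels[cats == c].mean()) for c in sorted(set(categories))}

class DenseEmbedding:
    """Label-free alternative: each category indexes a row of a vector table."""

    def __init__(self, vocabulary, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.index = {c: i for i, c in enumerate(vocabulary)}
        # Randomly initialized here; a real model would learn these rows.
        self.table = rng.normal(scale=0.1, size=(len(vocabulary), dim))

    def __call__(self, categories):
        ids = [self.index[c] for c in categories]
        return self.table[ids]

# Hypothetical toy flows: destination port as the categorical feature.
ports = [80, 80, 22, 443, 22, 80]
is_attack = [0, 0, 1, 0, 1, 1]  # annotations used only by target encoding

encoding = target_encode(ports, is_attack)          # needs is_attack
embed = DenseEmbedding(vocabulary=sorted(set(ports)), dim=4)
vectors = embed(ports)                              # needs no labels
```

Note that `target_encode` takes `is_attack` as an argument, while `DenseEmbedding` is built from the category vocabulary alone, so it can be applied to unlabeled traffic.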