Heterogeneous Directed Hypergraph Neural Network over Abstract Syntax Tree (AST) for Code Classification

by Guang Yang, et al.

Code classification is a difficult problem in program understanding and automatic coding. Because programs have elusive syntax and complicated semantics, most existing studies build code representations for classification with techniques based on the abstract syntax tree (AST) and graph neural networks (GNNs). These techniques exploit the structural and semantic information of the code, but they consider only pairwise associations and neglect the high-order correlations that exist between nodes in the AST, which may result in the loss of structural information. A general hypergraph, on the other hand, can encode high-order data correlations, but it is homogeneous and undirected, so modeling an AST with it loses semantic and structural information such as node types, edge types, and the direction between child nodes and parent nodes. In this study, we propose to represent the AST as a heterogeneous directed hypergraph (HDHG) and to process it with a heterogeneous directed hypergraph neural network (HDHGN) for code classification. Our method improves code understanding and can represent high-order data correlations beyond pairwise interactions. We evaluate HDHGN on public datasets of Python and Java programs. Our method outperforms previous AST-based and GNN-based methods, which demonstrates the capability of our model.
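To make the idea of a heterogeneous directed hypergraph over an AST more concrete, the following is a minimal sketch, not the authors' construction. It assumes that each AST field (for example `body` or `args`) becomes one hyperedge whose type is the field name, with the parent node as head and all children under that field as tails. The names `Hyperedge` and `ast_to_hdhg` are hypothetical and chosen only for this illustration, which uses Python's standard `ast` module.

```python
import ast
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Hyperedge:
    """One directed, typed hyperedge (illustrative assumption, not the paper's definition)."""
    edge_type: str    # AST field name, e.g. "body" or "args"
    head: int         # index of the parent node
    tails: List[int]  # indices of all child nodes grouped under that field


def ast_to_hdhg(source: str) -> Tuple[List[str], List[Hyperedge]]:
    """Parse Python source and return (node type per node, list of hyperedges)."""
    tree = ast.parse(source)

    # Assign an index and a type label to every distinct AST node.
    node_ids = {}
    node_types: List[str] = []
    for node in ast.walk(tree):
        if id(node) in node_ids:  # some context nodes (e.g. Load) may be shared
            continue
        node_ids[id(node)] = len(node_types)
        node_types.append(type(node).__name__)

    # Build one hyperedge per (parent, field) pair that has AST children.
    hyperedges: List[Hyperedge] = []
    seen = set()
    for node in ast.walk(tree):
        if id(node) in seen:
            continue
        seen.add(id(node))
        for field, value in ast.iter_fields(node):
            children = value if isinstance(value, list) else [value]
            tails = [node_ids[id(c)] for c in children if isinstance(c, ast.AST)]
            if tails:
                hyperedges.append(Hyperedge(field, node_ids[id(node)], tails))
    return node_types, hyperedges


if __name__ == "__main__":
    types, edges = ast_to_hdhg("def f(a, b, c):\n    return a\n")
    for e in edges:
        print(e.edge_type, types[e.head], [types[t] for t in e.tails])
```

In this sketch, a hyperedge with several tails (such as the `args` field of a function with three parameters) connects more than two nodes at once, which is the kind of high-order relation that a pairwise edge list cannot express directly. The node type labels and edge type labels supply the heterogeneity, and the head/tail split supplies the direction.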


