GraphSAGE
Last updated
Last updated
tags: graph-networks, nips-2017
Instead of training individual embeddings for each node, GraphSAGE learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood.
Inductive setting:
Previous work focused on embeddings nodes from a single fixed graph.
Real-world application requires embeddings to be quickly generated
for unseen words or entirely new graphs.
operate on evolving graphs and constantly encounter unseen nodes.
Generalize to unseen nodes requires "aligning" newly observed subgraphs to node embeddings that the algorithm has already optimized on. - An inductive framework must learn to recognize structural properties of a node's neighborhood that reveal the node's local role in the graph and its global position.
Factorization-based embedding approaches.
Supervised Learning over graphs.
Kernel-based approaches.
Graph convolutional networks.
The forward propagation assumes that the model has already been trained and that the parameters are fixed.
Assume that we have learned the parameters of aggregator functions, which aggregate information from node neighbors, as well as a set of weight matrices .
The final representations output at depth is
Instead of using full neighborhood set, they uniformly sample a fixed-size set of neighbors:
in order to keep the computational footprint of each batch fixed. Per-batch space and time complexity for GraphSAGE is .
Practical setting: , .
Compared to mean aggregator, the convolutional aggregator does not perform the concatenation operation in line 5 of the algorithm, which could be viewed as a skip-connection.
Larger expressive capability.
Not inherently symmetric(not permutation invariant)
They adapt LSTMs to operate on an unordered set by simply applying the LSTMs to a random permutation of the node's neighbors.
The loss function encourages nearby nodes to have similar representations, while enforcing that the representations of disparate nodes are highly distinct.
is a node that co-occurs near on a fixed-length random walk. defines the number of negative samples.
Unlike previous embedding approaches, no "context embedding" is required.
It's trivial to test whether two graphs are isomorphism if all vertices have unique attributes.
The idea of WL test is to assign fingerprints to vertices and their neighborhoods repeatedly. To start with, we assign each node with attribute :
Compute a hash of where are neighbors of vertex .
Use the hash as attribute for next iteration.
Note that the WL test fails when the graph has some kind of symmetry: if and are symmetric in the graph, WL test would assign they with the same attribute.
The isomorphism test could be viewed as applying a simplified GraphSAGE on graphs:
Set
Set the weight matrices as the identity.
Use hash function as the aggregator.
Classifying academic papers into different subjects using WOS citation dataset.
Classifying reddit posts as belonging to different communities.
Classifying protein functions across various biological PPI graphs.
For unsupervised objective, GraphSAGE ran random walks of length for each node in order to obtain the pairs needed for negative sampling.
To make predictions on the embeddings output from the unsupervised models, GraphSAGE use logistic SGD Classifier.
Citation
The authors train all algorithms on 2000-2004 data and use 2005 data for testing.
Label: 6 different field labels.
Features: node degrees, sentence embedding.
First 20 days for training and others for testing.
Label: community(subreddit)
Features: Concatenation of average embedding of post title, average embedding of post's comments, post's score & number of comments.
In this experiment the authors conduct experiments on protein-protein interactions, this requires learning about node roles rather then community structure.