
GraphSAGE



tags: graph-networks, nips-2017


Difference with previous works

  • Instead of training individual embeddings for each node, GraphSAGE learns a function that generates embeddings by sampling and aggregating features from a node's local neighborhood.

  • Inductive setting:

    • Previous work focused on embedding nodes from a single fixed graph.

    • Real-world applications require embeddings to be quickly generated
      for unseen nodes or entirely new graphs.

    • Many applications operate on evolving graphs and constantly encounter unseen nodes.

Difficulty of Inductive Learning

  • Generalizing to unseen nodes requires "aligning" newly observed subgraphs to the node embeddings that the algorithm has already optimized.

  • An inductive framework must learn to recognize structural properties of a node's neighborhood that reveal both the node's local role in the graph and its global position.


Related Works

  • Factorization-based embedding approaches.

  • Supervised Learning over graphs.

    • Kernel-based approaches.

  • Graph convolutional networks.

Algorithm

Forward Propagation

The forward propagation assumes that the model has already been trained and that the parameters are fixed.

Assume that we have learned the parameters of $K$ aggregator functions, which aggregate information from node neighbors, as well as a set of weight matrices $\mathbf{W}^k$.

The final representation output at depth $K$ is

$$\mathbf{z}_v \equiv \mathbf{h}_v^K, \quad \forall v \in \mathcal{V}$$
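A minimal NumPy sketch of this forward pass with the mean aggregator, assuming the weight matrices are already trained; the dict-based graph representation and helper names are illustrative, not the authors' code (neighborhood sampling, covered next, is elided in favor of full neighbor lists):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def graphsage_forward(features, neighbors, W):
    """features: {v: (d,) array}; neighbors: {v: [u, ...]};
    W: list of K trained matrices, W[k] of shape (d_out, 2 * d_in)."""
    h = dict(features)                # h^0_v = input features x_v
    for Wk in W:                      # k = 1, ..., K
        h_next = {}
        for v in h:
            # AGGREGATE: elementwise mean of neighbors' layer-(k-1) states
            h_nv = np.mean([h[u] for u in neighbors[v]], axis=0)
            # CONCAT self state with the aggregated neighborhood (line 5),
            # then apply the learned transform and nonlinearity
            hv = relu(Wk @ np.concatenate([h[v], h_nv]))
            h_next[v] = hv / (np.linalg.norm(hv) + 1e-12)  # l2-normalize
        h = h_next
    return h                          # z_v = h^K_v
```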

Neighborhood

Instead of using the full neighborhood set, they define $\mathcal{N}(v)$ as a fixed-size, uniform draw from the set

$$\{u \in \mathcal{V} : (u, v) \in \mathcal{E}\},$$

in order to keep the computational footprint of each batch fixed. The per-batch space and time complexity of GraphSAGE is $O(\prod_{i=1}^{K} S_i)$, where $S_i$ is the neighborhood sample size at depth $i$.

Practical setting: $K = 2$, $S_1 \cdot S_2 \leq 500$.
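A minimal sketch of the sampling step; falling back to sampling with replacement for low-degree nodes (so the sample stays fixed-size) is my assumption here:

```python
import random

def sample_neighborhood(neighbors, v, S):
    """Uniformly sample a fixed-size set of S neighbors of v."""
    nbrs = neighbors[v]
    if len(nbrs) >= S:
        return random.sample(nbrs, S)   # without replacement
    return random.choices(nbrs, k=S)    # with replacement, stays fixed-size

# With K = 2 and, e.g., S1 = 25 and S2 = 10, each batch node touches at
# most S1 * S2 = 250 sampled nodes, satisfying S1 * S2 <= 500.
```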

Aggregators

Mean aggregator

$$\operatorname{AGGREGATE}_k^{\mathrm{mean}} = \sum_{u \in \mathcal{N}(v)} \frac{\mathbf{h}_u^{k-1}}{|\mathcal{N}(v)|}$$

Convolutional aggregator

$$\mathbf{h}_v^k \leftarrow \sigma\left(\mathbf{W} \cdot \operatorname{MEAN}\left(\{\mathbf{h}_v^{k-1}\} \cup \{\mathbf{h}_u^{k-1}, \forall u \in \mathcal{N}(v)\}\right)\right)$$

  • Compared to the mean aggregator, the convolutional aggregator does not perform the concatenation operation in line 5 of the algorithm, which can be viewed as a skip connection.

LSTM aggregator

  • Greater expressive capability.

  • Not inherently symmetric (not permutation invariant).

  • The authors adapt LSTMs to operate on an unordered set by simply applying the LSTM to a random permutation of the node's neighbors.

Pooling aggregator

$$\operatorname{AGGREGATE}_k^{\mathrm{pool}} = \max\left(\left\{\sigma\left(\mathbf{W}_{\mathrm{pool}} \mathbf{h}_{u_i}^k + \mathbf{b}\right), \forall u_i \in \mathcal{N}(v)\right\}\right)$$

Here $\max$ is an elementwise max operator.
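Minimal sketches of the four aggregators over a stacked array `H` of the sampled neighbors' states $\mathbf{h}_u^{k-1}$; all weights are assumed pre-trained, and `lstm` is a hypothetical callable standing in for a real recurrent layer:

```python
import numpy as np

def sigma(x):                       # stand-in nonlinearity (ReLU)
    return np.maximum(x, 0.0)

def agg_mean(H):
    """H: (n, d) array of neighbor states h_u^{k-1}."""
    return H.mean(axis=0)

def agg_conv(h_self, H, W):
    # GCN-style variant: mean over self + neighbors, with no CONCAT
    # (the skip-connection-like concatenation of line 5 is dropped)
    return sigma(W @ np.vstack([h_self, H]).mean(axis=0))

def agg_lstm(H, lstm, rng):
    # run an LSTM over a random permutation of the (unordered) neighbor set
    return lstm(H[rng.permutation(len(H))])

def agg_pool(H, W_pool, b):
    # transform each neighbor independently, then take the elementwise max
    return sigma(H @ W_pool.T + b).max(axis=0)
```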

Loss (In Unsupervised Setting)

The loss function encourages nearby nodes to have similar representations, while enforcing that the representations of disparate nodes are highly distinct.

$$J_{\mathcal{G}}(\mathbf{z}_u) = -\log\left(\sigma\left(\mathbf{z}_u^\top \mathbf{z}_v\right)\right) - Q \cdot \mathbb{E}_{v_n \sim P_n(v)} \log\left(\sigma\left(-\mathbf{z}_u^\top \mathbf{z}_{v_n}\right)\right)$$

$v$ is a node that co-occurs near $u$ on a fixed-length random walk, $P_n$ is a negative-sampling distribution, and $Q$ defines the number of negative samples.

Unlike previous embedding approaches, no "context embedding" is required.
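A minimal sketch of this objective for a single positive pair, assuming the $Q$ negatives have been drawn from $P_n$ elsewhere (function and variable names are illustrative):

```python
import numpy as np

def log_sigmoid(x):
    return -np.logaddexp(0.0, -x)   # numerically stable log(sigma(x))

def unsupervised_loss(z, u, v, negatives):
    """z: {node: embedding}; (u, v): a random-walk co-occurrence pair;
    negatives: Q nodes drawn from the noise distribution P_n."""
    pos = log_sigmoid(z[u] @ z[v])
    # Q * E_{v_n ~ P_n}[...] is estimated by the sum over the Q samples
    neg = sum(log_sigmoid(-(z[u] @ z[vn])) for vn in negatives)
    return -pos - neg
```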

Relation to Weisfeiler-Lehman Isomorphism Test

It's trivial to test whether two graphs are isomorphic if all vertices have unique attributes.

The idea of the WL test is to repeatedly assign fingerprints to vertices based on their neighborhoods. To start, we assign every node the attribute $1$, then iterate:

  1. Compute a hash of $(a_v, a_{v_1}, \dots, a_{v_n})$, where the $v_i$ are the neighbors of vertex $v$.

  2. Use the hash as the attribute for the next iteration.

Note that the WL test fails when the graph has some kind of symmetry: if $u$ and $v$ are symmetric in the graph, the WL test assigns them the same attribute.
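A compact sketch of the relabeling loop; Python's built-in `hash` over a sorted tuple stands in for any injective fingerprint function (sorting makes the result independent of neighbor order, matching the multiset semantics):

```python
def wl_fingerprints(neighbors, iterations):
    """neighbors: {v: [u, ...]} adjacency lists."""
    attr = {v: 1 for v in neighbors}    # every node starts with attribute 1
    for _ in range(iterations):
        attr = {
            v: hash((attr[v],) + tuple(sorted(attr[u] for u in neighbors[v])))
            for v in neighbors
        }
    return attr

# Two graphs can only be isomorphic if their multisets of fingerprints
# match; on a symmetric graph (e.g. a cycle) every node ends up with the
# same fingerprint, which is exactly where the test is uninformative.
```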

WL test in GraphSAGE

The isomorphism test can be viewed as running a simplified GraphSAGE on the graphs:

  1. Set $K = |\mathcal{V}|$.

  2. Set the weight matrices as the identity.

  3. Use hash function as the aggregator.

Experiments

  1. Classifying academic papers into different subjects using the Web of Science (WOS) citation dataset.

  2. Classifying Reddit posts as belonging to different communities.

  3. Classifying protein functions across various biological protein-protein interaction (PPI) graphs.

Unsupervised Setting

For the unsupervised objective, GraphSAGE runs $50$ random walks of length $5$ from each node in order to obtain the co-occurring pairs needed for the loss.

To make predictions from the embeddings output by the unsupervised models, GraphSAGE uses a logistic classifier trained with SGD.
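A minimal sketch of the pair generation; pairing each start node with every node its walks visit is my assumption about the exact co-occurrence scheme:

```python
import random

def cooccurrence_pairs(neighbors, num_walks=50, walk_length=5):
    """neighbors: {v: [u, ...]} adjacency lists."""
    pairs = []
    for v in neighbors:
        for _ in range(num_walks):
            cur = v
            for _ in range(walk_length):
                if not neighbors[cur]:  # dead end: stop this walk early
                    break
                cur = random.choice(neighbors[cur])
                pairs.append((v, cur))
    return pairs
```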

Inductive learning on evolving graphs

  • Citation

    • The authors train all algorithms on the 2000-2004 data and use the 2005 data for testing.

    • Labels: 6 different field labels.

    • Features: node degree and a sentence embedding of the paper's abstract.

  • Reddit

    • First 20 days for training and the remaining days for testing.

    • Label: the community (subreddit) a post belongs to.

    • Features: concatenation of the average embedding of the post title, the average embedding of the post's comments, the post's score, and the number of comments.

Generalizing across graphs: PPI

In this experiment the authors work with protein-protein interaction (PPI) graphs. Generalizing across entirely unseen graphs requires learning about node roles rather than community structure.

Results

(Figures: benchmark results from the paper.)
Reference: The Weisfeiler-Lehman algorithm and estimation on graphs