Spectral Networks and Deep Locally Connected Networks on Graphs
Motivation
Generalize ConvNet to graphs.
Spatial Construction
Consider an undirected weighted graph denoted by where and is a symmetric nonnegative matrix.
Locality via W
The weights in a graph determine a notion of locality like the one in images, audio and text. We can restrict attention to sparse "filters" with receptive fields given by neighborhoods, thus reducing the number of parameters in a filter to where is the average neighborhood size and is the input feature size.
Multiresolution Analysis in Graphs
The authors mentioned that "[f]inding multiscale clusterings that are provably guaranteed to behave well w.r.t. Laplacian on the graph is still an open area of research" (the publication was in 2014).
Deep Locally Connected Networks
The spatial construction starts with a multiscale clustering of the graph. We consider scales. We set , and for each , we define a partition of into clusters; and a collection of neighborhoods around each element of :
With these in hand, we can now define the -th layer of the network as a mapping: , defined as
where:
is the th column of and is the th column of
is a sparse matrix with nonzero entries in the locations given by
outputs the result of a pooling operation over each cluster in .
This construction is illustrated below:
Started with , for all , is constructed by merging all nodes whose pairwise distance (determined by the corresponding entry in ) are less than . and are then obtained with
where is the th node in , is the support of , i.e. .
Spectral Construction
The global structure of the graph can be exploited with the spectrum of its graph-Laplacian to generalize the convolution operator.
Harmonic Analysis on Weighted Graphs
The frequency and smoothness relative to are interrelated through the combinatorial Laplacian or graph Laplacian . For simplicity, the authors simply use the combinatorial Laplacian.
If is an -dimensional vector, a natural definition of the smoothness functional at a node is
This definition is also known as the Dirichlet energy. With this definition, the smoothest vector is a constant:
Note here is also the eigenvector of corresponding to eigenvalue .
Each successive
is an eigenvalue of , and the eigenvalues allow the smoothness of a vector to be read off from the coefficients of in , equivalently as the Fourier coefficients of a signal defined in a grid. The filters are therefore multipliers on the eigenvalues of and reduces the number of parameters of a filter from to .
Two related results are: 1. Functions that are smooth relative to the grid metric have coefficients with quick decay in the basis of eigenvectors of . 2. The eigenvectors of the subsampled Laplacian are the low frequency eigenvectors of .
Extending Convolutions via the Laplacian Spectrum
Theoretical Formulation
Given two functions , a Euclidean domain, the convolution of and is defined to be
where .
A famous result related to convolution is the convolution theorem: , where denotes the fourier transform of a function and denotes the elementwise multiplication.
The fourier transform of is defined by
The description above is for the Euclidean case. To generalize the notion of convolution to non-Euclidean case, one possibility is by using the convolution theorem as a definition.
Since admits on a compact domain an eigendecomposition with a discrete set of orthonormal eigenfunctions and they are the smoothest functions in the sense of the Dirichlet energy, they can be interpreted as a generalization of the standard Fourier basis.
We can then define the convolution of and on a graph as
where is a graph.
For two vectors and , the convolution of and can be calculated by
where . Note that this formulation depends on the choice of .
Practice
Let be the eigenvectors of , ordered by eigenvalue. A convolutional layer transforming node features with , can be defined as
where is a diagonal matrix and is a real-valued nonlinearity. Note with our previous theory the update rule corresponds to filters and parameters.
Often only the first eigenvectors of the Laplacian are useful, which carry the smooth geometry of the graph. We can treat as a hyperparameter and replace by in the equation above, which only keeps the first columns of .
This construction can suffer when the individual high frequency eigenvectors are not meaningful but a cohort of high frequency eigenvectors contain meaningful information. It is also not obvious how to do either the forwardprop or the backprop efficiently while applying the nonlinearity on the space side.
The construction above requires parameters. One possibility to use parameters instead is to employ
where is a fixed qubic spline kernel and spline coefficients. The theory for this part seems to be ambiguous.
Last updated