GNN Wednesday
● An interesting task is to predict, for example, whether a molecule is a potent drug.
○ Can do binary classification on whether the compound will inhibit certain bacteria (E. coli).
○ Train on a curated dataset of compounds for which the response is known.
Running the trained GNN over large compound libraries surfaced Halicin as a candidate antibiotic...
...achieving wide acclaim!
Arguably the most popularised success story of graph neural networks to date!
● Transportation maps (e.g. the ones found on Google Maps) naturally modelled as graphs.
○ Nodes could be intersections, and edges could be roads.
○ Relevant node features: road length, current speeds, historical speeds.
DeepMind’s ETA Prediction using GNNs in Google Maps
Run GNN on supersegment graph to estimate time of arrival (ETA) (graph regression).
Already deployed in several major cities, significantly reducing negative ETA outcomes!
GNNs are a very hot research topic
github.com/rusty1s/pytorch_geometric graphneural.network
dgl.ai
github.com/deepmind/graph_nets github.com/deepmind/jraph
Rich ecosystem of datasets
ogb.stanford.edu graphlearning.io
https://pytorch-geometric.readthedocs.io/en/latest/modules/datasets.html
github.com/graphdeeplearning/benchmarking-gnns
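As a concrete starting point, here is a minimal sketch of loading one standard benchmark from the PyTorch Geometric dataset collection linked above (assumes torch and torch_geometric are installed; DGL and jraph offer analogous loaders):

```python
# Minimal sketch: load a standard citation-network benchmark with PyTorch Geometric.
from torch_geometric.datasets import Planetoid

dataset = Planetoid(root="/tmp/Cora", name="Cora")  # downloaded on first use
data = dataset[0]                                   # the single Cora graph

print(data.num_nodes, data.num_edges)  # graph size
print(data.x.shape)                    # node feature matrix
print(data.edge_index.shape)           # edges as a [2, num_edges] index tensor
```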
2 Talk roadmap
What will we cover today?
● Hopefully I’ve given you a convincing argument for why GNNs are useful to study
○ For more details and applications, please see e.g. my EEML 2020 talk
● My aim for today: provide good blueprints and contexts for studying the field
○ Derive GNNs from first principles
○ Position this in context of several independently-studied derivations of GNNs
■ Often using drastically different mathematical tools…
○ Look to the past: how GNN-like models emerged in historical ML research
○ Look to the present: some immediate lines of research interest
○ Look to the future: how our blueprint generalises beyond graph-structured inputs
● Various contexts of GNN study inspired by Will Hamilton’s GRL Textbook (esp. Chapter 7)
○ https://www.cs.mcgill.ca/~wlh/grl_book/
○ Highly recommended!
● When you feel ready, I highly recommend Aleksa Gordić’s GitHub repository on GATs:
○ https://github.com/gordicaleksa/pytorch-GAT
○ Arguably the most gentle introduction to GNN implementations
3 Towards GNNs from first principles
Towards a neural network for graphs
● Locality: neighbouring pixels relate much more strongly than distant ones (the inductive bias exploited by convolutions on images)
● On graphs, the first requirement is that node ordering should not matter: we would like to get the same results for two isomorphic graphs
● For now, assume the graph has no edges (i.e. just a set of nodes, V).
● It will be useful to think about the operations that change the node order
○ Such operations are known as permutations (there are n! of them)
○ e.g. a permutation (2, 4, 1, 3) means y1 ← x2, y2 ← x4, y3 ← x1, y4 ← x3.
● We want to design functions f(X) over sets that will not depend on the order
● One very generic form is the Deep Sets model (Zaheer et al., NeurIPS’17): f(X) = 𝜙(Σi 𝜓(xi)),
where 𝜓 and 𝜙 are (learnable) functions, e.g. MLPs. Such an f is permutation invariant: f(PX) = f(X) for any permutation matrix P.
○ The sum aggregation is critical! (other choices possible, e.g. max or avg)
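To make this concrete, here is a minimal numpy sketch (the linear maps standing in for 𝜓 and 𝜙 are illustrative placeholders, not part of the slides): it builds the permutation matrix for the (2, 4, 1, 3) example above and checks that a sum-aggregated Deep Sets model is permutation invariant.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))        # 4 nodes, 8 features each

# The permutation (2, 4, 1, 3) from above: y1 <- x2, y2 <- x4, y3 <- x1, y4 <- x3
perm = np.array([1, 3, 0, 2])      # zero-indexed
P = np.eye(4)[perm]                # the corresponding permutation matrix

# Deep Sets: f(X) = phi( sum_i psi(x_i) ), with toy maps in place of MLPs
W_psi = rng.normal(size=(8, 16))
W_phi = rng.normal(size=(16, 4))

def f(X):
    psi = np.tanh(X @ W_psi)        # psi applied to every node independently
    pooled = psi.sum(axis=0)        # the critical permutation-invariant sum
    return np.tanh(pooled @ W_phi)  # phi on the aggregated representation

assert np.allclose(P @ X, X[perm])  # P reorders the rows of X
assert np.allclose(f(P @ X), f(X))  # permutation invariance: f(PX) = f(X)
```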
Permutation equivariance
● We may instead seek functions that don’t change the node order
○ i.e. if we permute the nodes, it doesn’t matter if we do it before or after the function!
● Accordingly, we say that f(X) is permutation equivariant if, for all permutation matrices P: f(PX) = Pf(X)
General blueprint for learning on sets
● Equivariance mandates that each node’s output does not depend on how the other nodes are ordered. That is, we can think of
equivariant set functions as transforming each node input xi into a latent vector hi: hi = 𝜓(xi),
where 𝜓 is any shared function, applied in isolation to every node. Stacking the hi yields H = f(X).
(remark: this is typically as far as we can get with sets, without assuming or inferring additional structure)
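A matching numpy sketch for the equivariant blueprint (again with a toy linear map standing in for 𝜓): a shared 𝜓 applied row-wise and stacked into H = f(X) satisfies f(PX) = Pf(X).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))
W = rng.normal(size=(8, 16))       # toy stand-in for a shared psi (e.g. an MLP)

def f(X):
    # psi applied to each node in isolation; the rows stack into H
    return np.tanh(X @ W)

perm = np.array([1, 3, 0, 2])
P = np.eye(4)[perm]

assert np.allclose(f(P @ X), P @ f(X))  # permutation equivariance: f(PX) = P f(X)
```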
5 Learning on graphs
Learning on graphs
● Further additions (e.g. edge features) are possible but ignored for simplicity.
● The main difference: node permutations must now also act accordingly on the edges, i.e. A → PAPᵀ
Invariance: f(PX, PAPᵀ) = f(X, A)
Equivariance: f(PX, PAPᵀ) = Pf(X, A)
Locality on graphs: neighbourhoods
● For each node i, collect its neighbourhood features into the multiset XNi = {{xj : j ∈ Ni}}, where Ni = {j : (i, j) ∈ E},
and define a local function, g, as operating over this multiset: g(xi, XNi).
A recipe for graph neural networks
● Now we can construct permutation equivariant functions, f(X, A), by appropriately applying
the local function, g, over all neighbourhoods: the i-th row of f(X, A) is hi = g(xi, XNi) (sketched below).
● To ensure equivariance, we need g to not depend on the order of the vertices in XNi
○ Hence, g should be permutation invariant!
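A minimal numpy sketch of this recipe (toy weights, not any particular published layer): g sums over the neighbourhood multiset, so it is permutation invariant, and applying it at every node gives an f(X, A) satisfying f(PX, PAPᵀ) = Pf(X, A).

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, k = 5, 8, 16
X = rng.normal(size=(n, d))
A = (rng.random(size=(n, n)) < 0.4).astype(float)
A = np.maximum(A, A.T)             # symmetric (undirected) adjacency
np.fill_diagonal(A, 0)

W_self = rng.normal(size=(d, k))
W_nbr = rng.normal(size=(d, k))

def g(x_i, X_nbrs):
    # Local function over the neighbourhood multiset; the sum over X_nbrs
    # makes g invariant to the order of the neighbours.
    return np.tanh(x_i @ W_self + X_nbrs.sum(axis=0) @ W_nbr)

def f(X, A):
    # Apply g over every node's neighbourhood and stack the results.
    return np.stack([g(X[i], X[A[i] > 0]) for i in range(len(X))])

# Equivariance check: f(PX, PAP^T) = P f(X, A)
perm = rng.permutation(n)
P = np.eye(n)[perm]
assert np.allclose(f(P @ X, P @ A @ P.T), P @ f(X, A))
```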
A recipe for graph neural networks, visualised
How to use GNNs?
6 Message passing on graphs
What’s in a GNN layer?
● Fortunately, almost all proposed layers can be classified as one of three spatial “flavours”.
The three “flavours” of GNN layers: Convolutional GNN, Attentional GNN, Message-passing GNN
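As a concrete instance of the convolutional flavour, here is a minimal sketch of a GCN-style layer (in the spirit of Kipf & Welling, ICLR’17; the toy usage is illustrative): neighbours are aggregated with fixed coefficients derived purely from the graph structure.

```python
import numpy as np

def gcn_layer(X, A, W):
    """Convolutional-flavour layer: fixed, structure-derived aggregation weights."""
    A_hat = A + np.eye(len(A))                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # symmetric normalisation
    return np.maximum(A_norm @ X @ W, 0.0)           # aggregate, transform, ReLU

# Toy usage
rng = np.random.default_rng(3)
A = np.array([[0, 1, 1],
              [1, 0, 0],
              [1, 0, 0]], dtype=float)
X = rng.normal(size=(3, 4))
W = rng.normal(size=(4, 8))
H = gcn_layer(X, A, W)                               # shape (3, 8)
```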
● To give you a feel for the scale of diversity, I will now survey several prior and concurrent
approaches to graph representation learning, and to what extent they map to this blueprint.
● If you’ve read up on graph machine learning before, there’s a good chance you will have
seen at least some of these.
I Node embedding techniques
Node embedding techniques
● Some of the earliest “successes” of deep learning on graphs relied on finding good ways to
embed nodes into vectors hu (below: zu) using an encoder function
○ At the time, implemented as a look-up table!
● Redefine the condition from (i, j) ∈ E to “i and j co-occur in a (short) random walk” (sketched below).
● Note clear correspondence between node embedding techniques and word embedding
techniques in NLP
○ nodes ~ words
○ random walks ~ sentences
○ “node2vec” ~ “word2vec”
○ The optimisation objectives are near-equal!
(“Strategies for pre-training graph neural networks”; Hu, Liu et al., ICLR’20)
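A minimal sketch of the random-walk side of this correspondence (the helper below is illustrative, not the exact DeepWalk/node2vec code): the sampled walks play the role of sentences for a word2vec-style skip-gram objective, and the resulting lookup table of vectors is the node embedding matrix.

```python
import random

def random_walk(adj, start, length=5):
    """Sample a short, unbiased random walk; adj maps each node to its neighbour list."""
    walk = [start]
    for _ in range(length - 1):
        nbrs = adj[walk[-1]]
        if not nbrs:
            break
        walk.append(random.choice(nbrs))
    return walk

# Toy graph: nodes ~ words, walks ~ sentences
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
corpus = [random_walk(adj, v) for v in adj for _ in range(10)]
# `corpus` can now be fed to a word2vec-style skip-gram trainer; the learned
# lookup table of vectors is exactly the node embedding matrix (the zu above).
```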
● Speaking of NLP...
II Natural Language Processing
Parallels from NLP
● This can be easily computed by studying C([0, 1, 0, 0, 0, …]), which is the shift matrix.
The DFT and the convolution theorem
● Now, as long as we know 𝚽, we can express our convolution using the eigenvalues of the circulant rather than the circulant itself
What we have covered so far
Key idea: we don’t need to know the circulant if we know its eigenvalues! Credits to Michael Bronstein
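This key idea can be checked numerically in a few lines (a minimal numpy sketch): multiplying by the circulant C(w) equals pointwise multiplication of DFTs, i.e. the circulant is diagonalised by the Fourier basis and its spectrum is given by the DFT of w.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 8
x = rng.normal(size=n)
w = rng.normal(size=n)

# Build the circulant explicitly: (C(w))_{ij} = w_{(i - j) mod n}
Cw = np.array([[w[(i - j) % n] for j in range(n)] for i in range(n)])
y_direct = Cw @ x

# Convolution theorem: the same result via the DFT (the spectrum of C(w))
y_spectral = np.fft.ifft(np.fft.fft(w) * np.fft.fft(x)).real

assert np.allclose(y_direct, y_spectral)
```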
What about graphs?
● Instead, use the graph Laplacian matrix, L = D - A, where D is the degree matrix.
○ Captures all adjacency properties in a mathematically convenient way!
Example Laplacian
Graph Fourier Transform
● Now, to convolve with some feature matrix X, do as follows (the diagonal can be learnable): X′ = 𝚽 diag(ŵ) 𝚽ᵀ X
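A minimal sketch of this spectral convolution on a toy graph (the filter coefficients are random here; learnable in practice): project into the Laplacian eigenbasis 𝚽, rescale each frequency, and project back.

```python
import numpy as np

rng = np.random.default_rng(5)
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)   # a 4-cycle
L = np.diag(A.sum(axis=1)) - A              # graph Laplacian L = D - A

lam, Phi = np.linalg.eigh(L)                # eigenvalues + eigenvectors (graph Fourier basis)

X = rng.normal(size=(4, 3))                 # node feature matrix
w_hat = rng.normal(size=4)                  # spectral filter; the learnable diagonal

H = Phi @ np.diag(w_hat) @ Phi.T @ X        # filter X in the spectral domain
```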
Spectral GNNs in practice
● Transformers signal that the input is a sequence of words by using positional embeddings
○ Sines/cosines sampled depending on position
● We can use this idea to run Transformers over general graph structures!
○ Just feed some eigenvectors of the graph Laplacian (columns of 𝚽)
○ See the Graph Transformer from Dwivedi & Bresson (2021)
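A minimal sketch of this positional-encoding idea (k, the number of eigenvectors, is a free choice; since eigenvectors are only defined up to sign, random sign flips are typically applied during training):

```python
import numpy as np

def laplacian_positional_encoding(A, k=4):
    """Columns of Phi with the smallest non-trivial eigenvalues, to be
    concatenated to the node features before feeding a Transformer."""
    L = np.diag(A.sum(axis=1)) - A
    lam, Phi = np.linalg.eigh(L)        # eigenvalues in ascending order
    return Phi[:, 1:k + 1]              # skip the constant (eigenvalue-0) eigenvector

# Usage sketch: X_in = np.concatenate([X, laplacian_positional_encoding(A)], axis=-1)
```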
IV Probabilistic Graphical Models
Probabilistic modelling
● We’ve so far used edges in a graph to mean any kind of relation between nodes
● Taking a more probabilistic view, we can treat nodes as random variables, and interpret
edges as dependencies between their distributions.
○ This gives rise to probabilistic graphical models (PGMs)
○ They tell us which relations between variables we can safely ignore when computing joint probabilities
Markov random fields
● One particular PGM of interest here is the Markov random field (MRF).
○ It allows us to decompose the joint into a product of node and edge potentials: p(X, H) ∝ Πi 𝜙(xi, hi) Π(i,j)∈E 𝜓(hi, hj)
● Mean-field inference approximates the (intractable) posterior by a product of per-node densities, p(H | X) ≈ Πu q(hu),
where q is a well-defined density that is easy to compute and sample (e.g. Gaussian).
● We then obtain the parameters of q by minimising the distance (e.g. KL-divergence) to the
true posterior, KL(𝚷u q(hu) || p(H | X))
● Using variational inference techniques (out of scope), we can iteratively update q, starting
from some initial guess q(0)(h), as follows: q(t+1)(hu) ∝ 𝜙(xu, hu) Πv∈Nu exp( E q(t)(hv) [ log 𝜓(hu, hv) ] )
● Based on this idea, structure2vec (Dai et al., ICML’16) embeds mean-field inference within the
computations of a GNN.
○ Key difference: in PGMs, we expect the potential functions to be specified and known upfront
○ Here, they are defined implicitly, within the latents of a GNN.
● More recently, several other approaches unify GNNs with PGM-like computations:
○ CRF-GNNs (Gao et al., KDD’19)
○ GMNNs (Qu et al., ICML’19)
○ ExpressGNN (Zhang et al., ICLR’20)
○ Tail-GNNs (Spalević et al., ICML’20 GRL+)
V Graph Isomorphism Testing
How powerful are Graph Neural Networks?
● A simple but powerful way of distinguishing graphs is the 1-Weisfeiler-Lehman (1-WL) test: pass random hashes of sums along the edges
○ Connection to conv-GNNs spotted very early; e.g. by GCN (Kipf & Welling, ICLR’17)
● Over discrete features, GNNs can only be as powerful as the 1-WL test described before!
● One important condition for maximal power is an injective aggregator (e.g. sum)
● For example, just like 1-WL, GNNs cannot detect closed triangles (see the sketch at the end of this section). Possible remedies:
○ Augment nodes with randomised/positional features
■ Explored by RP-GNN (Murphy et al., ICML’19) and P-GNN (You et al., ICML’19)
■ See also: Sato et al. (SDM’21)
○ Can also literally count interesting subgraphs (Bouritsas et al., 2020)
● What happens when features are continuous? (real-world apps / latent GNN states)
○ … the proof for injectivity of sum (hence GINs’ expressivity) falls apart
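A minimal sketch of 1-WL colour refinement as described above (hashes of the multiset of neighbour colours), illustrating the triangle blind spot: two disjoint triangles and one 6-cycle receive identical colour histograms even though they are not isomorphic.

```python
from collections import Counter

def wl_refine(adj, rounds=3):
    """1-WL: repeatedly hash each node's colour together with the multiset
    (here: sorted tuple) of its neighbours' colours; return the colour histogram."""
    colours = {v: 0 for v in adj}      # uniform initial colours
    for _ in range(rounds):
        colours = {v: hash((colours[v], tuple(sorted(colours[u] for u in adj[v]))))
                   for v in adj}
    return Counter(colours.values())

two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1],
                 3: [4, 5], 4: [3, 5], 5: [3, 4]}
six_cycle = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

# Same colour histogram, yet the graphs are not isomorphic: 1-WL is fooled.
assert wl_refine(two_triangles) == wl_refine(six_cycle)
```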
Which aggregator is best? Neither.
● In fact, we prove in the PNA paper (Corso, Cavalleri et al., NeurIPS’20) that there isn’t one!
● Early forms can be traced to the early 1990s, often involving DAG structures.
○ Labeling RAAM (Sperduti, NeurIPS’94)
○ Backpropagation through structure (Goller & Küchler, ICNN’96)
○ Adaptive structure processing (Sperduti & Starita, TNN’97; Frasconi et al., TNN’98)
● First proper treatment of generic graph structure processing happens in the 2000s:
○ The GNN framework (Gori et al., IJCNN’05; Scarselli et al., TNN’08)
○ The NN4G framework (Micheli, TNN’09)
● The GNN model of Gori, Scarselli et al. used primarily recurrent-style updates
○ Updated for modern best practices by gated GNNs (Li et al., ICLR’16)
VIII Computational Chemistry
“Chemistry disrupts ML, not the other way around”
● Important and concurrent GNN development line came from computational chemistry
○ Very relevant to the area, as molecules are naturally modelled as graphs
● “Molecular Graph Networks” (Merkwirth and Lengauer, CIM’05) already proposed many of the
elements commonly found in modern MPNNs
● Lastly, recall (Stokes et al., Cell’20): chemistry is, to this day, a leading outlet for GNNs!
Thank you!
Questions?
petarv@google.com | https://petar-v.com
With many thanks to Will Hamilton, Joan Bruna, Michael Bronstein and Taco Cohen