[2021-01-17] [CS224W] Graph Representation Learning

2 minute read

Network embedding

Task: map each node in a network into a low-dimensional embedding space.

Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

  1. Define an encoder (i.e., a mapping from nodes to embeddings): \(ENC(v) = z_v\)
  2. Define a node similarity function (i.e., a measure of similarity in the original network)
  3. Optimize the parameters of the encoder so that: \(similarity(u,v) \approx z_v^T z_u\)
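
These notes implicitly use the simplest, "shallow" encoder: an embedding lookup, where each node owns one row of a learnable matrix $Z$. A minimal numpy sketch (all names are mine):

```python
import numpy as np

# shallow encoding: ENC(v) = z_v is simply row v of a learnable |V| x d matrix Z
num_nodes, dim = 5, 3
Z = np.random.default_rng(0).normal(size=(num_nodes, dim))

def enc(v):
    """Embedding lookup; the parameters to optimize are the entries of Z itself."""
    return Z[v]

# decoder: similarity(u, v) in the embedding space is the dot product z_u . z_v
print(enc(0) @ enc(1))
```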

Random-walk Embeddings

  1. Estimate the probability of visiting node $v$ on a random walk starting from node $u$ using some random walk strategy $R$: $P_R(v \mid u)$
  2. Optimize embeddings to encode these random walk statistics: $similarity(u,v) = \cos(\theta) \propto P_R(v \mid u)$
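
A Monte-Carlo sketch of estimating $P_R(v \mid u)$, taking $R$ to be plain unbiased fixed-length walks and normalizing visit counts (one reasonable choice among several; all names are illustrative):

```python
import random
from collections import Counter

def estimate_visit_probs(adj, u, walk_len=5, num_walks=1000, rng=random):
    """Estimate P_R(v | u) as the normalized visit count of v on walks from u."""
    visits = Counter()
    for _ in range(num_walks):
        curr = u
        for _ in range(walk_len):
            curr = rng.choice(list(adj[curr]))  # uniform next step = unbiased strategy R
            visits[curr] += 1
    total = sum(visits.values())
    return {v: count / total for v, count in visits.items()}

# toy graph as an adjacency dict
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(estimate_visit_probs(adj, u=0))
```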

Unsupervised Feature Learning Idea: learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space. Given a node $u$, how do we define nearby nodes? $N_R(u)$: the neighbourhood of $u$ obtained by some strategy $R$.

Log-likelihood objective: \(\max_z \sum_{u \in V} \log P(N_R(u) \mid z_u)\) where $N_R(u)$ is the neighbourhood of node $u$ under strategy $R$.
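
One step the notes skip: if we assume the nodes in $N_R(u)$ are observed independently given $z_u$ (the standard skip-gram assumption), the likelihood factorizes over node pairs:

\(\log P(N_R(u) \mid z_u) = \sum_{v \in N_R(u)} \log P(v \mid z_u)\)

Maximizing this over all $u \in V$ is equivalent to minimizing the loss $L$ defined further below.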

For random walk optimization:

  1. Run short fixed-length random walks starting from each node on the graph using some strategy $R$
  2. For each node $u$, collect $N_R(u)$, the multiset of nodes visited on random walks starting from $u$ (a multiset because a walk can visit the same node more than once)
  3. Optimize embeddings so that, given node $u$, we predict its neighbours $N_R(u)$: \(\max_z \sum_{u \in V} \log P(N_R(u) \mid z_u)\)
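
A minimal sketch of steps 1–2, again taking $R$ to be plain unbiased fixed-length walks (the function and graph are illustrative, not the course code):

```python
import random
from collections import defaultdict

def collect_neighborhoods(adj, walk_len=5, num_walks=10, rng=random):
    """N_R(u): multiset (a list, repeats kept) of nodes visited on walks from u."""
    N = defaultdict(list)
    for u in adj:
        for _ in range(num_walks):
            curr = u
            for _ in range(walk_len):
                curr = rng.choice(list(adj[curr]))  # uniform next step = unbiased strategy R
                N[u].append(curr)
    return N

# toy graph as an adjacency dict
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(collect_neighborhoods(adj)[0])
```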

Equivalently, minimize the loss:

\(L = \sum_{u \in V} \sum_{v \in N_R(u)} -\log P(v \mid z_u)\)

Parameterize $P(v \mid z_u)$ using the softmax:

\(P(v \mid z_u) = \frac{\exp(z_u^T z_v)}{\sum_{n \in V} \exp(z_u^T z_n)}\)

Why softmax? Intuition: $\sum_i \exp(x_i) \approx \max_i \exp(x_i)$
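
As a sanity check, a numpy sketch of this softmax parameterization (the matrix `Z` holding one embedding per node is my own naming):

```python
import numpy as np

def softmax_prob(Z, u, v):
    """P(v | z_u) = exp(z_u.z_v) / sum_n exp(z_u.z_n), full softmax over V."""
    scores = Z @ Z[u]            # dot product of z_u with every node embedding
    scores -= scores.max()       # stabilize the exponentials
    expd = np.exp(scores)
    return expd[v] / expd.sum()  # denominator sums over ALL nodes in V

# toy example: 5 nodes, 3-dimensional embeddings
Z = np.random.default_rng(0).normal(size=(5, 3))
print(softmax_prob(Z, u=0, v=2))
```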

But this is computationally expensive: the normalizing sum in the denominator runs over all nodes in $V$, so every training pair costs $O(|V|)$.

Solution: Negative Sampling

\(\log\left(\frac{\exp(z_u^T z_v)}{\sum_{n \in V}\exp(z_u^T z_n)}\right) \approx \log(\sigma(z_u^T z_v)) - \sum_{i=1}^k \log(\sigma(z_u^T z_{n_i})), \quad n_i \sim P_V\)

where $\sigma(\cdot)$ is the sigmoid function and $P_V$ is a random distribution over all nodes (in practice negatives are often sampled with probability proportional to degree).
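
A sketch of this negative-sampling objective in numpy, with a degree-proportional negative sampler (the sampler and all names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(Z, u, v, negatives):
    """log(sigma(z_u.z_v)) - sum_i log(sigma(z_u.z_{n_i})), as in the formula above."""
    pos = np.log(sigmoid(Z[u] @ Z[v]))
    neg = sum(np.log(sigmoid(Z[u] @ Z[n])) for n in negatives)
    return pos - neg

# toy example: k = 3 negatives drawn proportionally to node degree
rng = np.random.default_rng(1)
Z = rng.normal(size=(5, 3))
degrees = np.array([3.0, 1.0, 2.0, 2.0, 4.0])
negatives = rng.choice(5, size=3, p=degrees / degrees.sum())
print(neg_sampling_objective(Z, u=0, v=2, negatives=negatives))
```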

node2vec: Biased Walks

Idea: use flexible, biased random walks that can trade off between local and global views of the network.

  • BFS: micro-view of the neighbourhood
  • DFS: macro-view of the neighbourhood

Two parameters (a sketch of one biased step follows the list):

  • Return parameter $p$: likelihood of returning back to the previous node
  • In-out parameter $q$: moving outwards (DFS) vs. inwards (BFS); intuitively, the ratio of BFS to DFS behaviour
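
A minimal sketch of one biased step, assuming the standard node2vec transition weights ($1/p$ for returning to the previous node, $1$ for a neighbour at distance 1 from it, $1/q$ for moving farther away); the adjacency-dict graph and names are my own:

```python
import random

def biased_step(adj, prev, curr, p, q, rng=random):
    """Pick the next node of a node2vec walk, given the previous and current nodes."""
    neighbors = list(adj[curr])
    weights = []
    for x in neighbors:
        if x == prev:            # distance 0 from prev: return (weight 1/p)
            weights.append(1.0 / p)
        elif x in adj[prev]:     # distance 1 from prev: stay local, BFS-like (weight 1)
            weights.append(1.0)
        else:                    # distance 2 from prev: move outwards, DFS-like (weight 1/q)
            weights.append(1.0 / q)
    return rng.choices(neighbors, weights=weights, k=1)[0]

# toy graph as an adjacency dict (sets give O(1) membership tests)
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(biased_step(adj, prev=0, curr=1, p=1.0, q=0.5))
```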

Algorithm:

  1. Compute the random walk probabilities
  2. Simulate $r$ random walks of length $l$ starting from each node $u$
  3. Optimize the node2vec objective using Stochastic Gradient Descent

Linear-time complexity; all three steps are individually parallelizable.
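
As a usage sketch of steps 2–3, the simulated walks can be fed to an off-the-shelf skip-gram trainer with negative sampling, e.g. gensim's Word2Vec (an external dependency; parameter names follow gensim 4.x). This reuses `biased_step` and the toy `adj` from the sketch above:

```python
import random
from gensim.models import Word2Vec

def simulate_walks(adj, r, l, p, q):
    """Step 2: r biased walks of length l from each node, via biased_step above."""
    walks = []
    for start in adj:
        for _ in range(r):
            walk = [start, random.choice(list(adj[start]))]  # first step: uniform bootstrap
            while len(walk) < l:
                walk.append(biased_step(adj, walk[-2], walk[-1], p, q))
            walks.append([str(n) for n in walk])             # gensim expects string tokens
    return walks

# step 3: skip-gram with negative sampling over the walks
walks = simulate_walks(adj, r=10, l=8, p=1.0, q=0.5)
model = Word2Vec(walks, vector_size=16, window=3, sg=1, negative=5, min_count=0)
print(model.wv["0"])  # learned embedding z_0
```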

Translating Embeddings for Modeling Multi-relational Data

Knowledge graph completion framed as link prediction: given a head entity and a relation, predict the missing tail entity.
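
TransE, the model from this paper, represents each relation as a translation in embedding space, so a true triple $(h, l, t)$ should satisfy $h + l \approx t$. A minimal scoring sketch using L2 distance (names mine):

```python
import numpy as np

def transe_score(E, R, h, l, t):
    """TransE plausibility: -|| e_h + r_l - e_t ||; higher means more plausible."""
    return -np.linalg.norm(E[h] + R[l] - E[t])

# toy example: 4 entities, 2 relations, 8-dimensional embeddings
rng = np.random.default_rng(2)
E = rng.normal(size=(4, 8))  # entity embeddings
R = rng.normal(size=(2, 8))  # relation embeddings

# link prediction: rank all candidate tails for the query (h=0, l=1, ?)
scores = [transe_score(E, R, 0, 1, t) for t in range(4)]
print(int(np.argmax(scores)))  # index of the most plausible tail entity
```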