[2021-01-17] [CS224W] Graph Representation Learning
Network embedding
Task: map each node in a network into a low-dimensional space.
Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.
- Define an encoder (i.e., a mapping from nodes to embeddings): \(ENC(v) = z_v\)
- Define a node similarity function (i.e., a measure of similarity in the original network)
- Optimize the parameters of the encoder so that: \(similarity(u,v) \approx z_v^T z_u\)
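As a concrete illustration, the simplest ("shallow") encoder is just an embedding-lookup table. Below is a minimal sketch in NumPy; the names `Z`, `num_nodes`, and `dim` are my own, not from the lecture:

```python
import numpy as np

# Shallow encoding: ENC(v) is a row lookup in an embedding matrix Z.
num_nodes, dim = 100, 16
rng = np.random.default_rng(0)
Z = rng.normal(size=(num_nodes, dim))  # one d-dimensional vector per node

def enc(v: int) -> np.ndarray:
    """ENC(v) = z_v: look up the embedding of node v."""
    return Z[v]

def similarity(u: int, v: int) -> float:
    """Embedding-space similarity: the dot product z_v^T z_u."""
    return float(enc(v) @ enc(u))
```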
Random-walk Embeddings
Estimate the probability of visiting node \(v\) on a random walk starting from node \(u\) using some random walk strategy \(R\): \(P_R(v \mid u)\). Optimize embeddings to encode these random-walk statistics: \(similarity(u,v) = \cos(\theta) \propto P_R(v \mid u)\)
Unsupervised Feature Learning
Idea: learn node embeddings such that nearby nodes are close together in the network. Given a node \(u\), how do we define nearby nodes? $N_R(u)$: the neighbourhood of $u$ obtained by some strategy $R$.
Log-likelihood objective: \(\max_z \sum_{u \in V} \log P(N_R(u) \mid z_u)\), where $N_R(u)$ is the neighborhood of node $u$ obtained by strategy $R$.
For random walk optimization:
- Run short fixed-length random walks starting from each node on the graph using some strategy R
- For each node $u$, collect $N_R(u)$, the multiset of nodes visited on random walks starting from $u$ (a multiset because nodes can be visited multiple times; see the sketch after this list)
- Optimize embeddings according to: given node $u$, predict its neighbors $N_R(u)$: \(\max_z \sum_{u \in V} \log P(N_R(u) \mid z_u)\)
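A minimal sketch of the walk-collection steps above, assuming the graph is given as an adjacency-list dict; the names `adj`, `num_walks`, and `length` are illustrative:

```python
import random

def random_walk(adj, start, length):
    """One short fixed-length walk; strategy R here is the simplest
    choice, a uniform (unbiased) walk. adj: node -> list of neighbours."""
    walk = [start]
    for _ in range(length - 1):
        neighbours = adj[walk[-1]]
        if not neighbours:
            break
        walk.append(random.choice(neighbours))
    return walk

def collect_neighbourhoods(adj, num_walks, length):
    """N_R(u): the multiset of nodes visited on walks starting from u.
    Repeats are kept on purpose (hence multiset)."""
    return {u: [v for _ in range(num_walks)
                  for v in random_walk(adj, u, length)[1:]]
            for u in adj}
```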
Equivalently, minimize the negative log-likelihood: \(L = \sum_{u \in V} \sum_{v \in N_R(u)} -\log P(v \mid z_u)\). Parameterize $P(v \mid z_u)$ using the softmax: \(P(v \mid z_u) = \frac{\exp(z_u^T z_v)}{\sum_{n \in V} \exp(z_u^T z_n)}\). Why softmax? Intuition: \(\sum_i \exp(x_i) \approx \max_i \exp(x_i)\)
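A direct (unoptimized) sketch of this loss, reusing the embedding matrix `Z` and the neighbourhoods collected above:

```python
import numpy as np

def softmax_prob(Z, u, v):
    """P(v | z_u) = exp(z_u^T z_v) / sum_{n in V} exp(z_u^T z_n)."""
    scores = Z @ Z[u]        # z_u^T z_n for every node n in V
    scores -= scores.max()   # stabilise the exponentials
    exp_scores = np.exp(scores)
    return exp_scores[v] / exp_scores.sum()

def walk_loss(Z, neighbourhoods):
    """L = sum_u sum_{v in N_R(u)} -log P(v | z_u)."""
    return sum(-np.log(softmax_prob(Z, u, v))
               for u, nbrs in neighbourhoods.items()
               for v in nbrs)
```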
But evaluating this softmax is computationally expensive: the normalization in the denominator sums over all nodes, so the full objective costs \(O(|V|^2)\).
Solution: Negative Sampling. Approximate the normalized softmax using a few negative samples:
\(\log\left(\frac{\exp(z_u^T z_v)}{\sum_{n \in V} \exp(z_u^T z_n)}\right) \approx \log(\sigma(z_u^T z_v)) - \sum_{i=1}^k \log(\sigma(z_u^T z_{n_i})), \quad n_i \sim P_V\)
where $\sigma(\cdot)$ is the sigmoid function and the $k$ negative nodes $n_i$ are sampled from a distribution $P_V$ over all nodes (in practice, with probability proportional to degree).
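A sketch of the per-pair surrogate from the formula above; `neg_nodes` stands in for the $k$ samples $n_i \sim P_V$ (how they are drawn is left to the caller):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sample_term(Z, u, v, neg_nodes):
    """Surrogate for log P(v | z_u) from the approximation above:
    log sigma(z_u^T z_v) - sum_i log sigma(z_u^T z_{n_i})."""
    pos = np.log(sigmoid(Z[u] @ Z[v]))
    neg = sum(np.log(sigmoid(Z[u] @ Z[n])) for n in neg_nodes)
    return pos - neg  # maximise this; it costs O(k) instead of O(|V|)
```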
node2vec: Biased Walks
Idea: use flexible, biased random walks that can trade off between local and global views of the network.
- BFS: micro-view of the neighbourhood
- DFS: macro-view of the neighbourhood
Two parameters:
- Return parameter $p$: controls the likelihood of returning to the previous node
- In-out parameter $q$: controls moving outwards (DFS-like) vs. inwards (BFS-like)
Algorithm:
1) Compute random walk probabilities
2) Simulate $r$ random walks of length $l$ starting from each node $u$
3) Optimize the node2vec objective using Stochastic Gradient Descent
Linear-time complexity; all 3 steps are individually parallelizable.
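A sketch of step 1, the biased transition weights: a walk that just moved from `prev` to `curr` weights each candidate next node by its distance from `prev`. This mirrors the node2vec definition; the helper name is mine:

```python
def biased_weights(adj, prev, curr, p, q):
    """Unnormalised node2vec transition weights out of curr, given
    that the walk arrived from prev. adj: node -> list of neighbours."""
    prev_neighbours = set(adj[prev])
    weights = {}
    for nxt in adj[curr]:
        if nxt == prev:                  # distance 0 from prev: return
            weights[nxt] = 1.0 / p
        elif nxt in prev_neighbours:     # distance 1: BFS-like step
            weights[nxt] = 1.0
        else:                            # distance 2: DFS-like step
            weights[nxt] = 1.0 / q
    return weights  # normalise these to get transition probabilities
```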
Translating Embeddings for Modeling Multi-relational Data
TransE models a fact as a translation in embedding space: for a true triple (head $h$, relation $l$, tail $t$), the embeddings should satisfy \(h + l \approx t\). This supports knowledge graph completion, i.e., link prediction: scoring and ranking candidate triples to predict missing links.
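A minimal sketch of TransE's scoring function, assuming entity and relation embeddings are NumPy vectors:

```python
import numpy as np

def transe_score(h, r, t):
    """TransE plausibility of a triple (h, r, t): a true fact should
    satisfy h + r ≈ t, so score by negative translation distance."""
    return -np.linalg.norm(h + r - t)

# Link prediction (knowledge graph completion): for a query (h, r, ?),
# score every candidate entity embedding t and rank them; the highest
# score is the predicted missing tail.
```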