[2021-01-17] [CS224W] Graph Representation Learning

2 minute read

Network embedding

Task: map each node in a network into a low-dimensional embedding space.

Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

  1. Define an encoder (i.e., a mapping from nodes to embeddings): \(ENC(v) = z_v\)
  2. Define a node similarity function (i.e., a measure of similarity in the original network)
  3. Optimize the parameters of the encoder so that: \(similarity(u,v) \approx z_v^T z_u\)
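
These notes implicitly use the simplest, "shallow" encoder: an embedding lookup, where each node owns one row of a learnable matrix $Z$. A minimal numpy sketch (all names are mine):

```python
import numpy as np

# shallow encoding: ENC(v) = z_v is simply row v of a learnable |V| x d matrix Z
num_nodes, dim = 5, 3
Z = np.random.default_rng(0).normal(size=(num_nodes, dim))

def enc(v):
    """Embedding lookup; the parameters to optimize are the entries of Z itself."""
    return Z[v]

# decoder: similarity(u, v) in the embedding space is the dot product z_u . z_v
print(enc(0) @ enc(1))
```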

Random-walk Embeddings

  1. Estimate the probability of visiting node $v$ on a random walk starting from node $u$ using some random walk strategy $R$: $P_R(v \mid u)$
  2. Optimize embeddings to encode these random walk statistics: $similarity(u,v) = \cos(\theta) \propto P_R(v \mid u)$
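
A Monte-Carlo sketch of estimating $P_R(v \mid u)$, taking $R$ to be plain unbiased fixed-length walks and normalizing visit counts (one reasonable choice among several; all names are illustrative):

```python
import random
from collections import Counter

def estimate_visit_probs(adj, u, walk_len=5, num_walks=1000, rng=random):
    """Estimate P_R(v | u) as the normalized visit count of v on walks from u."""
    visits = Counter()
    for _ in range(num_walks):
        curr = u
        for _ in range(walk_len):
            curr = rng.choice(list(adj[curr]))  # uniform next step = unbiased strategy R
            visits[curr] += 1
    total = sum(visits.values())
    return {v: count / total for v, count in visits.items()}

# toy graph as an adjacency dict
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(estimate_visit_probs(adj, u=0))
```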

Unsupervised Feature Learning Idea: learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space. Given a node $u$, how do we define nearby nodes? $N_R(u)$: the neighbourhood of $u$ obtained by some strategy $R$.

Log-likelihood objective: \(\max_z \sum_{u \in V} \log P(N_R(u) \mid z_u)\) where $N_R(u)$ is the neighbourhood of node $u$ under strategy $R$.
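
One step the notes skip: if we assume the nodes in $N_R(u)$ are observed independently given $z_u$ (the standard skip-gram assumption), the likelihood factorizes over node pairs:

\(\log P(N_R(u) \mid z_u) = \sum_{v \in N_R(u)} \log P(v \mid z_u)\)

Maximizing this over all $u \in V$ is equivalent to minimizing the loss $L$ defined further below.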

For random walk optimization:

  1. Run short fixed-length random walks starting from each node on the graph using some strategy $R$
  2. For each node $u$, collect $N_R(u)$, the multiset of nodes visited on random walks starting from $u$ (a multiset because a walk can visit the same node more than once)
  3. Optimize embeddings so that, given node $u$, we predict its neighbours $N_R(u)$: \(\max_z \sum_{u \in V} \log P(N_R(u) \mid z_u)\)
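
A minimal sketch of steps 1–2, again taking $R$ to be plain unbiased fixed-length walks (the function and graph are illustrative, not the course code):

```python
import random
from collections import defaultdict

def collect_neighborhoods(adj, walk_len=5, num_walks=10, rng=random):
    """N_R(u): multiset (a list, repeats kept) of nodes visited on walks from u."""
    N = defaultdict(list)
    for u in adj:
        for _ in range(num_walks):
            curr = u
            for _ in range(walk_len):
                curr = rng.choice(list(adj[curr]))  # uniform next step = unbiased strategy R
                N[u].append(curr)
    return N

# toy graph as an adjacency dict
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(collect_neighborhoods(adj)[0])
```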

Equivalently, minimize the loss:

\(L = \sum_{u \in V} \sum_{v \in N_R(u)} -\log P(v \mid z_u)\)

Parameterize $P(v \mid z_u)$ using the softmax:

\(P(v \mid z_u) = \frac{\exp(z_u^T z_v)}{\sum_{n \in V} \exp(z_u^T z_n)}\)

Why softmax? Intuition: $\sum_i \exp(x_i) \approx \max_i \exp(x_i)$
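
As a sanity check, a numpy sketch of this softmax parameterization (the matrix `Z` holding one embedding per node is my own naming):

```python
import numpy as np

def softmax_prob(Z, u, v):
    """P(v | z_u) = exp(z_u.z_v) / sum_n exp(z_u.z_n), full softmax over V."""
    scores = Z @ Z[u]            # dot product of z_u with every node embedding
    scores -= scores.max()       # stabilize the exponentials
    expd = np.exp(scores)
    return expd[v] / expd.sum()  # denominator sums over ALL nodes in V

# toy example: 5 nodes, 3-dimensional embeddings
Z = np.random.default_rng(0).normal(size=(5, 3))
print(softmax_prob(Z, u=0, v=2))
```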

But this is computationally expensive: the normalizing sum in the denominator runs over all nodes in $V$, so every training pair costs $O(|V|)$.

Solution: Negative Sampling

\(\log\left(\frac{\exp(z_u^T z_v)}{\sum_{n \in V}\exp(z_u^T z_n)}\right) \approx \log(\sigma(z_u^T z_v)) - \sum_{i=1}^k \log(\sigma(z_u^T z_{n_i})), \quad n_i \sim P_V\)

where $\sigma(\cdot)$ is the sigmoid function and $P_V$ is a random distribution over all nodes (in practice negatives are often sampled with probability proportional to degree).
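
A sketch of this negative-sampling objective in numpy, with a degree-proportional negative sampler (the sampler and all names are illustrative):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def neg_sampling_objective(Z, u, v, negatives):
    """log(sigma(z_u.z_v)) - sum_i log(sigma(z_u.z_{n_i})), as in the formula above."""
    pos = np.log(sigmoid(Z[u] @ Z[v]))
    neg = sum(np.log(sigmoid(Z[u] @ Z[n])) for n in negatives)
    return pos - neg

# toy example: k = 3 negatives drawn proportionally to node degree
rng = np.random.default_rng(1)
Z = rng.normal(size=(5, 3))
degrees = np.array([3.0, 1.0, 2.0, 2.0, 4.0])
negatives = rng.choice(5, size=3, p=degrees / degrees.sum())
print(neg_sampling_objective(Z, u=0, v=2, negatives=negatives))
```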

node2vec: Biased Walks

Idea: use flexible, biased random walks that can trade off between local and global views of the network.

  • BFS: micro-view of the neighbourhood
  • DFS: macro-view of the neighbourhood

Two parameters (a sketch of one biased step follows the list):

  • Return parameter $p$: likelihood of returning back to the previous node
  • In-out parameter $q$: moving outwards (DFS) vs. inwards (BFS); intuitively, the ratio of BFS to DFS behaviour
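
A minimal sketch of one biased step, assuming the standard node2vec transition weights ($1/p$ for returning to the previous node, $1$ for a neighbour at distance 1 from it, $1/q$ for moving farther away); the adjacency-dict graph and names are my own:

```python
import random

def biased_step(adj, prev, curr, p, q, rng=random):
    """Pick the next node of a node2vec walk, given the previous and current nodes."""
    neighbors = list(adj[curr])
    weights = []
    for x in neighbors:
        if x == prev:            # distance 0 from prev: return (weight 1/p)
            weights.append(1.0 / p)
        elif x in adj[prev]:     # distance 1 from prev: stay local, BFS-like (weight 1)
            weights.append(1.0)
        else:                    # distance 2 from prev: move outwards, DFS-like (weight 1/q)
            weights.append(1.0 / q)
    return rng.choices(neighbors, weights=weights, k=1)[0]

# toy graph as an adjacency dict (sets give O(1) membership tests)
adj = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
print(biased_step(adj, prev=0, curr=1, p=1.0, q=0.5))
```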

Algorithm:

  1. Compute the random walk probabilities
  2. Simulate $r$ random walks of length $l$ starting from each node $u$
  3. Optimize the node2vec objective using Stochastic Gradient Descent

Linear-time complexity; all three steps are individually parallelizable.
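
As a usage sketch of steps 2–3, the simulated walks can be fed to an off-the-shelf skip-gram trainer with negative sampling, e.g. gensim's Word2Vec (an external dependency; parameter names follow gensim 4.x). This reuses `biased_step` and the toy `adj` from the sketch above:

```python
import random
from gensim.models import Word2Vec

def simulate_walks(adj, r, l, p, q):
    """Step 2: r biased walks of length l from each node, via biased_step above."""
    walks = []
    for start in adj:
        for _ in range(r):
            walk = [start, random.choice(list(adj[start]))]  # first step: uniform bootstrap
            while len(walk) < l:
                walk.append(biased_step(adj, walk[-2], walk[-1], p, q))
            walks.append([str(n) for n in walk])             # gensim expects string tokens
    return walks

# step 3: skip-gram with negative sampling over the walks
walks = simulate_walks(adj, r=10, l=8, p=1.0, q=0.5)
model = Word2Vec(walks, vector_size=16, window=3, sg=1, negative=5, min_count=0)
print(model.wv["0"])  # learned embedding z_0
```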

Translating Embeddings for Modeling Multi-relational Data

Knowledge graph completion framed as link prediction: given a head entity and a relation, predict the missing tail entity.
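
TransE, the model from this paper, represents each relation as a translation in embedding space, so a true triple $(h, l, t)$ should satisfy $h + l \approx t$. A minimal scoring sketch using L2 distance (names mine):

```python
import numpy as np

def transe_score(E, R, h, l, t):
    """TransE plausibility: -|| e_h + r_l - e_t ||; higher means more plausible."""
    return -np.linalg.norm(E[h] + R[l] - E[t])

# toy example: 4 entities, 2 relations, 8-dimensional embeddings
rng = np.random.default_rng(2)
E = rng.normal(size=(4, 8))  # entity embeddings
R = rng.normal(size=(2, 8))  # relation embeddings

# link prediction: rank all candidate tails for the query (h=0, l=1, ?)
scores = [transe_score(E, R, 0, 1, t) for t in range(4)]
print(int(np.argmax(scores)))  # index of the most plausible tail entity
```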