2 minute read

Published:

# Network embedding

Task: We map each node in a network into a low-dimensional space Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

1. Define an encoder (i.e., a mapping from nodes to embeddings) $ENV(v) = z_v$
2. Define a node similarity function (i.e., a measure of similarity in the original network)
3. Optimize the parameters of the encoder so that: $similarity(u,v) = z_v^Tz_u$

# Random-walk Embeddings

1.  Estimate probability of visiting node v on a random walk starting from node u using some random walk strategy R: $P_R(u v)$
2.  Optimize embeddings to encode these random walk statistics: $similarity = cos(\theta) \propto P_R(u v)$

Unsupervised Feature Learning Idea: Learn node embedding such that nearby nodes are close together in the network Given a node u, how do we define nearby nodes? $N_R(U)$ neighbourhood of u obtained by some strategy R

Log-lokelihood objective: $max_z \sum_{u \in V} log P(N_R(u) | z_u)$ where $N_R(u)$ is the neighborhood of node $u$ by strategy $R$.

For random walk optimization:

1. Run short fixed-length random walks starting from each node on the graph using some strategy R
2. For each node u collect $N_R(U)$, the multiset* of nodes visited on random walks starting from u
3. Optimize embeddings according to: Given node u, predict its neighbors $N_R(U)$ $max_z \sum_{u \in V} log P(N_R(u) | z_u)$

$L = \sum_{u \in V} \sum_{v \in N_R(u)} -log P(v | z_u)$ Parameterize $P(v | z_u)$ using softmax: $P(v | z_u) = \frac{exp(z_u^Tz_v)}{\sum_{n \in V} exp(z_u^Tz_n)}$ Why softmax? Intuition: $\sum_i exp(x_i) \approx \max_iexp(x_i)$

But it is computationally expensive.

Solution: Negative Sampling $log(\frac{exp(z_u^Tz_v)}{\sum_{n \in V}exp(z_u^Tz_n)}) \\ \approx log(\sigma(z_u^Tz_v)) - \sum_{i=1}^k log(\sigma(z_u^Tz_{n_i})), ni \sim P_v$ where $\sigma()$ is the sigmoid function

# node2vec: Biased Walks

Idea: use flexible, biased random walks that can trade off between local and global views of the network BFS: micro-view of neighbourhood DFS: macro-view of neighbourhood Two parameters:

• Return parameter p: Return back to the previous node
• In-out parameter q: Moving outwards (DFS) vs. inwards (BFS)

Algorithm: 1) Compute random walk probabilities 2) Simulate $r$ random walks of length $l$ starting from each node $u$ 3) Optimize the node2vec objective using Stochastic Gradient Descent Linear-time complexity all 4 steps are individually parallelizable

# Translating Embeddings for Modeling Multi-relational Data

knowledge graph completion - link prediction

Tags: