Yunjie Pan’s Personal Website
Yunjie Pan (panyj@umich.edu)

[2021-10-12] ISCA 2021 Section 3A: Machine Learning 1

<p>This reading blog is about three papers in Section 3A: Machine Learning 1 of ISCA 2021.</p>
<h1 id="rapid-ai-accelerator-for-ultra-low-precision-training-and-inference">RaPiD: AI Accelerator for Ultra-Low Precision Training and Inference</h1>
<p>This accelerator supports mixed precisions for both training and inference: 16- and 8-bit floating point and 4- and 2-bit fixed point. It improves both performance (TOPS) and energy efficiency (TOPS/W) at ultra-low precision. In my opinion, this work contributes more on the engineering (architecture) side.</p>
<h2 id="mpe-array-mixed-precision-pe-array">MPE Array: Mixed-Precision PE Array</h2>
<p><img src="../../images/posts/RaPiD_MPE.png" alt="RaPiD MPE block design" />
Here are the challenges and solutions for supporting scaled precisions.
1) Challenge: support both INT and FP pipelines. Solution: separation of the integer and floating-point pipelines.
2) Challenge: FP8 training uses two formats, one for the forward pass (1,4,3) and one for the backward pass (1,5,2). Solution: on-the-fly conversion; both are converted to a common 9-bit (1,5,3) format.
3) Challenge: performance scaling from FP16 to FP8. Solution: sub-SIMD partitioning.
4) Challenge: circuit-level optimizations for INT4/INT2 inference. Solution: doubled INT4/INT2 engines; operand reuse via sub-SIMD</p>
<h2 id="sparsity-aware-zero-gating-and-frequency-throttling">Sparsity-aware Zero-gating and Frequency Throttling</h2>
<p>1) Zero-gating for zero operands
2) Sparsity-aware frequency throttling. It uses clock throttling rather than DVFS (though I am not sure exactly how clock throttling works); the throttling logic is not on the critical path.</p>
<h2 id="data-communication-among-cores-and-memory">Data Communication Among Cores and Memory</h2>
<p>A bi-directional ring interconnect communicates data between cores and memory. Data-fetch latency can be hidden by double-buffering data in L1, overlapped with computation.
Unique identification tags are assigned; multi-cast communication is supported.
Support for “request aggregation” is also provided.</p>

[2021-09-24] Ten Lessons From Three Generations Shaped Google’s TPUv4i: Industrial Product

<p>I was quite busy during this year’s ISCA, so I did not have the chance to read the papers or watch the videos carefully. I will try to read the ISCA 2021 papers that I am interested in, hopefully before MICRO 2021 starts.</p>
<p>This is an industry-track paper written by the Google TPU team. It reviews TPU generations and draws lessons from them from an industry perspective, which is quite interesting and gives me more industry insight.</p>
<h1 id="ten-lessons">Ten lessons</h1>
<ol>
<li>Logic, wires, SRAM, and DRAM improve unequally:
<ul>
<li>Wires and SRAM see smaller gains when scaling from 45nm to 7nm.</li>
<li>Logic improves much faster than wires and SRAM, so logic is relatively “free”; similarly, HBM DRAM is relatively cheap compared to SRAM.</li>
</ul>
</li>
<li>Leverage prior compiler optimizations
<ul>
<li>TPUs use the XLA (Accelerated Linear Algebra) compiler</li>
</ul>
</li>
<li>Design for performance per TCO rather than per CapEx
<ul>
<li>Capital Expense (CapEx) is the purchase price of an item</li>
<li>Operational Expense (OpEx) is the cost of operation</li>
<li>TCO = CapEx + 3 * OpEx</li>
</ul>
</li>
<li>Support backward ML compatibility</li>
<li>Inference DSAs need air cooling for global scale</li>
<li>Some inference apps need floating-point arithmetic
<ul>
<li>ML accelerators usually use quantized models for inference to save area and power, while using floating point for training</li>
<li>But some apps, like segmentation, don’t work well when quantized</li>
<li>For TPUv1, app developers said a 1% accuracy drop was acceptable, but overall DNN quality has improved since, so a 1% drop is now large.</li>
</ul>
</li>
<li>Production inference normally needs multi-tenancy
<ul>
<li>Sharing can lower cost and reduce latency</li>
<li>Support multiple batch sizes to balance throughput and latency</li>
<li>Fast switching between models is needed, so DSAs need local memory</li>
<li>I am curious about the security issues for sharing hardware for different apps. How to isolate them?</li>
</ul>
</li>
<li>DNNs grow ~1.5x/year in memory and compute</li>
<li>DNN workloads evolve with DNN breakthroughs</li>
<li>The inference SLO (Service Level Objective) limit is P99 latency, not batch size</li>
</ol>
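Lesson 3 can be made concrete with a toy perf/TCO comparison. The sketch below uses the paper's TCO = CapEx + 3&#183;OpEx approximation; the chip names and dollar figures are made up purely for illustration.

```python
# Toy illustration of Lesson 3: rank designs by perf/TCO, not perf/CapEx.
# All numbers are hypothetical.

def tco(capex, annual_opex, years=3):
    """Paper's approximation: TCO = CapEx + 3 * OpEx (3-year lifetime)."""
    return capex + years * annual_opex

# (name, perf in TOPS, CapEx in $, annual OpEx in $)
designs = [
    ("chip_a", 100, 10_000, 1_000),   # cheaper to buy, power-hungry
    ("chip_b", 100, 11_000, 500),     # pricier, but cheaper to operate
]

for name, perf, capex, opex in designs:
    print(name, perf / capex, perf / tco(capex, opex))
```

With these numbers chip_a wins on perf/CapEx, but chip_b wins on perf/TCO because its lower operating cost dominates over a 3-year lifetime; that is exactly the ranking flip the lesson warns about.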
<h1 id="tpuv4i">TPUv4i</h1>
<p>The block diagram is shown below.
<img src="../../images/posts/TPUv4i_block_diagram.png" alt="TPUv4i chip block diagram" /></p>
<ul>
<li>A single-core chip for inference (like TPUv1) and a dual-core chip for training (like TPUv3)</li>
<li>Be compiler compatible rather than binary compatible; XLA produces High Level Operations (HLO) that are machine-independent and Low-Level Operations (LLO) that are machine-dependent.</li>
<li>Increase on-chip SRAM storage with common memory (CMEM); they chose 128 MB.</li>
<li>DMA engines are distributed throughout the chip’s uncore to mitigate interconnect latency and wire scaling challenges; 4D tensor DMA supports arbitrary steps-per-stride and positive/negative stride distances in each dimension. (I do not quite understand the meaning of 4D here)</li>
<li>Supports bf16 and int8, so quantization is optional</li>
</ul>
<h1 id="performance">Performance</h1>
<p>TPUv4i has similar performance to TPUv3, but its perf/TCO is 2.3x better than TPUv3’s.</p>

[2021-04-20] DiAG: A Dataflow-Inspired Architecture for General-Purpose Processors

<p>In my view, it is basically a spatial dataflow architecture with more general compatibility.</p>
<h2 id="difference-from-dataflow-architecture">Difference from dataflow architecture</h2>
<ol>
<li>instructions are assigned in program order, but execute out-of-order</li>
<li>use register lanes instead of a register file to construct the dataflow graph and serve as a reorder buffer</li>
<li>PEs are chained rather than a 2D array</li>
</ol>
<p><img src="../../images/posts/DiAG_dataflow.png" alt="Dataflow in DiAG architecture" /></p>
<h2 id="pros-and-cons">Pros and Cons</h2>
<ul>
<li>
<p>Dynamic datapaths constructed by DiAG are reusable, thus loop iterations can execute at an efficiency close to accelerators</p>
</li>
<li>On the other hand, applications that are memory-centric or contain significant control divergence perform poorly since most cycles are wasted on stalls</li>
<li>area overhead</li>
</ul>
<h2 id="my-questions">My questions</h2>
<p>Why can a CGRA not serve as a main processor, but only as a co-processor?</p>

[2021-02-20] Analyzing and Mitigating Data Stalls in DNN Training

<p>Most DNN accelerator papers I read focus on DNN inference rather than training. From this paper, I learned that the bottlenecks for DNN training are I/O for fetching data and CPU-side preprocessing.</p>
<h1 id="background">Background</h1>
<p><img src="../../images/dnn_training_data_pipeline.png" alt="Data Pipeline in DNN taining" /></p>
<p>The figure above shows the data pipeline in DNN training.
(1) A minibatch of data items is fetched from storage.
(2) The data items are pre-processed; e.g., for image classification, data items are decompressed and then randomly cropped, resized, and flipped.
(3) The minibatch is then processed on the GPU to obtain the model’s prediction.
(4) A loss function determines how much the prediction deviates from the right answer.
(5) Model weights are updated using the computed gradients.</p>
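The five steps above can be sketched as a minimal training loop. This is my own toy sketch with numpy stand-ins (all names hypothetical) replacing the real storage, pre-processing, and GPU stages.

```python
import numpy as np

rng = np.random.default_rng(0)
weights = rng.normal(size=(8, 2))           # tiny "model"

def fetch_minibatch(batch_size=4):           # (1) fetch from storage
    return rng.normal(size=(batch_size, 8)), rng.integers(0, 2, size=batch_size)

def preprocess(x):                           # (2) decode / crop / flip, etc.
    return (x - x.mean(axis=1, keepdims=True)) / (x.std(axis=1, keepdims=True) + 1e-8)

for step in range(3):
    x, y = fetch_minibatch()
    x = preprocess(x)
    logits = x @ weights                     # (3) forward pass on the "GPU"
    p = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    loss = -np.log(p[np.arange(len(y)), y]).mean()   # (4) loss
    grad = x.T @ (p - np.eye(2)[y]) / len(y)         # (5) gradient + update
    weights -= 0.1 * grad
```

The paper's point is that steps (1) and (2) run on storage and CPU, so if they are slower than step (3), the GPU stalls.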
<h1 id="analyzing-data-stalls">Analyzing data stalls</h1>
<h2 id="technique">Technique</h2>
<p>Existing frameworks like PyTorch and TensorFlow are inaccurate and insufficient for profiling data stalls:
1) They cannot accurately split the time spent between data fetch (from disk or cache) and pre-processing operations.
2) Frameworks like PyTorch and libraries like DALI use several concurrent processes (or threads) to fetch and pre-process data; but GPU processes wait to synchronize weight updates at batch boundaries, so a data stall on one GPU may affect the compute time of other GPUs.</p>
<p>This paper develops a tool, DS-Analyzer, to overcome these limitations using a differential approach:
1) Measure the ingestion rate with no fetch or prep stalls.
2) Measure prep stalls using a subset of the given dataset that is entirely cached in memory.
3) Measure fetch stalls by clearing all caches and comparing against 2).</p>
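The differential idea can be sketched as timing the same epoch under three configurations and attributing the differences to stalls. The function names and sleep-based stand-ins below are hypothetical, not DS-Analyzer's actual API.

```python
import time

def timed(run_epoch):
    t0 = time.perf_counter()
    run_epoch()
    return time.perf_counter() - t0

def analyze(epoch_full, epoch_cached, epoch_cold):
    """epoch_full: no fetch/prep (pre-populated tensors);
    epoch_cached: dataset fully cached (prep but no fetch);
    epoch_cold: caches cleared (prep + fetch)."""
    t_compute = timed(epoch_full)
    t_cached = timed(epoch_cached)
    t_cold = timed(epoch_cold)
    prep_stall = max(t_cached - t_compute, 0.0)   # step 2 minus step 1
    fetch_stall = max(t_cold - t_cached, 0.0)     # step 3 minus step 2
    return {"compute": t_compute, "prep": prep_stall, "fetch": fetch_stall}

# Toy stand-ins whose sleeps model compute / +prep / +fetch costs.
report = analyze(lambda: time.sleep(0.01),
                 lambda: time.sleep(0.02),
                 lambda: time.sleep(0.04))
```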
<h2 id="results">Results</h2>
<ul>
<li>Pay a one-time download cost for the dataset and reap the benefits of local-SSD accesses thereafter; the cost of downloading the entire dataset in the first epoch is amortized.</li>
</ul>
<p>When datasets cannot be fully cached:</p>
<ul>
<li>Fetch stalls are common if the dataset is not fully cached in memory, which is obvious.</li>
<li>The OS page cache is inefficient for DNN training because it leads to thrashing.</li>
<li>Lack of coordination among caches leads to redundant I/O in distributed training.</li>
</ul>
<p>When datasets could fit in memory:</p>
<ul>
<li>DNNs need 3–24 CPU cores per GPU for pre-processing.</li>
<li>DALI is able to reduce, but not eliminate, prep stalls.</li>
<li>As compute gets faster (due to larger batch sizes or faster GPUs), data stalls squander the benefits of the faster compute.</li>
<li>Redundant pre-processing for concurrent jobs in hyper-parameter (HP) search results in high prep stalls</li>
</ul>
<h1 id="mitigate-data-stalls">Mitigate data stalls</h1>
<ul>
<li>MinIO cache (single-server training)
<ul>
<li>DNN data access pattern: repetitive across epochs and random within an epoch.</li>
<li>items, once cached, are never replaced in the DNN cache</li>
</ul>
</li>
<li>Partitioned MinIO cache (distributed-server training)
<ul>
<li>On a cache miss, data transfer over a commodity TCP stack is much faster than fetching the data item from local storage.</li>
<li>Whenever a local cache miss happens in the subsequent epoch at any server, the item is first looked up in the metadata; if present, it is fetched from the respective server over TCP, else from its local storage.</li>
</ul>
</li>
<li>Coordinated Prep (single-server training)
<ul>
<li>each job processes the entire dataset exactly once per epoch</li>
</ul>
</li>
</ul>

[2021-01-26] GraphACT: Accelerating GCN Training on CPU-FPGA Heterogeneous Platforms

<p>This paper presents a CPU-FPGA heterogeneous platform for GCN training. The CPU handles the communication-intensive operations and leaves the computation-intensive parts to the FPGA.</p>
<h1 id="background-and-motivation">Background and Motivation</h1>
<p>It is challenging to accelerate Graph Convolutional Networks because:
(1) substantial and irregular data communication to propagate information within the graph
(2) intensive computation to propagate information along the neural network layers
(3) Degree imbalance of graph nodes can significantly degrade the performance of feature propagation.</p>
<p>How GCN acceleration differs from existing graph analytics problems:
(1) traditional graph analytics often propagate scalars along the graph edges, while GCNs propagate long feature vectors
(2) traditional graph analytics often propagate information within the full graph, while GCNs propagate within minibatches.</p>
<h1 id="optimizations">Optimizations</h1>
<p>Training Algorithm Selection:</p>
<ul>
<li>minibatch training by sampling the training graph: the algorithm samples subgraphs instead of GCN layers</li>
</ul>
<p>Redundancy Reduction:</p>
<ul>
<li>perform pre-processing to compute the partial sum</li>
<li>common pairs of neighbors, of size 2 (could it be larger, or dynamic according to the graph topology?)</li>
</ul>
<h1 id="architecture-design">Architecture Design</h1>
<p><img src="../../images/GraphACT_arch.png" alt="GraphACT FPGA overview" />
CPU: the communication-intensive parts, including graph sampling</p>
<p>FPGA: the computation-intensive parts, including the forward and backward passes</p>
<p>How to improve training throughput:</p>
<ul>
<li>reduce the overhead of external memory access: set the minibatch size so that the subgraph fits in BRAM</li>
<li>increase the utilization of the on-chip resources
<ul>
<li>feature aggregation module: 1D accumulator array, parallelizing along the feature dimension</li>
<li>weight transformation module: 2D systolic array to compute the dense matrix product</li>
</ul>
</li>
</ul>
<h1 id="evaluation">Evaluation</h1>
<p>Compared with the CPU baseline, 12x to 15x speedup; compared with the GPU baseline, 1.1x to 1.5x faster convergence.
The authors claim higher accuracy compared with one previous work, but I wonder: if all the works used the same subgraph sampling algorithm, would the accuracy still differ?</p>
<h1 id="insights">Insights</h1>
<p>Although I feel this work focuses more on engineering than on novel architecture design, the challenges it identifies are insightful to me: memory access and load balance.
I also wonder whether this design is scalable. BRAM size seems to be the bottleneck; could we optimize the memory accesses further?</p>

[2021-01-18] [CS224W] Graph Neural Network

<h1 id="basics-of-graph-neural-network">Basics of Graph Neural Network</h1>
<p>Idea: Generate node embeddings based on local network neighborhoods
Neighborhood aggregation: Average information from neighbors and apply a neural network</p>
<p>\(h_v^0 = x_v \\
h_v^k = \sigma(W_k \sum_{u \in N(v)} \frac{h_u^{k-1}}{|N(v)|} + B_k h_v^{k-1}), \forall k \in \{1, \cdots, K\}\\
z_v = h_v^K\)
$W_k$ and $B_k$ are trainable parameters</p>
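A minimal numpy sketch of this update rule, assuming ReLU for $\sigma$ (my choice; the formula leaves the non-linearity unspecified):

```python
import numpy as np

def gnn_layer(H, adj, W, B):
    """One layer: h_v^k = relu(W_k * mean of neighbor h's + B_k * h_v^{k-1}).
    H: (n, d) embeddings h^{k-1}; adj: (n, n) 0/1 adjacency matrix."""
    deg = adj.sum(axis=1, keepdims=True)          # |N(v)|
    neigh_mean = (adj @ H) / np.maximum(deg, 1)   # sum_u h_u^{k-1} / |N(v)|
    return np.maximum(0.0, neigh_mean @ W.T + H @ B.T)   # sigma = ReLU

# Tiny example: a path graph on 3 nodes, 2-d features (h^0 = x_v).
A = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], float)
X = np.eye(3)[:, :2]
W = np.ones((2, 2)); B = np.eye(2)
H1 = gnn_layer(X, A, W, B)
```

Stacking `gnn_layer` K times and taking the final output gives $z_v = h_v^K$.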
<h1 id="graph-convolutional-networks-and-graphsage">Graph Convolutional Networks and GraphSAGE</h1>
\[h_v^k = \sigma([W_k \cdot AGG(\{h_u^{k-1}, \forall u \in N(v)\}), B_k h_v^{k-1}]), k \in \{1, \cdots, K\}\]
<p>AGG variants:
mean, pool, LSTM</p>
<p>Efficient Implementation:</p>
<ul>
<li>sparse matrix operations</li>
</ul>
<h1 id="graph-attention-networks">Graph Attention Networks</h1>
<p>Specify arbitrary importances to different neighbors of each node in the graph
Let $\alpha_{vu}$ be computed as a byproduct of an attention mechanism $a$
\(e_{vu} = a(W_kh_u^{k-1}, W_kh_v^{k-1}) \\
\alpha_{vu} = \frac{exp(e_{vu})}{\sum_{k \in N(v)} exp(e_{vk})} \\
h_v^k = \sigma(\sum_{u \in N(v)} \alpha_{vu}W_kh_u^{k-1})\)
where $e_{vu}$ indicates the importance of node u’s message to node v</p>
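A small numpy sketch of this attention update, assuming a dot product for the mechanism $a$ and ReLU for $\sigma$ (both my assumptions; the lecture leaves them flexible):

```python
import numpy as np

def gat_layer(H, adj, W):
    """One attention layer: alpha_vu = softmax over N(v) of e_vu."""
    Z = H @ W.T                                   # W_k h_u^{k-1}
    e = Z @ Z.T                                   # e_vu, with a = dot product
    e = np.where(adj > 0, e, -np.inf)             # attend only to neighbors
    alpha = np.exp(e - e.max(axis=1, keepdims=True))
    alpha /= alpha.sum(axis=1, keepdims=True)     # softmax over N(v)
    return np.maximum(0.0, alpha @ Z)             # sigma = ReLU

# Tiny example: node 0 linked to nodes 1 and 2.
A = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]], float)
H = np.arange(6, dtype=float).reshape(3, 2)
out = gat_layer(H, A, np.eye(2))
```

Masking non-neighbors with `-inf` before the softmax makes their $\alpha_{vu}$ exactly zero, so each row of `alpha` sums to one over $N(v)$ only.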
<p>Attention mechanism $a$</p>
<ul>
<li>e.g. use a simple single-layer neural network</li>
<li>parameters of $a$ are trained jointly</li>
</ul>

[2021-01-17] [CS224W] Graph Representation Learning

<h1 id="network-embedding">Network embedding</h1>
<p>Task: We map each node in a network into a low-dimensional space
Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.</p>
<ol>
<li>Define an encoder (i.e., a mapping from nodes to embeddings)
\(ENC(v) = z_v\)</li>
<li>Define a node similarity function (i.e., a measure of similarity in the original network)</li>
<li>Optimize the parameters of the encoder so that:
\(similarity(u,v) \approx z_v^T z_u\)</li>
</ol>
<h1 id="random-walk-embeddings">Random-walk Embeddings</h1>
<ol>
<li>Estimate the probability of visiting node $v$ on a random walk starting from node $u$ using some random walk strategy $R$: $P_R(v|u)$</li>
<li>Optimize embeddings to encode these random walk statistics: $similarity(u,v) = \cos(\theta) \propto P_R(v|u)$</li>
</ol>
<p>Unsupervised Feature Learning
Idea: learn node embeddings such that nearby nodes are close together in the network.
Given a node $u$, how do we define nearby nodes? $N_R(u)$: the neighbourhood of $u$ obtained by some strategy $R$</p>
<p>Log-likelihood objective:
\(\max_z \sum_{u \in V} \log P(N_R(u) | z_u)\)
where $N_R(u)$ is the neighborhood of node $u$ under strategy $R$.</p>
<p>For random walk optimization:</p>
<ol>
<li>Run short fixed-length random walks starting from each node on the graph using some strategy $R$</li>
<li>For each node $u$, collect $N_R(u)$, the multiset of nodes visited on random walks starting from $u$</li>
<li>Optimize embeddings according to: given node $u$, predict its neighbors $N_R(u)$
\(\max_z \sum_{u \in V} \log P(N_R(u) | z_u)\)</li>
</ol>
<p>\(L = \sum_{u \in V} \sum_{v \in N_R(u)} -\log P(v | z_u)\)
Parameterize $P(v | z_u)$ using the softmax:
\(P(v | z_u) = \frac{\exp(z_u^T z_v)}{\sum_{n \in V} \exp(z_u^T z_n)}\)
Why softmax? Intuition: $\sum_i \exp(x_i) \approx \max_i \exp(x_i)$</p>
<p>But it is computationally expensive.</p>
<p>Solution: Negative Sampling
\(\log(\frac{\exp(z_u^T z_v)}{\sum_{n \in V}\exp(z_u^T z_n)}) \\
\approx \log(\sigma(z_u^T z_v)) - \sum_{i=1}^k \log(\sigma(z_u^T z_{n_i})), n_i \sim P_V\)
where $\sigma(\cdot)$ is the sigmoid function</p>
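The contrast between the exact softmax and its negative-sampling estimate can be sketched in numpy. Uniform negative sampling and the embedding sizes below are my assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

Z = rng.normal(size=(100, 16)) * 0.1     # node embeddings z_u

def full_log_softmax(u, v):
    """Exact log P(v | z_u): sums over all nodes, O(|V|) per pair."""
    scores = Z @ Z[u]
    m = scores.max()
    return Z[v] @ Z[u] - (m + np.log(np.exp(scores - m).sum()))

def neg_sampling_estimate(u, v, k=5):
    """Approximation from the notes: one positive pair, k sampled negatives."""
    negs = rng.integers(0, len(Z), size=k)   # n_i ~ P_V (uniform here)
    return (np.log(sigmoid(Z[u] @ Z[v]))
            - np.log(sigmoid(Z[u] @ Z[negs].T)).sum())
```

The estimate touches only $k+1$ nodes per training pair instead of all $|V|$, which is the whole point of negative sampling.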
<h1 id="node2vec-biased-walks">node2vec: Biased Walks</h1>
<p>Idea: use flexible, biased random walks that can trade off between local and global views of the network
BFS: micro-view of neighbourhood
DFS: macro-view of neighbourhood
Two parameters:</p>
<ul>
<li>Return parameter p: Return back to the previous node</li>
<li>In-out parameter q: Moving outwards (DFS) vs. inwards (BFS)</li>
</ul>
<p>Algorithm:
1) Compute random walk probabilities
2) Simulate $r$ random walks of length $l$ starting from each node $u$
3) Optimize the node2vec objective using Stochastic Gradient Descent
Linear-time complexity;
all three steps are individually parallelizable</p>
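One biased step of the walk can be sketched as follows, assuming an unweighted graph stored as an adjacency dict (a simplification of node2vec's pre-computed transition probabilities):

```python
def step_weights(graph, t, v, p, q):
    """Unnormalized transition weights out of current node v, having
    arrived from node t. graph: dict node -> set of neighbors."""
    weights = {}
    for x in graph[v]:
        if x == t:                     # return to the previous node
            weights[x] = 1.0 / p
        elif x in graph[t]:            # stay at distance 1 from t (BFS-like)
            weights[x] = 1.0
        else:                          # move outwards (DFS-like)
            weights[x] = 1.0 / q
    return weights

# Triangle 0-1-2 plus a tail 2-3; we arrived at node 2 from node 0.
G = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
w = step_weights(G, t=0, v=2, p=2.0, q=0.5)
```

With $p=2$ and $q=0.5$ the walk is discouraged from returning to node 0 and encouraged to move outwards to node 3, i.e. a DFS-leaning (macro-view) walk.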
<h1 id="translating-embeddings-for-modeling-multi-relational-data">Translating Embeddings for Modeling Multi-relational Data</h1>
<p>Knowledge graph completion is a link-prediction task.</p>

[2021-01-15] [CS224W] Spectral Clustering

<h1 id="graph-partitioning">Graph Partitioning</h1>
<p>Graph cut: Set of edges with one endpoint in each group
\(cut(A,B) = \sum_{i \in A, j \in B} w_{ij}\)
where $w_{ij}$ is the weight of the edge between $i$ and $j$</p>
<p>Graph Cut Criterion:</p>
<ul>
<li>Minimum cut
<ul>
<li>problems: only consider external cluster connections</li>
</ul>
</li>
<li>Conductance
<ul>
<li>$\phi(A,B) = \frac{cut(A,B)}{min(vol(A), vol(B))}$, where $vol(A)$ is the total weighted degree of the nodes in A</li>
<li>Produces more balanced partitions</li>
<li>problem: Computing the best cut is NP-hard</li>
</ul>
</li>
</ul>
<p>Adjacency matrix(A)
Degree Matrix(D)
Laplacian matrix(L): $L = D - A$</p>
<p>We would like to find the 2nd smallest eigenvalue of $L$ and its eigenvector:
\(\lambda_2 = \min_{x: x^T w_1 = 0} \frac{x^T L x}{x^T x} = \min_{\sum_i x_i = 0} \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}\)</p>
<h1 id="spectral-clustering-algorithm">Spectral Clustering Algorithm</h1>
<p>1) Pre-processing
Construct a matrix representation of the graph
2) Decomposition
Compute eigenvalues and eigenvectors of the matrix (we only care about the 2nd smallest eigenvalue and its eigenvector)
Map each point to a lower-dimensional representation based on one or more eigenvectors
3) Grouping
Assign points to two or more clusters, based on the new representation</p>
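The three steps can be sketched in numpy on a toy graph of two triangles joined by one edge, grouping nodes by the sign of the second eigenvector (the choice of graph and the sign-based grouping are mine):

```python
import numpy as np

# 1) Pre-processing: build L = D - A for two triangles joined by edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A         # Laplacian

# 2) Decomposition: eigh returns eigenvalues in ascending order,
#    so column 1 is the eigenvector of the 2nd smallest eigenvalue.
vals, vecs = np.linalg.eigh(L)
fiedler = vecs[:, 1]

# 3) Grouping: split on the sign of the Fiedler vector.
labels = (fiedler > 0).astype(int)
```

For this graph the sign split recovers the two triangles, which is exactly the minimum-conductance cut.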
<h1 id="motif-based-spectral-clustering">Motif-based spectral clustering</h1>
<p>Motif cut:
$vol_M(S) = \#(\text{motif end-points in } S)$
\(\phi(S) = \frac{\#(\text{motifs cut})}{vol_M(S)}\)</p>
<p>Three steps:
1) Pre-processing
$W_{ij}^{(M)}$ = # of times edge $(i,j)$ participates in the motif $M$
2) Decomposition (standard spectral clustering)
Set $L^{(M)} = D^{(M)} - W^{(M)}$, get the 2nd smallest eigenvalue and its eigenvector
3) Grouping
Sort nodes by their values in $x$: $x_1, x_2, \cdots, x_n$. Let $S_r = \{x_1, \cdots, x_r\}$ and compute the motif conductance of each $S_r$.</p>

[2021-01-13] [CS224W] Community Structure in Networks

<h1 id="communities">Communities</h1>
<p>Triadic closure = high clustering coefficient
Edge overlap:
\(O_{i,j} = \frac{|N(i) \cap N(j) \setminus \{i,j\}|}{|N(i) \cup N(j) \setminus \{i,j\}|}\)
where $N(i)$ is the set of neighbors of node $i$</p>
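The edge-overlap definition translates directly to Python sets (the dict-of-sets graph representation is my choice):

```python
def edge_overlap(graph, i, j):
    """O_ij = |shared neighbors of i,j| / |all neighbors of i,j|,
    excluding i and j themselves. graph: dict node -> set of neighbors."""
    shared = (graph[i] & graph[j]) - {i, j}
    union = (graph[i] | graph[j]) - {i, j}
    return len(shared) / len(union) if union else 0.0

# Nodes 1 and 2 share both of their other neighbors, 3 and 4.
G = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}
o = edge_overlap(G, 1, 2)
```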
<p>Network communities: sets of tightly connected nodes</p>
<p>Modularity Q: a measure of how well a network is partitioned into communities
\(Q \propto \sum_{s \in S} [\#(\text{edges within group } s) - \underset{\text{needs a null model}}{\#(\text{expected edges within group } s)} ]\)</p>
<p>Null Model - Configuration Model
Given real $G$ on $n$ nodes and $m$ edges, construct a rewired network $G'$
The expected number of edges between nodes $i$ and $j$ of degrees $k_i$ and $k_j$ is $k_ik_j/(2m)$</p>
<p>\(Q(G,S) = \frac{1}{2m} \sum_{s \in S}\sum_{i \in s} \sum_{j \in s} (A_{ij} - \frac{k_i k_j}{2m})\)
where $A_{ij} = 1$ if there is an edge between $i$ and $j$</p>
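A direct numpy translation of this formula, evaluated on a toy graph of two triangles joined by one edge (the graph and the candidate partitions are made up for illustration):

```python
import numpy as np

def modularity(A, communities):
    """Q = 1/(2m) * sum over communities s, i,j in s of (A_ij - k_i k_j / 2m)."""
    k = A.sum(axis=1)                  # degrees k_i
    two_m = A.sum()                    # 2m
    Q = 0.0
    for s in communities:
        for i in s:
            for j in s:
                Q += A[i, j] - k[i] * k[j] / two_m
    return Q / two_m

A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0            # two triangles joined by one edge

q_good = modularity(A, [{0, 1, 2}, {3, 4, 5}])   # the natural split
q_bad = modularity(A, [{0, 3}, {1, 4}, {2, 5}])  # an arbitrary split
```

The natural two-triangle split scores higher, which is what Louvain-style algorithms exploit when they greedily maximize Q.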
<h1 id="louvain-algorithm---greedy-algorithm-for-community-detection">Louvain Algorithm - Greedy algorithm for community detection</h1>
<p>$O(n \log n)$ run time
Each pass is made of 2 phases:
Phase 1: Modularity is optimized by allowing only local changes to node-communities memberships
Phase 2: The identified communities are aggregated into super-nodes to build a new network</p>
<h1 id="bigclam---detecting-overlapping-communities">BigCLAM - Detecting Overlapping Communities</h1>
<p>Step 1)
Define a generative model for graphs that is based on node community affiliations
Community Affiliation Graph Model (AGM)
Step 2)
Given graph $G$, make the assumption that $G$ was generated by AGM
Find the best AGM that could have generated $G$ by maximizing the graph likelihood $P(G|F)$.</p>

[2021-01-12] [CS224W] Motifs and Structural Roles in Networks

<h1 id="subgraphs-motifs">Subgraphs, Motifs</h1>
<p>Network motifs: recurring, significant patterns of interconnections</p>
<ul>
<li>induced subgraphs - consider all edges connecting pairs of vertices in subset</li>
<li>recurrence - allow overlapping of motifs</li>
<li>significance of a motif: motifs are overrepresented in a network when compared to randomized networks</li>
</ul>
<p>How to get the randomized networks? Configuration model: generate a random graph with a given degree sequence $k_1, k_2, \cdots, k_N$ (the same degrees as the real network). Each $G^{rand}$ has the same #(nodes), #(edges), and #(degree distribution) as $G^{real}$.</p>
<ul>
<li>Spokes: Nodes with spokes; randomly pair up mini-nodes</li>
<li>Switching: select a pair of edges A->B, C->D at random; exchange the endpoints to give A->D, C->B</li>
</ul>
<h1 id="graphlets">Graphlets</h1>
<p>Definition: connected non-isomorphic subgraphs
Graphlet degree vector counts #(graphlets) that a node touches at a particular orbit (takes into account the symmetries of a subgraph)
Graphlet degree vector provides a measure of a node’s local network topology</p>
<h1 id="structural-roles-in-networks">Structural Roles in Networks</h1>
<p>Role: A collection of nodes which have similar positions in a network
Structural equivalence: Nodes u and v are structurally equivalent if they have the same relationships to all other nodes</p>
<h1 id="structure-role-discovery-method---roix">Structure Role Discovery Method - RoIX</h1>
<p>Recursive feature extraction turns network connectivity into structural features. Use the neighborhood features to generate new recursive features.</p>
<ul>
<li>Neighborhood features:
<ul>
<li>Local features: all measures of the node degree (in-degree, out-degree, total degree, etc.)</li>
<li>Egonet features: Egonet includes the node, its neighbors, and any edges in the induced subgraph on these nodes. #(within egonet edges), #(edges entering/leaving egonet)</li>
</ul>
</li>
<li>recursive features:
<ul>
<li>Use the set of current node features to generate additional features; Two types of aggregate functions: mean and sum</li>
</ul>
</li>
</ul>