An Introduction to Spectral Graph Theory
Presented by Danushka Bollegala
Spectral Graph Theory
Spectrum = the set of eigenvalues of a matrix associated with the graph. By looking at the spectrum we can learn about the graph itself!
Spectral clustering: a way of normalizing data (a canonical form) and then performing clustering (e.g. via k-means) in this normalized/reduced space.
Input: a similarity matrix. Output: a set of (non-overlapping/hard) clusters.
Definitions
Undirected graph G(V, E)
V: set of vertices (nodes in the network)
E: set of edges (links in the network)
▪ Weight w_ij is the weight of the edge connecting vertices i and j (represented by the affinity matrix W).
Degree d_i = Σ_j w_ij: the sum of the weights of the edges incident on vertex i.
Measuring the size of a subset A of V: either |A| (the number of vertices in A) or vol(A) = Σ_{i∈A} d_i (the total degree of A).
How to create W?
How to create the affinity matrix W from the similarity matrix S?
ε-neighborhood graph
▪ Connect all pairs of vertices whose similarity is greater than ε.
k-nearest neighbor graph
▪ Connect each vertex to its k nearest neighbors.
▪ Use mutual k-nearest neighbor graphs when S is asymmetric.
Fully connected graph
▪ Use the Gaussian similarity function (kernel): w_ij = exp(-||x_i - x_j||² / (2σ²)).
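The graph constructions above can be sketched in NumPy. The helper names `gaussian_affinity` and `knn_affinity` are hypothetical, and σ and k are free parameters you must choose for your data:

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Fully connected affinity: w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2))."""
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)   # no self-loops
    return W

def knn_affinity(X, k=2, sigma=1.0):
    """k-nearest-neighbor graph, symmetrized so W stays a valid affinity matrix."""
    W = gaussian_affinity(X, sigma)
    n = W.shape[0]
    mask = np.zeros_like(W, dtype=bool)
    for i in range(n):
        nbrs = np.argsort(W[i])[::-1][:k]   # the k most similar vertices
        mask[i, nbrs] = True
    mask = mask | mask.T   # connect i and j if either is among the other's k-NN
    return np.where(mask, W, 0.0)
```

Symmetrizing with `mask | mask.T` gives the usual k-NN graph; replacing it with `mask & mask.T` gives the mutual k-NN graph mentioned on the slide.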
Unnormalized Graph Laplacian
L = D - W
D: degree matrix, the diagonal matrix diag(d_1, ..., d_n).
Properties
For every vector f ∈ R^n: f^T L f = (1/2) Σ_{i,j} w_ij (f_i - f_j)².
L is symmetric and positive semi-definite.
The smallest eigenvalue of L is zero and the corresponding eigenvector is 1 = (1,...,1)^T.
L has n non-negative, real-valued eigenvalues 0 = λ_1 ≤ λ_2 ≤ ... ≤ λ_n.
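These properties are easy to check numerically on a small weighted graph; this sketch uses an arbitrary 3-vertex example:

```python
import numpy as np

# Small weighted graph: W must be symmetric with zero diagonal.
W = np.array([[0., 2., 1.],
              [2., 0., 0.],
              [1., 0., 0.]])
D = np.diag(W.sum(axis=1))     # degree matrix
L = D - W                      # unnormalized graph Laplacian

# Property: f^T L f = 1/2 * sum_ij w_ij (f_i - f_j)^2 >= 0 (positive semi-definite)
f = np.array([0.3, -1.2, 2.0])
quad = f @ L @ f
expected = 0.5 * sum(W[i, j] * (f[i] - f[j]) ** 2
                     for i in range(3) for j in range(3))
assert np.isclose(quad, expected)

# Smallest eigenvalue is 0 with the constant eigenvector 1 = (1,...,1)^T
eigvals = np.linalg.eigvalsh(L)
assert np.isclose(eigvals[0], 0.0)
assert np.allclose(L @ np.ones(3), 0.0)
```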
Normalized Graph Laplacians
Two versions exist:
L_sym = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}
L_rw = D^{-1} L = I - D^{-1} W
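A quick numerical sanity check that the two definitions of each normalized Laplacian agree, reusing the same toy 3-vertex graph:

```python
import numpy as np

W = np.array([[0., 2., 1.],
              [2., 0., 0.],
              [1., 0., 0.]])
d = W.sum(axis=1)              # degrees (must be nonzero: no isolated vertices)
L = np.diag(d) - W

# L_sym = D^{-1/2} L D^{-1/2} = I - D^{-1/2} W D^{-1/2}
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
L_sym = D_inv_sqrt @ L @ D_inv_sqrt

# L_rw = D^{-1} L = I - D^{-1} W
L_rw = np.diag(1.0 / d) @ L

I = np.eye(3)
assert np.allclose(L_sym, I - D_inv_sqrt @ W @ D_inv_sqrt)
assert np.allclose(L_rw, I - np.diag(1.0 / d) @ W)
```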
Spectral Clustering (L)
Spectral Clustering (L_rw)
Spectral Clustering (L_sym)
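A minimal sketch of the L_rw variant of spectral clustering, in the style of Shi & Malik (2000): solve the generalized eigenproblem L u = λ D u (whose solutions are the eigenvectors of L_rw), embed each vertex as a row of the eigenvector matrix, and run k-means on the rows. The function name `spectral_clustering` is hypothetical:

```python
import numpy as np
from scipy.linalg import eigh
from scipy.cluster.vq import kmeans2

def spectral_clustering(W, k):
    """Shi-Malik-style normalized spectral clustering:
    k smallest generalized eigenvectors of L u = lambda D u,
    then k-means on the rows of the eigenvector matrix."""
    d = W.sum(axis=1)
    D = np.diag(d)
    L = D - W
    _, U = eigh(L, D)          # generalized symmetric eigenproblem
    embedding = U[:, :k]       # one k-dimensional point per vertex
    np.random.seed(0)          # make the k-means initialization repeatable
    _, labels = kmeans2(embedding, k, minit='++')
    return labels

# Toy graph: two tightly connected triangles, no edges between them
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = spectral_clustering(W, 2)
# vertices 0-2 fall in one cluster and 3-5 in the other
```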
Graph cut point of view
The partition (A_1, ..., A_k) induces a cut on the graph.
Two types of balanced graph cuts exist: RatioCut (normalizes each part by its number of vertices |A_i|) and Ncut (normalizes by its volume vol(A_i)).
Spectral clustering solves a relaxed version of the mincut problem (it is therefore an approximation).
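For k=2 the cut objectives can be written down directly; these helper functions (`cut`, `ratio_cut`, `ncut` are hypothetical names) follow the standard definitions:

```python
import numpy as np

def cut(W, A):
    """Total weight of the edges crossing from A to its complement."""
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(A)] = True
    return W[mask][:, ~mask].sum()

def ratio_cut(W, A):
    """RatioCut: cut weight normalized by the number of vertices on each side."""
    n = W.shape[0]
    c = cut(W, A)
    return c / len(A) + c / (n - len(A))

def ncut(W, A):
    """Ncut: cut weight normalized by the volume (total degree) of each side."""
    d = W.sum(axis=1)
    mask = np.zeros(W.shape[0], dtype=bool)
    mask[list(A)] = True
    c = cut(W, A)
    return c / d[mask].sum() + c / d[~mask].sum()

# Two triangles joined by a single weight-0.5 edge: the natural cut costs 0.5
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
W[2, 3] = W[3, 2] = 0.5
```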
Example: RatioCut, k=2
Relax the discrete RatioCut problem to a real-valued one; by the Rayleigh-Ritz theorem the minimum of the relaxed problem is attained at the eigenvector of the second-smallest eigenvalue of L.
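The relaxation can be sketched as follows (this is the standard derivation, e.g. in von Luxburg's spectral clustering tutorial, not reproduced from the slide):

```latex
% Encode the partition (A, \bar{A}) as a vector f \in \mathbb{R}^n:
f_i =
\begin{cases}
  \sqrt{|\bar{A}| / |A|}   & v_i \in A, \\
  -\sqrt{|A| / |\bar{A}|}  & v_i \in \bar{A}.
\end{cases}
% Using f^{\top} L f = \tfrac{1}{2}\sum_{i,j} w_{ij}(f_i - f_j)^2 one checks that
f^{\top} L f = |V| \cdot \mathrm{RatioCut}(A, \bar{A}),
\qquad f \perp \mathbf{1}, \qquad \|f\| = \sqrt{n}.
% Dropping the discreteness constraint on the entries of f gives
\min_{f \in \mathbb{R}^n} f^{\top} L f
\quad \text{s.t. } f \perp \mathbf{1},\; \|f\| = \sqrt{n},
% whose minimizer, by Rayleigh-Ritz, is the eigenvector of the
% second-smallest eigenvalue of L.
```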
Relaxation can be problematic!
Random walks point of view
The transition probability matrix and the Laplacian are related!
P = D^{-1} W
L_rw = I - P
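The relation is one line of algebra; checking it numerically also confirms that P is row-stochastic (each row of the random walk's transition probabilities sums to 1):

```python
import numpy as np

W = np.array([[0., 2., 1.],
              [2., 0., 0.],
              [1., 0., 0.]])
d = W.sum(axis=1)
P = np.diag(1.0 / d) @ W          # transition matrix of the random walk on the graph
L_rw = np.eye(3) - P              # random-walk Laplacian

# P is row-stochastic: each row sums to 1
assert np.allclose(P.sum(axis=1), 1.0)
# and L_rw = I - P coincides with D^{-1} L
assert np.allclose(L_rw, np.diag(1.0 / d) @ (np.diag(d) - W))
```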
Recommendations
L_rw-based spectral clustering (Shi & Malik, 2000) is better (especially when the degree distribution is uneven).
Use k-nearest neighbor graphs.
How to set the number of clusters: k = log(n), or use the eigengap heuristic.
If using the Gaussian kernel, how to set σ: use the mean distance of a point to its log(n)+1 nearest neighbors.
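The eigengap heuristic picks the k at which the gap between consecutive Laplacian eigenvalues is largest; a minimal sketch (`eigengap_k` is a hypothetical helper name):

```python
import numpy as np

def eigengap_k(L, k_max=10):
    """Eigengap heuristic: choose k so that the gap between the k-th and
    (k+1)-th smallest eigenvalues of the Laplacian is largest."""
    vals = np.linalg.eigvalsh(L)
    vals = vals[:min(k_max, len(vals))]
    gaps = np.diff(vals)
    return int(np.argmax(gaps)) + 1

# Two disconnected triangles: the Laplacian eigenvalues are 0, 0, 3, 3, 3, 3,
# so the largest gap sits after the 2nd eigenvalue and the heuristic returns k = 2.
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
L = np.diag(W.sum(axis=1)) - W
k = eigengap_k(L)   # -> 2
```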
Matrix Approximation using SVD
Eckart-Young theorem: the best rank-r approximation B of a matrix A, with rank(B) = r < rank(A), is given by B = U S V*, where A = U Σ V* is the SVD of A and S equals Σ except that the singular values after the r-th are set to zero.
The approximation minimizes the Frobenius norm:
▪ min_B ||A - B||_F, subject to rank(B) = r.
Approximation Error
1. Create a 1000 x 10000 random matrix A.
2. B = approx(A, r)
3. RMSE = ||A - B||_F
4. Plot RMSE against r.
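The exercise above can be sketched as follows (a smaller matrix is used here for speed; `low_rank_approx` is a hypothetical name for the slide's `approx`, and the plotting step is replaced by collecting the errors in a list):

```python
import numpy as np

def low_rank_approx(A, r):
    """Best rank-r approximation of A in Frobenius norm (Eckart-Young):
    keep the r largest singular values, zero out the rest."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r, :]

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 200))

errors = []
for r in [1, 5, 10, 25, 50, 100]:
    B = low_rank_approx(A, r)
    errors.append(np.linalg.norm(A - B, 'fro'))

# The error decreases monotonically with r and vanishes at full rank (r = 100)
assert all(e1 >= e2 for e1, e2 in zip(errors, errors[1:]))
assert errors[-1] < 1e-8

# Eckart-Young also gives the error in closed form: the root of the
# sum of squared discarded singular values
s = np.linalg.svd(A, compute_uv=False)
assert np.isclose(errors[0], np.sqrt((s[1:] ** 2).sum()))
```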