This question is about graph clustering. In its simplest form, the eigenvector corresponding to the second smallest eigenvalue of the normalized Laplacian of a graph provides a relaxed solution to the problem of partitioning the graph into two sets of vertices with few edges between them (the min-cut problem). However, there are also works that compute the personalized PageRank of a graph starting from a certain vertex and find a cut, based on the largest components of the PageRank vector, whose conductance is provably close to the minimum possible. "Local Graph Partitioning using PageRank Vectors" by Reid Andersen et al. is one such work, in which the authors derive a Cheeger inequality for the conductance of a set obtained this way. I understand at least the intuition behind why the Fiedler vector works for partitioning: when the graph has two disconnected components, the second eigenvector lies in the space spanned by $1_S, 1_{S^c}$ and is perpendicular to the all-ones vector, so it makes sense to consider this vector. The PageRank vector, however, is the left eigenvector of the largest eigenvalue of a modified Markov matrix. Is there a way to make a link between PageRank and the Fiedler vector? As an optimisation problem, a PageRank-like result arises as the solution of the Fiedler eigenvector problem with an added constraint that the vector be highly correlated with a vector $v$ representing prior knowledge (semi-supervised clustering).
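For concreteness, the spectral heuristic I have in mind can be sketched as follows (a minimal numpy sketch on a toy graph of my own choosing; the sign pattern of the Fiedler vector recovers the two clusters):

```python
import numpy as np

# Sketch: Fiedler vector of the symmetric normalized Laplacian
# L = I - D^{-1/2} A D^{-1/2} on a toy graph with two dense clusters
# joined by a single bridge edge. The eigenvector for the
# second-smallest eigenvalue takes one sign on each cluster.

def normalized_laplacian(A):
    d = A.sum(axis=1)                         # vertex degrees
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]

# Two triangles {0,1,2} and {3,4,5} connected by the edge (2,3).
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

L = normalized_laplacian(A)
eigvals, eigvecs = np.linalg.eigh(L)          # eigenvalues in ascending order
fiedler = eigvecs[:, 1]                       # second-smallest eigenvalue's vector
print(np.sign(fiedler))                       # one sign on {0,1,2}, the other on {3,4,5}
```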
1 Answer
The PageRank vector with seed $s$ and jumping constant $\alpha$ is by definition the solution to the equation $$pr(s,\alpha) = \alpha s + (1-\alpha)pr(s,\alpha)W$$ where $W$ is the random walk matrix $D^{-1}A$ ($D$ is the degree matrix, $A$ the adjacency matrix). Note that I'm right multiplying by $W$ here, as is conventional in the literature on this stuff.
A bit of analysis shows that $pr(s,\alpha) = s R_\alpha$ where: $$R_\alpha = \alpha \sum_{i=0}^\infty (1-\alpha)^i W^i$$
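The two formulations above are easy to check against each other numerically. Here is a small sketch (the graph is an arbitrary illustrative example) that solves the defining linear equation directly and compares it to a truncated version of the geometric series:

```python
import numpy as np

# Check numerically that the solution of
#   pr = alpha*s + (1-alpha)*pr W   (row vectors, right-multiplication by W)
# matches the series  alpha * sum_i (1-alpha)^i * s W^i.

def pagerank_closed_form(A, s, alpha):
    W = A / A.sum(axis=1, keepdims=True)      # random walk matrix D^{-1} A
    n = len(A)
    # pr (I - (1-alpha) W) = alpha s  =>  solve the transposed system
    return np.linalg.solve((np.eye(n) - (1 - alpha) * W).T, alpha * s)

def pagerank_series(A, s, alpha, n_terms=200):
    W = A / A.sum(axis=1, keepdims=True)
    term, total = s.copy(), np.zeros_like(s)
    for i in range(n_terms):
        total += (1 - alpha) ** i * term      # accumulate (1-alpha)^i * s W^i
        term = term @ W
    return alpha * total

# Small example graph: triangle {0,1,2} plus a pendant vertex 3.
A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
s = np.array([1.0, 0, 0, 0])                  # seed at vertex 0
pr1 = pagerank_closed_form(A, s, 0.15)
pr2 = pagerank_series(A, s, 0.15)
print(np.allclose(pr1, pr2))                  # the two formulations agree
```

Since $W$ is row-stochastic and $s$ is a probability distribution, the resulting PageRank vector also sums to $1$.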
I think that the best way to get intuition about the isoperimetric properties of PageRank vectors is to stare at this sum for a while. Assume that $s$ is the indicator function of a single vertex $v$ (so that the corresponding PageRank vector is "personalized" to $v$). Then the vector $s W^i$ is a probability distribution whose entries correspond to the probabilities that an $i$-step random walk starting at $v$ will end up at other vertices in the graph.
So the entries of $sR_\alpha$ can be interpreted as probabilities that "exponentially short" random walks starting at $v$ will visit other vertices. Here "exponentially short" means that we account for random walks of all possible lengths (the index $i$ ranges from $0$ to $\infty$) but we exponentially punish the long walks via the factor $(1-\alpha)^i$.
Now take a graph $G$ which is connected but "almost disconnected", meaning there is a set $S$ of vertices whose isoperimetric ratio (conductance) is very small. This means that if an edge has one vertex in $S$ then its other vertex is much more likely to be in $S$ than not, and thus a short random walk which starts in $S$ is unlikely to reach $S^c$. So when the seed $s$ is placed at a typical vertex of $S$, we expect the values of $pr(s,\alpha)$ to drop off considerably (more than degree differences alone would explain) on vertices outside $S$. This is the basic principle that Andersen-Chung-Lang use to (nearly) recover $S$ from $pr(s,\alpha)$.
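This localization is easy to see on a toy example. The sketch below (not Andersen-Chung-Lang's actual algorithm, which uses cheap *approximate* PageRank vectors, but the same sweep-cut idea with an exact solve) seeds a personalized PageRank vector inside one of two cliques joined by a single edge and sweeps over the degree-normalized values $pr(v)/\deg(v)$:

```python
import numpy as np

# Sketch: on a graph with two dense clusters joined by one bridge edge,
# a personalized PageRank vector seeded inside one cluster keeps most of
# its mass there, and a sweep over pr(v)/deg(v) finds the low-conductance set.

def personalized_pagerank(A, seed, alpha=0.15):
    W = A / A.sum(axis=1, keepdims=True)
    s = np.zeros(len(A)); s[seed] = 1.0
    return np.linalg.solve((np.eye(len(A)) - (1 - alpha) * W).T, alpha * s)

def conductance(A, S):
    S = set(int(i) for i in S)
    cut = sum(A[i, j] for i in S for j in range(len(A)) if j not in S)
    vol = A[list(S)].sum()                    # sum of degrees inside S
    return cut / min(vol, A.sum() - vol)

# Two 4-cliques {0..3} and {4..7} joined by the edge (3, 4).
n = 8
A = np.zeros((n, n))
for block in (range(0, 4), range(4, 8)):
    for i in block:
        for j in block:
            if i != j:
                A[i, j] = 1.0
A[3, 4] = A[4, 3] = 1.0

pr = personalized_pagerank(A, seed=0)
order = np.argsort(-pr / A.sum(axis=1))       # sweep by pr(v)/deg(v)
sweeps = [(conductance(A, order[:k]), set(int(i) for i in order[:k]))
          for k in range(1, n)]
best_phi, best_S = min(sweeps, key=lambda t: t[0])
print(best_S, best_phi)                       # recovers {0,1,2,3}
```

The best sweep set is the clique containing the seed, with conductance $1/13$ (one cut edge over volume $13$).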
Note that the underlying idea of using exponentially short random walks to locally cluster large graphs is really due to Spielman-Teng, who worked directly with random walks instead of PageRank vectors. Chung has also exploited this idea using heat kernels instead of PageRank vectors. All formulations of the idea lead to relatively similar performance and theoretical guarantees, so I think it mostly comes down to preference.
Thanks for the answer. Sorry for not making it clearer, but I was looking for an explanation that starts from the Fiedler vector and establishes a connexion between it and PageRank. Is it an accident that they both happen to give a cut in a graph? – Arun, Jul 14, 2016 at 6:09