Python Data Science 简明教程

Python - Graph Data

CSGraph 表示 Compressed Sparse Graph ,专注于基于稀疏矩阵表示的快速图算法。

CSGraph stands for Compressed Sparse Graph, which focuses on Fast graph algorithms based on sparse matrix representations.

Graph Representations

首先,我们了解一下稀疏图是什么,以及它在图表示中如何提供帮助。

To begin with, let us understand what a sparse graph is and how it helps in graph representations.

What exactly is a sparse graph?

图只是一组节点,它们之间存在链接。图可以表示几乎任何内容 − 社交网络连接,其中每个节点都是一个人,并连接到熟人;图像,其中每个节点都是一个像素,并连接到相邻像素;高维分布中的点,其中每个节点都连接到其最近邻点,以及实际上你能想象到的任何其他内容。

A graph is just a collection of nodes, which have links between them. Graphs can represent nearly anything − social network connections, where each node is a person and is connected to acquaintances; images, where each node is a pixel and is connected to neighbouring pixels; points in a high-dimensional distribution, where each node is connected to its nearest neighbours and practically anything else you can imagine.

用稀疏矩阵表示图数据是一种非常有效的方法:让我们称之为 G。矩阵 G 的大小为 N x N,G[i, j] 给出了节点“i”和节点“j”之间连接的值。稀疏图主要包含零 − 也就是说,大多数节点只有少数连接。此特性在大多数情况下都是成立的。

One very efficient way to represent graph data is in a sparse matrix: let us call it G. The matrix G is of size N x N, and G[i, j] gives the value of the connection between node ‘i' and node ‘j’. A sparse graph contains mostly zeros − that is, most nodes have only a few connections. This property turns out to be true in most cases of interest.

创建稀疏图子模块的动机来自 scikit-learn 中使用的几种算法,包括以下算法 −

The creation of the sparse graph submodule was motivated by several algorithms used in scikit-learn that included the following −

  1. Isomap − A manifold learning algorithm, which requires finding the shortest paths in a graph.

  2. Hierarchical clustering − A clustering algorithm based on a minimum spanning tree.

  3. Spectral Decomposition − A projection algorithm based on sparse graph laplacians.

举个具体的例子,假设我们要表示以下无向图 −

As a concrete example, imagine that we would like to represent the following undirected graph −

undirected graph

此图有三个节点,其中节点 0 和 1 由权重为 2 的边连接,节点 0 和 2 由权重为 1 的边连接。我们可以构造稠密、掩码和稀疏表示,如以下示例所示,同时谨记无向图由对称矩阵表示。

This graph has three nodes, where node 0 and 1 are connected by an edge of weight 2, and nodes 0 and 2 are connected by an edge of weight 1. We can construct the dense, masked and sparse representations as shown in the following example, keeping in mind that an undirected graph is represented by a symmetric matrix.

G_dense = np.array([ [0, 2, 1],
                     [2, 0, 0],
                     [1, 0, 0] ])

G_masked = np.ma.masked_values(G_dense, 0)
from scipy.sparse import csr_matrix

G_sparse = csr_matrix(G_dense)
print G_sparse.data

上述程序将生成以下输出。

The above program will generate the following output.

array([2, 1, 2, 1])
undirected graph using symmetric matrix

这与之前的图相同,只是节点 0 和 2 由权重为 0 的边连接。在这种情况下,以上的稠密表示会导致歧义 − 如果 0 是一个有意义的值,那么如何表示非边。在这种情况下,必须使用掩码或稀疏表示来消除歧义。

This is identical to the previous graph, except nodes 0 and 2 are connected by an edge of zero weight. In this case, the dense representation above leads to ambiguities − how can non-edges be represented, if zero is a meaningful value. In this case, either a masked or a sparse representation must be used to eliminate the ambiguity.

让我们考虑以下示例。

Let us consider the following example.

from scipy.sparse.csgraph import csgraph_from_dense
G2_data = np.array
([
   [np.inf, 2, 0 ],
   [2, np.inf, np.inf],
   [0, np.inf, np.inf]
])
G2_sparse = csgraph_from_dense(G2_data, null_value=np.inf)
print G2_sparse.data

上述程序将生成以下输出。

The above program will generate the following output.

array([ 2., 0., 2., 0.])