admin 管理员组

文章数量: 1086877

谱聚类python代码

没有太多光谱聚类的经验,只是按照文档进行(结果请跳到最后!)以下内容:

代码:import numpy as np

import networkx as nx

from sklearn.cluster import SpectralClustering

from sklearn import metrics

np.random.seed(1)

# Get your mentioned graph

G = nx.karate_club_graph()

# Get ground-truth: club-labels -> transform to 0/1 np-array

# (possible overcomplicated networkx usage here)

gt_dict = nx.get_node_attributes(G, 'club')

gt = [gt_dict[i] for i in G.nodes()]

gt = np.array([0 if i == 'Mr. Hi' else 1 for i in gt])

# Get adjacency-matrix as numpy-array

adj_mat = nx.to_numpy_matrix(G)

print('ground truth')

print(gt)

# Cluster

sc = SpectralClustering(2, affinity='precomputed', n_init=100)

sc.fit(adj_mat)

# Compare ground-truth and clustering-results

print('spectral clustering')

print(sc.labels_)

print('just for better-visualization: invert clusters (permutation)')

print(np.abs(sc.labels_ - 1))

# Calculate some clustering metrics

print(metrics.adjusted_rand_score(gt, sc.labels_))

print(metrics.adjusted_mutual_info_score(gt, sc.labels_))

输出:ground truth

[0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]

spectral clustering

[1 1 0 1 1 1 1 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]

just for better-visualization: invert clusters (permutation)

[0 0 1 0 0 0 0 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1]

0.204094758281

0.271689477828

总体思路:

介绍here中的数据和任务:The nodes in the graph represent the 34 members in a college Karate club. (Zachary is a sociologist, and he was one of the members.) An edge between two nodes indicates that the two members spent significant time together outside normal club meetings. The dataset is interesting because while Zachary was collecting his data, there was a dispute in the Karate club, and it split into two factions: one led by “Mr. Hi”, and one led by “John A”. It turns out that using only the connectivity information (the edges), it is possible to recover the two factions.

使用sklearn&spectral集群解决此问题:If affinity is the adjacency matrix of a graph, this method can be used to find normalized graph cuts.

This将规范化图切割描述为:Find two disjoint partitions A and B of the vertices V of a graph, so

that A ∪ B = V and A ∩ B = ∅

Given a similarity measure w(i,j) between two vertices (e.g. identity

when they are connected) a cut value (and its normalized version) is defined as:

cut(A, B) = SUM u in A, v in B: w(u, v)

...

we seek the minimization of disassociation

between the groups A and B and the maximization of the association

within each group

听起来不错。因此,我们创建邻接矩阵(nx.to_numpy_matrix(G)),并将参数affinity设置为预计算的(因为邻接矩阵是我们预计算的相似性度量)。Alternatively, using precomputed, a user-provided affinity matrix can be used.

编辑:虽然对此不熟悉,但我查找了要调整的The strategy to use to assign labels in the embedding space. There are two ways to assign labels after the laplacian embedding. k-means can be applied and is a popular choice. But it can also be sensitive to initialization. Discretization is another approach which is less sensitive to random initialization.

所以尝试不那么敏感的方法:sc = SpectralClustering(2, affinity='precomputed', n_init=100, assign_labels='discretize')

输出:ground truth

[0 0 0 0 0 0 0 0 0 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]

spectral clustering

[0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 1 0 0 1 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1]

just for better-visualization: invert clusters (permutation)

[1 1 0 1 1 1 1 1 0 0 1 1 1 1 0 0 1 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0]

0.771725032425

0.722546051351

这是一个非常符合实际的事实!

本文标签: 谱聚类python代码