ML Algorithms: A Detailed Introduction to Mean Shift Clustering

May 6, 2021, 16:42:07

Kernel density estimation:

```
from sklearn.cluster import MeanShift
from sklearn.datasets import make_blobs
from matplotlib import pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401 (registers the 3D projection on older Matplotlib)

# We use the make_blobs method to generate our own data:
# three 3-D clusters around the centers listed below.
clusters = [[2, 2, 2], [7, 7, 7], [5, 13, 13]]
X, _ = make_blobs(n_samples=150, centers=clusters, cluster_std=0.60)

# After training the model, we store the
# coordinates of the cluster centers.
ms = MeanShift()
ms.fit(X)
cluster_centers = ms.cluster_centers_

# Finally, we plot the data points
# and the centroids in a 3D graph.
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(X[:, 0], X[:, 1], X[:, 2], marker='o')
ax.scatter(cluster_centers[:, 0], cluster_centers[:, 1], cluster_centers[:, 2],
           marker='x', color='red', s=300, linewidth=5, zorder=10)
plt.show()
```

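The kernel used for the density estimate has to satisfy two requirements:
-> The first ensures that our estimate is normalized.
-> The second relates to the symmetry of our space.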
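In kernel density estimation, normalization and symmetry are usually written as "the kernel integrates to 1" and "K(u) = K(-u)". The sketch below checks both numerically for a Gaussian kernel. The Gaussian is an assumption chosen for illustration, since the text does not name a specific kernel, and `gaussian_kernel`, `step`, and `grid` are hypothetical names.

```python
import math

def gaussian_kernel(u):
    """Standard 1-D Gaussian kernel (an assumed example kernel)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

# Evaluate the kernel on a fine grid over [-10, 10].
step = 0.001
grid = [i * step for i in range(-10000, 10001)]

# Requirement 1: the kernel integrates to 1 (approximated by a Riemann sum).
area = sum(gaussian_kernel(u) for u in grid) * step

# Requirement 2: the kernel is symmetric, K(u) == K(-u).
is_symmetric = all(
    math.isclose(gaussian_kernel(u), gaussian_kernel(-u)) for u in grid
)

print(f"area under kernel: {area:.6f}, symmetric: {is_symmetric}")
```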

```
1. Initialize a random seed and a window W.
2. Calculate the center of gravity (mean) of W.
3. Shift the search window to the mean.
4. Repeat steps 2 and 3 until convergence.
```
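The four steps above can be sketched for a single search window as follows. This is a minimal illustration that assumes a flat (uniform) window of fixed radius. The function name `mean_shift_window`, the parameters `radius`, `tol`, and `max_iter`, and the toy data are all hypothetical and do not come from the article.

```python
import math
import random

def mean_shift_window(points, radius, seed=None, tol=1e-6, max_iter=100):
    """Follow one flat search window until it stops moving."""
    rng = random.Random(seed)
    # Step 1: start the window W at a randomly chosen data point.
    center = list(rng.choice(points))
    for _ in range(max_iter):
        # Step 2: center of gravity (mean) of the points inside W.
        inside = [p for p in points if math.dist(p, center) <= radius]
        mean = [sum(coord) / len(inside) for coord in zip(*inside)]
        # Step 3: shift the window to that mean.
        moved = math.dist(mean, center)
        center = mean
        # Step 4: repeat until the window no longer moves.
        if moved < tol:
            break
    return center

# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
mode = mean_shift_window(points, radius=3.0, seed=0)
print(mode)
```

Depending on which point the window starts from, it settles on the center of one group or the other; running it from many seeds is what yields the full set of modes.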

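```
# Pseudocode: at_kde_peak is a placeholder for a convergence check.
for i, p in enumerate(copied_points):
    while not at_kde_peak(p):
        p = shift(p, original_points)
    copied_points[i] = p
```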

The shift function looks like this:

```
def shift(p, original_points):
    # euclidean_dist, kernel and kernel_bandwidth are assumed to be defined elsewhere.
    shift_x = 0.0
    shift_y = 0.0
    scale_factor = 0.0

    for p_temp in original_points:
        # numerator: kernel-weighted sum of the coordinates
        dist = euclidean_dist(p, p_temp)
        weight = kernel(dist, kernel_bandwidth)
        shift_x += p_temp[0] * weight
        shift_y += p_temp[1] * weight
        # denominator: sum of the weights
        scale_factor += weight

    # The new position is the weighted mean of all original points.
    shift_x = shift_x / scale_factor
    shift_y = shift_y / scale_factor
    return [shift_x, shift_y]
```
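To run the shift step end to end, the helpers it relies on have to be supplied. The sketch below is one possible way to fill them in, not the article's own implementation: it assumes a Gaussian kernel, passes `kernel_bandwidth` as an argument instead of reading a global, and replaces the peak check with a hypothetical step-size tolerance (`tol`) and iteration cap (`max_iter`).

```python
import math

def euclidean_dist(a, b):
    """Straight-line distance between two points."""
    return math.dist(a, b)

def kernel(distance, bandwidth):
    """Unnormalized Gaussian kernel (assumed); the constant cancels in shift()."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def shift(p, original_points, kernel_bandwidth):
    """Move p to the kernel-weighted mean of the original points."""
    weights = [kernel(euclidean_dist(p, q), kernel_bandwidth)
               for q in original_points]
    total = sum(weights)
    return [sum(w * q[d] for w, q in zip(weights, original_points)) / total
            for d in range(len(p))]

def mean_shift(points, kernel_bandwidth=1.0, tol=1e-5, max_iter=200):
    """Shift a copy of every point until its step size drops below tol."""
    shifted = []
    for p in points:
        p = list(p)
        for _ in range(max_iter):
            new_p = shift(p, points, kernel_bandwidth)
            moved = euclidean_dist(new_p, p)
            p = new_p
            if moved < tol:  # treated here as "at the KDE peak"
                break
        shifted.append(p)
    return shifted

# Toy data: two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
modes = mean_shift(points, kernel_bandwidth=1.0)
for original, mode in zip(points, modes):
    print(original, "->", [round(c, 3) for c in mode])
```

Points that end up at (nearly) the same location belong to the same cluster; grouping the shifted points by proximity is left out to keep the sketch short.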

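Pros:
• Finds a variable number of modes
• Robust to outliers
• A general, application-independent tool
• Model-free: it does not assume any prior shape (spherical, elliptical, etc.) for the data clusters
• Only one parameter (the window size h), and h has a physical meaning (unlike k-means)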

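Cons:
• The output depends on the window size
• Choosing the window size (bandwidth) is not trivial
• Computationally (relatively) expensive (about 2 s per image)
• Does not scale well with the dimensionality of the feature space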
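The dependence on the window size can be seen by refitting scikit-learn's `MeanShift` with different `bandwidth` values on the same kind of blob data used earlier. This is a rough sketch: the bandwidths 0.3, 2.0, and 50.0, the `quantile=0.2` setting, and the helper `count_clusters` are arbitrary illustrative choices, and `random_state=0` is added only to make the run repeatable.

```python
from sklearn.cluster import MeanShift, estimate_bandwidth
from sklearn.datasets import make_blobs

centers = [[2, 2, 2], [7, 7, 7], [5, 13, 13]]
X, _ = make_blobs(n_samples=150, centers=centers,
                  cluster_std=0.60, random_state=0)

def count_clusters(X, bandwidth):
    """Fit MeanShift with a fixed bandwidth and count the centers found."""
    ms = MeanShift(bandwidth=bandwidth).fit(X)
    return len(ms.cluster_centers_)

# Hand-picked window sizes: too small, moderate, and far too large.
counts = {h: count_clusters(X, h) for h in (0.3, 2.0, 50.0)}

# A data-driven starting point for the bandwidth.
h_est = estimate_bandwidth(X, quantile=0.2, random_state=0)
count_est = count_clusters(X, h_est)

print("clusters per bandwidth:", counts)
print(f"estimated bandwidth {h_est:.2f} -> {count_est} clusters")
```

A very small window tends to split each blob into many modes, while a window larger than the whole data set merges everything into a single cluster, which is why the bandwidth usually needs tuning or estimation.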