Machine Learning with scikit-learn 0.19: the K-Means Clustering Algorithm
1. Background: clustering, similarity, and distance measures
2. The idea and workflow of the k-means algorithm
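The workflow this section refers to alternates two steps: assign each sample to its nearest centroid, then move each centroid to the mean of its assigned samples, repeating until the centroids stop moving. As a minimal illustration (this is a sketch, not sklearn's implementation; the function name `kmeans` and the toy data are made up for this example):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate the assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every sample.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned samples.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: centroids no longer move
        centers = new_centers
    return centers, labels

X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
centers, labels = kmeans(X, 2)
print(centers)  # two centroids, near x = 0.05 and x = 10.05
```

Note that this sketch does not handle the empty-cluster case; sklearn's implementation re-seeds empty clusters and also uses the smarter k-means++ initialization shown later in this article.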
3. Parameters of the k-means algorithm in sklearn
4. Code examples and a brief overview of the concepts used
(1) make_blobs: a generator for clustering data

sklearn.datasets.make_blobs(n_samples=100, n_features=2, centers=3, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None)
It returns two values: X, an array of shape (n_samples, n_features) holding the generated samples, and y, an array of shape (n_samples,) with the integer label of the cluster each sample belongs to.
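A quick check of those return values (the parameter values here mirror the signature above):

```python
from sklearn.datasets import make_blobs

# X: (n_samples, n_features) points; y: (n_samples,) integer cluster labels
X, y = make_blobs(n_samples=100, n_features=2, centers=3,
                  cluster_std=1.0, random_state=28)
print(X.shape, y.shape, sorted(set(y)))  # (100, 2) (100,) [0, 1, 2]
```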
(2) np.vstack: stacking arrays vertically
詳細(xì)介紹參照博客鏈接:http://blog.csdn.net/csdn15698845876/article/details/73380803
```python
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.datasets as ds
from sklearn.cluster import KMeans

# Let matplotlib render Chinese characters and minus signs correctly.
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Generate clustering data.
N = 1500
centers = 4
data, y = ds.make_blobs(N, n_features=2, centers=centers, random_state=28)
# A second dataset (generated identically in the source; the plot titles
# suggest varying cluster_std per cluster was intended).
data2, y2 = ds.make_blobs(N, n_features=2, centers=centers, random_state=28)
# Build an imbalanced dataset: 200/100/10/50 samples per cluster.
data3 = np.vstack((data[y == 0][:200], data[y == 1][:100],
                   data[y == 2][:10], data[y == 3][:50]))
y3 = np.array([0] * 200 + [1] * 100 + [2] * 10 + [3] * 50)

# Fit K-Means on the original data (the labels are not used by the fit).
km = KMeans(n_clusters=centers, random_state=28)
km.fit(data)
y_hat = km.predict(data)
print("Sum of squared distances to the nearest cluster center (inertia_):", km.inertia_)
print("Average per sample (inertia_ / N):", km.inertia_ / N)
print("Cluster centers:", km.cluster_centers_)

y_hat2 = km.fit_predict(data2)
y_hat3 = km.fit_predict(data3)

def expandBorder(a, b):
    """Widen an axis range by 10% on each side."""
    d = (b - a) * 0.1
    return a - d, b + d

cm = mpl.colors.ListedColormap(list("rgbmyc"))
plt.figure(figsize=(15, 9), facecolor="w")

# Subplot 1: original data with true labels.
plt.subplot(241)
plt.scatter(data[:, 0], data[:, 1], c=y, s=30, cmap=cm, edgecolors="none")
x1_min, x2_min = np.min(data, axis=0)
x1_max, x2_max = np.max(data, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data")
plt.grid(True)

# Subplot 2: K-Means result on the original data.
plt.subplot(242)
plt.scatter(data[:, 0], data[:, 1], c=y_hat, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("K-Means clustering result")
plt.grid(True)

# Apply a linear transformation (rotation/shear) to the data.
m = np.array(((1, 1), (0.5, 5)))
data_r = data.dot(m)
y_r_hat = km.fit_predict(data_r)

# Subplot 3: transformed data with true labels.
plt.subplot(243)
plt.scatter(data_r[:, 0], data_r[:, 1], c=y, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data_r, axis=0)
x1_max, x2_max = np.max(data_r, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data after rotation")
plt.grid(True)

# Subplot 4: K-Means prediction on the transformed data.
plt.subplot(244)
plt.scatter(data_r[:, 0], data_r[:, 1], c=y_r_hat, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Prediction after rotation")
plt.grid(True)

# Subplots 5-6: clusters with different variances.
plt.subplot(245)
plt.scatter(data2[:, 0], data2[:, 1], c=y2, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data2, axis=0)
x1_max, x2_max = np.max(data2, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data with different variances")
plt.grid(True)

plt.subplot(246)
plt.scatter(data2[:, 0], data2[:, 1], c=y_hat2, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("K-Means result on clusters with different variances")
plt.grid(True)

# Subplots 7-8: clusters with imbalanced sample counts.
plt.subplot(247)
plt.scatter(data3[:, 0], data3[:, 1], c=y3, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data3, axis=0)
x1_max, x2_max = np.max(data3, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data with imbalanced cluster sizes")
plt.grid(True)

plt.subplot(248)
plt.scatter(data3[:, 0], data3[:, 1], c=y_hat3, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("K-Means result on imbalanced cluster sizes")
plt.grid(True)

plt.tight_layout(pad=2, rect=(0, 0, 1, 0.97))
plt.suptitle("Effect of data distribution on K-Means clustering", fontsize=18)
plt.savefig("k-means.png")
plt.show()
```

Output:

```
Sum of squared distances to the nearest cluster center (inertia_): 2592.9990199
Average per sample (inertia_ / N): 1.72866601327
Cluster centers: [[ -7.44342199e+00  -2.00152176e+00]
 [  5.80338598e+00   2.75272962e-03]
 [ -6.36176159e+00   6.94997331e+00]
 [  4.34372837e+00   1.33977807e+00]]
```
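The `inertia_` attribute printed above is the sum of *squared* distances of each sample to its nearest cluster center, which can be verified directly. A small check on made-up toy data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Four points forming two obvious clusters around x = 0 and x = 10.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# inertia_ equals the sum of squared distances to each sample's own center.
d = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
assert np.isclose(km.inertia_, d)
print(km.inertia_)  # 4.0: each point is 1 unit from its center, 4 * 1^2
```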
Key concepts used in the code:
```python
import time
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
from sklearn.cluster import KMeans, MiniBatchKMeans
# sklearn 0.19 import path; in newer versions use: from sklearn.datasets import make_blobs
from sklearn.datasets.samples_generator import make_blobs
from sklearn.metrics.pairwise import pairwise_distances_argmin

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

centers = [[1, 1], [-1, -1], [1, -1]]
clusters = len(centers)
X, Y = make_blobs(n_samples=300, centers=centers, cluster_std=0.7, random_state=28)

# Standard K-Means.
k_means = KMeans(init="k-means++", n_clusters=clusters, random_state=28)
t0 = time.time()
k_means.fit(X)
km_batch = time.time() - t0
print("K-Means training time: %.4fs" % km_batch)

# Mini Batch K-Means: fits on random mini-batches instead of the full data.
batch_size = 100
mbk = MiniBatchKMeans(init="k-means++", n_clusters=clusters,
                      batch_size=batch_size, random_state=28)
t0 = time.time()
mbk.fit(X)
mbk_batch = time.time() - t0
print("Mini Batch K-Means training time: %.4fs" % mbk_batch)

km_y_hat = k_means.predict(X)
mbk_y_hat = mbk.predict(X)

k_means_cluster_center = k_means.cluster_centers_
mbk_cluster_center = mbk.cluster_centers_
print("K-Means cluster centers:\n center=", k_means_cluster_center)
print("Mini Batch K-Means cluster centers:\n center=", mbk_cluster_center)
# Match each K-Means center with its nearest Mini Batch K-Means center.
order = pairwise_distances_argmin(k_means_cluster_center, mbk_cluster_center)

plt.figure(figsize=(12, 6), facecolor="w")
plt.subplots_adjust(left=0.05, right=0.95, bottom=0.05, top=0.9)
cm = mpl.colors.ListedColormap(['#FFC2CC', '#C2FFCC', '#CCC2FF'])
cm2 = mpl.colors.ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

plt.subplot(221)
plt.scatter(X[:, 0], X[:, 1], c=Y, s=6, cmap=cm, edgecolors="none")
plt.title("Original data distribution")
plt.xticks(())
plt.yticks(())
plt.grid(True)

plt.subplot(222)
plt.scatter(X[:, 0], X[:, 1], c=km_y_hat, s=6, cmap=cm, edgecolors='none')
plt.scatter(k_means_cluster_center[:, 0], k_means_cluster_center[:, 1],
            c=range(clusters), s=60, cmap=cm2, edgecolors='none')
plt.title("K-Means clustering result")
plt.xticks(())
plt.yticks(())
plt.text(-3.8, 3, 'train time: %.2fms' % (km_batch * 1000))
plt.grid(True)

plt.subplot(223)
plt.scatter(X[:, 0], X[:, 1], c=mbk_y_hat, s=6, cmap=cm, edgecolors='none')
plt.scatter(mbk_cluster_center[:, 0], mbk_cluster_center[:, 1],
            c=range(clusters), s=60, cmap=cm2, edgecolors='none')
plt.title("Mini Batch K-Means clustering result")
plt.xticks(())
plt.yticks(())
plt.text(-3.8, 3, 'train time: %.2fms' % (mbk_batch * 1000))
plt.grid(True)

plt.savefig("kmeans_vs_minibatch_kmeans.png")
plt.show()
```

Output:

```
K-Means training time: 0.2260s
Mini Batch K-Means training time: 0.0230s
K-Means cluster centers:
 center= [[ 0.96091862  1.13741775]
 [ 1.1979318  -1.02783007]
 [-0.98673669 -1.09398768]]
Mini Batch K-Means cluster centers:
 center= [[ 1.34304199 -1.01641075]
 [ 0.83760683  1.01229021]
 [-0.92702179 -1.08205992]]
```
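The `order` array computed above is worth a word: the two models may discover the same clusters but number them differently, and `pairwise_distances_argmin(A, B)` returns, for each row of A, the index of the nearest row of B, giving a mapping between the two labelings. A small example with hypothetical centers:

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances_argmin

a_centers = np.array([[0.0, 0.0], [5.0, 5.0]])
b_centers = np.array([[5.1, 4.9], [0.2, -0.1]])

# For each row of a_centers, the index of the nearest row in b_centers.
order = pairwise_distances_argmin(a_centers, b_centers)
print(order)  # [1 0]: a's cluster 0 matches b's cluster 1, and vice versa
```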
5. Evaluation metrics for clustering algorithms
```python
import time
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn import metrics
from sklearn.metrics.pairwise import pairwise_distances_argmin
# sklearn 0.19 import path; in newer versions use: from sklearn.datasets import make_blobs
from sklearn.datasets.samples_generator import make_blobs

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

centers = [[1, 1], [-1, -1], [1, -1]]
clusters = len(centers)
X, Y = make_blobs(n_samples=300, centers=centers, cluster_std=0.7, random_state=28)

k_means = KMeans(init="k-means++", n_clusters=clusters, random_state=28)
t0 = time.time()
k_means.fit(X)
km_batch = time.time() - t0
print("K-Means training time: %.4fs" % km_batch)

batch_size = 100
mbk = MiniBatchKMeans(init="k-means++", n_clusters=clusters,
                      batch_size=batch_size, random_state=28)
t0 = time.time()
mbk.fit(X)
mbk_batch = time.time() - t0
print("Mini Batch K-Means training time: %.4fs" % mbk_batch)

km_y_hat = k_means.labels_
mbkm_y_hat = mbk.labels_

k_means_cluster_centers = k_means.cluster_centers_
mbk_means_cluster_centers = mbk.cluster_centers_
print("K-Means cluster centers:\ncenter=", k_means_cluster_centers)
print("Mini Batch K-Means cluster centers:\ncenter=", mbk_means_cluster_centers)
order = pairwise_distances_argmin(k_means_cluster_centers,
                                  mbk_means_cluster_centers)

# External evaluation metrics: each compares predicted labels to ground truth.
score_funcs = [
    metrics.adjusted_rand_score,
    metrics.v_measure_score,
    metrics.adjusted_mutual_info_score,
    metrics.mutual_info_score,
]

for score_func in score_funcs:
    t0 = time.time()
    km_scores = score_func(Y, km_y_hat)
    print("K-Means %s: %.5f (computed in %0.3fs)"
          % (score_func.__name__, km_scores, time.time() - t0))

    t0 = time.time()
    mbkm_scores = score_func(Y, mbkm_y_hat)
    print("Mini Batch K-Means %s: %.5f (computed in %0.3fs)\n"
          % (score_func.__name__, mbkm_scores, time.time() - t0))
```

Output:

```
K-Means training time: 0.6350s
Mini Batch K-Means training time: 0.0900s
K-Means cluster centers:
center= [[ 0.96091862  1.13741775]
 [ 1.1979318  -1.02783007]
 [-0.98673669 -1.09398768]]
Mini Batch K-Means cluster centers:
center= [[ 1.34304199 -1.01641075]
 [ 0.83760683  1.01229021]
 [-0.92702179 -1.08205992]]
K-Means adjusted_rand_score: 0.72566 (computed in 0.071s)
Mini Batch K-Means adjusted_rand_score: 0.69544 (computed in 0.001s)

K-Means v_measure_score: 0.67529 (computed in 0.004s)
Mini Batch K-Means v_measure_score: 0.65055 (computed in 0.004s)

K-Means adjusted_mutual_info_score: 0.67263 (computed in 0.006s)
Mini Batch K-Means adjusted_mutual_info_score: 0.64731 (computed in 0.005s)

K-Means mutual_info_score: 0.74116 (computed in 0.002s)
Mini Batch K-Means mutual_info_score: 0.71351 (computed in 0.001s)
```
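A useful property of these external metrics is that they score the *partition*, not the label names, so they are unaffected by how a clustering algorithm happens to number its clusters. A small illustration with adjusted_rand_score on hand-made labelings:

```python
from sklearn.metrics import adjusted_rand_score

y_true = [0, 0, 1, 1, 2, 2]
y_perm = [1, 1, 2, 2, 0, 0]  # same partition, labels renamed
y_bad  = [0, 1, 0, 1, 0, 1]  # partition unrelated to y_true

print(adjusted_rand_score(y_true, y_perm))  # 1.0: ARI ignores label names
print(adjusted_rand_score(y_true, y_bad))   # negative: worse than chance
```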
轉(zhuǎn)載于:https://www.cnblogs.com/mfryf/p/9007524.html