Machine Learning with scikit-learn 0.19: the K-Means Clustering Algorithm
1. Background: clustering, similarity, and distance measures
2. The idea and workflow of the k-means algorithm
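The workflow this section refers to alternates two steps: assign each sample to its nearest centroid, then move each centroid to the mean of its assigned samples, repeating until the centroids stop moving. As a minimal illustration (this is a sketch, not sklearn's implementation; the function name `kmeans` and the toy data are made up for this example):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate the assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by sampling k distinct data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: index of the nearest centroid for every sample.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned samples.
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break  # converged: centroids no longer move
        centers = new_centers
    return centers, labels

X = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
centers, labels = kmeans(X, 2)
print(centers)  # two centroids, near x = 0.05 and x = 10.05
```

Note that this sketch does not handle the empty-cluster case; sklearn's implementation re-seeds empty clusters and also uses the smarter k-means++ initialization shown later in this article.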
3. Parameters of the k-means algorithm in sklearn
4. Code examples and a brief overview of the concepts used
(1) make_blobs: a generator for clustering data

sklearn.datasets.make_blobs(n_samples=100, n_features=2, centers=3, cluster_std=1.0, center_box=(-10.0, 10.0), shuffle=True, random_state=None)
It returns two values: X, an array of shape (n_samples, n_features) holding the generated samples, and y, an array of shape (n_samples,) with the integer label of the cluster each sample belongs to.
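A quick check of those return values (the parameter values here mirror the signature above):

```python
from sklearn.datasets import make_blobs

# X: (n_samples, n_features) points; y: (n_samples,) integer cluster labels
X, y = make_blobs(n_samples=100, n_features=2, centers=3,
                  cluster_std=1.0, random_state=28)
print(X.shape, y.shape, sorted(set(y)))  # (100, 2) (100,) [0, 1, 2]
```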
(2) np.vstack: stacking arrays vertically
詳細(xì)介紹參照博客鏈接:http://blog.csdn.net/csdn15698845876/article/details/73380803
```python
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
import sklearn.datasets as ds
from sklearn.cluster import KMeans

# Let matplotlib render Chinese characters and minus signs correctly.
mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

# Generate clustering data.
N = 1500
centers = 4
data, y = ds.make_blobs(N, n_features=2, centers=centers, random_state=28)
# A second dataset (generated identically in the source; the plot titles
# suggest varying cluster_std per cluster was intended).
data2, y2 = ds.make_blobs(N, n_features=2, centers=centers, random_state=28)
# Build an imbalanced dataset: 200/100/10/50 samples per cluster.
data3 = np.vstack((data[y == 0][:200], data[y == 1][:100],
                   data[y == 2][:10], data[y == 3][:50]))
y3 = np.array([0] * 200 + [1] * 100 + [2] * 10 + [3] * 50)

# Fit K-Means on the original data (the labels are not used by the fit).
km = KMeans(n_clusters=centers, random_state=28)
km.fit(data)
y_hat = km.predict(data)
print("Sum of squared distances to the nearest cluster center (inertia_):", km.inertia_)
print("Average per sample (inertia_ / N):", km.inertia_ / N)
print("Cluster centers:", km.cluster_centers_)

y_hat2 = km.fit_predict(data2)
y_hat3 = km.fit_predict(data3)

def expandBorder(a, b):
    """Widen an axis range by 10% on each side."""
    d = (b - a) * 0.1
    return a - d, b + d

cm = mpl.colors.ListedColormap(list("rgbmyc"))
plt.figure(figsize=(15, 9), facecolor="w")

# Subplot 1: original data with true labels.
plt.subplot(241)
plt.scatter(data[:, 0], data[:, 1], c=y, s=30, cmap=cm, edgecolors="none")
x1_min, x2_min = np.min(data, axis=0)
x1_max, x2_max = np.max(data, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data")
plt.grid(True)

# Subplot 2: K-Means result on the original data.
plt.subplot(242)
plt.scatter(data[:, 0], data[:, 1], c=y_hat, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("K-Means clustering result")
plt.grid(True)

# Apply a linear transformation (rotation/shear) to the data.
m = np.array(((1, 1), (0.5, 5)))
data_r = data.dot(m)
y_r_hat = km.fit_predict(data_r)

# Subplot 3: transformed data with true labels.
plt.subplot(243)
plt.scatter(data_r[:, 0], data_r[:, 1], c=y, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data_r, axis=0)
x1_max, x2_max = np.max(data_r, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data after rotation")
plt.grid(True)

# Subplot 4: K-Means prediction on the transformed data.
plt.subplot(244)
plt.scatter(data_r[:, 0], data_r[:, 1], c=y_r_hat, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Prediction after rotation")
plt.grid(True)

# Subplots 5-6: clusters with different variances.
plt.subplot(245)
plt.scatter(data2[:, 0], data2[:, 1], c=y2, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data2, axis=0)
x1_max, x2_max = np.max(data2, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data with different variances")
plt.grid(True)

plt.subplot(246)
plt.scatter(data2[:, 0], data2[:, 1], c=y_hat2, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("K-Means result on clusters with different variances")
plt.grid(True)

# Subplots 7-8: clusters with imbalanced sample counts.
plt.subplot(247)
plt.scatter(data3[:, 0], data3[:, 1], c=y3, s=30, cmap=cm, edgecolors='none')
x1_min, x2_min = np.min(data3, axis=0)
x1_max, x2_max = np.max(data3, axis=0)
x1_min, x1_max = expandBorder(x1_min, x1_max)
x2_min, x2_max = expandBorder(x2_min, x2_max)
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("Original data with imbalanced cluster sizes")
plt.grid(True)

plt.subplot(248)
plt.scatter(data3[:, 0], data3[:, 1], c=y_hat3, s=30, cmap=cm, edgecolors='none')
plt.xlim((x1_min, x1_max))
plt.ylim((x2_min, x2_max))
plt.title("K-Means result on imbalanced cluster sizes")
plt.grid(True)

plt.tight_layout(pad=2, rect=(0, 0, 1, 0.97))
plt.suptitle("Effect of data distribution on K-Means clustering", fontsize=18)
plt.savefig("k-means.png")
plt.show()
```

Output:

```
Sum of squared distances to the nearest cluster center (inertia_): 2592.9990199
Average per sample (inertia_ / N): 1.72866601327
Cluster centers: [[ -7.44342199e+00  -2.00152176e+00]
 [  5.80338598e+00   2.75272962e-03]
 [ -6.36176159e+00   6.94997331e+00]
 [  4.34372837e+00   1.33977807e+00]]
```
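The `inertia_` attribute printed above is the sum of *squared* distances of each sample to its nearest cluster center, which can be verified directly. A small check on made-up toy data:

```python
import numpy as np
from sklearn.cluster import KMeans

# Four points forming two obvious clusters around x = 0 and x = 10.
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

# inertia_ equals the sum of squared distances to each sample's own center.
d = ((X - km.cluster_centers_[km.labels_]) ** 2).sum()
assert np.isclose(km.inertia_, d)
print(km.inertia_)  # 4.0: each point is 1 unit from its center, 4 * 1^2
```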
Key concepts used in the code:
```python
import time
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
from sklearn.cluster import KMeans, MiniBatchKMeans
# sklearn 0.19 import path; in newer versions use: from sklearn.datasets import make_blobs
from sklearn.datasets.samples_generator import make_blobs
from sklearn.metrics.pairwise import pairwise_distances_argmin

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

centers = [[1, 1], [-1, -1], [1, -1]]
clusters = len(centers)
X, Y = make_blobs(n_samples=300, centers=centers, cluster_std=0.7, random_state=28)

# Standard K-Means.
k_means = KMeans(init="k-means++", n_clusters=clusters, random_state=28)
t0 = time.time()
k_means.fit(X)
km_batch = time.time() - t0
print("K-Means training time: %.4fs" % km_batch)

# Mini Batch K-Means: fits on random mini-batches instead of the full data.
batch_size = 100
mbk = MiniBatchKMeans(init="k-means++", n_clusters=clusters,
                      batch_size=batch_size, random_state=28)
t0 = time.time()
mbk.fit(X)
mbk_batch = time.time() - t0
print("Mini Batch K-Means training time: %.4fs" % mbk_batch)

km_y_hat = k_means.predict(X)
mbk_y_hat = mbk.predict(X)

k_means_cluster_center = k_means.cluster_centers_
mbk_cluster_center = mbk.cluster_centers_
print("K-Means cluster centers:\n center=", k_means_cluster_center)
print("Mini Batch K-Means cluster centers:\n center=", mbk_cluster_center)
# Match each K-Means center with its nearest Mini Batch K-Means center.
order = pairwise_distances_argmin(k_means_cluster_center, mbk_cluster_center)

plt.figure(figsize=(12, 6), facecolor="w")
plt.subplots_adjust(left=0.05, right=0.95, bottom=0.05, top=0.9)
cm = mpl.colors.ListedColormap(['#FFC2CC', '#C2FFCC', '#CCC2FF'])
cm2 = mpl.colors.ListedColormap(['#FF0000', '#00FF00', '#0000FF'])

plt.subplot(221)
plt.scatter(X[:, 0], X[:, 1], c=Y, s=6, cmap=cm, edgecolors="none")
plt.title("Original data distribution")
plt.xticks(())
plt.yticks(())
plt.grid(True)

plt.subplot(222)
plt.scatter(X[:, 0], X[:, 1], c=km_y_hat, s=6, cmap=cm, edgecolors='none')
plt.scatter(k_means_cluster_center[:, 0], k_means_cluster_center[:, 1],
            c=range(clusters), s=60, cmap=cm2, edgecolors='none')
plt.title("K-Means clustering result")
plt.xticks(())
plt.yticks(())
plt.text(-3.8, 3, 'train time: %.2fms' % (km_batch * 1000))
plt.grid(True)

plt.subplot(223)
plt.scatter(X[:, 0], X[:, 1], c=mbk_y_hat, s=6, cmap=cm, edgecolors='none')
plt.scatter(mbk_cluster_center[:, 0], mbk_cluster_center[:, 1],
            c=range(clusters), s=60, cmap=cm2, edgecolors='none')
plt.title("Mini Batch K-Means clustering result")
plt.xticks(())
plt.yticks(())
plt.text(-3.8, 3, 'train time: %.2fms' % (mbk_batch * 1000))
plt.grid(True)

plt.savefig("kmeans_vs_minibatch_kmeans.png")
plt.show()
```

Output:

```
K-Means training time: 0.2260s
Mini Batch K-Means training time: 0.0230s
K-Means cluster centers:
 center= [[ 0.96091862  1.13741775]
 [ 1.1979318  -1.02783007]
 [-0.98673669 -1.09398768]]
Mini Batch K-Means cluster centers:
 center= [[ 1.34304199 -1.01641075]
 [ 0.83760683  1.01229021]
 [-0.92702179 -1.08205992]]
```
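The `order` array computed above is worth a word: the two models may discover the same clusters but number them differently, and `pairwise_distances_argmin(A, B)` returns, for each row of A, the index of the nearest row of B, giving a mapping between the two labelings. A small example with hypothetical centers:

```python
import numpy as np
from sklearn.metrics.pairwise import pairwise_distances_argmin

a_centers = np.array([[0.0, 0.0], [5.0, 5.0]])
b_centers = np.array([[5.1, 4.9], [0.2, -0.1]])

# For each row of a_centers, the index of the nearest row in b_centers.
order = pairwise_distances_argmin(a_centers, b_centers)
print(order)  # [1 0]: a's cluster 0 matches b's cluster 1, and vice versa
```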
5. Evaluation metrics for clustering algorithms
```python
import time
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.colors
from sklearn.cluster import KMeans, MiniBatchKMeans
from sklearn import metrics
from sklearn.metrics.pairwise import pairwise_distances_argmin
# sklearn 0.19 import path; in newer versions use: from sklearn.datasets import make_blobs
from sklearn.datasets.samples_generator import make_blobs

mpl.rcParams['font.sans-serif'] = [u'SimHei']
mpl.rcParams['axes.unicode_minus'] = False

centers = [[1, 1], [-1, -1], [1, -1]]
clusters = len(centers)
X, Y = make_blobs(n_samples=300, centers=centers, cluster_std=0.7, random_state=28)

k_means = KMeans(init="k-means++", n_clusters=clusters, random_state=28)
t0 = time.time()
k_means.fit(X)
km_batch = time.time() - t0
print("K-Means training time: %.4fs" % km_batch)

batch_size = 100
mbk = MiniBatchKMeans(init="k-means++", n_clusters=clusters,
                      batch_size=batch_size, random_state=28)
t0 = time.time()
mbk.fit(X)
mbk_batch = time.time() - t0
print("Mini Batch K-Means training time: %.4fs" % mbk_batch)

km_y_hat = k_means.labels_
mbkm_y_hat = mbk.labels_

k_means_cluster_centers = k_means.cluster_centers_
mbk_means_cluster_centers = mbk.cluster_centers_
print("K-Means cluster centers:\ncenter=", k_means_cluster_centers)
print("Mini Batch K-Means cluster centers:\ncenter=", mbk_means_cluster_centers)
order = pairwise_distances_argmin(k_means_cluster_centers,
                                  mbk_means_cluster_centers)

# External evaluation metrics: each compares predicted labels to ground truth.
score_funcs = [
    metrics.adjusted_rand_score,
    metrics.v_measure_score,
    metrics.adjusted_mutual_info_score,
    metrics.mutual_info_score,
]

for score_func in score_funcs:
    t0 = time.time()
    km_scores = score_func(Y, km_y_hat)
    print("K-Means %s: %.5f (computed in %0.3fs)"
          % (score_func.__name__, km_scores, time.time() - t0))

    t0 = time.time()
    mbkm_scores = score_func(Y, mbkm_y_hat)
    print("Mini Batch K-Means %s: %.5f (computed in %0.3fs)\n"
          % (score_func.__name__, mbkm_scores, time.time() - t0))
```

Output:

```
K-Means training time: 0.6350s
Mini Batch K-Means training time: 0.0900s
K-Means cluster centers:
center= [[ 0.96091862  1.13741775]
 [ 1.1979318  -1.02783007]
 [-0.98673669 -1.09398768]]
Mini Batch K-Means cluster centers:
center= [[ 1.34304199 -1.01641075]
 [ 0.83760683  1.01229021]
 [-0.92702179 -1.08205992]]
K-Means adjusted_rand_score: 0.72566 (computed in 0.071s)
Mini Batch K-Means adjusted_rand_score: 0.69544 (computed in 0.001s)

K-Means v_measure_score: 0.67529 (computed in 0.004s)
Mini Batch K-Means v_measure_score: 0.65055 (computed in 0.004s)

K-Means adjusted_mutual_info_score: 0.67263 (computed in 0.006s)
Mini Batch K-Means adjusted_mutual_info_score: 0.64731 (computed in 0.005s)

K-Means mutual_info_score: 0.74116 (computed in 0.002s)
Mini Batch K-Means mutual_info_score: 0.71351 (computed in 0.001s)
```
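A useful property of these external metrics is that they score the *partition*, not the label names, so they are unaffected by how a clustering algorithm happens to number its clusters. A small illustration with adjusted_rand_score on hand-made labelings:

```python
from sklearn.metrics import adjusted_rand_score

y_true = [0, 0, 1, 1, 2, 2]
y_perm = [1, 1, 2, 2, 0, 0]  # same partition, labels renamed
y_bad  = [0, 1, 0, 1, 0, 1]  # partition unrelated to y_true

print(adjusted_rand_score(y_true, y_perm))  # 1.0: ARI ignores label names
print(adjusted_rand_score(y_true, y_bad))   # negative: worse than chance
```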
轉(zhuǎn)載于:https://www.cnblogs.com/mfryf/p/9007524.html