Python Machine Learning Summary 14: Grid Search - Li Jun's blog post (ScienceNet)
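Machine learning involves two kinds of parameters. The first kind is learned from the training data; these can be regarded as identified (estimated) parameters, such as model coefficients.
The second kind must be optimized separately from training: hyperparameters, also called tuning parameters. They are settings of the learning algorithm itself, such as the depth of a decision tree.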
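Here is a minimal sketch of the distinction. It assumes scikit-learn's bundled copy of the breast-cancer data (load_breast_cancer) and uses the illustrative values C=1.0 and max_depth=3, which do not come from the original. The hyperparameters go to the constructor before fitting. The coefficients (coef_) and the tree structure are estimated by fit().

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Hyperparameters: chosen by us before training and passed to the constructor
lr = LogisticRegression(C=1.0, max_iter=1000)
tree = DecisionTreeClassifier(max_depth=3, random_state=1)

# Learned parameters: estimated from the training data inside fit()
lr.fit(StandardScaler().fit_transform(X), y)
tree.fit(X, y)

print(lr.get_params()["C"], tree.get_params()["max_depth"])  # 1.0 3 (fit() does not change them)
print(lr.coef_.shape)    # (1, 30): one learned coefficient per feature
print(tree.get_depth())  # depth the fitted tree actually reached, at most max_depth
```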
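Grid search: given a list of candidate values for each hyperparameter, exhaustively evaluate every combination and keep the best one. Its drawback is the high computational cost; randomized search is a cheaper alternative.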
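The sketch below makes the cost concrete. It uses ParameterGrid to count the candidates in the same parameter grid as the SVM example that follows. It then contrasts this with ParameterSampler, which scikit-learn's RandomizedSearchCV uses to draw a fixed number of candidates. The budget n_iter=10 is an arbitrary illustrative value. Passing a list of dicts to ParameterSampler assumes scikit-learn 0.22 or later.

```python
from sklearn.model_selection import ParameterGrid, ParameterSampler

param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{'clf__C': param_range, 'clf__kernel': ['linear']},
              {'clf__C': param_range, 'clf__gamma': param_range, 'clf__kernel': ['rbf']}]

# Grid search evaluates every combination: 8 (linear) + 8 * 8 (rbf) = 72
n_candidates = len(ParameterGrid(param_grid))
# Each candidate is fitted once per fold of 10-fold CV (plus one final refit of the best)
n_fits = n_candidates * 10
print(n_candidates, n_fits)  # 72 720

# Randomized search draws only n_iter candidates from the same space
sampled = list(ParameterSampler(param_grid, n_iter=10, random_state=1))
print(len(sampled))  # 10
```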
Example: training and tuning a support vector machine pipeline
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
# sklearn.model_selection replaces the removed sklearn.grid_search, sklearn.cross_validation and sklearn.learning_curve modules
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
###############################################################################
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
# Column 0 is the sample ID, column 1 the diagnosis (M/B), columns 2 onward the 30 features
from sklearn.preprocessing import LabelEncoder
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)  # 'B' -> 0, 'M' -> 1 (classes are sorted alphabetically)
#print(le.transform(['M', 'B']))
# Split the data into training and test sets (80% / 20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
###############################################################################
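pipe_svc = Pipeline([('scl', StandardScaler()), ('clf', SVC(random_state=1))])
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
# Pipeline parameters are named '<step>__<param>'; the linear kernel tunes only C, the RBF kernel tunes C and gamma
param_grid = [{'clf__C': param_range, 'clf__kernel': ['linear']},
              {'clf__C': param_range, 'clf__gamma': param_range, 'clf__kernel': ['rbf']}]
gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=10,
                  n_jobs=-1)  # n_jobs=-1 uses all CPU cores
gs = gs.fit(X_train, y_train)
print(gs.best_score_)   # mean cross-validated accuracy of the best combination
print(gs.best_params_)  # the best hyperparameter combination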
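This sketch shows the randomized-search alternative. It tunes the same SVM pipeline with RandomizedSearchCV, sampling C and gamma from log-uniform distributions. The distributions come from scipy.stats.loguniform, which needs SciPy 1.4 or later. The refit best model is then scored on the held-out test set. Several choices are assumptions: the ranges, n_iter=20, the RBF-only kernel, and load_breast_cancer. That function is scikit-learn's bundled copy of the WDBC data, so the script runs offline. Its labels are encoded 0 = malignant and 1 = benign, the reverse of the LabelEncoder mapping above.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)

pipe_svc = Pipeline([('scl', StandardScaler()), ('clf', SVC(kernel='rbf', random_state=1))])

# Continuous distributions instead of fixed lists; these ranges are illustrative
param_dist = {'clf__C': loguniform(1e-2, 1e3),
              'clf__gamma': loguniform(1e-4, 1e-1)}

rs = RandomizedSearchCV(estimator=pipe_svc,
                        param_distributions=param_dist,
                        n_iter=20,  # evaluate only 20 sampled candidates
                        scoring='accuracy',
                        cv=10,
                        random_state=1)
rs.fit(X_train, y_train)
print(rs.best_score_)
print(rs.best_params_)

# refit=True (the default) retrains the best pipeline on all of X_train,
# so rs.score() evaluates it on test data never used for tuning
test_acc = rs.score(X_test, y_test)
print('Test accuracy: %.3f' % test_acc)
```

The cost is fixed by n_iter (20 candidates x 10 folds) rather than by the size of the grid. This is the advantage over exhaustive grid search.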
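Nested cross-validation: when choosing among different machine learning algorithms, nested cross-validation is recommended instead of k-fold cross-validation alone. The score that was used to select the hyperparameters is optimistically biased, so it should not also serve as the performance estimate.
In the outer loop of nested cross-validation, the data are split into training folds and test folds;
in the inner loop (model selection), k-fold cross-validation is run on the training fold;
once model selection is complete, the test fold is used to evaluate the model's performance.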
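The three steps above can be sketched as an explicit loop. This is an illustration under assumptions, not the original author's code. It uses load_breast_cancer and a smaller hypothetical grid, with 5 outer folds and 2 inner folds to keep the run short. Passing a GridSearchCV object to cross_val_score, as in the script that follows, gives the same outer/inner structure without the hand-written loop.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
pipe = Pipeline([('scl', StandardScaler()), ('clf', SVC(random_state=1))])
param_grid = {'clf__C': [0.1, 1.0, 10.0], 'clf__gamma': [0.001, 0.01, 0.1]}  # hypothetical grid

outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_scores = []
for train_idx, test_idx in outer.split(X, y):
    # Outer loop: split the data into a training fold and a test fold
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]
    # Inner loop: k-fold CV (k=2 here) on the training fold only, to select hyperparameters
    inner = GridSearchCV(pipe, param_grid, scoring='accuracy', cv=2)
    inner.fit(X_tr, y_tr)
    # After model selection: score the refit best model on the untouched test fold
    outer_scores.append(inner.score(X_te, y_te))

print('Nested CV accuracy: %.3f +/- %.3f' % (np.mean(outer_scores), np.std(outer_scores)))
```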
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
###############################################################################
df = pd.read_csv('http://archive.ics.uci.edu/ml/machine-learning-databases/breast-cancer-wisconsin/wdbc.data', header=None)
# Column 0 is the sample ID, column 1 the diagnosis (M/B), columns 2 onward the 30 features
from sklearn.preprocessing import LabelEncoder
X = df.loc[:, 2:].values
y = df.loc[:, 1].values
le = LabelEncoder()
y = le.fit_transform(y)  # 'B' -> 0, 'M' -> 1 (classes are sorted alphabetically)
#print(le.transform(['M', 'B']))
# Split the data into training and test sets (80% / 20%)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=1)
###############################################################################
pipe_svc = Pipeline([('scl', StandardScaler()), ('clf', SVC(random_state=1))])
param_range = [0.0001, 0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
param_grid = [{'clf__C': param_range, 'clf__kernel': ['linear']},
              {'clf__C': param_range, 'clf__gamma': param_range, 'clf__kernel': ['rbf']}]
gs = GridSearchCV(estimator=pipe_svc,
                  param_grid=param_grid,
                  scoring='accuracy',
                  cv=10,
                  n_jobs=-1)
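from sklearn.model_selection import cross_val_score
# Outer loop: 5-fold CV; inner loop: the 10-fold grid search inside gs
scores = cross_val_score(gs, X, y, scoring='accuracy', cv=5)
print('CV accuracy: %.3f +/- %.3f' % (np.mean(scores), np.std(scores)))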
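Nested cross-validation is recommended for choosing between algorithms. The sketch below applies the same cross_val_score pattern to two candidates: the SVM pipeline and a decision tree tuned over max_depth. Several choices are assumptions made for speed: both grids, the 2-fold inner / 5-fold outer split, and load_breast_cancer. The algorithm with the higher mean outer score would be preferred.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Each candidate algorithm is wrapped in its own inner grid search (hypothetical grids)
candidates = {
    'SVC': GridSearchCV(
        Pipeline([('scl', StandardScaler()), ('clf', SVC(random_state=1))]),
        param_grid={'clf__C': [0.1, 1.0, 10.0], 'clf__gamma': [0.001, 0.01, 0.1]},
        scoring='accuracy', cv=2),
    'DecisionTree': GridSearchCV(
        DecisionTreeClassifier(random_state=1),
        param_grid={'max_depth': [1, 2, 3, 4, 5, 6, 7, None]},
        scoring='accuracy', cv=2),
}

results = {}
for name, gs in candidates.items():
    # Outer 5-fold loop; every outer training fold runs its own inner grid search
    scores = cross_val_score(gs, X, y, scoring='accuracy', cv=5)
    results[name] = scores
    print('%s: CV accuracy %.3f +/- %.3f' % (name, np.mean(scores), np.std(scores)))
```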
Reference: "Python Machine Learning" by Sebastian Raschka (Chinese edition, China Machine Press).
To repost this article, please contact the original author for permission and state that it comes from Li Jun's ScienceNet blog.
Link: http://blog.sciencenet.cn/blog-3377553-1137308.html