SVM (RBF) Face Recognition: Hands-on Lesson on Face Recognition with Machine Learning
Experiment principle:
A support vector machine (SVM) is a classification algorithm. It improves the generalization ability of the learner by seeking to minimize structural risk, that is, by minimizing the empirical risk together with the confidence interval, so that good statistical regularities can still be obtained even when the number of training samples is small. Put simply, it is a binary classification model whose basic form is the linear classifier with the largest margin in feature space: the SVM learning strategy is margin maximization, which can ultimately be reduced to solving a convex quadratic programming problem.
Principle in detail:
1. Find a separating hyperplane in n-dimensional space that splits the points into classes; linear classification with a straight separating boundary is the simplest case.
2. In general, the distance of a point from the hyperplane indicates how confident (or accurate) the classification prediction is. The SVM maximizes this margin, and the points lying on the margin boundaries are called support vectors.
3. In practice we frequently meet samples that are not linearly separable. The usual remedy is to map the sample features into a higher-dimensional space (a small sketch follows this list).
4. Mapping linearly non-separable data into a high-dimensional space can drive the dimensionality up dramatically (to 19 dimensions or even infinitely many in some examples), making the computation expensive. The value of a kernel function is that, although it also corresponds to a low-to-high-dimensional feature transformation, it does its computation in the low-dimensional space in advance while the actual classification effect is realized in the high-dimensional space, thereby avoiding the costly computation directly in high dimensions.
5. Use slack variables to handle noise in the data.
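To make points 3 to 5 concrete, here is a minimal, hypothetical sketch that is not part of the original experiment: it fits a linear and an RBF-kernel SVC on a toy ring-shaped dataset from sklearn.datasets.make_circles, where a linear model cannot separate the classes but the RBF kernel, computed entirely on the low-dimensional inputs, can.
# Minimal sketch (not from the original lab): linear vs. RBF kernel on a toy,
# linearly non-separable dataset; the dataset and parameter values are illustrative.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_circles(n_samples=400, factor=0.3, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    # C is the slack-variable penalty from point 5; gamma only matters for RBF.
    clf = SVC(kernel=kernel, C=1.0, gamma='auto')
    clf.fit(X_train, y_train)
    # The RBF kernel evaluates exp(-gamma*|u-v|^2) on the original 2-D points,
    # which is equivalent to a dot product in a much higher-dimensional feature
    # space (the kernel trick described above).
    print(kernel, "test accuracy:", clf.score(X_test, y_test))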
The structure of the SVM in sklearn and the meaning of each parameter are as follows.
sklearn.svm.SVC:
sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)
Parameter description:
C: penalty parameter C of C-SVC; default 1.0.
A larger C penalizes the slack variables more heavily, pushing them toward 0, i.e. the penalty for misclassification grows and the model tends to classify the entire training set correctly; training accuracy is then high but generalization is weak. A smaller C reduces the penalty for misclassification and tolerates some errors, treating them as noise, which gives stronger generalization.
kernel: kernel function, default 'rbf'; one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed':
0 – linear: u'*v
1 – polynomial: (gamma*u'*v + coef0)^degree
2 – RBF: exp(-gamma*|u-v|^2)
3 – sigmoid: tanh(gamma*u'*v + coef0)
degree: degree of the 'poly' kernel, default 3; ignored by the other kernels.
gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. Default 'auto', which uses 1/n_features.
coef0: independent term of the kernel function; only used by 'poly' and 'sigmoid'.
probability: whether to enable probability estimates. Default False.
shrinking: whether to use the shrinking heuristic. Default True.
tol: tolerance for the stopping criterion. Default 1e-3.
cache_size: size of the kernel cache in MB. Default 200.
class_weight: class weights, passed as a dict. Sets the parameter C of a given class to weight*C (the C of C-SVC).
verbose: enable verbose output.
max_iter: maximum number of iterations; -1 means no limit.
decision_function_shape: 'ovo', 'ovr' or None, default=None.
random_state: seed used when shuffling the data, an int.
The parameters that mainly need tuning are C, kernel, degree, gamma and coef0.
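As a quick orientation, the snippet below shows how these main parameters appear when constructing an SVC; the values are placeholders for illustration only, not recommendations from this lesson (in the experiment below, C and gamma are selected by grid search).
from sklearn.svm import SVC

# Illustrative values only; in practice C and gamma are chosen by
# cross-validated grid search, as done later in this experiment.
clf = SVC(
    C=1.0,             # penalty on slack variables (soft-margin trade-off)
    kernel='rbf',      # 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'
    degree=3,          # only used by the 'poly' kernel
    gamma='auto',      # kernel coefficient; 'auto' means 1/n_features
    coef0=0.0,         # independent term, used by 'poly' and 'sigmoid'
    class_weight=None, # e.g. 'balanced' to reweight rare classes
)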
System environment
Linux Ubuntu 16.04
Python 3.6
Task
Use the SVM algorithm to perform face recognition on the fetch_lfw_people data and visualize the predictions.
Steps
1. Create the directory and download the data needed for the experiment.
mkdir -p /home/zhangyu/scikit_learn_data/lfw_home
cd /home/zhangyu/scikit_learn_data/lfw_home
wget http://192.168.1.100:60000/allfiles/ma_learn/lfwfunneled.tgz
wget http://192.168.1.100:60000/allfiles/ma_learn/pairsDevTest.txt
wget http://192.168.1.100:60000/allfiles/ma_learn/pairsDevTrain.txt
wget http://192.168.1.100:60000/allfiles/ma_learn/pairs.txt
tar xzvf lfwfunneled.tgz
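The URLs above point to the course's internal file server. If that server is not available, fetch_lfw_people can also download the LFW data by itself; the sketch below is an assumption about how to point it at the cache directory created above, using the data_home argument.
from sklearn.datasets import fetch_lfw_people

# scikit-learn looks for the data under <data_home>/lfw_home; if the archive
# was unpacked there manually (as in step 1) no download is triggered,
# otherwise the data is fetched on first use.
lfw_people = fetch_lfw_people(
    data_home='/home/zhangyu/scikit_learn_data',  # directory from step 1
    min_faces_per_person=70,
    resize=0.4,
)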
2. Create a new Python project named python15.
Under the python15 project, create a new Python file named SVM.
3. Use the SVM algorithm to perform face recognition on the fetch_lfw_people data and visualize the predictions. The complete code is as follows:
from __future__ import print_function
from time import time
import logging
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC

# Display progress logs on stdout
logging.basicConfig(level=logging.INFO, format='%(asctime)s %(message)s')

###############################################################################
# Download the data, if not already on disk and load it as numpy arrays
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape

# for machine learning we use the 2 data directly (as relative pixel
# positions info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]

# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)

###############################################################################
# Split into a training set and a test set using a stratified k fold

# split into a training and testing set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25)

###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(svd_solver='randomized', n_components=n_components, whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))

###############################################################################
# Train a SVM classification model
print("Fitting the classifier to the training set")
t0 = time()
param_grid = {'C': [1e3, 5e3, 1e4, 5e4, 1e5],
              'gamma': [0.0001, 0.0005, 0.001, 0.005, 0.01, 0.1], }
clf = GridSearchCV(SVC(kernel='rbf', class_weight='balanced'), param_grid)
clf = clf.fit(X_train_pca, y_train)
print("done in %0.3fs" % (time() - t0))
print("Best estimator found by grid search:")
print(clf.best_estimator_)

###############################################################################
# Quantitative evaluation of the model quality on the test set
print("Predicting people's names on the test set")
t0 = time()
y_pred = clf.predict(X_test_pca)
print("done in %0.3fs" % (time() - t0))

print(classification_report(y_test, y_pred, target_names=target_names))
print(confusion_matrix(y_test, y_pred, labels=range(n_classes)))

###############################################################################
# Qualitative evaluation of the predictions using matplotlib
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())

# plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# plot the gallery of the most significative eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
4. Breaking the complete code down part by part: first, use import to bring in the packages needed for the experiment.
from __future__ import print_function
from time import time
import logging
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.datasets import fetch_lfw_people
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
from sklearn.decomposition import PCA
from sklearn.svm import SVC
5. Load the data
lfw_people = fetch_lfw_people(min_faces_per_person=70, resize=0.4)

# introspect the images arrays to find the shapes (for plotting)
n_samples, h, w = lfw_people.images.shape

# for machine learning we use the 2 data directly (as relative pixel
# positions info is ignored by this model)
X = lfw_people.data
n_features = X.shape[1]

# the label to predict is the id of the person
y = lfw_people.target
target_names = lfw_people.target_names
n_classes = target_names.shape[0]

print("Total dataset size:")
print("n_samples: %d" % n_samples)
print("n_features: %d" % n_features)
print("n_classes: %d" % n_classes)
Output:
6. Feature extraction
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25)

###############################################################################
# Compute a PCA (eigenfaces) on the face dataset (treated as unlabeled
# dataset): unsupervised feature extraction / dimensionality reduction
n_components = 150

print("Extracting the top %d eigenfaces from %d faces"
      % (n_components, X_train.shape[0]))
t0 = time()
pca = PCA(svd_solver='randomized', n_components=n_components, whiten=True).fit(X_train)
print("done in %0.3fs" % (time() - t0))

eigenfaces = pca.components_.reshape((n_components, h, w))

print("Projecting the input data on the eigenfaces orthonormal basis")
t0 = time()
X_train_pca = pca.transform(X_train)
X_test_pca = pca.transform(X_test)
print("done in %0.3fs" % (time() - t0))
Output:
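An optional sanity check that is not in the original script: PCA exposes explained_variance_ratio_, which tells you how much of the pixel variance the 150 retained eigenfaces actually capture.
# Optional check (not part of the original code): total variance captured
# by the retained components.
explained = pca.explained_variance_ratio_.sum()
print("Variance explained by %d components: %.1f%%" % (n_components, 100 * explained))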
7. Build the SVM classification model
print("Fitting?the?classifier?to?the?training?set")??
t0?=?time()??
param_grid?=?{'C':?[1e3,?5e3,?1e4,?5e4,?1e5],??
??????????????'gamma':?[0.0001,?0.0005,?0.001,?0.005,?0.01,?0.1],?}??
clf?=?GridSearchCV(SVC(kernel='rbf',?class_weight='balanced'),?param_grid)??
clf?=?clf.fit(X_train_pca,?y_train)??
print("done?in?%0.3fs"?%?(time()?-?t0))??
print("Best?estimator?found?by?grid?search:")??
print(clf.best_estimator_)??
Output:
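Besides best_estimator_, a fitted GridSearchCV also exposes the winning parameter combination and its cross-validated score; the two lines below are an optional addition, not part of the original script.
# Optional inspection of the grid search results (not in the original code).
print("Best parameters:", clf.best_params_)
print("Best cross-validated score: %.3f" % clf.best_score_)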
8. Model evaluation
print("Predicting?people's?names?on?the?test?set")??
t0?=?time()??
y_pred?=?clf.predict(X_test_pca)??
print("done?in?%0.3fs"?%?(time()?-?t0))??
print(classification_report(y_test,?y_pred,?target_names=target_names))??
print(confusion_matrix(y_test,?y_pred,?labels=range(n_classes)))??
Output:
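classification_report already gives per-class precision, recall and F1; if a single overall figure is wanted, accuracy_score can be added as below (an optional extra, not part of the original evaluation).
# Optional overall accuracy (not in the original script).
from sklearn.metrics import accuracy_score
print("Overall accuracy: %.3f" % accuracy_score(y_test, y_pred))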
9. Visualize the predictions
def plot_gallery(images, titles, h, w, n_row=3, n_col=4):
    """Helper function to plot a gallery of portraits"""
    plt.figure(figsize=(1.8 * n_col, 2.4 * n_row))
    plt.subplots_adjust(bottom=0, left=.01, right=.99, top=.90, hspace=.35)
    for i in range(n_row * n_col):
        plt.subplot(n_row, n_col, i + 1)
        plt.imshow(images[i].reshape((h, w)), cmap=plt.cm.gray)
        plt.title(titles[i], size=12)
        plt.xticks(())
        plt.yticks(())

# plot the result of the prediction on a portion of the test set
def title(y_pred, y_test, target_names, i):
    pred_name = target_names[y_pred[i]].rsplit(' ', 1)[-1]
    true_name = target_names[y_test[i]].rsplit(' ', 1)[-1]
    return 'predicted: %s\ntrue:      %s' % (pred_name, true_name)

prediction_titles = [title(y_pred, y_test, target_names, i)
                     for i in range(y_pred.shape[0])]

plot_gallery(X_test, prediction_titles, h, w)

# plot the gallery of the most significative eigenfaces
eigenface_titles = ["eigenface %d" % i for i in range(eigenfaces.shape[0])]
plot_gallery(eigenfaces, eigenface_titles, h, w)

plt.show()
Output:
Eigenfaces:
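If the script is run on a machine without a display (for example over SSH), plt.show() will not open any windows. One alternative, not part of the original lesson, is to save the two galleries to image files instead; the file names below are arbitrary.
# Optional alternative to plt.show() for headless environments
# (not part of the original script): write each open figure to a PNG file.
import matplotlib.pyplot as plt
for num in plt.get_fignums():
    plt.figure(num).savefig('gallery_%d.png' % num)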
Summary
This lesson used PCA to extract eigenfaces from the fetch_lfw_people data, trained an RBF-kernel SVM whose C and gamma were selected by grid search, evaluated it with a classification report and confusion matrix, and visualized the predictions and the eigenfaces.