ML / XGBoost: An excellent translation on XGBoost parameter tuning, "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 2)
Contents
2. XGBoost Parameters
General Parameters
Booster Parameters
Learning Task Parameters
Original title: Complete Guide to Parameter Tuning in XGBoost with codes in Python
Original article: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
All rights belong to the original author; this post only provides a translation.
Related articles
ML / XGBoost: A detailed guide to the XGBoost algorithm model (with illustrations, including XGBoost parallel processing), key ideas, code implementation (objective/evaluation functions), installation, usage, and case studies
ML / XGBoost: A detailed guide to the Kaggle favorite XGBoost algorithm model: introduction (resources), installation, usage, and case studies
ML / XGBoost: An excellent translation on XGBoost parameter tuning, "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 1)
ML / XGBoost: An excellent translation on XGBoost parameter tuning, "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 2)
ML / XGBoost: An excellent translation on XGBoost parameter tuning, "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 3)
ML / XGBoost: An excellent translation on XGBoost parameter tuning, "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 4)
2. XGBoost Parameters
The overall parameters have been divided into 3 categories by the XGBoost authors:
- General Parameters: Guide the overall functioning
- Booster Parameters: Guide the individual booster (tree/regression) at each step
- Learning Task Parameters: Guide the optimization performed
I will give analogies to GBM here and highly recommend reading this article to learn gradient boosting from the very basics.
General Parameters
These define the overall functionality of XGBoost.
booster [default=gbtree]
- Select the type of model to run at each iteration. It has 2 options:
- gbtree: tree-based models
- gblinear: linear models
silent [default=0]
- Silent mode is activated if set to 1, i.e. no running messages will be printed.
- It is generally good to keep it 0, as the messages might help in understanding the model.
nthread [default = maximum number of threads available if not set]
- This is used for parallel processing; the number of cores in the system should be entered.
- If you wish to run on all cores, do not enter a value and the algorithm will detect it automatically.
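A minimal sketch (not part of the original article) of how these general parameters might be passed to the native xgboost API; the synthetic dataset and the value choices here are illustrative assumptions only.

```python
import numpy as np
import xgboost as xgb

# Tiny synthetic binary-classification dataset, purely for illustration.
X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'booster': 'gbtree',   # tree-based models; 'gblinear' selects linear models
    'silent': 1,           # 1 = no running messages (newer releases use 'verbosity' instead)
    'nthread': 4,          # number of cores to use; omit to let XGBoost use all available cores
}
model = xgb.train(params, dtrain, num_boost_round=10)
```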
There are 2 more parameters which are set automatically by XGBoost and you need not worry about them. Let's move on to the Booster parameters.
Booster Parameters
Though there are 2 types of boosters, I'll consider only the tree booster here because it almost always outperforms the linear booster, and thus the latter is rarely used.
eta [default=0.3]
- Analogous to the learning rate in GBM
- Makes the model more robust by shrinking the weights at each step
- Typical final values to be used: 0.01-0.2
min_child_weight [default=1]
- Defines the minimum sum of weights of all observations required in a child.
- This is similar to min_child_leaf in GBM, but not exactly: this refers to the minimum "sum of weights" of observations, while GBM has the minimum "number of observations".
- Used to control over-fitting. Higher values prevent a model from learning relations which might be highly specific to the particular sample selected for a tree.
- Too high values can lead to under-fitting; hence, it should be tuned using CV.
max_depth [default=6]
- The maximum depth of a tree, same as in GBM.
- Used to control over-fitting, as higher depth will allow the model to learn relations very specific to a particular sample.
- Should be tuned using CV.
- Typical values: 3-10
max_leaf_nodes
- The maximum number of terminal nodes or leaves in a tree.
- Can be defined in place of max_depth. Since binary trees are created, a depth of n would produce a maximum of 2^n leaves.
- If this is defined, GBM will ignore max_depth.
gamma [default=0]
- A node is split only when the resulting split gives a positive reduction in the loss function. Gamma specifies the minimum loss reduction required to make a split.
- Makes the algorithm conservative. The values can vary depending on the loss function and should be tuned.
max_delta_step [default=0]
- The maximum delta step we allow each tree's weight estimation to be. If the value is set to 0, it means there is no constraint. If it is set to a positive value, it can help make the update step more conservative.
- Usually this parameter is not needed, but it might help in logistic regression when the classes are extremely imbalanced.
- This is generally not used, but you can explore it further if you wish.
subsample [default=1]
- Same as the subsample of GBM. Denotes the fraction of observations to be randomly sampled for each tree.
- Lower values make the algorithm more conservative and prevent overfitting, but too-small values might lead to under-fitting.
- Typical values: 0.5-1
colsample_bytree [default=1]
- Similar to max_features in GBM. Denotes the fraction of columns to be randomly sampled for each tree.
- Typical values: 0.5-1
colsample_bylevel [default=1]
- Denotes the subsample ratio of columns for each split, at each level.
- I don't use this often because subsample and colsample_bytree will do the job for you, but you can explore it further if you feel so inclined.
lambda [default=1]
- L2 regularization term on weights (analogous to Ridge regression)
- This is used to handle the regularization part of XGBoost. Though many data scientists don't use it often, it should be explored to reduce overfitting.
alpha [default=0]
- L1 regularization term on weights (analogous to Lasso regression)
- Can be used in case of very high dimensionality so that the algorithm runs faster when implemented
scale_pos_weight [default=1]
- A value greater than 0 should be used in case of high class imbalance, as it helps in faster convergence.
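A minimal sketch (not part of the original article) that collects the tree-booster parameters above into one native-API call; the synthetic data and the starting values are illustrative assumptions and should be tuned with CV.

```python
import numpy as np
import xgboost as xgb

# Synthetic binary-classification data, for illustration only.
X = np.random.rand(200, 10)
y = np.random.randint(2, size=200)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'objective': 'binary:logistic',
    'eta': 0.1,               # learning rate; typical final values 0.01-0.2
    'min_child_weight': 1,    # minimum sum of instance weights required in a child
    'max_depth': 5,           # typical values 3-10
    'gamma': 0,               # minimum loss reduction required to make a split
    'max_delta_step': 0,      # 0 = no constraint on the weight update
    'subsample': 0.8,         # fraction of rows sampled per tree
    'colsample_bytree': 0.8,  # fraction of columns sampled per tree
    'lambda': 1,              # L2 regularization on weights
    'alpha': 0,               # L1 regularization on weights
    'scale_pos_weight': 1,    # raise above 1 for highly imbalanced classes
}
model = xgb.train(params, dtrain, num_boost_round=50)
```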
Learning Task Parameters
These parameters are used to define the optimization objective and the metric to be calculated at each step.
objective [default=reg:linear]
- This defines the loss function to be minimized. The mostly used values are:
- binary:logistic – logistic regression for binary classification, returns the predicted probability (not the class)
- multi:softmax – multiclass classification using the softmax objective, returns the predicted class (not probabilities); you also need to set an additional num_class (number of classes) parameter defining the number of unique classes
- multi:softprob – same as softmax, but returns the predicted probability of each data point belonging to each class
eval_metric [default according to objective]
- The metric to be used for validation data.
- The default values are rmse for regression and error for classification.
- Typical values are:
- rmse – root mean square error
- mae – mean absolute error
- logloss – negative log-likelihood
- error – binary classification error rate (0.5 threshold)
- merror – multiclass classification error rate
- mlogloss – multiclass logloss
- auc – area under the curve
seed [default=0]
- The random number seed.
- Can be used for generating reproducible results and also for parameter tuning.
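A minimal sketch (not part of the original article) of the learning task parameters for a 3-class problem; the data and the chosen metric are assumptions for illustration.

```python
import numpy as np
import xgboost as xgb

# Synthetic 3-class dataset, for illustration only.
X = np.random.rand(150, 8)
y = np.random.randint(3, size=150)
dtrain = xgb.DMatrix(X, label=y)

params = {
    'objective': 'multi:softmax',  # returns the predicted class; 'multi:softprob' returns probabilities
    'num_class': 3,                # required when using the multiclass objectives
    'eval_metric': 'mlogloss',     # metric reported for the evaluation data
    'seed': 27,                    # fix the seed for reproducible results
}
# evals makes XGBoost print the chosen eval_metric on the listed data at each round.
model = xgb.train(params, dtrain, num_boost_round=20, evals=[(dtrain, 'train')])
```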
If you've been using Scikit-Learn till now, these parameter names might not look familiar. The good news is that the xgboost module in Python has an sklearn wrapper called XGBClassifier, which uses the sklearn-style naming convention. The parameter names that change are:
- eta → learning_rate
- lambda → reg_lambda
- alpha → reg_alpha
You must be wondering why we have defined everything except something similar to the "n_estimators" parameter in GBM. Well, this exists as a parameter in XGBClassifier. However, it has to be passed as "num_boosting_rounds" while calling the fit function in the standard xgboost implementation.
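A minimal sketch (not part of the original article) contrasting the two interfaces on synthetic data; note that in the native API the keyword is spelled num_boost_round (the "num_boosting_rounds" above refers to the same idea).

```python
import numpy as np
import xgboost as xgb
from xgboost import XGBClassifier

X = np.random.rand(100, 5)
y = np.random.randint(2, size=100)

# sklearn wrapper: sklearn-style names such as learning_rate, n_estimators, reg_lambda, reg_alpha.
clf = XGBClassifier(learning_rate=0.1, n_estimators=100, max_depth=5)
clf.fit(X, y)

# Native API: eta, lambda, alpha, ...; the number of trees is passed to train(), not stored in params.
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({'eta': 0.1, 'max_depth': 5, 'objective': 'binary:logistic'},
                    dtrain, num_boost_round=100)
```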
I recommend you go through the following parts of the xgboost guide to better understand the parameters and the codes: