ML / XGBoost: Translation of the excellent article "Complete Guide to Parameter Tuning in XGBoost with codes in Python" (Part 1)
Contents
Overview
Introduction
What should you know?
1. The XGBoost Advantage
Original title: "Complete Guide to Parameter Tuning in XGBoost with codes in Python"
Original URL: https://www.analyticsvidhya.com/blog/2016/03/complete-guide-parameter-tuning-xgboost-with-codes-python/
All rights belong to the original author; this post is only a translation.
Overview
- XGBoost is a powerful machine learning algorithm, especially where speed and accuracy are concerned.
- When implementing an XGBoost model, we need to consider the different parameters and the values to specify for them.
- An XGBoost model requires parameter tuning to improve it and to fully leverage its advantages over other algorithms.
Introduction
If things don’t go your way in predictive modeling, use XGBoost. The XGBoost algorithm has become the ultimate weapon of many data scientists. It is a highly sophisticated algorithm, powerful enough to deal with all sorts of irregularities in data.
Building a model using XGBoost is easy. But improving the model using XGBoost is difficult (at least I struggled a lot). The algorithm uses multiple parameters, and to improve the model, parameter tuning is a must. It is very difficult to get answers to practical questions such as: Which set of parameters should you tune? What are the ideal values of these parameters to obtain optimal output?
This article is best suited to people who are new to XGBoost. In it, we’ll learn the art of parameter tuning along with some useful information about XGBoost. We’ll also practice this algorithm using a data set in Python.
What should you know?
XGBoost (eXtreme Gradient Boosting) is an advanced implementation of the gradient boosting algorithm. Since I covered Gradient Boosting Machine in detail in my previous article, Complete Guide to Parameter Tuning in Gradient Boosting (GBM) in Python, I highly recommend going through that before reading further. It will help you bolster your understanding of boosting in general and parameter tuning for GBM.
Special Thanks: Personally, I would like to acknowledge the timeless support provided by Mr. Sudalai Rajkumar (aka SRK), currently AV Rank 2. This article wouldn’t be possible without his help. He is helping us guide thousands of data scientists. A big thanks to SRK!
1. The XGBoost Advantage
I’ve always admired the boosting capabilities that this algorithm infuses in a predictive model. When I explored its performance and the science behind its high accuracy, I discovered many advantages:
- The standard GBM implementation has no regularization, whereas XGBoost does, which helps it reduce overfitting.
- In fact, XGBoost is also known as a ‘regularized boosting’ technique.
- XGBoost implements parallel processing and is blazingly fast compared to GBM.
- But hang on, we know that boosting is a sequential process, so how can it be parallelized? We know that each tree can be built only after the previous one, so what stops us from making a tree using all cores? I hope you get where I’m coming from. Check this link out to explore further.
- XGBoost also supports implementation on Hadoop.
- XGBoost allows users to define custom optimization objectives and evaluation criteria.
- This adds a whole new dimension to the model and there is no limit to what we can do.
- XGBoost has an in-built routine to handle missing values.
- The user is required to supply a value that differs from the other observations and pass it as a parameter. XGBoost tries different things as it encounters a missing value at each node and learns which path to take for missing values in future.
- A GBM would stop splitting a node when it encounters a negative loss in the split. Thus it is more of a greedy algorithm.
- XGBoost, on the other hand, makes splits up to the max_depth specified and then starts pruning the tree backwards, removing splits beyond which there is no positive gain.
- Another advantage is that sometimes a split with a negative loss, say -2, may be followed by a split with a positive loss of +10. GBM would stop as soon as it encounters the -2. But XGBoost will go deeper, see the combined effect of +8 from the two splits, and keep both.
- XGBoost allows the user to run a cross-validation at each iteration of the boosting process, so it is easy to get the exact optimum number of boosting iterations in a single run.
- This is unlike GBM, where we have to run a grid search and only a limited number of values can be tested.
- The user can start training an XGBoost model from the last iteration of a previous run. This can be a significant advantage in certain specific applications.
- The GBM implementation in sklearn also has this feature, so the two are even on this point.
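The built-in cross-validation point in the list above can be made concrete with a small, library-free sketch. It assumes we already have the mean cross-validated error after each boosting round and picks the round to stop at; the function name `best_boosting_round`, the `patience` argument, and the error values are all hypothetical and chosen for illustration. In the xgboost package itself, `xgb.cv` with `early_stopping_rounds` plays a similar role, but this is not its implementation.

```python
def best_boosting_round(cv_errors, patience):
    """Return (best_round, best_error) using a simple early-stopping rule.

    cv_errors[i] is assumed to be the mean cross-validated error after
    i + 1 boosting rounds. Scanning stops once `patience` rounds have
    passed without improving on the best error seen so far.
    """
    best_round, best_error = 0, float("inf")
    for round_number, error in enumerate(cv_errors, start=1):
        if error < best_error:
            best_round, best_error = round_number, error
        elif round_number - best_round >= patience:
            break  # no improvement for `patience` rounds: stop scanning
    return best_round, best_error


# Made-up error curve for illustration only: it improves, then degrades.
illustrative_errors = [0.30, 0.24, 0.21, 0.20, 0.205, 0.21, 0.22, 0.19]
print(best_boosting_round(illustrative_errors, patience=3))
```

Because the error is checked after every round, one pass over the curve is enough to choose the number of rounds, whereas a grid search would only compare a handful of preselected values.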
I hope you now understand the sheer power of the XGBoost algorithm. Note that these are the points I could muster. Do you know a few more? Feel free to drop a comment below and I will update the list.
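To make the tree-pruning point in the list above concrete, here is a minimal sketch that treats the -2 and +10 from the text as the gains of two successive splits along a single path of a tree. Both function names are hypothetical, and the single-path model is a simplifying assumption; this is not the code of either library.

```python
def greedy_splits(gains):
    """Stop at the first split whose gain is not positive."""
    kept = []
    for gain in gains:
        if gain <= 0:
            break
        kept.append(gain)
    return kept


def grow_then_prune(gains, max_depth):
    """Grow up to max_depth splits, then prune backwards from the deepest.

    The deepest remaining split is removed repeatedly while its gain is
    not positive. A negative split that sits above a positive one is
    therefore kept.
    """
    kept = list(gains[:max_depth])
    while kept and kept[-1] <= 0:
        kept.pop()
    return kept


path_gains = [-2, 10]  # the example from the text
print(greedy_splits(path_gains))
print(grow_then_prune(path_gains, max_depth=2))
print(sum(grow_then_prune(path_gains, max_depth=2)))
```

Under these assumptions the greedy rule keeps no splits, while growing first and pruning afterwards keeps both, for a combined gain of +8.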
Did I whet your appetite? Good. You can refer to the following web pages for a deeper understanding:
- XGBoost Guide – Introduction to Boosted Trees
- Words from the Author of XGBoost [Video]
