DNN Optimization Techniques in DL: Train a Custom MultiLayerNet [5*100 + ReLU] on the MNIST Dataset and Compare the Performance of Three Weight Initializations (std=0.01, Xavier initialization, He initialization)
Overview
# Idea: assign different weight initial values (std=0.01, the Xavier initial value, and the He initial value) and observe experimentally how much the choice affects how well the neural network learns.
# Conclusion: with std=0.01 learning makes no progress at all, because the values passed along in forward propagation are very small (concentrated around 0); consequently the gradients obtained in backpropagation are also very small, and the weights are barely updated. In contrast, with the Xavier and He initial values learning proceeds smoothly, and it progresses somewhat faster with the He initial value.
# Summary: in neural-network training the initial weight values matter a great deal; whether learning succeeds often depends on how they are set. Their importance is easy to overlook, yet the beginning of anything (its initial value) is always critical.
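To make the three settings concrete: with std=0.01 every weight is drawn from a Gaussian with a fixed standard deviation of 0.01, the Xavier initial value uses a standard deviation of 1/sqrt(n), and the He initial value uses sqrt(2/n), where n is the number of input nodes of the layer. The sketch below is a minimal illustration of these scales only; the helper name init_weight and the layer sizes are made up for the example and are not from the original post.

import numpy as np

def init_weight(n_in, n_out, how):
    """Return an (n_in, n_out) weight matrix drawn with the given initialization scheme."""
    if how == 'std=0.01':
        scale = 0.01                  # fixed small standard deviation
    elif how == 'Xavier':
        scale = 1.0 / np.sqrt(n_in)   # Xavier: suited to sigmoid/tanh activations
    elif how == 'He':
        scale = np.sqrt(2.0 / n_in)   # He: suited to ReLU activations
    else:
        raise ValueError(how)
    return scale * np.random.randn(n_in, n_out)

# e.g. the first layer of a 784 -> 100 -> ... network
W1 = init_weight(784, 100, 'He')
print(W1.std())   # roughly sqrt(2/784) ≈ 0.05

Because ReLU discards the negative half of its input, the extra factor of 2 in the He scale keeps the variance of the activations roughly constant from layer to layer, which is why it pairs well with the all-ReLU network used in this experiment.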
Contents
Output
Design approach
Core code
Output
===========iteration:0===========
std=0.01:2.302533896615576
Xavier:2.301592862642649
He:2.452819600404312
===========iteration:100===========
std=0.01:2.3021427450183882
Xavier:2.2492771742332085
He:1.614645290697084
===========iteration:200===========
std=0.01:2.3019226530108763
Xavier:2.142875264754691
He:0.8883226546097108
===========iteration:300===========
std=0.01:2.3021797231413514
Xavier:1.801154569414849
He:0.5779849031641334
===========iteration:400===========
std=0.01:2.3012695247928474
Xavier:1.3899007227604079
He:0.41014765063844627
===========iteration:500===========
std=0.01:2.3007728429528314
Xavier:0.9069490262118367
He:0.33691702821838565
===========iteration:600===========
std=0.01:2.298961977446477
Xavier:0.7562167106493611
He:0.3818234934485747
===========iteration:700===========
std=0.01:2.3035037771527715
Xavier:0.5636724725221689
He:0.21607562992114449
===========iteration:800===========
std=0.01:2.3034607224422023
Xavier:0.5658840865099287
He:0.33168882912900743
===========iteration:900===========
std=0.01:2.305051548224051
Xavier:0.588201820904584
He:0.2569635828759095
===========iteration:1000===========
std=0.01:2.2994594023429755
Xavier:0.4185962336886156
He:0.20020701131406038
===========iteration:1100===========
std=0.01:2.2981894831572904
Xavier:0.3963740567004913
He:0.25746657996551603
===========iteration:1200===========
std=0.01:2.2953607843932193
Xavier:0.41330568558866765
He:0.2796398422265146
===========iteration:1300===========
std=0.01:2.2964967978545396
Xavier:0.39618376387851506
He:0.2782019670206384
===========iteration:1400===========
std=0.01:2.299861702734514
Xavier:0.24832216447348573
He:0.1512273585162205
===========iteration:1500===========
std=0.01:2.3006214773891234
Xavier:0.3596899255315174
He:0.2719352219860638
===========iteration:1600===========
std=0.01:2.298109767745866
Xavier:0.35977950572647455
He:0.2650267112104039
===========iteration:1700===========
std=0.01:2.301979953517381
Xavier:0.23664052932406424
He:0.13415720105707601
===========iteration:1800===========
std=0.01:2.299083895357553
Xavier:0.2483172887982285
He:0.14187181238369628
===========iteration:1900===========
std=0.01:2.305385198129093
Xavier:0.3655424067819445
He:0.21497438379944553
Design approach
Core code
class MultiLayerNet:
    '……'
    def predict(self, x):
        for layer in self.layers.values():
            x = layer.forward(x)
        return x

    def loss(self, x, t):
        y = self.predict(x)
        weight_decay = 0
        for idx in range(1, self.hidden_layer_num + 2):
            W = self.params['W' + str(idx)]
            weight_decay += 0.5 * self.weight_decay_lambda * np.sum(W ** 2)
        return self.last_layer.forward(y, t) + weight_decay

    def accuracy(self, x, t):
        y = self.predict(x)
        y = np.argmax(y, axis=1)
        if t.ndim != 1:
            t = np.argmax(t, axis=1)
        accuracy = np.sum(y == t) / float(x.shape[0])  # compute accuracy and return it
        return accuracy

    def numerical_gradient(self, x, t):  # T1. numerical_gradient(): gradients via numerical differentiation
        loss_W = lambda W: self.loss(x, t)
        grads = {}
        for idx in range(1, self.hidden_layer_num + 2):
            grads['W' + str(idx)] = numerical_gradient(loss_W, self.params['W' + str(idx)])
            grads['b' + str(idx)] = numerical_gradient(loss_W, self.params['b' + str(idx)])
        return grads

    def gradient(self, x, t):
        self.loss(x, t)
        dout = 1
        dout = self.last_layer.backward(dout)
        layers = list(self.layers.values())
        layers.reverse()
        for layer in layers:
            dout = layer.backward(dout)
        grads = {}
        for idx in range(1, self.hidden_layer_num + 2):
            grads['W' + str(idx)] = self.layers['Affine' + str(idx)].dW + self.weight_decay_lambda * self.layers['Affine' + str(idx)].W
            grads['b' + str(idx)] = self.layers['Affine' + str(idx)].db
        return grads


networks = {}
train_loss = {}
for key, weight_type in weight_init_types.items():
    networks[key] = MultiLayerNet(input_size=784, hidden_size_list=[100, 100, 100, 100],
                                  output_size=10, weight_init_std=weight_type)
    train_loss[key] = []

for i in range(max_iterations):
    # define x_batch, t_batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]

    for key in weight_init_types.keys():
        grads = networks[key].gradient(x_batch, t_batch)
        optimizer.update(networks[key].params, grads)

        loss = networks[key].loss(x_batch, t_batch)
        train_loss[key].append(loss)

    if i % 100 == 0:
        print("===========" + "iteration:" + str(i) + "===========")
        for key in weight_init_types.keys():
            loss = networks[key].loss(x_batch, t_batch)
            print(key + ":" + str(loss))
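The training loop above refers to several names that the excerpt does not define (weight_init_types, optimizer, x_train, t_train, train_size, batch_size, max_iterations). The sketch below is one minimal setup that would be consistent with the printed log; the MNIST loader, the SGD class, the dictionary values passed as weight_init_std, and the hyperparameter values are assumptions for illustration, not taken from the original post. A loss-curve plot is added at the end to compare the three runs.

import numpy as np
import matplotlib.pyplot as plt
from dataset.mnist import load_mnist   # assumed MNIST loader (as in the "Deep Learning from Scratch" repo)
from common.optimizer import SGD       # assumed optimizer class with an update(params, grads) method

(x_train, t_train), (x_test, t_test) = load_mnist(normalize=True)

# keys match the labels in the printed log; the values are what the (elided)
# MultiLayerNet constructor is assumed to accept as weight_init_std
weight_init_types = {'std=0.01': 0.01, 'Xavier': 'sigmoid', 'He': 'relu'}

optimizer = SGD(lr=0.01)        # assumed learning rate
max_iterations = 2000           # assumed; the log above stops at iteration 1900
train_size = x_train.shape[0]
batch_size = 128                # assumed mini-batch size

# ... run the training loop shown above, then compare the loss curves:
x = np.arange(max_iterations)
for key in weight_init_types.keys():
    plt.plot(x, train_loss[key], label=key)
plt.xlabel("iterations")
plt.ylabel("loss")
plt.ylim(0, 2.5)
plt.legend()
plt.show()

With a setup like this, the std=0.01 curve stays flat near ln(10) ≈ 2.30 (the loss of a 10-class classifier that has learned nothing), while the Xavier and He curves fall quickly, the He curve slightly faster, matching the conclusion stated in the overview.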