當前位置：首頁 > 编程资源 > 编程问答 >内容正文

编程问答

UFLDL教程： Exercise: Sparse Autoencoder

發布時間：2023/12/13 编程问答 28 豆豆

生活随笔收集整理的這篇文章主要介紹了 UFLDL教程： Exercise: Sparse Autoencoder 小編覺得挺不錯的,現在分享給大家,幫大家做個參考.

自編碼可以跟PCA 一樣，給特征屬性降維

一些matlab函數

bsxfun:C=bsxfun(fun,A,B)表達的是兩個數組A和B間元素的二值操作，fun是函數句柄或者m文件，或者是內嵌的函數。在實際使用過程中fun有很多選擇比如說加，減等，前面需要使用符號’@’.一般情況下A和B需要尺寸大小相同，如果不相同的話，則只能有一個維度不同，同時A和B中在該維度處必須有一個的維度為1。比如說bsxfun(@minus, A, mean(A))，其中A和mean(A)的大小是不同的，這里的意思需要先將mean(A)擴充到和A大小相同，然后用A的每個元素減去擴充后的mean(A)對應元素的值。rand：生成均勻分布的偽隨機數。分布在（0~1）之間主要語法：rand(m,n)生成m行n列的均勻分布的偽隨機數rand(m,n,'double')生成指定精度的均勻分布的偽隨機數，參數還可以是'single'rand(RandStream,m,n)利用指定的RandStream(我理解為隨機種子)生成偽隨機數randn：生成標準正態分布的偽隨機數（均值為0，方差為1）。主要語法：和上面一樣randi：生成均勻分布的偽隨機整數主要語法：randi（iMax）在閉區間（0，iMax）生成均勻分布的偽隨機整數 randi（iMax，m，n）在閉區間（0，iMax）生成mXn型隨機矩陣r = randi([iMin,iMax],m,n)在閉區間（iMin，iMax）生成mXn型隨機矩陣exist:測試參數是否存在，比如說exist('opt_normalize', 'var')表示檢測變量opt_normalize是否存在，其中的’var’表示變量的意思。colormap:設置當前常見的顏色值表。floor：floor(A):取不大于A的最大整數。ceil:ceil(A):取不小于A的最小整數。imagesc:imagesc和image類似，可以用于顯示圖像。比如imagesc(array,'EraseMode','none',[-1 1])，這里的意思是將array中的數據線性映射到[-1,1]之間，然后使用當前設置的顏色表進行顯示。此時的[-1,1]充滿了整個顏色表。背景擦除模式設置為node，表示不擦除背景。repmat:該函數是擴展一個矩陣并把原來矩陣中的數據復制進去。比如說B = repmat(A,m,n)，就是創建一個矩陣B，B中復制了共m*n個A矩陣，因此B矩陣的大小為[size(A,1)*m size(A,2)*n]。--------------------------------- matlab中的@用法：問：f=@(x)acos(x)表示什么意思？其中@代表什么？答：表示f為函數句柄，@是定義句柄的運算符。f=@(x)acos(x) 相當于建立了一個函數文件： % f.m function y=f(x) y=acos(x);若有下列語句：xsqual=@(x)1/2.*(x==-1/2)+1.*(x>-1/28&x<1/2)+1.2.*(x==-1/2); 則相當于建立了一個函數文件： % xsqual.m function y=xsqual(x) y=1/2.*(x==-1/2)+1.*(x>-1/28&x<1/2)+1.2.*(x==-1/2);函數句柄的好處①提高運行速度。因為matlab對函數的調用每次都是要搜索所有的路徑，從set path中我們可以看到，路徑是非常的多的，所以如果一個函數在你的程序中需要經常用到的話，使用函數句柄，對你的速度會有提高的。②使用可以與變量一樣方便。比如說，我再這個目錄運行后，創建了本目錄的一個函數句柄，當我轉到其他的目錄下的時候，創建的函數句柄還是可以直接調用的，而不需要把那個函數文件拷貝過來。因為你創建的function handles中，已經包含了路徑。使用函數句柄的作用：不使用函數句柄的情況下，對函數多次調用，每次都要為該函數進行全面的路徑搜索，直接影響計算速度，借助句柄可以完全避免這種時間損耗。也就是直接指定了函數的指針。函數句柄就像一個函數的名字，有點類似于C++程序中的引用。

重點公式回顧

公式（1）

公式（2）

公式（3）

公式（4）

公式（5）

公式（6）

公式（7）

公式（8）

公式（9）

公式（10）

公式（11）

反向傳播推導過程中第l層第i個節點殘差的推導過程：

教程中反向傳播算法的推導中對于第3.步的推導(ng并沒有在教程中給出推導，但是譯者進行了推導)，我用了不同于譯者的推導過程：

教程回顧及譯者對第3步的推導

實驗基礎

其實實現該功能的主要步驟還是需要計算出網絡的損失函數以及其偏導數.
1. 計算出網絡每個節點的輸入值（即程序中的z值）和輸出值（即程序中的a值，a是z的sigmoid函數值）。
2. 利用z值和a值計算出網絡每個節點的誤差值（即程序中的delta值）。
3. 這樣可以利用上面計算出的每個節點的a，z，delta來表達出系統的損失函數以及損失函數的偏導數。

其實步驟1是前向進行的，也就是說按照輸入層——>隱含層——>輸出層的方向進行計算。而步驟2是方向進行的（這也是該算法叫做BP算法的來源），即每個節點的誤差值是按照輸出層——>隱含層——>輸入層方向進行的。

步驟

1.產生訓練集。從10張512*512的圖片中，隨機選擇10000張8*8的小圖塊，然后再把它歸一化，得到訓練集patches。具體見程序 sampleIMAGES.m
2.計算出代價函數 Jsparse(W,b) 及其梯度。具體見程序sparseAutoencoderCost.m。
3.通過函數 computeNumericalGradient.m計算出大概梯度（EPSILON = 10-4），然后通過函數checkNumericalGradient.m檢查上一步寫的計算梯度的代碼是否正確。首先，通過計算函數在點[4，10]處的梯度對比用computeNumericalGradient.m中的方法計算該函數的梯度，這兩者梯度的差值小于10-9就代表computeNumericalGradient.m中方法是正確的。然后，用computeNumericalGradient.m中方法計算代價函數 Jsparse(W,b) 的梯度對比用sparseAutoencoderCost.m中的方法計算代價函數 Jsparse(W,b) 的梯度，如果這兩者梯度的差值小于10-9就證明sparseAutoencoderCost.m中方法是正確的。
4.訓練稀疏自動編碼器。用的 L-BFGS算法（注意：這個算法不能將它用于商業用途，若用與商業用途的話，可以使用fminlbfgs函數，他比L-BFGS慢但可用于商業用途），具體見文件夾 minFunc。另外，初始化參數矩陣θ（包含W(1),W(2),b(1),b(2)）時，W(1),W(2)的初始值是從中隨機均勻分布產生，其中 nin是隱藏層神經元個數， nout 是輸出層神經元個數。b(1),b(2)初始化為0.
5.可視化結果。點擊train.m運行總程序，訓練稀疏自動編碼器，得到可視化結果。把產生的權重結果可視化，通過它我們能夠知道，該算法究竟從圖片中學習了哪些特征。

代碼及注釋

train.m

（1）調用sampleIMAGES函數從已知圖像中扣取多個圖像塊兒
（2）調用display_network函數，以網格的形式，隨機顯示多個扣取的圖像塊兒
（3）梯度校驗，該部分的目的是測試函數是否正確，可以由單獨的函數checkSparseAutoencoderCost實現
①利用sparseAutoencoderCost函數計算網路的代價函數和梯度值
②利用computeNumericalGradient函數計算梯度值（這里，要利用checkNumericalGradient函數驗證該梯度計算函數是否正確）
③比較①和②的梯度計算結果，判斷編寫的sparseAutoencoderCost函數是否正確
如果sparseAutoencoderCost函數是正確的，那么，在實際訓練中，不需要運行checkSparseAutoencoderCost
（4）利用L-BFGS方法對網絡進行訓練，從而得到最優化的網絡的權值和偏執項
（5）對訓練結果進行可視化

%http://blog.csdn.net/jiandanjinxin/article/details/72875977%% CS294A/CS294W Programming Assignment Starter Code% Instructions % ------------ % % This file contains code that helps you get started on the % programming assignment. You will need to complete the code in sampleIMAGES.m, % sparseAutoencoderCost.m and computeNumericalGradient.m. % For the purpose of completing the assignment, you do not need to % change the code in this file. % %%====================================================================== %% STEP 0: Here we provide the relevant parameters values that will % allow your sparse autoencoder to get good filters; you do not need to % change the parameters below. %第0步：提供可得到較好濾波器的相關參數值，不得改變以下參數 visibleSize = 8*8; % number of input units 輸入層單元數 hiddenSize = 25; % number of hidden units 隱藏層單元數 sparsityParam = 0.01; % desired average activation of the hidden units.稀疏值% (This was denoted by the Greek alphabet rho, which looks like a lower-case "p",% in the lecture notes). lambda = 0.0001; % weight decay parameter 權重衰減系數 beta = 3; % weight of sparsity penalty term 稀疏值懲罰項的權重 %%====================================================================== %% STEP 1: Implement sampleIMAGES 第1步：實現圖片采樣 %實現圖片采樣后，函數display_network從訓練集中隨機顯示200張 % After implementing sampleIMAGES, the display_network command should % display a random sample of 200 patches from the dataset %從10000張中隨機選擇200張顯示 patches = sampleIMAGES; figure display_network(patches(:,randi(size(patches,2),200,1)),8) title('sampleIMAGES') %%為產生一個200維的列向量，每一維的值為0~10000中的隨機數，說明是隨機取200個patch來顯示% Obtain random parameters theta 初始化參數向量theta theta = initializeParameters(hiddenSize, visibleSize);%%====================================================================== %% STEP 2: Implement sparseAutoencoderCost %在計算代價函數時，可以一次計算其所有的元素項值（均方差項、權重衰減項、懲罰項），但是一步一步地計算各元素項值， % 然后每步完成后運行梯度檢驗的方法可能會更容易實現，建議按照下面的步驟來實現函數sparseAutoencoderCost： % You can implement all of the components (squared error cost, weight decay term, % sparsity penalty) in the cost function at once, but it may be easier to do % it step-by-step and run gradient checking (see STEP 3) after each step. We % suggest implementing the sparseAutoencoderCost function using the following steps: % % (a) Implement forward propagation in your neural network, and implement the % squared error term of the cost function. Implement backpropagation to % compute the derivatives. Then (using lambda=beta=0), run Gradient Checking % to verify that the calculations corresponding to the squared error cost % term are correct.實現神經網絡中的前向傳播和代價函數中的均方差項。通過反向傳導計算偏導數。 % 然后運行梯度檢驗法來檢查均方差項是否計算錯誤。 % % (b) Add in the weight decay term (in both the cost function and the derivative % calculations), then re-run Gradient Checking to verify correctness. %在代價函數和偏導數計算中加入權重衰減項，然后運行梯度檢驗法來檢查其正確性。 % (c) Add in the sparsity penalty term, then re-run Gradient Checking to % verify correctness.加入懲罰項，然后運行梯度檢驗法來檢查其正確性。 % % Feel free to change the training settings when debugging your % code. (For example, reducing the training set size or % number of hidden units may make your code run faster; and setting beta % and/or lambda to zero may be helpful for debugging.) However, in your % final submission of the visualized weights, please use parameters we % gave in Step 0 above. % 計算代價函數和梯度 [cost, grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, lambda, ...sparsityParam, beta, patches);%%====================================================================== %% STEP 3: Gradient Checking % % Hint: If you are debugging your code, performing gradient checking on smaller models % and smaller training sets (e.g., using only 10 training examples and 1-2 hidden % units) may speed things up.% First, lets make sure your numerical gradient computation is correct for a % simple function. After you have implemented computeNumericalGradient.m, % run the following: checkNumericalGradient();% Now we can use it to check your cost function and derivative calculations % for the sparse autoencoder. % 利用近似方法計算梯度（要調用自編碼器的代價函數計算程序） numgrad = computeNumericalGradient( @(x) sparseAutoencoderCost(x, visibleSize, ...hiddenSize, lambda, ...sparsityParam, beta, ...patches), theta);% Use this to visually compare the gradients side by side % 比較cost函數計算得到的梯度和由近似得到的梯度 disp(' numgrad grad') disp([numgrad grad]); % Compare numerically computed gradients with the ones obtained from backpropagation diff = norm(numgrad-grad)/norm(numgrad+grad); disp(diff); % Should be small. In our implementation, these values are% usually less than 1e-9.% When you got this working, Congratulations!!! %%====================================================================== %% STEP 4: After verifying that your implementation of % sparseAutoencoderCost is correct, You can start training your sparse % autoencoder with minFunc (L-BFGS).% Randomly initialize the parameters theta = initializeParameters(hiddenSize, visibleSize);% Use minFunc to minimize the function addpath minFunc/ options.Method = 'lbfgs'; % Here, we use L-BFGS to optimize our cost% function. Generally, for minFunc to work, you% need a function pointer with two outputs: the% function value and the gradient. In our problem,% sparseAutoencoderCost.m satisfies this. options.maxIter = 400; % Maximum number of iterations of L-BFGS to run options.display = 'on';% opttheta是整個神經網絡的權值和偏執項構成的向量 [opttheta, cost] = minFunc( @(p) sparseAutoencoderCost(p, ...visibleSize, hiddenSize, ...lambda, sparsityParam, ...beta, patches), ...theta, options);%%====================================================================== %% STEP 5: Visualization %%第一層的權值矩陣 W1 = reshape(opttheta(1:hiddenSize*visibleSize), hiddenSize, visibleSize); figure; display_network(W1', 12) title('Visiualization of Weight1') print -djpeg weights.jpg % save the visualization to a file

sampleIMAGES.m

function patches = sampleIMAGES() % sampleIMAGES % Returns 10000 patches for trainingload IMAGES; % load images from disk figure; imshow3D(IMAGES) patchsize = 8; % we'll use 8x8 patches numpatches = 10000;% Initialize patches with zeros. Your code will fill in this matrix--one % column per patch, 10000 columns. patches = zeros(patchsize*patchsize, numpatches);%% ---------- YOUR CODE HERE -------------------------------------- % Instructions: Fill in the variable called "patches" using data % from IMAGES. % % IMAGES is a 3D array containing 10 images % For instance, IMAGES(:,:,6) is a 512x512 array containing the 6th image, % and you can type "imagesc(IMAGES(:,:,6)), colormap gray;" to visualize % it. (The contrast on these images look a bit off because they have % been preprocessed using using "whitening." See the lecture notes for % more details.) As a second example, IMAGES(21:30,21:30,1) is an image % patch corresponding to the pixels in the block (21,21) to (30,30) of % Image 1tic image_size=size(IMAGES); i=randi(image_size(1)-patchsize+1,1,numpatches);%生成元素值隨機為大于0且小于image_size(1)-patchsize+1的1行numpatches矩陣 j=randi(image_size(2)-patchsize+1,1,numpatches); k=randi(image_size(3),1,numpatches); for num=1:numpatchespatches(:,num)=reshape(IMAGES(i(num):i(num)+patchsize-1,j(num):j(num)+patchsize-1,k(num)),1,patchsize*patchsize); end toc%% --------------------------------------------------------------- % For the autoencoder to work well we need to normalize the data % Specifically, since the output of the network is bounded between [0,1] % (due to the sigmoid activation function), we have to make sure % the range of pixel values is also bounded between [0,1] patches = normalizeData(patches);end%% --------------------------------------------------------------- function patches = normalizeData(patches)% Squash data to [0.1, 0.9] since we use sigmoid as the activation % function in the output layer% Remove DC (mean of images). 把patches數組中的每個元素值都減去mean(patches) patches = bsxfun(@minus, patches, mean(patches));% Truncate to +/-3 standard deviations and scale to -1 to 1 pstd = 3 * std(patches(:));%把patches的標準差變為其原來的3倍 patches = max(min(patches, pstd), -pstd) / pstd; %因為根據3sigma法則，95%以上的數據都在該區域內 % 這里轉換后將數據變到了-1到1之間% Rescale from [-1,1] to [0.1,0.9] patches = (patches + 1) * 0.4 + 0.1;end

sparseAutoencoderCost.m

function [cost,grad] = sparseAutoencoderCost(theta, visibleSize, hiddenSize, ...lambda, sparsityParam, beta, data) % 計算網絡的代價函數和梯度 % visibleSize: the number of input units (probably 64) 輸入層神經單元節點數 % hiddenSize: the number of hidden units (probably 25) 隱藏層神經單元節點數 % lambda: weight decay parameter權重衰減系數 % sparsityParam: The desired average activation for the hidden units (denoted in the lecture % 稀疏性參數 notes by the greek alphabet rho, which looks like a lower-case "p"). % beta: weight of sparsity penalty term稀疏懲罰項的權重 % data: Our 64x10000 matrix containing the training data. So, data(:,i) is the i-th training example. % 訓練集64x10000 %theta：參數向量，包含W1、W2、b1、b2 % The input theta is a vector (because minFunc expects the parameters to be a vector). % We first convert theta to the (W1, W2, b1, b2) matrix/vector format, so that this % follows the notation convention of the lecture notes. %%將長向量轉換成每一層的權值矩陣和偏置向量值 W1 = reshape(theta(1:hiddenSize*visibleSize), hiddenSize, visibleSize); W2 = reshape(theta(hiddenSize*visibleSize+1:2*hiddenSize*visibleSize), visibleSize, hiddenSize); b1 = theta(2*hiddenSize*visibleSize+1:2*hiddenSize*visibleSize+hiddenSize); b2 = theta(2*hiddenSize*visibleSize+hiddenSize+1:end);% Cost and gradient variables (your code needs to compute these values). % Here, we initialize them to zeros. cost = 0; W1grad = zeros(size(W1)); W2grad = zeros(size(W2)); b1grad = zeros(size(b1)); b2grad = zeros(size(b2));%% ---------- YOUR CODE HERE -------------------------------------- % Instructions: Compute the cost/optimization objective J_sparse(W,b) for the Sparse Autoencoder, % and the corresponding gradients W1grad, W2grad, b1grad, b2grad. % % W1grad, W2grad, b1grad and b2grad should be computed using backpropagation. % Note that W1grad has the same dimensions as W1, b1grad has the same dimensions % as b1, etc. Your code should set W1grad to be the partial derivative of J_sparse(W,b) with % respect to W1. I.e., W1grad(i,j) should be the partial derivative of J_sparse(W,b) % with respect to the input parameter W1(i,j). Thus, W1grad should be equal to the term % [(1/m) \Delta W^{(1)} + \lambda W^{(1)}] in the last block of pseudo-code in Section 2.2 % of the lecture notes (and similarly for W2grad, b1grad, b2grad). % % Stated differently, if we were using batch gradient descent to optimize the parameters, % the gradient descent update to W1 would be W1 := W1 - alpha * W1grad, and similarly for W2, b1, b2. % %% 1.前向傳播forward propagation disp('公式（1）') data_size=size(data); active_value2=repmat(b1,1,data_size(2)); active_value3=repmat(b2,1,data_size(2)); active_value2=sigmoid(W1*data+active_value2); %第2層激活值 active_value3=sigmoid(W2*active_value2+active_value3); %第3層激活值 %% 2.計算代價函數computing error term and cost ave_square=sum(sum((active_value3-data).^2)./2)/data_size(2);%均方差項，誤差項所有樣本代價函數均值 weight_decay=lambda/2*(sum(sum(W1.^2))+sum(sum(W2.^2))); %權重衰減項,所有權值項平方和 disp('公式（2）p_real') p_real=sum(active_value2,2)./data_size(2); %對active_value2每行求和，再平均，得到每個隱藏單元的平均活躍度（25行1列） p_para=repmat(sparsityParam,hiddenSize,1); %稀疏性參數（25行1列） sparsity=beta.*sum(p_para.*log(p_para./p_real)+(1-p_para).*log((1-p_para)./(1-p_real)));%懲罰項，所有隱藏層的神經元相對熵之和，括號內為公式（3） disp('公式（4），公式（5）cost') cost=ave_square+weight_decay+sparsity; %代價函數disp('公式（7）delta3') delta3=(active_value3-data).*(active_value3).*(1-active_value3); %第3層殘差 average_sparsity=repmat(sum(active_value2,2)./data_size(2),1,data_size(2));%每個隱藏單元的平均活躍度（25行10000列）default_sparsity=repmat(sparsityParam,hiddenSize,data_size(2)); %稀疏性參數（25行10000列） disp('公式（6）sparsity_penalty') sparsity_penalty=beta.*(-(default_sparsity./average_sparsity)+((1-default_sparsity)./(1-average_sparsity))); disp('公式（8）delta2') delta2=(W2'*delta3+sparsity_penalty).*((active_value2).*(1-active_value2));%第2層殘差，這里加入了稀疏項 %% 3.反向傳導backword propagation % 計算代價函數對各層權值和偏執項的梯度 W2grad=delta3*active_value2'./data_size(2)+lambda.*W2; %W2梯度 W1grad=delta2*data'./data_size(2)+lambda.*W1; %W1梯度 b2grad=sum(delta3,2)./data_size(2); %b2梯度 b1grad=sum(delta2,2)./data_size(2); %b1梯度%------------------------------------------------------------------- % After computing the cost and gradient, we will convert the gradients back % to a vector format (suitable for minFunc). Specifically, we will unroll % your gradient matrices into a vector.grad = [W1grad(:) ; W2grad(:) ; b1grad(:) ; b2grad(:)];end%%% %% 前向傳播算法 % a1=data; % z2=bsxfun(@plus,W1*a1,b1); % a2=sigmoid(z2); % z3=bsxfun(@plus,W2*a2,b2); % a3=sigmoid(z3); % % %% 計算網絡誤差 % % 誤差項J1=所有樣本代價函數均值 % y=data; % 網絡的理想輸出值 % Ei=sum((a3-y).^2)/2; %每一個樣本的代價函數 % J1=sum(Ei)/m; % % 正則化項J2=所有權值項平方和 % J2=sum(W1(:).^2)+sum(W2(:).^2); % % 稀疏項J3=所有隱藏層的神經元相對熵之和 % rho_hat=sum(a2,2)/m; % KL=sum(sparsityParam*log(sparsityParam./rho_hat)+... % (1-sparsityParam)*log((1-sparsityParam)./(1-rho_hat))); % J3=KL; % % 網絡的代價函數 % cost=J1+lambda*J2/2+beta*J3; % % % %% 反向傳播算法計算各層敏感度delta % delta3=-(data-a3).*dsigmoid(z3); % spare_delta=beta*(-sparsityParam./rho_hat+(1-sparsityParam)./(1-rho_hat)); % delta2=bsxfun(@plus,W2'*delta3,spare_delta).*dsigmoid(z2); % 這里加入了稀疏項 % % %% 計算代價函數對各層權值和偏執項的梯度 % W1grad=delta2*a1'/m+lambda*W1; % W2grad=delta3*a2'/m+lambda*W2; % b1grad=sum(delta2,2)/m; % b2grad=sum(delta3,2)/m; %%%------------------------------------------------------------------- % Here's an implementation of the sigmoid function, which you may find useful % in your computation of the costs and the gradients. This inputs a (row or % column) vector (say (z1, z2, z3)) and returns (f(z1), f(z2), f(z3)). function sigm = sigmoid(x)sigm = 1 ./ (1 + exp(-x)); end%% 求解sigmoid函數的導數 % function dsigm = dsigmoid(x) % sigx = sigmoid(x); % dsigm=sigx.*(1-sigx); % end

computeNumericalGradient.m

function numgrad = computeNumericalGradient(J, theta) % numgrad = computeNumericalGradient(J, theta) % theta: a vector of parameters參數向量，包含W1、W2、b1、b2 % J: a function that outputs a real-number. Calling y = J(theta) will return the % function value at theta. % Initialize numgrad with zeros numgrad = zeros(size(theta));%% ---------- YOUR CODE HERE -------------------------------------- % Instructions: % Implement numerical gradient checking, and return the result in numgrad. % (See Section 2.3 of the lecture notes.) % You should write code so that numgrad(i) is (the numerical approximation to) the % partial derivative of J with respect to the i-th input argument, evaluated at theta. % I.e., numgrad(i) should be the (approximately) the partial derivative of J with % respect to theta(i). % % Hint: You will probably want to compute the elements of numgrad one at a time. EPSILON=0.0001; for i=1:size(theta)theta_plus=theta;theta_minu=theta;theta_plus(i)=theta_plus(i)+EPSILON;theta_minu(i)=theta_minu(i)-EPSILON;numgrad(i)=(J(theta_plus)-J(theta_minu))/(2*EPSILON); end %% --------------------------------------------------------------- end

checkNumericalGradient.m

梯度檢驗是在編寫機器學習算法時必備的技術，可以檢驗所編寫的cost函數是否正確
cost函數的主要功能是：計算代價函數、計算代價函數對參數的梯度
實際程序中，梯度檢驗要配合cost函數一起使用，可以將該部分單獨放在一個測試函數checkCost() 中
① 給定一組樣本及參數初始值
② 利用cost函數計算grad
③ 利用computeNumericalGradient函數計算梯度的近似值numGrad
④ 比較grad和numGrad是否比較相近：如果diff小于1e-6，則cost函數是正確的，否則，需要檢查cost函數
diff = norm(numGrad-grad)/norm(numGrad+grad);
disp(diff);
在確定cost函數沒有問題后，要屏蔽掉梯度檢驗部分的代碼，否則，將會浪費許多時間

function [] = checkNumericalGradient() % 該函數主要目的是檢驗SparseAutoencoderCost函數是否正確 % This code can be used to check your numerical gradient implementation % in computeNumericalGradient.m % It analytically evaluates the gradient of a very simple function called % simpleQuadraticFunction (see below) and compares the result with your numerical % solution. Your numerical gradient implementation is incorrect if % your numerical solution deviates too much from the analytical solution.% Evaluate the function and gradient at x = [4; 10]; (Here, x is a 2d vector.) x = [4; 10]; [value, grad] = simpleQuadraticFunction(x);% Use your code to numerically compute the gradient of simpleQuadraticFunction at x. % (The notation "@simpleQuadraticFunction" denotes a pointer to a function.) numgrad = computeNumericalGradient(@simpleQuadraticFunction, x);% Visually examine the two gradient computations. The two columns % you get should be very similar. disp([numgrad grad]); fprintf('The above two columns you get should be very similar.\n(Left-Your Numerical Gradient, Right-Analytical Gradient)\n\n');% Evaluate the norm of the difference between two solutions. % If you have a correct implementation, and assuming you used EPSILON = 0.0001 % in computeNumericalGradient.m, then diff below should be 2.1452e-12 diff = norm(numgrad-grad)/norm(numgrad+grad); disp(diff); fprintf('Norm of the difference between numerical and analytical gradient (should be < 1e-9)\n\n'); endfunction [value,grad] = simpleQuadraticFunction(x) % this function accepts a 2D vector as input. % Its outputs are: % value: h(x1, x2) = x1^2 + 3*x1*x2 % grad: A 2x1 vector that gives the partial derivatives of h with respect to x1 and x2 % Note that when we pass @simpleQuadraticFunction(x) to computeNumericalGradients, we're assuming % that computeNumericalGradients will use only the first returned value of this function.value = x(1)^2 + 3*x(1)*x(2);grad = zeros(2, 1); grad(1) = 2*x(1) + 3*x(2); grad(2) = 3*x(1);end % %% some initialize % numgrad = zeros(size(theta));%Initialize numgrad with zeros % n = size(theta,1);% theta(1),...,theta(n) % EPSILON =1e-4; % % %% calculate the partial derivative of J with respect to theta(i) % for i = 1:n % theta_add = zeros(n,1); % theta_add(i) = EPSILON; % numgrad(i) = (J(theta + theta_add) - J(theta-theta_add))./EPSILON/2; % end

參考文獻

UFLDL教程

Exercise:Sparse Autoencoder

Deep Learning 1_深度學習UFLDL教程：Sparse Autoencoder練習（斯坦福大學深度學習教程）

Deep learning：九(Sparse Autoencoder練習)

UFLDL教程答案(1):Exercise:Sparse_Autoencoder

UFLDL教程之（一）sparseae_exercise

梯度檢驗！

吳恩達 Andrew Ng 的公開課

總結

以上是生活随笔為你收集整理的UFLDL教程： Exercise: Sparse Autoencoder的全部內容，希望文章能夠幫你解決所遇到的問題。

如果覺得生活随笔網站內容還不錯，歡迎將生活随笔推薦給好友。

上一篇：怎么申请浦发苏宁易购联名信用卡？申请条件
下一篇：农行信用币有额度拒绝是怎么回事？申请被拒

国产亚洲精品久久久久动-影视先锋中文字幕-av网站在线观看一区-亚洲视频 在线观看-久久亚洲不卡-欧美精品一区在线观看-欧美乱淫视频-欧美熟妇另类久久久久久不卡-粉嫩av一区二区三区四区五区-日韩欧美操

编程问答

UFLDL教程： Exercise: Sparse Autoencoder

自編碼可以跟PCA 一樣，給特征屬性降維

總結

国产亚洲精品久久久久动-影视先锋中文字幕-av网站在线观看一区-亚洲视频在线观看-久久亚洲不卡-欧美精品一区在线观看-欧美乱淫视频-欧美熟妇另类久久久久久不卡-粉嫩av一区二区三区四区五区-日韩欧美操