NVIDIA AI Course: Getting Started with AI on Jetson Nano—Class Notes (4)
Notice
The original text comes from the NVIDIA AI Course. This article only provides a translation with class notes.
Contents
Image Regression
Classification Vs. Regression
Continuous Outputs
Changing The Final Layer
Evaluation
Face XY Project
Interactive Tool Startup Steps
More Regression Projects
Image Regression
Classification Vs. Regression
Unlike Image Classification applications, which map image inputs to discrete outputs (classes), the Image Regression task maps the image input pixels to continuous outputs.
Continuous Outputs
In the course regression project, those continuous outputs happen to define the X and Y coordinates of various features on a face, such as the nose. Mapping an image stream to a location for tracking can be used in other applications, such as following a line in mobile robotics. Tracking isn't the only thing a regression model can do, though: the output values could be something quite different, such as steering values or camera movement parameters.
Changing The Final Layer
The final layer of the pre-trained ResNet-18 network is a fully connected (fc) layer that has 512 inputs mapped to 1000 output classes, or (512, 1000). Using transfer learning in the Image Classification projects, that last layer was changed to only a few classes, depending on the application. For example, if there are to be 3 classes trained, we change the fc layer to (512, 3). The final layer of the network is then a fully connected layer with 512 inputs mapped to 3 output classes.
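A minimal sketch of that classification transfer-learning step, assuming the 3-class example above (the regression version appears later in the notebook):

import torch
import torchvision

# Load ResNet-18 pre-trained on ImageNet: its final fc layer is (512, 1000)
model = torchvision.models.resnet18(pretrained=True)

# For a 3-class classification project, replace the fc layer with (512, 3)
NUM_CLASSES = 3
model.fc = torch.nn.Linear(512, NUM_CLASSES)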
In the case of a Regression project predicting coordinates, we want two values for each category, the X and Y values. That means twice as many outputs are required in the fc layer. For example, if there are 3 facial features (nose, left_eye, right_eye), each with both an X and Y output, then 6 outputs are required, or (512, 6) for the fc layer.
In classification, recall that the softmax function was used to build a probability distribution over the output values. For regression, we want to keep the actual values, because the model is trained to predict actual X and Y output values rather than probabilities.
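A small sketch of that difference; the random tensor below stands in for a preprocessed camera frame:

import torch
import torch.nn.functional as F
import torchvision

model = torchvision.models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(512, 6)      # 3 categories x (X, Y)
model.eval()

image = torch.randn(1, 3, 224, 224)     # placeholder for a preprocessed camera frame

with torch.no_grad():
    outputs = model(image)              # raw network outputs, shape (1, 6)

# Classification would squash the outputs into a probability distribution:
probabilities = F.softmax(outputs, dim=1)

# Regression keeps the raw values as the predicted coordinates:
xy = outputs.reshape(-1, 2)             # one (x, y) pair per category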
Evaluation
Classification and Regression also differ in the way they are evaluated. The discrete values of classification can be evaluated based on accuracy, i.e. a calculation of the percentage of "right" answers. In the case of regression, we are interested in getting as close as possible to a correct answer. Therefore, the root mean squared error can be used.
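As an illustration (not the notebook's own code), the two metrics might be computed like this:

import torch

# Classification accuracy: fraction of predictions matching the true class
predicted_classes = torch.tensor([0, 2, 1, 1])
true_classes = torch.tensor([0, 2, 0, 1])
accuracy = (predicted_classes == true_classes).float().mean()    # 0.75

# Regression: root mean squared error between predicted and true XY coordinates
predicted_xy = torch.tensor([[0.10, -0.20], [0.45, 0.30]])
true_xy = torch.tensor([[0.12, -0.25], [0.40, 0.35]])
rmse = torch.sqrt(torch.mean((predicted_xy - true_xy) ** 2))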
Face XY Project
The goal of this project is to build an Image Regression project that can predict the X and Y coordinates of a facial feature in a live image.
Interactive Tool Startup Steps
You will implement the project by collecting your own data using a clickable image display tool, training a model to find the XY coordinates of the feature, and then testing and updating your model as needed using images from the live camera. Since you are collecting two values for each category, the model may require more training and data to get a satisfactory result.
Be patient! Building your model is an iterative process.
Step 1: Open The Notebook
To get started, navigate to the regression folder in your JupyterLab interface and double-click the regression_interactive.ipynb notebook to open it.
Step 2: Execute All Of The Code Blocks
The notebook is designed to be reusable for any XY regression task you wish to build. Step through the code blocks and execute them one at a time.
This block sets the size of the images and starts the camera. If your camera is already active in this notebook or in another notebook, first shut down the kernel in the active notebook before running this code cell. Make sure that the correct camera type is selected for execution (USB or CSI). This cell may take several seconds to execute.
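The camera cell typically looks something like the sketch below, assuming the jetcam helper library used in this course; the exact parameters in your notebook may differ:

from jetcam.usb_camera import USBCamera
# from jetcam.csi_camera import CSICamera

# 224x224 matches the input size expected by ResNet-18
camera = USBCamera(width=224, height=224, capture_device=0)
# camera = CSICamera(width=224, height=224)

camera.running = True   # keep the latest frame available in camera.value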
You get to define your?TASK?and?CATEGORIES?parameters here, as well as how many datasets you want to track. For the Face XY Project, this has already been defined for you as the?face?task with categories of?nose, left_eye, and right_eye. Each category for the XY regression tool will require both an X and Y values. Go ahead and execute the cell. Subdirectories for each category are created to store the example images you collect. The file names of the images will contain the XY coordinates that you tag the images with during the data collection step. This cell should only take a few seconds to execute.
您可以在這里定義任務(wù)和類別參數(shù),以及要跟蹤的數(shù)據(jù)集的數(shù)量。對于Face XY項(xiàng)目,這已經(jīng)為您定義為Face任務(wù),包含nose、left_eye和right_eye類別。XY回歸工具的每個(gè)類別都需要X和Y值。繼續(xù)執(zhí)行單元格。創(chuàng)建每個(gè)類別的子目錄來存儲(chǔ)您收集的示例圖像。圖像的文件名將包含在數(shù)據(jù)收集步驟中標(biāo)記圖像所用的XY坐標(biāo)。這個(gè)單元格只需要幾秒鐘就可以執(zhí)行。
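The definitions in that cell follow roughly this pattern; the DATASETS name and directory layout below are illustrative assumptions, not the notebook's exact code:

import os

TASK = 'face'
CATEGORIES = ['nose', 'left_eye', 'right_eye']
DATASETS = ['A', 'B']   # multiple datasets let you collect separate groups of examples

# One subdirectory per dataset/category pair to hold the collected images
for dataset in DATASETS:
    for category in CATEGORIES:
        os.makedirs(os.path.join('data', TASK + '_' + dataset, category), exist_ok=True)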
You'll collect images for your categories with a special clickable image widget set up in this cell. As you click the "nose" or "eye" in the live feed image, the data image filename is automatically annotated and saved using the X and Y coordinates from the click.
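Conceptually, the click callback does something like the following simplified sketch; the helper name and filename scheme are assumptions, not the widget's actual code:

import uuid
import cv2

def save_snapshot(frame, category_dir, x, y):
    """Save the current camera frame, encoding the clicked X/Y in the filename."""
    filename = '%d_%d_%s.jpg' % (x, y, uuid.uuid1())
    cv2.imwrite(category_dir + '/' + filename, frame)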
The model is set to the same pre-trained ResNet-18 model for this project:
model = torchvision.models.resnet18(pretrained=True)
For more information on available PyTorch pre-trained models, see the PyTorch documentation. In addition to choosing the model, the last layer of the model is modified to produce only the number of outputs that we are training for. In the case of the Face XY Project, that is twice the number of categories, since each category requires both X and Y coordinates (i.e. nose X, nose Y, left_eye X, left_eye Y, right_eye X, and right_eye Y).
output_dim = 2 * len(dataset.categories)
model.fc = torch.nn.Linear(512, output_dim)
This code cell may take several seconds to execute.
This code block sets up threading to run the model in the background so that you can view the live camera feed and visualize the model performance in real time. This cell should only take a few seconds to execute. For this project, a blue circle will overlay the model's prediction for the location of the selected feature.
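A rough sketch of that background-inference pattern; model and camera come from the earlier cells, and the preprocessing helper below is a hypothetical stand-in for whatever the notebook actually uses:

import threading
import torch

def preprocess(frame):
    # hypothetical helper: HxWx3 uint8 camera frame -> 1x3x224x224 float tensor
    t = torch.from_numpy(frame).permute(2, 0, 1).float() / 255.0
    return t.unsqueeze(0)

state = {'running': True, 'xy': None}

def live_prediction():
    # keep predicting on the latest camera frame until state['running'] is set to False
    while state['running']:
        with torch.no_grad():
            output = model(preprocess(camera.value))
        state['xy'] = output.reshape(-1, 2)   # one (x, y) pair per category, drawn as the blue circle

threading.Thread(target=live_prediction, daemon=True).start()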
The training code cell sets the hyper-parameters for the model training (number of epochs, batch size, learning rate, momentum) and loads the images for training or evaluation. The regression version is very similar to the simple classification training, though the loss is calculated differently. The mean squared error over the X and Y value errors is calculated and used as the loss for backpropagation in training to improve the model. This code cell may take several seconds to execute.
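A condensed sketch of a regression training step under those settings; the hyper-parameter values, the train_loader DataLoader, and the reuse of model from earlier cells are assumptions, not the notebook's exact code:

import torch

EPOCHS = 10
LEARNING_RATE = 1e-4
MOMENTUM = 0.9

optimizer = torch.optim.SGD(model.parameters(), lr=LEARNING_RATE, momentum=MOMENTUM)

model.train()
for epoch in range(EPOCHS):
    for images, xy_targets in train_loader:        # assumed DataLoader of (image batch, XY batch)
        optimizer.zero_grad()
        outputs = model(images)                    # shape (batch, 2 * number of categories)
        loss = torch.nn.functional.mse_loss(outputs, xy_targets)   # mean squared error over X and Y
        loss.backward()                            # backpropagate the loss to improve the model
        optimizer.step()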
This is the last code cell. All that's left to do is pack all the widgets into one comprehensive tool and display it. This cell may take several seconds to run and should display the full tool for you to work with. There are three image windows. Initially, only the left camera feed is populated. The middle window will display the most recent annotated snapshot image once you start collecting data. The right-most window will display the live prediction view once the model has been trained.
Step 3: Collect Data, Train, Test
Position the camera in front of your face and collect initial data. Point to the target feature with the mouse cursor that matches the category you've selected (such as the nose). Click to collect data. The annotated snapshot you just collected will appear in the middle display box. As you collect each image, vary your head position and pose:
- Add 20 images of your left_eye with the left_eye category selected
Step 4: Improve Your Model
Use the live inference as a guide to improve your model! The live feed shows the model's prediction. As you move your head, does the target circle correctly follow your nose (or left_eye, right_eye)? If not, then click the correct location and add data. After you've added some data for a new scenario, train the model some more. For example:
- Move the camera so that the face is closer. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
- Move the camera to provide a different background. Is the performance of the predictor still good? If not, try adding some data for each category (10 each) and retrain (5 epochs). Does this help? You can experiment with more data and more training.
- Are there any other scenarios where you think the model might not perform well? Try them out!
- Can you get a friend to try your model? Does it work the same? You know the drill: more data and training!
Step 5: Save Your Model
When you are satisfied with your model, save it by entering a name in the "model path" box and clicking "save model".
More Regression Projects
To build another project, follow the pattern you used for the Face XY Project. Save your previous work, modify the TASK and CATEGORIES values, shut down and restart the notebook, and run all the cells. Then collect, train, and test!