From 53f20d58628171934c097dff5602fe17765eae99 Mon Sep 17 00:00:00 2001 From: leshe4ka46 Date: Thu, 25 Dec 2025 21:28:30 +0300 Subject: finish --- Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb | 1492 ++++++++++++++++++++++++ 1 file changed, 1492 insertions(+) create mode 100644 Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb (limited to 'Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb') diff --git a/Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb b/Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb new file mode 100644 index 0000000..c3d0c2a --- /dev/null +++ b/Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb @@ -0,0 +1,1492 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "weabkZTF3ZZM" + }, + "source": [ + "
\"Header\"
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dz8YI6Fb3ZZN" + }, + "source": [ + "# 3. Convolutional Neural Networks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8UWR4l4X3ZZN" + }, + "source": [ + "In the previous section, we built and trained a simple model to classify ASL images. The model was able to learn how to correctly classify the training dataset with very high accuracy, but, it did not perform nearly as well on validation dataset. This behavior of not generalizing well to non-training data is called [overfitting](https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html), and in this section, we will introduce a popular kind of model called a [convolutional neural network](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53) that is especially good for reading images and classifying them." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GmRLS07k3ZZN" + }, + "source": [ + "## 3.1 Objectives" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Iuvwj_tr3ZZN" + }, + "source": [ + "* Prep data specifically for a CNN\n", + "* Create a more sophisticated CNN model, understanding a greater variety of model layers\n", + "* Train a CNN model and observe its performance" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 221, + "status": "ok", + "timestamp": 1715240535370, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "9kMRTHEV2AFm", + "outputId": "f1fb3858-e6a7-4906-ec7e-c4d34abcf013" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import torch.nn as nn\n", + "import pandas as pd\n", + "import torch\n", + "from torch.optim import Adam\n", + "from torch.utils.data import Dataset, DataLoader\n", + "\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "torch.cuda.is_available()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xEGukATl3ZZN" + }, + "source": [ + "## 3.2 Loading and Preparing the Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2.1 Preparing Images" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-SyD7hID3ZZN" + }, + "source": [ + "Let's begin by loading our DataFrames like we did in the previous lab:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "executionInfo": { + "elapsed": 3372, + "status": "ok", + "timestamp": 1715240541334, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "XMMgEMcc2Ehg" + }, + "outputs": [], + "source": [ + "train_df = pd.read_csv(\"data/asl_data/sign_mnist_train.csv\")\n", + "valid_df = pd.read_csv(\"data/asl_data/sign_mnist_valid.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
labelpixel1pixel2pixel3pixel4pixel5pixel6pixel7pixel8pixel9...pixel775pixel776pixel777pixel778pixel779pixel780pixel781pixel782pixel783pixel784
03107118127134139143146150153...207207207207206206206204203202
16155157156156156157156158158...691491288794163175103135149
22187188188187187186187188187...202201200199198199198195194195
32211211212212211210211210210...235234233231230226225222229163
412164167170172176179180184185...92105105108133163157163164179
\n", + "

5 rows × 785 columns

\n", + "
" + ], + "text/plain": [ + " label pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 \\\n", + "0 3 107 118 127 134 139 143 146 150 \n", + "1 6 155 157 156 156 156 157 156 158 \n", + "2 2 187 188 188 187 187 186 187 188 \n", + "3 2 211 211 212 212 211 210 211 210 \n", + "4 12 164 167 170 172 176 179 180 184 \n", + "\n", + " pixel9 ... pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 \\\n", + "0 153 ... 207 207 207 207 206 206 \n", + "1 158 ... 69 149 128 87 94 163 \n", + "2 187 ... 202 201 200 199 198 199 \n", + "3 210 ... 235 234 233 231 230 226 \n", + "4 185 ... 92 105 105 108 133 163 \n", + "\n", + " pixel781 pixel782 pixel783 pixel784 \n", + "0 206 204 203 202 \n", + "1 175 103 135 149 \n", + "2 198 195 194 195 \n", + "3 225 222 229 163 \n", + "4 157 163 164 179 \n", + "\n", + "[5 rows x 785 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This ASL data is already flattened." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[107, 118, 127, ..., 204, 203, 202],\n", + " [155, 157, 156, ..., 103, 135, 149],\n", + " [187, 188, 188, ..., 195, 194, 195],\n", + " [211, 211, 212, ..., 222, 229, 163],\n", + " [164, 167, 170, ..., 163, 164, 179]])" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sample_df = train_df.head().copy() # Grab the top 5 rows\n", + "sample_df.pop('label')\n", + "sample_x = sample_df.values\n", + "sample_x" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5, 784)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sample_x.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this format, we don't have all the information about which pixels are near each other. Because of this, we can't apply convolutions that will detect features. Let's [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) our dataset so that they are in a 28x28 pixel format. This will allow our convolutions to associate groups of pixels and detect important features.\n", + "\n", + "Note that for the first convolutional layer of our model, we need to have not only the height and width of the image, but also the number of [color channels](https://www.photoshopessentials.com/essentials/rgb/). Our images are grayscale, so we'll just have 1 channel.\n", + "\n", + "That means that we need to convert the current shape `(5, 784)` to `(5, 1, 28, 28)`. With [NumPy](https://numpy.org/doc/stable/index.html) arrays, we can pass a `-1` for any dimension we wish to remain the same." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5, 1, 28, 28)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "IMG_HEIGHT = 28\n", + "IMG_WIDTH = 28\n", + "IMG_CHS = 1\n", + "\n", + "sample_x = sample_x.reshape(-1, IMG_CHS, IMG_HEIGHT, IMG_WIDTH)\n", + "sample_x.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2.2 Create a Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's add the steps above into our `MyDataset` class." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are 4 `FIXME`s in the class definition below. Can you replace them with the correct values?" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "executionInfo": { + "elapsed": 173, + "status": "ok", + "timestamp": 1715240547901, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "tpzGOri32Klj" + }, + "outputs": [], + "source": [ + "class MyDataset(Dataset):\n", + " def __init__(self, base_df):\n", + " x_df = base_df.copy() # Some operations below are in-place\n", + " y_df = x_df.pop('label')\n", + " x_df = x_df.values / 255 # Normalize values from 0 to 1\n", + " x_df = x_df.reshape(-1, IMG_CHS, IMG_WIDTH, IMG_HEIGHT)\n", + " self.xs = torch.tensor(x_df).float().to(device)\n", + " self.ys = torch.tensor(y_df).to(device)\n", + "\n", + " def __getitem__(self, idx):\n", + " x = self.xs[idx]\n", + " y = self.ys[idx]\n", + " return x, y\n", + "\n", + " def __len__(self):\n", + " return len(self.xs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click the `...` below for the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "class MyDataset(Dataset):\n", + " def __init__(self, base_df):\n", + " x_df = base_df.copy() # Some operations below are in-place\n", + " y_df = x_df.pop('label')\n", + " x_df = x_df.values / 255 # Normalize values from 0 to 1\n", + " x_df = x_df.reshape(-1, IMG_CHS, IMG_WIDTH, IMG_HEIGHT)\n", + " self.xs = torch.tensor(x_df).float().to(device)\n", + " self.ys = torch.tensor(y_df).to(device)\n", + "\n", + " def __getitem__(self, idx):\n", + " x = self.xs[idx]\n", + " y = self.ys[idx]\n", + " return x, y\n", + "\n", + " def __len__(self):\n", + " return len(self.xs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2.3 Create a DataLoader" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, let's create the DataLoader from the Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One of these function calls is missing the `shuffle=True` argument. Can you remember which one it is and add it back in?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "executionInfo": { + "elapsed": 1096, + "status": "ok", + "timestamp": 1715240550115, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "unf8Cz4WcK_M" + }, + "outputs": [], + "source": [ + "BATCH_SIZE = 32\n", + "\n", + "train_data = MyDataset(train_df)\n", + "train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)\n", + "train_N = len(train_loader.dataset)\n", + "\n", + "valid_data = MyDataset(valid_df)\n", + "valid_loader = DataLoader(valid_data, batch_size=BATCH_SIZE)\n", + "valid_N = len(valid_loader.dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click the `...` below for the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's grab a batch from the DataLoader to make sure it works." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 2, + "status": "ok", + "timestamp": 1715240550382, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "Z4xylt03dz1W", + "outputId": "80447d85-302d-4549-976b-f4c3ac0f0644" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[tensor([[[[0.7373, 0.7373, 0.7412, ..., 0.2431, 0.1765, 0.1804],\n", + " [0.7412, 0.7451, 0.7490, ..., 0.2196, 0.1647, 0.1608],\n", + " [0.7529, 0.7569, 0.7569, ..., 0.2078, 0.1529, 0.1451],\n", + " ...,\n", + " [0.2235, 0.2118, 0.2000, ..., 0.7686, 0.7647, 0.7569],\n", + " [0.2118, 0.1961, 0.1804, ..., 0.7725, 0.7647, 0.7333],\n", + " [0.1961, 0.1804, 0.1725, ..., 0.7804, 0.7608, 0.6353]]],\n", + " \n", + " \n", + " [[[0.7059, 0.7059, 0.7098, ..., 0.6902, 0.6275, 0.3882],\n", + " [0.7098, 0.7137, 0.7176, ..., 0.6863, 0.6902, 0.5059],\n", + " [0.7216, 0.7294, 0.7294, ..., 0.7020, 0.7098, 0.4078],\n", + " ...,\n", + " [0.8157, 0.8196, 0.8275, ..., 0.7843, 0.7373, 0.7373],\n", + " [0.8157, 0.8235, 0.8196, ..., 0.7882, 0.7451, 0.7255],\n", + " [0.8196, 0.8235, 0.7843, ..., 0.7255, 0.7176, 0.7059]]],\n", + " \n", + " \n", + " [[[0.8863, 0.8902, 0.8863, ..., 0.7961, 0.7882, 0.7804],\n", + " [0.8941, 0.8941, 0.8902, ..., 0.8039, 0.7961, 0.7843],\n", + " [0.8941, 0.8980, 0.8980, ..., 0.8078, 0.8039, 0.7922],\n", + " ...,\n", + " [0.5490, 0.5451, 0.5333, ..., 0.2549, 0.1647, 0.1804],\n", + " [0.5647, 0.5333, 0.5333, ..., 0.2431, 0.1529, 0.1647],\n", + " [0.5490, 0.5373, 0.5255, ..., 0.2235, 0.1412, 0.1608]]],\n", + " \n", + " \n", + " ...,\n", + " \n", + " \n", + " [[[0.2941, 0.3451, 0.3843, ..., 0.6235, 0.6275, 0.6235],\n", + " [0.3059, 0.3569, 0.3922, ..., 0.6392, 0.6392, 0.6353],\n", + " [0.3176, 0.3686, 0.4078, ..., 0.6510, 0.6510, 0.6667],\n", + " ...,\n", + " [0.3255, 0.3373, 0.3490, ..., 0.4353, 0.3020, 0.2392],\n", + " [0.2980, 0.2980, 0.3059, ..., 0.3882, 0.2863, 0.2353],\n", + " [0.3098, 0.3137, 0.3294, ..., 0.3294, 0.2588, 0.2275]]],\n", + " \n", + " \n", + " [[[0.7647, 0.7686, 0.7725, ..., 0.7176, 0.7098, 0.7059],\n", + " [0.7725, 0.7725, 0.7765, ..., 0.7216, 0.7176, 0.7137],\n", + " [0.7765, 0.7804, 
0.7804, ..., 0.7333, 0.7216, 0.7176],\n", + " ...,\n", + " [0.8392, 0.8431, 0.8392, ..., 0.7922, 0.7922, 0.7882],\n", + " [0.8275, 0.8353, 0.8392, ..., 0.7882, 0.7843, 0.7765],\n", + " [0.8275, 0.8314, 0.8392, ..., 0.7922, 0.7843, 0.7804]]],\n", + " \n", + " \n", + " [[[0.7216, 0.7412, 0.7686, ..., 0.9490, 0.9412, 0.9294],\n", + " [0.7373, 0.7569, 0.7804, ..., 0.9608, 0.9529, 0.9451],\n", + " [0.7490, 0.7725, 0.7922, ..., 0.9765, 0.9725, 0.9608],\n", + " ...,\n", + " [0.9529, 0.9765, 0.9922, ..., 0.0824, 0.1373, 0.2510],\n", + " [0.9608, 0.9922, 1.0000, ..., 0.1216, 0.0902, 0.2078],\n", + " [0.7569, 0.7686, 0.7765, ..., 0.1725, 0.0627, 0.1725]]]],\n", + " device='cuda:0'),\n", + " tensor([ 2, 8, 12, 19, 11, 2, 13, 20, 4, 15, 17, 23, 7, 0, 14, 9, 8, 20,\n", + " 5, 3, 9, 11, 12, 7, 23, 5, 15, 21, 18, 21, 18, 17],\n", + " device='cuda:0')]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch = next(iter(train_loader))\n", + "batch" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It looks different, but let's check the `shape`s to be sure." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 205, + "status": "ok", + "timestamp": 1715240552534, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "vannMV7sd6R_", + "outputId": "627858a2-a4ed-467c-cf82-2b7c1a01c13f" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32, 1, 28, 28])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch[0].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 204, + "status": "ok", + "timestamp": 1715240553488, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "YHJgP3A7d9lu", + "outputId": "4a40ceb8-039b-4517-de8a-bdcb814c4164" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch[1].shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6biSPXKJ3ZZP" + }, + "source": [ + "## 3.3 Creating a Convolutional Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ppdkNb1A3ZZP" + }, + "source": [ + "These days, many data scientists start their projects by borrowing model properties from a similar project. Assuming the problem is not totally unique, there's a great chance that people have created models that will perform well which are posted in online repositories like [TensorFlow Hub](https://www.tensorflow.org/hub) and the [NGC Catalog](https://ngc.nvidia.com/catalog/models). Today, we'll provide a model that will work well for this problem.\n", + "\n", + "\n", + "\n", + "We covered many of the different kinds of layers in the lecture, and we will go over them all here with links to their documentation. When in doubt, read the official documentation (or ask [Stack Overflow](https://stackoverflow.com/))." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "executionInfo": { + "elapsed": 202, + "status": "ok", + "timestamp": 1715240555184, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "p_bvGpMId_6q" + }, + "outputs": [], + "source": [ + "n_classes = 24\n", + "kernel_size = 3\n", + "flattened_img_size = 75 * 3 * 3\n", + "\n", + "model = nn.Sequential(\n", + " # First convolution\n", + " nn.Conv2d(IMG_CHS, 25, kernel_size, stride=1, padding=1), # 25 x 28 x 28\n", + " nn.BatchNorm2d(25),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(2, stride=2), # 25 x 14 x 14\n", + " # Second convolution\n", + " nn.Conv2d(25, 50, kernel_size, stride=1, padding=1), # 50 x 14 x 14\n", + " nn.BatchNorm2d(50),\n", + " nn.ReLU(),\n", + " nn.Dropout(.2),\n", + " nn.MaxPool2d(2, stride=2), # 50 x 7 x 7\n", + " # Third convolution\n", + " nn.Conv2d(50, 75, kernel_size, stride=1, padding=1), # 75 x 7 x 7\n", + " nn.BatchNorm2d(75),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(2, stride=2), # 75 x 3 x 3\n", + " # Flatten to Dense\n", + " nn.Flatten(),\n", + " nn.Linear(flattened_img_size, 512),\n", + " nn.Dropout(.3),\n", + " nn.ReLU(),\n", + " nn.Linear(512, n_classes)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8WsDr9gE3ZZP" + }, + "source": [ + "### 3.3.1 [Conv2D](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8eHXRtWa3ZZP" + }, + "source": [ + "\n", + "\n", + "These are our 2D convolutional layers. Small kernels will go over the input image and detect features that are important for classification. Earlier convolutions in the model will detect simple features such as lines. Later convolutions will detect more complex features. Let's look at our first Conv2D layer:\n", + "```Python\n", + "nn.Conv2d(IMG_CHS, 25, kernel_size, stride=1, padding=1)\n", + "```\n", + "25 refers to the number of filters that will be learned. Even though `kernel_size = 3`, PyTorch will assume we want 3 x 3 filters. Stride refer to the step size that the filter will take as it passes over the image. Padding refers to whether the output image that's created from the filter will match the size of the input image." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OiuMlsan3ZZQ" + }, + "source": [ + "### 3.3.2 [BatchNormalization](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mp72aAnK3ZZQ" + }, + "source": [ + "Like normalizing our inputs, batch normalization scales the values in the hidden layers to improve training. [Read more about it in detail here](https://blog.paperspace.com/busting-the-myths-about-batch-normalization/).\n", + "\n", + "There is a debate on best where to put the batch normalization layer. [This Stack Overflow post](https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout) compiles many perspectives." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "twarf_s63ZZQ" + }, + "source": [ + "### 3.3.3 [MaxPool2D](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MoNIzZZW3ZZQ" + }, + "source": [ + "\n", + "Max pooling takes an image and essentially shrinks it to a lower resolution. 
It does this to help the model be robust to translation (objects moving side to side), and it also makes our model faster." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mzHlBRja3ZZQ" + }, + "source": [ + "### 3.3.4 [Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FJjrPvkm3ZZQ" + }, + "source": [ + "\n", + "Dropout is a technique for preventing overfitting. Dropout randomly selects a subset of neurons and turns them off, so that they do not participate in forward or backward propagation in that particular pass. This helps to make sure that the network is robust and redundant, and does not rely on any one area to come up with answers. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NRYPkQPA3ZZQ" + }, + "source": [ + "### 3.3.5 [Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QuMt-DpZ3ZZQ" + }, + "source": [ + "Flatten takes the output of one layer, which is multidimensional, and flattens it into a one-dimensional array. The output is called a feature vector and will be connected to the final classification layer." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pSur4TGx3ZZQ" + }, + "source": [ + "### 3.3.6 [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PATqMedY3ZZQ" + }, + "source": [ + "We have seen dense linear layers before in our earlier models. Our first dense layer (512 units) takes the feature vector as input and learns which features will contribute to a particular classification. The second dense layer (24 units) is the final classification layer that outputs our prediction." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_opXKGWj3ZZQ" + }, + "source": [ + "## 3.4 Summarizing the Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Eo6eRrp23ZZQ" + }, + "source": [ + "This may feel like a lot of information, but don't worry. It's not critical to understand everything right now in order to effectively train convolutional models. Most importantly, we know that they can help with extracting useful information from images, and can be used in classification tasks."
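To make the shape comments in the model definition concrete, here is a small sanity-check sketch (our own addition, assuming the `model`, `IMG_CHS`, `IMG_HEIGHT`, and `IMG_WIDTH` defined above) that pushes a random stand-in batch through each layer and prints the resulting shape:

```python
# Trace a random fake batch through each layer to verify the shape comments
# (25 x 28 x 28 -> 25 x 14 x 14 -> ... -> 675 -> 512 -> 24).
model.eval()  # keep BatchNorm running statistics untouched by the random data
with torch.no_grad():
    x = torch.randn(32, IMG_CHS, IMG_HEIGHT, IMG_WIDTH)  # stand-in batch of 32
    for layer in model:
        x = layer(x)
        print(f"{type(layer).__name__:<12} -> {tuple(x.shape)}")
# The final line should read "Linear -> (32, 24)": one score per ASL class.
```

Each `MaxPool2d` halves the spatial size (28 → 14 → 7 → 3, rounding down), which is where the `flattened_img_size = 75 * 3 * 3` above comes from.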
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 200, + "status": "ok", + "timestamp": 1715240557183, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "2IAS92gZwcP3", + "outputId": "56678948-aed0-4aa3-dde9-b8cecbaff44d" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "OptimizedModule(\n", + " (_orig_mod): Sequential(\n", + " (0): Conv2d(1, 25, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (1): BatchNorm2d(25, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", + " (2): ReLU()\n", + " (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (4): Conv2d(25, 50, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (5): BatchNorm2d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", + " (6): ReLU()\n", + " (7): Dropout(p=0.2, inplace=False)\n", + " (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (9): Conv2d(50, 75, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (10): BatchNorm2d(75, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", + " (11): ReLU()\n", + " (12): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (13): Flatten(start_dim=1, end_dim=-1)\n", + " (14): Linear(in_features=675, out_features=512, bias=True)\n", + " (15): Dropout(p=0.3, inplace=False)\n", + " (16): ReLU()\n", + " (17): Linear(in_features=512, out_features=24, bias=True)\n", + " )\n", + ")" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = torch.compile(model.to(device))\n", + "model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the problem we are trying to solve is still the same (classifying ASL images), we will continue to use the same `loss_function` and `accuracy` metric." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "executionInfo": { + "elapsed": 237, + "status": "ok", + "timestamp": 1715240559055, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "-BUIQ5COwsri" + }, + "outputs": [], + "source": [ + "loss_function = nn.CrossEntropyLoss()\n", + "optimizer = Adam(model.parameters())" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "executionInfo": { + "elapsed": 3, + "status": "ok", + "timestamp": 1715240559790, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "SniWnvc5NSkA" + }, + "outputs": [], + "source": [ + "def get_batch_accuracy(output, y, N):\n", + " pred = output.argmax(dim=1, keepdim=True)\n", + " correct = pred.eq(y.view_as(pred)).sum().item()\n", + " return correct / N" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OBgbUNDH3ZZR" + }, + "source": [ + "### 3.5 Training the Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tsS9zDKh3ZZR" + }, + "source": [ + "Despite the very different model architecture, the training looks exactly the same." 
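Before moving on to the training functions, here is a tiny fabricated example (our own, not real model output) of what `get_batch_accuracy` does: `argmax` picks the highest-scoring class for each row, and the number of matches is divided by `N`.

```python
# Fabricated batch of 2 "predictions" over 3 classes, just to illustrate.
fake_output = torch.tensor([[0.1, 2.0, 0.3],   # argmax -> class 1 (matches label)
                            [1.5, 0.2, 0.1]])  # argmax -> class 0 (label is 2)
fake_y = torch.tensor([1, 2])
print(get_batch_accuracy(fake_output, fake_y, 2))  # 0.5
```

In the loops below, `N` is the size of the whole dataset (`train_N` or `valid_N`), so the per-batch values add up to the accuracy for the full epoch.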
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These are the same `train` and `validate` functions as before, but they have been mixed up. Can you correctly name each function and replace the `FIXME`s?\n", + "\n", + "One of them should have `model.train` and the other should have `model.eval`. " + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "executionInfo": { + "elapsed": 214, + "status": "ok", + "timestamp": 1715240562885, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "e9R0vJA8NQUW" + }, + "outputs": [], + "source": [ + "def validate():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " for x, y in valid_loader:\n", + " output = model(x)\n", + "\n", + " loss += loss_function(output, y).item()\n", + " accuracy += get_batch_accuracy(output, y, valid_N)\n", + " print('validate - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "executionInfo": { + "elapsed": 212, + "status": "ok", + "timestamp": 1715240561357, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "wr-X8QkVv9I7" + }, + "outputs": [], + "source": [ + "def train():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.train()\n", + " for x, y in train_loader:\n", + " output = model(x)\n", + " optimizer.zero_grad()\n", + " batch_loss = loss_function(output, y)\n", + " batch_loss.backward()\n", + " optimizer.step()\n", + "\n", + " loss += batch_loss.item()\n", + " accuracy += get_batch_accuracy(output, y, train_N)\n", + " print('train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click the two `...`s below for the solution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "def validate():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " for x, y in valid_loader:\n", + " output = model(x)\n", + "\n", + " loss += loss_function(output, y).item()\n", + " accuracy += get_batch_accuracy(output, y, valid_N)\n", + " print('Valid - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "def train():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.train()\n", + " for x, y in train_loader:\n", + " output = model(x)\n", + " optimizer.zero_grad()\n", + " batch_loss = loss_function(output, y)\n", + " batch_loss.backward()\n", + " optimizer.step()\n", + "\n", + " loss += batch_loss.item()\n", + " accuracy += get_batch_accuracy(output, y, train_N)\n", + " print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 720 + }, + "executionInfo": { + "elapsed": 430665, + "status": "error", + "timestamp": 1715240995537, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "qOYsrlmUwyyI", + "outputId": "ccbb497f-8f23-43c3-85c4-81f47c98728d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch: 0\n", + "train - Loss: 275.1043 Accuracy: 0.9050\n", + "validate - Loss: 20.9696 Accuracy: 0.9738\n", + "Epoch: 1\n", + "train - Loss: 19.4358 Accuracy: 0.9933\n", + "validate - Loss: 34.5608 Accuracy: 0.9424\n", + "Epoch: 2\n", + "train - Loss: 6.1264 Accuracy: 0.9980\n", + "validate - Loss: 80.2993 Accuracy: 0.9088\n", + "Epoch: 3\n", + "train - Loss: 12.6979 Accuracy: 0.9962\n", + "validate - Loss: 9.8250 Accuracy: 0.9848\n", + "Epoch: 4\n", + "train - Loss: 12.4665 Accuracy: 0.9959\n", + "validate - Loss: 12.4502 Accuracy: 0.9840\n", + "Epoch: 5\n", + "train - Loss: 11.4185 Accuracy: 0.9966\n", + "validate - Loss: 21.9803 Accuracy: 0.9527\n", + "Epoch: 6\n", + "train - Loss: 1.9842 Accuracy: 0.9993\n", + "validate - Loss: 16.9936 Accuracy: 0.9757\n", + "Epoch: 7\n", + "train - Loss: 10.6350 Accuracy: 0.9962\n", + "validate - Loss: 30.1904 Accuracy: 0.9621\n", + "Epoch: 8\n", + "train - Loss: 5.0981 Accuracy: 0.9987\n", + "validate - Loss: 9.5280 Accuracy: 0.9873\n", + "Epoch: 9\n", + "train - Loss: 5.5148 Accuracy: 0.9979\n", + "validate - Loss: 19.5927 Accuracy: 0.9646\n", + "Epoch: 10\n", + "train - Loss: 3.9784 Accuracy: 0.9984\n", + "validate - Loss: 17.2282 Accuracy: 0.9764\n", + "Epoch: 11\n", + "train - Loss: 3.3571 Accuracy: 0.9990\n", + "validate - Loss: 51.6098 Accuracy: 0.9406\n", + "Epoch: 12\n", + "train - Loss: 7.9855 Accuracy: 0.9975\n", + "validate - Loss: 22.7062 Accuracy: 0.9763\n", + "Epoch: 13\n", + "train - Loss: 4.7391 Accuracy: 0.9984\n", + "validate - Loss: 134.0925 Accuracy: 0.8787\n", + "Epoch: 14\n", + "train - Loss: 2.6986 Accuracy: 0.9991\n", + "validate - Loss: 24.6166 Accuracy: 0.9714\n", + "Epoch: 15\n", + "train - Loss: 3.4454 Accuracy: 0.9991\n", + "validate - Loss: 30.3219 Accuracy: 0.9667\n", + "Epoch: 16\n", + "train - Loss: 3.4769 Accuracy: 0.9991\n", + "validate - Loss: 44.5873 Accuracy: 0.9589\n", + "Epoch: 17\n", + "train - Loss: 3.9115 Accuracy: 0.9988\n", + 
"validate - Loss: 32.3518 Accuracy: 0.9635\n", + "Epoch: 18\n", + "train - Loss: 3.1029 Accuracy: 0.9989\n", + "validate - Loss: 28.7711 Accuracy: 0.9703\n", + "Epoch: 19\n", + "train - Loss: 6.0194 Accuracy: 0.9980\n", + "validate - Loss: 15.4996 Accuracy: 0.9822\n" + ] + } + ], + "source": [ + "epochs = 20\n", + "\n", + "for epoch in range(epochs):\n", + " print('Epoch: {}'.format(epoch))\n", + " train()\n", + " validate()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pVytGlnl3ZZR" + }, + "source": [ + "### 3.5.1 Discussion of Results" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ukd8Kk8l3ZZR" + }, + "source": [ + "It looks like this model is significantly improved! The training accuracy is very high, and the validation accuracy has improved as well. This is a great result, as all we had to do was swap in a new model.\n", + "\n", + "You may have noticed the validation accuracy jumping around. This is an indication that our model is still not generalizing perfectly. Fortunately, there's more that we can do. Let's talk about it in the next lecture." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zsOHIy5F3ZZR" + }, + "source": [ + "## 3.6 Summary" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DcIRdSur3ZZR" + }, + "source": [ + "In this section, we utilized several new kinds of layers to implement a CNN, which performed better than the more simple model used in the last section. Hopefully the overall process of creating and training a model with prepared data is starting to become even more familiar." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o0wFCmbK3ZZS" + }, + "source": [ + "### 3.6.1 Clear the Memory\n", + "Before moving on, please execute the following cell to clear up the GPU memory. This is required to move on to the next notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0Ul7wgax3ZZS" + }, + "outputs": [], + "source": [ + "import IPython\n", + "app = IPython.Application.instance()\n", + "app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4kMR2FOK3ZZS" + }, + "source": [ + "### 3.6.2 Next" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "13FglbMX3ZZS" + }, + "source": [ + "In the last several sections you have focused on the creation and training of models. In order to further improve performance, you will now turn your attention to *data augmentation*, a collection of techniques that will allow your models to train on more and better data than what you might have originally at your disposal." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PEzcSC6x3ZZS" + }, + "source": [ + "
\"Header\"
" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} -- cgit v1.2.3