From 53f20d58628171934c097dff5602fe17765eae99 Mon Sep 17 00:00:00 2001 From: leshe4ka46 Date: Thu, 25 Dec 2025 21:28:30 +0300 Subject: finish --- Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb | 1492 ++++++++++++++++++++++++ 1 file changed, 1492 insertions(+) create mode 100644 Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb (limited to 'Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb') diff --git a/Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb b/Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb new file mode 100644 index 0000000..c3d0c2a --- /dev/null +++ b/Fundamentals_of_Deep_Learning/03_asl_cnn.ipynb @@ -0,0 +1,1492 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "weabkZTF3ZZM" + }, + "source": [ + "
\"Header\"
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dz8YI6Fb3ZZN" + }, + "source": [ + "# 3. Convolutional Neural Networks" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8UWR4l4X3ZZN" + }, + "source": [ + "In the previous section, we built and trained a simple model to classify ASL images. The model was able to learn how to correctly classify the training dataset with very high accuracy, but, it did not perform nearly as well on validation dataset. This behavior of not generalizing well to non-training data is called [overfitting](https://scikit-learn.org/stable/auto_examples/model_selection/plot_underfitting_overfitting.html), and in this section, we will introduce a popular kind of model called a [convolutional neural network](https://towardsdatascience.com/a-comprehensive-guide-to-convolutional-neural-networks-the-eli5-way-3bd2b1164a53) that is especially good for reading images and classifying them." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GmRLS07k3ZZN" + }, + "source": [ + "## 3.1 Objectives" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Iuvwj_tr3ZZN" + }, + "source": [ + "* Prep data specifically for a CNN\n", + "* Create a more sophisticated CNN model, understanding a greater variety of model layers\n", + "* Train a CNN model and observe its performance" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 221, + "status": "ok", + "timestamp": 1715240535370, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "9kMRTHEV2AFm", + "outputId": "f1fb3858-e6a7-4906-ec7e-c4d34abcf013" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import torch.nn as nn\n", + "import pandas as pd\n", + "import torch\n", + "from torch.optim import Adam\n", + "from torch.utils.data import Dataset, DataLoader\n", + "\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "torch.cuda.is_available()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xEGukATl3ZZN" + }, + "source": [ + "## 3.2 Loading and Preparing the Data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2.1 Preparing Images" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "-SyD7hID3ZZN" + }, + "source": [ + "Let's begin by loading our DataFrames like we did in the previous lab:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "executionInfo": { + "elapsed": 3372, + "status": "ok", + "timestamp": 1715240541334, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "XMMgEMcc2Ehg" + }, + "outputs": [], + "source": [ + "train_df = pd.read_csv(\"data/asl_data/sign_mnist_train.csv\")\n", + "valid_df = pd.read_csv(\"data/asl_data/sign_mnist_valid.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
labelpixel1pixel2pixel3pixel4pixel5pixel6pixel7pixel8pixel9...pixel775pixel776pixel777pixel778pixel779pixel780pixel781pixel782pixel783pixel784
03107118127134139143146150153...207207207207206206206204203202
16155157156156156157156158158...691491288794163175103135149
22187188188187187186187188187...202201200199198199198195194195
32211211212212211210211210210...235234233231230226225222229163
412164167170172176179180184185...92105105108133163157163164179
\n", + "

5 rows × 785 columns

\n", + "
" + ], + "text/plain": [ + " label pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 \\\n", + "0 3 107 118 127 134 139 143 146 150 \n", + "1 6 155 157 156 156 156 157 156 158 \n", + "2 2 187 188 188 187 187 186 187 188 \n", + "3 2 211 211 212 212 211 210 211 210 \n", + "4 12 164 167 170 172 176 179 180 184 \n", + "\n", + " pixel9 ... pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 \\\n", + "0 153 ... 207 207 207 207 206 206 \n", + "1 158 ... 69 149 128 87 94 163 \n", + "2 187 ... 202 201 200 199 198 199 \n", + "3 210 ... 235 234 233 231 230 226 \n", + "4 185 ... 92 105 105 108 133 163 \n", + "\n", + " pixel781 pixel782 pixel783 pixel784 \n", + "0 206 204 203 202 \n", + "1 175 103 135 149 \n", + "2 198 195 194 195 \n", + "3 225 222 229 163 \n", + "4 157 163 164 179 \n", + "\n", + "[5 rows x 785 columns]" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This ASL data is already flattened." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[107, 118, 127, ..., 204, 203, 202],\n", + " [155, 157, 156, ..., 103, 135, 149],\n", + " [187, 188, 188, ..., 195, 194, 195],\n", + " [211, 211, 212, ..., 222, 229, 163],\n", + " [164, 167, 170, ..., 163, 164, 179]])" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sample_df = train_df.head().copy() # Grab the top 5 rows\n", + "sample_df.pop('label')\n", + "sample_x = sample_df.values\n", + "sample_x" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5, 784)" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sample_x.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In this format, we don't have all the information about which pixels are near each other. Because of this, we can't apply convolutions that will detect features. Let's [reshape](https://numpy.org/doc/stable/reference/generated/numpy.reshape.html) our dataset so that they are in a 28x28 pixel format. This will allow our convolutions to associate groups of pixels and detect important features.\n", + "\n", + "Note that for the first convolutional layer of our model, we need to have not only the height and width of the image, but also the number of [color channels](https://www.photoshopessentials.com/essentials/rgb/). Our images are grayscale, so we'll just have 1 channel.\n", + "\n", + "That means that we need to convert the current shape `(5, 784)` to `(5, 1, 28, 28)`. With [NumPy](https://numpy.org/doc/stable/index.html) arrays, we can pass a `-1` for any dimension we wish to remain the same." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(5, 1, 28, 28)" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "IMG_HEIGHT = 28\n", + "IMG_WIDTH = 28\n", + "IMG_CHS = 1\n", + "\n", + "sample_x = sample_x.reshape(-1, IMG_CHS, IMG_HEIGHT, IMG_WIDTH)\n", + "sample_x.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2.2 Create a Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's add the steps above into our `MyDataset` class." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There are 4 `FIXME`s in the class definition below. Can you replace them with the correct values?" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "executionInfo": { + "elapsed": 173, + "status": "ok", + "timestamp": 1715240547901, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "tpzGOri32Klj" + }, + "outputs": [], + "source": [ + "class MyDataset(Dataset):\n", + " def __init__(self, base_df):\n", + " x_df = base_df.copy() # Some operations below are in-place\n", + " y_df = x_df.pop('label')\n", + " x_df = x_df.values / 255 # Normalize values from 0 to 1\n", + " x_df = x_df.reshape(-1, IMG_CHS, IMG_WIDTH, IMG_HEIGHT)\n", + " self.xs = torch.tensor(x_df).float().to(device)\n", + " self.ys = torch.tensor(y_df).to(device)\n", + "\n", + " def __getitem__(self, idx):\n", + " x = self.xs[idx]\n", + " y = self.ys[idx]\n", + " return x, y\n", + "\n", + " def __len__(self):\n", + " return len(self.xs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click the `...` below for the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "class MyDataset(Dataset):\n", + " def __init__(self, base_df):\n", + " x_df = base_df.copy() # Some operations below are in-place\n", + " y_df = x_df.pop('label')\n", + " x_df = x_df.values / 255 # Normalize values from 0 to 1\n", + " x_df = x_df.reshape(-1, IMG_CHS, IMG_WIDTH, IMG_HEIGHT)\n", + " self.xs = torch.tensor(x_df).float().to(device)\n", + " self.ys = torch.tensor(y_df).to(device)\n", + "\n", + " def __getitem__(self, idx):\n", + " x = self.xs[idx]\n", + " y = self.ys[idx]\n", + " return x, y\n", + "\n", + " def __len__(self):\n", + " return len(self.xs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3.2.3 Create a DataLoader" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Next, let's create the DataLoader from the Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One of these function calls is missing the `shuffle=True` argument. Can you remember which one it is and add it back in?" 
+ ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "executionInfo": { + "elapsed": 1096, + "status": "ok", + "timestamp": 1715240550115, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "unf8Cz4WcK_M" + }, + "outputs": [], + "source": [ + "BATCH_SIZE = 32\n", + "\n", + "train_data = MyDataset(train_df)\n", + "train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)\n", + "train_N = len(train_loader.dataset)\n", + "\n", + "valid_data = MyDataset(valid_df)\n", + "valid_loader = DataLoader(valid_data, batch_size=BATCH_SIZE)\n", + "valid_N = len(valid_loader.dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click the `...` below for the solution." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's grab a batch from the DataLoader to make sure it works." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 2, + "status": "ok", + "timestamp": 1715240550382, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "Z4xylt03dz1W", + "outputId": "80447d85-302d-4549-976b-f4c3ac0f0644" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[tensor([[[[0.7373, 0.7373, 0.7412, ..., 0.2431, 0.1765, 0.1804],\n", + " [0.7412, 0.7451, 0.7490, ..., 0.2196, 0.1647, 0.1608],\n", + " [0.7529, 0.7569, 0.7569, ..., 0.2078, 0.1529, 0.1451],\n", + " ...,\n", + " [0.2235, 0.2118, 0.2000, ..., 0.7686, 0.7647, 0.7569],\n", + " [0.2118, 0.1961, 0.1804, ..., 0.7725, 0.7647, 0.7333],\n", + " [0.1961, 0.1804, 0.1725, ..., 0.7804, 0.7608, 0.6353]]],\n", + " \n", + " \n", + " [[[0.7059, 0.7059, 0.7098, ..., 0.6902, 0.6275, 0.3882],\n", + " [0.7098, 0.7137, 0.7176, ..., 0.6863, 0.6902, 0.5059],\n", + " [0.7216, 0.7294, 0.7294, ..., 0.7020, 0.7098, 0.4078],\n", + " ...,\n", + " [0.8157, 0.8196, 0.8275, ..., 0.7843, 0.7373, 0.7373],\n", + " [0.8157, 0.8235, 0.8196, ..., 0.7882, 0.7451, 0.7255],\n", + " [0.8196, 0.8235, 0.7843, ..., 0.7255, 0.7176, 0.7059]]],\n", + " \n", + " \n", + " [[[0.8863, 0.8902, 0.8863, ..., 0.7961, 0.7882, 0.7804],\n", + " [0.8941, 0.8941, 0.8902, ..., 0.8039, 0.7961, 0.7843],\n", + " [0.8941, 0.8980, 0.8980, ..., 0.8078, 0.8039, 0.7922],\n", + " ...,\n", + " [0.5490, 0.5451, 0.5333, ..., 0.2549, 0.1647, 0.1804],\n", + " [0.5647, 0.5333, 0.5333, ..., 0.2431, 0.1529, 0.1647],\n", + " [0.5490, 0.5373, 0.5255, ..., 0.2235, 0.1412, 0.1608]]],\n", + " \n", + " \n", + " ...,\n", + " \n", + " \n", + " [[[0.2941, 0.3451, 0.3843, ..., 0.6235, 0.6275, 0.6235],\n", + " [0.3059, 0.3569, 0.3922, ..., 0.6392, 0.6392, 0.6353],\n", + " [0.3176, 0.3686, 0.4078, ..., 0.6510, 0.6510, 0.6667],\n", + " ...,\n", + " [0.3255, 0.3373, 0.3490, ..., 0.4353, 0.3020, 0.2392],\n", + " [0.2980, 0.2980, 0.3059, ..., 0.3882, 0.2863, 0.2353],\n", + " [0.3098, 0.3137, 0.3294, ..., 0.3294, 0.2588, 0.2275]]],\n", + " \n", + " \n", + " [[[0.7647, 0.7686, 0.7725, ..., 0.7176, 0.7098, 0.7059],\n", + " [0.7725, 0.7725, 0.7765, ..., 0.7216, 0.7176, 0.7137],\n", + " [0.7765, 0.7804, 
0.7804, ..., 0.7333, 0.7216, 0.7176],\n", + " ...,\n", + " [0.8392, 0.8431, 0.8392, ..., 0.7922, 0.7922, 0.7882],\n", + " [0.8275, 0.8353, 0.8392, ..., 0.7882, 0.7843, 0.7765],\n", + " [0.8275, 0.8314, 0.8392, ..., 0.7922, 0.7843, 0.7804]]],\n", + " \n", + " \n", + " [[[0.7216, 0.7412, 0.7686, ..., 0.9490, 0.9412, 0.9294],\n", + " [0.7373, 0.7569, 0.7804, ..., 0.9608, 0.9529, 0.9451],\n", + " [0.7490, 0.7725, 0.7922, ..., 0.9765, 0.9725, 0.9608],\n", + " ...,\n", + " [0.9529, 0.9765, 0.9922, ..., 0.0824, 0.1373, 0.2510],\n", + " [0.9608, 0.9922, 1.0000, ..., 0.1216, 0.0902, 0.2078],\n", + " [0.7569, 0.7686, 0.7765, ..., 0.1725, 0.0627, 0.1725]]]],\n", + " device='cuda:0'),\n", + " tensor([ 2, 8, 12, 19, 11, 2, 13, 20, 4, 15, 17, 23, 7, 0, 14, 9, 8, 20,\n", + " 5, 3, 9, 11, 12, 7, 23, 5, 15, 21, 18, 21, 18, 17],\n", + " device='cuda:0')]" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch = next(iter(train_loader))\n", + "batch" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It looks different, but let's check the `shape`s to be sure." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 205, + "status": "ok", + "timestamp": 1715240552534, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "vannMV7sd6R_", + "outputId": "627858a2-a4ed-467c-cf82-2b7c1a01c13f" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32, 1, 28, 28])" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch[0].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 204, + "status": "ok", + "timestamp": 1715240553488, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "YHJgP3A7d9lu", + "outputId": "4a40ceb8-039b-4517-de8a-bdcb814c4164" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32])" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch[1].shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6biSPXKJ3ZZP" + }, + "source": [ + "## 3.3 Creating a Convolutional Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ppdkNb1A3ZZP" + }, + "source": [ + "These days, many data scientists start their projects by borrowing model properties from a similar project. Assuming the problem is not totally unique, there's a great chance that people have created models that will perform well which are posted in online repositories like [TensorFlow Hub](https://www.tensorflow.org/hub) and the [NGC Catalog](https://ngc.nvidia.com/catalog/models). Today, we'll provide a model that will work well for this problem.\n", + "\n", + "\n", + "\n", + "We covered many of the different kinds of layers in the lecture, and we will go over them all here with links to their documentation. When in doubt, read the official documentation (or ask [Stack Overflow](https://stackoverflow.com/))." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "executionInfo": { + "elapsed": 202, + "status": "ok", + "timestamp": 1715240555184, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "p_bvGpMId_6q" + }, + "outputs": [], + "source": [ + "n_classes = 24\n", + "kernel_size = 3\n", + "flattened_img_size = 75 * 3 * 3\n", + "\n", + "model = nn.Sequential(\n", + " # First convolution\n", + " nn.Conv2d(IMG_CHS, 25, kernel_size, stride=1, padding=1), # 25 x 28 x 28\n", + " nn.BatchNorm2d(25),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(2, stride=2), # 25 x 14 x 14\n", + " # Second convolution\n", + " nn.Conv2d(25, 50, kernel_size, stride=1, padding=1), # 50 x 14 x 14\n", + " nn.BatchNorm2d(50),\n", + " nn.ReLU(),\n", + " nn.Dropout(.2),\n", + " nn.MaxPool2d(2, stride=2), # 50 x 7 x 7\n", + " # Third convolution\n", + " nn.Conv2d(50, 75, kernel_size, stride=1, padding=1), # 75 x 7 x 7\n", + " nn.BatchNorm2d(75),\n", + " nn.ReLU(),\n", + " nn.MaxPool2d(2, stride=2), # 75 x 3 x 3\n", + " # Flatten to Dense\n", + " nn.Flatten(),\n", + " nn.Linear(flattened_img_size, 512),\n", + " nn.Dropout(.3),\n", + " nn.ReLU(),\n", + " nn.Linear(512, n_classes)\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8WsDr9gE3ZZP" + }, + "source": [ + "### 3.3.1 [Conv2D](https://pytorch.org/docs/stable/generated/torch.nn.Conv2d.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "8eHXRtWa3ZZP" + }, + "source": [ + "\n", + "\n", + "These are our 2D convolutional layers. Small kernels will go over the input image and detect features that are important for classification. Earlier convolutions in the model will detect simple features such as lines. Later convolutions will detect more complex features. Let's look at our first Conv2D layer:\n", + "```Python\n", + "nn.Conv2d(IMG_CHS, 25, kernel_size, stride=1, padding=1)\n", + "```\n", + "25 refers to the number of filters that will be learned. Even though `kernel_size = 3`, PyTorch will assume we want 3 x 3 filters. Stride refer to the step size that the filter will take as it passes over the image. Padding refers to whether the output image that's created from the filter will match the size of the input image." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OiuMlsan3ZZQ" + }, + "source": [ + "### 3.3.2 [BatchNormalization](https://pytorch.org/docs/stable/generated/torch.nn.BatchNorm2d.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Mp72aAnK3ZZQ" + }, + "source": [ + "Like normalizing our inputs, batch normalization scales the values in the hidden layers to improve training. [Read more about it in detail here](https://blog.paperspace.com/busting-the-myths-about-batch-normalization/).\n", + "\n", + "There is a debate on best where to put the batch normalization layer. [This Stack Overflow post](https://stackoverflow.com/questions/39691902/ordering-of-batch-normalization-and-dropout) compiles many perspectives." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "twarf_s63ZZQ" + }, + "source": [ + "### 3.3.3 [MaxPool2D](https://pytorch.org/docs/stable/generated/torch.nn.MaxPool2d.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MoNIzZZW3ZZQ" + }, + "source": [ + "\n", + "Max pooling takes an image and essentially shrinks it to a lower resolution. 
It does this to help the model be robust to translation (objects moving side to side), and it also makes our model faster." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mzHlBRja3ZZQ" + }, + "source": [ + "### 3.3.4 [Dropout](https://pytorch.org/docs/stable/generated/torch.nn.Dropout.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "FJjrPvkm3ZZQ" + }, + "source": [ + "\n", + "Dropout is a technique for preventing overfitting. Dropout randomly selects a subset of neurons and turns them off, so that they do not participate in forward or backward propagation in that particular pass. This helps to make sure that the network is robust and redundant, and does not rely on any one area to come up with answers. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NRYPkQPA3ZZQ" + }, + "source": [ + "### 3.3.5 [Flatten](https://pytorch.org/docs/stable/generated/torch.nn.Flatten.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QuMt-DpZ3ZZQ" + }, + "source": [ + "Flatten takes the output of one layer, which is multidimensional, and flattens it into a one-dimensional array. The output is called a feature vector and will be connected to the final classification layer." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pSur4TGx3ZZQ" + }, + "source": [ + "### 3.3.6 [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PATqMedY3ZZQ" + }, + "source": [ + "We have seen dense linear layers before in our earlier models. Our first dense layer (512 units) takes the feature vector as input and learns which features will contribute to a particular classification. The second dense layer (24 units) is the final classification layer that outputs our prediction." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_opXKGWj3ZZQ" + }, + "source": [ + "## 3.4 Summarizing the Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Eo6eRrp23ZZQ" + }, + "source": [ + "This may feel like a lot of information, but don't worry. It's not critical to understand everything right now in order to effectively train convolutional models. Most importantly, we know that they can help with extracting useful information from images, and can be used in classification tasks."
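To make the shape comments in the model definition concrete, here is a small sanity-check sketch (our own addition, assuming the `model`, `IMG_CHS`, `IMG_HEIGHT`, and `IMG_WIDTH` defined above) that pushes a random stand-in batch through each layer and prints the resulting shape:

```python
# Trace a random fake batch through each layer to verify the shape comments
# (25 x 28 x 28 -> 25 x 14 x 14 -> ... -> 675 -> 512 -> 24).
model.eval()  # keep BatchNorm running statistics untouched by the random data
with torch.no_grad():
    x = torch.randn(32, IMG_CHS, IMG_HEIGHT, IMG_WIDTH)  # stand-in batch of 32
    for layer in model:
        x = layer(x)
        print(f"{type(layer).__name__:<12} -> {tuple(x.shape)}")
# The final line should read "Linear -> (32, 24)": one score per ASL class.
```

Each `MaxPool2d` halves the spatial size (28 → 14 → 7 → 3, rounding down), which is where the `flattened_img_size = 75 * 3 * 3` above comes from.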
+ ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 200, + "status": "ok", + "timestamp": 1715240557183, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "2IAS92gZwcP3", + "outputId": "56678948-aed0-4aa3-dde9-b8cecbaff44d" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "OptimizedModule(\n", + " (_orig_mod): Sequential(\n", + " (0): Conv2d(1, 25, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (1): BatchNorm2d(25, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", + " (2): ReLU()\n", + " (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (4): Conv2d(25, 50, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (5): BatchNorm2d(50, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", + " (6): ReLU()\n", + " (7): Dropout(p=0.2, inplace=False)\n", + " (8): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (9): Conv2d(50, 75, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n", + " (10): BatchNorm2d(75, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", + " (11): ReLU()\n", + " (12): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n", + " (13): Flatten(start_dim=1, end_dim=-1)\n", + " (14): Linear(in_features=675, out_features=512, bias=True)\n", + " (15): Dropout(p=0.3, inplace=False)\n", + " (16): ReLU()\n", + " (17): Linear(in_features=512, out_features=24, bias=True)\n", + " )\n", + ")" + ] + }, + "execution_count": 16, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = torch.compile(model.to(device))\n", + "model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since the problem we are trying to solve is still the same (classifying ASL images), we will continue to use the same `loss_function` and `accuracy` metric." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": { + "executionInfo": { + "elapsed": 237, + "status": "ok", + "timestamp": 1715240559055, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "-BUIQ5COwsri" + }, + "outputs": [], + "source": [ + "loss_function = nn.CrossEntropyLoss()\n", + "optimizer = Adam(model.parameters())" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "executionInfo": { + "elapsed": 3, + "status": "ok", + "timestamp": 1715240559790, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "SniWnvc5NSkA" + }, + "outputs": [], + "source": [ + "def get_batch_accuracy(output, y, N):\n", + " pred = output.argmax(dim=1, keepdim=True)\n", + " correct = pred.eq(y.view_as(pred)).sum().item()\n", + " return correct / N" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OBgbUNDH3ZZR" + }, + "source": [ + "### 3.5 Training the Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "tsS9zDKh3ZZR" + }, + "source": [ + "Despite the very different model architecture, the training looks exactly the same." 
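Before moving on to the training functions, here is a tiny fabricated example (our own, not real model output) of what `get_batch_accuracy` does: `argmax` picks the highest-scoring class for each row, and the number of matches is divided by `N`.

```python
# Fabricated batch of 2 "predictions" over 3 classes, just to illustrate.
fake_output = torch.tensor([[0.1, 2.0, 0.3],   # argmax -> class 1 (matches label)
                            [1.5, 0.2, 0.1]])  # argmax -> class 0 (label is 2)
fake_y = torch.tensor([1, 2])
print(get_batch_accuracy(fake_output, fake_y, 2))  # 0.5
```

In the loops below, `N` is the size of the whole dataset (`train_N` or `valid_N`), so the per-batch values add up to the accuracy for the full epoch.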
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These are the same `train` and `validate` functions as before, but they have been mixed up. Can you correctly name each function and replace the `FIXME`s?\n", + "\n", + "One of them should have `model.train` and the other should have `model.eval`. " + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": { + "executionInfo": { + "elapsed": 214, + "status": "ok", + "timestamp": 1715240562885, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "e9R0vJA8NQUW" + }, + "outputs": [], + "source": [ + "def validate():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " for x, y in valid_loader:\n", + " output = model(x)\n", + "\n", + " loss += loss_function(output, y).item()\n", + " accuracy += get_batch_accuracy(output, y, valid_N)\n", + " print('validate - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": { + "executionInfo": { + "elapsed": 212, + "status": "ok", + "timestamp": 1715240561357, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "wr-X8QkVv9I7" + }, + "outputs": [], + "source": [ + "def train():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.train()\n", + " for x, y in train_loader:\n", + " output = model(x)\n", + " optimizer.zero_grad()\n", + " batch_loss = loss_function(output, y)\n", + " batch_loss.backward()\n", + " optimizer.step()\n", + "\n", + " loss += batch_loss.item()\n", + " accuracy += get_batch_accuracy(output, y, train_N)\n", + " print('train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click the two `...`s below for the solution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "def validate():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " for x, y in valid_loader:\n", + " output = model(x)\n", + "\n", + " loss += loss_function(output, y).item()\n", + " accuracy += get_batch_accuracy(output, y, valid_N)\n", + " print('Valid - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "def train():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.train()\n", + " for x, y in train_loader:\n", + " output = model(x)\n", + " optimizer.zero_grad()\n", + " batch_loss = loss_function(output, y)\n", + " batch_loss.backward()\n", + " optimizer.step()\n", + "\n", + " loss += batch_loss.item()\n", + " accuracy += get_batch_accuracy(output, y, train_N)\n", + " print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 720 + }, + "executionInfo": { + "elapsed": 430665, + "status": "error", + "timestamp": 1715240995537, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "qOYsrlmUwyyI", + "outputId": "ccbb497f-8f23-43c3-85c4-81f47c98728d" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch: 0\n", + "train - Loss: 275.1043 Accuracy: 0.9050\n", + "validate - Loss: 20.9696 Accuracy: 0.9738\n", + "Epoch: 1\n", + "train - Loss: 19.4358 Accuracy: 0.9933\n", + "validate - Loss: 34.5608 Accuracy: 0.9424\n", + "Epoch: 2\n", + "train - Loss: 6.1264 Accuracy: 0.9980\n", + "validate - Loss: 80.2993 Accuracy: 0.9088\n", + "Epoch: 3\n", + "train - Loss: 12.6979 Accuracy: 0.9962\n", + "validate - Loss: 9.8250 Accuracy: 0.9848\n", + "Epoch: 4\n", + "train - Loss: 12.4665 Accuracy: 0.9959\n", + "validate - Loss: 12.4502 Accuracy: 0.9840\n", + "Epoch: 5\n", + "train - Loss: 11.4185 Accuracy: 0.9966\n", + "validate - Loss: 21.9803 Accuracy: 0.9527\n", + "Epoch: 6\n", + "train - Loss: 1.9842 Accuracy: 0.9993\n", + "validate - Loss: 16.9936 Accuracy: 0.9757\n", + "Epoch: 7\n", + "train - Loss: 10.6350 Accuracy: 0.9962\n", + "validate - Loss: 30.1904 Accuracy: 0.9621\n", + "Epoch: 8\n", + "train - Loss: 5.0981 Accuracy: 0.9987\n", + "validate - Loss: 9.5280 Accuracy: 0.9873\n", + "Epoch: 9\n", + "train - Loss: 5.5148 Accuracy: 0.9979\n", + "validate - Loss: 19.5927 Accuracy: 0.9646\n", + "Epoch: 10\n", + "train - Loss: 3.9784 Accuracy: 0.9984\n", + "validate - Loss: 17.2282 Accuracy: 0.9764\n", + "Epoch: 11\n", + "train - Loss: 3.3571 Accuracy: 0.9990\n", + "validate - Loss: 51.6098 Accuracy: 0.9406\n", + "Epoch: 12\n", + "train - Loss: 7.9855 Accuracy: 0.9975\n", + "validate - Loss: 22.7062 Accuracy: 0.9763\n", + "Epoch: 13\n", + "train - Loss: 4.7391 Accuracy: 0.9984\n", + "validate - Loss: 134.0925 Accuracy: 0.8787\n", + "Epoch: 14\n", + "train - Loss: 2.6986 Accuracy: 0.9991\n", + "validate - Loss: 24.6166 Accuracy: 0.9714\n", + "Epoch: 15\n", + "train - Loss: 3.4454 Accuracy: 0.9991\n", + "validate - Loss: 30.3219 Accuracy: 0.9667\n", + "Epoch: 16\n", + "train - Loss: 3.4769 Accuracy: 0.9991\n", + "validate - Loss: 44.5873 Accuracy: 0.9589\n", + "Epoch: 17\n", + "train - Loss: 3.9115 Accuracy: 0.9988\n", + 
"validate - Loss: 32.3518 Accuracy: 0.9635\n", + "Epoch: 18\n", + "train - Loss: 3.1029 Accuracy: 0.9989\n", + "validate - Loss: 28.7711 Accuracy: 0.9703\n", + "Epoch: 19\n", + "train - Loss: 6.0194 Accuracy: 0.9980\n", + "validate - Loss: 15.4996 Accuracy: 0.9822\n" + ] + } + ], + "source": [ + "epochs = 20\n", + "\n", + "for epoch in range(epochs):\n", + " print('Epoch: {}'.format(epoch))\n", + " train()\n", + " validate()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "pVytGlnl3ZZR" + }, + "source": [ + "### 3.5.1 Discussion of Results" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Ukd8Kk8l3ZZR" + }, + "source": [ + "It looks like this model is significantly improved! The training accuracy is very high, and the validation accuracy has improved as well. This is a great result, as all we had to do was swap in a new model.\n", + "\n", + "You may have noticed the validation accuracy jumping around. This is an indication that our model is still not generalizing perfectly. Fortunately, there's more that we can do. Let's talk about it in the next lecture." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "zsOHIy5F3ZZR" + }, + "source": [ + "## 3.6 Summary" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DcIRdSur3ZZR" + }, + "source": [ + "In this section, we utilized several new kinds of layers to implement a CNN, which performed better than the more simple model used in the last section. Hopefully the overall process of creating and training a model with prepared data is starting to become even more familiar." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "o0wFCmbK3ZZS" + }, + "source": [ + "### 3.6.1 Clear the Memory\n", + "Before moving on, please execute the following cell to clear up the GPU memory. This is required to move on to the next notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "0Ul7wgax3ZZS" + }, + "outputs": [], + "source": [ + "import IPython\n", + "app = IPython.Application.instance()\n", + "app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "4kMR2FOK3ZZS" + }, + "source": [ + "### 3.6.2 Next" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "13FglbMX3ZZS" + }, + "source": [ + "In the last several sections you have focused on the creation and training of models. In order to further improve performance, you will now turn your attention to *data augmentation*, a collection of techniques that will allow your models to train on more and better data than what you might have originally at your disposal." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PEzcSC6x3ZZS" + }, + "source": [ + "
\"Header\"
" + ] + } + ], + "metadata": { + "colab": { + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} -- cgit v1.2.3