From 53f20d58628171934c097dff5602fe17765eae99 Mon Sep 17 00:00:00 2001 From: leshe4ka46 Date: Thu, 25 Dec 2025 21:28:30 +0300 Subject: finish --- Fundamentals_of_Deep_Learning/02_asl.ipynb | 1722 ++++++++++++++++++++++++++++ 1 file changed, 1722 insertions(+) create mode 100644 Fundamentals_of_Deep_Learning/02_asl.ipynb (limited to 'Fundamentals_of_Deep_Learning/02_asl.ipynb') diff --git a/Fundamentals_of_Deep_Learning/02_asl.ipynb b/Fundamentals_of_Deep_Learning/02_asl.ipynb new file mode 100644 index 0000000..4f8fcc7 --- /dev/null +++ b/Fundamentals_of_Deep_Learning/02_asl.ipynb @@ -0,0 +1,1722 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "RDQqqfRXTjyJ" + }, + "source": [ + "
\"Header\"
" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Oq7vdepBTjyK" + }, + "source": [ + "# 2. Image Classification of an American Sign Language Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2nP3rWXBTjyK" + }, + "source": [ + "In this section, we will perform the data preparation, model creation, and model training steps we observed in the last section using a different dataset: images of hands making letters in [American Sign Language](http://www.asl.gs/)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Cs1ioXE2TjyK" + }, + "source": [ + "## 2.1 Objectives" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WzY0p5NZTjyK" + }, + "source": [ + "* Prepare image data for training\n", + "* Create and compile a simple model for image classification\n", + "* Train an image classification model and observe the results" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 849, + "status": "ok", + "timestamp": 1715066695195, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "Ev0pS0GdTjyM", + "outputId": "63e4a89f-c58b-4401-e6a5-4350b5df7714" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "True" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import torch.nn as nn\n", + "import pandas as pd\n", + "import torch\n", + "from torch.optim import Adam\n", + "from torch.utils.data import Dataset, DataLoader\n", + "\n", + "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n", + "torch.cuda.is_available()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mhGii0pzTjyL" + }, + "source": [ + "## 2.2 American Sign Language Dataset" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "x3IH42b2TjyL" + }, + 
"source": [ + "The [American Sign Language alphabet](http://www.asl.gs/) contains 26 letters. Two of those letters (j and z) require movement, so they are not included in the training dataset. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Xtnc5S_lTjyL" + }, + "source": [ + "" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "ihN8_4hfTjyL" + }, + "source": [ + "### 2.2.1 Kaggle" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "winTCTjiTjyL" + }, + "source": [ + "This dataset is available from the website [Kaggle](http://www.kaggle.com), which is a fantastic place to find datasets and other deep learning resources. In addition to providing resources like datasets and \"kernels\" that are like these notebooks, Kaggle hosts competitions that you can take part in, competing with others in training highly accurate models.\n", + "\n", + "If you're looking to practice or see examples of many deep learning projects, Kaggle is a great site to visit." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xzRiOTvsTjyL" + }, + "source": [ + "## 2.3 Loading the Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "7OouJPWVTjyL" + }, + "source": [ + "This dataset is not available via TorchVision in the same way that MNIST is, so let's learn how to load custom data. By the end of this section we will have `x_train`, `y_train`, `x_valid`, and `y_valid` variables." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rpKiBdCsTjyL" + }, + "source": [ + "### 2.3.1 Reading in the Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "mRBOfwA9TjyM" + }, + "source": [ + "The sign language dataset is in [CSV](https://en.wikipedia.org/wiki/Comma-separated_values) (Comma Separated Values) format, the same data structure behind Microsoft Excel and Google Sheets. 
It is a grid of rows and columns with labels at the top, as seen in the [train](data/asl_data/sign_mnist_train.csv) and [valid](data/asl_data/sign_mnist_valid.csv) datasets (they may take a moment to load).\n", + "\n", + "To load and work with the data, we'll be using a library called [Pandas](https://pandas.pydata.org/), which is a highly performant tool for loading and manipulating data. We'll read the CSV files into a structure called a [DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html)." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "bbhLk9sVTjyM" + }, + "source": [ + "Pandas has a [read_csv](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_csv.html) function that takes the path to a CSV file and returns a DataFrame:" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "id": "JeIOe1X4TjyM" + }, + "outputs": [], + "source": [ + "train_df = pd.read_csv(\"data/asl_data/sign_mnist_train.csv\")\n", + "valid_df = pd.read_csv(\"data/asl_data/sign_mnist_valid.csv\")" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BuiUoPWzTjyM" + }, + "source": [ + "### 2.3.2 Exploring the Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "GbirW1rxTjyM" + }, + "source": [ + "Let's take a look at our data. We can use the [head](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) method to print the first few rows of the DataFrame. Each row represents one image: a `label` column followed by 784 pixel values, just like with the MNIST dataset. 
Note that the labels currently are numerical values, not letters of the alphabet:" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 255 + }, + "executionInfo": { + "elapsed": 8, + "status": "ok", + "timestamp": 1715064545381, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "_jEITmvETjyM", + "outputId": "5444b9eb-1948-4140-95c4-91649677766b" + }, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML rendering of the DataFrame omitted: 5 rows × 785 columns, same values as the text/plain output]
" + ], + "text/plain": [ + " label pixel1 pixel2 pixel3 pixel4 pixel5 pixel6 pixel7 pixel8 \\\n", + "0 3 107 118 127 134 139 143 146 150 \n", + "1 6 155 157 156 156 156 157 156 158 \n", + "2 2 187 188 188 187 187 186 187 188 \n", + "3 2 211 211 212 212 211 210 211 210 \n", + "4 12 164 167 170 172 176 179 180 184 \n", + "\n", + " pixel9 ... pixel775 pixel776 pixel777 pixel778 pixel779 pixel780 \\\n", + "0 153 ... 207 207 207 207 206 206 \n", + "1 158 ... 69 149 128 87 94 163 \n", + "2 187 ... 202 201 200 199 198 199 \n", + "3 210 ... 235 234 233 231 230 226 \n", + "4 185 ... 92 105 105 108 133 163 \n", + "\n", + " pixel781 pixel782 pixel783 pixel784 \n", + "0 206 204 203 202 \n", + "1 175 103 135 149 \n", + "2 198 195 194 195 \n", + "3 225 222 229 163 \n", + "4 157 163 164 179 \n", + "\n", + "[5 rows x 785 columns]" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_df.head()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "9vZwdu4yTjyN" + }, + "source": [ + "### 2.3.3 Extracting the Labels" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "wxRL89IvTjyN" + }, + "source": [ + "Let's store our training and validation labels in `y_train` and `y_valid` variables. We can use the [pop](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.pop.html) method to remove a column from our DataFrame and assign the removed values to a variable." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 2, + "status": "ok", + "timestamp": 1715064546558, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "EYujv09jTjyN", + "outputId": "572b986c-d128-4b25-9083-9d9607c9b680" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0 3\n", + "1 6\n", + "2 2\n", + "3 2\n", + "4 12\n", + " ..\n", + "27450 12\n", + "27451 22\n", + "27452 17\n", + "27453 16\n", + "27454 22\n", + "Name: label, Length: 27455, dtype: int64" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_train = train_df.pop('label')\n", + "y_valid = valid_df.pop('label')\n", + "y_train" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WQ6EtXGYTjyN" + }, + "source": [ + "### 2.3.4 Extracting the Images" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_mue0q7BTjyN" + }, + "source": [ + "Next, let's store our training and validation images in `x_train` and `x_valid` variables. 
Here we create those variables:" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 2, + "status": "ok", + "timestamp": 1715064548501, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "7sP3fSkETjyN", + "outputId": "6e04f85b-fb2d-444d-fc08-28a5741c3c6f" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "array([[107, 118, 127, ..., 204, 203, 202],\n", + " [155, 157, 156, ..., 103, 135, 149],\n", + " [187, 188, 188, ..., 195, 194, 195],\n", + " ...,\n", + " [174, 174, 174, ..., 202, 200, 200],\n", + " [177, 181, 184, ..., 64, 87, 93],\n", + " [179, 180, 180, ..., 205, 209, 215]])" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x_train = train_df.values\n", + "x_valid = valid_df.values\n", + "x_train" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "6s7y8MYOTjyN" + }, + "source": [ + "### 2.3.5 Summarizing the Training and Validation Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YkVK5sL_TjyN" + }, + "source": [ + "We now have 27,455 images with 784 pixels each for training..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 270, + "status": "ok", + "timestamp": 1715064412921, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "l7liruaxTjyN", + "outputId": "9b462f1d-6306-47bf-d292-151b9847e1ad" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(27455, 784)" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x_train.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WKP6-UT0TjyN" + }, + "source": [ + "...as well as their corresponding labels:" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 3, + "status": "ok", + "timestamp": 1715064415052, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "d4OguU-wTjyN", + "outputId": "d2030105-2f87-44ff-a13d-d530a63c46df" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(27455,)" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_train.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "VKuHu4V2TjyN" + }, + "source": [ + "For validation, we have 7,172 images..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 3, + "status": "ok", + "timestamp": 1715064416314, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "AFx0PRCsTjyN", + "outputId": "e25714a6-07a2-4a5f-9dcb-e197e9fee5cb" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7172, 784)" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x_valid.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cfEP85sSTjyN" + }, + "source": [ + "...and their corresponding labels:" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 3, + "status": "ok", + "timestamp": 1715064417463, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "5UgDUqppTjyO", + "outputId": "98f60c26-9975-4da8-e108-3b6b1503721d" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "(7172,)" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "y_valid.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "UWrF4zYPTjyO" + }, + "source": [ + "## 2.4 Visualizing the Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "HJnxfcGJTjyO" + }, + "source": [ + "To visualize the images, we will again use the matplotlib library. 
We don't need to worry about the details of this visualization, but if interested, you can learn more about [matplotlib](https://matplotlib.org/) at a later time.\n", + "\n", + "Note that we'll have to reshape the data from its current 1D shape of 784 pixels, to a 2D shape of 28x28 pixels to make sense of the image:" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 58 + }, + "executionInfo": { + "elapsed": 972, + "status": "ok", + "timestamp": 1715064554564, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "k-Fkl2mTTjyO", + "outputId": "3fe64867-d689-497c-d0b7-4a27edf8e89d" + }, + "outputs": [ + { + "data": { + "image/png": "", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "plt.figure(figsize=(40,40))\n", + "\n", + "num_images = 20\n", + "for i in range(num_images):\n", + " row = x_train[i]\n", + " label = y_train[i]\n", + "\n", + " image = row.reshape(28,28)\n", + " plt.subplot(1, num_images, i+1)\n", + " plt.title(label, fontdict={'fontsize': 30})\n", + " plt.axis('off')\n", + " plt.imshow(image, cmap='gray')" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "2j5Nwh7zTjyO" + }, + "source": [ + "### 2.4.1 Normalize the Image Data" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "xYfdAXx5TjyV" + }, + "source": [ + "As we did with the MNIST dataset, we are going to normalize the image data, meaning that their pixel values, instead of being between 0 and 255 as they are currently:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 3, + "status": "ok", + "timestamp": 1715064422916, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "zbEOSxuYTjyV", + "outputId": "d30e221e-99db-4e20-fdb6-cf67fb273225" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "0" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "x_train.min()" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 2, + "status": "ok", + "timestamp": 1715064423943, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "kAERnWUcTjyV", + "outputId": "57e592d4-a36e-4f3e-f924-d157fa9680ef" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "255" + ] + }, + "execution_count": 12, + "metadata": 
{}, + "output_type": "execute_result" + } + ], + "source": [ + "x_train.max()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_KWYaVLTTjyV" + }, + "source": [ + "In the previous lab, we used [ToTensor](https://pytorch.org/vision/main/generated/torchvision.transforms.ToTensor.html), but we can also modify our data before turning it into a tensor." + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": { + "id": "s3bf3TvgTjyW" + }, + "outputs": [], + "source": [ + "x_train = train_df.values / 255\n", + "x_valid = valid_df.values / 255" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.4.2 Custom Datasets" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iKQR2ouf7Q2r" + }, + "source": [ + "We can use PyTorch's [Dataset](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html) tools in order to create our own dataset. `__init__` will run once when the class is initialized. `__getitem__` returns our images and labels.\n", + "\n", + "Since our dataset is small enough, we can store it on our GPU for faster processing. In the previous lab, we sent our data to the GPU when it was drawn from each batch. Here, we will send it to the GPU in the `__init__` function." + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": { + "id": "KeKWb8tZ6jCy" + }, + "outputs": [], + "source": [ + "class MyDataset(Dataset):\n", + " def __init__(self, x_df, y_df):\n", + " self.xs = torch.tensor(x_df).float().to(device)\n", + " self.ys = torch.tensor(y_df).to(device)\n", + "\n", + " def __getitem__(self, idx):\n", + " x = self.xs[idx]\n", + " y = self.ys[idx]\n", + " return x, y\n", + "\n", + " def __len__(self):\n", + " return len(self.xs)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "A custom PyTorch dataset works just like a prebuilt one. 
It should be passed to a [DataLoader](https://pytorch.org/tutorials/beginner/basics/data_tutorial.html#preparing-your-data-for-training-with-dataloaders) for model training." + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": { + "id": "XoSSM2YEAh_6" + }, + "outputs": [], + "source": [ + "BATCH_SIZE = 32\n", + "\n", + "train_data = MyDataset(x_train, y_train)\n", + "train_loader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)\n", + "train_N = len(train_loader.dataset)" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": { + "id": "cfAM1uoP1anX" + }, + "outputs": [], + "source": [ + "valid_data = MyDataset(x_valid, y_valid)\n", + "valid_loader = DataLoader(valid_data, batch_size=BATCH_SIZE)\n", + "valid_N = len(valid_loader.dataset)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can verify the DataLoader works as expected with the code below. We'll create an iterator from the DataLoader with [iter](https://docs.python.org/3/library/functions.html#iter), and then call [next](https://docs.python.org/3/library/functions.html#next) to draw the first batch, like drawing the first hand from a deck." + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 17, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "train_loader" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Try running the cell below a few times. Because the training DataLoader uses `shuffle=True`, the values should change each time." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 3, + "status": "ok", + "timestamp": 1715068148957, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "5nRQD_koZ5jm", + "outputId": "f24f27e7-2452-46a2-ca83-0710c822b9f3" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "[tensor([[0.6235, 0.6353, 0.6392, ..., 0.1294, 0.0667, 0.0510],\n", + " [0.2000, 0.2314, 0.2510, ..., 0.6863, 0.6941, 0.7020],\n", + " [0.8000, 0.8000, 0.8039, ..., 0.1020, 0.0941, 0.1216],\n", + " ...,\n", + " [0.6157, 0.6314, 0.6392, ..., 0.3020, 0.4353, 0.7961],\n", + " [0.6353, 0.6431, 0.6471, ..., 0.4275, 0.2196, 0.1725],\n", + " [0.3647, 0.3922, 0.4118, ..., 0.6706, 0.6627, 0.6824]],\n", + " device='cuda:0'),\n", + " tensor([ 5, 17, 7, 20, 11, 22, 7, 3, 13, 14, 14, 6, 7, 15, 6, 16, 7, 17,\n", + " 10, 14, 0, 16, 22, 7, 18, 18, 5, 23, 13, 15, 9, 20],\n", + " device='cuda:0')]" + ] + }, + "execution_count": 18, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch = next(iter(train_loader))\n", + "batch" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice the batch has two values. The first is our `x`, and the second is our `y`. The first dimension of each should have `32` values, which is the `batch_size`." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32, 784])" + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch[0].shape" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "torch.Size([32])" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "batch[1].shape" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "KkD0MEZ1TjyX" + }, + "source": [ + "## 2.5 Build the Model" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PHyHRWPdTjyX" + }, + "source": [ + "We've created our DataLoaders; now it's time to build our model." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For this exercise we are going to build a sequential model. Just like last time, build a model that:\n", + "\n", + "* Has a flatten layer.\n", + "* Has a dense input layer. This layer should contain 512 neurons and use the `relu` activation function\n", + "* Has a second dense layer with 512 neurons which uses the `relu` activation function\n", + "* Has a dense output layer with neurons equal to the number of classes\n", + "\n", + "We will define a few variables to get started:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [], + "source": [ + "input_size = 28 * 28\n", + "n_classes = 24" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Do your work in the cell below, creating a `model` variable to store the model. 
We've imported the [Sequential](https://pytorch.org/docs/stable/generated/torch.nn.Sequential.html) model class and [Linear](https://pytorch.org/docs/stable/generated/torch.nn.Linear.html) layer class to get you started. Reveal the solution below for a hint:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "model = nn.Sequential(\n", + " nn.Flatten(),\n", + " nn.Linear(input_size,512),\n", + " nn.ReLU(),\n", + " nn.Linear(512,512),\n", + " nn.ReLU(),\n", + " nn.Linear(512,n_classes),\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "_WckEGM9Tr-D" + }, + "outputs": [], + "source": [ + "# SOLUTION\n", + "model = nn.Sequential(\n", + " nn.Flatten(),\n", + " nn.Linear(input_size, 512), # Input\n", + " nn.ReLU(), # Activation for input\n", + " nn.Linear(512, 512), # Hidden\n", + " nn.ReLU(), # Activation for hidden\n", + " nn.Linear(512, n_classes) # Output\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This time, we'll combine compiling the model and sending it to the GPU in one step:" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 326, + "status": "ok", + "timestamp": 1715068155362, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "TjXGJyWuUGcH", + "outputId": "f3474939-80a3-4715-aea4-991a2629232a" + }, + "outputs": [ + { + "data": { + "text/plain": [ + "OptimizedModule(\n", + " (_orig_mod): Sequential(\n", + " (0): Flatten(start_dim=1, end_dim=-1)\n", + " (1): Linear(in_features=784, out_features=512, bias=True)\n", + " (2): ReLU()\n", + " (3): Linear(in_features=512, out_features=512, bias=True)\n", + " (4): ReLU()\n", + " (5): 
Linear(in_features=512, out_features=24, bias=True)\n", + " )\n", + ")" + ] + }, + "execution_count": 26, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "model = torch.compile(model.to(device))\n", + "model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Since categorizing these ASL images is similar to categorizing MNIST's handwritten digits, we will use the same `loss_function` ([Categorical CrossEntropy](https://pytorch.org/docs/stable/generated/torch.nn.CrossEntropyLoss.html)) and `optimizer` ([Adam](https://pytorch.org/docs/stable/generated/torch.optim.Adam.html)). `nn.CrossEntropyLoss` applies the softmax function internally, so the model outputs raw scores (logits), and it is computationally faster when the targets are class indices rather than class probabilities." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "id": "LtHYbXz7UZQW" + }, + "outputs": [], + "source": [ + "loss_function = nn.CrossEntropyLoss()\n", + "optimizer = Adam(model.parameters())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2.6 Training the Model" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This time, let's look at our `train` and `validate` functions in more detail." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.6.1 The Train Function" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This code is almost the same as in the previous notebook, but we no longer send `x` and `y` to our GPU because our custom Dataset already moved the data there in `__init__`.\n", + "\n", + "Before looping through the DataLoader, we will put the model in training mode with [model.train](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.train), which enables training-time behavior in layers such as dropout and batch normalization. To make it easier for us to follow training progress, we'll keep track of the total `loss` and `accuracy`.\n", + "\n", + "Then, for each batch in our `train_loader`, we will:\n", + "1. 
Get an `output` prediction from the model\n", + "2. Set the gradient to zero with the `optimizer`'s [zero_grad](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.zero_grad.html) function\n", + "3. Calculate the loss with our `loss_function`\n", + "4. Compute the gradient with [backward](https://pytorch.org/docs/stable/generated/torch.Tensor.backward.html)\n", + "5. Update our model parameters with the `optimizer`'s [step](https://pytorch.org/docs/stable/generated/torch.optim.Optimizer.step.html) function\n", + "6. Update the `loss` and `accuracy` totals" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": { + "id": "AfgrATXFV_og" + }, + "outputs": [], + "source": [ + "def train():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.train()\n", + " for x, y in train_loader:\n", + " output = model(x)\n", + " optimizer.zero_grad()\n", + " batch_loss = loss_function(output, y)\n", + " batch_loss.backward()\n", + " optimizer.step()\n", + "\n", + " loss += batch_loss.item()\n", + " accuracy += get_batch_accuracy(output, y, train_N)\n", + " print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.6.2 The Validate Function" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The model does not learn during validation, so the `validate` function is simpler than the `train` function above.\n", + "\n", + "One key difference is we will set the model to evaluation mode with [model.eval](https://pytorch.org/docs/stable/generated/torch.nn.Module.html#torch.nn.Module.eval), which switches layers such as dropout and batch normalization to their inference behavior. Wrapping the loop in `torch.no_grad` then disables gradient tracking, so no parameters are updated." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": { + "id": "8LQcAtcc0ggk" + }, + "outputs": [], + "source": [ + "def validate():\n", + " loss = 0\n", + " accuracy = 0\n", + "\n", + " model.eval()\n", + " with torch.no_grad():\n", + " for x, y in valid_loader:\n", + " output = model(x)\n", + "\n", + " loss += loss_function(output, y).item()\n", + " accuracy += get_batch_accuracy(output, y, valid_N)\n", + " print('Valid - Loss: {:.4f} Accuracy: {:.4f}'.format(loss, accuracy))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.6.3 Calculating the Accuracy" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Both the `train` and `validate` functions use `get_batch_accuracy`, but we have not defined that in this notebook yet. " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Exercise" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The function below has three `FIXME`s. Each one corresponds to one of the function's input arguments. Can you replace each `FIXME` with the correct argument?\n", + "\n", + "It may help to view the documentation for [argmax](https://pytorch.org/docs/stable/generated/torch.argmax.html), [eq](https://pytorch.org/docs/stable/generated/torch.eq.html), and [view_as](https://pytorch.org/docs/stable/generated/torch.Tensor.view_as.html)." + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "def get_batch_accuracy(output, y, N):\n", + " pred = output.argmax(dim=1, keepdim=True)\n", + " correct = pred.eq(y.view_as(pred)).sum().item()\n", + " return correct / N" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Solution" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Click the `...` below for the solution." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# SOLUTION\n", + "def get_batch_accuracy(output, y, N):\n", + " pred = output.argmax(dim=1, keepdim=True)\n", + " correct = pred.eq(y.view_as(pred)).sum().item()\n", + " return correct / N" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.6.4 The Training Loop" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Let's bring it all together! Run the cell below to train the model for 20 `epochs`." + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "executionInfo": { + "elapsed": 38185, + "status": "ok", + "timestamp": 1715068198222, + "user": { + "displayName": "Danielle Detering US", + "userId": "15432464718872067879" + }, + "user_tz": 420 + }, + "id": "JXzlHJw8-j5L", + "outputId": "40b9eda4-352d-45f5-dc95-4193f7f13423" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Epoch: 0\n", + "Train - Loss: 1544.5647 Accuracy: 0.4139\n", + "Valid - Loss: 300.6098 Accuracy: 0.5432\n", + "Epoch: 1\n", + "Train - Loss: 682.1709 Accuracy: 0.7272\n", + "Valid - Loss: 241.6977 Accuracy: 0.6644\n", + "Epoch: 2\n", + "Train - Loss: 356.0020 Accuracy: 0.8608\n", + "Valid - Loss: 207.0941 Accuracy: 0.7241\n", + "Epoch: 3\n", + "Train - Loss: 212.9807 Accuracy: 0.9189\n", + "Valid - Loss: 222.7968 Accuracy: 0.7469\n", + "Epoch: 4\n", + "Train - Loss: 120.5757 Accuracy: 0.9571\n", + "Valid - Loss: 212.5262 Accuracy: 0.7851\n", + "Epoch: 5\n", + "Train - Loss: 90.5495 Accuracy: 0.9672\n", + "Valid - Loss: 270.4961 Accuracy: 0.7196\n", + "Epoch: 6\n", + "Train - Loss: 52.9683 Accuracy: 0.9828\n", + "Valid - Loss: 346.0175 Accuracy: 0.7058\n", + "Epoch: 7\n", + "Train - Loss: 60.2388 Accuracy: 0.9786\n", + "Valid - Loss: 243.9202 Accuracy: 0.7899\n", + "Epoch: 8\n", + "Train - Loss: 66.6491 Accuracy: 
0.9772\n", + "Valid - Loss: 231.9360 Accuracy: 0.7985\n", + "Epoch: 9\n", + "Train - Loss: 57.7538 Accuracy: 0.9792\n", + "Valid - Loss: 260.8650 Accuracy: 0.7777\n", + "Epoch: 10\n", + "Train - Loss: 4.7490 Accuracy: 0.9995\n", + "Valid - Loss: 249.2714 Accuracy: 0.7962\n", + "Epoch: 11\n", + "Train - Loss: 87.3801 Accuracy: 0.9695\n", + "Valid - Loss: 263.7915 Accuracy: 0.7554\n", + "Epoch: 12\n", + "Train - Loss: 25.1833 Accuracy: 0.9913\n", + "Valid - Loss: 336.0438 Accuracy: 0.7323\n", + "Epoch: 13\n", + "Train - Loss: 22.2535 Accuracy: 0.9921\n", + "Valid - Loss: 254.7867 Accuracy: 0.8081\n", + "Epoch: 14\n", + "Train - Loss: 1.0967 Accuracy: 1.0000\n", + "Valid - Loss: 258.9991 Accuracy: 0.8115\n", + "Epoch: 15\n", + "Train - Loss: 0.7421 Accuracy: 1.0000\n", + "Valid - Loss: 263.6500 Accuracy: 0.8111\n", + "Epoch: 16\n", + "Train - Loss: 97.4083 Accuracy: 0.9700\n", + "Valid - Loss: 225.1667 Accuracy: 0.7959\n", + "Epoch: 17\n", + "Train - Loss: 3.6141 Accuracy: 0.9999\n", + "Valid - Loss: 243.8400 Accuracy: 0.8033\n", + "Epoch: 18\n", + "Train - Loss: 50.2676 Accuracy: 0.9828\n", + "Valid - Loss: 245.1898 Accuracy: 0.7980\n", + "Epoch: 19\n", + "Train - Loss: 1.5806 Accuracy: 1.0000\n", + "Valid - Loss: 255.6030 Accuracy: 0.8058\n" + ] + } + ], + "source": [ + "epochs = 20\n", + "\n", + "for epoch in range(epochs):\n", + " print('Epoch: {}'.format(epoch))\n", + " train()\n", + " validate()" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "asDVasm6TjyY" + }, + "source": [ + "### 2.6.5 Discussion: What happened?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "YWYAlfHzTjyY" + }, + "source": [ + "We can see that the training accuracy reached nearly 100%, but the validation accuracy leveled off at around 80%. What happened here?\n", + "\n", + "Think about it for a bit before clicking on the '...' below to reveal the answer."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "BGB3jWGRTjyY" + }, + "source": [ + "`# SOLUTION`\n", + "This is an example of the model learning to categorize the training data, but performing poorly against new data that it has not been trained on. Essentially, it is memorizing the dataset, but not gaining a robust and general understanding of the problem. This is a common issue called *overfitting*. We will discuss overfitting in the next two lectures, as well as some ways to address it." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "OI5xFCVLTjyY" + }, + "source": [ + "## 2.7 Summary" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "cFcIcMOQTjyZ" + }, + "source": [ + "In this section, you built your own neural network that classifies images with about 80% validation accuracy. Congrats!\n", + "\n", + "At this point, we should be getting somewhat familiar with the process of loading data (including labels), preparing it, creating a model, and then training the model with the prepared data." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "CDYpo3ByTjyZ" + }, + "source": [ + "### 2.7.1 Clear the Memory\n", + "Before moving on, please execute the following cell to free the GPU memory. This is required to move on to the next notebook." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "Mm_uKgfxTjyZ" + }, + "outputs": [], + "source": [ + "import IPython\n", + "app = IPython.Application.instance()\n", + "app.kernel.do_shutdown(True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "vuUgx9kPTjyZ" + }, + "source": [ + "### 2.7.2 Next" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "WeMVBJwMTjyZ" + }, + "source": [ + "Now that you have built some very basic, somewhat effective models, we will begin to learn about more sophisticated models, including *Convolutional Neural Networks*."
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "QW2ZlBpITjyZ" + }, + "source": [ + "
\"Header\"
" + ] + } + ], + "metadata": { + "accelerator": "GPU", + "colab": { + "gpuType": "T4", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.10.12" + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} -- cgit v1.2.3