author     leshe4ka46 <alex9102naid1@ya.ru>  2025-12-25 21:28:30 +0300
committer  leshe4ka46 <alex9102naid1@ya.ru>  2025-12-25 21:28:30 +0300
commit     53f20d58628171934c097dff5602fe17765eae99 (patch)
tree       83f7344f76924ffd0aa81c2fdc4ee09fa3de9459 /Fundamentals_of_Deep_Learning/07_assessment.ipynb
parent     175ac10904d0f31c3ffeeeed507c8914f13d0b15 (diff)
finish (HEAD, main)
Diffstat (limited to 'Fundamentals_of_Deep_Learning/07_assessment.ipynb')
-rw-r--r--  Fundamentals_of_Deep_Learning/07_assessment.ipynb  632
1 file changed, 632 insertions, 0 deletions
diff --git a/Fundamentals_of_Deep_Learning/07_assessment.ipynb b/Fundamentals_of_Deep_Learning/07_assessment.ipynb
new file mode 100644
index 0000000..47e1098
--- /dev/null
+++ b/Fundamentals_of_Deep_Learning/07_assessment.ipynb
@@ -0,0 +1,632 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<center><a href=\"https://www.nvidia.com/dli\"> <img src=\"images/DLI_Header.png\" alt=\"Header\" style=\"width: 400px;\"/> </a></center>"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# 7. Assessment"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Congratulations on going through today's course! Hopefully, you've learned some valuable skills along the way and had fun doing it. Now it's time to put those skills to the test. In this assessment, you will train a new model that is able to recognize fresh and rotten fruit. You will need to get the model to a validation accuracy of `92%` in order to pass the assessment, though we challenge you to do even better if you can. You will have the use the skills that you learned in the previous exercises. Specifically, we suggest using some combination of transfer learning, data augmentation, and fine tuning. Once you have trained the model to be at least 92% accurate on the validation dataset, save your model, and then assess its accuracy. Let's get started! "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import torch\n",
+ "import torch.nn as nn\n",
+ "from torch.optim import Adam\n",
+ "from torch.utils.data import Dataset, DataLoader\n",
+ "import torchvision.transforms.v2 as transforms\n",
+ "import torchvision.io as tv_io\n",
+ "\n",
+ "import glob\n",
+ "from PIL import Image\n",
+ "\n",
+ "import utils\n",
+ "\n",
+ "device = torch.device(\"cuda\" if torch.cuda.is_available() else \"cpu\")\n",
+ "torch.cuda.is_available()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.1 The Dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this exercise, you will train a model to recognize fresh and rotten fruits. The dataset comes from [Kaggle](https://www.kaggle.com/sriramr/fruits-fresh-and-rotten-for-classification), a great place to go if you're interested in starting a project after this class. The dataset structure is in the `data/fruits` folder. There are 6 categories of fruits: fresh apples, fresh oranges, fresh bananas, rotten apples, rotten oranges, and rotten bananas. This will mean that your model will require an output layer of 6 neurons to do the categorization successfully. You'll also need to compile the model with `categorical_crossentropy`, as we have more than two categories."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<img src=\"./images/fruits.png\" style=\"width: 600px;\">"
+ ]
+ },
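+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you would like to confirm the class folders before building anything, the optional cell below is a small sketch that lists the subdirectories of the training split. It assumes the `data/fruits/train/` path used later in this notebook."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "\n",
+ "# Optional sanity check (assumes the data/fruits/train/ layout described above):\n",
+ "# each subdirectory corresponds to one of the 6 classes.\n",
+ "print(sorted(os.listdir(\"data/fruits/train\")))"
+ ]
+ },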
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.2 Load ImageNet Base Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We encourage you to start with a model pretrained on ImageNet. Load the model with the correct weights. Because these pictures are in color, there will be three channels for red, green, and blue. We've filled in the input shape for you. If you need a reference for setting up the pretrained model, please take a look at [notebook 05b](05b_presidential_doggy_door.ipynb) where we implemented transfer learning."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 50,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from torchvision.models import vgg16\n",
+ "from torchvision.models import VGG16_Weights\n",
+ "\n",
+ "weights = VGG16_Weights.DEFAULT\n",
+ "vgg_model = vgg16(weights=weights)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.3 Freeze Base Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next, we suggest freezing the base model, as done in [notebook 05b](05b_presidential_doggy_door.ipynb). This is done so that all the learning from the ImageNet dataset does not get destroyed in the initial training."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 51,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "False"
+ ]
+ },
+ "execution_count": 51,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Freeze base model\n",
+ "vgg_model.requires_grad_(False)\n",
+ "next(iter(vgg_model.parameters())).requires_grad"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.4 Add Layers to Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now it's time to add layers to the pretrained model. [Notebook 05b](05b_presidential_doggy_door.ipynb) can be used as a guide. Pay close attention to the last dense layer and make sure it has the correct number of neurons to classify the different types of fruit.\n",
+ "\n",
+ "The later layers of a model become more specific to the data the model trained on. Since we want the more general learnings from VGG, we can select parts of it, like so:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Sequential(\n",
+ " (0): Linear(in_features=25088, out_features=4096, bias=True)\n",
+ " (1): ReLU(inplace=True)\n",
+ " (2): Dropout(p=0.5, inplace=False)\n",
+ ")"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "vgg_model.classifier[0:3]"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Once we've taken what we've wanted from VGG16, we can then add our own modifications. No matter what additional modules we add, we still need to end with one value for each output."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Sequential(\n",
+ " (0): Sequential(\n",
+ " (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (1): ReLU(inplace=True)\n",
+ " (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (3): ReLU(inplace=True)\n",
+ " (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
+ " (5): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (6): ReLU(inplace=True)\n",
+ " (7): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (8): ReLU(inplace=True)\n",
+ " (9): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
+ " (10): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (11): ReLU(inplace=True)\n",
+ " (12): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (13): ReLU(inplace=True)\n",
+ " (14): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (15): ReLU(inplace=True)\n",
+ " (16): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
+ " (17): Conv2d(256, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (18): ReLU(inplace=True)\n",
+ " (19): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (20): ReLU(inplace=True)\n",
+ " (21): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (22): ReLU(inplace=True)\n",
+ " (23): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
+ " (24): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (25): ReLU(inplace=True)\n",
+ " (26): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (27): ReLU(inplace=True)\n",
+ " (28): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
+ " (29): ReLU(inplace=True)\n",
+ " (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
+ " )\n",
+ " (1): AdaptiveAvgPool2d(output_size=(7, 7))\n",
+ " (2): Flatten(start_dim=1, end_dim=-1)\n",
+ " (3): Sequential(\n",
+ " (0): Linear(in_features=25088, out_features=4096, bias=True)\n",
+ " (1): ReLU(inplace=True)\n",
+ " (2): Dropout(p=0.5, inplace=False)\n",
+ " )\n",
+ " (4): Linear(in_features=4096, out_features=500, bias=True)\n",
+ " (5): ReLU()\n",
+ " (6): Linear(in_features=500, out_features=6, bias=True)\n",
+ ")"
+ ]
+ },
+ "execution_count": 52,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "N_CLASSES = 6\n",
+ "# https://github.com/pytorch/vision/blob/96e779759a883651e6ec2b394bf89de8beb5b709/torchvision/models/vgg.py#L35\n",
+ "my_model = nn.Sequential(\n",
+ " vgg_model.features,\n",
+ " vgg_model.avgpool,\n",
+ " nn.Flatten(),\n",
+ " vgg_model.classifier[0:3],\n",
+ " nn.Linear(4096, 500),\n",
+ " nn.ReLU(),\n",
+ " nn.Linear(500, N_CLASSES)\n",
+ ")\n",
+ "my_model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.5 Compile Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now it's time to compile the model with loss and metrics options. We have 6 classes, so which loss function should we use?"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 53,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "loss_function = nn.CrossEntropyLoss()\n",
+ "optimizer = Adam(my_model.parameters())\n",
+ "my_model = torch.compile(my_model.to(device))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.6 Data Transforms"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To preprocess our input images, we will use the transforms included with the VGG16 weights."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 54,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pre_trans = weights.transforms()"
+ ]
+ },
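+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you are curious what these bundled transforms actually do (typically a resize, center crop, conversion to a float tensor, and ImageNet normalization, though the exact composition depends on your torchvision version), you can print them:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Inspect the preprocessing pipeline bundled with the VGG16 weights\n",
+ "print(pre_trans)"
+ ]
+ },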
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Try to randomly augment the data to improve the dataset. Feel free to look at [notebook 04a](04a_asl_augmentation.ipynb) and [notebook 05b](05b_presidential_doggy_door.ipynb) for augmentation examples. There is also documentation for the [TorchVision Transforms class](https://pytorch.org/vision/stable/transforms.html).\n",
+ "\n",
+ "**Hint**: Remember not to make the data augmentation too extreme."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 55,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "IMG_WIDTH, IMG_HEIGHT = (224, 224)\n",
+ "\n",
+ "random_trans = transforms.Compose([\n",
+ " transforms.RandomRotation(25),\n",
+ " transforms.RandomResizedCrop((IMG_WIDTH, IMG_HEIGHT), scale=(.8, .8), ratio=(1, 1)),\n",
+ " transforms.RandomHorizontalFlip(),\n",
+ " transforms.RandomVerticalFlip()\n",
+ "])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.7 Load Dataset"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now it's time to load the train and validation datasets. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "DATA_LABELS = [\"freshapples\", \"freshbanana\", \"freshoranges\", \"rottenapples\", \"rottenbanana\", \"rottenoranges\"] \n",
+ " \n",
+ "class MyDataset(Dataset):\n",
+ " def __init__(self, data_dir):\n",
+ " self.imgs = []\n",
+ " self.labels = []\n",
+ " \n",
+ " for l_idx, label in enumerate(DATA_LABELS):\n",
+ " data_paths = glob.glob(data_dir + label + '/*.png', recursive=True)\n",
+ " for path in data_paths:\n",
+ " img = tv_io.read_image(path, tv_io.ImageReadMode.RGB)\n",
+ " self.imgs.append(pre_trans(img).to(device))\n",
+ " self.labels.append(torch.tensor(l_idx).to(device))\n",
+ "\n",
+ "\n",
+ " def __getitem__(self, idx):\n",
+ " img = self.imgs[idx]\n",
+ " label = self.labels[idx]\n",
+ " return img, label\n",
+ "\n",
+ " def __len__(self):\n",
+ " return len(self.imgs)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Select the batch size `n` and set `shuffle` either to `True` or `False` depending on if we are `train`ing or `valid`ating. For a reference, check out [notebook 05b](05b_presidential_doggy_door.ipynb)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 57,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "n = 32\n",
+ "\n",
+ "train_path = \"data/fruits/train/\"\n",
+ "train_data = MyDataset(train_path)\n",
+ "train_loader = DataLoader(train_data, batch_size=n, shuffle=True)\n",
+ "train_N = len(train_loader.dataset)\n",
+ "\n",
+ "valid_path = \"data/fruits/valid/\"\n",
+ "valid_data = MyDataset(valid_path)\n",
+ "valid_loader = DataLoader(valid_data, batch_size=n, shuffle=False)\n",
+ "valid_N = len(valid_loader.dataset)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 48,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1182\n",
+ "329\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(train_N)\n",
+ "print(valid_N)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.8 Train the Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Time to train the model! We've moved the `train` and `validate` functions to our [utils.py](./utils.py) file. Before running the below, make sure all your variables are correctly defined.\n",
+ "\n",
+ "It may help to rerun this cell or change the number of `epochs`."
+ ]
+ },
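+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "For reference while debugging, the next cell is a minimal sketch of the kind of loop such `train` and `validate` functions implement. The names `train_sketch`, `validate_sketch`, and `get_batch_accuracy` are illustrative assumptions; the actual implementations in [utils.py](./utils.py) may differ in their details. Running this cell is optional and has no effect on training."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import torch\n",
+ "\n",
+ "# Minimal sketch of a training/validation loop (for reference only; the\n",
+ "# notebook itself calls utils.train and utils.validate below).\n",
+ "def get_batch_accuracy(output, y, N):\n",
+ "    # fraction of the whole dataset that this batch classified correctly\n",
+ "    pred = output.argmax(dim=1, keepdim=True)\n",
+ "    correct = pred.eq(y.view_as(pred)).sum().item()\n",
+ "    return correct / N\n",
+ "\n",
+ "def train_sketch(model, loader, N, random_trans, optimizer, loss_function):\n",
+ "    model.train()\n",
+ "    total_loss, accuracy = 0, 0\n",
+ "    for x, y in loader:\n",
+ "        output = model(random_trans(x))  # augment each batch on the fly\n",
+ "        optimizer.zero_grad()\n",
+ "        batch_loss = loss_function(output, y)\n",
+ "        batch_loss.backward()\n",
+ "        optimizer.step()\n",
+ "        total_loss += batch_loss.item()\n",
+ "        accuracy += get_batch_accuracy(output, y, N)\n",
+ "    print('Train - Loss: {:.4f} Accuracy: {:.4f}'.format(total_loss, accuracy))\n",
+ "\n",
+ "def validate_sketch(model, loader, N, loss_function):\n",
+ "    model.eval()\n",
+ "    total_loss, accuracy = 0, 0\n",
+ "    with torch.no_grad():\n",
+ "        for x, y in loader:\n",
+ "            output = model(x)\n",
+ "            total_loss += loss_function(output, y).item()\n",
+ "            accuracy += get_batch_accuracy(output, y, N)\n",
+ "    print('Valid - Loss: {:.4f} Accuracy: {:.4f}'.format(total_loss, accuracy))"
+ ]
+ },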
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Epoch: 0\n",
+ "Train - Loss: 17.4653 Accuracy: 0.8283\n",
+ "Valid - Loss: 2.0145 Accuracy: 0.9331\n",
+ "Epoch: 1\n",
+ "Train - Loss: 4.8685 Accuracy: 0.9569\n",
+ "Valid - Loss: 2.7675 Accuracy: 0.9392\n",
+ "Epoch: 2\n",
+ "Train - Loss: 3.8462 Accuracy: 0.9653\n",
+ "Valid - Loss: 1.2702 Accuracy: 0.9574\n",
+ "Epoch: 3\n",
+ "Train - Loss: 4.5526 Accuracy: 0.9552\n",
+ "Valid - Loss: 1.7280 Accuracy: 0.9544\n",
+ "Epoch: 4\n",
+ "Train - Loss: 3.6923 Accuracy: 0.9619\n",
+ "Valid - Loss: 1.2654 Accuracy: 0.9666\n",
+ "Epoch: 5\n",
+ "Train - Loss: 2.2255 Accuracy: 0.9788\n",
+ "Valid - Loss: 2.1297 Accuracy: 0.9483\n",
+ "Epoch: 6\n",
+ "Train - Loss: 3.2439 Accuracy: 0.9636\n",
+ "Valid - Loss: 3.8254 Accuracy: 0.9331\n",
+ "Epoch: 7\n"
+ ]
+ }
+ ],
+ "source": [
+ "epochs = 10\n",
+ "\n",
+ "for epoch in range(epochs):\n",
+ " print('Epoch: {}'.format(epoch))\n",
+ " utils.train(my_model, train_loader, train_N, random_trans, optimizer, loss_function)\n",
+ " utils.validate(my_model, valid_loader, valid_N, loss_function)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.9 Unfreeze Model for Fine Tuning"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you have reached 92% validation accuracy already, this next step is optional. If not, we suggest fine tuning the model with a very low learning rate."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Unfreeze the base model\n",
+ "vgg_model.requires_grad_(True)\n",
+ "optimizer = Adam(my_model.parameters(), lr=.0001)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "epochs = 1\n",
+ "\n",
+ "for epoch in range(epochs):\n",
+ " print('Epoch: {}'.format(epoch))\n",
+ " utils.train(my_model, train_loader, train_N, random_trans, optimizer, loss_function)\n",
+ " utils.validate(my_model, valid_loader, valid_N, loss_function)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.10 Evaluate the Model"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Hopefully, you now have a model that has a validation accuracy of 92% or higher. If not, you may want to go back and either run more epochs of training, or adjust your data augmentation. \n",
+ "\n",
+ "Once you are satisfied with the validation accuracy, evaluate the model by executing the following cell. The evaluate function will return a tuple, where the first value is your loss, and the second value is your accuracy. To pass, the model will need have an accuracy value of `92% or higher`. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "utils.validate(my_model, valid_loader, valid_N, loss_function)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.11 Run the Assessment"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To assess your model run the following two cells.\n",
+ "\n",
+ "**NOTE:** `run_assessment` assumes your model is named `my_model`. If for any reason you have modified these variable names, please update the names of the arguments passed to `run_assessment`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from run_assessment import run_assessment"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "run_assessment(my_model)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 7.12 Generate a Certificate"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If you passed the assessment, please return to the course page (shown below) and click the \"ASSESS TASK\" button, which will generate your certificate for the course."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<img src=\"./images/assess_task.png\" style=\"width: 800px;\">"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "<center><a href=\"https://www.nvidia.com/dli\"> <img src=\"images/DLI_Header.png\" alt=\"Header\" style=\"width: 400px;\"/> </a></center>"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.10.12"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}