{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Introducing Keras\n",
"\n",
"Let's use Keras on the MNIST data set again, this time using a Convolutional Neural Network that's better suited for image processing. CNN's are less sensitive to where in the image the pattern is that we're looking for.\n",
"\n",
"With a multi-layer perceptron, we achieved around 97% accuracy. Let's see if we can beat that.\n",
"\n",
"As before we'll start by importing the stuff we need, including the new layer types we talked about:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import tensorflow\n",
"from tensorflow.keras.datasets import mnist\n",
"from tensorflow.keras.models import Sequential\n",
"from tensorflow.keras.layers import Dense, Dropout, Conv2D, MaxPooling2D, Flatten\n",
"from tensorflow.keras.optimizers import RMSprop"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We'll load up our raw data set exactly as before:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"(mnist_train_images, mnist_train_labels), (mnist_test_images, mnist_test_labels) = mnist.load_data()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We need to shape the data differently then before. Since we're treating the data as 2D images of 28x28 pixels instead of a flattened stream of 784 pixels, we need to shape it accordingly. Depending on the data format Keras is set up for, this may be 1x28x28 or 28x28x1 (the \"1\" indicates a single color channel, as this is just grayscale. If we were dealing with color images, it would be 3 instead of 1 since we'd have red, green, and blue color channels)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"from tensorflow.keras import backend as K\n",
"\n",
"if K.image_data_format() == 'channels_first':\n",
" train_images = mnist_train_images.reshape(mnist_train_images.shape[0], 1, 28, 28)\n",
" test_images = mnist_test_images.reshape(mnist_test_images.shape[0], 1, 28, 28)\n",
" input_shape = (1, 28, 28)\n",
"else:\n",
" train_images = mnist_train_images.reshape(mnist_train_images.shape[0], 28, 28, 1)\n",
" test_images = mnist_test_images.reshape(mnist_test_images.shape[0], 28, 28, 1)\n",
" input_shape = (28, 28, 1)\n",
" \n",
"train_images = train_images.astype('float32')\n",
"test_images = test_images.astype('float32')\n",
"train_images /= 255\n",
"test_images /= 255"
]
},
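{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you want to confirm which data format Keras chose, a quick optional sanity check of the resulting array shapes might look like this (a sketch that hasn't been run here):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sanity check (sketch, not run): confirm the reshaped image tensors and input shape.\n",
"print('train_images shape:', train_images.shape)\n",
"print('test_images shape:', test_images.shape)\n",
"print('input_shape:', input_shape)"
]
},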
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As before we need to convert our train and test labels to be categorical in one-hot format:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"train_labels = tensorflow.keras.utils.to_categorical(mnist_train_labels, 10)\n",
"test_labels = tensorflow.keras.utils.to_categorical(mnist_test_labels, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a sanity check let's print out one of the training images with its label:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]\n"
]
},
{
"data": {
"text/plain": [
"<Figure size 640x480 with 1 Axes>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"def display_sample(num):\n",
" #Print the one-hot array of this sample's label \n",
" print(train_labels[num]) \n",
" #Print the label converted back to a number\n",
" label = train_labels[num].argmax(axis=0)\n",
" #Reshape the 768 values to a 28x28 image\n",
" image = train_images[num].reshape([28,28])\n",
" plt.title('Sample: %d Label: %d' % (num, label))\n",
" plt.imshow(image, cmap=plt.get_cmap('gray_r'))\n",
" plt.show()\n",
" \n",
"display_sample(1234)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now for the meat of the problem. Setting up a convolutional neural network involves more layers. Not all of these are strictly necessary; you could run without pooling and dropout, but those extra steps help avoid overfitting and help things run faster.\n",
"\n",
"We'll start with a 2D convolution of the image - it's set up to take 32 windows, or \"filters\", of each image, each filter being 3x3 in size.\n",
"\n",
"We then run a second convolution on top of that with 64 3x3 windows - this topology is just what comes recommended within Keras's own examples. Again you want to re-use previous research whenever possible while tuning CNN's, as it is hard to do.\n",
"\n",
"Next we apply a MaxPooling2D layer that takes the maximum of each 2x2 result to distill the results down into something more manageable.\n",
"\n",
"A dropout filter is then applied to prevent overfitting.\n",
"\n",
"Next we flatten the 2D layer we have at this stage into a 1D layer. So at this point we can just pretend we have a traditional multi-layer perceptron...\n",
"\n",
"... and feed that into a hidden, flat layer of 128 units.\n",
"\n",
"We then apply dropout again to further prevent overfitting.\n",
"\n",
"And finally, we feed that into our final 10 units where softmax is applied to choose our category of 0-9."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"model = Sequential()\n",
"model.add(Conv2D(32, kernel_size=(3, 3),\n",
" activation='relu',\n",
" input_shape=input_shape))\n",
"# 64 3x3 kernels\n",
"model.add(Conv2D(64, (3, 3), activation='relu'))\n",
"# Reduce by taking the max of each 2x2 block\n",
"model.add(MaxPooling2D(pool_size=(2, 2)))\n",
"# Dropout to avoid overfitting\n",
"model.add(Dropout(0.25))\n",
"# Flatten the results to one dimension for passing into our final layer\n",
"model.add(Flatten())\n",
"# A hidden layer to learn with\n",
"model.add(Dense(128, activation='relu'))\n",
"# Another dropout\n",
"model.add(Dropout(0.5))\n",
"# Final categorization from 0-9 with softmax\n",
"model.add(Dense(10, activation='softmax'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's double check the model description:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Model: \"sequential\"\n",
"_________________________________________________________________\n",
"Layer (type) Output Shape Param # \n",
"=================================================================\n",
"conv2d (Conv2D) (None, 26, 26, 32) 320 \n",
"_________________________________________________________________\n",
"conv2d_1 (Conv2D) (None, 24, 24, 64) 18496 \n",
"_________________________________________________________________\n",
"max_pooling2d (MaxPooling2D) (None, 12, 12, 64) 0 \n",
"_________________________________________________________________\n",
"dropout (Dropout) (None, 12, 12, 64) 0 \n",
"_________________________________________________________________\n",
"flatten (Flatten) (None, 9216) 0 \n",
"_________________________________________________________________\n",
"dense (Dense) (None, 128) 1179776 \n",
"_________________________________________________________________\n",
"dropout_1 (Dropout) (None, 128) 0 \n",
"_________________________________________________________________\n",
"dense_1 (Dense) (None, 10) 1290 \n",
"=================================================================\n",
"Total params: 1,199,882\n",
"Trainable params: 1,199,882\n",
"Non-trainable params: 0\n",
"_________________________________________________________________\n"
]
}
],
"source": [
"model.summary()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We are still doing multiple categorization, so categorical_crossentropy is still the right loss function to use. We'll use the Adam optimizer, although the example provided with Keras uses RMSProp. You might want to try both if you have time."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"model.compile(loss='categorical_crossentropy',\n",
" optimizer='adam',\n",
" metrics=['accuracy'])"
]
},
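{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you do want to try RMSprop instead, a sketch of the alternative compile step is below. It hasn't been run here, and running it would replace the Adam setup above before training (RMSprop is imported at the top but otherwise unused):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Alternative sketch (not run here): compile with RMSprop instead of Adam.\n",
"# Running this would replace the compile call above before training.\n",
"model.compile(loss='categorical_crossentropy',\n",
"              optimizer=RMSprop(),\n",
"              metrics=['accuracy'])"
]
},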
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And now we train our model... to make things go a little faster, we'll use batches of 32.\n",
"\n",
"## Warning\n",
"\n",
"This could take hours to run, and your computer's CPU will be maxed out during that time! Don't run the next block unless you can tie up your computer for a long time. It will print progress as each epoch is run, but each epoch can take around 20 minutes."
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 60000 samples, validate on 10000 samples\n",
"Epoch 1/10\n",
"60000/60000 - 70s - loss: 0.1818 - accuracy: 0.9457 - val_loss: 0.0434 - val_accuracy: 0.9861\n",
"Epoch 2/10\n",
"60000/60000 - 66s - loss: 0.0769 - accuracy: 0.9768 - val_loss: 0.0328 - val_accuracy: 0.9892\n",
"Epoch 3/10\n",
"60000/60000 - 67s - loss: 0.0570 - accuracy: 0.9827 - val_loss: 0.0328 - val_accuracy: 0.9904\n",
"Epoch 4/10\n",
"60000/60000 - 66s - loss: 0.0491 - accuracy: 0.9848 - val_loss: 0.0312 - val_accuracy: 0.9907\n",
"Epoch 5/10\n",
"60000/60000 - 67s - loss: 0.0414 - accuracy: 0.9872 - val_loss: 0.0321 - val_accuracy: 0.9907\n",
"Epoch 6/10\n",
"60000/60000 - 66s - loss: 0.0363 - accuracy: 0.9883 - val_loss: 0.0341 - val_accuracy: 0.9905\n",
"Epoch 7/10\n",
"60000/60000 - 67s - loss: 0.0302 - accuracy: 0.9906 - val_loss: 0.0284 - val_accuracy: 0.9914\n",
"Epoch 8/10\n",
"60000/60000 - 66s - loss: 0.0282 - accuracy: 0.9917 - val_loss: 0.0301 - val_accuracy: 0.9906\n",
"Epoch 9/10\n",
"60000/60000 - 67s - loss: 0.0254 - accuracy: 0.9918 - val_loss: 0.0340 - val_accuracy: 0.9908\n",
"Epoch 10/10\n",
"60000/60000 - 66s - loss: 0.0232 - accuracy: 0.9920 - val_loss: 0.0344 - val_accuracy: 0.9908\n"
]
}
],
"source": [
"history = model.fit(train_images, train_labels,\n",
" batch_size=32,\n",
" epochs=10,\n",
" verbose=2,\n",
" validation_data=(test_images, test_labels))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Was it worth the wait?"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Test loss: 0.034441117036675316\n",
"Test accuracy: 0.9908\n"
]
}
],
"source": [
"score = model.evaluate(test_images, test_labels, verbose=0)\n",
"print('Test loss:', score[0])\n",
"print('Test accuracy:', score[1])"
]
},
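{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an optional extra (a sketch that hasn't been run here), you could also run a single test image through the trained model and compare its prediction against the true label:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch (not run here): inspect one prediction from the trained model.\n",
"sample_index = 0  # arbitrary choice of test image\n",
"probabilities = model.predict(test_images[sample_index:sample_index + 1])\n",
"predicted_label = probabilities.argmax()\n",
"actual_label = test_labels[sample_index].argmax()\n",
"print('Predicted:', predicted_label, 'Actual:', actual_label)"
]
},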
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Over 99%! And that's with just 10 epochs! And from the looks of it, 4 or 5 would have been enough. It came at a significant cost in terms of computing power, but when you start distributing things over multiple computers each with multiple GPU's, that cost starts to feel less bad. If you're building something where life and death are on the line, like a self-driving car, every fraction of a percent matters."
]
},
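{
"cell_type": "markdown",
"metadata": {},
"source": [
"To see how quickly accuracy leveled off, you can plot the history object we captured during training. This sketch assumes the 'accuracy' and 'val_accuracy' keys that TensorFlow 2.x reports (they appear in the training output above); it hasn't been run here:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import matplotlib.pyplot as plt\n",
"\n",
"# Plot training vs. validation accuracy per epoch from the captured history (sketch, not run).\n",
"plt.plot(history.history['accuracy'], label='train accuracy')\n",
"plt.plot(history.history['val_accuracy'], label='validation accuracy')\n",
"plt.xlabel('Epoch')\n",
"plt.ylabel('Accuracy')\n",
"plt.legend()\n",
"plt.show()"
]
},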
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"widgets": {
"state": {},
"version": "1.1.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}