{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Dealing with Outliers"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes outliers can mess up an analysis; you usually don't want a handful of data points to skew the overall results. Let's revisit our example of income data, with some random billionaire thrown in:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAYcAAAEJCAYAAAB/pOvWAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEZtJREFUeJzt3X+snmV9x/H3Ryr4W9BW41q0OOsPNHNig6iJc9ZAwcWyKUuNjmqaNXHMOafbcPujC0qmmxuOTHGddBbjRMbMaBQlDWJ0i6BFFAXG6MDBESZ1hepG/FH97o/nAo+9nrZPz3POeXra9ys5ee77uq/7fr5Xz4HPuX8810lVIUnSdA+bdAGSpEOP4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqTOokkXMFOLFy+u5cuXT7oMSVowrr/++u9W1ZJR+i7YcFi+fDnbt2+fdBmStGAk+a9R+3pZSZLUMRwkSR3DQZLUMRwkSR3DQZLUOWA4JNmc5N4k35zW9oQk25Lc1l6Pa+1JcmGSHUluTHLStH3Wtf63JVk3rf2FSb7R9rkwSWZ7kJKkgzPKmcNHgNV7tZ0LXF1VK4Cr2zrA6cCK9rUBuAgGYQJsBF4EnAxsfDBQWp8N0/bb+70kSfPsgOFQVV8Adu3VvAbY0pa3AGdOa7+kBq4Fjk3yFOA0YFtV7aqq+4BtwOq27XFV9aUa/L3SS6YdS5I0ITO95/DkqroHoL0+qbUvBe6a1m+qte2vfWpIuyRpgmb7E9LD7hfUDNqHHzzZwOASFE996lNnUh8Ay8/99ND2b73nVTM+piQdTmZ65vCddkmI9npva58Cjp/Wbxlw9wHalw1pH6qqNlXVyqpauWTJSNODSJJmYKbhsBV48ImjdcAV09rPbk8tnQLsbpedrgJOTXJcuxF9KnBV2/b9JKe0p5TOnnYsSdKEHPCyUpKPAy8HFieZYvDU0XuAy5KsB+4EzmrdrwTOAHYADwBvAqiqXUneBXyl9Tuvqh68yf1mBk9EPRL4TPuSJE3QAcOhql63j02rhvQt4Jx9HGczsHlI+3bgeQeqQ5I0f/yEtCSpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjqGgySpYzhIkjpjhUOStyW5Kck3k3w8ySOSnJDkuiS3JflEkqNb32Pa+o62ffm047yztd+a5LTxhiRJGteMwyHJUuD3gJVV9TzgKGAt8F7ggqpaAdwHrG+7rAfuq6pnABe0fiQ5se33XGA18MEkR820LknS+Ma9rLQIeGSSRcCjgHuAVwCXt+1bgDPb8pq2Ttu+Kkla+6VV9cOqugPYAZw8Zl2SpDHMOByq6tvA+4A7GYTCbuB64P6q2tO6TQFL2/JS4K62757W/4nT24fsI0magHEuKx3H4Lf+E4BfAB4NnD6kaz24yz627at92HtuSLI9yfadO3cefNGSpJGMc1nplcAdVbWzqn4MfBJ4CXBsu8wEsAy4uy1PAccDtO2PB3ZNbx+yz8+pqk1VtbKqVi5ZsmSM0iVJ+zNOONwJnJLkUe3ewSrgZuAa4LWtzzrgira8ta3Ttn+uqqq1r21PM50ArAC+PEZdkqQxLTpwl+Gq6roklwNfBfYANwCbgE8DlyZ5d2u7uO1yMfDRJDsYnDGsbce5KcllDIJlD3BOVf1kpnVJksY343AAqKqNwMa9mm9nyNNGVfUD4Kx9HOd84PxxapEkzR4/IS1J6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6hgOkqSO4SBJ6owVDkmOTXJ5kn9PckuSFyd5QpJtSW5rr8e1vklyYZIdSW5MctK046xr/W9Lsm7cQUmSxjPumcPfAJ+tqmcDzwduAc4Frq6qFcDVbR3gdGBF+9oAXASQ5AnARuBFwMnAxgcDRZI0GTMOhySPA14GXAxQVT+qqvuBNcCW1m0LcGZbXgNcUgPXAscmeQpwGrCtqnZV1X3ANmD1TOuSJI1vnDOHpwM7gX9IckOSDyd5NPDkqroHoL0+qfVfCtw1bf+p1ravdknShIwTDouAk4CLquoFwP/xs0tIw2RIW+2nvT9AsiHJ9iTbd+7cebD1SpJGNE44TAFTVXVdW7+cQVh8p10uor3eO63/8dP2XwbcvZ/2TlVtqqqVVbVyyZIlY5QuSdqfGYdDVf03cFeSZ7WmVcDNwFbgwSeO1gFXtOWtwNntqaVTgN3tstNVwKlJjms3ok9tbZKkCVk05v5vAT6W5GjgduBNDALnsiTrgTuBs1rfK4EzgB3AA60vVbUrybuAr7R+51XVrjHrkiSNYaxwqKqvASuHbFo1pG8B5+zjOJuBzePUIkmaPX5CWpLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSR3DQZLUMRwkSZ2xwyHJUUluSPKptn5CkuuS3JbkE0mObu3HtPUdbfvyacd4Z2u/Nclp49YkSRrPbJw5vBW4Zdr6e4ELqmoFcB+wvrWvB+6rqmcAF7R+JDkRWAs8F1gNfDDJUbNQlyRphsYKhyTLgFcBH27rAV4BXN66bAHObMtr2jpt+6rWfw1waVX9sKruAHYAJ49TlyRpPOOeObwf+CPgp239icD9VbWnrU8BS9vyUuAugLZ9d+v/UPuQfSRJEzDjcEjya8C9VXX99OYhXesA2/a3z97vuSHJ9iTbd+7ceVD1SpJGN86Zw0uBVyf5FnApg8tJ7weOTbKo9VkG3N2Wp4DjAdr2xwO7prcP2efnVNWmqlpZVSuXLFkyRumSpP2ZcThU1TurallVLWdwQ/lzVfV64Brgta3bOuCKtry1rdO2f66qqrWvbU8znQCsAL4807okSeNbdOAuB+2PgUuTvBu4Abi4tV8MfDTJDgZnDGsBquqmJJcBNwN7gHOq6idzUJckaUSzEg5V9Xng8235doY8bVRVPwDO2sf+5wPnz0YtkqTx+QlpSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVJnxuGQ5Pgk1yS5JclNSd7a2p+QZFuS29rrca09SS5MsiPJjUlOmnasda3/bUnWjT8sSdI4xjlz2AO8vaqeA5wCnJPkROBc4OqqWgFc3dYBTgdWtK8NwEUwCBNgI/Ai4GRg44OBIkmajBmHQ1XdU1VfbcvfB24BlgJrgC2t2xbgzLa8BrikBq4Fjk3yFOA0YFtV7aqq+4BtwOqZ1iVJGt+s3HNIshx4AXAd8OSqugcGAQI8qXVbCtw1bbep1rav9mHvsyHJ9iTbd+7cORulS5KGGDsckjwG+Gfg96vqe/vrOqSt9tPeN1ZtqqqVVbVyyZIlB1+sJGkkY4VDkoczCIaPVdUnW/N32uUi2uu9rX0KOH7a7suAu/fTLkmakHGeVgpwMXBLVf31tE1bgQefOFoHXDGt/ez21NIpwO522ekq4NQkx7Ub0ae2NknShCwaY9+XAr8FfCPJ11rbnwDvAS5Lsh64EzirbbsSOAPYATwAvAmgqnYleRfwldbvvKraNUZdkqQxzTgcqupfGX6/AGDVkP4FnLOPY20GNs+0FknS7PIT0pKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeoYDpKkjuEgSeocMuGQZHWSW5PsSHLupOuRpCPZIREOSY4CPgCcDpwIvC7JiZOtSpKOXIdEOAAnAzuq6vaq+hFwKbBmwjVJ0hHrUAmHpcBd09anWpskaQIWTbqAJkPaquuUbAA2tNX/TXLrDN9vMfDd7vjvneHRFoahYz7MHWljPtLGC475YD1t1I6HSjhMAcdPW18G3L13p6raBGwa982SbK+qleMeZyFxzIe/I2284Jjn0qFyWekrwIokJyQ5GlgLbJ1wTZJ0xDokzhyqak+S3wWuAo4CNlfVTRMuS5KOWIdEOABU1ZXAlfP0dmNfmlqAHPPh70gbLzjmOZOq7r6vJOkId6jcc5AkHUIO63A40JQcSY5J8om2/boky+e/ytkzwnj/IMnNSW5McnWSkR9rO1SNOu1KktcmqSQL/smWUcac5Dfb9/qmJP843zXOthF+tp+a5JokN7Sf7zMmUedsSbI5yb1JvrmP7UlyYfv3uDHJSbNeRFUdll8Mbmz/J/B04Gjg68CJe/X5HeBDbXkt8IlJ1z3H4/1V4FFt+c0Lebyjjrn1eyzwBeBaYOWk656H7/MK4AbguLb+pEnXPQ9j3gS8uS2fCHxr0nWPOeaXAScB39zH9jOAzzD4jNgpwHWzXcPhfOYwypQca4AtbflyYFWSYR/IWwgOON6quqaqHmir1zL4PMlCNuq0K+8C/gL4wXwWN0dGGfNvAx+oqvsAqureea5xto0y5gIe15Yfz5DPSS0kVfUFYNd+uqwBLqmBa4FjkzxlNms4nMNhlCk5HupTVXuA3cAT56W62XewU5CsZ/Cbx0J2wDEneQFwfFV9aj4Lm0OjfJ+fCTwzyb8luTbJ6nmrbm6MMuY/A96QZIrBU49vmZ/SJmbOpxw6ZB5lnQOjTMkx0rQdC8TIY0nyBmAl8CtzWtHc2++YkzwMuAB443wVNA9G+T4vYnBp6eUMzg6/mOR5VXX/HNc2V0YZ8+uAj1TVXyV5MfDRNuafzn15EzHn/+86nM8cRpmS46E+SRYxOB3d36ncoWykKUiSvBL4U+DVVfXDeaptrhxozI8Fngd8Psm3GFyb3brAb0qP+nN9RVX9uKruAG5lEBYL1ShjXg9cBlBVXwIewWAOosPVSP+9j+NwDodRpuTYCqxry68FPlftbs8CdMDxtkssf8cgGBb6dWg4wJirandVLa6q5VW1nMF9lldX1fbJlDsrRvm5/hcGDx+QZDGDy0y3z2uVs2uUMd8JrAJI8hwG4bBzXqucX1uBs9tTS6cAu6vqntl8g8P2slLtY0qOJOcB26tqK3Axg9PPHQzOGNZOruLxjDjevwQeA/xTu+9+Z1W9emJFj2nEMR9WRhzzVcCpSW4GfgL8YVX9z+SqHs+IY3478PdJ3sbg8sobF/AveiT5OIPLgovbfZSNwMMBqupDDO6rnAHsAB4A3jTrNSzgfz9J0hw5nC8rSZJmyHCQJHUMB0lSx3CQJHUMB0laAA40Gd9efZ/WJte8Mcnnkxz0VDmGgyQtDB8BRp0K5X0M5l76JeA84M8P9s0MB0laAIZNxpfkF5N8Nsn1Sb6Y5Nlt04nA1W35GoZPSLlfhoMkLVybgLdU1QuBdwAfbO1fB17Tln8deGySg5pU9LD9hLQkHc6SPAZ4CT+b8QDgmPb6DuBvk7yRwd8y+Taw52CObzhI0sL0MOD+qvrlvTdU1d3Ab8BDIfKaqtp9sAeXJC0wVfU94I4kZ8FDfzr0+W15cZuyHuCdwOaDPb7hIEkLQJuM70vAs5JMJVkPvB5Yn+TrwE387Mbzy4Fbk/wH8GTg/IN+PyfekyTtzTMHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdQwHSVLHcJAkdf4f9lLrf8Q7TygAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"import numpy as np\n",
"\n",
"incomes = np.random.normal(27000, 15000, 10000)\n",
"incomes = np.append(incomes, [1000000000])\n",
"\n",
"import matplotlib.pyplot as plt\n",
"plt.hist(incomes, 50)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That's not very helpful to look at. One billionaire ended up squeezing everybody else into a single line in my histogram. Plus it skewed my mean income significantly:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"126881.95523553"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"incomes.mean()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"It's important to dig into what is causing your outliers, and understand where they are coming from. You also need to think about whether removing them is a valid thing to do, given the spirit of what it is you're trying to analyze. If I know I want to understand more about the incomes of \"typical Americans\", filtering out billionaires seems like a legitimate thing to do.\n",
"\n",
"Here's something a little more robust than filtering out billionaires - it filters out anything beyond two standard deviations of the median value in the data set:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXoAAAD8CAYAAAB5Pm/hAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAEjlJREFUeJzt3X+MpdVdx/H3R7aAVttly0DW/eFCutFSkxacVGqNqaUqLI1bEzFUY7eI2UTRVGuii/1DTfwD1NiWaGhJqS4GLUhb2SBacVuiJpZ2aZGWAjKlCOOu7NYC/mj8gX79456Ru8vMzp3fd86+X8nNfZ7znHvvOfvc/cyZ8/yYVBWSpH593Vo3QJK0sgx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUuc2rHUDAM4+++zasWPHWjdDktaV+++//ytVNTFfvbEI+h07dnDo0KG1boYkrStJ/mGUek7dSFLnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM4Z9JLUOYNekjpn0EtS58biylhpXO3Y96dzbnviustXsSXS4jmil6TOOaKXVslcvx34m4FW2kgj+iQbk9yR5JEkDyd5fZJNSe5J8lh7PqvVTZIbkkwleTDJRSvbBUnSyYw6dfM+4M+r6tuA1wAPA/uAg1W1EzjY1gEuA3a2x17gxmVtsSRpQeYN+iQvA74HuBmgqv6rqp4FdgP7W7X9wFvb8m7glhr4FLAxyeZlb7kkaSSjzNGfDxwDfi/Ja4D7gXcC51bVEYCqOpLknFZ/C/DU0OunW9mR4TdNspfBiJ/t27cvpQ/SmnDOXevFKFM3G4CLgBur6kLg33lhmmY2maWsXlRQdVNVTVbV5MTEvH8gRZK0SKME/TQwXVX3tfU7GAT/0zNTMu356FD9bUOv3wocXp7mSpIWat6gr6p/Ap5K8q2t6BLgi8ABYE8r2wPc2ZYPAG9vZ99cDDw3M8UjSVp9o55H/7PArUlOBx4HrmLwQ+L2JFcDTwJXtLp3A7uAKeBrra4kaY2MFPRV9QAwOcumS2apW8A1S2yXJGmZeAsESeqct0CQOPnNy6T1zhG9JHXOoJekzhn0ktQ5g16SOufBWGmZeWBX48YRvSR1zqCXpM4Z9JLUOefodUpx/lynIkf0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zqCXpM55wZS65IVR0gsc0UtS5wx6SeqcQS9JnTPoJalzIwV9kieSfD7JA0kOtbJNSe5J8lh7PquVJ8kNSaaSPJjkopXsgCTp5BYyov/eqnptVU229X3AwaraCRxs6wCXATvbYy9w43I1VpK0cEs5vXI38Ma2vB+4F/ilVn5LVRXwqSQbk2yuqiNLaajUq7lOBX3iustXuSXq1agj+gL+Isn9Sfa2snNnwrs9n9PKtwBPDb12upVJktbAqCP6N1TV4STnAPckeeQkdTNLWb2o0uAHxl6A7du3j9gMSdJCjTSir6rD7fko8DHgdcDTSTYDtOejrfo0sG3o5VuBw7O8501VNVlVkxMTE4vvgSTppOYN+iQvTfJNM8vA9wNfAA4Ae1q1PcCdbfkA8PZ29s3FwHPOz0vS2hll6uZc4GNJZur/YVX9eZLPALcnuRp4Erii1b8b2AVMAV8Drlr2VkuSRjZv0FfV48BrZin/Z+CSWcoLuGZZWidJWjKvjJWkzhn0ktQ570cvjSkvpNJycUQvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnPI9e69pc55pLeoEjeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zpuaaV3w5mXS4jmil6TOjRz0SU5L8rkkd7X185Lcl+SxJLclOb2Vn9HWp9r2HSvTdEnSKBYyon8n8PDQ+vXAe6pqJ/AMcHUrvxp4pqpeCbyn1ZMkrZGRgj7JVuBy4INtPcCbgDtalf3AW9vy7rZO235Jqy9JWgOjjujfC/wi8L9t/RXAs1X1fFufBra05S3AUwBt+3Ot/nGS7E1yKMmhY8eOLbL5kqT5zBv0Sd4CHK2q+4eLZ6laI2x7oaDqpqqarKrJiYmJkRorSVq4UU6vfAPwg0l2AWcCL2Mwwt+YZEMbtW8FDrf608A2YDrJBuDlwFeXveWSpJHMO6KvqmuramtV7QCuBD5RVT8GfBL44VZtD3BnWz7Q1mnbP1FVLxrRS5JWx1LOo/8l4F1JphjMwd/cym8GXtHK3wXsW1oTJUlLsaArY6vqXuDetvw48LpZ6vwHcMUytE2StAy8BYK0zsx1O4gnrrt8lVui9cKg11jxnjbS8vNeN5LUOYNekjpn0EtS5wx6SeqcQS9JnTPoJalzBr0kdc7z6KVOeCGV5uKIXpI6Z9BLUucMeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1bt6gT3Jmkk8n+bskDyX5tVZ+XpL7kjyW5LYkp7fyM9r6VNu+Y2W7IEk6mVFG9P8JvKmqXgO8Frg0ycXA9cB7qmon8Axwdat/NfBMVb0SeE+rJ0laI/P+4ZGqKuDf2upL2qOANwE/2sr3A78K3AjsbssAdwC/kyTtfSStMv8giUaao09yWpIHgKPAPcCXgGer6vlWZRrY0pa3AE8BtO3PAa9YzkZLkkY3UtBX1f9U1WuBrcDrgFfNVq095yTb/l+SvUkOJTl07NixUdsrSVqgBZ11U1XPAvcCFwMbk8xM/WwFDrflaWAbQNv+cuCrs7zXTVU1WVWTExMTi2u9JGle887RJ5kA/ruqnk3y9cCbGRxg/STww8CHgT3Ane0lB9r637btn3B+/tQ11/ywpNUzb9ADm4H9SU5j8BvA7VV1V5IvAh9O8uvA54CbW/2bgT9IMsVgJH/lCrRbkjSiUc66eRC4cJbyxxnM159Y/h/AFcvSOknSknllrCR1bpSpG0kd8vz6U4cjeknqnEEvSZ0z6CWpcwa9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0mdM+glqXMGvSR1zpuaSTqONzvrjyN6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6Z9BLUue8YErLYq6LbCStvXlH9Em2JflkkoeTPJTkna18U5J7kjzWns9q5UlyQ5KpJA8muWilOyFJmtsoUzfPA79QVa8CLgauSXIBsA84WFU7gYNtHeAyYGd77AVuXPZWS5JGNm/QV9WRqvpsW/5X4GFgC7Ab2N+q7Qfe2pZ3A7fUwKeAjUk2L3vLJUkjWdDB2CQ7gAuB+4Bzq+oIDH4YAOe0aluAp4ZeNt3KJElrYOSgT/KNwEeAn6uqfzlZ1VnKapb325vkUJJDx44dG7UZkqQFGinok7yEQcjfWlUfbcVPz0zJtOejrXwa2Db08q3A4RPfs6puqqrJqpqcmJhYbPslSfMY5aybADcDD1fVbw9tOgDsact7gDuHyt/ezr65GHhuZopHkrT6RjmP/g3AjwOfT/JAK/tl4Drg9iRXA08CV7RtdwO7gCnga8BVy9piSdKCzBv0VfU3zD7vDnDJLPULuGaJ7ZIkLRNvgSBJnTPoJalzBr0kdc6gl6TOGfSS1DlvUyxpJCe7FfUT112+ii3RQjmil6TOOaLXgvgHRqT1xxG9JHXOoJekzhn0ktQ5g16SOmfQS1LnDHpJ6pxBL0md8zx6SUs21/UVXjE7HhzRS1LnDHpJ6pxBL0mdc45es/KeNlI/HNFLUucMeknqnEEvSZ0z6CWpcwa9JHVu3rNuknwIeAtwtKq+vZVtAm4DdgBPAD9SVc8kCfA+YBfwNeAdVfXZlWm6pHHnFbPjYZQR/e8Dl55Qtg84WFU7gYNtHeAyYGd77AVuXJ5mSpIWa96gr6q/Ar56QvFuYH9b3g+8daj8lhr4FLAxyeblaqwkaeEWe8HUuVV1BKCqjiQ5p5VvAZ4aqjfdyo6c+AZJ9jIY9bN9+/ZFNkNL4UVR0qlhuQ/GZpaymq1iVd1UVZNVNTkxMbHMzZAkzVhs0D89MyXTno+28mlg21C9rcDhxTdPkrRUiw36A8CetrwHuHOo/O0ZuBh4bmaKR5K0NkY5vfKPgDcCZyeZBn4FuA64PcnVwJPAFa363QxOrZxicHrlVSvQZknSAswb9FX1tjk2XTJL3QKuWWqjJEnLxytjJalz3o9e0qrzitnV5Yhekjpn0EtS5wx6Seqcc/SnAG91oPXCufuV4YhekjrniL4jjtwlzcYRvSR1zqCXpM4Z9JLUOYNekjrnwVhJY8/TLpfGEb0kdc4R/TrkaZSSFsIRvSR1zqCXpM4Z9JLUOefoJa1bno0zGoN+jHnQVVocfwAcz6kbSeqcQS9JnTPoJalzztGvIufcpfHU+5z+igR9kkuB9wGnAR+squtW4nPGlYEujadT9f/msk/dJDkN+F3gMuAC4G1JLljuz5EkjWYlRvSvA6aq6nGAJB8GdgNfXIHPWlOn6uhAOlUs55TOWk4PrUTQbwGeGlqfBr5zBT4HMGwlrb71ljsrEfSZpaxeVCnZC+xtq/+W5NEVaMtqOBv4ylo3YgXYr/Wn17513a9cv6T3+JZRKq1E0E8D24bWtwKHT6xUVTcBN63A56+qJIeqanKt27Hc7Nf602vf7NfSrcR59J8BdiY5L8npwJXAgRX4HEnSCJZ9RF9Vzyf5GeDjDE6v/FBVPbTcnyNJGs2KnEdfVXcDd6/Ee4+hdT/9NAf7tf702jf7tUSpetFxUklSR7zXjSR1zqA/QZLfTPJIkgeTfCzJxqFt1yaZSvJokh8YKr+0lU0l2TdUfl6S+5I8luS2dnCaJGe09am2fcdq9nE+c/VnnCTZluSTSR5O8lCSd7byTUnuaf/m9yQ5q5UnyQ2tTw8muWjovfa0+o8l2TNU/h1JPt9ec0OS2U4dXom+nZbkc0nuausL/h4t9Lu6Sv3amOSO9v/r4SSv72R//Xz7Dn4hyR8lOXPs9llV+Rh6AN8PbGjL1wPXt+ULgL8DzgDOA77E4GDzaW35fOD0VueC9prbgSvb8vuBn2rLPw28vy1fCdy21v0e6v+c/RmnB7AZuKgtfxPw920f/Qawr5XvG9p/u4A/Y3Cdx8XAfa18E/B4ez6rLZ/Vtn0aeH17zZ8Bl61S394F/CFw12K+R4v5rq5Sv/YDP9mWTwc2rvf9xeAC0S8DXz+0r94xbvtszf/DjvMD+CHg1rZ8LXDt0LaPty/V64GPD5Vf2x5hcJHHzA+N/68389q2vKHVy1r398R2ztbvcX0AdwLfBzwKbG5lm4FH2/IHgLcN1X+0bX8b8IGh8g+0ss3AI0Plx9VbwX5sBQ4CbwLuWsz3aKHf1VXaPy9rgZgTytf7/pq5E8Cmtg/uAn5g3PaZUzcn9xMMRgYw+60dtpyk/BXAs1X1/Anlx71X2/5cqz8O5urP2Gq//l4I3AecW1VHANrzOa3aQvfflrZ8YvlKey/wi8D/tvXFfI8W2tfVcD5wDPi9Ni31wSQvZZ3vr6r6R+C3gCeBIwz2wf2M2T47JYM+yV+2+bQTH7uH6rwbeB64daZolreqRZSf7L3GwTi37UWSfCPwEeDnqupfTlZ1lrLF7r8VkeQtwNGqun+4+CTtGPs+DdkAXATcWFUXAv/OYKpmLuuib+2Ywm4G0y3fDLyUwZ1752rLmvTrlPzDI1X15pNtbwd43gJcUu33JU5+a4fZyr8CbEyyof3kHq4/817TSTYALwe+uvgeLauRbmExDpK8hEHI31pVH23FTyfZXFVHkmwGjrbyufo1DbzxhPJ7W/nWWeqvpDcAP5hkF3Amg+mO97Lw79FCv6urYRqYrqr72vodDIJ+Pe8vgDcDX66qYwBJPgp8F+O2z1Zjfm49PYBLGdxSeeKE8ldz/MGSxxkcKNnQls/jhYMlr26v+WOOPyDz0235Go4/IHP7Wvd7qJ9z9mecHgxGOrcA7z2h/Dc5/uDeb7Tlyzn+4N6nW/kmBnPHZ7XHl4FNbdtnWt2Zg3u7VrF/b+SFg7EL+h4t5ru6Sn36a+Bb2/Kvtn21rvcXgzvzPgR8Q/vc/cDPjts+W/P/sOP2AKYYzIk90B7vH9r2bgZHwB9l6Ig+gzME/r5te/dQ+fkMzgSYajv+jFZ+ZlufatvPX+t+n/BvMGt/xukBfDeDX2EfHNpXuxjMdx4EHmvPMyEQBn8Q50vA54HJoff6ibYvpoCrhsongS+01/wOq3jAnOODfsHfo4V+V1epT68FDrV99icMgnrd7y/g14BH2mf/AYOwHqt95pWxktS5U/JgrCSdSgx6SeqcQS9JnTPoJalzBr0kdc6gl6TOGfSS1DmDXpI6938mR2jr/wcZkQAAAABJRU5ErkJggg==\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"def reject_outliers(data):\n",
" u = np.median(data)\n",
" s = np.std(data)\n",
" filtered = [e for e in data if (u - 2 * s < e < u + 2 * s)]\n",
" return filtered\n",
"\n",
"filtered = reject_outliers(incomes)\n",
"\n",
"plt.hist(filtered, 50)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That looks better. And, our mean is more, well, meangingful now as well:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"26894.643431053548"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"np.mean(filtered)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Activity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Instead of a single outlier, add several randomly-generated outliers to the data. Experiment with different values of the multiple of the standard deviation to identify outliers, and see what effect it has on the final results."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}