Assignment 2 - Clustering

Question

Assignment 2 - Clustering

Learning Outcomes

In this assignment, you will do the following:
· Explore a dataset and carry out clustering using the k-means algorithm
· Identify the optimum number of clusters for a given dataset

Data: https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams XXXXXXXXXX

Problem Description

In this assignment, you will study the electricity demand from clients in Portugal during 2013 and 2014. You have been provided with the data file, which you should download when you download this assignment file.

The data [1] available contains 370 time series, corresponding to the electric demand [2] for 370 clients, between 2011 and 2014. In this guided exercise, you will use clustering techniques to understand the typical usage behaviour during XXXXXXXXXX.

Both these datasets are publicly available, and can be used to carry out experiments. Their source is below:
1. Data: https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams XXXXXXXXXX#
2. Electric Demand: http://www.think-energy.net/KWvsKWH.htm

We will start by exploring the data set and continue on to the assignment. Treat this as a working notebook; you will add your work to the same notebook.

In this assignment we will use the sklearn package for k-means. Please refer to the documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html

k-means is one of the many clustering algorithms found in the module "sklearn.cluster". These come with a variety of functions that you can call by importing the package. For example:

    from sklearn.cluster import AgglomerativeClustering
    from sklearn.cluster import KMeans

Work to be completed in the workbook provided (assignment2.ipynb); the questions are in the second half of the workbook.

Questions (15 marks total)

Q1: (7 marks)
a. Determine a convenient number of clusters. Justify your choice. Make use of sklearn's package for k-means for this. You may refer to the module to figure out how to come up with the optimal number of clusters.
b. Make a plot for each cluster that includes:
- The number of clients in the cluster (you can put this in the title of the plot)
- All the curves in the cluster
- The curve corresponding to the center of the cluster (make this curve thicker to distinguish it from the individual curves). The center is also sometimes referred to as the "centroid".
You may use 2 separate plots for each cluster if you prefer (one for the individual curves, one for the centroid).

Q2: (8 marks)
In this exercise you work with the daily curves of 1 single client. First, create a list of arrays, each array containing a curve for a day. You may use X from the cells above: X = average_curves_norm.copy(). The list contains 730 arrays, one for each of the days of 2013 and 2014.
a. Determine the optimal value of k (number of clusters). This time you may also perform silhouette analysis as stated in the module. Carrying out silhouette analysis is left as an exercise. What do you understand about the clusters?
b. Based on the results of your analyses with both methods, what do you understand? Interpret them, perhaps from different timeline perspectives such as weeks or months.
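One common way to approach the "optimal number of clusters" part of Q1a and Q2a is to fit k-means for a range of k and compare the inertia (elbow method) with the mean silhouette score. The sketch below is only an illustration under stated assumptions: it uses synthetic 96-point curves built from three made-up profiles instead of the assignment data, and the names `shapes`, `inertias`, `silhouettes` and `best_k` are hypothetical, not taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Synthetic stand-in for normalised daily curves (assumption: 96 points per day).
rng = np.random.default_rng(42)
t = np.linspace(0, 2 * np.pi, 96)
shapes = [np.sin(t), np.cos(t), np.sin(2 * t)]  # three made-up load profiles
X = np.vstack([s + 0.1 * rng.standard_normal((50, 96)) for s in shapes])

inertias = {}     # within-cluster sum of squares, used for the elbow method
silhouettes = {}  # mean silhouette score, higher means better separated clusters

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_
    silhouettes[k] = silhouette_score(X, km.labels_)

# The silhouette criterion picks the k with the highest mean score.
best_k = max(silhouettes, key=silhouettes.get)

for k in inertias:
    print(f"k={k}  inertia={inertias[k]:.1f}  silhouette={silhouettes[k]:.3f}")
print("k chosen by silhouette:", best_k)
```

On real data the two criteria may disagree, so it is usually worth plotting inertia against k and looking for the point where the curve flattens, then checking that choice against the silhouette scores before justifying a value.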

Sandeep Kumar · Accepted Answer

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "D9UboDIvnAKI"
   },
   "source": [
    "## Assignment 2 - Clustering"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "5AibtmcInAKK"
   },
   "source": [
    "## Learning Outcomes
",
    "
",
    "In this assignment, you will do the following:
",
    "
",
    "* Explore a dataset and carry out clustering using k-means algorithm
",
    "
",
    "* Identify the optimum number of clusters for a given dataset
",
    "
"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "TJBMBFfAnAKK"
   },
   "source": [
    "## Problem Description
",
    "
",
    "In this assignment, you will study the electricity demand from clients in Portugal, during 2013 and 2014. You have been provided with the data file, which you should download when you download this assignment file.
",
    "
",
    "The data$^1$ available contains 370 time series, corresponding to the electric demand$^2$ for 370 clients, between 2011 and 2014. 
",
    "
",
    "In this guided exercise, you will use clustering techniques to understand the typical usage behaviour during 2013-2014.
",
    "
",
    "Both these datasets are publicly available, and can be used to carry out experiments. Their source is below:
",
    "
",
    " 1. Data:
",
    "https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014#
",
    "
",
    " 2. Electric Demand:
",
    "http://www.think-energy.net/KWvsKWH.htm
",
    "
",
    "We will start by exploring the data set and continue on to the assignment.  Consider this as a working notebook, you will add your work to the same notebook.
",
    "
",
    "In this assignment we will use the sklearn package for k-means.  Please refer here for the documentation https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html
",
    "(https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html).
",
    "
",
    "The sklearn package for k-means is one of the many clustering algorithms found in the module "sklearn.cluster".  These come with a variety of functions that you can call by importing the package.
",
    "
",
    "For example 
",
    "    
",
    "    from sklearn.cluster import AgglomerativeClustering
",
    "    from sklearn.cluster import KMeans
",
    "
"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "u0fHlteBnAKL"
   },
   "source": [
    "## Data Preparation
",
    "
",
    "Start by downloading the data to a local directory and modify the "pathToFile" and "fileName" variables, if needed. The data file has been provided with this assignment. It is also available at the links provided above."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "0DJsRL9_nAKM"
   },
   "outputs": [],
   "source": [
    "pathToFile = r""
",
    "#pathToFile = r"C:\\Users\\\\Downloads\\"
",
    "
",
    "fileName = 'LD2011_2014.txt'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "yLxHF5B-nAKP"
   },
   "outputs": [],
   "source": [
    "import numpy as np
",
    "from sklearn.cluster import KMeans
",
    "import matplotlib.pyplot as plt
",
    "import random
",
    "from sklearn.metrics import silhouette_score
",
    "from sklearn.cluster import AgglomerativeClustering
",
    "random.seed(42)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "6c6CmGbYnAKR"
   },
   "outputs": [],
   "source": [
    "# Replace "," by ".", otherwise the numbers will be in the form 2,3445 instead of 2.3445
",
    "import fileinput
",
    "
",
    "with fileinput.FileInput(pathToFile+fileName, inplace=True, backup='.bak') as file:
",
    "    for line in file:
",
    "        print(line.replace(",", "."), end='')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "ACTUfls8nAKU"
   },
   "outputs": [],
   "source": [
    "# Create dataframe
",
    "import pandas as pd
",
    "data = pd.read_csv(pathToFile+fileName, sep=";", index_col=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "CfULOBctnAKW"
   },
   "source": [
    "### Quick data inspection"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "x3OHI8vRnAKX",
    "outputId": "c821694f-6e8f-48ad-ff42-336915cb4da4"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "
",
       "
",
       "    .dataframe tbody tr th:only-of-type {
",
       "        vertical-align: middle;
",
       "    }
",
       "
",
       "    .dataframe tbody tr th {
",
       "        vertical-align: top;
",
       "    }
",
       "
",
       "    .dataframe thead th {
",
       "        text-align: right;
",
       "    }
",
       "
",
       "
",
       "  
",
       "    
",
       "      
",
       "      MT_001
",
       "      MT_002
",
       "      MT_003
",
       "      MT_004
",
       "      MT_005
",
       "      MT_006
",
       "      MT_007
",
       "      MT_008
",
       "      MT_009
",
       "      MT_010
",
       "      ...
",
       "      MT_361
",
       "      MT_362
",
       "      MT_363
",
       "      MT_364
",
       "      MT_365
",
       "      MT_366
",
       "      MT_367
",
       "      MT_368
",
       "      MT_369
",
       "      MT_370
",
       "    
",
       "  
",
       "  
",
       "    
",
       "      2011-01-01 00:15:00
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      ...
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "    
",
       "    
",
       "      2011-01-01 00:30:00
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      ...
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "      0.0
",
       "    
",
       "  
",
       "
",
       "2 rows × 370 columns
",
       ""
      ],
      "text/plain": [
       "                     MT_001  MT_002  MT_003  MT_004  MT_005  MT_006  MT_007  \
",
       "2011-01-01 00:15:00     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
",
       "2011-01-01 00:30:00     0.0     0.0     0.0     0.0     0.0     0.0     0.0   
",
       "
",
       "                     MT_008  MT_009  MT_010  ...  MT_361  MT_362  MT_363  \
",
       "2011-01-01 00:15:00     0.0     0.0     0.0  ...     0.0     0.0     0.0   
",
       "2011-01-01 00:30:00     0.0     0.0     0.0  ...     0.0     0.0     0.0   
",
       "
",
       "                     MT_364  MT_365  MT_366  MT_367  MT_368  MT_369  MT_370  
",
       "2011-01-01 00:15:00     0.0     0.0     0.0     0.0     0.0     0.0     0.0  
",
       "2011-01-01 00:30:00     0.0     0.0     0.0     0.0     0.0     0.0     0.0  
",
       "
",
       "[2 rows x 370 columns]"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "BtQvkYF_nAKd",
    "outputId": "9598f8f3-b53a-49ce-d97f-51d562f582d0"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "
",
       "
",
       "    .dataframe tbody tr th:only-of-type {
",
       "        vertical-align: middle;
",
       "    }
",
       "
",
       "    .dataframe tbody tr th {
",
       "        vertical-align: top;
",
       "    }
",
       "
",
       "    .dataframe thead th {
",
       "        text-align: right;
",
       "    }
",
       "
",
       "
",
       "  
",
       "    
",
       "      
",
       "      MT_001
",
       "      MT_002
",
       "      MT_003
",
       "      MT_004
",
       "      MT_005
",
       "      MT_006
",
       "      MT_007
",
       "      MT_008
",
       "      MT_009
",
       "      MT_010
",
       "      ...
",
       "      MT_361
",
       "      MT_362
",
       "      MT_363
",
       "      MT_364
",
       "      MT_365
",
       "      MT_366
",
       "      MT_367
",
       "      MT_368
",
       "      MT_369
",
       "      MT_370
",
       "    
",
       "  
",
       "  
",
       "    
",
       "      2014-12-31 23:45:00
",
       "      1.269036
",
       "      21.337127
",
       "      1.737619
",
       "      166.666667
",
       "      85.365854
",
       "      285.714286
",
       "      10.17524
",
       "      225.589226
",
       "      64.685315
",
       "      72.043011
",
       "      ...
",
       "      246.252677
",
       "      28000.0
",
       "      1443.037975
",
       "      909.090909
",
       "      26.075619
",
       "      4.095963
",
       "      664.618086
",
       "      146.911519
",
       "      646.627566
",
       "      6540.540541
",
       "    
",
       "    
",
       "      2015-01-01 00:00:00
",
       "      2.538071
",
       "      19.914651
",
       "      1.737619
",
       "      178.861789
",
       "      84.146341
",
       "      279.761905
",
       "      10.17524
",
       "      249.158249
",
       "      62.937063
",
       "      69.892473
",
       "      ...
",
       "      188.436831
",
       "      27800.0
",
       "      1409.282700
",
       "      954.545455
",
       "      27.379400
",
       "      4.095963
",
       "      628.621598
",
       "      131.886477
",
       "      673.020528
",
       "      7135.135135
",
       "    
",
       "  
",
       "
",
       "2 rows × 370 columns
",
       ""
      ],
      "text/plain": [
       "                       MT_001     MT_002    MT_003      MT_004     MT_005  \
",
       "2014-12-31 23:45:00  1.269036  21.337127  1.737619  166.666667  85.365854   
",
       "2015-01-01 00:00:00  2.538071  19.914651  1.737619  178.861789  84.146341   
",
       "
",
       "                         MT_006    MT_007      MT_008     MT_009     MT_010  \
",
       "2014-12-31 23:45:00  285.714286  10.17524  225.589226  64.685315  72.043011   
",
       "2015-01-01 00:00:00  279.761905  10.17524  249.158249  62.937063  69.892473   
",
       "
",
       "                     ...      MT_361   MT_362       MT_363      MT_364  \
",
       "2014-12-31 23:45:00  ...  246.252677  28000.0  1443.037975  909.090909   
",
       "2015-01-01 00:00:00  ...  188.436831  27800.0  1409.282700  954.545455   
",
       "
",
       "                        MT_365    MT_366      MT_367      MT_368      MT_369  \
",
       "2014-12-31 23:45:00  26.075619  4.095963  664.618086  146.911519  646.627566   
",
       "2015-01-01 00:00:00  27.379400  4.095963  628.621598  131.886477  673.020528   
",
       "
",
       "                          MT_370  
",
       "2014-12-31 23:45:00  6540.540541  
",
       "2015-01-01 00:00:00  7135.135135  
",
       "
",
       "[2 rows x 370 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.tail(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "Ydr4AkL9nAKg",
    "outputId": "1b4b50f0-3fc4-4805-8160-d8872a35efae"
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(140256, 370)"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "j4xzOoTunAKj"
   },
   "source": [
    "#### As it can be seen, the dataframe contains a row for each interval of 15 minutes between Jan 1, 2011 to Dec 31 2014. There are 370 columns corresponding 370 clients. The dataframe is indexed by the timestamp.
",
    "
",
    "Since the frequency is 15 minutes, each day provides $24\times 4 = 96$ datapoints, which multiplied by 365 days and 4 years (plus 1 day in Feb 29, 2012) gives: $96 \times 365 \times 4 + 96 = 140256$, as observed in data.shape"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "K1RXKoAwnAKj",
    "outputId": "a7cfcc67-0a4c-4ee9-8cc9-b6ea80a74d3a"
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "
",
      "Index: 140256 entries, 2011-01-01 00:15:00 to 2015-01-01 00:00:00
",
      "Columns: 370 entries, MT_001 to MT_370
",
      "dtypes: float64(370)
",
      "memory usage: 397.0+ MB
"
     ]
    }
   ],
   "source": [
    "data.info()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "LqyOk3X6nAKm",
    "outputId": "eb0ccbc7-4bc8-486f-c852-b9385db69abb"
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "
",
       "
",
       "    .dataframe tbody tr th:only-of-type {
",
       "        vertical-align: middle;
",
       "    }
",
       "
",
       "    .dataframe tbody tr th {
",
       "        vertical-align: top;
",
       "    }
",
       "
",
       "    .dataframe thead th {
",
       "        text-align: right;
",
       "    }
",
       "
",
       "
",
       "  
",
       "    
",
       "      
",
       "      MT_001
",
       "      MT_002
",
       "      MT_003
",
       "      MT_004
",
       "      MT_005
",
       "      MT_006
",
       "      MT_007
",
       "      MT_008
",
       "      MT_009
",
       "      MT_010
",
       "      ...
",
       "      MT_361
",
       "      MT_362
",
       "      MT_363
",
       "      MT_364
",
       "      MT_365
",
       "      MT_366
",
       "      MT_367
",
       "      MT_368
",
       "      MT_369
",
       "      MT_370
",
       "    
",
       "  
",
       "  
",
       "    
",
       "      count
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      ...
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "      140256.000000
",
       "    
",
       "    
",
       "      mean
",
       "      3.970785
",
       "      20.768480
",
       "      2.918308
",
       "      82.184490
",
       "      37.240309
",
       "      141.227385
",
       "      4.521338
",
       "      191.401476
",
       "      39.975354
",
       "      42.205152
",
       "      ...
",
       "      218.213701
",
       "      37607.987537
",
       "      1887.427366
",
       "      2940.031734
",
       "      65.413150
",
       "      9.269709
",
       "      424.262904
",
       "      94.704717
",
       "      625.251734
",
       "      8722.355145
",
       "    
",
       "    
",
       "      std
",
       "      5.983965
",
       "      13.272415
",
       "      11.014456
",
       "      58.248392
",
       "      26.461327
",
       "      98.439984
",
       "      6.485684
",
       "      121.981187
",
       "      29.814595
",
       "      33.401251
",
       "      ...
",
       "      204.833532
",
       "      38691.954832
",
       "      1801.486488
",
       "      2732.251967
",
       "      65.007818
",
       "      10.016782
",
       "      274.337122
",
       "      80.297301
",
       "      380.656042
",
       "      9195.155777
",
       "    
",
       "    
",
       "      min
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      ...
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "    
",
       "    
",
       "      25%
",
       "      0.000000
",
       "      2.844950
",
       "      0.000000
",
       "      36.585366
",
       "      15.853659
",
       "      71.428571
",
       "      0.565291
",
       "      111.111111
",
       "      13.986014
",
       "      9.677419
",
       "      ...
",
       "      5.710207
",
       "      0.000000
",
       "      0.000000
",
       "      0.000000
",
       "      13.037810
",
       "      0.000000
",
       "      0.000000
",
       "      30.050083
",
       "      83.944282
",
       "      0.000000
",
       "    
",
       "    
",
       "      50%
",
       "      1.269036
",
       "      24.893314
",
       "      1.737619
",
       "      87.398374
",
       "      39.024390
",
       "      157.738095
",
       "      2.826456
",
       "      222.222222
",
       "      40.209790
",
       "      40.860215
",
       "      ...
",
       "      131.334761
",
       "      24100.000000
",
       "      1050.632911
",
       "      2136.363636
",
       "      31.290743
",
       "      7.021650
",
       "      525.899912
",
       "      76.794658
",
       "      758.064516
",
       "      0.000000
",
       "    
",
       "    
",
       "      75%
",
       "      2.538071
",
       "      29.871977
",
       "      1.737619
",
       "      115.853659
",
       "      54.878049
",
       "      205.357143
",
       "      4.522329
",
       "      279.461279
",
       "      57.692308
",
       "      61.290323
",
       "      ...
",
       "      403.283369
",
       "      54800.000000
",
       "      3312.236287
",
       "      5363.636364
",
       "      108.213820
",
       "      11.702750
",
       "      627.743635
",
       "      151.919866
",
       "      875.366569
",
       "      17783.783784
",
       "    
",
       "    
",
       "      max
",
       "      48.223350
",
       "      115.220484
",
       "      151.172893
",
       "      321.138211
",
       "      150.000000
",
       "      535.714286
",
       "      44.657999
",
       "      552.188552
",
       "      157.342657
",
       "      198.924731
",
       "      ...
",
       "      852.962170
",
       "      192800.000000
",
       "      7751.054852
",
       "      12386.363636
",
       "      335.071708
",
       "      60.269163
",
       "      1138.718174
",
       "      362.270451
",
       "      1549.120235
",
       "      30918.918919
",
       "    
",
       "  
",
       "
",
       "8 rows × 370 columns
",
       ""
      ],
      "text/plain": [
       "              MT_001         MT_002         MT_003         MT_004  \
",
       "count  140256.000000  140256.000000  140256.000000  140256.000000   
",
       "mean        3.970785      20.768480       2.918308      82.184490   
",
       "std         5.983965      13.272415      11.014456      58.248392   
",
       "min         0.000000       0.000000       0.000000       0.000000   
",
       "25%         0.000000       2.844950       0.000000      36.585366   
",
       "50%         1.269036      24.893314       1.737619      87.398374   
",
       "75%         2.538071      29.871977       1.737619     115.853659   
",
       "max        48.223350     115.220484     151.172893     321.138211   
",
       "
",
       "              MT_005         MT_006         MT_007         MT_008  \
",
       "count  140256.000000  140256.000000  140256.000000  140256.000000   
",
       "mean       37.240309     141.227385       4.521338     191.401476   
",
       "std        26.461327      98.439984       6.485684     121.981187   
",
       "min         0.000000       0.000000       0.000000       0.000000   
",
       "25%        15.853659      71.428571       0.565291     111.111111   
",
       "50%        39.024390     157.738095       2.826456     222.222222   
",
       "75%        54.878049     205.357143       4.522329     279.461279   
",
       "max       150.000000     535.714286      44.657999     552.188552   
",
       "
",
       "              MT_009         MT_010  ...         MT_361         MT_362  \
",
       "count  140256.000000  140256.000000  ...  140256.000000  140256.000000   
",
       "mean       39.975354      42.205152  ...     218.213701   37607.987537   
",
       "std        29.814595      33.401251  ...     204.833532   38691.954832   
",
       "min         0.000000       0.000000  ...       0.000000       0.000000   
",
       "25%        13.986014       9.677419  ...       5.710207       0.000000   
",
       "50%        40.209790      40.860215  ...     131.334761   24100.000000   
",
       "75%        57.692308      61.290323  ...     403.283369   54800.000000   
",
       "max       157.342657     198.924731  ...     852.962170  192800.000000   
",
       "
",
       "              MT_363         MT_364         MT_365         MT_366  \
",
       "count  140256.000000  140256.000000  140256.000000  140256.000000   
",
       "mean     1887.427366    2940.031734      65.413150       9.269709   
",
       "std      1801.486488    2732.251967      65.007818      10.016782   
",
       "min         0.000000       0.000000       0.000000       0.000000   
",
       "25%         0.000000       0.000000      13.037810       0.000000   
",
       "50%      1050.632911    2136.363636      31.290743       7.021650   
",
       "75%      3312.236287    5363.636364     108.213820      11.702750   
",
       "max      7751.054852   12386.363636     335.071708      60.269163   
",
       "
",
       "              MT_367         MT_368         MT_369         MT_370  
",
       "count  140256.000000  140256.000000  140256.000000  140256.000000  
",
       "mean      424.262904      94.704717     625.251734    8722.355145  
",
       "std       274.337122      80.297301     380.656042    9195.155777  
",
       "min         0.000000       0.000000       0.000000       0.000000  
",
       "25%         0.000000      30.050083      83.944282       0.000000  
",
       "50%       525.899912      76.794658     758.064516       0.000000  
",
       "75%       627.743635     151.919866     875.366569   17783.783784  
",
       "max      1138.718174     362.270451    1549.120235   30918.918919  
",
       "
",
       "[8 rows x 370 columns]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "colab_type": "text",
    "id": "bEz7tlUqnAKp"
   },
   "source": [
    "### Plot the first 2 days of 2012 for the first 2 clients"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {
    "colab": {},
    "colab_type": "code",
    "id": "DVIRDmI2nAKp",
    "outputId": "f76887ae-1b6d-40ab-f9a8-63ef83264ae2"
   },
   "outputs": [
    {
     "data": {
      "image/png":

Solution
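As an illustration of the per-cluster plot requested in Q1b (member curves in a thin line, centroid in a thick line, client count in the title), here is a minimal sketch. It is not the accepted answer's code: it runs on synthetic curves, the group sizes 40 and 25 are made up, and k = 2 is fixed by assumption rather than derived from the data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for per-client average daily curves (96 points each).
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 96)
X = np.vstack([
    np.sin(t) + 0.1 * rng.standard_normal((40, 96)),  # hypothetical group A
    np.cos(t) + 0.1 * rng.standard_normal((25, 96)),  # hypothetical group B
])

k = 2  # assumed here; in the assignment k comes from the elbow/silhouette analysis
km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)

fig, axes = plt.subplots(1, k, figsize=(5 * k, 3), sharey=True)
for label, ax in enumerate(axes):
    members = X[km.labels_ == label]
    # One thin grey line per client in the cluster.
    ax.plot(members.T, color="grey", linewidth=0.5, alpha=0.5)
    # Thick black line for the cluster centre (centroid).
    ax.plot(km.cluster_centers_[label], color="black", linewidth=3)
    ax.set_title(f"Cluster {label}: {len(members)} clients")
    ax.set_xlabel("15-minute interval of the day")
axes[0].set_ylabel("normalised demand")
fig.tight_layout()
```

In a notebook the figure is displayed by the cell itself, so the `matplotlib.use("Agg")` line can be dropped there; it is included only so the sketch also runs as a plain script.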