CSC 578D / Data Mining / Fall 2018 / University of Victoria

Python Notebook explaining Assignment 01 / Problem 01

The dataset for Assignment #1 is the contact-lenses dataset that ships with Weka.

The Weka datasets can be found on my personal website at www.apkc.net.

Author: Andreas P. Koenzen akoenzen@uvic.ca
Version: 0.1

In [3]:
import pandas as pd
import numpy as np
import requests as rq

from scipy.io import arff
from io import StringIO
In [4]:
url_data = rq.get('http://www.apkc.net/data/weka/contact-lenses.arff').text
data = arff.loadarff(StringIO(url_data))
df = pd.DataFrame(data[0], index=pd.Index(np.arange(24) + 1), dtype='object')

# Convert all data in the columns to strings instead of binary objects.
string_df = df.select_dtypes([object]).stack().str.decode('UTF-8').unstack()
for col in string_df:
    df[col] = string_df[col]
df
Out[4]:
age spectacle-prescrip astigmatism tear-prod-rate contact-lenses
1 young myope no reduced none
2 young myope no normal soft
3 young myope yes reduced none
4 young myope yes normal hard
5 young hypermetrope no reduced none
6 young hypermetrope no normal soft
7 young hypermetrope yes reduced none
8 young hypermetrope yes normal hard
9 pre-presbyopic myope no reduced none
10 pre-presbyopic myope no normal soft
11 pre-presbyopic myope yes reduced none
12 pre-presbyopic myope yes normal hard
13 pre-presbyopic hypermetrope no reduced none
14 pre-presbyopic hypermetrope no normal soft
15 pre-presbyopic hypermetrope yes reduced none
16 pre-presbyopic hypermetrope yes normal none
17 presbyopic myope no reduced none
18 presbyopic myope no normal none
19 presbyopic myope yes reduced none
20 presbyopic myope yes normal hard
21 presbyopic hypermetrope no reduced none
22 presbyopic hypermetrope no normal soft
23 presbyopic hypermetrope yes reduced none
24 presbyopic hypermetrope yes normal none

Solution to Problem #1 of Assignment #1:

Problem #1 states the following:

(4 points) Construct the root and the first level of a decision tree for the contact lenses data. Use the ID3 algorithm. Show the details of your construction. Then, check your solution with Weka (the data file is included with Weka).

The full tree for this problem is the following:

tear-prod-rate = reduced: none
tear-prod-rate = normal
|  astigmatism = no
|  |  age = young: soft
|  |  age = pre-presbyopic: soft
|  |  age = presbyopic
|  |  |  spectacle-prescrip = myope: none
|  |  |  spectacle-prescrip = hypermetrope: soft
|  astigmatism = yes
|  |  spectacle-prescrip = myope: hard
|  |  spectacle-prescrip = hypermetrope
|  |  |  age = young: hard
|  |  |  age = pre-presbyopic: none
|  |  |  age = presbyopic: none

Notes:

  • 3 significant digits are used for all results.
  • Results are rounded up when the 4th significant digit is >= 5.

Step #1:

We need to select the ROOT node by choosing the attribute/feature with the lowest weighted entropy (equivalently, the highest information gain), and then branching on its possible values. Choosing this way tends to make the final tree smaller.

Entropy: Also known as a "measure of information", it quantifies how impure (heterogeneous) a dataset is; a perfectly homogeneous dataset has entropy 0.

Formula (two classes): $E = -p_1 \log_2 p_1 - (1 - p_1) \log_2 (1 - p_1)$, given that $p_1 + p_2 = 1$. For $k$ classes this generalizes to $E = -\sum_{i=1}^{k} p_i \log_2 p_i$, which is the form used below (our class attribute has three values: none, soft, hard).

Entropy is largest when $p_1$ and $p_2$ are close in value (at $p_1 = p_2 = 0.5$ it reaches 1) and smallest when they are far apart, e.g. $p_1 = 0.1$, $p_2 = 0.9$.

Entropy is represented in this notebook as $\Delta_{\text{info}}$: lower values of $\Delta_{\text{info}}$ mean the dataset is more homogeneous, higher values mean it is more heterogeneous. The weighted entropy of an attribute, written $I(\text{Attribute})$, is the sum over the attribute's values of each branch's entropy times the probability $Pr(X=value)$ of that value. e.g. if an attribute has three possible values and each occurs in 8 instances out of a total of 24, then $Pr(X=value) = \frac{8}{24} = \frac{1}{3}$ for each branch.
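As a minimal sketch, the entropy of a class distribution can be computed directly from raw counts. The `entropy` helper below is a name introduced here for illustration, not part of any library:

```python
import math

def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    # Terms with a zero count contribute 0 and are skipped.
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([4, 4]), 3))   # balanced two-class split -> 1.0
print(round(entropy([1, 9]), 3))   # nearly homogeneous -> 0.469
print(round(entropy([12]), 3))     # pure branch -> 0.0
```

Note how the balanced split attains the maximum while the pure branch scores 0, matching the discussion above.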

Entropy for column "age":

age
|  young: none, none, none, none, soft, soft, hard, hard
|  pre-presbyopic: none, none, none, none, none, soft, soft, hard
|  presbyopic: none, none, none, none, none, none, soft, hard

$\Delta_{\text{info}} (\text{age=young[4, 2, 2]}) = -(\frac{4}{8} \log_2 \frac{4}{8}) - (\frac{2}{8} \log_2 \frac{2}{8}) - (\frac{2}{8} \log_2 \frac{2}{8}) = 1.5$

$\Delta_{\text{info}} (\text{age=pre-presbyopic[5, 2, 1]}) = -(\frac{5}{8} \log_2 \frac{5}{8}) - (\frac{2}{8} \log_2 \frac{2}{8}) - (\frac{1}{8} \log_2 \frac{1}{8}) = 1.30$

$\Delta_{\text{info}} (\text{age=presbyopic[6, 1, 1]}) = -(\frac{6}{8} \log_2 \frac{6}{8}) - (\frac{1}{8} \log_2 \frac{1}{8}) - (\frac{1}{8} \log_2 \frac{1}{8}) = 1.06$

Weighted Entropy
$I(\text{age}) = 1.5 \times \frac{8}{24} + 1.30 \times \frac{8}{24} + 1.06 \times \frac{8}{24} = 1.29$

Entropy for column "spectacle-prescrip":

spectacle-prescrip
|  myope: none, none, none, none, none, none, none, soft, soft, hard, hard, hard
|  hypermetrope: none, none, none, none, none, none, none, none, soft, soft, soft, hard

$\Delta_{\text{info}} (\text{spectacle-prescrip=myope[7, 2, 3]}) = -(\frac{7}{12} \log_2 \frac{7}{12}) - (\frac{2}{12} \log_2 \frac{2}{12}) - (\frac{3}{12} \log_2 \frac{3}{12}) = 1.38$

$\Delta_{\text{info}} (\text{spectacle-prescrip=hypermetrope[8, 3, 1]}) = -(\frac{8}{12} \log_2 \frac{8}{12}) - (\frac{3}{12} \log_2 \frac{3}{12}) - (\frac{1}{12} \log_2 \frac{1}{12}) = 1.19$

Weighted Entropy
$I(\text{spectacle-prescrip}) = 1.38 \times \frac{12}{24} + 1.19 \times \frac{12}{24} = 1.29$

Entropy for column "astigmatism":

astigmatism
|  yes: none, none, none, none, none, none, none, none, hard, hard, hard, hard
|  no: none, none, none, none, none, none, none, soft, soft, soft, soft, soft

$\Delta_{\text{info}} (\text{astigmatism=yes[8, 4]}) = -(\frac{8}{12} \log_2 \frac{8}{12}) - (\frac{4}{12} \log_2 \frac{4}{12}) = 0.92$

$\Delta_{\text{info}} (\text{astigmatism=no[7, 5]}) = -(\frac{7}{12} \log_2 \frac{7}{12}) - (\frac{5}{12} \log_2 \frac{5}{12}) = 0.98$

Weighted Entropy
$I(\text{astigmatism}) = 0.92 \times \frac{12}{24} + 0.98 \times \frac{12}{24} = 0.95$

Entropy for column "tear-prod-rate":

tear-prod-rate
|  reduced: none, none, none, none, none, none, none, none, none, none, none, none
|  normal: none, none, none, soft, soft, soft, soft, soft, hard, hard, hard, hard

$\Delta_{\text{info}} (\text{tear-prod-rate=reduced[12]}) = -(\frac{12}{12} \log_2 \frac{12}{12}) = 0$

$\Delta_{\text{info}} (\text{tear-prod-rate=normal[3, 5, 4]}) = -(\frac{3}{12} \log_2 \frac{3}{12}) - (\frac{5}{12} \log_2 \frac{5}{12}) - (\frac{4}{12} \log_2 \frac{4}{12}) = 1.55$

Weighted Entropy
$I(\text{tear-prod-rate}) = 0 \times \frac{12}{24} + 1.55 \times \frac{12}{24} = 0.78$
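As a sanity check, the Step #1 weighted entropies can be recomputed programmatically. This is a sketch under one assumption: the class counts [none, soft, hard] per attribute value are transcribed by hand from the listings above, and `entropy`/`weighted_entropy` are helper names introduced here, not library functions:

```python
import math

def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(partitions):
    # Entropy of each branch, weighted by the fraction of instances reaching it.
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

# Class counts [none, soft, hard] per attribute value, read off the listings above.
step1 = {
    'age':                [[4, 2, 2], [5, 2, 1], [6, 1, 1]],
    'spectacle-prescrip': [[7, 2, 3], [8, 3, 1]],
    'astigmatism':        [[8, 0, 4], [7, 5, 0]],
    'tear-prod-rate':     [[12, 0, 0], [3, 5, 4]],
}
for attr, parts in step1.items():
    print(f'I({attr}) = {weighted_entropy(parts):.2f}')
```

tear-prod-rate comes out smallest, confirming the choice of the ROOT node.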

Result of Step #1:

The column with the lowest weighted entropy is tear-prod-rate, with a value of 0.78. This column therefore becomes the ROOT node of our decision tree.

So the tree is the following:

tear-prod-rate = reduced: none
tear-prod-rate = normal
|  ???

Step #2:

We continue building the tree GIVEN that tear-prod-rate=normal has occurred.

Entropy for column "age | tear-prod-rate=normal"

tear-prod-rate = normal
|  age
|  |  young: soft, soft, hard, hard
|  |  pre-presbyopic: none, soft, soft, hard
|  |  presbyopic: none, none, soft, hard

$\Delta_{\text{info}} (\text{age=young[2, 2] | tear-prod-rate=normal}) = -(\frac{2}{4} \log_2 \frac{2}{4}) - (\frac{2}{4} \log_2 \frac{2}{4}) = 1.00$

$\Delta_{\text{info}} (\text{age=pre-presbyopic[1, 2, 1] | tear-prod-rate=normal}) = -(\frac{1}{4} \log_2 \frac{1}{4}) - (\frac{2}{4} \log_2 \frac{2}{4}) - (\frac{1}{4} \log_2 \frac{1}{4}) = 1.5$

$\Delta_{\text{info}} (\text{age=presbyopic[2, 1, 1] | tear-prod-rate=normal}) = -(\frac{2}{4} \log_2 \frac{2}{4}) - (\frac{1}{4} \log_2 \frac{1}{4}) - (\frac{1}{4} \log_2 \frac{1}{4}) = 1.5$

Weighted Entropy
$I(\text{age | tear-prod-rate=normal}) = 1.00 \times \frac{4}{12} + 1.5 \times \frac{4}{12} + 1.5 \times \frac{4}{12} = 1.33$

Entropy for column "spectacle-prescrip | tear-prod-rate=normal":

tear-prod-rate = normal
|  spectacle-prescrip
|  |  myope: none, soft, soft, hard, hard, hard
|  |  hypermetrope: none, none, soft, soft, soft, hard

$\Delta_{\text{info}} (\text{spectacle-prescrip=myope[1, 2, 3] | tear-prod-rate=normal}) = -(\frac{1}{6} \log_2 \frac{1}{6}) - (\frac{2}{6} \log_2 \frac{2}{6}) - (\frac{3}{6} \log_2 \frac{3}{6}) = 1.46$

$\Delta_{\text{info}} (\text{spectacle-prescrip=hypermetrope[2, 3, 1] | tear-prod-rate=normal}) = -(\frac{2}{6} \log_2 \frac{2}{6}) - (\frac{3}{6} \log_2 \frac{3}{6}) - (\frac{1}{6} \log_2 \frac{1}{6}) = 1.46$

Weighted Entropy
$I(\text{spectacle-prescrip | tear-prod-rate=normal}) = 1.46 \times \frac{6}{12} + 1.46 \times \frac{6}{12} = 1.46$

Entropy for column "astigmatism | tear-prod-rate=normal":

tear-prod-rate = normal
|  astigmatism
|  |  yes: none, none, hard, hard, hard, hard
|  |  no: none, soft, soft, soft, soft, soft

$\Delta_{\text{info}} (\text{astigmatism=yes[2, 4] | tear-prod-rate=normal}) = -(\frac{2}{6} \log_2 \frac{2}{6}) - (\frac{4}{6} \log_2 \frac{4}{6}) = 0.92$

$\Delta_{\text{info}} (\text{astigmatism=no[1, 5] | tear-prod-rate=normal}) = -(\frac{1}{6} \log_2 \frac{1}{6}) - (\frac{5}{6} \log_2 \frac{5}{6}) = 0.65$

Weighted Entropy
$I(\text{astigmatism | tear-prod-rate=normal}) = 0.92 \times \frac{6}{12} + 0.65 \times \frac{6}{12} = 0.78$
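The Step #2 weighted entropies can be checked the same way. Again a sketch: the class counts [none, soft, hard] per value, restricted to the tear-prod-rate=normal subset, are transcribed by hand from the listings above, and the helper names are introduced here for illustration only:

```python
import math

def entropy(counts):
    total = sum(counts)
    return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted_entropy(partitions):
    # Entropy of each branch, weighted by the fraction of instances reaching it.
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * entropy(p) for p in partitions)

# Class counts [none, soft, hard] per value, restricted to tear-prod-rate=normal.
step2 = {
    'age':                [[0, 2, 2], [1, 2, 1], [2, 1, 1]],
    'spectacle-prescrip': [[1, 2, 3], [2, 3, 1]],
    'astigmatism':        [[2, 0, 4], [1, 5, 0]],
}
for attr, parts in step2.items():
    print(f'I({attr} | tear-prod-rate=normal) = {weighted_entropy(parts):.2f}')
```

astigmatism yields the smallest value of the three candidates, so it wins the first level.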


Final solution:

After computing the ROOT and the first level we have the partial tree below. Among the three candidate attributes, astigmatism has the lowest weighted entropy, so it becomes the first-level node under tear-prod-rate = normal.

         ----------------
        |     <NODE>     |
        | tear-prod-rate | (ROOT)
         ----------------
                |
        reduced | normal 
         -------|----------
        |                  |
     --------         ------------
    | <LEAF> |       |   <NODE>   |
    |  NONE  |       |astigmatism | (FIRST LEVEL)
     --------         ------------
                           |
                       yes | no
                   --------|-------
                  |                |
                -----             -----
               | ??? |           | ??? |
                -----             -----

END