The Weka datasets can be found on my personal website, www.apkc.net.
Author: Andreas P. Koenzen akoenzen@uvic.ca
Version: 0.1
import pandas as pd
import numpy as np
import requests as rq
from scipy.io import arff
from io import StringIO
url_data = rq.get('http://www.apkc.net/data/weka/contactlenses.arff').text
data = arff.loadarff(StringIO(url_data))
df = pd.DataFrame(data[0], index=pd.Index(np.arange(24) + 1), dtype='object')
# Nominal values come back from loadarff as bytes; decode them to regular strings.
string_df = df.select_dtypes([object]).stack().str.decode('utf-8').unstack()
for col in string_df:
    df[col] = string_df[col]
df
(4 points) Construct the root and the first level of a decision tree for the contact lenses data. Use the ID3 algorithm. Show the details of your construction. Then, check your solution with Weka (the data file is included with Weka).
tearprodrate = reduced: none
tearprodrate = normal
 astigmatism = no
  age = young: soft
  age = prepresbyopic: soft
  age = presbyopic
   spectacleprescrip = myope: none
   spectacleprescrip = hypermetrope: soft
 astigmatism = yes
  spectacleprescrip = myope: hard
  spectacleprescrip = hypermetrope
   age = young: hard
   age = prepresbyopic: none
   age = presbyopic: none
We need to select the ROOT node, basing the choice of attribute/feature on entropy: we use the attribute with the lowest weighted entropy and then branch on its possible values. This results in a smaller final tree.
Entropy: Also known as a "measure of information", it quantifies how mixed the class distribution of a dataset is: 0 for a perfectly homogeneous set, maximal when the classes are evenly represented.
Formula: $E = -p_1 \log_2 p_1 - (1 - p_1) \log_2 (1 - p_1)$, given that $p_1 + p_2 = 1$.
When $p_1$ and $p_2$ are close to each other in value the entropy is at its largest, and vice versa: the lowest entropy is attained when $p_1$ and $p_2$ are far apart, e.g. $p_1 = 0.1$, $p_2 = 0.9$.
Entropy is represented in this notebook as $\Delta_{\text{info}}$: lower values of $\Delta_{\text{info}}$ mean the dataset is more homogeneous, and higher values mean it is more heterogeneous. The weighted entropy is represented as $I(\text{Attribute})$; it is the sum, over the attribute's possible values, of the entropy of each value's subset times the probability $Pr(X=\text{value})$ of that value. E.g. if an attribute has three possible values and each one covers 8 of the 24 instances, then $Pr(X=\text{value}) = \frac{8}{24} = \frac{1}{3}$.
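These two definitions can be sketched in Python. This is a minimal illustrative helper; `entropy` and `weighted_entropy` are names introduced here, not library functions:

```python
import numpy as np

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]  # treat 0 * log2(0) as 0
    return float(-(p * np.log2(p)).sum())

def weighted_entropy(splits):
    """I(attribute): sum over values of Pr(value) times that subset's entropy."""
    total = sum(sum(counts) for counts in splits)
    return sum(sum(counts) / total * entropy(counts) for counts in splits)

print(entropy([4, 2, 2]))  # → 1.5
```

For example, `weighted_entropy([[12, 0, 0], [3, 5, 4]])` reproduces the tearprodrate value (≈ 0.78) computed below.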
age
 young: none, none, none, none, soft, soft, hard, hard
 prepresbyopic: none, none, none, none, none, soft, soft, hard
 presbyopic: none, none, none, none, none, none, soft, hard
$\Delta_{\text{info}} (\text{age=young[4, 2, 2]}) = -\frac{4}{8} \log_2 \frac{4}{8} - \frac{2}{8} \log_2 \frac{2}{8} - \frac{2}{8} \log_2 \frac{2}{8} = 1.5$
$\Delta_{\text{info}} (\text{age=prepresbyopic[5, 2, 1]}) = -\frac{5}{8} \log_2 \frac{5}{8} - \frac{2}{8} \log_2 \frac{2}{8} - \frac{1}{8} \log_2 \frac{1}{8} = 1.3$
$\Delta_{\text{info}} (\text{age=presbyopic[6, 1, 1]}) = -\frac{6}{8} \log_2 \frac{6}{8} - \frac{1}{8} \log_2 \frac{1}{8} - \frac{1}{8} \log_2 \frac{1}{8} = 1.06$
Weighted Entropy
$I(\text{age}) = 1.5 \times \frac{8}{24} + 1.3 \times \frac{8}{24} + 1.06 \times \frac{8}{24} = 1.29$
spectacleprescrip
 myope: none, none, none, none, none, none, none, soft, soft, hard, hard, hard
 hypermetrope: none, none, none, none, none, none, none, none, soft, soft, soft, hard
$\Delta_{\text{info}} (\text{spectacleprescrip=myope[7, 2, 3]}) = -\frac{7}{12} \log_2 \frac{7}{12} - \frac{2}{12} \log_2 \frac{2}{12} - \frac{3}{12} \log_2 \frac{3}{12} = 1.38$
$\Delta_{\text{info}} (\text{spectacleprescrip=hypermetrope[8, 3, 1]}) = -\frac{8}{12} \log_2 \frac{8}{12} - \frac{3}{12} \log_2 \frac{3}{12} - \frac{1}{12} \log_2 \frac{1}{12} = 1.19$
Weighted Entropy
$I(\text{spectacleprescrip}) = 1.38 \times \frac{12}{24} + 1.19 \times \frac{12}{24} = 1.29$
astigmatism
 yes: none, none, none, none, none, none, none, none, hard, hard, hard, hard
 no: none, none, none, none, none, none, none, soft, soft, soft, soft, soft
$\Delta_{\text{info}} (\text{astigmatism=yes[8, 4]}) = -\frac{8}{12} \log_2 \frac{8}{12} - \frac{4}{12} \log_2 \frac{4}{12} = 0.92$
$\Delta_{\text{info}} (\text{astigmatism=no[7, 5]}) = -\frac{7}{12} \log_2 \frac{7}{12} - \frac{5}{12} \log_2 \frac{5}{12} = 0.98$
Weighted Entropy
$I(\text{astigmatism}) = 0.92 \times \frac{12}{24} + 0.98 \times \frac{12}{24} = 0.95$
tearprodrate
 reduced: none, none, none, none, none, none, none, none, none, none, none, none
 normal: none, none, none, soft, soft, soft, soft, soft, hard, hard, hard, hard
$\Delta_{\text{info}} (\text{tearprodrate=reduced[12]}) = -\frac{12}{12} \log_2 \frac{12}{12} = 0$
$\Delta_{\text{info}} (\text{tearprodrate=normal[3, 5, 4]}) = -\frac{3}{12} \log_2 \frac{3}{12} - \frac{5}{12} \log_2 \frac{5}{12} - \frac{4}{12} \log_2 \frac{4}{12} = 1.55$
Weighted Entropy
$I(\text{tearprodrate}) = 0 \times \frac{12}{24} + 1.55 \times \frac{12}{24} = 0.78$
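As a sanity check, all four weighted entropies can be recomputed from the (none, soft, hard) class counts read off the listings above. This is a self-contained sketch using exact (unrounded) intermediate entropies, so the last decimal may differ slightly from the hand-rounded values:

```python
import numpy as np

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    p = np.asarray([c for c in counts if c > 0], dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

# (none, soft, hard) counts per attribute value, read off the listings above.
splits = {
    'age':               [(4, 2, 2), (5, 2, 1), (6, 1, 1)],
    'spectacleprescrip': [(7, 2, 3), (8, 3, 1)],
    'astigmatism':       [(8, 0, 4), (7, 5, 0)],
    'tearprodrate':      [(12, 0, 0), (3, 5, 4)],
}
for attr, groups in splits.items():
    weighted = sum(sum(g) / 24 * entropy(g) for g in groups)
    print(f'I({attr}) = {weighted:.2f}')
```

tearprodrate comes out lowest (≈ 0.78), confirming the choice of root.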
The attribute with the lowest weighted entropy is tearprodrate, with a value of 0.78. This attribute will therefore be the ROOT node of our decision tree.
So the tree is the following:
tearprodrate = reduced: none
tearprodrate = normal
 ???
We continue to build the tree GIVEN that tearprodrate = normal has occurred.
tearprodrate = normal
 age
  young: soft, soft, hard, hard
  prepresbyopic: none, soft, soft, hard
  presbyopic: none, none, soft, hard
$\Delta_{\text{info}} (\text{age=young[2, 2]} \mid \text{tearprodrate=normal}) = -\frac{2}{4} \log_2 \frac{2}{4} - \frac{2}{4} \log_2 \frac{2}{4} = 1.00$
$\Delta_{\text{info}} (\text{age=prepresbyopic[1, 2, 1]} \mid \text{tearprodrate=normal}) = -\frac{1}{4} \log_2 \frac{1}{4} - \frac{2}{4} \log_2 \frac{2}{4} - \frac{1}{4} \log_2 \frac{1}{4} = 1.5$
$\Delta_{\text{info}} (\text{age=presbyopic[2, 1, 1]} \mid \text{tearprodrate=normal}) = -\frac{2}{4} \log_2 \frac{2}{4} - \frac{1}{4} \log_2 \frac{1}{4} - \frac{1}{4} \log_2 \frac{1}{4} = 1.5$
Weighted Entropy
$I(\text{age} \mid \text{tearprodrate=normal}) = 1.00 \times \frac{4}{12} + 1.5 \times \frac{4}{12} + 1.5 \times \frac{4}{12} = 1.33$
tearprodrate = normal
 spectacleprescrip
  myope: none, soft, soft, hard, hard, hard
  hypermetrope: none, none, soft, soft, soft, hard
$\Delta_{\text{info}} (\text{spectacleprescrip=myope[1, 2, 3]} \mid \text{tearprodrate=normal}) = -\frac{1}{6} \log_2 \frac{1}{6} - \frac{2}{6} \log_2 \frac{2}{6} - \frac{3}{6} \log_2 \frac{3}{6} = 1.46$
$\Delta_{\text{info}} (\text{spectacleprescrip=hypermetrope[2, 3, 1]} \mid \text{tearprodrate=normal}) = -\frac{2}{6} \log_2 \frac{2}{6} - \frac{3}{6} \log_2 \frac{3}{6} - \frac{1}{6} \log_2 \frac{1}{6} = 1.46$
Weighted Entropy
$I(\text{spectacleprescrip} \mid \text{tearprodrate=normal}) = 1.46 \times \frac{6}{12} + 1.46 \times \frac{6}{12} = 1.46$
tearprodrate = normal
 astigmatism
  yes: none, none, hard, hard, hard, hard
  no: none, soft, soft, soft, soft, soft
$\Delta_{\text{info}} (\text{astigmatism=yes[2, 4]} \mid \text{tearprodrate=normal}) = -\frac{2}{6} \log_2 \frac{2}{6} - \frac{4}{6} \log_2 \frac{4}{6} = 0.92$
$\Delta_{\text{info}} (\text{astigmatism=no[1, 5]} \mid \text{tearprodrate=normal}) = -\frac{1}{6} \log_2 \frac{1}{6} - \frac{5}{6} \log_2 \frac{5}{6} = 0.65$
Weighted Entropy
$I(\text{astigmatism} \mid \text{tearprodrate=normal}) = 0.92 \times \frac{6}{12} + 0.65 \times \frac{6}{12} = 0.78$
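The same kind of check works for the first level, now restricted to the 12 tearprodrate = normal instances (again a self-contained illustrative sketch, not Weka's own code):

```python
import numpy as np

def entropy(counts):
    """Entropy (in bits) of a class distribution given as raw counts."""
    p = np.asarray([c for c in counts if c > 0], dtype=float)
    p /= p.sum()
    return float(-(p * np.log2(p)).sum())

# (none, soft, hard) counts per value, among the 12 tearprodrate=normal rows.
conditional_splits = {
    'age':               [(0, 2, 2), (1, 2, 1), (2, 1, 1)],
    'spectacleprescrip': [(1, 2, 3), (2, 3, 1)],
    'astigmatism':       [(2, 0, 4), (1, 5, 0)],
}
best = min(conditional_splits,
           key=lambda a: sum(sum(g) / 12 * entropy(g)
                             for g in conditional_splits[a]))
print(best)  # → astigmatism
```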
Given tearprodrate = normal, astigmatism has the lowest weighted entropy of the three remaining attributes, so it becomes the first-level node. After computing the ROOT and the first level of the tree we have the following partial tree.

tearprodrate = reduced: none
tearprodrate = normal
 astigmatism = yes: ???
 astigmatism = no: ???
(tearprodrate is the ROOT node, astigmatism is the FIRST LEVEL node; the ??? branches remain to be expanded.)