CSC 578D / Data Mining / Fall 2018 / University of Victoria

Python Notebook explaining Assignment 01 / Problem 03

The dataset for Assignment #1 is the following:

The Weka datasets can be found on my personal website at

Author: Andreas P. Koenzen

Version: 0.1

In [1]:
import pandas as pd
import numpy as np
import requests as rq

from scipy.io import arff
from io import StringIO
In [2]:
url_data = rq.get('').text
data = arff.loadarff(StringIO(url_data))
df = pd.DataFrame(data[0], index=pd.Index(np.arange(24) + 1), dtype='object')

# Convert all values in the columns from byte strings to regular strings.
# (np.object was removed in NumPy 1.24; the built-in object works.)
string_df = df.select_dtypes([object]).stack().str.decode('UTF-8').unstack()
for col in string_df:
    df[col] = string_df[col]
    age             spectacle-prescrip  astigmatism  tear-prod-rate  contact-lenses
1   young           myope               no           reduced         none
2   young           myope               no           normal          soft
3   young           myope               yes          reduced         none
4   young           myope               yes          normal          hard
5   young           hypermetrope        no           reduced         none
6   young           hypermetrope        no           normal          soft
7   young           hypermetrope        yes          reduced         none
8   young           hypermetrope        yes          normal          hard
9   pre-presbyopic  myope               no           reduced         none
10  pre-presbyopic  myope               no           normal          soft
11  pre-presbyopic  myope               yes          reduced         none
12  pre-presbyopic  myope               yes          normal          hard
13  pre-presbyopic  hypermetrope        no           reduced         none
14  pre-presbyopic  hypermetrope        no           normal          soft
15  pre-presbyopic  hypermetrope        yes          reduced         none
16  pre-presbyopic  hypermetrope        yes          normal          none
17  presbyopic      myope               no           reduced         none
18  presbyopic      myope               no           normal          none
19  presbyopic      myope               yes          reduced         none
20  presbyopic      myope               yes          normal          hard
21  presbyopic      hypermetrope        no           reduced         none
22  presbyopic      hypermetrope        no           normal          soft
23  presbyopic      hypermetrope        yes          reduced         none
24  presbyopic      hypermetrope        yes          normal          none
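Since the download URL above is elided, the 24 instances can also be rebuilt directly from the table; a minimal sketch (the attribute orderings are read off the rows above):

```python
import pandas as pd
from itertools import product

# Attribute values in the order they appear in the table above.
ages = ["young", "pre-presbyopic", "presbyopic"]
prescriptions = ["myope", "hypermetrope"]
astigmatism = ["no", "yes"]
tear_rates = ["reduced", "normal"]

# Class labels, read row by row from the table above.
classes = ["none", "soft", "none", "hard",   # young: myope, then hypermetrope
           "none", "soft", "none", "hard",
           "none", "soft", "none", "hard",   # pre-presbyopic
           "none", "soft", "none", "none",
           "none", "none", "none", "hard",   # presbyopic
           "none", "soft", "none", "none"]

# product() varies the last attribute fastest, matching the table's row order.
rows = list(product(ages, prescriptions, astigmatism, tear_rates))
df = pd.DataFrame(rows, columns=["age", "spectacle-prescrip",
                                 "astigmatism", "tear-prod-rate"])
df["contact-lenses"] = classes
```

This yields the same class distribution as the ARFF file: 15 none, 5 soft, 4 hard.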

Solution to Problem #3 of Assignment #1:

Problem #3 states the following:

(4 points) Classify using Naïve Bayes method (on contact lenses data) the data item: pre-presbyopic, hypermetrope, yes, reduced, ? Then, check your solution with Weka (the data file is included with Weka).

The model computed by Weka for this problem is the following:

Attribute              soft   hard   none
                     (0.22) (0.19) (0.59)
  young                 3.0    3.0    5.0
  pre-presbyopic        3.0    2.0    6.0
  presbyopic            2.0    2.0    7.0
  [total]               8.0    7.0   18.0

  myope                 3.0    4.0    8.0
  hypermetrope          4.0    2.0    9.0
  [total]               7.0    6.0   17.0

  no                    6.0    1.0    8.0
  yes                   1.0    5.0    9.0
  [total]               7.0    6.0   17.0

  reduced               1.0    1.0   13.0
  normal                6.0    5.0    4.0
  [total]               7.0    6.0   17.0


  • Results are reported to 3 significant digits.
  • A result is rounded up when the 4th significant digit is >= 5 (round half up).
  • Laplace (add-one) smoothing is applied to avoid the zero-frequency problem.
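To see why the smoothing matters here: astigmatism=yes never occurs together with class soft (0 of 5 soft instances), so the unsmoothed estimate would zero out the whole product for that class. A small sketch (the function name is mine):

```python
def laplace_estimate(count, class_total, n_values):
    """P(value | class) with add-one (Laplace) smoothing:
    (count + 1) / (class_total + number of possible values)."""
    return (count + 1) / (class_total + n_values)

# astigmatism=yes with class soft: raw count 0 out of 5, 2 possible values.
p_yes_given_soft = laplace_estimate(0, 5, 2)    # (0 + 1) / (5 + 2) = 1/7
```

These smoothed fractions are exactly the entries in the Weka table above, e.g. 1.0 / 7.0 for yes given soft.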

Bayes' Rule:

$$P(C \mid f) = \frac{P(f \mid C)P(C)}{P(f)}$$

C = Class to predict; f = Feature to use

The formula above works for only one feature/attribute; we need a formula that allows multiple attributes. In Naïve Bayes we take the joint probability of all features together with a given class, divided by the evidence (the normalization factor). Since the evidence is the same for every class, it cannot change which class scores highest, so we replace it with $\alpha = \frac{1}{E}$ and defer computing it until we need the actual probability of each class given the features.

Naïve Bayes:

Notation: the comma (,) in Bayes' rule acts as the AND operator, i.e. the intersection of two or more events: $P(A,B) = P(B,A) = P(A \mid B)P(B) = P(B \mid A)P(A)$

$$P(C_{k} \mid f_{1},...,f_{n}) \propto P(f_{1},...,f_{n},C_{k})$$

Now the numerator $P(f_{1},...,f_{n},C_{k})$ (Likelihood) can be expanded using the chain rule into:

$$P(f_{1},...,f_{n},C_{k}) = P(f_{1} \mid f_{2},...,f_{n},C_{k}) \, P(f_{2},...,f_{n},C_{k})$$

$$P(f_{1},...,f_{n},C_{k}) = P(f_{1} \mid f_{2},...,f_{n},C_{k}) \, P(f_{2} \mid f_{3},...,f_{n},C_{k}) \cdots P(f_{n-1} \mid f_{n},C_{k}) \, P(f_{n} \mid C_{k}) \, P(C_{k})$$

Now, under the naive assumption that the features are conditionally independent given the class, i.e. $P(A,B \mid C) = P(A \mid C)P(B \mid C)$, each conditional factor simplifies:

$$P(f_{i} \mid f_{i+1},...,f_{n},C_{k}) = P(f_{i} \mid C_{k})$$


$$P(C_{k} \mid f_{1},...,f_{n}) = \frac{1}{E} \times P(C_{k}) \prod_{i=1}^{n} P(f_{i} \mid C_{k}) = P(C_{k}) \prod_{i=1}^{n} P(f_{i} \mid C_{k}) \times \alpha$$

Where $E$ is the normalising factor computed using the Law of Total Probability:

$$E = \sum_{k} P(\textbf{f} \mid C_{k}) \, P(C_{k})$$

Naïve Bayes Classifier:

The discussion so far has derived the independent feature model, that is, the naive Bayes probability model. The naive Bayes classifier combines this model with a decision rule. One common rule is to pick the hypothesis that is most probable; this is known as the maximum a posteriori or MAP decision rule. The corresponding classifier, a Bayes classifier, is the function that assigns a class label $\hat{y} = C_k$ for some k as follows:

$$\hat{y} = {\underset{k \in \{1, \dots ,K\}}{\operatorname{argmax}} P(C_k) \prod_{i=1}^{n} P(x_i \mid C_k). \quad\quad (1)}$$
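Equation (1), with Laplace smoothing on both the prior and the conditional counts (as in the Weka table above), is short to implement for categorical data. A minimal sketch in plain Python (my own illustration, not Weka's implementation; `train_nb` and `predict_nb` are hypothetical names):

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    """Collect the raw counts a categorical naive Bayes model needs."""
    class_counts = Counter(labels)
    feature_counts = Counter()        # (feature index, value, class) -> count
    values_seen = defaultdict(set)    # feature index -> set of observed values
    for row, c in zip(rows, labels):
        for i, v in enumerate(row):
            feature_counts[(i, v, c)] += 1
            values_seen[i].add(v)
    n_values = {i: len(s) for i, s in values_seen.items()}
    return class_counts, feature_counts, n_values

def predict_nb(model, row):
    """MAP prediction per equation (1), with add-one (Laplace) smoothing."""
    class_counts, feature_counts, n_values = model
    n, k = sum(class_counts.values()), len(class_counts)
    scores = {}
    for c, nc in class_counts.items():
        score = (nc + 1) / (n + k)    # smoothed prior P(C_k)
        for i, v in enumerate(row):
            score *= (feature_counts[(i, v, c)] + 1) / (nc + n_values[i])
        scores[c] = score
    evidence = sum(scores.values())   # the normalizing factor E
    posteriors = {c: s / evidence for c, s in scores.items()}
    return max(posteriors, key=posteriors.get), posteriors
```

Trained on the 24 contact-lenses rows, `predict_nb` returns none for the query instance pre-presbyopic, hypermetrope, yes, reduced, with a posterior of about 0.925, matching Weka.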


Here $\textbf{f}$ denotes the instance pre-presbyopic, hypermetrope, yes, reduced. Each factor is Laplace-smoothed, i.e. (raw count + 1) over (class count + number of attribute values); zero raw counts are written as $0+1$. Extra digits are kept until the final division so that rounding does not distort the posteriors.

$P(\text{contact-lenses=none} \mid \textbf{f}) = P(\text{age=pre-presbyopic} \mid \text{contact-lenses=none}) \times P(\text{spectacle-prescrip=hypermetrope} \mid \text{contact-lenses=none}) \times P(\text{astigmatism=yes} \mid \text{contact-lenses=none}) \times P(\text{tear-prod-rate=reduced} \mid \text{contact-lenses=none}) \times P(\text{contact-lenses=none}) \times \alpha$

$P(\text{contact-lenses=none} \mid \textbf{f}) = \frac{5+1}{15+3} \times \frac{8+1}{15+2} \times \frac{8+1}{15+2} \times \frac{12+1}{15+2} \times \frac{15+1}{24+3} \times \alpha = 0.04234\alpha$

$P(\text{contact-lenses=soft} \mid \textbf{f}) = P(\text{age=pre-presbyopic} \mid \text{contact-lenses=soft}) \times P(\text{spectacle-prescrip=hypermetrope} \mid \text{contact-lenses=soft}) \times P(\text{astigmatism=yes} \mid \text{contact-lenses=soft}) \times P(\text{tear-prod-rate=reduced} \mid \text{contact-lenses=soft}) \times P(\text{contact-lenses=soft}) \times \alpha$

$P(\text{contact-lenses=soft} \mid \textbf{f}) = \frac{2+1}{5+3} \times \frac{3+1}{5+2} \times \frac{0+1}{5+2} \times \frac{0+1}{5+2} \times \frac{5+1}{24+3} \times \alpha = 0.000972\alpha$

$P(\text{contact-lenses=hard} \mid \textbf{f}) = P(\text{age=pre-presbyopic} \mid \text{contact-lenses=hard}) \times P(\text{spectacle-prescrip=hypermetrope} \mid \text{contact-lenses=hard}) \times P(\text{astigmatism=yes} \mid \text{contact-lenses=hard}) \times P(\text{tear-prod-rate=reduced} \mid \text{contact-lenses=hard}) \times P(\text{contact-lenses=hard}) \times \alpha$

$P(\text{contact-lenses=hard} \mid \textbf{f}) = \frac{1+1}{4+3} \times \frac{1+1}{4+2} \times \frac{4+1}{4+2} \times \frac{0+1}{4+2} \times \frac{4+1}{24+3} \times \alpha = 0.002450\alpha$

Now if $\alpha = \frac{1}{E}$, the three posteriors must sum to 1:

$\frac{(0.04234 + 0.000972 + 0.002450)}{E} = 1.0 \implies E = 0.04576$

Now we calculate each individual posterior and pick the greatest according to (1):

$P(\text{contact-lenses=none} \mid \textbf{f}) = \frac{0.04234}{0.04576} = 92.5\%$

$P(\text{contact-lenses=soft} \mid \textbf{f}) = \frac{0.000972}{0.04576} = 2.1\%$

$P(\text{contact-lenses=hard} \mid \textbf{f}) = \frac{0.002450}{0.04576} = 5.4\%$

Final solution:

Weka classifies the entry pre-presbyopic,hypermetrope,yes,reduced as being of class none, with a probability of 92.5% (0.925).

According to the computation above, the instance pre-presbyopic,hypermetrope,yes,reduced is classified as belonging to class none with a probability of 92.5%, in agreement with Weka.
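As a sanity check (my own addition, not part of the assignment), the Laplace-smoothed fractions from the Weka table can be multiplied exactly with Python's `fractions` module:

```python
from fractions import Fraction as F

# Unnormalized smoothed scores for: pre-presbyopic, hypermetrope, yes, reduced.
# Each fraction is (count + 1) / (class total + number of attribute values).
none = F(6, 18) * F(9, 17) * F(9, 17) * F(13, 17) * F(16, 27)
soft = F(3, 8)  * F(4, 7)  * F(1, 7)  * F(1, 7)  * F(6, 27)
hard = F(2, 7)  * F(2, 6)  * F(5, 6)  * F(1, 6)  * F(5, 27)

evidence = none + soft + hard        # the normalizing factor E
posterior_none = none / evidence     # ~0.925, matching Weka's output
```

Working with exact fractions avoids the intermediate-rounding issue entirely; the floats only appear at the very end.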