Exploring the Naive Bayes Classifier Algorithm with Iris Dataset in Python
In the field of machine learning, the Naive Bayes classifier is a popular algorithm for classification tasks such as text classification, spam filtering, and sentiment analysis. It is a probabilistic algorithm that uses Bayes’ theorem to estimate the probability that a sample belongs to each class.
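Concretely, the “naive” part of the name refers to the simplifying assumption that the features are conditionally independent given the class. Under that assumption, Bayes’ theorem reduces the score for a class y, given features x1, …, xn, to a simple product:

P(y | x1, …, xn) ∝ P(y) · P(x1 | y) · … · P(xn | y)

and the predicted class is simply the one with the highest score.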
In this article, we will explore the Naive Bayes classifier algorithm and its implementation using Python’s scikit-learn library. Specifically, we will use the famous Iris dataset to train our model and make predictions. By the end of this tutorial, readers will have a better understanding of how the Naive Bayes classifier works and how to apply it to real-world problems.
Step 1: Load the Data
First, we need to load the Iris dataset. The Iris dataset contains 150 samples of Iris flowers, each with 4 features: sepal length, sepal width, petal length, and petal width. There are 3 classes: Iris Setosa, Iris Versicolor, and Iris Virginica. We’ll use the load_iris() function from scikit-learn to load the dataset.
from sklearn.datasets import load_iris

# Load the Iris dataset bundled with scikit-learn
iris = load_iris()
X = iris.data    # feature matrix, shape (150, 4)
y = iris.target  # class labels, shape (150,)
In the above code snippet, the load_iris() function from scikit-learn's datasets module is used to load the Iris dataset, a commonly used dataset for classification tasks in machine learning.
iris.data contains the features, or independent variables, of the dataset. The dataset has 4 features: sepal length, sepal width, petal length, and petal width. These features are represented as a NumPy array with shape (150, 4).
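If you want to verify this yourself, a couple of quick print statements on the X and iris objects loaded above will show the array shape and the feature names:

print(X.shape)            # (150, 4)
print(iris.feature_names) # ['sepal length (cm)', 'sepal width (cm)', ...]
print(X[:3])              # first three samples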
iris.target contains the target, or dependent variable, of the dataset. In this case, it represents the class label for each sample. There are 3 classes in the Iris dataset: Iris Setosa, Iris Versicolor, and Iris Virginica. The target variable is represented as a NumPy array with shape (150,).
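Similarly, you can check how the class labels are encoded and confirm that the 150 samples are split evenly across the three classes:

import numpy as np

print(y.shape)           # (150,)
print(iris.target_names) # ['setosa' 'versicolor' 'virginica']
print(np.bincount(y))    # [50 50 50], i.e. 50 samples per class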