How to Encode Categorical Variables with Target Encoding in Python for Machine Learning

Dr. Soumen Atta, Ph.D.
3 min readMay 2, 2023

Categorical variables are commonly found in datasets and are used to represent data in categories. These variables need to be encoded into numerical values for machine learning algorithms to process them. There are various encoding techniques that can be used to encode categorical variables, and one of the popular techniques is Target Encoding. Target Encoding involves encoding categorical variables based on the mean value of the target variable for each category. In this tutorial, we will learn how to encode categorical variables with Target Encoding in Python for machine learning.

Prerequisites

Before we start, make sure you have the following Python packages installed:

  • pandas
  • scikit-learn
  • category_encoders

You can install these packages using pip install command.

Dataset

For this tutorial, we will use the Titanic dataset which is available on Kaggle. This dataset contains information about passengers on the Titanic and whether they survived or not. We will use the “Survived” column as the target variable, and the “Pclass” and “Sex” columns as the categorical variables to encode. Here’s the link to the…

--

--

Dr. Soumen Atta, Ph.D.
Dr. Soumen Atta, Ph.D.

Written by Dr. Soumen Atta, Ph.D.

I am a Postdoctoral Researcher at the Faculty of IT, University of Jyväskylä, Finland. You can find more about me on my homepage: https://www.soumenatta.com/

No responses yet