A Comprehensive Guide to Categorical to Numerical Encoding Techniques

Dr. Soumen Atta, Ph.D.
3 min read · Mar 18, 2024

Categorical variables appear in many datasets and represent characteristics that have no inherent numerical value. Most machine learning algorithms, however, require numerical input, so categorical variables must be converted into numerical representations.

In this tutorial, we’ll explore various techniques for performing this conversion, along with their advantages, disadvantages, and use cases.

Label Encoding

  • Description: Assigns a unique integer to each category.
  • Use Case: Suitable for categorical variables with an inherent ordinal relationship, provided the assigned integers follow that order.
  • Example: Converting categorical labels like “Low,” “Medium,” and “High” to 1, 2, and 3, respectively.
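The "Low," "Medium," "High" example above can be sketched in plain Python. This is a minimal illustration, not a library implementation: the names `ORDER`, `LABEL_TO_CODE`, and `label_encode` are hypothetical, and the category order is assumed to be supplied by hand so that the integers preserve the ordinal relationship.

```python
# Hypothetical, hand-written category order (an assumption for this sketch).
# Listing the labels explicitly keeps Low < Medium < High in the encoded values.
ORDER = ["Low", "Medium", "High"]

# Build the label -> integer mapping, starting at 1 to match the example.
LABEL_TO_CODE = {label: code for code, label in enumerate(ORDER, start=1)}


def label_encode(values):
    """Return the integer code for each label; raises KeyError on an unseen label."""
    return [LABEL_TO_CODE[value] for value in values]


encoded = label_encode(["Medium", "Low", "High", "Low"])
print(encoded)  # [2, 1, 3, 1]
```

Writing the order out explicitly matters here: a mapping built from sorted or first-seen labels would still give unique integers, but it would not necessarily reflect the intended ranking.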

One-Hot Encoding

  • Description: Creates one binary column per category, with exactly one column set to 1 per observation.
  • Use Case: Suitable for nominal data where categories have no inherent order.
  • Example: Converting categories like “Red,” “Green,” and “Blue” into binary columns (e.g., [1, 0, 0], [0, 1, 0], [0, 0, 1]).
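The "Red," "Green," "Blue" example above might look like the following in plain Python. It is a sketch under stated assumptions: `CATEGORIES` and `one_hot_encode` are hypothetical names, and the column order is assumed to be fixed in advance rather than inferred from the data.

```python
# Hypothetical fixed column order (an assumption for this sketch).
CATEGORIES = ("Red", "Green", "Blue")


def one_hot_encode(values, categories=CATEGORIES):
    """Return one binary row per value, with a single 1 in that value's column."""
    column_of = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column_of[value]] = 1  # raises KeyError on an unseen category
        rows.append(row)
    return rows


one_hot = one_hot_encode(["Red", "Green", "Blue", "Green"])
print(one_hot)  # [[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]]
```

Each row sums to 1, and no column implies a ranking over the others, which is why this representation fits nominal data.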

Dummy Coding

  • Description: Similar to one-hot…

Dr. Soumen Atta, Ph.D.

Assistant Professor, Center for Information Technologies and Applied Mathematics, School of Engineering and Management, University of Nova Gorica, Slovenia