Simple and multiple linear regression analysis for rainwater quality checking

Dr. Soumen Atta, Ph.D.
8 min read · Mar 7, 2023

In this tutorial, we will provide a step-by-step guide on how to perform Simple Linear Regression (SLR) and Multiple Linear Regression (MLR) for rainwater quality analysis using Python.

Introduction

Rainwater is an important natural resource, and its quality can significantly affect human health and the environment. To analyze rainwater quality, it is often useful to fit statistical models that describe how the measured variables relate to one another. Simple linear regression (SLR) and multiple linear regression (MLR) are two commonly used techniques for this purpose: SLR relates a response to a single predictor, while MLR uses two or more predictors.
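For reference, the two models are usually written as shown below. The notation is generic textbook notation and is not taken from this tutorial: y is the response (for example, one water-quality measurement), the x terms are predictors, the β terms are coefficients to be estimated, and ε is a random error term.

```latex
% Simple linear regression (SLR): one predictor x
y = \beta_0 + \beta_1 x + \varepsilon

% Multiple linear regression (MLR): p predictors x_1, ..., x_p
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p + \varepsilon
```

Setting p = 1 in the MLR form recovers the SLR form, so SLR can be read as a special case of MLR.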

Dataset

We will use an artificial dataset created specifically for this tutorial. Its values are randomly generated, not measured. The Python code that generates it is given below:

import pandas as pd
import random

# create an example dataset with 250 entries
data = {
    'pH': [random.uniform(6, 8) for _ in range(250)],
    'Conductivity': [random.randint(100, 1000) for _ in range(250)],
    'Temperature': [random.randint(20, 30) for _ in range(250)],
    'TDS': [random.randint(100, 200) for _ in range(250)],
}

# create a pandas DataFrame from the dictionary
df = pd.DataFrame(data)

This code uses Python's random module and the pandas library to build an example dataset with 250 rows and four columns: pH, Conductivity, Temperature, and TDS (total dissolved solids). Each column holds 250 random values, generated as follows:

  • The pH column uses random.uniform, which returns floating-point numbers between 6 and 8 (the upper bound may or may not be included, depending on floating-point rounding).
  • The Conductivity column uses random.randint, which returns integers between 100 and 1000 (inclusive).
  • The Temperature column uses random.randint, which returns integers between 20 and 30 (inclusive).
  • The TDS column uses random.randint, which returns integers between 100 and 200 (inclusive).
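The ranges described above can be checked directly. The following is a minimal sketch and not part of the original program. It assumes a fixed seed (42, chosen arbitrarily) so that repeated runs produce the same values. The names N_ROWS, bounds, and check_bounds are hypothetical helpers introduced only for this illustration. It uses only the standard library, and the resulting dictionary can be passed to pd.DataFrame in the same way as in the code above.

```python
import random

# Assumption: a fixed seed (42 is arbitrary) so every run yields the same values.
random.seed(42)

N_ROWS = 250  # same number of entries as the dataset above

# Same generator calls as in the dataset code above.
data = {
    'pH': [random.uniform(6, 8) for _ in range(N_ROWS)],
    'Conductivity': [random.randint(100, 1000) for _ in range(N_ROWS)],
    'Temperature': [random.randint(20, 30) for _ in range(N_ROWS)],
    'TDS': [random.randint(100, 200) for _ in range(N_ROWS)],
}

# Lower and upper limits for each column, read off the generator calls.
bounds = {
    'pH': (6, 8),
    'Conductivity': (100, 1000),
    'Temperature': (20, 30),
    'TDS': (100, 200),
}


def check_bounds(columns, limits):
    """Return True if every value in every column lies within its limits."""
    return all(
        low <= value <= high
        for name, (low, high) in limits.items()
        for value in columns[name]
    )


in_range = check_bounds(data, bounds)
print("All values within bounds:", in_range)

# Print the size and the observed minimum and maximum of each column.
for name, values in data.items():
    print(name, len(values), min(values), max(values))
```

Because the seed is fixed, rerunning the script reproduces the same dataset, which makes any later regression results easier to compare between runs.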

Written by Dr. Soumen Atta, Ph.D.

I am a Postdoctoral Researcher at the Faculty of IT, University of Jyväskylä, Finland. You can find more about me on my homepage: https://www.soumenatta.com/
