Linear regression is a supervised learning algorithm commonly used for predictive analysis. It is simple yet powerful: it models the relationship between a dependent variable (the value you want to predict) and one or more independent variables (the inputs), then uses that relationship to predict the dependent variable from new values of the independent variables.
More formally, linear regression is a statistical approach that models the relationship between variables by fitting a linear equation to the observed data. With a single independent variable, the equation is:
y = mx + b
Here, y is the dependent variable, x is the independent variable, m is the slope of the line, and b is the intercept (the value of y when x is 0).
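The goal of linear regression is to find the values of m and b that make the predicted values as close as possible to the actual values. The difference between an actual value and its predicted value is called the error (or residual). Because positive and negative errors would cancel out if simply added together, the most common fitting method, ordinary least squares, chooses the line that minimizes the sum of the squared errors.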
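As a minimal sketch with made-up numbers, the residuals and the sum of squared errors that least squares minimizes can be computed with NumPy. The function name `sum_squared_errors` and the sample values are hypothetical, chosen only for illustration.

```python
import numpy as np

def sum_squared_errors(y_actual, y_predicted):
    """Sum of squared residuals: the quantity ordinary least squares minimizes."""
    residuals = np.asarray(y_actual, dtype=float) - np.asarray(y_predicted, dtype=float)
    return float(np.sum(residuals ** 2))

# Hypothetical observed values and the values predicted by some line y = m*x + b
y_actual = np.array([3.0, 5.0, 7.5])
y_predicted = np.array([3.0, 5.0, 7.0])

residuals = y_actual - y_predicted               # [0.0, 0.0, 0.5]
sse = sum_squared_errors(y_actual, y_predicted)  # 0.5 ** 2 = 0.25
print(residuals, sse)
```

Squaring also penalizes large errors more heavily than small ones, so a single badly missed point pulls the fitted line noticeably toward it.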
There are two basic types of linear regression: simple and multiple.
Simple linear regression predicts a single dependent variable from a single independent variable. It fits the straight line that best represents the relationship between the two variables. The equation for simple linear regression is:
y = mx + b
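For simple linear regression, the least-squares values of m and b have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept makes the line pass through the point of means. A minimal NumPy sketch follows; the function name `fit_simple_linear_regression` and the sample points are hypothetical.

```python
import numpy as np

def fit_simple_linear_regression(x, y):
    """Return (m, b) minimizing the sum of squared errors for y = m*x + b."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: sum of co-deviations of x and y divided by sum of squared deviations of x
    m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through (x_mean, y_mean)
    b = y_mean - m * x_mean
    return float(m), float(b)

# Hypothetical points lying exactly on y = 2x + 1
m, b = fit_simple_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
print(m, b)  # 2.0 1.0
```

Because these points lie exactly on a line, the fit recovers m = 2 and b = 1 with zero error; on real, noisy data the same formulas give the line with the smallest sum of squared errors.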
Multiple linear regression predicts a single dependent variable from two or more independent variables. Instead of a line, it fits a linear equation (a plane, or a hyperplane in higher dimensions) that best represents the relationship between the dependent variable and the independent variables. The equation for multiple linear regression is:
y = b0 + b1*x1 + b2*x2 + ... + bn*xn
Here, y is the dependent variable, b0 is the intercept (the constant term), b1 through bn are the coefficients, and x1 through xn are the independent variables.
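One way to estimate b0 through bn by least squares is to add a column of ones to the independent variables (so b0 is estimated like any other coefficient) and solve with `numpy.linalg.lstsq`. This is a minimal sketch, not how scikit-learn is necessarily implemented internally; the function name and the sample data are hypothetical.

```python
import numpy as np

def fit_multiple_linear_regression(X, y):
    """Return [b0, b1, ..., bn] minimizing squared error for y = b0 + b1*x1 + ... + bn*xn."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept b0 is estimated alongside b1..bn
    X_design = np.column_stack([np.ones(len(X)), X])
    # lstsq returns (solution, residuals, rank, singular values); keep the solution
    coefficients, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return coefficients

# Hypothetical data generated exactly from y = 1 + 2*x1 + 3*x2
X = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]])
y = 1 + 2 * X[:, 0] + 3 * X[:, 1]
print(fit_multiple_linear_regression(X, y))  # close to [1, 2, 3]
```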
In this section, we'll learn how to implement linear regression in Python using scikit-learn, a popular machine learning library.
Here's a step-by-step guide on how to implement simple linear regression in Python:
Import the necessary libraries:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
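Load the dataset. The code below assumes data.csv holds the independent variable (years of experience) in its first column and the dependent variable (salary) in its last column: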
data = pd.read_csv('data.csv')
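The tutorial does not include data.csv. If you want to run the remaining steps without it, here is a minimal sketch of a synthetic stand-in, assuming (as the plot labels later suggest) one feature column for years of experience followed by a target column for salary. The column names, the salary formula, and the noise level are invented for illustration and are not real data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: one feature column followed by the target column.
# Column names and the salary formula below are made up for illustration only.
rng = np.random.default_rng(seed=0)
years = np.round(rng.uniform(1, 10, size=30), 1)
salary = 30000 + 9000 * years + rng.normal(0, 5000, size=30)

data = pd.DataFrame({"YearsExperience": years, "Salary": salary})
print(data.head())
```

Using this `data` in place of the `pd.read_csv` call lets the following steps run unchanged, since they only rely on the target being the last column.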
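Separate the features from the target, then split the dataset into training and testing sets:
# All columns except the last are features; X stays 2-D, as scikit-learn expects
X = data.iloc[:, :-1].values
# The last column is the target (salary)
y = data.iloc[:, -1].values
from sklearn.model_selection import train_test_split
# Hold out 20% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=55)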
Train the model:
regressor = LinearRegression()
# Fit the slope and intercept that minimize the sum of squared errors on the training data
regressor.fit(X_train, y_train)
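After fitting, scikit-learn stores the learned slope and intercept in the estimator's `coef_` and `intercept_` attributes; these correspond to m and b in y = mx + b. A minimal, self-contained sketch on hypothetical data lying exactly on y = 2x + 1:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical training data lying exactly on y = 2x + 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])  # shape (n_samples, n_features)
y_train = np.array([1.0, 3.0, 5.0, 7.0])

regressor = LinearRegression()
regressor.fit(X_train, y_train)

m = regressor.coef_[0]    # slope: coef_ has one entry per feature
b = regressor.intercept_  # intercept
print(m, b)  # close to 2.0 and 1.0
```

With multiple independent variables, `coef_` holds b1 through bn in column order, and `intercept_` is b0.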
Make predictions on the test set:
y_pred = regressor.predict(X_test)
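To check how well the model generalizes, the test-set predictions can be compared with the actual values using `mean_squared_error` and `r2_score` from `sklearn.metrics`. This step is not part of the original walkthrough; the sketch below uses hypothetical noisy data (a line with slope 3 and intercept 2) so it runs on its own.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Hypothetical data: a straight line plus random noise
rng = np.random.default_rng(seed=0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] + 2.0 + rng.normal(0, 1.0, size=50)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=55)
regressor = LinearRegression().fit(X_train, y_train)
y_pred = regressor.predict(X_test)

# Mean squared error: average squared residual on unseen data (lower is better)
mse = mean_squared_error(y_test, y_pred)
# R^2: share of the target's variance explained by the model (1.0 is a perfect fit)
r2 = r2_score(y_test, y_pred)
print(f"MSE: {mse:.3f}, R^2: {r2:.3f}")
```

MSE is in squared units of the target (for example, squared salary), so R^2 is often easier to compare across datasets.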
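Visualize the results:
# Training points in red
plt.scatter(X_train, y_train, color='red')
# Fitted regression line in blue (all predictions lie on one line, so unsorted X still draws a straight line)
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.title('Salary vs Experience (Training set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()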
This produces a scatter plot of the training data (red) with the fitted regression line (blue), showing the relationship between employees' years of experience and their salaries.