Linear Regression
- Regression analysis is one of the most important fields in statistics and machine learning.
- Regression searches for relationships among variables.
- Many regression methods are available. Linear regression is probably one of the most important and widely used of them.
- It's among the simplest regression methods.
- One of its main advantages is the ease of interpreting results.
When Do You Need Regression?
- You need regression to answer whether and how one phenomenon influences another, or how several variables are related.
- For example, you can use it to determine if and to what extent experience or gender impacts salaries.
Formulation
When you implement linear regression of a dependent variable y on the set of independent variables x = (x₁, …, xᵣ), where r is the number of predictors, you assume a linear relationship between y and x:
y = β₀ + β₁x₁ + ⋯ + βᵣxᵣ + ε
This equation is the regression equation. β₀, β₁, …, βᵣ are the regression coefficients, and ε is the random error.
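To make the regression equation concrete, here is a minimal sketch that generates data from it and then recovers the coefficients. It assumes ordinary least squares as the estimation method, which these notes do not specify, and the coefficient values, noise level, sample size, and seed are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Hypothetical "true" coefficients for r = 2 predictors: β₀, β₁, β₂
beta_true = np.array([5.0, 2.0, -1.0])

n = 200
X = rng.normal(size=(n, 2))                  # columns are x₁ and x₂
eps = rng.normal(scale=0.1, size=n)          # random error ε
y = beta_true[0] + X @ beta_true[1:] + eps   # y = β₀ + β₁x₁ + β₂x₂ + ε

# Ordinary least squares (an assumption of this sketch):
# prepend a column of ones so that the intercept β₀ is estimated as well
X_design = np.column_stack([np.ones(n), X])
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print("estimated coefficients:", beta_hat)
```

Because the error term is small relative to the signal, the estimates should land close to the values used to generate the data.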
Linear Regression Types
- simple linear regression
- multiple linear regression
- polynomial regression
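The three types differ mainly in the shape of the input. The sketch below fits each one with scikit-learn's LinearRegression on noise-free toy data; the data and the generating coefficients are invented for illustration. Polynomial regression is shown as linear regression on transformed inputs (x and x²), built here with PolynomialFeatures.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x1 = np.arange(10, dtype=float)          # first predictor
x2 = np.tile([0.0, 1.0, 2.0], 4)[:10]    # second predictor, not collinear with x1

# Simple linear regression: one predictor, y = β₀ + β₁x₁
X_simple = x1.reshape(-1, 1)
simple = LinearRegression().fit(X_simple, 3.0 + 2.0 * x1)

# Multiple linear regression: two predictors, y = β₀ + β₁x₁ + β₂x₂
X_multi = np.column_stack([x1, x2])
multi = LinearRegression().fit(X_multi, 1.0 + 2.0 * x1 - 3.0 * x2)

# Polynomial regression: still linear in the coefficients,
# but the inputs are expanded to [x₁, x₁²] first
X_poly = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_simple)
poly = LinearRegression().fit(X_poly, 1.0 + 0.5 * x1 ** 2)

print("simple:    ", simple.intercept_, simple.coef_)
print("multiple:  ", multi.intercept_, multi.coef_)
print("polynomial:", poly.intercept_, poly.coef_)
```

Since the toy targets contain no noise, each fit should reproduce the coefficients used to build its target, up to floating-point error.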
Underfitting and Overfitting
- Underfitting occurs when a model can't accurately capture the dependencies among data, usually as a consequence of its own simplicity. It often yields a low R² with known data and generalizes poorly to new data.
- Overfitting happens when a model learns both data dependencies and random fluctuations. In other words, a model learns the existing data too well. Complex models, which have many features or terms, are often prone to overfitting. When applied to known data, such models usually yield a high R². However, they often don't generalize well and have a significantly lower R² when used with new data.
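One way to see both effects is to fit polynomials of increasing degree to the same noisy data and compare R² on known (training) data with R² on new (held-out) data. The sketch below is illustrative only: the quadratic generating function, noise level, sample size, split, and the degrees 1, 2, and 12 are all assumptions made for the example.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Synthetic data: a quadratic trend plus random noise
x = rng.uniform(-1.0, 1.0, size=40).reshape(-1, 1)
y = 1.0 + 2.0 * x.ravel() - 3.0 * x.ravel() ** 2 + rng.normal(scale=0.3, size=40)

# Hold out half of the points to play the role of "new data"
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.5, random_state=0
)

scores = {}
for degree in (1, 2, 12):  # too simple, about right, very flexible
    model = make_pipeline(
        PolynomialFeatures(degree=degree, include_bias=False),
        LinearRegression(),
    )
    model.fit(X_train, y_train)
    # score() returns R²: first on known data, then on new data
    scores[degree] = (model.score(X_train, y_train), model.score(X_test, y_test))

for degree, (r2_train, r2_test) in scores.items():
    print(f"degree {degree:2d}: train R² = {r2_train:.3f}, test R² = {r2_test:.3f}")
```

The degree-1 model is too simple for a quadratic trend, so its R² is low even on the training data. Training R² can only rise as the degree grows, so a large gap between training and test R² at degree 12 would be the signature of overfitting. The exact numbers depend on the random draw.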
Simple Linear Regression With scikit-learn
There are five basic steps when you're implementing linear regression:
- Import the packages and classes that you need.
- Provide data to work with, and apply appropriate transformations if needed.
- Create a regression model and fit it with existing data.
- Check the results of model fitting to know whether the model is satisfactory.
- Apply the model for predictions.
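The five steps above can be sketched with scikit-learn as follows. The six data points are small made-up values used only for illustration, and the variable names are arbitrary.

```python
# Step 1: import the packages and classes that you need
import numpy as np
from sklearn.linear_model import LinearRegression

# Step 2: provide data; scikit-learn expects a 2D input array,
# so the single predictor is reshaped into one column
x = np.array([5, 15, 25, 35, 45, 55]).reshape(-1, 1)
y = np.array([5, 20, 14, 32, 22, 38])

# Step 3: create a regression model and fit it with the existing data
model = LinearRegression()
model.fit(x, y)

# Step 4: check the results of model fitting
r_sq = model.score(x, y)  # coefficient of determination, R²
print("R²:", r_sq)
print("intercept (β₀):", model.intercept_)
print("slope (β₁):", model.coef_)

# Step 5: apply the model for predictions
y_pred = model.predict(x)
print("predicted response:", y_pred)
```

The same `predict` call works on new inputs as long as they have the same two-dimensional shape, for example `model.predict(np.array([[60], [70]]))`.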