Table of Contents

- History of Regression
- Introduction to Linear Regression
- Multiple Linear Regression
- Hands-on: python code in Jupyter Notebook
- Advantage, Disadvantage, And Conclusion

Brief history of regression: Sir Francis Galton published a paper titled "Regression towards mediocrity in hereditary stature" in 1886 (source: https://simple.wikipedia.org/wiki/Regression_toward_the_mean). Although he is best known for his development of correlation, much of his work on inheritance led to the development of regression, from which correlation was a somewhat ingenious deduction. Meanwhile, Arthur Lee Samuel, a computer scientist, is credited with coining the term "Machine Learning" in 1959 (source: https://en.wikipedia.org/wiki/Arthur_Samuel_(computer_scientist)).

Introduction to Linear Regression: it is the simplest and most fundamental technique to understand when learning about Machine Learning. As one of the foundational techniques in statistical modeling, linear regression serves as a cornerstone for grasping more complex algorithms in the field. It offers a straightforward approach to modeling the relationship between one or more independent variables (X) and a single dependent variable (y).

The fundamental assumption of linear regression is that this relationship is linear, implying that changes in the independent variable(s) result in proportional changes in the dependent variable. Leveraging this assumption, linear regression constructs a linear equation that best fits the data, enabling predictive analysis and inference within a linear framework. Its applicability spans various industries, making it a vital tool for predictive analytics and data-driven decision-making.

A simple application could be predicting with a single independent variable, such as predicting house prices based on the size of the house: the house size (X) is the independent (input) variable and the house price (y) is the dependent (target output) variable.

The Linear Regression equation with a single independent variable is:

y = (X * coefficient) + intercept

- Intercept represents the predicted value of the dependent variable when the independent variable is zero.
- Coefficient represents the change in the dependent variable for a one-unit change in the independent variable, holding all other variables constant.

*Knowing the coefficients and intercept values will enable us to consistently predict new values, ensuring a continuous process in linear regression analysis.*
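As a minimal sketch of this idea, the equation above can be applied directly once the coefficient and intercept are known. The values below are assumed purely for illustration, not fitted from real data:

```python
# Illustrative only: the coefficient and intercept values are assumed, not fitted.
# Suppose a fitted house-price model (price in $, size in sq ft) gave us:
coefficient = 150.0   # assumed: price rises $150 per additional sq ft
intercept = 50_000.0  # assumed: predicted price when size is zero

def predict_price(size_sqft: float) -> float:
    """Apply y = (X * coefficient) + intercept."""
    return size_sqft * coefficient + intercept

print(predict_price(1000))  # 1000 * 150 + 50000 -> 200000.0
```

With the two values fixed, every new input maps to a prediction through the same linear formula, which is the "continuous process" the note above refers to.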

*In regression tasks, our focus is on predicting continuous values rather than discrete classes. Therefore, the concept of a "threshold" isn't applicable in the same manner as it is in classification tasks. In classification scenarios, a threshold is frequently employed to assign class labels based on predicted probabilities or scores.*

*Visualization: the data points (blue dots) and the straight red line given by the linear regression formula.*
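Ahead of the hands-on section, here is a minimal sketch of fitting such a line with scikit-learn. The house sizes and prices below are a small synthetic dataset assumed for illustration (generated to follow price = 150 * size + 20000 exactly):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): house size in sq ft vs. price in $.
X = np.array([[800], [1000], [1200], [1500], [1800]])           # independent variable
y = np.array([140_000, 170_000, 200_000, 245_000, 290_000])     # dependent variable

model = LinearRegression()
model.fit(X, y)  # finds the best-fitting coefficient and intercept

print(model.coef_[0])             # learned coefficient (~150, price per sq ft)
print(model.intercept_)           # learned intercept (~20000)
print(model.predict([[1100]])[0]) # predicted price for a 1100 sq ft house
```

Because the synthetic data is perfectly linear, the model recovers the generating coefficient and intercept almost exactly; with real, noisy data the fitted line is the best linear approximation rather than an exact match.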

Multiple Linear Regression serves as an extension of simple Linear Regression, addressing the reality of real-world scenarios where multiple independent variables often influence the dependent variable. This makes multiple linear regression an essential tool for predictive analytics and data-driven decision-making across various industries. Some examples:

- Predict house prices based on location, total bedrooms, size, proximity to schools, transportation, shopping centers, etc.
- Predict personal insurance cost based on age, gender, occupation, income level, education, main living location (urban, suburban, rural), pre-existing condition, family medical history, etc.
- Predict used-car price based on brand and model, age, mileage, body type, vehicle history, etc.
- Predict sales revenue based on advertising expenditure, pricing strategies, seasons, competitors' activities, etc.
- Predict company stock price based on profit margin, financials, industry trends, consumer behavior, social media trends, bank interest rates, regulatory environment, geopolitical events, etc.

The Linear Regression equation with multiple independent variables is:

y = (X_{1} * coef_{1}) + (X_{2} * coef_{2}) + (X_{3} * coef_{3}) + ... + (X_{n} * coef_{n}) + intercept

- Intercept represents the predicted value when all the independent variables are zero.
- In multiple linear regression, there is a coefficient for each independent variable.
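The same scikit-learn API extends directly to multiple independent variables: each column of X becomes a feature with its own coefficient. The dataset below is synthetic and assumed for illustration, with each row holding [size in sq ft, bedrooms, age in years]:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (assumed for illustration): one row per house,
# columns are [size in sq ft, number of bedrooms, age in years].
X = np.array([
    [1000, 2, 30],
    [1200, 3, 20],
    [1500, 3, 15],
    [1800, 4, 10],
    [2000, 4, 5],
])
y = np.array([200_000, 250_000, 300_000, 360_000, 400_000])  # prices in $

model = LinearRegression().fit(X, y)

print(model.coef_)       # one coefficient per independent variable
print(model.intercept_)  # predicted price when all features are zero
print(model.predict([[1600, 3, 12]])[0])  # price for a hypothetical house
```

Note that the intercept here (the prediction when size, bedrooms, and age are all zero) is a mathematical anchor for the fitted plane rather than a physically meaningful house.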