"Will code for travel"

Search technical info
Introduction to Linear Regression
Written: 2024-02-05 14:19:25 Last update: 2024-04-26 10:06:26

Table of Contents

  1. History of Regression
  2. Introduction to Linear Regression
  3. Multiple Linear Regression
  4. Hands-on: python code in Jupyter Notebook
  5. Advantages, Disadvantages, and Conclusion

Brief history of regression: Sir Francis Galton published a paper called "Regression towards mediocrity in hereditary stature" in 1886 (source: https://simple.wikipedia.org/wiki/Regression_toward_the_mean). Although he is best known for his development of correlation, most of his work on inheritance led to the development of regression, from which correlation was a somewhat ingenious deduction. Meanwhile, Arthur Lee Samuel, a computer scientist, is credited with coining the term "Machine Learning" in 1959 (source: https://en.wikipedia.org/wiki/Arthur_Samuel_(computer_scientist)).

Introduction to Linear Regression: it is the simplest and most fundamental technique to understand when starting to learn about Machine Learning. As one of the foundational techniques in statistical modeling, linear regression serves as a cornerstone for grasping more complex algorithms in the field. It offers a straightforward approach to modeling the relationship between one or more independent variables (X) and a single dependent variable (y).

The fundamental assumption of linear regression is that this relationship is linear: a one-unit change in an independent variable always produces the same change in the dependent variable, regardless of the variable's current value. Leveraging this assumption, linear regression constructs a linear equation that best fits the data, enabling predictive analysis and inference within a linear framework. Its applicability spans various industries, making it a vital tool for predictive analytics and data-driven decision-making.

Regression is a very common statistical way to determine whether there is a relationship between two things and, more generally, to understand the relationships between variables.

A simple application uses a single independent variable, for example predicting house prices based on the size of the house: house size (X) is the independent (input) variable and house price (y) is the dependent (target output) variable.

The simple Linear Regression equation, with a single independent variable, is:

y = (X * coefficient) + intercept
  • Intercept represents the predicted value of the dependent variable when the independent variable is zero.
  • Coefficient represents the change in the dependent variable for a one-unit change in the independent variable, holding all other variables constant.

Once the coefficient and intercept values are estimated from the data, we can plug any new X value into the equation to consistently predict its y, which is what makes linear regression useful for ongoing prediction.
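
To make this concrete, here is a minimal sketch using scikit-learn (the library used in the notebook linked later); the house sizes and prices are made-up numbers for illustration only:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical house sizes in square meters (X) and prices in thousands (y)
X = np.array([[50], [70], [90], [110], [130]])  # shape (n_samples, n_features)
y = np.array([150, 200, 260, 310, 360])

model = LinearRegression()
model.fit(X, y)

print("coefficient:", model.coef_[0])  # price change per extra square meter
print("intercept:", model.intercept_)  # predicted price when size is zero

# y = (X * coefficient) + intercept, applied to a new 100 m^2 house
print("predicted price:", model.predict([[100]])[0])
```

Under the hood, fit() chooses the coefficient and intercept that minimize the squared differences between predicted and actual prices (ordinary least squares).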

In regression tasks, the objective is to predict continuous values, such as house prices, temperature, or stock prices, rather than discrete classes like "cat" or "dog." Consequently, there is no concept of a "threshold" like in classification tasks where a threshold value is used to determine class labels based on predicted probabilities.

[Figure: the data points (blue dots) and the straight red line produced by the linear regression formula]
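
A plot like the one described above can be sketched in a few lines of matplotlib, reusing the same hypothetical data:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

# Same hypothetical house-size (X) and price (y) data as the earlier sketch
X = np.array([[50], [70], [90], [110], [130]])
y = np.array([150, 200, 260, 310, 360])

model = LinearRegression().fit(X, y)

plt.scatter(X, y, color="blue", label="data points")                  # blue dots
plt.plot(X, model.predict(X), color="red", label="regression line")  # red line
plt.xlabel("house size (m^2)")
plt.ylabel("house price (thousands)")
plt.legend()
plt.show()
```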

Multiple Linear Regression is an extension of simple Linear Regression that addresses the reality of real-world scenarios, where multiple independent variables often influence the dependent variable. This makes multiple linear regression an essential tool for predictive analytics and data-driven decision-making across various industries. Some examples:

  • Predict house prices based on location, total bedrooms, size, proximity to schools, transportation, shopping centers, etc.
  • Predict personal insurance cost based on age, gender, occupation, income level, education, main living location (urban, suburban, rural), pre-existing condition, family medical history, etc.
  • Predict used-car price based on brand and model, age, mileage, body type, vehicle history, etc.
  • Predict sales revenue based on advertising expenditure, pricing strategies, seasons, competitors' activities, etc.
  • Predict company stock price based on profit margin, financial results, the company's industry trend, consumer behavior, social media trends, bank interest rates, regulatory environment, geopolitical events, etc.

The Linear Regression equation with multiple independent variables is:

y = (X1 * coef1) + (X2 * coef2) + (X3 * coef3) + ... + (Xn * coefn) + intercept
  • Intercept represents the predicted value when all the independent variables are zero.
  • In multiple linear regression, there is a coefficient for each independent variable.
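
As a sketch of the multiple-variable case, the same scikit-learn API applies; each feature column and all prices below are hypothetical numbers chosen only for illustration:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical rows of [size in m^2, bedrooms, distance to school in km]
X = np.array([
    [50, 1, 3.0],
    [70, 2, 2.5],
    [90, 2, 1.0],
    [110, 3, 2.0],
    [130, 4, 0.5],
])
y = np.array([150, 210, 280, 320, 400])  # prices in thousands, made up

model = LinearRegression().fit(X, y)

# One coefficient per independent variable, plus a single intercept
print("coefficients:", model.coef_)    # [coef1, coef2, coef3]
print("intercept:", model.intercept_)

# y = (X1 * coef1) + (X2 * coef2) + (X3 * coef3) + intercept
print("predicted price:", model.predict([[100, 3, 1.5]])[0])
```

Note that coef_ has one entry per independent variable, matching the equation above, while the intercept remains a single number.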

Hands-on with Python code: there are numerous articles on linear regression available online; this brief article aims to provide a hands-on demonstration of implementing linear regression in Python with the help of scikit-learn, which is simple and easy to use for a novice software developer learning Machine Learning. Please find the Python code in a Jupyter Notebook on Github: Basic Linear Regression
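
For a self-contained starting point before opening the notebook, here is a minimal end-to-end sketch; it uses synthetic data from scikit-learn's make_regression, so the dataset and details will differ from the linked notebook:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression problem with 3 independent variables (no real data)
X, y = make_regression(n_samples=200, n_features=3, noise=10.0, random_state=42)

# Hold out a quarter of the data to evaluate how well the fit generalizes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LinearRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)

print("R^2:", r2_score(y_test, y_pred))            # closer to 1 is better
print("MSE:", mean_squared_error(y_test, y_pred))  # lower is better
```

Holding out a test set is a standard way to check that the fitted line generalizes beyond the data it was trained on.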

Advantages of Linear Regression: One of the key advantages of linear regression is its simplicity. The linear relationship between variables is easy to understand and interpret, making it accessible to beginners and experts alike. Additionally, linear regression provides transparent insight into the relationship between variables, allowing for meaningful interpretation of coefficients and predictions.

Disadvantages of Linear Regression: Despite its simplicity, linear regression has limitations that may impact its performance in certain scenarios. For instance, it assumes a linear relationship between variables, which may not always hold true in real-world data. In cases where the relationship is non-linear, more flexible modeling techniques such as polynomial regression or machine learning algorithms like decision trees or neural networks may yield better results.

In conclusion, in the vast landscape of machine learning, mastering linear regression serves as the foundational stepping stone to understanding more complex algorithms. Its simplicity and interpretability provide invaluable insights into the relationships between variables, laying a solid groundwork for delving into advanced techniques.

After we understand linear regression, the next learning point is Polynomial Regression.
