Linear regression doesn’t receive much attention nowadays. Compared to the excitement surrounding deep learning, artificial intelligence and automated machine learning, good old-fashioned linear regression, a commonly used type of predictive analysis, rarely generates a headline. But don’t let that lead you to believe it isn’t still an important technique in a data scientist’s toolbox.
Why? Well, aside from the fact that it is a robust, effective and well-understood technique, linear regression is easily interpretable, and that reason alone makes it an essential step in many companies’ journey toward analytical maturity.
The 5 Stages of Analytical Maturity, first popularized by Thomas Davenport and Jeanne Harris’ book Competing on Analytics, provide an effective framework for assessing the current analytical capabilities of a firm. For companies that are working towards Stage 4: Analytical Companies (which would include a large proportion of small- to mid-sized organizations), interpretability remains a key consideration.
Why? Well, in short – “buy-in.” Let me share a story to illustrate:
I recently met with a data scientist who monitors large volumes of transactional data for potential fraud. His algorithm of choice for scoring observations? Logistic regression. Sure, he could deploy a deep learning auto-encoder or a gradient-boosted decision tree and perhaps improve the accuracy of his model. But that’s not in the best interests of his team’s analytics practice today, because his stakeholders, who are still a bit skeptical about the benefits of advanced analytics, are not ready to accept the predictions of a black box solution.
But logistic regression? They get it. They understand each observation is scored with the probability of being a fraudulent transaction and can continue their normal work with this new information. In time, his stakeholders will begin to trust these predictions more and more. And as enthusiasm and belief in analytical capabilities grow, he will then have opportunities to iterate toward more advanced models.
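To make that interpretability concrete, here is a minimal sketch in Python of how a logistic regression turns a transaction into a fraud probability. The feature names, intercept and coefficients are hypothetical values chosen purely for illustration; they are not from the data scientist’s actual model, which would learn its coefficients from labeled transaction data.

```python
import math

# Hypothetical coefficients for illustration only; a real model would
# learn these from labeled transaction data.
INTERCEPT = -4.0
COEFFICIENTS = {
    "amount_thousands": 0.8,  # transaction amount, in thousands
    "is_foreign": 1.2,        # 1 if the transaction is cross-border, else 0
    "night_time": 0.5,        # 1 if made late at night, else 0
}


def fraud_probability(transaction):
    """Score one transaction: weighted sum of features, then the sigmoid."""
    log_odds = INTERCEPT + sum(
        COEFFICIENTS[name] * transaction[name] for name in COEFFICIENTS
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratios():
    """exp(coefficient): factor by which the odds change per unit increase."""
    return {name: math.exp(beta) for name, beta in COEFFICIENTS.items()}


if __name__ == "__main__":
    example = {"amount_thousands": 2.5, "is_foreign": 1, "night_time": 0}
    print(f"Fraud probability: {fraud_probability(example):.3f}")
    for name, ratio in odds_ratios().items():
        print(f"{name}: odds multiplied by {ratio:.2f} per unit increase")
```

Every number in the model can be explained to a stakeholder: each coefficient says how much a feature pushes the odds of fraud up or down, and the score itself is simply a probability between 0 and 1.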
But if he had rushed to adopt advanced models, he could have lost the support of key individuals and failed to achieve the goals of the department. This is also a great example of how important business domain knowledge is to the success of a data scientist.
So, while it may not be as exciting as the more advanced algorithms available, linear regression and similar techniques (logistic regression and generalized additive models) can still create great value for analytics teams today.