Regression analysis is a fundamental tool in machine learning (ML): it helps establish relationships among variables by estimating how one variable affects another.
The coefficient of determination, R2 (pronounced “R squared”), is a measure of how well the regression line suggested by a numerical model approximates the actual data (often referred to as “goodness of fit”).
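As a quick refresher, R2 compares a model's squared residuals against the total variance of the observations. Here is a minimal sketch of that calculation, using made-up numbers and scikit-learn's r2_score (the playground notebook linked below may compute it differently):

```python
import numpy as np
from sklearn.metrics import r2_score

# Observed values and a model's predictions (toy numbers, for illustration only)
y_obs = np.array([2.0, 3.5, 4.1, 5.0, 6.8])
y_pred = np.array([2.2, 3.1, 4.4, 4.9, 6.5])

# R2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_obs - y_pred) ** 2)
ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual)                # manual calculation
print(r2_score(y_obs, y_pred))  # same value from scikit-learn
```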
Quick aside: Here are a couple of datasets to ponder while reading through this blog post: Anscombe’s Quartet and Datasaurus Dozen.
R2 is often one of the initial metrics introduced in predictive regression analysis, and while it is commonly reported, I've found it to be less suitable for some ML applications in Earth Systems Science (ESS), for the following reasons:
R2 is best suited for Gaussian distributions
While you can calculate R2 for nonlinear models, it is less appropriate for variables with non-Gaussian distributions: because R2 is built from sums of squared deviations, skewed or heavy-tailed variables (which are common in ESS) can yield scores that mostly reflect a handful of extreme values.
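A small synthetic illustration of that point (assuming numpy and scikit-learn; the skewed, precipitation-like target is invented for the example): with a heavy-tailed target, the squared-deviation sums inside R2 are dominated by a few extreme values, so the score mostly tells you how those few points were fit.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)

# Hypothetical skewed target: lognormal-ish rather than Gaussian
x = rng.uniform(0, 1, size=500).reshape(-1, 1)
y = np.exp(2.0 * x.ravel() + rng.normal(scale=0.8, size=500))  # heavy right tail

model = LinearRegression().fit(x, y)
y_hat = model.predict(x)
r2_all = r2_score(y, y_hat)

# Drop the few largest observations and recompute: the score often shifts
# substantially, because the squared-error sums are dominated by the tail.
keep = y < np.quantile(y, 0.98)
r2_trimmed = r2_score(y[keep], y_hat[keep])

print(round(r2_all, 3), round(r2_trimmed, 3))
```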
R2 without slope does not tell the entire story
The R2 value provides information about the proportion of variance explained, but it does not provide insights into the direction or strength of the relationships between variables.
It is also crucial to consider the slope of the regression line: a high R2 paired with a small or statistically insignificant slope may indicate a weak relationship or a lack of practical significance.
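A quick synthetic example of that last point (numpy and scikit-learn again; the 1e-4 slope is invented purely for illustration): the predictor explains nearly all of the variance, so R2 is close to 1, yet the effect size is so small it may not matter in practice.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)

# The predictor explains almost all of the variance, but the effect size
# is tiny (slope ~1e-4), so the relationship may have no practical
# significance even though R2 is near 1.
x = rng.uniform(0, 100, size=200).reshape(-1, 1)
y = 1e-4 * x.ravel() + rng.normal(scale=1e-4, size=200)

model = LinearRegression().fit(x, y)
print("slope:", model.coef_[0])                  # ~0.0001
print("R2:   ", r2_score(y, model.predict(x)))   # high, despite the tiny slope
```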
R2 is sensitive to outliers
Extreme values can disproportionately influence the R2 value: a single outlier can pull the regression line toward itself and, consequently, change the proportion of variance the model appears to explain.
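Here is a sketch of how strongly a single point can move the score (synthetic data, numpy and scikit-learn assumed): fifty essentially uncorrelated points give an R2 near zero, but appending one extreme (x, y) pair lets that point dominate both the fit and the variance sums.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(7)

# Essentially uncorrelated data: R2 should be near zero.
x = rng.normal(size=50)
y = rng.normal(size=50)

def fit_r2(x, y):
    model = LinearRegression().fit(x.reshape(-1, 1), y)
    return r2_score(y, model.predict(x.reshape(-1, 1)))

print("without outlier:", round(fit_r2(x, y), 3))

# Add a single extreme point: the regression line chases it, and R2 can
# jump to a large value even though the bulk of the data is pure noise.
x_out = np.append(x, 20.0)
y_out = np.append(y, 20.0)
print("with outlier:   ", round(fit_r2(x_out, y_out), 3))
```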
While R2 can be useful for normally distributed prediction problems in ESS, especially for data exploration or quick feature selection workflows, I recommend using additional prediction metrics (particularly Mean Absolute Error) for day-to-day ML work to ensure a more robust and accurate assessment of ESS ML model performance. Plotting your data is always a necessary step no matter what metric you use!
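As a small illustration of why MAE is a useful companion metric (synthetic, temperature-like numbers; not taken from the playground notebook): MAE is reported in the target's own units and responds roughly linearly to a single bad prediction, whereas the squared-error terms inside R2 amplify it.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score

rng = np.random.default_rng(3)

# A reasonable model: small, roughly uniform errors on a temperature-like target.
y_obs = rng.uniform(10, 30, size=100)
y_pred = y_obs + rng.normal(scale=1.0, size=100)

print("clean:    R2 =", round(r2_score(y_obs, y_pred), 3),
      " MAE =", round(mean_absolute_error(y_obs, y_pred), 3))

# The same predictions with one badly missed case: the squared-error terms
# in R2 magnify it, while MAE (in the target's units) shifts only modestly.
y_bad = y_pred.copy()
y_bad[0] = y_obs[0] + 30.0
print("one miss: R2 =", round(r2_score(y_obs, y_bad), 3),
      " MAE =", round(mean_absolute_error(y_obs, y_bad), 3))
```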
By way of illustration, I've put together a short Jupyter notebook working through some basic examples of where R2 might fall short: R2 Playground.
Further Reading
If you're interested in learning more about the possible pitfalls of R2, try these:
- Coefficient of Determination Wikipedia page
- Is R-squared Useless?
- R-squared Is Not Valid for Nonlinear Regression
- Root-mean-square error (RMSE) or mean absolute error (MAE): when to use them or not
- Lecture 10: F-Tests, R2, and Other Distractions
- An evaluation of R2 as an inadequate measure for nonlinear models in pharmacological and biochemical research: a Monte Carlo approach
- Avoid R-squared to judge regression model performance
Thomas Martin is an AI/ML Software Engineer at the Unidata Program Center. Have questions? Contact support-ml@unidata.ucar.edu or book an office hours meeting with Thomas on his Calendar.