R2: Downsides and Potential Pitfalls for ESS ML Prediction

Figure: Datasaurus plot. Always plot your data!

Regression analysis is a fundamental tool in machine learning (ML): it quantifies relationships among variables by estimating how a change in one variable is associated with a change in another.

The coefficient of determination, R2 (pronounced “R squared”), is a measure that provides information about how well the regression line suggested by a numerical model approximates the actual data (often referred to as “goodness of fit”).
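To make the definition concrete, here is a minimal sketch of R2 computed by hand with NumPy, using the common "1 minus residual sum of squares over total sum of squares" form. The function name `r_squared` and the four observations are made up for illustration; they are not taken from the notebook linked in this post.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # variation the model misses
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total variation around the mean
    return 1.0 - ss_res / ss_tot

# Made-up observations, purely for illustration.
y_obs = np.array([2.0, 4.0, 6.0, 8.0])

# A perfect prediction, and a "model" that always predicts the mean.
r2_perfect = r_squared(y_obs, y_obs)
r2_mean = r_squared(y_obs, np.full_like(y_obs, y_obs.mean()))

print(r2_perfect, r2_mean)
```

In this form, a perfect prediction scores 1 and always predicting the mean scores 0, so R2 is really a comparison against a "just guess the mean" baseline.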

Quick aside: Here are a couple of datasets to ponder while reading through this blog post: Anscombe’s Quartet and Datasaurus Dozen.

R2 is often one of the initial metrics introduced in predictive regression analysis, and while it is commonly reported, I've found it to be less suitable for some ML applications in Earth Systems Science (ESS), for the following reasons:

R2 is best suited for Gaussian distributions

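While you can calculate R2 for nonlinear models, it is less informative for variables with non-Gaussian distributions (strongly skewed variables, for example), where a handful of extreme values can dominate the variance.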
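One way this can show up is with a strongly skewed target, where most values are small and a few are very large (think of something like precipitation totals). The sketch below uses entirely made-up numbers: the hypothetical predictions are exact for the two largest values and twice too high everywhere else, yet R2 still comes out very close to 1.

```python
import numpy as np

# Made-up, strongly skewed "observations": many small values, two large ones.
y_true = np.array([1, 1, 2, 2, 3, 3, 4, 5, 100, 200], dtype=float)

# Hypothetical predictions: exact on the two large values, 2x too high elsewhere.
y_pred = np.array([2, 2, 4, 4, 6, 6, 8, 10, 100, 200], dtype=float)

# R2 in the 1 - SS_res / SS_tot form.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

# Fraction of samples whose prediction is off by 100% or more.
rel_err = np.abs(y_pred - y_true) / y_true
frac_off_by_2x = np.mean(rel_err >= 1.0)

print(f"R2 = {r2:.4f}")
print(f"fraction of predictions off by a factor of two: {frac_off_by_2x:.1f}")
```

Because the two large values account for nearly all of the variance, getting them right is enough for a high score, even though most of the individual predictions are poor in relative terms.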

R2 without slope does not tell the entire story

The R2 value tells you the proportion of variance explained, but it does not tell you the direction of the relationship between variables or how large the effect is.

It is crucial to consider the slope of the regression line. A high R2 with a small or insignificant slope may indicate a weak relationship or lack of practical significance.
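A small sketch of why the slope matters: the two synthetic datasets below differ only in the scale of the response, so a straight-line fit gives them the same R2 even though one slope is 1000 times smaller than the other. The data, the seed, and the helper name `slope_and_r2` are assumptions for illustration. For a simple straight-line fit, R2 equals the squared Pearson correlation, which is what the helper computes.

```python
import numpy as np

rng = np.random.default_rng(42)   # fixed seed so the sketch is repeatable
x = np.linspace(0.0, 10.0, 100)
noise = rng.normal(0.0, 1.0, size=x.size)

y_steep = 2.0 * x + noise         # strong response to x
y_flat = y_steep / 1000.0         # same pattern, response 1000x smaller

def slope_and_r2(x, y):
    """Fit y = slope * x + intercept and return (slope, R2)."""
    slope, intercept = np.polyfit(x, y, deg=1)
    r = np.corrcoef(x, y)[0, 1]   # Pearson correlation
    return slope, r ** 2          # R2 of a straight-line fit is r squared

slope_steep, r2_steep = slope_and_r2(x, y_steep)
slope_flat, r2_flat = slope_and_r2(x, y_flat)

print(f"steep: slope = {slope_steep:.4f}, R2 = {r2_steep:.4f}")
print(f"flat:  slope = {slope_flat:.6f}, R2 = {r2_flat:.4f}")
```

R2 alone cannot distinguish these two cases. Whether the smaller slope matters in practice depends on the units and the application, which is why it is worth reporting the slope alongside R2.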

R2 is sensitive to outliers

Extreme values can disproportionately influence the R2 value.

A single outlier can pull the regression line toward itself and, consequently, change the proportion of variance the model appears to explain.
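Here is a deliberately contrived sketch of that effect. The "clean" points are spaced evenly around a circle, so by construction they have no linear relationship at all; adding one extreme point at (20, 20) is enough to push the R2 of a straight-line fit above 0.9. The data and the helper name `r2_of_linear_fit` are made up for illustration.

```python
import numpy as np

def r2_of_linear_fit(x, y):
    """R2 of a straight-line fit, which equals the squared Pearson correlation."""
    return np.corrcoef(x, y)[0, 1] ** 2

# 20 points evenly spaced on the unit circle: no linear relationship.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
x_clean, y_clean = np.cos(angles), np.sin(angles)

# The same points plus a single extreme value far from the rest.
x_out = np.append(x_clean, 20.0)
y_out = np.append(y_clean, 20.0)

r2_clean = r2_of_linear_fit(x_clean, y_clean)
r2_outlier = r2_of_linear_fit(x_out, y_out)

print(f"R2 without the outlier: {r2_clean:.3f}")
print(f"R2 with the outlier:    {r2_outlier:.3f}")
```

Outliers can also work in the other direction and drag R2 down for an otherwise good fit. In either case, a scatter plot makes the problem obvious in a way the single number does not.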

R2 can be useful for ESS prediction problems with normally distributed variables, especially in data exploration or quick feature selection workflows. For day-to-day ML work, though, I recommend using additional prediction metrics (particularly Mean Absolute Error) to get a more robust and accurate assessment of ESS ML model performance. Plotting your data is always a necessary step, no matter what metric you use!
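As a rough sketch of what reporting both metrics looks like, the example below computes R2 and Mean Absolute Error (MAE) for a set of made-up predictions, then again after one prediction is replaced with a badly wrong value. The function names and numbers are assumptions for illustration only.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def mean_absolute_error(y_true, y_pred):
    """Average absolute error, in the same units as the target."""
    return np.mean(np.abs(y_true - y_pred))

# Made-up observations 1..10 and predictions that are all 0.5 too high.
y_true = np.arange(1.0, 11.0)
y_pred = y_true + 0.5

# The same predictions, except the last one is off by 20.
y_pred_bad = y_pred.copy()
y_pred_bad[-1] = y_true[-1] + 20.0

r2_clean = r_squared(y_true, y_pred)
mae_clean = mean_absolute_error(y_true, y_pred)
r2_bad = r_squared(y_true, y_pred_bad)
mae_bad = mean_absolute_error(y_true, y_pred_bad)

print(f"clean:        R2 = {r2_clean:.3f}, MAE = {mae_clean:.2f}")
print(f"one bad point: R2 = {r2_bad:.3f}, MAE = {mae_bad:.2f}")
```

With one bad point, R2 swings from close to 1 to below zero because errors are squared, while MAE rises in proportion to the size of the miss and stays in the units of the variable, which makes it easier to interpret.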

By way of illustration, I've put together a short Jupyter notebook working through some basic examples of places where R2 might fall short: R2 Playground

Further Reading

If you're interested in learning more about the possible pitfalls of R2, try these:

Thomas Martin is an AI/ML Software Engineer at the Unidata Program Center. Have questions? Contact support-ml@unidata.ucar.edu or book an office hours meeting with Thomas on his Calendar.

News@Unidata
News and information from the Unidata Program Center