I’m proud to announce that my latest research project, reluctant generalized additive modeling (RGAM), is complete (for now)! In this post, I give a brief overview of the method: what it is trying to do and how you can fit such a model in R. (This project is joint work with my advisor, Rob Tibshirani.)
- For an in-depth description of the method, please see our arXiv preprint.
- You can download the CRAN version of the package, relgam, here. The latest version of the package is on Github.
- For more details on how to use the package, please see the package's vignette.
Introduction and motivation
tl;dr: Reluctant generalized additive modeling (RGAM) produces highly interpretable sparse models which allow non-linear relationships between the response and each individual feature. However, non-linear relationships are only included if deemed important in improving prediction performance. RGAM works with quantitative, binary, count and survival responses, and is computationally efficient.
Consider the supervised learning setting, where we have observations of features $x_{ij}$ for $i = 1, \dots, n$ and $j = 1, \dots, p$, along with $n$ responses $y_1, \dots, y_n$. Let $X_j \in \mathbb{R}^n$ denote the values of the $j$th feature. Generalized linear models (GLMs) assume that the relationship between the response and the features is

$y = g^{-1}\left( \beta_0 + \sum_{j=1}^p \beta_j X_j \right) + \epsilon,$

where $g$ is a link function and $\epsilon$ is a mean-zero error term. Generalized additive models (GAMs) are a more flexible class of models, assuming the true relationship to be

$y = g^{-1}\left( \beta_0 + \sum_{j=1}^p f_j(X_j) \right) + \epsilon,$

where the $f_j$'s are unknown functions to be determined by the model.
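As a quick illustration (not from the original post), here is what the two model classes look like when fit in R: a GLM via the built-in glm function, and a GAM via the mgcv package. The simulated data below is a made-up stand-in.

```r
library(mgcv)  # for gam() with smooth terms

set.seed(42)
n <- 200
x1 <- rnorm(n); x2 <- rnorm(n)
y <- sin(2 * x1) + 0.5 * x2 + rnorm(n, sd = 0.3)

# GLM: each feature enters the model linearly (identity link here).
fit_glm <- glm(y ~ x1 + x2, family = gaussian())

# GAM: each feature enters through an unknown smooth function f_j,
# estimated from the data.
fit_gam <- gam(y ~ s(x1) + s(x2))
```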
These two classes of models include all features in the model, which is often undesirable, especially when we have tons of features. (We usually expect only a small fraction of features to have any influence on the response variable.) This is especially problematic with GAMs, as overfitting can occur much more easily. A host of methods have arisen to create sparse GAMs, i.e. GAMs that involve only a handful of features. Earlier examples include COSSO (Lin & Zhang 2006) and SpAM (Ravikumar et al. 2007).
While providing sparsity, these methods dictate that every included feature has a non-linear relationship with the response, even when a linear relationship would be sufficient. More sophisticated methods were developed to give both sparsity and the possibility of linear or non-linear relationships between the features and response. Examples of such methods are GAMSEL (Chouldechova & Hastie 2015), SPLAM (Lou et al. 2016) and SPLAT (Petersen & Witten 2019). GAMSEL is available in R in the gamsel package (see my unofficial vignette here); I was not able to find R packages for the other two methods.
Reluctant generalized additive models (RGAMs) fall into the same class as this last group of methods. RGAM is available in R in the relgam package. RGAMs are computationally fast and work with quantitative, binary, count and survival response variables. (To my knowledge, existing software only works for quantitative and binary responses.)
RGAM: What is it?
Reluctant generalized additive modeling was inspired by reluctant interaction modeling (Yu et al. 2019). The idea is that
One should prefer a linear term over a non-linear term if all else is equal.
That is, we prefer a model to contain only effects that are linear in the original set of features: non-linearities are only included thereafter if they add to predictive performance.
We operationalize this principle with a three-step process that closely mimics that of reluctant interaction modeling. At a high level:
- Fit the response as well as we can using only the main effects (i.e. original features).
- For each original feature $X_j$, construct a non-linear feature associated with it.
- Refit the response on all the main effects and the additional features from Step 2.
Now for a little bit more detail:
- Fit the lasso of $y$ on $X$ to get coefficients $\hat{\beta}$. Compute the residuals $r = y - X\hat{\beta}$, using the $\lambda$ hyperparameter chosen by cross-validation.
- For each $j$, fit a smoothing spline with $d$ degrees of freedom of $r$ on $X_j$, which we denote by $\hat{f}_j$. Rescale $\hat{f}_j$ so that $\mathrm{sd}(\hat{f}_j) = \gamma \cdot \mathrm{sd}(X_j)$. Let $F$ denote the matrix whose columns are the $\hat{f}_j$'s.
- Fit the lasso of $y$ on $X$ and $F$ for a path of tuning parameters $\lambda$.
There are three hyperparameters here: $\lambda$ (just like the lasso), $d$ for the smoothing spline degrees of freedom in Step 2, and $\gamma$ for scaling the non-linear features. The role of $\gamma$ might be a bit hard to understand from the technical description above. Informally, $\gamma = 1$ means that the linear and non-linear features are on the same scale, while $\gamma < 1$ means that the non-linear features are on a smaller scale: as a result, the coefficients associated with them are less likely to survive variable selection by the lasso in Step 3. The sketch below makes these steps concrete.
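Here is a minimal hand-rolled sketch of the three steps for a quantitative response, using glmnet and smooth.spline. The simulated data and the values $d = 4$ and $\gamma = 0.8$ are illustrative assumptions on my part; the relgam package implements the actual procedure for you.

```r
library(glmnet)

set.seed(1)
n <- 100; p <- 12
x <- matrix(rnorm(n * p), n, p)
y <- sin(2 * x[, 1]) + x[, 2] + rnorm(n)  # made-up sparse additive truth

d <- 4        # smoothing spline degrees of freedom (Step 2)
gamma <- 0.8  # scale factor for the non-linear features

# Step 1: lasso of y on x, lambda chosen by cross-validation;
# compute the residuals r.
cvfit <- cv.glmnet(x, y)
r <- as.numeric(y - predict(cvfit, x, s = "lambda.min"))

# Step 2: for each feature, fit a smoothing spline of r on X_j, then
# rescale the fitted values so that sd(f_j) = gamma * sd(X_j).
f <- sapply(seq_len(p), function(j) {
  fj <- predict(smooth.spline(x[, j], r, df = d), x[, j])$y
  fj * gamma * sd(x[, j]) / sd(fj)
})

# Step 3: lasso of y on the original and non-linear features together,
# over a path of tuning parameters lambda.
fit <- glmnet(cbind(x, f), y)
```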
A simple example
The CRAN vignette is the best place to start learning how to fit RGAMs in practice. Below I give an example of the types of models that can come out of RGAM. (Code for this example can be found here.)
We simulate data with $p = 12$ features. Each entry in the $X$ matrix is an independent draw from the standard normal distribution, and the true response is an additive function of a handful of the features plus noise; the exact setup is in the linked code.
We fit a RGAM to this data for a sequence of $\lambda$ values. The larger the $\lambda$ index, the smaller the $\lambda$ value and the less penalty imposed in the lasso in Step 3, resulting in more flexible models.
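For reference, the fit itself is a one-liner with the relgam package. A minimal sketch, where the simulated response is a stand-in for the post's actual simulation:

```r
library(relgam)

set.seed(1)
n <- 100; p <- 12  # n here is a guess; the post does not state it
x <- matrix(rnorm(n * p), n, p)
y <- sin(2 * x[, 1]) + x[, 2] + rnorm(n)  # stand-in for the true response

fit <- rgam(x, y)  # fits RGAM along a path of lambda values
```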
For each $\lambda$ value, RGAM's predictions have the form

$\hat{y} = \hat{\beta}_0 + \sum_{j=1}^p \hat{f}_j(x_j),$

where each $\hat{f}_j$ is zero, linear, or non-linear depending on what the lasso in Step 3 selects. We plot the model fits for the first 30 $\lambda$ values in the animation below. In each of the 12 panes, we see the estimated $\hat{f}_j$ for each variable (in blue, green or red), and the true relationship $f_j$ in black.
For small indices (i.e. large $\lambda$ values), we have very restricted models, with most $\hat{f}_j$'s being zero or linear. As the $\lambda$ index increases, we see that the RGAM model fits get closer and closer to the true relationships. Past some $\lambda$ index, we start to see some overfitting going on. The optimal value of $\lambda$ can be chosen via methods like cross-validation.
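The package provides a cv.rgam function for exactly this. A brief sketch, assuming its interface mirrors cv.glmnet's (and reusing x and y from the sketch above):

```r
cvfit <- cv.rgam(x, y)
cvfit$lambda.min  # lambda value minimizing cross-validated error
plot(cvfit)       # CV error curve across the lambda path
```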
Give it a try!
I think RGAM is a neat extension to GAM and other sparse additive models. It may not always perform best but I think it is a nice tool to add to your arsenal of interpretable models to try for supervised learning problems!