Quantcast
Channel: R-bloggers
Viewing all articles
Browse latest Browse all 12081

Function basis and regression

$
0
0

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the first part of the course on linear models, we’ve seen how to construct a linear model when the vector of covariates \boldsymbol{x} is given, so that \mathbb{E}(Y|\boldsymbol{X}=\boldsymbol{x}) is either simply \boldsymbol{x}^\top\boldsymbol{\beta} (for standard linear models) or a functional of \boldsymbol{x}^\top\boldsymbol{\beta} (in GLMs). But more generally, we can consider transformations of the covariates, so that a linear model can be used. In a very general setting, consider \sum_{j=1}^m\beta_j h_j(\boldsymbol{x})with h_j:\mathbb{R}^p\rightarrow\mathbb{R}. The standard linear model is obtained when m=p and h_j(\boldsymbol{x})=x_j , but of course, much more general models can be obtained, for instance with h_k(\boldsymbol{x})=x_j^2 orh_k(\boldsymbol{x})=x_{j}x_{j'}, that could be used to achieve high-order Taylor expansions. In that case, we will obtain the polynomial regression, that we will discuss first. We might also think of piecewise constant functions, h_k(\boldsymbol{x})=\boldsymbol{1}(x_j\in [a,b]) , that could be related to regression trees (but that is not in the scope in the STT5100 course). And if we go on step futher, we might think of piecewise linear or piecewise polynomial function, possibly with additional continuity constraints, that will lead us to spline basis.

  • Polynomial regression

For pedagogical purpose, when I talk about polynomial regression, I always have in mind (in the univariate case) y=\beta_0+\beta_1x+\beta_2x^2+\cdots+\beta_kx^k+\varepsilonbut if we use

1
lm(y~poly(x,k))

in R, the output is not the \beta_j‘s.

As discussed in Kennedy & Gentle (1980) Statistical Computing,

Recall that orthogonal polynomials are defined with respect to the classical inner-product (on the finite interval (a,b)){\displaystyle \langle f,g\rangle =\int _{a}^{b}f(x)g(x)~\mathrm {d} x} And a sequence of orthogonal polynomials is (P_n) where P_n is a polynomial of degree n, for all n, and such that P_m\perpP_n for all m\neq n. Note that those polyomials are orthogonal with respect to the inner product defined above, i.e. given some finite interval (a,b). But if (a,b) changes, the polynomials will be different.

A popular family of orthogonal polynomial, on finite interval (-1,+1) is the family of Legendre polynomials, satisfying{\displaystyle \int _{-1}^{1}P_{m}(x)P_{n}(x)~\mathrm {d} x=0}as soon as m\neq n. Those polynomials satisfy Bonnet’s recursion formula{\displaystyle (n+1)P_{n+1}(x)=(2n+1)xP_{n}(x)-nP_{n-1}(x)} or Rodrigues’ formula{\displaystyle P_{n}(x)={\frac {1}{2^{n}n!}}{\frac {d^{n}}{dx^{n}}}(x^{2}-1)^{n}}The first values are here{\displaystyle P_{0}(x)=1} {\displaystyle P_{1}(x)=x}{\displaystyle P_{2}(x)={\frac {3x^{2}-1}{2}}}{\displaystyle P_{3}(x)={\frac {5x^{3}-3x}{2}}} {\displaystyle P_{4}(x)={\frac {35x^{4}-30x^{2}+3}{8}}}

Interestingly, we can get those polynomial functions using

12345678910111213141516
library(orthopolynom)(leg4coef = legendre.polynomials(n=4))[[1]]1 [[2]]x  [[3]]-0.5+1.5*x^2 [[4]]-1.5*x +2.5*x^3 [[5]]0.375-3.75*x^2+4.375*x^4

Of course, there are many families of orthogonal polynomials (Jacobi polynomials, Laguerre polynomials, Hermite polynomials, etc). Now, in R, there is the standard poly function, that we use in polynomial regression.

1234567891011121314151617181920
x =seq(-1,1,length=101)y =poly(x,4)y                   1234[1,]-1.706475e-01  0.215984813-2.480753e-01  0.270362873[2,]-1.672345e-01  0.203025724-2.183063e-01  0.216290298...[100,]  1.672345e-01  0.203025724  2.183063e-01  0.216290298[101,]  1.706475e-01  0.215984813  2.480753e-01  0.270362873attr(,"coefs")attr(,"coefs")$alpha[1] 3.157229e-17 2.655145e-16 9.799244e-17 5.368224e-16 attr(,"coefs")$norm2[1]1.0000000101.000000034.34000009.33773282.44723300.6330176 attr(,"degree")[1]1234attr(,"class")[1]"poly""matrix"

But these are not Legendre polynomials… As explained in 李哲源‘s post on stackoverflow, the idea is to start with P_{-1}(x)=0, P_{0}(x)=1 and P_{1}(x)=x, and then define \ell_n=\langle P_n,P_n\rangle  as well as \alpha_n=\langle P_nP_1,P_1\rangle/\ell_n=\langle P_n^2,P_1\rangle/\ell_i= and \beta_n=\ell_n/\ell_{n-1}. Finally, define recursively{\displaystyle P_{n}(x)=(x-\alpha_{n-1})P_{n-1}(x)-\beta_{i-1}P_{i-2}(x)}and its normalized version, \tilde{P}_{n}=P_n/\sqrt{\ell_n}. That is what poly computes.

So, for pedagogical purpose, I said that I like to use y=\boldsymbol{x}^\top\boldsymbol{\beta}+\varepsilon where\boldsymbol{x}=(1,x,x^2,\cdots,xˆ{k-1},x^k)And actually, when using poly, we use the QR decomposition of that matrix. As discussed in in 李哲源‘s post, we can almost reproduce the poly function using

12345678910111213
my_poly -function(x, degree =1){    xbar =mean(x)    x = x - xbar    QR =qr(outer(x, 0:degree, "^"))    X =qr.qy(QR, diag(diag(QR$qr), length(x), degree +1))[, -1, drop= FALSE]    X2 = X * X    norm2 =colSums(X * X)       alpha =drop(crossprod(X2, x))/ norm2    beta= norm2 /(c(length(x), norm2[-degree]))colnames(X)=1:degree    scale=sqrt(norm2)    X = X *rep(1/scale, each =length(x))    X}

Nevertheless, the two models are equivalent. More precisely,

12345678
plot(cars)reg1 =lm(dist~speed+I(speed^2)+I(speed^3),data=cars)reg2 =lm(dist~poly(speed,3),data=cars)u =seq(3,26,by=.1)v1 =predict(reg1,newdata=data.frame(speed=u))v2 =predict(reg2,newdata=data.frame(speed=u))lines(u,v1,col="blue")lines(u,v2,col="red",lty=2)

We have exactly the same prediction here

123456
v1[u==15]12138.43919 v2[u==15]12138.43919

And probably also quite interesting : the coefficients do not have the same interpretation (since we do not have the same basis), but the p-value for the highest degree is exactly the same here ! Here the two models reject, with the same confidence, the polynomial of degree three,

123456789101112131415161718192021222324252627
summary(reg1) Coefficients:             Estimate Std. Errort value Pr(>|t|)(Intercept)-19.5050528.40530-0.6870.496speed         6.801116.801131.0000.323I(speed^2)-0.349660.49988-0.6990.488I(speed^3)0.010250.011300.9070.369 Residual standard error:15.2 on 46 degrees of freedomMultiple R-squared:0.6732,Adjusted R-squared:0.6519 F-statistic:31.58 on 3 and 46 DF,  p-value: 3.074e-11 summary(reg2) Coefficients:                Estimate Std. Errort value Pr(>|t|)(Intercept)42.982.1519.988< 2e-16 ***poly(speed, 3)1145.5515.219.573  1.6e-12 ***poly(speed, 3)223.0015.211.5120.137poly(speed, 3)313.8015.210.9070.369---Signif. codes:0***0.001**0.01*0.05‘.’ 0.1‘ ’ 1 Residual standard error:15.2 on 46 degrees of freedomMultiple R-squared:0.6732,Adjusted R-squared:0.6519 F-statistic:31.58 on 3 and 46 DF,  p-value: 3.074e-11
  • B-splines regression (and GAMs)

Splines are also important in regression models, especially when we start talking about Generalized Additive Models. See Perperoglou, Sauerbrei, Abrahamowicz & Schmid (2019) for a review. In the univariate case, I introduce (linear) splines through positive parts, in the sense thaty=\beta_0+\beta_1x+\beta_2(x-s_1)_++\cdots+\beta_k(x-s_{k-1})_++\varepsilonwhere (x-s)_+ equals 0 if x<s[/latex] and [latex]x-s[/latex] if [latex]x>s. Those functions are nice since they are continuous, so the model is continuous (the weighted sum of continuous functions is continuous). And we can go one step further, with y=\beta_0+\beta_1x+\beta_2x^2+\beta_3(x-s_1)^2_++\cdots+\beta_k(x-s_{k-2})^2_++\varepsilonwith quadratic splines, or y=\beta_0+\beta_1x+\beta_2x^2+\beta_3x^3+\beta_4(x-s_1)^3_++\cdots+\beta_k(x-s_{k-3})^3_++\varepsilonfor cubic splines. Interestingly, quadratic splines are not only continuous, but their first derivative is also continuous (and the second one for cubic splines). So the knot discontinuity is s_1,s_2,\cdots is now invisible…

I like those models since they are easy to interprete. For example, the simple model \beta_1 x+\beta_2(x-s)_+ is the following piecewise linear function, continuous, with a “rupture” at knot s.

Observe also the following interpretation: for small values of x, there is a linear increase, with slope \beta_1, and for lager values of x, there is a linear decrease, with slope \beta_1+\beta_2. Hence, \beta_2 is interpreted as a change of the slope.

Unfortunately, it is now what R is using when using the bs function in R, which are the standard B-splines. Just to visualize (I will skip the maths here), with R, we have

1234567
library(splines)clr6 =c("#1b9e77","#d95f02","#7570b3","#e7298a","#66a61e","#e6ab02")x =seq(5,25,by=.25)B = bs(x,knots=c(10,20),Boundary.knots=c(5,55),degre=1)matplot(x,B,type="l",lty=1,lwd=2,col=clr6)B=bs(x,knots=c(10,20),Boundary.knots=c(5,55),degre=2)matplot(x,B,type="l",col=clr6,lty=1,lwd=2)

while the functions I mentioned were (more or less) the following

123456789
pos =function(x,s)(x-s)*(x&gt;s)par(mfrow=c(1,2))clr6 =c("#1b9e77","#d95f02","#7570b3","#e7298a","#66a61e","#e6ab02")x =seq(5,25,by=.25)B =cbind(pos(x,5),pos(x,10),pos(x,20))matplot(x,B,type="l",lty=1,lwd=2,col=clr6)pos2 =function(x,s)(x-s)^2*(x&gt;s)B =cbind(pos(x,5)*20,pos2(x,5),pos2(x,10),pos2(x,20))matplot(x,B,type="l",col=clr6,lty=1,lwd=2)

And as for the polynomial regression, the two models are equivalent. For example

1234567
plot(cars)reg1 =lm(dist~speed+pos(speed,10)+pos(speed,20),data=cars)reg2 =lm(dist~bs(speed,degree=1,knots=c(10,20)),data=cars)v1 =predict(reg1,newdata=data.frame(speed=u))v2 =predict(reg2,newdata=data.frame(speed=u))lines(u,v1,col="blue")lines(u,v2,col="red",lty=2)

or more specifically

123456
v1[u==15]12139.35747 v2[u==15]12139.35747

So one more time, the two models are equivalent, but I still find the approach with the positive part more intuitive, and easy to understand. As well as the interpretation of coefficients,

1234567891011121314151617181920212223242526272829
summary(reg1) Coefficients:               Estimate Std. Errort value Pr(&gt;|t|)(Intercept)-7.630516.2941-0.4680.6418  speed            3.06301.82381.6790.0998 .pos(speed, 10)0.20872.24530.0930.9263  pos(speed, 20)4.28122.28431.8740.0673 .---Signif. codes:0***0.001**0.01*0.05‘.’ 0.1‘ ’ 1 Residual standard error:15 on 46 degrees of freedomMultiple R-squared:0.6821,Adjusted R-squared:0.6613 F-statistic:32.89 on 3 and 46 DF,  p-value: 1.643e-11 summary(reg2) Coefficients:                                          Estimate Std. Errort value Pr(&gt;|t|)(Intercept)4.6219.3440.4950.6233    bs(speed, degree =1, knots=c(10, 20))118.37810.9431.6790.0998 .  bs(speed, degree =1, knots=c(10, 20))251.09410.0405.089 6.51e-06 ***bs(speed, degree =1, knots=c(10, 20))388.85912.0477.376 2.49e-09 ***---Signif. codes:0***0.001**0.01*0.05‘.’ 0.1‘ ’ 1 Residual standard error:15 on 46 degrees of freedomMultiple R-squared:0.6821,Adjusted R-squared:0.6613 F-statistic:32.89 on 3 and 46 DF,  p-value: 1.643e-11

Here we can see directly that the first knot was not interesting (the slope did not change significantly) while the second one was…

var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' };</p><p> (function(d, t) {var s = d.createElement(t); s.type = 'text/javascript'; s.async = true;s.src = '//cdn.viglink.com/api/vglnk.js';var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

To leave a comment for the author, please follow the link and comment on their blog: R-english – Freakonometrics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


Viewing all articles
Browse latest Browse all 12081

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>