Quantile Regression (home made, part 2)

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers.]

A few months ago, I posted a note with some homemade code for quantile regression. There was something odd in the output, and it came from a (small) mathematical problem in my equation. Since I have to teach this tomorrow, let me fix it.

Median

Consider a sample $\{y_1,\cdots,y_n\}$. To compute the median, solve
$$\min_\mu \left\lbrace\sum_{i=1}^n|y_i-\mu|\right\rbrace$$
which can be solved using linear programming techniques. More precisely, this problem is equivalent to
$$\min_{\mu,\mathbf{a},\mathbf{b}}\left\lbrace\sum_{i=1}^n(a_i+b_i)\right\rbrace$$
with $a_i,b_i\geq 0$ and $y_i-\mu=a_i-b_i$, $\forall i=1,\cdots,n$.

Heuristically, the idea is to write $y_i=\mu+\varepsilon_i$, and then define the $a_i$'s and $b_i$'s so that $\varepsilon_i=a_i-b_i$ and $|\varepsilon_i|=a_i+b_i$, i.e.
$$a_i=(\varepsilon_i)_+=\max\lbrace0,\varepsilon_i\rbrace=|\varepsilon_i|\cdot\boldsymbol{1}_{\varepsilon_i>0}$$
and
$$b_i=(-\varepsilon_i)_+=\max\lbrace0,-\varepsilon_i\rbrace=|\varepsilon_i|\cdot\boldsymbol{1}_{\varepsilon_i<0}$$
denote respectively the positive and the negative parts.

Unfortunately (that was the error in my previous post), the standard form of a linear program is
$$\min_{\mathbf{z}}\left\lbrace\boldsymbol{c}^\top\mathbf{z}\right\rbrace\text{ s.t. }\boldsymbol{A}\mathbf{z}=\boldsymbol{b},\ \mathbf{z}\geq\boldsymbol{0}$$
In the equation above, with the $a_i$'s and $b_i$'s, we're not far away. Except that we have $\mu\in\mathbb{R}$, while every variable should be nonnegative. So similarly, set $\mu=\mu^+-\mu^-$ where $\mu^+=(\mu)_+$ and $\mu^-=(-\mu)_+$.
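As a quick sanity check of the positive and negative parts, here is a minimal sketch in base R. The small simulated sample and the names `eps`, `a_part`, `b_part` and `sad` are assumptions chosen for illustration, not part of the original code.

```r
# Minimal sketch (illustrative names): positive and negative parts of residuals
set.seed(1)
y = rlnorm(11)           # small simulated sample, assumed for illustration
mu = median(y)
eps = y - mu             # residuals epsilon_i = y_i - mu

a_part = pmax(eps, 0)    # a_i = (epsilon_i)_+
b_part = pmax(-eps, 0)   # b_i = (-epsilon_i)_+

# epsilon_i = a_i - b_i and |epsilon_i| = a_i + b_i
all.equal(eps, a_part - b_part)
all.equal(abs(eps), a_part + b_part)

# the sum of absolute deviations at the median is not beaten on a grid of values
sad = function(m) sum(abs(y - m))
grid = seq(min(y), max(y), length.out = 201)
all(sapply(grid, sad) >= sad(mu) - 1e-12)
```

The last line only compares against a finite grid, so it is a numerical illustration rather than a proof.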

Thus, let
$$\mathbf{z}=\big(\mu^+;\mu^-;\boldsymbol{a};\boldsymbol{b}\big)^\top\in\mathbb{R}_+^{2n+2}$$
and then write the constraint as $\boldsymbol{A}\mathbf{z}=\boldsymbol{b}$ with $\boldsymbol{b}=\boldsymbol{y}$ and
$$\boldsymbol{A}=\big[\boldsymbol{1}_n;-\boldsymbol{1}_n;\mathbb{I}_n;-\mathbb{I}_n\big]$$
And for the objective function
$$\boldsymbol{c}=\big(0,0,\boldsymbol{1}_n,\boldsymbol{1}_n\big)^\top\in\mathbb{R}_+^{2n+2}$$
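To make the matrix form concrete, the sketch below builds the constraint matrix and the cost vector for a tiny sample and checks that, for an arbitrary candidate value of the location parameter, the stacked vector is feasible and its objective value is the sum of absolute deviations. The three data points, the candidate value and the name `cost` are assumptions chosen for illustration; nothing is optimized here.

```r
# Minimal sketch: the LP pieces for n = 3 (illustrative data, not from the post)
y = c(2, -1, 5)
n = length(y)

A = cbind(rep(1, n), -rep(1, n), diag(n), -diag(n))  # n x (2n+2)
cost = c(0, 0, rep(1, n), rep(1, n))                 # zero cost on mu^+ and mu^-

# stack z = (mu^+, mu^-, a, b) for an arbitrary candidate mu (negative here)
mu = -0.5
eps = y - mu
z = c(max(mu, 0), max(-mu, 0), pmax(eps, 0), pmax(-eps, 0))

dim(A)                                      # 3 rows, 8 columns
all(z >= 0)                                 # z is nonnegative
all.equal(drop(A %*% z), y)                 # equality constraints A z = y hold
all.equal(sum(cost * z), sum(abs(y - mu)))  # objective = sum of |y_i - mu|
```

A negative candidate is used on purpose: it is the case that the split into two nonnegative parts is there to handle.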

To illustrate, consider a sample from a lognormal distribution,

    n = 101
    set.seed(1)
    y = rlnorm(n)
    median(y)
    # [1] 1.077415

For the optimization problem, use the matrix form, with $n$ equality constraints and $2n+2$ nonnegative variables,

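    library(lpSolve)
    X = rep(1, n)
    A = cbind(X, -X, diag(n), -diag(n))      # columns: mu^+, mu^-, a, b
    b = y
    c = c(rep(0, 2), rep(1, n), rep(1, n))   # cost only on the a_i's and b_i's
    equal_type = rep("=", n)
    r = lp("min", c, A, equal_type, b)
    head(r$solution, 1)                      # first entry is mu^+ (mu^- is 0 here)
    # [1] 1.077415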

It looks like it’s working well…

Quantile

Of course, we can adapt our previous code for quantiles

    tau = .3
    quantile(y, tau)
    #       30% 
    # 0.6741586

The linear program is now
$$\min_{q^+,q^-,\mathbf{a},\mathbf{b}}\left\lbrace\sum_{i=1}^n\big(\tau a_i+(1-\tau)b_i\big)\right\rbrace$$
with $a_i,b_i,q^+,q^-\geq 0$ and $y_i=q^+-q^-+a_i-b_i$, $\forall i=1,\cdots,n$. The R code is now

    c = c(rep(0, 2), tau * rep(1, n), (1 - tau) * rep(1, n))
    r = lp("min", c, A, equal_type, b)
    head(r$solution, 1)
    # [1] 0.6741586

So far so good…
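One way to cross-check the linear program without a solver is to evaluate its objective directly. At the optimum the two slack variables of each observation are the positive and negative parts of the residual, so the objective becomes the check (pinball) loss. The sketch below, in base R, minimizes that loss over the sample points; the function name `check_loss` is an assumption for illustration.

```r
# Minimal sketch: minimize the quantile (check) loss by direct search
n = 101
set.seed(1)
y = rlnorm(n)            # same simulated sample as in the post
tau = .3

# LP objective once a_i and b_i are the positive/negative parts of y_i - q
check_loss = function(q) sum(tau * pmax(y - q, 0) + (1 - tau) * pmax(q - y, 0))

# the loss is piecewise linear in q, so a minimizer can be found among the y_i
losses = sapply(y, check_loss)
q_hat = y[which.min(losses)]

q_hat
unname(quantile(y, tau)) # the post reports 0.6741586 for this sample
```

The agreement with `quantile()` is specific to this setting: with 101 observations and a level of 0.3, the default quantile definition falls exactly on an order statistic, so no interpolation is involved.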

Quantile Regression

Consider the following dataset, with rents of flats in a major German city, as a function of the surface, the year of construction, etc.

    base = read.table("http://freakonometrics.free.fr/rent98_00.txt", header = TRUE)

The linear program for the quantile regression is now
$$\min_{\boldsymbol{\beta}^+,\boldsymbol{\beta}^-,\mathbf{a},\mathbf{b}}\left\lbrace\sum_{i=1}^n\big(\tau a_i+(1-\tau)b_i\big)\right\rbrace$$
with $a_i,b_i\geq 0$ and $y_i=\boldsymbol{x}_i^\top[\boldsymbol{\beta}^+-\boldsymbol{\beta}^-]+a_i-b_i$, $\forall i=1,\cdots,n$, and $\beta_j^+,\beta_j^-\geq 0$, $\forall j=0,\cdots,k$. So use here

    require(lpSolve)
    tau = .3
    n = nrow(base)
    X = cbind(1, base$area)
    y = base$rent_euro
    K = ncol(X)
    N = nrow(X)
    A = cbind(X, -X, diag(N), -diag(N))
    c = c(rep(0, 2 * ncol(X)), tau * rep(1, N), (1 - tau) * rep(1, N))
    b = base$rent_euro
    const_type = rep("=", N)
    r = lp("min", c, A, const_type, b)
    beta = r$solution[1:K] - r$solution[(1:K + K)]
    beta
    # [1] 148.946864   3.289674
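In matrix form, the code above appears to solve the standard-form program with the following blocks, written in the notation of the median section. Here $\boldsymbol{X}$ denotes the $n\times(k+1)$ design matrix whose first column is $\boldsymbol{1}_n$; this is a restatement of the listing, not an additional result.

$$\mathbf{z}=\big(\boldsymbol{\beta}^+;\boldsymbol{\beta}^-;\boldsymbol{a};\boldsymbol{b}\big)^\top\in\mathbb{R}_+^{2(k+1)+2n},\qquad \boldsymbol{A}=\big[\boldsymbol{X};-\boldsymbol{X};\mathbb{I}_n;-\mathbb{I}_n\big],\qquad \boldsymbol{b}=\boldsymbol{y}$$
$$\boldsymbol{c}=\big(\boldsymbol{0}_{2(k+1)},\tau\boldsymbol{1}_n,(1-\tau)\boldsymbol{1}_n\big)^\top$$

The fitted coefficients are recovered as $\boldsymbol{\beta}=\boldsymbol{\beta}^+-\boldsymbol{\beta}^-$, which is what the last assignment to `beta` computes, with `K` equal to $k+1$.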

Of course, we can use an R function to fit that model

    library(quantreg)
    rq(rent_euro ~ area, tau = tau, data = base)
    # Coefficients:
    # (Intercept)        area 
    #  148.946864    3.289674

Here again, it seems to work quite well. We can use a different probability level, of course, and get a plot

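    plot(base$area, base$rent_euro, xlab = expression(paste("surface (", m^2, ")")),
         ylab = "rent (euros/month)", col = rgb(0, 0, 1, .4), cex = .5)
    sf = 0:250
    # coefficients are beta^+ - beta^-, stored in the first 2K entries of the solution
    beta = r$solution[1:K] - r$solution[(1:K) + K]
    yr = beta[1] + beta[2] * sf
    lines(sf, yr, lwd = 2, col = "blue")
    tau = .9
    # the cost vector depends on tau, so rebuild it before solving again
    c = c(rep(0, 2 * K), tau * rep(1, N), (1 - tau) * rep(1, N))
    r = lp("min", c, A, const_type, b)
    beta = r$solution[1:K] - r$solution[(1:K) + K]
    beta
    # the original listing reports [1] 121.815505 7.865536 at this step
    yr = beta[1] + beta[2] * sf
    lines(sf, yr, lwd = 2, col = "blue")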

And we can adapt the latter to multiple regressions, of course,

    X = cbind(1, base$area, base$yearc)
    K = ncol(X)
    N = nrow(X)
    A = cbind(X, -X, diag(N), -diag(N))
    c = c(rep(0, 2 * ncol(X)), tau * rep(1, N), (1 - tau) * rep(1, N))
    b = base$rent_euro
    const_type = rep("=", N)
    r = lp("min", c, A, const_type, b)
    beta = r$solution[1:K] - r$solution[(1:K + K)]
    beta
    # [1] -5542.503252     3.978135     2.887234

to be compared with

    library(quantreg)
    rq(rent_euro ~ area + yearc, tau = tau, data = base)
    # Coefficients:
    #  (Intercept)         area        yearc 
    # -5542.503252     3.978135     2.887234 
    #
    # Degrees of freedom: 4571 total; 4568 residual
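
Since the same construction is used twice, it can be wrapped in a small helper. The sketch below is one possible way to do it: the function names `build_qr_lp` and `solve_qr_lp`, the list element names and the simulated data are all assumptions for illustration, and the solve step relies on `lp()` from lpSolve as used in the post.

```r
# Minimal sketch: build the standard-form LP pieces for quantile regression
build_qr_lp = function(X, y, tau) {
  X = as.matrix(X)
  N = nrow(X)
  K = ncol(X)
  list(
    cost  = c(rep(0, 2 * K), rep(tau, N), rep(1 - tau, N)),  # objective vector
    A     = cbind(X, -X, diag(N), -diag(N)),                 # N x (2K + 2N)
    types = rep("=", N),                                     # equality constraints
    rhs   = y,
    K     = K
  )
}

# Solve with lpSolve and recover beta = beta^+ - beta^-
solve_qr_lp = function(X, y, tau) {
  p = build_qr_lp(X, y, tau)
  r = lpSolve::lp("min", p$cost, p$A, p$types, p$rhs)
  r$solution[1:p$K] - r$solution[(1:p$K) + p$K]
}

# Illustrative data (simulated, not the rent dataset)
set.seed(1)
x = runif(50)
y = 1 + 2 * x + rnorm(50)
X = cbind(1, x)
p = build_qr_lp(X, y, tau = .3)
dim(p$A)                       # 50 rows, 2*2 + 2*50 = 104 columns

# the solve step is skipped when lpSolve is not installed
if (requireNamespace("lpSolve", quietly = TRUE)) {
  beta = solve_qr_lp(X, y, tau = .3)
  print(beta)
}
```

Note that the constraint matrix contains two dense identity blocks, so its size grows quadratically with the number of observations; for the rent dataset this is already a large matrix.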