Quantile Regression (home made, part 2)

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers].

A few months ago, I posted a note with some home-made code for quantile regression. There was something odd in the output, and it came from a (small) mathematical problem in my equation. Since I should teach this tomorrow, let me fix it.

Median

Consider a sample $\{y_1,\cdots,y_n\}$. To compute the median, solve
$$\min_\mu \left\lbrace\sum_{i=1}^n|y_i-\mu|\right\rbrace$$
which can be solved using linear programming techniques. More precisely, this problem is equivalent to
$$\min_{\mu,\mathbf{a},\mathbf{b}}\left\lbrace\sum_{i=1}^n (a_i+b_i)\right\rbrace$$
with $a_i,b_i\geq 0$ and $y_i-\mu=a_i-b_i$, $\forall i=1,\cdots,n$. Heuristically, the idea is to write $y_i=\mu+\varepsilon_i$, and then define the $a_i$'s and $b_i$'s so that $\varepsilon_i=a_i-b_i$ and $|\varepsilon_i|=a_i+b_i$, i.e.
$$a_i=(\varepsilon_i)_+=\max\lbrace0,\varepsilon_i\rbrace=|\varepsilon_i|\cdot\boldsymbol{1}_{\varepsilon_i>0}$$
and
$$b_i=(-\varepsilon_i)_+=\max\lbrace0,-\varepsilon_i\rbrace=|\varepsilon_i|\cdot\boldsymbol{1}_{\varepsilon_i<0}$$
which are respectively the positive and the negative parts of $\varepsilon_i$.

Unfortunately (that was the error in my previous post), the standard form of a linear program is
$$\min_{\mathbf{z}}\left\lbrace\boldsymbol{c}^\top\mathbf{z}\right\rbrace\text{ s.t. }\boldsymbol{A}\mathbf{z}=\boldsymbol{b},\mathbf{z}\geq\boldsymbol{0}$$
In the equation above, with the $a_i$'s and $b_i$'s, we're not far away. Except that we have $\mu\in\mathbb{R}$, while every variable of the program should be nonnegative. So similarly, set $\mu=\mu^+-\mu^-$ where $\mu^+=(\mu)_+$ and $\mu^-=(-\mu)_+$.

Thus, let
$$\mathbf{z}=\big(\mu^+,\mu^-,\mathbf{a},\mathbf{b}\big)^\top\in\mathbb{R}_+^{2n+2}$$
and then write the constraint as $\boldsymbol{A}\mathbf{z}=\boldsymbol{b}$ with $\boldsymbol{b}=\boldsymbol{y}$ and
$$\boldsymbol{A}=\big[\boldsymbol{1}_n,-\boldsymbol{1}_n,\mathbb{I}_n,-\mathbb{I}_n\big]$$
And for the objective function, since only the $a_i$'s and $b_i$'s enter the sum,
$$\boldsymbol{c}=\big(0,0,\boldsymbol{1}_n,\boldsymbol{1}_n\big)^\top\in\mathbb{R}_+^{2n+2}$$

To illustrate, consider a sample from a lognormal distribution,

n = 101
set.seed(1)
y = rlnorm(n)
median(y)
# [1] 1.077415
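As a quick sanity check before turning to linear programming, the objective can be minimized by brute force. This is only a sketch: it relies on the fact that the sum of absolute deviations is piecewise linear and convex, so a minimizer is attained at one of the observations. The names l1_loss, obj and mu_hat are introduced here for illustration and are not part of the original code.

```r
# Same sample as above
n = 101
set.seed(1)
y = rlnorm(n)

# Sum of absolute deviations around a candidate value m
l1_loss = function(m) sum(abs(y - m))

# Evaluate the objective at every observation and keep the best one
obj = sapply(y, l1_loss)
mu_hat = y[which.min(obj)]

# With n odd, the minimizer is unique and should be the sample median
all.equal(mu_hat, median(y))
```

The brute-force search costs O(n^2) operations, so it is only meant as a check on small samples, not as a replacement for the linear program.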

For the optimization problem, use the matrix form, with $n$ equality constraints and $2n+2$ nonnegative variables,

library(lpSolve)
X = rep(1,n)
A = cbind(X, -X, diag(n), -diag(n))
b = y
c = c(rep(0,2), rep(1,n), rep(1,n))
equal_type = rep("=", n)
r = lp("min", c, A, equal_type, b)
# first component is mu+ (here mu- is zero, since the sample is positive)
head(r$solution,1)
# [1] 1.077415

It looks like it’s working well…
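To make the change of variables concrete, one can also build a feasible point of the program by hand, without calling the solver. The sketch below uses base R only; it takes μ equal to the sample median, forms the positive and negative parts, and checks that the equality constraints hold and that the linear objective equals the sum of absolute deviations. The names eps, z and cvec are illustrative (cvec is used rather than c only to avoid masking the base function).

```r
n = 101
set.seed(1)
y = rlnorm(n)

mu  = median(y)
eps = y - mu                 # residuals y_i - mu
a   = pmax(eps, 0)           # positive parts
b   = pmax(-eps, 0)          # negative parts

# z = (mu+, mu-, a, b), all components nonnegative
z = c(max(mu, 0), max(-mu, 0), a, b)

X    = rep(1, n)
A    = cbind(X, -X, diag(n), -diag(n))     # n rows, 2n+2 columns
cvec = c(rep(0, 2), rep(1, n), rep(1, n))

all(z >= 0)                                  # nonnegativity
all.equal(as.vector(A %*% z), y)             # A z = y
all.equal(sum(cvec * z), sum(abs(y - mu)))   # c'z = sum |y_i - mu|
```

This only verifies feasibility and the value of the objective at that point; optimality is what the solver establishes.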

Quantile

Of course, we can adapt our previous code for quantiles

tau = .3
quantile(y,tau)
#       30%
# 0.6741586

The linear program is now
$$\min_{q^+,q^-,\mathbf{a},\mathbf{b}}\left\lbrace\sum_{i=1}^n\tau a_i+(1-\tau)b_i\right\rbrace$$
with $a_i,b_i,q^+,q^-\geq 0$ and $y_i=q^+-q^-+a_i-b_i$, $\forall i=1,\cdots,n$. The R code is now

c = c(rep(0,2), tau*rep(1,n), (1-tau)*rep(1,n))
r = lp("min", c, A, equal_type, b)
head(r$solution,1)
# [1] 0.6741586

So far so good…
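The same brute-force check as for the median can be run on the weighted objective, often called the check (or pinball) loss. This is a sketch under the same assumption as before (the objective is piecewise linear and convex, so a minimizer is attained at an observation); check_loss and q_hat are names introduced here for illustration.

```r
n = 101
set.seed(1)
y = rlnorm(n)
tau = .3

# Check loss: tau times the positive part plus (1 - tau) times the negative part
check_loss = function(q) {
  eps = y - q
  sum(tau * pmax(eps, 0) + (1 - tau) * pmax(-eps, 0))
}

# Evaluate the objective at every observation and keep the best one
obj = sapply(y, check_loss)
q_hat = y[which.min(obj)]

all.equal(q_hat, unname(quantile(y, tau)))
```

For this n and τ the minimizer coincides with the default output of quantile(); for other choices quantile() may interpolate between two observations, so the two values need not be identical in general.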

Quantile Regression

Consider the following dataset, with the rents of flats in a major German city, as a function of the surface area, the year of construction, etc.

base = read.table("http://freakonometrics.free.fr/rent98_00.txt", header=TRUE)

The linear program for the quantile regression is now
$$\min_{\boldsymbol{\beta}^+,\boldsymbol{\beta}^-,\mathbf{a},\mathbf{b}}\left\lbrace\sum_{i=1}^n\tau a_i+(1-\tau)b_i\right\rbrace$$
with $a_i,b_i\geq 0$ and $y_i=\boldsymbol{x}_i^\top[\boldsymbol{\beta}^+-\boldsymbol{\beta}^-]+a_i-b_i$, $\forall i=1,\cdots,n$, and $\beta_j^+,\beta_j^-\geq 0$, $\forall j=0,\cdots,k$. So use here

require(lpSolve)
tau = .3
n = nrow(base)
X = cbind(1, base$area)
y = base$rent_euro
K = ncol(X)
N = nrow(X)
A = cbind(X, -X, diag(N), -diag(N))
c = c(rep(0,2*ncol(X)), tau*rep(1,N), (1-tau)*rep(1,N))
b = base$rent_euro
const_type = rep("=",N)
r = lp("min", c, A, const_type, b)
# beta = beta+ - beta-
beta = r$sol[1:K] - r$sol[(1:K+K)]
beta
# [1] 148.946864   3.289674
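In matrix form, the code above mirrors the construction used for the median, with the column of ones replaced by the design matrix. As a sketch in the same notation, writing $\boldsymbol{X}$ for the $n\times(k+1)$ matrix whose rows are the $\boldsymbol{x}_i^\top$ (a symbol assumed here, taken from the R code rather than from the equations):

```latex
\mathbf{z}=\big(\boldsymbol{\beta}^+,\boldsymbol{\beta}^-,\mathbf{a},\mathbf{b}\big)^\top\in\mathbb{R}_+^{2(k+1)+2n},
\qquad
\boldsymbol{A}=\big[\boldsymbol{X},\,-\boldsymbol{X},\,\mathbb{I}_n,\,-\mathbb{I}_n\big],
\qquad
\boldsymbol{b}=\boldsymbol{y},
\qquad
\boldsymbol{c}=\big(\boldsymbol{0}_{2(k+1)},\,\tau\boldsymbol{1}_n,\,(1-\tau)\boldsymbol{1}_n\big)^\top
```

With $k=1$ (intercept and area) this gives $4+2n$ variables and $n$ equality constraints, which matches the dimensions of A and c in the R code.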

Of course, we can use the rq function of the quantreg package to fit that model

library(quantreg)
rq(rent_euro~area, tau=tau, data=base)
# Coefficients:
# (Intercept)        area
#  148.946864    3.289674

Here again, it seems to work quite well. We can use a different probability level, of course, and get a plot

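plot(base$area, base$rent_euro, xlab=expression(paste("surface (",m^2,")")),
     ylab="rent (euros/month)", col=rgb(0,0,1,.4), cex=.5)
sf = 0:250
# fitted line for tau = .3 (beta was computed above)
yr = beta[1] + beta[2]*sf
lines(sf, yr, lwd=2, col="blue")
tau = .9
# the objective depends on tau, so it has to be rebuilt before solving again
c = c(rep(0,2*K), tau*rep(1,N), (1-tau)*rep(1,N))
r = lp("min", c, A, const_type, b)
beta = r$sol[1:K] - r$sol[(1:K+K)]
beta
# [1] 121.815505   7.865536
yr = beta[1] + beta[2]*sf
lines(sf, yr, lwd=2, col="blue")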

And we can adapt the latter to multiple regression, of course,

X = cbind(1, base$area, base$yearc)
K = ncol(X)
N = nrow(X)
A = cbind(X, -X, diag(N), -diag(N))
c = c(rep(0,2*ncol(X)), tau*rep(1,N), (1-tau)*rep(1,N))
b = base$rent_euro
const_type = rep("=",N)
r = lp("min", c, A, const_type, b)
beta = r$sol[1:K] - r$sol[(1:K+K)]
beta
# [1] -5542.503252     3.978135     2.887234

to be compared with

library(quantreg)
rq(rent_euro~ area + yearc, tau=tau, data=base)
# Coefficients:
#  (Intercept)         area        yearc
# -5542.503252     3.978135     2.887234
#
# Degrees of freedom: 4571 total; 4568 residual
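Since the same steps are repeated for each model and each probability level, they can be wrapped in a small helper. The function below is a hypothetical sketch, not part of the original post: qr_lp is an invented name, the data are simulated only to exercise it, and it assumes the lpSolve package is installed. The final check uses a known property of quantile regression with an intercept: the number of negative residuals is at most nτ, and nτ is at most the number of negative plus zero residuals.

```r
library(lpSolve)

# Hypothetical helper: quantile regression solved as a linear program
# X: design matrix including a column of ones, y: response, tau: probability level
qr_lp = function(X, y, tau) {
  N = nrow(X)
  K = ncol(X)
  A = cbind(X, -X, diag(N), -diag(N))      # y = X (beta+ - beta-) + a - b
  cvec = c(rep(0, 2 * K), rep(tau, N), rep(1 - tau, N))
  r = lp("min", cvec, A, rep("=", N), y)
  r$solution[1:K] - r$solution[(1:K) + K]  # beta = beta+ - beta-
}

# Simulated data, used only to exercise the helper
set.seed(1)
N = 60
x = runif(N, 0, 10)
y = 5 + 2 * x + rnorm(N)
X = cbind(1, x)

tau = .3
beta_hat = qr_lp(X, y, tau)
res = y - as.vector(X %*% beta_hat)

# Counts of negative and (numerically) zero residuals, to compare with N * tau
c(negative = sum(res < -1e-6), zero = sum(abs(res) <= 1e-6), n_tau = N * tau)
```

Because diag(N) is stored as a dense matrix, this formulation becomes memory hungry for large samples; for real work rq() from quantreg remains the natural choice.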
