R Shiny {golem} – Designing the UI – Part 1 – Development to Production

[This article was first published on Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Welcome to the first in a series of blog posts surrounding R’s {golem} package by Colin Fay. In case you missed it, we laid out what these posts will cover in last week’s post.

One main step of our development process is to think about how we want the app to look structurally and aesthetically. This is an important step since it directly influences the user’s experience. Before sketching out the app, we must have a better understanding of the data that is available to us. In our scenario with The Office, we have data regarding: advertising, revenue, ratings, and the script.

Let’s take a look at these datasets:

Script

image1.png

Each row in the script dataset contains information surrounding the season, episode, script text, the character speaking, and whether the line was deleted or not. This dataset is great for a text analysis. We can likely disregard the “deleted” column and work on breaking down our script analysis by seasons and episodes.

Ratings

image2.png

The rows in this dataset provide us with a season number, title, air date, rating, number of votes, description, director(s), and writer(s) for each episode. However, we notice that the episode’s number is not included. We will add an episode number identifier later on. Similar to the script data, we can gain insight for ratings across episodes, writers, and directors.
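One way to add such an episode identifier, sketched in base R, is to number the episodes within each season in air-date order. The column names `season`, `title`, and `air_date`, and the toy rows, are assumptions made for illustration; they are not taken from the actual ratings file.

```r
# Toy stand-in for the ratings data (made-up rows, assumed column names)
ratings <- data.frame(
  season   = c(2, 1, 1, 2, 1),
  title    = c("D", "B", "A", "E", "C"),
  air_date = as.Date(c("2005-09-20", "2005-03-29", "2005-03-24",
                       "2005-09-27", "2005-04-05"))
)

# Sort by season and air date, then count rows within each season
ratings <- ratings[order(ratings$season, ratings$air_date), ]
ratings$episode <- ave(seq_along(ratings$season), ratings$season,
                       FUN = seq_along)

ratings[, c("season", "episode", "title")]
```

This relies on the rows for a season being complete and on air dates being unique within a season; if either assumption fails, the numbering would need another source.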

Revenue and Advertising

image3.png

For the purpose of this example, we simulated data to represent revenue and advertising. The simulated revenue and advertising values start on the first episode air date, 3/24/05, and end on the last episode air date. The advertising sectors included are: Google, Facebook, Instagram, and display.

Based on specifications decided by the client, we agreed to the following layout for the Shiny dashboard:

image3.png

From this glimpse of the app’s structure, we see that the sales analysis contains an overview page and a breakdown by seasons. We’re able to analyze ratings further by looking into episodes, characters, writers, and directors. The dashboard also includes a script analysis. Additionally, we can begin to think about possible plots and charts to incorporate: perhaps a plot of ratings across episodes for each season, a table of average seasonal ratings, or a pie chart of advertising expenditures.

Thinking ahead, we’ll create a module for each of these tabs to keep the code organized.

In our next post we will initialize our project in R and discuss the files/tools that come with golem. Until next time…

“This is a dream that I’ve had…since lunch…and I’m not giving it up now.” – Michael Scott

SCRIPT_DATA <- readr::read_csv('The Office - Script Data.csv')
head(SCRIPT_DATA)

IMDB_DATA <- readr::read_csv('The Office - IMDB Data.csv')
head(IMDB_DATA)

REVENUE_ADV <- readr::read_csv('revenue_and_adv.csv')
head(REVENUE_ADV)

Data sources used:

https://data.world/abhinavr8/the-office-scripts-dataset

https://www.kaggle.com/kapastor/the-office-imdb-ratings-per-episode

To leave a comment for the author, please follow the link and comment on their blog: Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting.

The post R Shiny {golem} - Designing the UI - Part 1 - Development to Production first appeared on R-bloggers.


Lilliefors, Kolmogorov-Smirnov and cross-validation

[This article was first published on R-english – Freakonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

In statistics, the Kolmogorov–Smirnov test is a popular procedure to test whether a sample \(\{x_1,\cdots,x_n\}\) is drawn from a distribution \(F\), or usually \(F_{\theta_0}\), where \(F_{\theta}\) is some parametric distribution. For instance, we can test \(H_0:X_i\sim\mathcal{N}(0,1)\) (where \(\theta_0=(\mu_0,\sigma_0^2)=(0,1)\)) using that test. More specifically, I wanted to discuss \(p\)-values today. Given \(n\), let us draw \(\mathcal{N}(0,1)\) samples of size \(n\), and compute the \(p\)-values of Kolmogorov–Smirnov tests

n = 300
p = rep(NA, 1e5)
for(s in 1:1e5){
  X = rnorm(n, 0, 1)
  p[s] = ks.test(X, "pnorm", 0, 1)$p.value
}

We can visualise the distribution of the \(p\)-values below (I added some Beta distribution fit here)

library(fitdistrplus)
fit.dist = fitdist(p, "beta")
hist(p, probability = TRUE, main = "", xlab = "", ylab = "")
vu = seq(0, 1, by = .01)
vv = dbeta(vu, shape1 = fit.dist$estimate[1], shape2 = fit.dist$estimate[2])
lines(vu, vv, col = "dark red", lwd = 2)

It looks like it is quite uniform (theoretically, the \(p\)-value is uniform). More specifically, the \(p\)-value was lower than 5% in 5% of the samples

[note: here I compute ‘mean(p<=.05)’, but I have some trouble with the ‘<’ and ‘>’ symbols, as always]

mean(p<=.05)
[1] 0.0479

i.e. we wrongly reject \(H_0:X_i\sim\mathcal{N}(0,1)\) in 5% of the samples.

As discussed previously on the blog, in many cases, we do care about the distribution, and not really the parameters, so we wish to test something like \(H_0:X_i\sim\mathcal{N}(\mu,\sigma^2)\), for some \(\mu\) and \(\sigma^2\). Therefore, a natural idea is to test \(H_0:X_i\sim\mathcal{N}(\hat\mu,\hat\sigma^2)\), for some estimates of \(\mu\) and \(\sigma^2\). That’s the idea of the Lilliefors test. More specifically, the Lilliefors test uses the Kolmogorov–Smirnov statistic, but corrects the \(p\)-value. Indeed, if we draw many samples, and use the Kolmogorov–Smirnov statistic and its classical \(p\)-value to test for \(H_0:X_i\sim\mathcal{N}(\hat\mu,\hat\sigma^2)\),

n = 300
p = rep(NA, 1e5)
for(s in 1:1e5){
  X = rnorm(n, 0, 1)
  p[s] = ks.test(X, "pnorm", mean(X), sd(X))$p.value
}

we see clearly that the distribution of \(p\)-values is no longer uniform

fit.dist = fitdist(p, "beta")
hist(p, probability = TRUE, main = "", xlab = "", ylab = "")
vu = seq(0, 1, by = .01)
vv = dbeta(vu, shape1 = fit.dist$estimate[1], shape2 = fit.dist$estimate[2])
lines(vu, vv, col = "dark red", lwd = 2)

More specifically, if the \(x_i\)’s are actually drawn from some Gaussian distribution, there is almost no chance of rejecting \(H_0\), the \(p\)-value being almost never below 5%

mean(p<=.05)
[1] 0.00012

Usually, to interpret that result, the heuristic is that \(\hat\mu\) and \(\hat\sigma^2\) are both based on the sample, while previously \(0\) and \(1\) were based on some prior knowledge. Somehow, it reminded me of the classical problem we mention when we introduce cross-validation, which is Goodhart’s law

When a measure becomes a target, it ceases to be a good measure

i.e. we cannot assess goodness of fit using the same data as the ones used to estimate parameters. So here, why not use some hold-out (or cross-validation) procedure: split the dataset in two parts, \(\{x_1,\cdots,x_k\}\) (with \(k<n\)) and \(\{x_{k+1},\cdots,x_n\}\), estimate \(\mu\) and \(\sigma^2\) on one part, and compute the Kolmogorov–Smirnov statistic on the other part to test if the \(x_i\)’s are drawn from some Gaussian distribution. More precisely, will the \(p\)-value computed using the standard Kolmogorov–Smirnov procedure be ok here? Here, I tried two scenarios, \(k/n\) being either \(1/3\) or \(2/3\),

p = matrix(NA, 1e5, 4)
for(s in 1:1e5){
  X = rnorm(n, 0, 1)
  p[s,1] = ks.test(X, "pnorm", 0, 1)$p.value
  p[s,2] = ks.test(X, "pnorm", mean(X), sd(X))$p.value
  p[s,3] = ks.test(X[1:200], "pnorm", mean(X[201:300]), sd(X[201:300]))$p.value
  p[s,4] = ks.test(X[201:300], "pnorm", mean(X[1:200]), sd(X[1:200]))$p.value
}

Again, we can visualize the distributions of \(p\)-values,  in the case where \(1/3\) of the data is used to estimate \(\mu\) and \(\sigma^2\), and \(2/3\) of the data is used to test

fit.dist = fitdist(p[,3], "beta")
hist(p[,3], probability = TRUE, main = "", xlab = "", ylab = "")
vu = seq(0, 1, by = .01)
vv = dbeta(vu, shape1 = fit.dist$estimate[1], shape2 = fit.dist$estimate[2])
lines(vu, vv, col = "dark red", lwd = 2)

and in the case where \(2/3\) of the data is used to estimate \(\mu\) and \(\sigma^2\), and \(1/3\) of the data is used to test

fit.dist = fitdist(p[,4], "beta")
hist(p[,4], probability = TRUE, main = "", xlab = "", ylab = "")
vu = seq(0, 1, by = .01)
vv = dbeta(vu, shape1 = fit.dist$estimate[1], shape2 = fit.dist$estimate[2])
lines(vu, vv, col = "dark red", lwd = 2)

Observe here that we (wrongly) reject \(H_0\) too frequently, since the \(p\)-values are below 5% in about 24% of the scenarios in the first case (less data used to estimate), and in about 9% of the scenarios in the second case (less data used to test)

mean(p[,3]<=.05)
[1] 0.24168
mean(p[,4]<=.05)
[1] 0.09334

We can actually compute that probability as a function of \(k/n\)

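n = 300
p = matrix(NA, 1e4, 99)
for(s in 1:1e4){
  X = rnorm(n, 0, 1)
  # q is the fraction of the sample used to compute the statistic;
  # the remaining observations are used to estimate the parameters
  KS = function(q){
    m = round(q*n)  # round() guards against floating-point error in q*n
    ks.test(X[1:m], "pnorm", mean(X[(m+1):n]), sd(X[(m+1):n]))$p.value
  }
  p[s,] = Vectorize(KS)((1:99)/100)
}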

The evolution of the probability is the following

prob5pc = apply(p, 2, function(x) mean(x<=.05))
plot((1:99)/100, prob5pc)

So, it looks like we can use some sort of hold-out procedure to test for \(H_0:X_i\sim\mathcal{N}(\mu,\sigma^2)\), for some \(\mu\) and \(\sigma^2\), using the Kolmogorov–Smirnov test with \(\mu=\hat\mu\) and \(\sigma^2=\hat\sigma^2\), but the proportion of data used to estimate those quantities should be (much) larger than the one used to compute the statistic. Otherwise, we clearly reject \(H_0\) too frequently.
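For comparison, the \(p\)-value correction behind the Lilliefors test mentioned earlier can be approximated by simulation. The sketch below is not code from the original post: `ks_stat()`, `lilliefors_mc()` and `n_sim` are hypothetical names introduced here, and the Monte Carlo approach is one possible way to obtain the corrected \(p\)-value. It relies on the fact that the statistic, computed with \(\hat\mu\) and \(\hat\sigma\) plugged in, does not change when the sample is shifted or rescaled, so its null distribution can be simulated from \(\mathcal{N}(0,1)\) samples of the same size.

```r
# Kolmogorov-Smirnov distance between a sample and the Gaussian
# distribution fitted on that same sample (hypothetical helper)
ks_stat <- function(z) {
  n <- length(z)
  u <- pnorm(sort(z), mean(z), sd(z))
  max(pmax(seq_len(n) / n - u, u - (seq_len(n) - 1) / n))
}

# Monte Carlo p-value: share of simulated Gaussian samples whose
# statistic is at least as large as the observed one
lilliefors_mc <- function(x, n_sim = 2000) {
  d_obs  <- ks_stat(x)
  d_null <- replicate(n_sim, ks_stat(rnorm(length(x))))
  mean(d_null >= d_obs)
}

set.seed(123)
x_norm <- rnorm(300, mean = 5, sd = 2)  # Gaussian, parameters unknown to the test
x_exp  <- rexp(300)                     # clearly not Gaussian
p_norm <- lilliefors_mc(x_norm)
p_exp  <- lilliefors_mc(x_exp)
c(p_norm, p_exp)
```

With this correction the whole sample is used both to estimate the parameters and to compute the statistic, which is the situation the hold-out procedure was trying to avoid.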


Exploring the game “First Orchard” with simulation in R

[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

My daughter received the board game First Orchard as a Christmas present and she’s hooked on it so far. In playing the game with her, a few probability/statistics questions came to mind. This post outlines how I answered some of them using simulation in R. All code for this blog post can be found here.

(In my googling I found that Matt Lane has an excellent blog post on this game, answering some of the questions that I was interested in.)

Gameplay

Before we get to the questions, let me give a quick explanation of how the game works (see Reference 1 for a more colorful explanation as well as an applet to play the game online).

  • It’s a cooperative game, with all players playing against the board.
  • The game starts with 16 fruit: 4 of each color (red, green, blue, yellow), and a raven at one end of a path that is 5 steps long.
  • On each player’s turn, the player rolls a 6-sided die.
    • If the die comes up red, green, blue or yellow, the player gets to “harvest” a fruit of that color if there are any left to harvest. If all 4 fruits of that color have been harvested, nothing happens.
    • If the die shows a fruit basket, the player gets to harvest a fruit of any color.
    • If the die shows a raven, the raven moves one step along the path.
  • The game ends when either all the fruit are harvested (players win) or when the raven reaches the end of the path (raven wins).

As you can see this is a really simple game (hence its “suitable for 2+ years” rating). The only strategy is in choosing what fruit to take if a fruit basket shows up: see Reference 1 for some simulations for different strategies. Intuitively it seems like choosing the color with the most fruit remaining is the best strategy, and that’s what I bake into my code. (Is there a proof for this, and is this still true in more general circumstances described in Reference 1?)

Code for simulating the game

The state of the game can be captured in a numeric vector of length 5. The first 4 numbers refer to the number of fruit left for each color, and the 5th number keeps track of the number of steps the raven has taken so far. I created 3 functions to simulate one game of First Orchard (see full code here):

  • SimulateTurn(state, verbose) takes one dice roll and updates the state of the game. For simplicity, if a 1-4 is rolled, a fruit is harvested from that corresponding tree. If 5 is rolled, the raven takes a step. A rolled 6 is taken to mean “fruit basket”, and I remove a fruit from the tree with the most remaining fruits.
  • CheckGameState(state, max_raven_steps) checks if the game has ended or not, and if so, who won.
  • SimulateGame(fruit_count, max_raven_steps, verbose) runs an entire game of First Orchard: while the game has not ended, run SimulateTurn. Once the game has ended, this function returns (i) who won, (ii) the number of turns taken, (iii) the number of steps the raven took, and (iv) the number of fruit left.

We allow for two game parameters to be defined by the user: the number of fruit of each type at the start of the game (fruit_count, default is 4) and the number of steps the raven must take in order for it to win (max_raven_steps, default is 5). The verbose option for these functions lets the user see what happened in the game. The code below is an example of the output from SimulateGame:

set.seed(1)
results <- SimulateGame(fruit_count = 2, max_raven_steps = 3,
                        verbose = TRUE)
# Roll: 1 , State: 1,2,2,2,0
# Roll: 4 , State: 1,2,2,1,0
# Roll: 1 , State: 0,2,2,1,0
# Roll: 2 , State: 0,1,2,1,0
# Roll: 5 , State: 0,1,2,1,1
# Roll: 3 , State: 0,1,1,1,1
# Roll: 6 , State: 0,0,1,1,1
# Roll: 2 , State: 0,0,1,1,1
# Roll: 3 , State: 0,0,0,1,1
# Roll: 3 , State: 0,0,0,1,1
# Roll: 1 , State: 0,0,0,1,1
# Roll: 5 , State: 0,0,0,1,2
# Roll: 5 , State: 0,0,0,1,3
# Raven wins
# # of turns: 13
# # of steps raven took: 3
# # fruit left: 1
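The three functions described above could be sketched roughly as follows. This is a re-implementation from the description only, not the author's code (which is linked above): the numeric return codes of CheckGameState, the names of the returned list elements, and the tie-breaking rule for the fruit basket (first tree with the maximum count) are assumptions, and this sketch is not expected to reproduce the exact output shown above.

```r
# state: fruit left for colors 1-4, then the steps the raven has taken
SimulateTurn <- function(state, verbose = FALSE) {
  roll <- sample(6, 1)
  if (roll <= 4) {
    # harvest a fruit of that color, if any are left
    state[roll] <- max(state[roll] - 1, 0)
  } else if (roll == 5) {
    # the raven moves one step
    state[5] <- state[5] + 1
  } else {
    # fruit basket: take from the tree with the most fruit left
    idx <- which.max(state[1:4])
    state[idx] <- max(state[idx] - 1, 0)
  }
  if (verbose) cat("Roll:", roll, ", State:", paste(state, collapse = ","), "\n")
  state
}

# Assumed codes: 0 = game still running, 1 = players won, 2 = raven won
CheckGameState <- function(state, max_raven_steps) {
  if (sum(state[1:4]) == 0) return(1)
  if (state[5] >= max_raven_steps) return(2)
  0
}

SimulateGame <- function(fruit_count = 4, max_raven_steps = 5, verbose = FALSE) {
  state <- c(rep(fruit_count, 4), 0)
  turns <- 0
  while (CheckGameState(state, max_raven_steps) == 0) {
    state <- SimulateTurn(state, verbose)
    turns <- turns + 1
  }
  list(outcome     = CheckGameState(state, max_raven_steps),
       turns       = turns,
       raven_steps = state[5],
       fruit_left  = sum(state[1:4]))
}

# Estimated player win probability for the original game (4 fruit, 5 steps)
set.seed(42)
outcomes <- replicate(2000, SimulateGame()$outcome)
win_prob <- mean(outcomes == 1)
win_prob
```

Keeping the whole game in one short numeric vector means each turn is a single vector update, so running many thousands of games stays cheap.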

Simulation time!

What is the probability of winning the game? How does that change as we vary (i) the number of fruit of each color, and (ii) the number of steps the raven must take in order for the players to lose?

We let the number of fruit of each color vary from 1 to 8, and the number of steps the raven must take from 1 to 8. For each parameter setting, we simulate the game 10,000 times and compute the player win probability. We plot the results as a heatmap.

As one might expect, win probability goes down as the number of fruit increases and as the number of steps the raven must take decreases. For the original game (4 fruit, 5 steps), the win probability is approximately 62%. Sounds reasonable: we would like a win probability >50% so that kids will not get discouraged by too much losing, but not so high that they think the game is trivial.

For the original game (4 fruit, 5 steps), what is the expected number of steps until the game ends? Does this change depending on whether the player or the raven wins?

We simulate the game 100,000 times and keep track of the number of steps taken in each game. The shortest game took 5 steps while the longest took 45 steps, with the modal number of steps being 21 (it also happens to be the mean and median). Here is the histogram for all 100,000 runs:

Here are the histograms split by the outcome:

Games where the raven wins tend to be shorter than those where the players win. Maybe that’s not too surprising, since a game where the raven wins needs just 5 steps, while a game where the players win needs at least 16 steps. On average, the game takes 19 steps for raven wins and 22 steps for player wins.

For the original game (4 fruit, 5 steps), given that the raven loses, what is the distribution of the number of steps the raven has taken?

Because we programmed SimulateGame to return the number of steps the raven has taken as well, we don’t have to rerun the simulations: we can just use the 100,000 simulations we ran previously and look at the ones that the raven lost. Here is the histogram of steps the raven took in losing games, with the vertical red line representing the mean:

For the original game (4 fruit, 5 steps), given that the raven wins, what is the distribution of the number of unharvested fruit?

Again, we can just use the 100,000 simulations we ran previously and look at the ones that the raven won. Here is the histogram along with the mean and median marked out with vertical lines:

The modal number of fruit left in player-losing games is 1: ugh tantalizingly close!

If there was no raven, how many turns would it take to harvest all the fruit?

“No raven” is the same as saying that the raven needs to take an infinite number of steps in order to win. Hence, we can use our existing simulation code with max_raven_steps = Inf to simulate this setting.

The shortest game took 16 turns while the longest game took 63 turns, with 22 and 24 turns being the modal and mean number of turns respectively. (In theory, a game could go on forever.) Here is the histogram:

References:

  1. Matt Lane. (2018). Harvesting Wins.

Trades Jan. 4, 2021

[This article was first published on R and Trading, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

 Added to my base stock position, very small. +1.7 %


Professional Financial Reports with RMarkdown

[This article was first published on R on technistema, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

By: Brad Lindblad

I recently gave a lightning talk at the Financial Industry R Meetup on how you can create extremely professional reports using RMarkdown and a slew of other popular R tools.

You can re-watch the talk at this link, use password: 9np$u.=p

You can also read the directions and overview at the Github repo.


Custom Google Analytics Dashboards with R: Building The Dashboard

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Photo of dashboard

Photo by Alan Biglow on Unsplash

Back in November, I took readers step by step through the somewhat long process of authenticating and downloading Google Analytics web site data into R. This post will be much simpler; I’m going to walk you through creating a dashboard showing blog post popularity using the flexdashboard package.

Before we go there, however, I want to re-emphasize a correction that we made to the original credentials post. Mark Edmondson, the author of the terrific googleAnalyticsR package, has created a new version of his package that eliminates the need for OAUTH credentials when running on a server. Once that update is available on CRAN, I’ll update this post to document the simpler process of only submitting service account credentials. In the meantime, though, we’ll continue using both OAUTH and service account credentials.

Where to Find The Code and Data

All the code and data presented in this post is in a GitHub repository at https://github.com/rstudio/a-flexdashboard-for-google-analytics in the Part2 folder. The code from Part 1 of this blog series is also available in the Part1 folder; however, users should be aware that they’ll need to provide their own authentication secrets for that code to work. My previous article, Custom Google Analytics Dashboards with R: Downloading Data, provides detailed instructions for how to obtain those credentials.

To make it easy for readers to reproduce this dashboard, I’ve constructed a synthetic set of Google Analytics data named clickbait_GA_data.csv for a hypothetical blog at the address clickbait.com. At the time of this writing, that domain was for sale and therefore shouldn’t be confused with any real blog. While the synthetic traffic comes from the Google Analytics log from an actual blog, the titles and URLs of all the articles are made up (although I wish I could find out the 3 Ways That Birds Are Confused About Bacon). The dataset contains more than 32,000 visits and 105,000 page views conducted over one month.

Creating Our Dashboard

So let’s begin building our dashboard. To do this, we’re going to open a new flexdashboard R file. We do that by selecting File > New File > R Markdown…, as shown below.

We next select From Template > Flex Dashboard.

That selection yields a new file which looks like this:

If you knit that file, you end up with this output in your Preview window.

The preconfigured template has provided us with window panes in which to put our Google Analytics graphs and tables. We simply have to fill them in!

Our process for building our Google Analytics (GA) dashboard will go like this:

  1. Read in the Google Analytics data in the setup chunk of our document.
  2. Use dplyr and ggplot2 to create a graph of pageviews by day for Chart A.
  3. Build a table of the top 10 most popular titles in Chart B using the reactable package.
  4. Delete the R Markdown code for Chart C.

So let’s build this dashboard.

Reading in the Data

We begin our dashboard by reading in the data from Google Analytics. In our last post, we built code to authenticate and read in the GA data using the Google Analytics API. In a production dashboard, we would put that code in the setup section here.

However, because we have our synthetic data in a .csv file, reading in the data will be a much simpler process. We will simply load the libraries we intend to use, apply the read_csv function from the readr package to our dataset, and put all of this in the setup chunk of our R Markdown file as shown below. I’ve shown the first few lines of the output to provide a sense of what that content looks like.

library(flexdashboard)
library(readr)
library(ggplot2)
library(dplyr)
library(reactable)

gadata <- read_csv("./data/clickbait_GA_data.csv")
show(gadata %>% head(7))
## # A tibble: 7 x 5
##   date       pageviews users pageTitle            landingPagePath               
##   <date>         <dbl> <dbl> <chr>                <chr>                         
## 1 2020-12-01         2     2 3 Ways That Turtles… www.clickbait.com/2011/02/28/…
## 2 2020-12-01         2     1 3 Ways That Turtles… www.clickbait.com/2011/02/28/…
## 3 2020-12-01         3     3 Shocking Finding: W… www.clickbait.com/2012/06/04/…
## 4 2020-12-01         1     1 Unexpected Research… www.clickbait.com/2012/11/29/…
## 5 2020-12-01        11    10 Unexpected Research… www.clickbait.com/2013/06/10/…
## 6 2020-12-01         1     1 3 Ways That Europea… www.clickbait.com/2013/10/22/…
## 7 2020-12-01         2     2 Why Monkeys Deal wi… www.clickbait.com/2014/01/17/…

Plotting Blog Traffic by Day

With the GA data in a tibble, we can use dplyr to group and sum the page views by day and then plot the data over time with ggplot2. This code will go in the R chunk under the heading Chart A.

theme_set(theme_minimal())

gadata_by_day <- gadata %>%
  group_by(date) %>%
  summarize(pagesums = sum(pageviews))

g <- ggplot(gadata_by_day, aes(x = date, y = pagesums)) +
  geom_point(color = "blue") +
  geom_line(color = "blue") +
  scale_x_date() +
  labs(x = "", y = "", title = "")
show(g)

Building a Table of the Most Popular Results

We’d also like to present a table of the most popular blog posts on our blog. We could do this with a variety of tools such as knitr’s kable function or the DT package, but for this example, we’ll use the reactable package. Reactable gives users interactive features such as the ability to search and sort the table. All this is done using client-side JavaScript, which makes the table interactive without requiring server involvement.

We can compute and display the most popular blog posts by inserting this code into the chunk under Chart B. We added arguments to change the column names, specify the widths of the columns, and permit scrolling, searching, and striping just to make it prettier. Those could have been omitted if we weren’t fussy about the formatting.

gadata_most_popular <- gadata %>%
  count(pageTitle, wt = pageviews, sort = TRUE) %>%
  head(10)

## For those who aren't as comfortable with the options in count, the following
## code would also work
# gadata_most_popular <- gadata %>%
#   group_by(pageTitle) %>%
#   summarize(n = sum(pageviews)) %>%
#   arrange(desc(n)) %>%
#   head(10)

reactable(gadata_most_popular,
          columns = list(pageTitle = colDef(name = "Title",
                                            align = "left",
                                            maxWidth = 250),
                         n         = colDef(name = "Page Views",
                                            maxWidth = 100)),
          pagination = FALSE,
          searchable = TRUE,
          striped = TRUE)
Title                                                         Page Views
Amazing Ways That Elephants Embrace Squirrels                      16412
Unexpected Research Results: Pandas Don't Comprehend Puppies        8888
3 Ways That Birds Avoid Friends                                     6015
New Discovery: Americans Experience Friends                         4858
3 Ways That Birds Are Confused About Bacon                          3751
Unexpected Research Results: Birds Like Birthdays                   2823
New Discovery: Monkeys Observe Their Past                           2741
13 Ways That Koalas Observe Carbs                                   2452
Discover How Dogs Can't Get Enough of Carbs                         2270
Learn How Cats Embrace Kittens                                      2220

The Final Result

Finally, we change the heading of our R Markdown code to have a meaningful title, rename the headings from Chart A and Chart B to something more reasonable, delete the heading and chunk for Chart C, and add some explanatory text about what our dashboard is about. Our finished dashboard R Markdown code should look like the code in dashboard1.Rmd.

When we knit the results, we see this:

If we have access to an RStudio Connect server, we can publish this dashboard to that server by clicking the Publish button at the top right of the Viewer window. On the RStudio Connect server, we can schedule the dashboard to regularly download and analyze the Google Analytics data and allow others to interact with it. We can literally go from a desktop R Markdown document to a dashboard running in production for others to see in just a few clicks.

Conclusions

This post shows how:

  1. A little R Markdown code can create a Google Analytics dashboard. Overall, the process of creating this dashboard is not really any more difficult than creating a report in R Markdown. The flexdashboard framework uses the same headings and code chunk structure as a regular R Markdown document. This means that we don’t have to learn a new language to build our dashboard.
  2. Flexdashboard allows us to exploit other tools we already know. The R Markdown template for flexdashboard provides visual containers into which we can drop code that uses other packages that we know such as ggplot2, dplyr, and reactable. Again, we don’t have to learn new and unfamiliar tools to create our dashboard.
  3. We can publish our dashboard and add new features incrementally. For organizations with an RStudio Connect server, we can put our dashboard into scheduled production with only a few clicks. Any time we wish to add another insight or plot to our dashboard, we simply change the R Markdown document on our desktop and republish the result.

However, while we’ve successfully created a simple Google Analytics dashboard, we haven’t tackled the question that kicked off this series of blog posts, namely:

Which of your blog articles received the most views in the first 15 days they were posted?

That’s the question we’ll tackle in part 3 of this series, where we’ll derive the dates of publication for our blog posts and create a dashboard that ranks blog posts on the basis of a 15-day window of visitors. This approach will ensure that we don’t favor older blog posts that have just had more time to gather views.

For More Information

If you would like to learn more about some of the packages and products we’ve used, we recommend:

  • flexdashboard: Easy interactive dashboards for R, a web site that gives a broad overview of the many capabilities of the flexdashboard package.
  • R Markdown, RStudio’s web site that describes the many ways you can use R Markdown to create reports, slides, web sites, and more.
  • RStudio Connect, RStudio’s publishing platform for R and Python, which provides push-button publishing from the RStudio IDE, scheduled execution of reports, and a host of other production capabilities.

To leave a comment for the author, please follow the link and comment on their blog: RStudio Blog.


The post Custom Google Analytics Dashboards with R: Building The Dashboard first appeared on R-bloggers.

IMDb datasets: 3 centuries of movie rankings visualized


[This article was first published on novyden, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The Question

I am a sucker for IMDb ratings, so don’t judge me. They are my priors before watching almost anything on a screen (a home screen, that is). But across movies (feature films), TV movies, and TV (mini) series, IMDb ratings are highly inconsistent. For example, the series The Boys has a rating of 8.7, and so does Martin Scorsese’s movie Goodfellas. Does it make sense that The Boys ranks as high as the #16 rated title in the whole IMDb database (among those with at least 25,000 user votes)? In other words, are these ratings apples and oranges, and if so, by how much?

 

Rating Distributions

To start, I downloaded the IMDb datasets (here). Let’s look at the distributions of title ratings by type (movie, i.e. feature film; TV movie; TV mini series; and TV series), split between fiction and documentaries:

Title ratings drift towards higher values depending on their type (shown on the right): movie, TV movie, TV mini series, and TV series. So ratings of movies and TV series do indeed come from different distributions and represent different things, like apples and oranges. But how different are they? (We will focus on fiction titles only from this point on.)

 

Percentiles

If a title has the all-time best rating, then no doubt it’s worth a try (let’s say among titles with at least 1000 votes; the number of votes is a rather important consideration, but we let it slide here and may come back to votes later). Why? Because 100% of other titles are rated below it or at best the same, and that indicates exceptional quality. In statistics such a rating has a name: the 100th percentile. Following the same logic, the 99th percentile represents a rating at or above 99% of all titles in the database (again, don’t forget the minimum threshold for the number of votes).

Based on the above, we can assign IMDb titles to groups by the highest percentile they belong to: the 99th percentile suggests that the title is among the very best, the 95th excellent, the 90th very good, the 75th good, the 50th average, and the 25th bad. Feel free to assign and name percentiles differently in your analysis, but we stick with this convention for this post. The last piece of the puzzle is to take percentiles not across the whole IMDb set but for each title type separately, and compare them:

Going back to our example, 8.7 in TV series places The Boys firmly in “Excellent” (95th percentile), while Goodfellas at 8.7 sits at the top of “Very Best” (99th percentile) in movies: a noticeable difference between the two.

The difference becomes even more meaningful when looking at the lower tiers, “Very Good” (90th percentile) and below: while a rating of 7.6 suffices for a movie (e.g. Love Actually) to place in “Very Good”, a TV series must achieve a rating of 8.4 to qualify for the same 90th percentile. In fact, a TV series with a 7.6 rating (like Grey’s Anatomy) places just above the “Average” 50th percentile. Furthermore, a rating of 8 would place a movie firmly in the top 5%, while the same 8 for a TV series barely cracks the top 25%.
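
The per-type percentile comparison described in this section can be sketched in base R. This is only an illustration: the `titles` data frame and its ratings are invented, and the helper name `percentile_rank` is an assumption, not something taken from the original analysis.

```r
# Hypothetical toy data: ratings for two title types (values invented for illustration)
titles <- data.frame(
  title_type = rep(c("movie", "tvSeries"), each = 8),
  rating     = c(5.1, 5.8, 6.2, 6.5, 6.9, 7.3, 7.8, 8.4,
                 6.0, 6.6, 7.1, 7.4, 7.8, 8.1, 8.5, 9.0)
)

# Percentile levels used in the post
probs <- c(0.25, 0.50, 0.75, 0.90, 0.95, 0.99)

# One row of rating cut-offs per title type, one column per percentile
cutoffs <- t(sapply(split(titles$rating, titles$title_type),
                    quantile, probs = probs, names = FALSE))
colnames(cutoffs) <- paste0("p", probs * 100)
cutoffs

# Share of titles of a given type rated at or below a given rating
percentile_rank <- function(rating, type) {
  ecdf(titles$rating[titles$title_type == type])(rating)
}

percentile_rank(7.8, "movie")     # same rating, evaluated against movies
percentile_rank(7.8, "tvSeries")  # same rating, evaluated against TV series
```

With real data, a minimum-votes filter would be applied to `titles` before computing the cut-offs, as discussed above.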

 

Percentiles Extra

Comparing and analyzing ratings between title types can be helped by organizing and visualizing the same percentile data in a few different ways:

  • Overlapping bar charts by title types:

  • Line chart by title types:

  • Line chart by percentiles:

What About Documentaries?

The title percentiles above excluded documentaries. To be able to compare ratings between fiction and documentary titles the following visual computes and dissects rating percentiles between fiction and documentaries by title types:

For whatever reason IMDb users rate documentaries more generously than their fiction counterparts across all title types.

 

Historical Perspective Mixed with Film Trivia

The oldest film on IMDb is Passage de Venus, made in 1874; it is rated 6.9 with 1282 votes (as of January 2020) and is filed under title type short and genre Documentary. In chronological order it is followed by 2 titles in 1878 (short animation Le singe musicien and short documentary Sallie Gardner at a Gallop), 1 in 1881 (short documentary Athlete Swinging a Pick), 1 in 1883 (short documentary Buffalo Running), and 1 in 1885 (short animation L’homme machine). Starting with 1887, which cranked out 45 titles in total, there are no more gap years, but that production feast was surpassed only in 1894, with 97 titles. The first movie title (and the only one that year), Reproduction of the Corbett and Fitzsimmons Fight, was filmed in 1897 under the Documentary, News, and Sport genres. Lastly, the first year in which the total number of titles exceeded the year’s numerical value is 1952, with 2059 shorts, movies, etc. under its belt. Did I just say last? One more factoid, if you will excuse me: movie production in 2020 (35,109 titles in total) dropped us back exactly 10 years, to 2010, when 35,062 titles were produced, while the absolute record belongs to 2017, with 51,231 films in total.

What about visualizing film production over time?

 

Final Thoughts

The IMDb dataset turned out to be richer and deeper than I expected, and I have just scratched the surface. There is plenty to play with: genres, runtimes, adult movies (yes, probably for compliance, IMDb flags each title as adult or not), and, of course, ratings. IMDb uses an adjusted (weighted) rating formula (based on averages and the number of user votes) in its rankings (see Weighted Average Ratings), so the title averageRating we looked at can’t be taken at face value after all.
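
To see why a vote-weighted rating can differ from a raw average, here is a generic shrinkage-style weighted average in R. It is explicitly not IMDb's formula, which this post does not spell out; the function name `weighted_rating` and the parameters `prior_mean` and `prior_votes` are assumptions chosen for illustration.

```r
# Generic shrinkage-style weighted rating (illustrative only; NOT IMDb's formula).
# rating:      a title's raw average rating
# votes:       number of votes behind that average
# prior_mean:  average rating across all titles (assumed value)
# prior_votes: hypothetical tuning constant controlling how strongly
#              low-vote titles are pulled towards prior_mean
weighted_rating <- function(rating, votes, prior_mean, prior_votes) {
  (votes * rating + prior_votes * prior_mean) / (votes + prior_votes)
}

# The same raw average of 9.0, backed by few votes versus many votes
few_votes  <- weighted_rating(rating = 9.0, votes = 100,
                              prior_mean = 7.0, prior_votes = 1000)
many_votes <- weighted_rating(rating = 9.0, votes = 100000,
                              prior_mean = 7.0, prior_votes = 1000)
c(few_votes = few_votes, many_votes = many_votes)
```

A title with few votes is pulled towards the overall mean, while a title with many votes keeps a value close to its raw average.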


To leave a comment for the author, please follow the link and comment on their blog: novyden.


The post IMDb datasets: 3 centuries of movie rankings visualized first appeared on R-bloggers.

Emil Hvitfeldt – palette2vec – A new way to explore color paletttes


[This article was first published on Why R? Foundation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Three months ago we wrapped up the Why R? 2020 conference. Among its memorable moments were the invited highlighted talks! Today we would like to remind you of the talk by Emil Hvitfeldt (University of Southern California). The video recording is at the end of the post.

There are many palettes available in various R packages. A way to explore all of these palettes already exists in the https://github.com/EmilHvitfeldt/r-color-palettes repository and the paletteer package.

This talk shows what happens when we take explorability one step further. Using handcrafted color features, dimensionality reduction, and interactive tools, we will create and explore a color palette embedding. In this embedded space we will be able to interactively cluster palettes, find neighboring palettes, and even generate new palettes in a whole new way.


To leave a comment for the author, please follow the link and comment on their blog: Why R? Foundation.


The post Emil Hvitfeldt - palette2vec - A new way to explore color paletttes first appeared on R-bloggers.


COVID-19 Data: The Long Run


[This article was first published on R Views, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The world seems to have moved to a new phase of paying attention to COVID-19. We have gone from pondering daily plots of case counts, to puzzling through models and forecasts, and are now moving on to the vaccines and the science behind them. For data scientists, however, the focus needs to remain on the data and the myriad issues and challenges that efforts to collect and curate COVID data have uncovered. My intuition is that not only will COVID-19 data continue to be important for quite some time in the future, but that efforts to improve the quality of this data will be crucial for successfully dealing with the next pandemic.

An incredible amount of work has been done by epidemiologists, universities, government agencies and data journalists to collect, organize, and reconcile data from thousands of sources. Nevertheless, the experts caution that there is much yet to be done.

Roni Rosenfeld, head of the Machine Learning Department of the School of Computer Science at Carnegie Mellon University and project lead for the Delphi Group, put it this way in a recent COPSS-NISS webinar:

Data is a big problem in this pandemic. Availability of high quality, comprehensive, geographically detailed data is very far from where it should be.

There are over 6,000 hospitals in the United States, and over 160,000 hospitals worldwide. Many of these are collecting COVID-19 data, yet there are few standards for recording cases, dealing with missing data, updating case count data, and coping with the time lag between recording and reporting cases. Nowcasting epidemiological and health care data has become a vital field of statistical research.

The following slide from the COPSS-NISS webinar shows a hierarchy of relevant COVID data organized on the Severity Pyramid that epidemiologists use to study disease progression.

The Delphi Group is making fundamental contributions to the long term improvement of COVID data by archiving the data shown in such a way that versions can be retrieved by date, and also by collecting massive data sets of leading indicators.

The webinar is well worth watching, and I highly recommend listening through the Q&A session at the end. The speakers explain the importance of nowcasting and Professor Rosenfeld presents a vision of making epidemic forecasting comparable to weather forecasting. It seems to me that this would be a worthwhile project to help advance.

Note that Delphi’s COVID-19 indicators, probably the nation’s largest public repository of diverse, geographically detailed, real-time indicators of COVID activity in the US, are freely available through a public API that is easily accessible to R and Python users.

Also note that R users can contribute to R Consortium sponsored COVID-related projects, which include the COVID-19 Data Hub, an organized archive of global COVID-19 case count data, and the RECON COVID-19 Challenge, an open source project to improve epidemiological tools.


To leave a comment for the author, please follow the link and comment on their blog: R Views.


The post COVID-19 Data: The Long Run first appeared on R-bloggers.

Quantify the Covid19 Impact on the SFO Airport Passenger Air Traffic


[This article was first published on Rami Krispin, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The Covid19 pandemic has had, and unfortunately still has, a significant impact on most major industries. While for some sectors the impact was positive (such as online retail, internet and streaming providers, etc.), it was negative for others (such as transportation, tourism, entertainment, etc.). In both cases, we can leverage time series modeling to quantify the effect of Covid19 on the sector.

One simplistic approach for quantifying the impact of the pandemic (whether it is positive or negative) would include the following steps:

  • Split the series into pre-covid and post-covid
  • Train a time series model with the pre-covid data. This would enable us to simulate the value of the series if there was no pandemic
  • Use the trained model to forecast the horizon of the post-covid series
  • Use the difference between the forecast and actual (post-covid series) to quantify the Covid19 impact on the series
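
The four steps above can be sketched in base R. This is a minimal illustration on an invented series, with a linear trend plus monthly dummies (`lm`) standing in for the forecasting model; it is not the model used in this post, and all names and values (`series`, `pre`, `post`, `impact`) are assumptions made for the example.

```r
# Hypothetical monthly series: linear trend plus a fixed seasonal pattern,
# followed by an artificial collapse in the last 6 months (all values invented)
n_pre  <- 60   # 5 years of "pre" data
n_post <- 6    # 6 months of "post" data
t_all  <- seq_len(n_pre + n_post)
season <- rep(c(-3, -4, 0, 1, 2, 4, 6, 6, 1, 2, -1, 0),
              length.out = n_pre + n_post)
y      <- 100 + 0.5 * t_all + season
y[(n_pre + 1):(n_pre + n_post)] <- y[(n_pre + 1):(n_pre + n_post)] * 0.2

series <- ts(y, start = c(2015, 1), frequency = 12)

# Step 1: split the series at the assumed break point
pre  <- window(series, end   = c(2019, 12))
post <- window(series, start = c(2020, 1))

# Step 2: train a model on the pre period only
# (trend + monthly dummies is a stand-in for any forecasting model)
train <- data.frame(y     = as.numeric(pre),
                    trend = seq_along(pre),
                    month = factor(as.integer(cycle(pre))))
fit <- lm(y ~ trend + month, data = train)

# Step 3: forecast over the horizon of the post period
future <- data.frame(trend = length(pre) + seq_along(post),
                     month = factor(as.integer(cycle(post)),
                                    levels = levels(train$month)))
fc <- predict(fit, newdata = future)

# Step 4: the gap between forecast and actual is the estimated impact
impact <- fc - as.numeric(post)
sum(impact)            # estimated total loss over the post period
sum(impact) / sum(fc)  # loss as a share of the no-break forecast
```

Because the toy series is an exact trend plus seasonal pattern, the model recovers it perfectly; with real data the forecast carries uncertainty, so the impact estimate should be read together with prediction intervals.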

To demonstrate this approach, I will use the sfo_passengers dataset from the sfo package. The sfo_passengers dataset provides monthly statistics about the San Francisco International Airport (SFO) air traffic between July 2005 and September 2020. More details about the dataset are available in the following post and vignette.

For this analysis, we will use the following packages:

  • sfo – the passenger air traffic data
  • dplyr – data prep
  • plotly – data visualization
  • TSstudio – time series analysis and forecasting

library(sfo)
library(dplyr)
library(plotly)
library(TSstudio)

Data

As mentioned above, the sfo_passengers dataset provides monthly statistics about the air passenger traffic at SFO airport since 2005. That includes monthly information about the number of passengers by different categories such as operating airline, region, terminal, etc. Let’s load the data:

data("sfo_passengers")

str(sfo_passengers)
## 'data.frame':    22576 obs. of  12 variables:
##  $ activity_period            : int  202009 202009 202009 202009 202009 202009 202009 202009 202009 202009 ...
##  $ operating_airline          : chr  "United Airlines" "United Airlines" "United Airlines" "United Airlines" ...
##  $ operating_airline_iata_code: chr  "UA" "UA" "UA" "UA" ...
##  $ published_airline          : chr  "United Airlines" "United Airlines" "United Airlines" "United Airlines" ...
##  $ published_airline_iata_code: chr  "UA" "UA" "UA" "UA" ...
##  $ geo_summary                : chr  "International" "International" "International" "International" ...
##  $ geo_region                 : chr  "Mexico" "Mexico" "Mexico" "Mexico" ...
##  $ activity_type_code         : chr  "Enplaned" "Enplaned" "Enplaned" "Deplaned" ...
##  $ price_category_code        : chr  "Other" "Other" "Other" "Other" ...
##  $ terminal                   : chr  "Terminal 3" "Terminal 3" "International" "International" ...
##  $ boarding_area              : chr  "F" "E" "G" "G" ...
##  $ passenger_count            : int  6712 396 376 6817 3851 3700 71 83 65 45 ...

Before we convert the data into a time series format, we will transform the activity_period variable into a Date format:

df <- sfo_passengers %>%
  # activity_period is an integer in YYYYMM form; append a day to build a Date
  mutate(date = as.Date(paste(substr(activity_period, 1, 4), 
                              substr(activity_period, 5, 6), 
                              "01", sep = "/"))) 

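As a small standalone check of this conversion, the sketch below parses a few YYYYMM values with an explicit format string instead of relying on the default date formats. The sample vector `activity_period` is hypothetical and is not read from the dataset.

```r
# Hypothetical sample of activity_period values in YYYYMM form
activity_period <- c(200507L, 201912L, 202009L)

# Append a day-of-month so the string is a complete date, then parse it
dates <- as.Date(paste0(activity_period, "01"), format = "%Y%m%d")

dates
format(dates, "%Y-%m")  # year-month labels, handy for checking the result
```
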
Next, we will transform the dataset into a time series format by summing the passenger counts for each value of the date variable:

df <- df %>%
  group_by(date) %>%
  summarise(y = sum(passenger_count), .groups = "drop")

head(df)  
## # A tibble: 6 x 2
##   date             y
##   <date>       <int>
## 1 2005-07-01 3225769
## 2 2005-08-01 3195866
## 3 2005-09-01 2740553
## 4 2005-10-01 2770715
## 5 2005-11-01 2617333
## 6 2005-12-01 2671797

Now, we have a monthly time series:

plot_ly(data = df,
        x = ~ date,
        y = ~ y,
        type = "scatter", 
        mode = "lines",
        name = "Total Passengers") %>%
  add_segments(x = as.Date("2020-02-01"), 
               xend = as.Date("2020-02-01"),
               y = min(df$y),
               yend = max(df$y) * 1.05,
               line = list(color = "black", dash = "dash"),
               showlegend = FALSE) %>%
  add_annotations(text = "Pre-Covid19",
                  x = as.Date("2018-09-01"),
                  y = max(df$y) * 1.05, 
                  showarrow = FALSE) %>%
  add_annotations(text = "Post-Covid19",
                  x = as.Date("2021-08-01"),
                  y = max(df$y) * 1.05, 
                  showarrow = FALSE) %>%
  layout(title = "Total Number of Air Passengers - SFO Airport",
         yaxis = list(title = "Number of Passengers"),
         xaxis = list(title = "Source: San Francisco Open Data Portal"))

[Interactive plot: Total Number of Air Passengers - SFO Airport, July 2005 to September 2020]

As can be observed in the plot above, the Covid19 effect has been pronounced since March 2020. To quantify the effect of the pandemic on the number of passengers, we will split the series into the following two series:

  • Pre-Covid19 series – all observations prior to March 2020
  • Post-Covid19 series – all observations from March 2020 onward (inclusive)

We will use the Pre-Covid19 series to train a time series model. That will enable us to forecast the number of passengers as if there was no pandemic.

pre_covid <- df %>% 
  dplyr::filter(date < as.Date("2020-03-01")) %>%
  dplyr::arrange(date)

post_covid <- df %>% 
  dplyr::filter(date >= as.Date("2020-03-01")) %>%
  dplyr::arrange(date)

Once we use the model trained on the pre_covid data to forecast the months covered by the post_covid series, we can quantify the impact of Covid19 on the total number of passengers.

Analyzing the data

Before we forecast the series, let’s run a quick exploratory analysis to identify its main characteristics. We will use the TSstudio package to visualize the pre_covid series. Note that the package does not yet support the tsibble object. Therefore, we will convert the series into a ts object first:

ts.obj <- ts(pre_covid$y, start = c(2005, 7), frequency = 12)
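
Because ts() indexes observations by position rather than by date, the data must be sorted and free of missing months for the start argument to line up. The sketch below derives the start from the data instead of hardcoding it and checks for gaps; the `pre_covid_demo` data frame is a hypothetical stand-in with invented values, not the actual pre_covid series.

```r
# Hypothetical stand-in for the pre_covid data frame (values invented)
pre_covid_demo <- data.frame(
  date = seq(as.Date("2005-07-01"), by = "month", length.out = 18),
  y    = 1:18
)

# Derive the (year, month) start from the first date in the data
first_date <- min(pre_covid_demo$date)
start_vec  <- c(as.integer(format(first_date, "%Y")),
                as.integer(format(first_date, "%m")))

ts_demo <- ts(pre_covid_demo$y, start = start_vec, frequency = 12)

start(ts_demo)  # first period of the series
end(ts_demo)    # last period of the series

# Guard against gaps: a complete monthly series has exactly one row per month
stopifnot(nrow(pre_covid_demo) ==
            length(seq(first_date, max(pre_covid_demo$date), by = "month")))
```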

The series before the outbreak of the pandemic:

ts_plot(ts.obj,
        title = "Total Number of Air Passengers - SFO Airport",
        Ytitle = "Number of Passengers",
        slider = TRUE)

[Interactive plot: Total Number of Air Passengers - SFO Airport, pre-Covid19 series, with range slider]

Like most series that describe monthly air passenger traffic, this series has a strong monthly seasonal pattern and a positive trend. You can also see that the oscillation of the seasonal component has become larger since 2017 compared to previous years. Let’s use the ts_seasonal function to create a seasonal plot of the series:

ts_seasonal(ts.obj = ts.obj, type = "all")
[Interactive plot: "Seasonality Plot - ts.obj", with three panels: the series by frequency cycle (one line per year, 2005 to 2020, plotted across Jan to Dec), the series by frequency unit (one line per month, plotted across years), and box plots of the values by month.]

As we can see in the seasonal plot, the monthly seasonal pattern persists over time while the series continues to grow from year to year.

Similarly, we can review the series correlation with its past lags using the ACF and PACF functions:

ts_cor(ts.obj = ts.obj, lag.max = 36)
[Interactive plot: "ts.obj ACF and PACF Plots" (x-axis: Lag; top panel: ACF; bottom panel: PACF; seasonal lags 12, 24, and 36 highlighted; dashed lines mark the upper and lower confidence bounds)]

As expected, the series is strongly correlated with its first lag and with its seasonal lags. We will leverage this information to select time-series models suited to seasonal data.
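As a rough cross-check, the same quantities can be computed with the acf and pacf functions from the stats package. The sketch below is not the TSstudio implementation. It uses the built-in AirPassengers series as a stand-in for ts.obj (an assumption, since the post's data is not reproduced here) and flags the seasonal lags whose autocorrelation falls outside the approximate 95% white-noise bound of 1.96 / sqrt(n):

```r
# Stand-in monthly series (assumption: replace with your own ts object)
x <- AirPassengers

# Sample ACF (lags 0..36) and PACF (lags 1..36), without plotting
acf_vals  <- as.numeric(acf(x, lag.max = 36, plot = FALSE)$acf)
pacf_vals <- as.numeric(pacf(x, lag.max = 36, plot = FALSE)$acf)

# Approximate 95% bound under the white-noise null hypothesis
ci <- qnorm(0.975) / sqrt(length(x))

# Inspect the seasonal lags of a monthly series
seasonal_lags <- c(12, 24, 36)
seasonal_cor <- data.frame(
  lag         = seasonal_lags,
  acf         = acf_vals[seasonal_lags + 1],  # +1 because acf_vals starts at lag 0
  pacf        = pacf_vals[seasonal_lags],
  significant = abs(acf_vals[seasonal_lags + 1]) > ci
)
print(seasonal_cor)
```

A slowly decaying ACF combined with peaks at multiples of 12 is the usual sign of a trending series with a yearly seasonal pattern.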

Forecast the Pre-Covid19 series

One of my favorite forecasting strategies combines a horse race between different time series models with backtesting as the training approach. Backtesting is the time series equivalent of the cross-validation training approach used in machine learning. The idea here is simple: test each model with backtesting, and select the one that performed best, on average, across the different testing partitions.

The train_model function from the TSstudio package enables us to apply this strategy seamlessly using models from the forecast and stats packages. For simplicity, we will use different flavors of ETS and Holt-Winters models and out-of-the-box auto.arima and tslm models. For the backtesting, we will split the series into 6 testing partitions, each 12 months long and spaced 3 months apart.
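To make the partitioning concrete, here is a minimal base R sketch of how such backtesting windows could be laid out. The helper backtest_windows is hypothetical (it is not part of TSstudio), the expanding training window is an assumption, and the series length of 176 monthly observations is assumed for illustration:

```r
# Hypothetical helper: index ranges for expanding-window backtesting
backtest_windows <- function(n, partitions, sample_out, space) {
  # The most recent partition ends at the last observation;
  # each earlier partition is shifted back by `space` observations
  shift      <- rev(seq_len(partitions) - 1) * space
  test_end   <- n - shift
  test_start <- test_end - sample_out + 1
  data.frame(
    partition   = seq_len(partitions),
    train_start = 1,               # assumption: training always starts at the first observation
    train_end   = test_start - 1,
    test_start  = test_start,
    test_end    = test_end
  )
}

# 6 partitions, each with a 12-month testing window, spaced 3 months apart
w <- backtest_windows(n = 176, partitions = 6, sample_out = 12, space = 3)
print(w)
```

Each model is then trained on the training range of a partition and scored on that partition's 12 testing months, so every model receives six out-of-sample scores.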

The methods argument defines the models to use, and the train_method argument defines the settings of the backtesting. You can find more details about the function here.

methods <- list(ets1 = list(method = "ets",
                            method_arg = list(opt.crit = "lik"),
                            notes = "ETS model opt.crit=lik"),
                ets2 = list(method = "ets",
                            method_arg = list(opt.crit = "amse"),
                            notes = "ETS model opt.crit=amse"),
                ets3 = list(method = "ets",
                            method_arg = list(opt.crit = "mse"),
                            notes = "ETS model opt.crit=mse"),
                auto_arima = list(method = "auto.arima",
                                  notes = "Auto ARIMA"),
                hw1 = list(method = "HoltWinters",
                           method_arg = NULL,
                           notes = "HoltWinters Model"),
                hw2 = list(method = "HoltWinters",
                           method_arg = list(seasonal = "multiplicative"),
                           notes = "HW with multip. seasonality"),
                tslm = list(method = "tslm",
                            method_arg = list(formula = input ~ trend + season),
                            notes = "tslm with trend and seasonal"))


train_method <- list(partitions = 6,
                     sample.out = 12,
                     space = 3)
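For orientation, the hw1 and hw2 entries differ only in the seasonal argument passed to the Holt-Winters model. A minimal sketch of the equivalent direct calls to stats::HoltWinters follows. It assumes the built-in AirPassengers series as a stand-in for ts.obj, and it does not reproduce how train_model wraps these calls internally:

```r
# Stand-in monthly series (assumption: replace with your own ts object)
x <- AirPassengers

# Additive seasonality is the stats::HoltWinters default (hw1-style)
hw_add  <- HoltWinters(x)
# Multiplicative seasonality (hw2-style)
hw_mult <- HoltWinters(x, seasonal = "multiplicative")

# 12-month forecasts with 95% prediction intervals (columns: fit, upr, lwr)
fc_add  <- predict(hw_add,  n.ahead = 12, prediction.interval = TRUE, level = 0.95)
fc_mult <- predict(hw_mult, n.ahead = 12, prediction.interval = TRUE, level = 0.95)

# In-sample sum of squared errors of each fit
print(c(additive = hw_add$SSE, multiplicative = hw_mult$SSE))
```

The in-sample SSE is only a fit diagnostic. It is not a substitute for the out-of-sample scores that backtesting provides.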

After defining the methods and train_method arguments, we will use the train_model function to train the models. Note that the forecast horizon is set to the length of the post_covid series. In addition, we will set MAPE as the error metric for evaluating the performance of the different models on the testing partitions:

md <- train_model(input = ts.obj,
                  methods = methods,
                  train_method = train_method,
                  horizon = nrow(post_covid),
                  error = "MAPE")
## # A tibble: 7 x 7
##   model_id   model       notes                        avg_mape avg_rmse `avg_coverage_80%` `avg_coverage_95%`
##   <chr>      <chr>       <chr>                           <dbl>    <dbl>              <dbl>              <dbl>
## 1 hw1        HoltWinters HoltWinters Model              0.0277  149046.              0.792              0.958
## 2 ets2       ets         ETS model opt.crit=amse        0.0284  161754.              0.792              0.972
## 3 hw2        HoltWinters HW with multip. seasonality    0.0300  171702.              0.528              0.847
## 4 ets1       ets         ETS model opt.crit=lik         0.0307  173337.              0.833              0.958
## 5 ets3       ets         ETS model opt.crit=mse         0.0311  174238.              0.861              0.972
## 6 auto_arima auto.arima  Auto ARIMA                     0.0334  184381.              0.597              0.889
## 7 tslm       tslm        tslm with trend and seasonal   0.0370  223194.              0.569              0.75
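The avg_mape and avg_coverage columns can be read with the standard definitions of these metrics. The sketch below states those definitions in base R. The helper names mape and coverage are hypothetical, the numbers are made up purely for illustration, and the exact computation inside TSstudio may differ in detail:

```r
# Mean absolute percentage error, as a fraction (0.05 means 5%)
mape <- function(actual, forecast) {
  mean(abs((actual - forecast) / actual))
}

# Share of actual values that fall inside the prediction interval
coverage <- function(actual, lower, upper) {
  mean(actual >= lower & actual <= upper)
}

# Illustrative values only (not taken from the series in this post)
actual   <- c(100, 200, 400, 500)
forecast <- c(110, 190, 400, 450)
lower    <- forecast - 20
upper    <- forecast + 20

mape_value     <- mape(actual, forecast)
coverage_value <- coverage(actual, lower, upper)
print(c(mape = mape_value, coverage = coverage_value))
```

A well-calibrated 80% prediction interval should cover roughly 80% of the actual values, so coverage far below the nominal level suggests intervals that are too narrow.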

Based on the leaderboard table from the train_model function, the model that performed best on average across the testing partitions is the Holt-Winters model (first version, hw1). It achieved the lowest average MAPE (2.77%) and RMSE (149046) of all the models evaluated. In addition, its prediction intervals achieved close to nominal coverage, with an average coverage of 79.2% and 95.8% for the 80% and 95% prediction intervals, respectively. We can review the error distribution across the different partitions for each model with the plot_error function:

plot_error(md)
[Interactive plot: "Model Performance by Testing Partition - MAPE" (left panel: MAPE (%) by testing partition, one line per model; right panel: box plot of MAPE (%) per model)]
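The leaderboard averages can be recovered from the per-partition errors shown in this plot. The sketch below copies the per-partition MAPE values (in percent, rounded here to four decimals) from the plot_error output and averages them per model. The object names are hypothetical:

```r
# Per-partition MAPE (%) for each model, partitions 1 to 6
mape_by_partition <- rbind(
  ets1       = c(2.8178, 3.2748, 3.7428, 3.6173, 2.7215, 2.2498),
  ets2       = c(2.2985, 3.1043, 4.2242, 2.7793, 2.0390, 2.5720),
  ets3       = c(3.8052, 3.2581, 2.8146, 3.5618, 2.7784, 2.4540),
  auto_arima = c(2.0372, 3.9276, 5.4490, 4.2676, 2.0836, 2.2567),
  hw1        = c(2.9194, 2.2689, 2.6435, 4.0290, 2.7923, 1.9617),
  hw2        = c(2.9699, 4.3844, 3.2697, 2.8004, 2.3915, 2.2112),
  tslm       = c(3.6620, 4.0078, 3.1781, 2.6564, 4.1538, 4.5595)
)
colnames(mape_by_partition) <- paste0("partition_", 1:6)

# Average MAPE per model, sorted from best to worst
avg_mape <- sort(rowMeans(mape_by_partition))
print(round(avg_mape, 2))

# Spread of the error across partitions, as a rough stability check
mape_sd <- apply(mape_by_partition, 1, sd)
print(round(mape_sd[names(avg_mape)], 2))
```

Looking at the spread alongside the average helps to separate a model that is consistently good from one that wins on a few partitions and loses badly on others.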

The plot_model function enables us to animate the forecasted values of each model on the different testing partitions of the backtesting:

plot_model(md)
[Interactive plot: "md Models Performance by Testing Partitions" (x-axis: Date; actual series with each model's forecast, animated across testing partitions 1 to 6)]
413.79720938,5125768.02195122,5457324.48311187,5718960.57351841,5744028.27150913,4994984.71687352,5139996.03893908,4701342.67995677],"mode":"lines","line":{"color":"rgba(214,39,40,1)","simplyfy":false},"frame":"1","type":"scatter","name":"ets2","marker":{"color":"rgba(214,39,40,1)","line":{"color":"rgba(214,39,40,1)"}},"error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4676026.70341037,4267702.76443747,3912567.19118709,4744561.7209032,4819279.42075196,5192146.47308412,5552726.26685121,5821520.18118917,5858672.17445772,5115048.21763965,5258740.76783462,4834139.45020265],"mode":"lines","line":{"color":"rgba(148,103,189,1)","simplyfy":false},"frame":"1","type":"scatter","name":"ets3","marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4666393.2161633,4293664.17067734,4011592.01106643,4815287.0119868,4866758.61788972,5172443.17478515,5520752.26327711,5736917.83052639,5645818.24091496,4926456.84150425,5065527.74258316,4753964.28825543],"mode":"lines","line":{"color":"rgba(140,86,75,1)","simplyfy":false},"frame":"1","type":"scatter","name":"hw1","marker":{"color":"rgba(140,86,75,1)","line":{"color":"rgba(140,86,75,1)"}},"error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":
[4642402.43351819,4226158.66723458,3878149.97853508,4723352.54514793,4808702.86841276,5188770.95451046,5543840.2579286,5801950.40530063,5790957.01113968,5039936.83968447,5158675.47395369,4732221.25318844],"mode":"lines","line":{"color":"rgba(227,119,194,1)","simplyfy":false},"frame":"1","type":"scatter","name":"hw2","marker":{"color":"rgba(227,119,194,1)","line":{"color":"rgba(227,119,194,1)"}},"error_y":{"color":"rgba(227,119,194,1)"},"error_x":{"color":"rgba(227,119,194,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4421723.28911565,4128345.87244898,3873375.87244898,4473160.78911565,4521669.70578231,4782152.95578231,5027745.53911565,5220355.82417583,5232954.20879121,4706447.05494506,4806102.13186813,4498079.13186813],"mode":"lines","line":{"color":"rgba(127,127,127,1)","simplyfy":false},"frame":"1","type":"scatter","name":"tslm","marker":{"color":"rgba(127,127,127,1)","line":{"color":"rgba(127,127,127,1)"}},"error_y":{"color":"rgba(127,127,127,1)"},"error_x":{"color":"rgba(127,127,127,1)"},"xaxis":"x","yaxis":"y","visible":true}],"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.2,"selected":{"opacity":1},"debounce":0},"frames":[{"name":"1","data":[{"x":[2005.5,2005.58333333333,2005.66666666667,2005.75,2005.83333333333,2005.91666666667,2006,2006.08333333333,2006.16666666667,2006.25,2006.33333333333,2006.41666666667,2006.5,2006.58333333333,2006.66666666667,2006.75,2006.83333333333,2006.91666666667,2007,2007.08333333333,2007.16666666667,2007.25,2007.33333333333,2007.41666666667,2007.5,2007.58333333333,2007.66666666667,2007.75,2007.83333333333,2007.91666666667,2008,2008.08333333333,2008.16666666667,2008.25,2008.33333333333,2008.41666666667,2008.5,2008.58333333333,2008.66666666667,2008.75,2008.83333333333,2008.91666666667,2009,2009.08333333333,200
9.16666666667,2009.25,2009.33333333333,2009.41666666667,2009.5,2009.58333333333,2009.66666666667,2009.75,2009.83333333333,2009.91666666667,2010,2010.08333333333,2010.16666666667,2010.25,2010.33333333333,2010.41666666667,2010.5,2010.58333333333,2010.66666666667,2010.75,2010.83333333333,2010.91666666667,2011,2011.08333333333,2011.16666666667,2011.25,2011.33333333333,2011.41666666667,2011.5,2011.58333333333,2011.66666666667,2011.75,2011.83333333333,2011.91666666667,2012,2012.08333333333,2012.16666666667,2012.25,2012.33333333333,2012.41666666667,2012.5,2012.58333333333,2012.66666666667,2012.75,2012.83333333333,2012.91666666667,2013,2013.08333333333,2013.16666666667,2013.25,2013.33333333333,2013.41666666667,2013.5,2013.58333333333,2013.66666666667,2013.75,2013.83333333333,2013.91666666667,2014,2014.08333333333,2014.16666666667,2014.25,2014.33333333333,2014.41666666667,2014.5,2014.58333333333,2014.66666666667,2014.75,2014.83333333333,2014.91666666667,2015,2015.08333333333,2015.16666666667,2015.25,2015.33333333333,2015.41666666667,2015.5,2015.58333333333,2015.66666666667,2015.75,2015.83333333333,2015.91666666667,2016,2016.08333333333,2016.16666666667,2016.25,2016.33333333333,2016.41666666667,2016.5,2016.58333333333,2016.66666666667,2016.75,2016.83333333333,2016.91666666667,2017,2017.08333333333,2017.16666666667,2017.25,2017.33333333333,2017.41666666667,2017.5,2017.58333333333,2017.66666666667,2017.75,2017.83333333333,2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333,2019.16666666667,2019.25,2019.33333333333,2019.41666666667,2019.5,2019.58333333333,2019.66666666667,2019.75,2019.83333333333,2019.91666666667,2020,2020.08333333333],"y":[3225769,3195866,2740553,2770715,2617333,2671797,2448889,2223024,2708778,2773293,2829000,3071396,3227605,3143839,2720100,2834959,2653887,2698200,2507430,2304990,2820085,2869247,3056934,3263621,3
382382,3436417,2957530,3129309,2922500,2903637,2670053,2595676,3127387,3029021,3305954,3453751,3603946,3612297,3004720,3124451,2744485,2962937,2644539,2359800,2925918,3024973,3177100,3419595,3649702,3650668,3191526,3249428,2971484,3074209,2785466,2515361,3105958,3139059,3380355,3612886,3765824,3771842,3356365,3490100,3163659,3167124,2883810,2610667,3129205,3200527,3547804,3766323,3935589,3917884,3564970,3602455,3326859,3441693,3211600,2998119,3472440,3563007,3820570,4107195,4284443,4356216,3819379,3844987,3478890,3443039,3204637,2966477,3593364,3604104,3933016,4146797,4176486,4347059,3781168,3910790,3466878,3814984,3432625,3078405,3765504,3881893,4147096,4321833,4499221,4524918,3919072,4059443,3628786,3855835,3550084,3248144,4001521,4021677,4361140,4558511,4801148,4796653,4201394,4374749,4013814,4129052,3748529,3543639,4137679,4172512,4573996,4922125,5168724,5110638,4543759,4571997,4266481,4343369,3897685,3481405,4335287,4425920,4698067,5134110,5496516,5516837,4736005,4868674,4572702,4660504,4190367,3882181,4674035,4713183,5025595,5427144,5692572,5545859,4649100,4861782,4508606,4576449,4156821,3752763,4599189,4692941,5008001,5466688,5612312,5742437,4471408,4824559,4370463,4720992,4241751,3742224],"mode":"lines","line":{"color":"rgba(31,119,180,1)","simplyfy":false},"frame":"1","type":"scatter","name":"actual","marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4662648.38877152,4250653.18849264,3906822.84077006,4641620.05859456,4701934.41503581,5013033.85030003,5360887.38857595,5645502.75106732,5648238.30120258,4976139.46503331,5084951.60089021,4762253.17200161],"mode":"lines","line":{"color":"rgba(255,127,14,1)","simplyfy":false},"frame":"1","type"
:"scatter","name":"auto_arima","marker":{"color":"rgba(255,127,14,1)","line":{"color":"rgba(255,127,14,1)"}},"error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4663516.96901709,4255313.84232917,3907097.71271057,4741046.16749003,4811730.49023165,5157706.84287766,5504110.66720021,5761883.62528134,5765293.64584336,5017628.41643024,5156637.26874575,4725896.44425262],"mode":"lines","line":{"color":"rgba(44,160,44,1)","simplyfy":false},"frame":"1","type":"scatter","name":"ets1","marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4640379.52046931,4242622.2080514,3876192.64731705,4701919.24052639,4766413.79720938,5125768.02195122,5457324.48311187,5718960.57351841,5744028.27150913,4994984.71687352,5139996.03893908,4701342.67995677],"mode":"lines","line":{"color":"rgba(214,39,40,1)","simplyfy":false},"frame":"1","type":"scatter","name":"ets2","marker":{"color":"rgba(214,39,40,1)","line":{"color":"rgba(214,39,40,1)"}},"error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4676026.70341037,4267702.76443747,3912567.19118709,4744561.7209032,4819279.42075196,5192146.47308412,5552726.26685121,5821520.18118917,5858672.17445772,5115048.21763965,5258740.76783462,4834139.45020265],"mode":"line
s","line":{"color":"rgba(148,103,189,1)","simplyfy":false},"frame":"1","type":"scatter","name":"ets3","marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4666393.2161633,4293664.17067734,4011592.01106643,4815287.0119868,4866758.61788972,5172443.17478515,5520752.26327711,5736917.83052639,5645818.24091496,4926456.84150425,5065527.74258316,4753964.28825543],"mode":"lines","line":{"color":"rgba(140,86,75,1)","simplyfy":false},"frame":"1","type":"scatter","name":"hw1","marker":{"color":"rgba(140,86,75,1)","line":{"color":"rgba(140,86,75,1)"}},"error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4642402.43351819,4226158.66723458,3878149.97853508,4723352.54514793,4808702.86841276,5188770.95451046,5543840.2579286,5801950.40530063,5790957.01113968,5039936.83968447,5158675.47395369,4732221.25318844],"mode":"lines","line":{"color":"rgba(227,119,194,1)","simplyfy":false},"frame":"1","type":"scatter","name":"hw2","marker":{"color":"rgba(227,119,194,1)","line":{"color":"rgba(227,119,194,1)"}},"error_y":{"color":"rgba(227,119,194,1)"},"error_x":{"color":"rgba(227,119,194,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333],"y":[4421723.28911565,4128345.87244898,3873375.87244898,4473160.78911565,4521669.70578231,4782152.95578231,5027745.53911565,5220355.82417583
,5232954.20879121,4706447.05494506,4806102.13186813,4498079.13186813],"mode":"lines","line":{"color":"rgba(127,127,127,1)","simplyfy":false},"frame":"1","type":"scatter","name":"tslm","marker":{"color":"rgba(127,127,127,1)","line":{"color":"rgba(127,127,127,1)"}},"error_y":{"color":"rgba(127,127,127,1)"},"error_x":{"color":"rgba(127,127,127,1)"},"xaxis":"x","yaxis":"y","visible":true}],"traces":[0,1,2,3,4,5,6,7]},{"name":"2","data":[{"x":[2005.5,2005.58333333333,2005.66666666667,2005.75,2005.83333333333,2005.91666666667,2006,2006.08333333333,2006.16666666667,2006.25,2006.33333333333,2006.41666666667,2006.5,2006.58333333333,2006.66666666667,2006.75,2006.83333333333,2006.91666666667,2007,2007.08333333333,2007.16666666667,2007.25,2007.33333333333,2007.41666666667,2007.5,2007.58333333333,2007.66666666667,2007.75,2007.83333333333,2007.91666666667,2008,2008.08333333333,2008.16666666667,2008.25,2008.33333333333,2008.41666666667,2008.5,2008.58333333333,2008.66666666667,2008.75,2008.83333333333,2008.91666666667,2009,2009.08333333333,2009.16666666667,2009.25,2009.33333333333,2009.41666666667,2009.5,2009.58333333333,2009.66666666667,2009.75,2009.83333333333,2009.91666666667,2010,2010.08333333333,2010.16666666667,2010.25,2010.33333333333,2010.41666666667,2010.5,2010.58333333333,2010.66666666667,2010.75,2010.83333333333,2010.91666666667,2011,2011.08333333333,2011.16666666667,2011.25,2011.33333333333,2011.41666666667,2011.5,2011.58333333333,2011.66666666667,2011.75,2011.83333333333,2011.91666666667,2012,2012.08333333333,2012.16666666667,2012.25,2012.33333333333,2012.41666666667,2012.5,2012.58333333333,2012.66666666667,2012.75,2012.83333333333,2012.91666666667,2013,2013.08333333333,2013.16666666667,2013.25,2013.33333333333,2013.41666666667,2013.5,2013.58333333333,2013.66666666667,2013.75,2013.83333333333,2013.91666666667,2014,2014.08333333333,2014.16666666667,2014.25,2014.33333333333,2014.41666666667,2014.5,2014.58333333333,2014.66666666667,2014.75,2014.83333333333,2014.9166666666
7,2015,2015.08333333333,2015.16666666667,2015.25,2015.33333333333,2015.41666666667,2015.5,2015.58333333333,2015.66666666667,2015.75,2015.83333333333,2015.91666666667,2016,2016.08333333333,2016.16666666667,2016.25,2016.33333333333,2016.41666666667,2016.5,2016.58333333333,2016.66666666667,2016.75,2016.83333333333,2016.91666666667,2017,2017.08333333333,2017.16666666667,2017.25,2017.33333333333,2017.41666666667,2017.5,2017.58333333333,2017.66666666667,2017.75,2017.83333333333,2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333,2019.16666666667,2019.25,2019.33333333333,2019.41666666667,2019.5,2019.58333333333,2019.66666666667,2019.75,2019.83333333333,2019.91666666667,2020,2020.08333333333],"y":[3225769,3195866,2740553,2770715,2617333,2671797,2448889,2223024,2708778,2773293,2829000,3071396,3227605,3143839,2720100,2834959,2653887,2698200,2507430,2304990,2820085,2869247,3056934,3263621,3382382,3436417,2957530,3129309,2922500,2903637,2670053,2595676,3127387,3029021,3305954,3453751,3603946,3612297,3004720,3124451,2744485,2962937,2644539,2359800,2925918,3024973,3177100,3419595,3649702,3650668,3191526,3249428,2971484,3074209,2785466,2515361,3105958,3139059,3380355,3612886,3765824,3771842,3356365,3490100,3163659,3167124,2883810,2610667,3129205,3200527,3547804,3766323,3935589,3917884,3564970,3602455,3326859,3441693,3211600,2998119,3472440,3563007,3820570,4107195,4284443,4356216,3819379,3844987,3478890,3443039,3204637,2966477,3593364,3604104,3933016,4146797,4176486,4347059,3781168,3910790,3466878,3814984,3432625,3078405,3765504,3881893,4147096,4321833,4499221,4524918,3919072,4059443,3628786,3855835,3550084,3248144,4001521,4021677,4361140,4558511,4801148,4796653,4201394,4374749,4013814,4129052,3748529,3543639,4137679,4172512,4573996,4922125,5168724,5110638,4543759,4571997,4266481,4343369,3897685,3481405,4335287,4425920,4698067,513411
0,5496516,5516837,4736005,4868674,4572702,4660504,4190367,3882181,4674035,4713183,5025595,5427144,5692572,5545859,4649100,4861782,4508606,4576449,4156821,3752763,4599189,4692941,5008001,5466688,5612312,5742437,4471408,4824559,4370463,4720992,4241751,3742224],"mode":"lines","line":{"color":"rgba(31,119,180,1)","simplyfy":false},"frame":"2","type":"scatter","name":"actual","marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333],"y":[4635498.26473432,4703002.4246484,5016232.1584116,5378532.61137673,5675265.54884079,5677631.65417329,4993714.39459093,5103553.46491872,4785708.87752772,4886761.47142227,4449439.84479792,4127157.99146437],"mode":"lines","line":{"color":"rgba(255,127,14,1)","simplyfy":false},"frame":"2","type":"scatter","name":"auto_arima","marker":{"color":"rgba(255,127,14,1)","line":{"color":"rgba(255,127,14,1)"}},"error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333],"y":[4694023.29088653,4764199.31490137,5100784.97642778,5430172.68477655,5688977.41137164,5705658.51320955,4958575.89500432,5104193.19125513,4673426.04092019,4809638.27512677,4388244.80311878,4027855.8206912],"mode":"lines","line":{"color":"rgba(44,160,44,1)","simplyfy":false},"frame":"2","type":"scatter","name":"ets1","marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.16666666667,2018.25,2018.3333
3333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333],"y":[4690108.26809639,4754823.14520313,5107215.57315146,5434155.12037056,5689388.60016428,5702880.65044018,4966149.48498048,5091919.70013869,4666269.80058465,4794897.51514569,4366152.52533414,3997605.84793541],"mode":"lines","line":{"color":"rgba(214,39,40,1)","simplyfy":false},"frame":"2","type":"scatter","name":"ets2","marker":{"color":"rgba(214,39,40,1)","line":{"color":"rgba(214,39,40,1)"}},"error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333],"y":[4690123.85581537,4754825.84291249,5112553.97065868,5445646.55019046,5702539.17152855,5716701.11896082,4974712.60513972,5105905.94230563,4672102.78928358,4810468.04261377,4370504.59967854,4001852.93608808],"mode":"lines","line":{"color":"rgba(148,103,189,1)","simplyfy":false},"frame":"2","type":"scatter","name":"ets3","marker":{"color":"rgba(148,103,189,1)","line":{"color":"rgba(148,103,189,1)"}},"error_y":{"color":"rgba(148,103,189,1)"},"error_x":{"color":"rgba(148,103,189,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333],"y":[4715378.50323141,4764625.00563288,5067720.28893118,5414892.58194263,5630242.95399559,5537483.76598811,4815134.62986563,4952092.73632284,4639104.92053093,4728210.46027457,4310098.15951198,4038502.39825593],"mode":"lines","line":{"color":"rgba(140,86,75,1)","simplyfy":false},"frame":"2","type":"scatter","name":"hw1","marker":{"color":"rgba(140,86,75,1)","line":{"color":"rgba(140,86,75,1)"}},"error_y":{"color":"rgba(140,86,75,1)"},"error_x":{"color":"rgba(140,86,75,1)"},"xax
is":"x","yaxis":"y","visible":true},{"x":[2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333],"y":[4717545.51238518,4802371.56780862,5181991.88994422,5536999.07606922,5794911.54282024,5783721.68850285,5033729.98840906,5152385.68465187,4726650.78332539,4845975.03949002,4396374.94529504,4046498.33657716],"mode":"lines","line":{"color":"rgba(227,119,194,1)","simplyfy":false},"frame":"2","type":"scatter","name":"hw2","marker":{"color":"rgba(227,119,194,1)","line":{"color":"rgba(227,119,194,1)"}},"error_y":{"color":"rgba(227,119,194,1)"},"error_x":{"color":"rgba(227,119,194,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333],"y":[4479114.76923077,4527623.68589744,4788106.93589744,5033699.51923077,5226767.80276134,5239366.18737673,4712859.03353057,4812514.11045365,4504491.11045365,4607462.03353057,4300487.72583826,4041424.18737673],"mode":"lines","line":{"color":"rgba(127,127,127,1)","simplyfy":false},"frame":"2","type":"scatter","name":"tslm","marker":{"color":"rgba(127,127,127,1)","line":{"color":"rgba(127,127,127,1)"}},"error_y":{"color":"rgba(127,127,127,1)"},"error_x":{"color":"rgba(127,127,127,1)"},"xaxis":"x","yaxis":"y","visible":true}],"traces":[0,1,2,3,4,5,6,7]},{"name":"3","data":[{"x":[2005.5,2005.58333333333,2005.66666666667,2005.75,2005.83333333333,2005.91666666667,2006,2006.08333333333,2006.16666666667,2006.25,2006.33333333333,2006.41666666667,2006.5,2006.58333333333,2006.66666666667,2006.75,2006.83333333333,2006.91666666667,2007,2007.08333333333,2007.16666666667,2007.25,2007.33333333333,2007.41666666667,2007.5,2007.58333333333,2007.66666666667,2007.75,2007.83333333333,2007.91666666667,2008,2008.08333333333,2008.16666666667,2008.25,2008.33333333333,2008.41666666667,2008.5,2008.58333333
333,2008.66666666667,2008.75,2008.83333333333,2008.91666666667,2009,2009.08333333333,2009.16666666667,2009.25,2009.33333333333,2009.41666666667,2009.5,2009.58333333333,2009.66666666667,2009.75,2009.83333333333,2009.91666666667,2010,2010.08333333333,2010.16666666667,2010.25,2010.33333333333,2010.41666666667,2010.5,2010.58333333333,2010.66666666667,2010.75,2010.83333333333,2010.91666666667,2011,2011.08333333333,2011.16666666667,2011.25,2011.33333333333,2011.41666666667,2011.5,2011.58333333333,2011.66666666667,2011.75,2011.83333333333,2011.91666666667,2012,2012.08333333333,2012.16666666667,2012.25,2012.33333333333,2012.41666666667,2012.5,2012.58333333333,2012.66666666667,2012.75,2012.83333333333,2012.91666666667,2013,2013.08333333333,2013.16666666667,2013.25,2013.33333333333,2013.41666666667,2013.5,2013.58333333333,2013.66666666667,2013.75,2013.83333333333,2013.91666666667,2014,2014.08333333333,2014.16666666667,2014.25,2014.33333333333,2014.41666666667,2014.5,2014.58333333333,2014.66666666667,2014.75,2014.83333333333,2014.91666666667,2015,2015.08333333333,2015.16666666667,2015.25,2015.33333333333,2015.41666666667,2015.5,2015.58333333333,2015.66666666667,2015.75,2015.83333333333,2015.91666666667,2016,2016.08333333333,2016.16666666667,2016.25,2016.33333333333,2016.41666666667,2016.5,2016.58333333333,2016.66666666667,2016.75,2016.83333333333,2016.91666666667,2017,2017.08333333333,2017.16666666667,2017.25,2017.33333333333,2017.41666666667,2017.5,2017.58333333333,2017.66666666667,2017.75,2017.83333333333,2017.91666666667,2018,2018.08333333333,2018.16666666667,2018.25,2018.33333333333,2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333,2019.16666666667,2019.25,2019.33333333333,2019.41666666667,2019.5,2019.58333333333,2019.66666666667,2019.75,2019.83333333333,2019.91666666667,2020,2020.08333333333],"y":[3225769,3195866,2740553,2770715,2617333,2671797,2448889,2223024,2708778,2773293,2829000,3071396,3227605,3
143839,2720100,2834959,2653887,2698200,2507430,2304990,2820085,2869247,3056934,3263621,3382382,3436417,2957530,3129309,2922500,2903637,2670053,2595676,3127387,3029021,3305954,3453751,3603946,3612297,3004720,3124451,2744485,2962937,2644539,2359800,2925918,3024973,3177100,3419595,3649702,3650668,3191526,3249428,2971484,3074209,2785466,2515361,3105958,3139059,3380355,3612886,3765824,3771842,3356365,3490100,3163659,3167124,2883810,2610667,3129205,3200527,3547804,3766323,3935589,3917884,3564970,3602455,3326859,3441693,3211600,2998119,3472440,3563007,3820570,4107195,4284443,4356216,3819379,3844987,3478890,3443039,3204637,2966477,3593364,3604104,3933016,4146797,4176486,4347059,3781168,3910790,3466878,3814984,3432625,3078405,3765504,3881893,4147096,4321833,4499221,4524918,3919072,4059443,3628786,3855835,3550084,3248144,4001521,4021677,4361140,4558511,4801148,4796653,4201394,4374749,4013814,4129052,3748529,3543639,4137679,4172512,4573996,4922125,5168724,5110638,4543759,4571997,4266481,4343369,3897685,3481405,4335287,4425920,4698067,5134110,5496516,5516837,4736005,4868674,4572702,4660504,4190367,3882181,4674035,4713183,5025595,5427144,5692572,5545859,4649100,4861782,4508606,4576449,4156821,3752763,4599189,4692941,5008001,5466688,5612312,5742437,4471408,4824559,4370463,4720992,4241751,3742224],"mode":"lines","line":{"color":"rgba(31,119,180,1)","simplyfy":false},"frame":"3","type":"scatter","name":"actual","marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333,2019.16666666667,2019.25,2019.33333333333],"y":[5390788.09628057,5688188.54874233,5690496.28572271,5005797.11607379,5115681.02384566,4798133.06051862,4898929.94181807,4461131.01437741,4138734.55238777,4913661.73091621,4966048.97893475,5278785.24661489],
"mode":"lines","line":{"color":"rgba(255,127,14,1)","simplyfy":false},"frame":"3","type":"scatter","name":"auto_arima","marker":{"color":"rgba(255,127,14,1)","line":{"color":"rgba(255,127,14,1)"}},"error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333,2019.16666666667,2019.25,2019.33333333333],"y":[5384793.45947407,5635950.85502372,5655737.41528692,4909514.10934149,5045259.47233217,4629619.70363544,4758954.14456425,4339307.38594084,3982688.94633511,4832024.7133303,4901408.16795106,5262855.67349992],"mode":"lines","line":{"color":"rgba(44,160,44,1)","simplyfy":false},"frame":"3","type":"scatter","name":"ets1","marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333,2019.16666666667,2019.25,2019.33333333333],"y":[5400913.86725694,5663165.31447035,5666375.60882748,4944603.45939681,5088951.5142398,4651974.9224152,4815569.41775386,4357928.47915088,4013051.30745249,4858355.92485614,4923887.46967346,5296172.12448908],"mode":"lines","line":{"color":"rgba(214,39,40,1)","simplyfy":false},"frame":"3","type":"scatter","name":"ets2","marker":{"color":"rgba(214,39,40,1)","line":{"color":"rgba(214,39,40,1)"}},"error_y":{"color":"rgba(214,39,40,1)"},"error_x":{"color":"rgba(214,39,40,1)"},"xaxis":"x","yaxis":"y","visible":true},{"x":[2018.41666666667,2018.5,2018.58333333333,2018.66666666667,2018.75,2018.83333333333,2018.91666666667,2019,2019.08333333333,2019.16666666667,2019.25,2019.33333333333],"y":[5377025.45509581,5621944.9485568,5632364.38920464,4906181.61263101,5022871.00379107,4592893.76818003,4719415.96307912,4284066.8482

We will use the Holt-Winters model (hw1) to estimate the Covid19 effect, and add its forecast, together with the upper and lower bounds of the 95% prediction interval, to the post_covid dataset:

post_covid$yhat <- as.numeric(md$forecast$hw1$forecast$mean)
post_covid$upper95 <- as.numeric(md$forecast$hw1$forecast$upper[,2])
post_covid$lower95 <- as.numeric(md$forecast$hw1$forecast$lower[,2])
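For readers who do not have the md object at hand, the same three columns can be sketched with base R alone. This is only an illustration under stated assumptions: it uses stats::HoltWinters and the built-in AirPassengers series rather than the hw1 model and SFO data used in this post, and the demo data frame is a hypothetical name.

```r
# Minimal sketch with base R only (not the md$forecast$hw1 object from the post).
# AirPassengers is a monthly series that ships with the datasets package.
fit <- HoltWinters(AirPassengers)

# Forecast 7 months ahead and request the 95% prediction interval.
# The result is a multiple time series with columns "fit", "upr" and "lwr".
fc <- predict(fit, n.ahead = 7, prediction.interval = TRUE, level = 0.95)

# Hypothetical data frame mirroring the yhat / upper95 / lower95 columns.
demo <- data.frame(
  yhat    = as.numeric(fc[, "fit"]),
  upper95 = as.numeric(fc[, "upr"]),
  lower95 = as.numeric(fc[, "lwr"])
)
print(demo)
```

Selecting the interval columns by name avoids relying on their position, which is what the [,2] index does in the code above.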

Quantify the Covid19 impact

After adding the forecasted values, it is straightforward to calculate the monthly impact of Covid19 on the number of passengers at SFO airport as the difference between the actual and the forecasted values:

post_covid$passengers_loss <- post_covid$y - post_covid$yhat

post_covid
## # A tibble: 7 x 6
##   date             y     yhat  upper95  lower95 passengers_loss
##                                 
## 1 2020-03-01 1885466 4675574. 4860610. 4490537.       -2790108.
## 2 2020-04-01  138817 4756550. 4963034. 4550066.       -4617733.
## 3 2020-05-01  286570 5062504. 5288613. 4836395.       -4775934.
## 4 2020-06-01  555119 5476241. 5720592. 5231889.       -4921122.
## 5 2020-07-01  765274 5641839. 5903342. 5380336.       -4876565.
## 6 2020-08-01  852578 5650253. 5928018. 5372488.       -4797675.
## 7 2020-09-01  905992 4541560. 4834848. 4248272.       -3635568.
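The absolute loss can also be expressed relative to the forecast. The sketch below rebuilds a small data frame from the values printed above (with yhat rounded to whole passengers) and adds a loss_pct column; both the demo data frame and the loss_pct name are assumptions made for illustration and are not part of the original analysis.

```r
# Values copied from the printed post_covid table (yhat rounded).
demo <- data.frame(
  date = seq(as.Date("2020-03-01"), by = "month", length.out = 7),
  y    = c(1885466, 138817, 286570, 555119, 765274, 852578, 905992),
  yhat = c(4675574, 4756550, 5062504, 5476241, 5641839, 5650253, 4541560)
)

# Same calculation as in the post: actual minus forecast.
demo$passengers_loss <- demo$y - demo$yhat

# Hypothetical extra column: loss as a percentage of the forecasted volume.
demo$loss_pct <- round(100 * demo$passengers_loss / demo$yhat, 1)

print(demo)
```

A relative measure makes the months comparable even though the forecasted volume changes with the season.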

The estimated total decrease in the number of passengers between March and September 2020 as a result of the pandemic is:

sum(post_covid$passengers_loss)
## [1] -30414705
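If this total is going to be shown in a chart label, it can be derived from the computed value instead of being typed by hand. A minimal sketch, assuming the total printed above is stored in a hypothetical total_loss variable:

```r
# Total copied from the output above; in the analysis this would be
# sum(post_covid$passengers_loss).
total_loss <- -30414705

# Build a short label in millions of passengers.
loss_label <- paste0("Estimated decrease in passengers volume: ~",
                     round(abs(total_loss) / 1e6), "M")
print(loss_label)
```

This keeps the label in sync with the data when more months are added to the series.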

We can also visualize the Covid19 effect on air passenger traffic:

plot_ly() %>%
  add_ribbons(x = post_covid$date,
              ymin = post_covid$y,
              ymax = post_covid$yhat,
              line = list(color = 'rgba(255, 0, 0, 0.05)'),
              fillcolor = 'rgba(255, 0, 0, 0.6)',
              name = "Estimated Loss") %>%
  add_segments(x = as.Date("2020-02-01"), 
               xend = as.Date("2020-02-01"),
               y = min(df$y),
               yend = max(df$y) * 1.05,
               line = list(color = "black", dash = "dash"),
               showlegend = FALSE) %>%
  add_annotations(text = "Pre-Covid19",
                  x = as.Date("2017-09-01"),
                  y = max(df$y) * 1.05, 
                  showarrow = FALSE) %>%
  add_annotations(text = "Post-Covid19",
                  x = as.Date("2020-09-01"),
                  y = max(df$y) * 1.05, 
                  showarrow = FALSE) %>%
  add_annotations(text = paste("Estimated decrease in", " ",
                               "passengers volume: ~30M",
                               sep = ""),
                  x = as.Date("2020-05-01"),
                  y = 2 * 10 ^ 6, 
                  arrowhead = 1,
                  ax = -130,
                  ay = -40,
                  showarrow = TRUE) %>%
  add_lines(x = df$date,
            y = df$y,
            line = list(color = "#1f77b4"),
            name = "Actual") %>%
  layout(title = "Covid19 Impact on SFO Air Passenger Traffic",
         yaxis = list(title = "Number of Passengers"),
         xaxis = list(title = "Time Series Model - Holt-Winters",
                      range = c(as.Date("2015-01-01"), as.Date("2021-01-01"))),
         legend = list(x = 0, y = 0.95))
284443,4356216,3819379,3844987,3478890,3443039,3204637,2966477,3593364,3604104,3933016,4146797,4176486,4347059,3781168,3910790,3466878,3814984,3432625,3078405,3765504,3881893,4147096,4321833,4499221,4524918,3919072,4059443,3628786,3855835,3550084,3248144,4001521,4021677,4361140,4558511,4801148,4796653,4201394,4374749,4013814,4129052,3748529,3543639,4137679,4172512,4573996,4922125,5168724,5110638,4543759,4571997,4266481,4343369,3897685,3481405,4335287,4425920,4698067,5134110,5496516,5516837,4736005,4868674,4572702,4660504,4190367,3882181,4674035,4713183,5025595,5427144,5692572,5545859,4649100,4861782,4508606,4576449,4156821,3752763,4599189,4692941,5008001,5466688,5612312,5742437,4471408,4824559,4370463,4720992,4241751,3742224,1885466,138817,286570,555119,765274,852578,905992],"type":"scatter","mode":"lines","line":{"color":"#1f77b4"},"name":"Actual","inherit":true}},"layout":{"margin":{"b":40,"l":60,"t":25,"r":10},"annotations":[{"text":"Pre-Covid19","x":"2017-09-01","y":6029558.85,"showarrow":false},{"text":"Post-Covid19","x":"2020-09-01","y":6029558.85,"showarrow":false},{"text":"Estimated decrease in<br>passengers volume: ~30M","x":"2020-05-01","y":2000000,"arrowhead":1,"ax":-130,"ay":-40,"showarrow":true}],"title":"Covid19 Impact on SFO Air Passenger Traffic","yaxis":{"domain":[0,1],"automargin":true,"title":"Number of Passengers"},"xaxis":{"domain":[0,1],"automargin":true,"title":"Time Series Model - Holt-Winters","range":["2015-01-01","2021-01-01"]},"legend":{"x":0,"y":0.95},"hovermode":"closest","showlegend":true},"source":"A","config":{"showSendToCloud":false},"data":[{"fillcolor":"rgba(255, 0, 0, 0.6)","x":["2020-03-01","2020-04-01","2020-05-01","2020-06-01","2020-07-01","2020-08-01","2020-09-01","2020-09-01","2020-09-01","2020-08-01","2020-07-01","2020-06-01","2020-05-01","2020-04-01","2020-03-01"],"type":"scatter","mode":"lines","hoveron":"points","fill":"toself","line":{"color":"rgba(255, 0, 0, 0.05)"},"name":"Estimated 
Loss","y":[1885466,138817,286570,555119,765274,852578,905992,905992,4541560.26496136,5650253.20887009,5641839.11286111,5476240.63384323,5062503.81566185,4756549.77479561,4675573.74820966],"marker":{"color":"rgba(31,119,180,1)","line":{"color":"rgba(31,119,180,1)"}},"error_y":{"color":"rgba(31,119,180,1)"},"error_x":{"color":"rgba(31,119,180,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":["2020-02-01","2020-02-01"],"y":[138817,6029558.85],"type":"scatter","mode":"lines","line":{"color":"black","dash":"dash"},"showlegend":false,"marker":{"color":"rgba(255,127,14,1)","line":{"color":"rgba(255,127,14,1)"}},"error_y":{"color":"rgba(255,127,14,1)"},"error_x":{"color":"rgba(255,127,14,1)"},"xaxis":"x","yaxis":"y","frame":null},{"x":["2005-07-01","2005-08-01","2005-09-01","2005-10-01","2005-11-01","2005-12-01","2006-01-01","2006-02-01","2006-03-01","2006-04-01","2006-05-01","2006-06-01","2006-07-01","2006-08-01","2006-09-01","2006-10-01","2006-11-01","2006-12-01","2007-01-01","2007-02-01","2007-03-01","2007-04-01","2007-05-01","2007-06-01","2007-07-01","2007-08-01","2007-09-01","2007-10-01","2007-11-01","2007-12-01","2008-01-01","2008-02-01","2008-03-01","2008-04-01","2008-05-01","2008-06-01","2008-07-01","2008-08-01","2008-09-01","2008-10-01","2008-11-01","2008-12-01","2009-01-01","2009-02-01","2009-03-01","2009-04-01","2009-05-01","2009-06-01","2009-07-01","2009-08-01","2009-09-01","2009-10-01","2009-11-01","2009-12-01","2010-01-01","2010-02-01","2010-03-01","2010-04-01","2010-05-01","2010-06-01","2010-07-01","2010-08-01","2010-09-01","2010-10-01","2010-11-01","2010-12-01","2011-01-01","2011-02-01","2011-03-01","2011-04-01","2011-05-01","2011-06-01","2011-07-01","2011-08-01","2011-09-01","2011-10-01","2011-11-01","2011-12-01","2012-01-01","2012-02-01","2012-03-01","2012-04-01","2012-05-01","2012-06-01","2012-07-01","2012-08-01","2012-09-01","2012-10-01","2012-11-01","2012-12-01","2013-01-01","2013-02-01","2013-03-01","2013-04-01","2013-05-01","2013-06-01","2013-07-01","2
013-08-01","2013-09-01","2013-10-01","2013-11-01","2013-12-01","2014-01-01","2014-02-01","2014-03-01","2014-04-01","2014-05-01","2014-06-01","2014-07-01","2014-08-01","2014-09-01","2014-10-01","2014-11-01","2014-12-01","2015-01-01","2015-02-01","2015-03-01","2015-04-01","2015-05-01","2015-06-01","2015-07-01","2015-08-01","2015-09-01","2015-10-01","2015-11-01","2015-12-01","2016-01-01","2016-02-01","2016-03-01","2016-04-01","2016-05-01","2016-06-01","2016-07-01","2016-08-01","2016-09-01","2016-10-01","2016-11-01","2016-12-01","2017-01-01","2017-02-01","2017-03-01","2017-04-01","2017-05-01","2017-06-01","2017-07-01","2017-08-01","2017-09-01","2017-10-01","2017-11-01","2017-12-01","2018-01-01","2018-02-01","2018-03-01","2018-04-01","2018-05-01","2018-06-01","2018-07-01","2018-08-01","2018-09-01","2018-10-01","2018-11-01","2018-12-01","2019-01-01","2019-02-01","2019-03-01","2019-04-01","2019-05-01","2019-06-01","2019-07-01","2019-08-01","2019-09-01","2019-10-01","2019-11-01","2019-12-01","2020-01-01","2020-02-01","2020-03-01","2020-04-01","2020-05-01","2020-06-01","2020-07-01","2020-08-01","2020-09-01"],"y":[3225769,3195866,2740553,2770715,2617333,2671797,2448889,2223024,2708778,2773293,2829000,3071396,3227605,3143839,2720100,2834959,2653887,2698200,2507430,2304990,2820085,2869247,3056934,3263621,3382382,3436417,2957530,3129309,2922500,2903637,2670053,2595676,3127387,3029021,3305954,3453751,3603946,3612297,3004720,3124451,2744485,2962937,2644539,2359800,2925918,3024973,3177100,3419595,3649702,3650668,3191526,3249428,2971484,3074209,2785466,2515361,3105958,3139059,3380355,3612886,3765824,3771842,3356365,3490100,3163659,3167124,2883810,2610667,3129205,3200527,3547804,3766323,3935589,3917884,3564970,3602455,3326859,3441693,3211600,2998119,3472440,3563007,3820570,4107195,4284443,4356216,3819379,3844987,3478890,3443039,3204637,2966477,3593364,3604104,3933016,4146797,4176486,4347059,3781168,3910790,3466878,3814984,3432625,3078405,3765504,3881893,4147096,4321833,4499221,452491
8,3919072,4059443,3628786,3855835,3550084,3248144,4001521,4021677,4361140,4558511,4801148,4796653,4201394,4374749,4013814,4129052,3748529,3543639,4137679,4172512,4573996,4922125,5168724,5110638,4543759,4571997,4266481,4343369,3897685,3481405,4335287,4425920,4698067,5134110,5496516,5516837,4736005,4868674,4572702,4660504,4190367,3882181,4674035,4713183,5025595,5427144,5692572,5545859,4649100,4861782,4508606,4576449,4156821,3752763,4599189,4692941,5008001,5466688,5612312,5742437,4471408,4824559,4370463,4720992,4241751,3742224,1885466,138817,286570,555119,765274,852578,905992],"type":"scatter","mode":"lines","line":{"color":"#1f77b4"},"name":"Actual","marker":{"color":"rgba(44,160,44,1)","line":{"color":"rgba(44,160,44,1)"}},"error_y":{"color":"rgba(44,160,44,1)"},"error_x":{"color":"rgba(44,160,44,1)"},"xaxis":"x","yaxis":"y","frame":null}],"highlight":{"on":"plotly_click","persistent":false,"dynamic":false,"selectize":false,"opacityDim":0.2,"selected":{"opacity":1},"debounce":0},"shinyEvents":["plotly_hover","plotly_click","plotly_selected","plotly_relayout","plotly_brushed","plotly_brushing","plotly_clickannotation","plotly_doubleclick","plotly_deselect","plotly_afterplot","plotly_sunburstclick"],"base_url":"https://plot.ly"},"evals":[],"jsHooks":[]}

Applications

Once we estimate the decrease in the number of passengers, we can quantify the losses caused by Covid19. For example, if each passenger pays on average $10 in airport tax, then the estimated tax loss is about 300 million USD for the period.
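The arithmetic behind that figure can be sketched as follows. The monthly values come from the data behind the plot; reading the upper edge of the shaded “Estimated Loss” band as the Holt-Winters point forecast is an assumption, and so is the $10 tax per passenger.

```r
# Monthly passengers, March to September 2020 (values from the plot data)
actual <- c(1885466, 138817, 286570, 555119, 765274, 852578, 905992)

# Assumption: upper edge of the shaded band = point forecast (rounded)
forecast <- c(4675574, 4756550, 5062504, 5476241, 5641839, 5650253, 4541560)

passenger_drop    <- sum(forecast - actual)  # estimated passengers lost
tax_per_passenger <- 10                      # hypothetical average tax, USD
tax_loss          <- passenger_drop * tax_per_passenger

round(passenger_drop / 1e6, 1)  # millions of passengers
round(tax_loss / 1e6)           # millions of USD
```

Under these assumptions the drop comes to about 30.4 million passengers, or roughly 304 million USD in tax.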

In addition, the underlying forecast is a point estimate, so you can leverage the prediction interval to provide a range for the drop in air passenger traffic.
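A minimal sketch of that idea, using base R's HoltWinters() on the built-in AirPassengers series as a stand-in for the SFO data. The "observed" values are made up for illustration, so the numbers themselves mean nothing; only the mechanics carry over.

```r
# Stand-in data: the built-in AirPassengers monthly series, not the SFO data
hw_fit <- HoltWinters(AirPassengers)

# 7-month-ahead forecast with a 95% prediction interval
fc <- predict(hw_fit, n.ahead = 7, prediction.interval = TRUE, level = 0.95)

# Hypothetical observed values after a shock (10% of the point forecast)
observed <- as.numeric(fc[, "fit"]) * 0.1

drop_point <- sum(fc[, "fit"] - observed)  # drop relative to the point forecast
drop_low   <- sum(fc[, "lwr"] - observed)  # drop relative to the lower bound
drop_high  <- sum(fc[, "upr"] - observed)  # drop relative to the upper bound

c(low = drop_low, point = drop_point, high = drop_high)
```

Reporting the low and high values alongside the point estimate gives a range for the drop instead of a single number.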

Last but not least, you can use a top-down approach and distribute the forecast across some of the categories available in the data. For example, the drop in passengers by

  • Airline provider
  • Region
  • Domestic / International flights
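One simple way to do such a top-down split is to allocate the total in proportion to each category's historical share. The sketch below assumes this proportional rule, and the shares are made-up numbers, not taken from the SFO data.

```r
total_drop <- 30e6  # the roughly 30M estimate quoted above

# Hypothetical pre-Covid19 passenger shares by category (must sum to 1)
shares <- c(Domestic = 0.75, International = 0.25)
stopifnot(isTRUE(all.equal(sum(shares), 1)))

# Top-down allocation: each category receives its share of the total drop
drop_by_category <- total_drop * shares
drop_by_category
```

The same pattern works for airlines or regions by swapping in a different share vector.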

Additional information

More details about the packages and tools used on this post:

To leave a comment for the author, please follow the link and comment on their blog: Rami Krispin.

The post Quantify the Covid19 Impact on the SFO Airport Passenger Air Traffic first appeared on R-bloggers.

Little useless-useful R functions – Mathematical puzzle of Four fours

[This article was first published on R – TomazTsql, and kindly contributed to R-bloggers].

Yes. Playing with numbers is another call for a useless R function. This time, we will be using a function that finds a simple mathematical expression for a whole number from 0 to 9 (or even higher), using only:

  • common mathematical operations (and symbols)
  • exactly four digits of four (hence the name Four Fours)
  • no concatenation of the digits (yet)

For the sake of brevity, I have only used a couple of numbers, but the list can go on and on. Also, only four mathematical operations are used:

  • addition
  • subtraction
  • multiplication
  • division

In the next version we can also add:

  • exponentiation
  • factorial
  • root extraction

Forcing the order of operations by adding parentheses also helps the calculations finish a little faster. The next version should support multiple pairs of parentheses.

Useless function:

# Four Fours function
four_fours <- function(maxnum) {
  oper <- c("+", "*", "-", "/")

  for (i in 0:maxnum) {
    step_counter <- 0
    res <- i + 1  # any value different from i, so the search loop starts

    while (i != res) {
      # three random operators between the four fours
      oper3 <- sample(oper, 3, replace = TRUE)
      for44 <- paste0("4", oper3[1], "4", oper3[2], "4", oper3[3], "4")

      # adding parentheses: draw two positions that are 4 or 5 characters apart
      stopit <- FALSE
      while (!stopit) {
        pos_par <- sort(sample(1:7, 2))
        nn <- pos_par[1]
        mm <- pos_par[2]
        rr <- abs(nn - mm)
        if (rr == 4 || rr == 5) {
          stopit <- TRUE
        }
      }

      for44 <- paste0(substr(for44, 1, nn - 1), "(", substr(for44, nn, nchar(for44)))
      for44 <- paste0(substr(for44, 1, mm), ")", substr(for44, mm + 1, nchar(for44)))

      # a parenthesis that landed next to an operator is moved past it:
      # "(/" becomes "/(" and "+)" becomes ")+"
      for44 <- gsub("(-", "-(", for44, fixed = TRUE)
      for44 <- gsub("(/", "/(", for44, fixed = TRUE)
      for44 <- gsub("(*", "*(", for44, fixed = TRUE)
      for44 <- gsub("(+", "+(", for44, fixed = TRUE)
      for44 <- gsub("+)", ")+", for44, fixed = TRUE)
      for44 <- gsub("-)", ")-", for44, fixed = TRUE)
      for44 <- gsub("*)", ")*", for44, fixed = TRUE)
      for44 <- gsub("/)", ")/", for44, fixed = TRUE)

      ### Adding SQRT: wrap one randomly chosen "4" in sqrt()
      if (i >= 10) {
        lii <- which(strsplit(for44, "")[[1]] == "4")
        start_pos <- sample(lii, 1)
        for44 <- paste0(substr(for44, 1, start_pos - 1), "sqrt(", substr(for44, start_pos, start_pos), ")", substr(for44, start_pos + 1, nchar(for44)))
      }

      ### Adding Factorial: wrap one randomly chosen "4" in factorial()
      if (i >= 11) {
        li <- which(strsplit(for44, "")[[1]] == "4")
        start_pos_2 <- sample(li, 1)
        for44 <- paste0(substr(for44, 1, start_pos_2 - 1), "factorial(", substr(for44, start_pos_2, start_pos_2), ")", substr(for44, start_pos_2 + 1, nchar(for44)))
      }

      # evaluate the candidate expression and count the attempt
      res <- eval(parse(text = for44))
      step_counter <- step_counter + 1
      if (res == i) {
        print(paste0("Value: ", res, " was found formula: ", for44, " with result: ", res, " and steps: ", step_counter))
      }
    }
  }
}

Running the function is as simple as:

# Run function
four_fours(6)

After the run finishes, the results are displayed with the number of tries and the formula.

Some solutions do not need factorial or root extraction, and some do. From 10 to 14, one or the other is needed; from 15 to 18, the simple operations are enough, and so on. At some integer, concatenation of two fours would also be required.

Based on my tests, the above solution should easily generate solutions from 1 to 25. Many optimisations are possible; one useless one would be to use a scalable cloud architecture, the most useless (and expensive) optimisation, which I certainly do not approve of.
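One less useless optimisation is to replace random sampling with an exhaustive search. The sketch below is not the author's code. It assumes only the four basic operations and the three parenthesis layouts that the sampler above can produce; the names `templates`, `values` and `four_fours_table` are made up for illustration.

```r
ops <- c("+", "-", "*", "/")

# All 4^3 = 64 ways to place three operators between four fours
combos <- expand.grid(o1 = ops, o2 = ops, o3 = ops, stringsAsFactors = FALSE)

# Parenthesis layouts: first pair, first triple, or middle pair grouped
templates <- c("(4%s4)%s4%s4", "(4%s4%s4)%s4", "4%s(4%s4)%s4")

exprs <- unlist(lapply(templates, function(tpl) {
  sprintf(tpl, combos$o1, combos$o2, combos$o3)
}))

# Evaluate every candidate once (division by zero simply yields Inf)
values <- vapply(exprs, function(e) eval(parse(text = e)), numeric(1))

# Look up the first matching expression for each target number
four_fours_table <- function(targets = 0:9) {
  hits <- vapply(targets, function(target) {
    idx <- which(abs(values - target) < 1e-9)
    if (length(idx) > 0) exprs[idx[1]] else NA_character_
  }, character(1))
  data.frame(target = targets, expression = hits, stringsAsFactors = FALSE)
}

four_fours_table(0:9)
```

With only 192 candidate expressions, every target is answered in one pass, and targets with no matching expression come back as NA instead of looping forever.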

As always, code is available at Github.

Happy R-coding!

To leave a comment for the author, please follow the link and comment on their blog: R – TomazTsql.

The post Little useless-useful R functions – Mathematical puzzle of Four fours first appeared on R-bloggers.

Smoothing isn’t Always Safe

[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers].

Introduction

Here is a quick data-scientist / data-analyst question: what is the overall trend or shape in the following noisy data? For our specific example: How do we relate value as a noisy function (or relation) of m? This example arose in producing our tutorial “The Nature of Overfitting”.

One would think this would be safe and easy to assess in R using ggplot2::geom_smooth(), but now we are not so sure.

Our Example

Let’s first load our data and characterize it a bit

d <- read.csv(
  'sus_shape.csv',
  strip.white = TRUE,
  stringsAsFactors = FALSE)

head(d)
##   m      value
## 1 3 -12.968296
## 2 3  -5.522812
## 3 3  -6.893872
## 4 3  -5.522812
## 5 3 -11.338718
## 6 3 -10.208145

summary(d)
##        m              value
##  Min.   :   3.0   Min.   :-18.773
##  1st Qu.:  86.0   1st Qu.: -1.304
##  Median : 195.0   Median : -1.276
##  Mean   : 288.8   Mean   : -1.508
##  3rd Qu.: 436.0   3rd Qu.: -1.266
##  Max.   :1000.0   Max.   : -1.260

nrow(d)
## [1] 15545

Now let’s try and look at this data. First we try a scatter plot with a low alpha, which gives us something similar to a density presentation.

library(ggplot2)

ggplot(
  data = d,
  mapping = aes(x = m, y = value)) +
  geom_point(
    alpha = 0.005,
    color = 'Blue') +
  ggtitle("point plot of data")

Unnamed chunk 4 1

Each m value has many different value measurements (representing repetitions of a noisy experiment). Frankly the above is not that legible, so we need tools to try and summarize it in the region we are interested in (value near -1.25).

Trying Default Smoothing

Let’s run a default smoothing line through this data to try to get the overall relation.

ggplot(
  data = d,
  mapping = aes(x = m, y = value)) +
  geom_smooth() +
  ggtitle("suspect shape in smoothing (default)")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Unnamed chunk 5 1

This graph appears to imply some sort of oscillation or structure in the relation between mean value and m. We are pretty sure there is no such structure, and this is an artifact of the smoothing method. This defect is why we did not use ggplot2::geom_smooth() in our note on training set size.

We did see a warning, but we believe this is just telling us which default values were used, and not indicating the above pathology was detected.

At this point we are in a pickle. We had theoretical reasons to believe the data follows a trend that is monotone increasing in m, with mean-zero noise that decreases with larger m. The graph doesn’t look like that. So either our understanding or theory is wrong, or the graph doesn’t faithfully represent the data. The graph had been intended as a very small step in larger work. Re-examining the intricacies of the default behavior of this graphing software was not our intended task. We had been doing some actual research on the data.

Now we have a second problem: is this unexpected structure in our data, or a graphing artifact? The point is: when something appears to work, one can, with some risk, move on quickly; when something appears not to work in a surprising way, one ends up with a lot of additional required investigation. This investigation is the content of this note, like it or not. Also, in some loud R circles, one has no choice but to try “the default ggplot2::geom_smooth() graph”, otherwise one is pilloried for “not knowing it.”

We can try switching the smoothing method to see what another smoothing method says. Let’s try loess.

ggplot(
  data = d,
  mapping = aes(x = m, y = value)) +
  geom_smooth(method = 'loess') +
  ggtitle("suspect shape in smoothing (loess)")
## `geom_smooth()` using formula 'y ~ x'

Unnamed chunk 6 1

Now we have a different shape. At most one of these (and in fact neither) is representative of the data. There is, again, a warning. It appears, again, to be a coding style guide, and not detection of the issue at hand.

Looking Again

Let’s try a simple grouped box plot. We will group m into ranges to get more aggregation.

d$m_grouped <- formatC(
  round(d$m/50)*50,
  width = 4,
  format = "d",
  flag = "0")

ggplot(
  data = d,
  mapping = aes(x = m_grouped, y = value)) +
  geom_boxplot() +
  theme(axis.text.x = element_text(angle = 90,
                                   vjust = 0.5,
                                   hjust=1)) +
  ggtitle("m-grouped bar chart, no obvious plotting artifacts.")

Unnamed chunk 7 1

For legibility, we repeat these graphs zooming in to the area under disagreement. We are using coord_cartesian() to zoom in, so as to try and not change the underlying graphing calculation.

zoom <- coord_cartesian(xlim = c(0, 500), ylim = c(-1.5, -1)) 
ggplot(
  data = d,
  mapping = aes(x = m, y = value)) +
  geom_smooth() +
  zoom +
  ggtitle("suspect shape in smoothing (default, zoomed)") +
  geom_hline(
    yintercept = max(d$value),
    color = 'red',
    linetype = 2)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'

Unnamed chunk 9 1

This crossing above -1.0 is very suspicious, as we have max(d$value) = -1.2600449. We have annotated this with the horizontal red dashed line.

And the entirety of the loess hump is also a plotting artifact, also completely out of the observed data range.

ggplot(
  data = d,
  mapping = aes(x = m, y = value)) +
  geom_smooth(method = 'loess') +
  zoom +
  ggtitle("suspect shape in smoothing (loess, zoomed)") +
  geom_hline(
    yintercept = max(d$value),
    color = 'red',
    linetype = 2)
## `geom_smooth()` using formula 'y ~ x'

Unnamed chunk 10 1
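The visual check against the red line can also be done numerically: compare the smoother's fitted values with the range of the observed data. The helper below, `outside_data_range`, is a hypothetical name of ours, and the data is a synthetic stand-in (not sus_shape.csv), so we make no claim about what the final call returns for the real data.

```r
# TRUE if any fitted value falls outside the range of the observed values
outside_data_range <- function(fitted, observed) {
  rng <- range(observed, na.rm = TRUE)
  any(fitted < rng[1] | fitted > rng[2], na.rm = TRUE)
}

# Synthetic stand-in: values stay below -1.26, noise shrinks as m grows
set.seed(2020)
m     <- round(exp(runif(2000, log(3), log(1000))))
value <- -1.26 - rexp(length(m), rate = m / 5)

# Default loess fit, as in the direct loess() call on the real data
fit <- loess(value ~ m)

outside_data_range(fitted(fit), value)
```

A TRUE here would mean the curve visits values that no observation ever took, which is exactly the kind of artifact flagged above.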

The zoomed-in version of the box plot shows the noisy monotone asymptotic shape we expected for the original experiment that produced this data.

ggplot(
  data = d[d$m <= 500, ],
  mapping = aes(x = m_grouped, y = value)) +
  geom_boxplot() +
  coord_cartesian(ylim = c(-1.5, -1.2)) +
  theme(
    axis.text.x = element_text(
      angle = 90,
      vjust = 0.5,
      hjust=1)) +
  ggtitle("m-grouped bar chart, no obvious plotting artifacts, zoomed")

Unnamed chunk 11 1

The point plot, when zoomed, qualitatively agrees with the boxplot.

ggplot(
  data = d,
  mapping = aes(x = m, y = value)) +
  geom_point(alpha = 0.05, color = 'Blue') +
  coord_cartesian(
    xlim = c(0, 500),
    ylim = c(-1.5, -1.25)) +
  ggtitle("point plot of data, zoomed")

Unnamed chunk 12 1

Directly calling loess/lowess

ggplot2 is documented as using loess, which in turn is documented as a newer adapter for lowess “with different defaults” than lowess. However, the documented exposed controls on these two methods seem fairly disjoint.

That being said loess (without a ‘w’, as in “Uruguay”) called directly with default arguments shows the same chimeric artifact.

zoom2 <- coord_cartesian(ylim = c(-1.5, -1)) 
d$loess <- loess(value ~ m, data = d)$fitted

ggplot(
  data = d,
  mapping = aes(x = m)) +
  geom_line(aes(y = loess)) +
  geom_point(
    aes(y = value),
    alpha = 0.01,
    color = 'Blue') +
  zoom2 +
  geom_hline(
    yintercept = max(d$value),
    color = 'red',
    linetype = 2) +
  ggtitle('direct loess (no w) call')

Unnamed chunk 14 1

Playing with arguments can suppress the artifact, but we still saw weird (but smaller) effects even with the suggested degree = 1 alternate setting.

Directly calling lowess (with a ‘w’, as in “answer”) gives a more reasonable result out of the box.

d$lowess <- lowess(d$m, d$value)$y

ggplot(
  data = d,
  mapping = aes(x = m)) +
  geom_line(aes(y = lowess)) +
  geom_point(
    aes(y = value),
    alpha = 0.01,
    color = 'Blue') +
  geom_hline(
    yintercept = max(d$value),
    color = 'red',
    linetype = 2) +
  coord_cartesian(
    ylim = c(-1.5, -1.25)) +
  ggtitle('direct lowess (with w) call')

Unnamed chunk 15 1

Simple Windowing

Simple methods from fields such as signal processing work well. For example, a simple square-window moving average appears to correctly tell the story. These are the methods I use, at the risk of being told I should have used geom_smooth().

# requires development version 1.3.2
# remotes::install_github('WinVector/WVPlots')
library(WVPlots)
## Loading required package: wrapr
ConditionalSmoothedScatterPlot(
  d,
  xvar = 'm',
  yvar = 'value',
  point_color = "Blue",
  point_alpha = 0.01,
  k = 51,
  groupvar = NULL,
  title = 'Width 51 square window on data (zoomed)') +
  coord_cartesian(ylim = c(-1.5, -1.25)) +
  geom_hline(
    yintercept = max(d$value),
    color = 'red',
    linetype = 2)

Unnamed chunk 16 1

The fact that the hard window yields a jagged curve gives an indication of the amount of noise in each region of the graph.
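For readers without WVPlots, a square-window moving average can be sketched in base R with stats::filter(). This is our own minimal version under stated assumptions, not the WVPlots implementation; `square_window_smooth` is a hypothetical helper and the data is a synthetic stand-in.

```r
# Centered square-window (boxcar) moving average of y, ordered by x
square_window_smooth <- function(x, y, k = 51) {
  ord <- order(x)
  smoothed <- stats::filter(y[ord], rep(1 / k, k), sides = 2)
  data.frame(x = x[ord], y = y[ord], smooth = as.numeric(smoothed))
}

# Synthetic stand-in data (not the article's sus_shape.csv)
set.seed(2020)
m     <- round(exp(runif(2000, log(3), log(1000))))
value <- -1.26 - rexp(length(m), rate = m / 5)

sm <- square_window_smooth(m, value, k = 51)

# Each smoothed point is a plain average of 51 observed values, so it
# cannot leave the observed range; the window ends are NA
range(sm$smooth, na.rm = TRUE)
range(value)
```

Because every output is an average of actual observations, this kind of smoother cannot produce the out-of-range humps seen with the gam and loess defaults.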

Conclusion

Large data sets are inherently illegible. So we rely on summaries and aggregations to examine them. When these fail we may not always be in a position to notice the distortion, and this can lead to problems.

Many of the above default summary presentations were deeply flawed and showed chimerical artifacts not in the data being summarized. Starting a research project to understand the nature of the above humps and oscillations would be fruitless, as they are not in the data, but instead artifacts of the plotting and analysis software.

As a consultant this is disturbing: I end up spending time on debugging the tools, and not on the client’s task.

The above were not flaws in ggplot2 itself, but in the use of the gam and loess smoothers, which are likely introducing the artifacts by trying to enforce certain curvature conditions not in the data. We are essentially looking at something akin to Gibbs’ phenomenon or ringing. This could trip up the data scientist or the data analyst without a background in signal analysis.

This sort of problem reveals the lie in the typical “data scientist >> statistician >> data analyst” or “statistics are always correct in R, and never correct in Python” snobberies. In fact a data analyst would get the summary shapes right, as presentation of this sort is one of their specialties.

To leave a comment for the author, please follow the link and comment on their blog: R – Win Vector LLC.

The post Smoothing isn’t Always Safe first appeared on R-bloggers.

Last Call for the 2020 R Community Survey

[This article was first published on RStudio Blog, and kindly contributed to R-bloggers].

Plot of Net Promoter Score for R
Net Promoter Score results from the 2019 R Community Survey

On December 11, RStudio launched our third annual R Community Survey (formerly known as the Learning R Survey) to better understand how and why people learn and use the R language and associated tools. That survey closes TOMORROW, January 8, 2021. We encourage anyone who is interested in R to respond. The survey should only require 5 to 10 minutes to complete, depending on how little or how much information you choose to share with us. You can find the survey here:

If you don’t know R yet or use Python or Julia more than R, that’s fine too! The survey has specific questions for you, and your responses will help us better understand how we can be more encouraging to you and others like you.

Data and analysis of the 2018 and 2019 community survey data can be found on GitHub at https://github.com/rstudio/r-community-survey in the 2018/ and 2019/ folders. Results from the 2020 survey will also be posted as free and open source data in that GitHub repo in February 2021.

Please ask your students, Twitter followers, Ultimate Frisbee team, and anyone else you think may be interested to complete the survey. Your efforts will help RStudio, educators, and users understand and grow our data science community.

You will find a full disclosure of what information will be collected and how it will be used on the first page of the survey. The survey does not collect personally identifiable information nor email addresses, but it does have optional demographic questions.

Thank you in advance for your consideration and time. We look forward to sharing the results with you next month!

To leave a comment for the author, please follow the link and comment on their blog: RStudio Blog.

The post Last Call for the 2020 R Community Survey first appeared on R-bloggers.

30 Year Weather Data Analysis

[This article was first published on business-science.io, and kindly contributed to R-bloggers].

Introduction

In this article, we’ll use the EPA ArcGIS map that contains weather data going back to 1990. This will be a quick code-based blog post showing how to load, explore and visualize the data from a local EPA monitor.

# Libraries
packages <-
  c("data.table",
    "ggplot2",
    "stringr",
    "skimr",
    "janitor",
    "glue"
    )

if (length(setdiff(packages, rownames(installed.packages()))) > 0) {
  install.packages(setdiff(packages, rownames(installed.packages())))
}

invisible(lapply(packages, library, character.only = TRUE))

knitr::opts_chunk$set(
  comment = NA,
  fig.width = 12,
  fig.height = 8,
  out.width = '100%',
  cache = TRUE
)

Weather Monitor Data Extraction

Below, we show how to use glue::glue() to hit the API for our local site (“09-001-0017”) and retrieve the annual daily data sets from 1990-2020 with data.table::fread(), which took only a few minutes. See how we create an integer vector of the desired years and glue each year into the request string, so that one request is made per year and the results are collected in a list. We could easily change the code below and get the same for the next closest monitor in Westport, CT, or for any group of monitors in Connecticut or beyond if needed. For the actual blog post, the data we extracted below and saved to disk will be used.

# Years to retrieve
year <- c(1990:2020)
ac <-
  lapply(
    glue::glue(
      'https://www3.epa.gov/cgi-bin/broker?_service=data&_program=dataprog.Daily.sas&check=void&polname=Ozone&debug=0&year={year}&site=09-001-0017'
    ),
    fread
  )
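To make the vectorised URL construction explicit, here is a base R sketch of the same idea that builds the requests without downloading anything. The helpers `build_epa_urls` and `read_or_null` are hypothetical names of ours, and returning NULL for a failed year is an assumption about how one might want to handle errors, not something the original code does.

```r
# Hypothetical helper: build one request URL per year for a given monitor
build_epa_urls <- function(years, site = "09-001-0017") {
  sprintf(
    paste0(
      "https://www3.epa.gov/cgi-bin/broker?_service=data",
      "&_program=dataprog.Daily.sas&check=void&polname=Ozone",
      "&debug=0&year=%d&site=%s"
    ),
    years, site
  )
}

# Hypothetical wrapper: return NULL instead of stopping when one year fails
read_or_null <- function(url, reader = utils::read.csv) {
  tryCatch(reader(url), error = function(e) NULL)
}

urls <- build_epa_urls(1990:2020)
length(urls)
urls[1]

# The line below would perform the downloads, skipping failed years:
# ac <- Filter(Negate(is.null), lapply(urls, read_or_null))
```

Changing the `site` argument is all it takes to point the same requests at another monitor.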

Fortunately, the annual data sets were consistent over the period with all the same variables, so it took just a few minutes to get a clean data.table stretching back to 1990. In our experience with public data sets, this is rarely the case; things like variables or formatting almost always change. It seems surprising that the collected data would be exactly the same for such a long period of time, so we assume that the EPA is making an effort to clean it up and keep it consistent, which is very much appreciated. In a future post, we might look at a more complicated exploration of the EPA API, which has data going back much further for some monitors and some variables, and seems to be one of the better organized and documented government APIs we have come across.

# Bind lists to data.table
ac_dt <- rbindlist(ac)

# Clean names
ac_dt <- janitor::clean_names(ac_dt)
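
Since binding only works cleanly when every yearly table has the same columns, a quick check before binding can be useful. The sketch below uses base R and two toy data frames with made-up values; same_columns() is a hypothetical helper, not part of data.table.

```r
# Toy stand-ins for two yearly downloads (hypothetical data, not EPA values).
yearly <- list(
  data.frame(date = as.Date("1990-01-01"), value = 0.031),
  data.frame(date = as.Date("1991-01-01"), value = 0.028)
)

# TRUE only if every element has the same column names in the same order.
same_columns <- function(tables) {
  reference <- names(tables[[1]])
  all(vapply(tables, function(x) identical(names(x), reference), logical(1)))
}

stopifnot(same_columns(yearly))
combined <- do.call(rbind, yearly)  # base R counterpart of rbindlist()
nrow(combined)
```

With real downloads, a failed check would point to the year where the schema changed before any rows are silently misaligned.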

Exploration and Preparation

We start off with the full 34 columns, but will throw out the identifier columns that have only one unique value. There are only 10,416 unique dates but 55,224 rows, so many fields are layered in the data set. Four of the logical columns are entirely missing, so they will have to go. There are 14 unique values in the parameter_name field, so we will have to explore those. A majority of the pollutant standard rows are empty. We can also see that “aqi” (Air Quality Index), which we consider to be a parameter, is included in a separate column as a character. The two main measurements for all the other parameters are “arithmetic_mean” and “first_maximum_value”. There are a couple of time-related variables, including year, day_in_year and date_local. It’s pretty messy, so the best thing we can think of doing is to tidy up the data set so it is easier to work with.

# Summarize data.table
skimr::skim(ac_dt)
Table 1: Data summary

Name                      ac_dt
Number of rows            55224
Number of columns         34
_______________________
Column type frequency:
  character               15
  Date                    1
  logical                 4
  numeric                 14
________________________
Group variables           None

Variable type: character

skim_variable             n_missing  complete_rate  min  max  empty  n_unique  whitespace
Datum                     0          1              5    5    0      1         0
Parameter Name            0          1              5    26   0      14        0
Duration Description      0          1              6    23   0      6         0
Pollutant Standard        0          1              0    17   35497  6         0
Units of Measure          0          1              5    29   0      8         0
Exceptional Data Type     0          1              4    8    0      2         0
AQI                       0          1              1    3    0      157       0
Daily Criteria Indicator  0          1              1    1    0      2         0
State Name                0          1              11   11   0      1         0
County Name               0          1              9    9    0      1         0
City Name                 0          1              19   19   0      1         0
Local Site Name           0          1              20   20   0      1         0
Address                   0          1              31   31   0      1         0
MSA or CBSA Name          0          1              31   31   0      1         0
Data Source               0          1              13   13   0      1         0

Variable type: Date

skim_variable  n_missing  complete_rate  min         max         median      n_unique
Date (Local)   0          1              1990-01-01  2020-03-31  2002-08-29  10416

Variable type: logical

skim_variable               n_missing  complete_rate  mean  count
Nonreg Observation Count    55224      0              NaN   :
Nonreg Arithmetic Mean      55224      0              NaN   :
Nonreg First Maximum Value  55224      0              NaN   :
Tribe Name                  55224      0              NaN   :

Variable type: numeric

skim_variable        n_missing  complete_rate  mean      sd       p0        p25       p50       p75       p100      hist
State Code           0          1              9.00      0.00     9.00      9.00      9.00      9.00      9.00      ▁▁▇▁▁
County Code          0          1              1.00      0.00     1.00      1.00      1.00      1.00      1.00      ▁▁▇▁▁
Site Number          0          1              17.00     0.00     17.00     17.00     17.00     17.00     17.00     ▁▁▇▁▁
Parameter Code       0          1              55689.14  9429.11  42401.00  44201.00  61102.00  62101.00  82403.00  ▅▁▇▁▁
POC                  0          1              1.01      0.29     1.00      1.00      1.00      1.00      9.00      ▇▁▁▁▁
Latitude             0          1              41.00     0.00     41.00     41.00     41.00     41.00     41.00     ▁▁▇▁▁
Longitude            0          1              -73.59    0.00     -73.59    -73.59    -73.59    -73.59    -73.59    ▁▁▇▁▁
Year                 0          1              2002.86   8.22     1990.00   1996.00   2002.00   2009.00   2020.00   ▇▆▇▃▃
Day In Year (Local)  0          1              181.50    96.44    1.00      106.00    181.00    258.00    366.00    ▆▇▇▇▅
Observation Count    0          1              21.28     5.60     1.00      23.00     24.00     24.00     24.00     ▁▁▁▁▇
Observation Percent  0          1              97.44     9.07     4.00      100.00    100.00    100.00    100.00    ▁▁▁▁▇
Arithmetic Mean      0          1              44.42     75.70    -7.50     0.04      5.00      56.62     353.00    ▇▁▁▁▁
First Maximum Value  0          1              65.56     112.18   -5.00     0.07      10.00     64.00     360.00    ▇▁▁▁▁
First Maximum Hour   0          1              10.89     6.69     0.00      6.00      12.00     15.00     23.00     ▆▅▇▇▅

First we drop all of the identifier columns with only one unique value (before column 9 and after column 27), and also the “tribe” and “nonreg” columns using data.table::patterns(). We then convert the air quality index (“aqi”) column to integer for the cases where it is not missing. We are not clear why “aqi” is not included in the “parameter_name” variable with the other measures, but it seems to be associated with rows for “ozone” and “sulfur dioxide” (two of the five variables which compose the “aqi” itself). Air quality is also stored both as a 1-hour average and, separately, as a single 8-hour measurement for each day, and these numbers can be significantly different.

# Drop unneeded cols
ac_dt <- ac_dt[, c(9:27)][, .SD, .SDcols = !patterns("tribe|nonreg")]

# Convert aqi to integer
ac_dt[, aqi := as.integer(str_extract(aqi, "\\d*"))]
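
The column positions above were picked by eye from the skim() output. As a rough alternative, the sketch below finds single-valued columns programmatically and shows how rows without digits turn into NA during the integer conversion. It uses base R on a toy table; the values, and the "." placeholder for a missing AQI, are assumptions made for illustration.

```r
# Toy table (hypothetical values) with two constant identifier columns.
monitor <- data.frame(
  state_name     = "Connecticut",
  site_number    = 17,
  parameter_name = c("Ozone", "Ozone", "Wind Speed"),
  aqi            = c("48", "101", "."),
  stringsAsFactors = FALSE
)

# A column is a pure identifier here if it holds a single distinct value.
is_constant <- vapply(monitor, function(x) length(unique(x)) == 1L, logical(1))
names(monitor)[is_constant]

# Keep only the columns that vary.
monitor <- monitor[, !is_constant, drop = FALSE]

# Extract the leading digits; rows without digits yield "" and become NA.
digits <- regmatches(monitor$aqi, regexpr("[0-9]*", monitor$aqi))
monitor$aqi <- as.integer(digits)
monitor
```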

We melt the three measurement columns into a “variable” and “value” pair, keeping the 12 identifier columns. We decided to separate the “aqi” index column from the rest of the data, which is identified in the “parameter_name” column, before tidying, and then bind the pieces back together so that “variable” holds three measures (“aqi”, “arithmetic_mean” and “first_maximum_value”).

# Separate out aqi
aqi <- ac_dt[!is.na(aqi)]

# Tidy key measures for parameters other than aqi
measures <- c("first_maximum_value", "arithmetic_mean")
ids <- setdiff(names(ac_dt), measures)
ids <- ids[!str_detect(ids, "aqi")]
ac_dt_tidy <-
  ac_dt[,
        melt(.SD,
             id.vars = ids,
             measure.vars = measures),
        .SDcols = !"aqi"]

# Tidy up aqi
aqi <-
  aqi[,
      melt(.SD,
           id.vars = ids,
           measure.vars = "aqi"),
      .SDcols = !measures]

# Put two tidied data sets back together
ac_dt_tidy <- rbind(ac_dt_tidy, aqi)

# Show sample rows
ac_dt_tidy

             parameter_name    duration_description pollutant_standard
     1: Outdoor Temperature                  1 HOUR
     2:      Sulfur dioxide                  1 HOUR    SO2 1-hour 2010
     3:      Sulfur dioxide            3-HR BLK AVG    SO2 3-hour 1971
     4:      Sulfur dioxide            3-HR BLK AVG    SO2 3-hour 1971
     5: Outdoor Temperature                  1 HOUR
    ---
120581:               Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-hour 2015
120582:               Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-hour 2015
120583:               Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-hour 2015
120584:               Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-hour 2015
120585:               Ozone 8-HR RUN AVG BEGIN HOUR  Ozone 8-hour 2015
        date_local year day_in_year_local   units_of_measure
     1: 1990-01-01 1990                 1 Degrees Fahrenheit
     2: 1990-01-01 1990                 1  Parts per billion
     3: 1990-01-01 1990                 1  Parts per billion
     4: 1990-01-02 1990                 2  Parts per billion
     5: 1990-01-02 1990                 2 Degrees Fahrenheit
    ---
120581: 2020-03-27 2020                87  Parts per million
120582: 2020-03-28 2020                88  Parts per million
120583: 2020-03-29 2020                89  Parts per million
120584: 2020-03-30 2020                90  Parts per million
120585: 2020-03-31 2020                91  Parts per million
        exceptional_data_type observation_count observation_percent
     1:                  None                24                 100
     2:                  None                23                  96
     3:                  None                 7                  88
     4:                  None                 7                  88
     5:                  None                24                 100
    ---
120581:                  None                17                 100
120582:                  None                17                 100
120583:                  None                17                 100
120584:                  None                17                 100
120585:                  None                12                  71
        first_maximum_hour daily_criteria_indicator            variable value
     1:                  0                        Y first_maximum_value  43.0
     2:                  8                        Y first_maximum_value  11.0
     3:                  8                        Y first_maximum_value   9.3
     4:                 23                        Y first_maximum_value  17.6
     5:                 14                        Y first_maximum_value  42.0
    ---
120581:                 10                        Y                 aqi  48.0
120582:                 22                        Y                 aqi  46.0
120583:                  8                        Y                 aqi  46.0
120584:                  7                        Y                 aqi  37.0
120585:                 16                        N                 aqi  42.0
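
For readers less familiar with melt(), the following base R sketch shows the same wide-to-long reshaping on a toy table with made-up values. It is only meant to illustrate what ends up in “variable” and “value”; it is not how data.table implements melt().

```r
# Toy wide table (hypothetical values): one row per day and parameter.
wide <- data.frame(
  date_local          = as.Date(c("1990-01-01", "1990-01-02")),
  parameter_name      = c("Ozone", "Ozone"),
  arithmetic_mean     = c(0.021, 0.034),
  first_maximum_value = c(0.040, 0.071)
)

measures <- c("first_maximum_value", "arithmetic_mean")
ids <- setdiff(names(wide), measures)

# Stack one block of rows per measure, labelling each block in `variable`.
long <- do.call(rbind, lapply(measures, function(m) {
  data.frame(wide[ids], variable = m, value = wide[[m]])
}))
long
```

Each original row appears once per measure, so two rows and two measures give four rows in the long table.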

When we graph with “parameter_name” facets including separate colors for the mean and maximum values, we can see a few things. There are a few gaps in collection including a big one in sulfur dioxide from about 1997-2005. The Air Quality Index first created in the Clean Air Act has the following components: ground-level ozone, particulate matter, carbon monoxide, sulfur dioxide, and nitrogen dioxide. We are unsure how they calculate the AQI in our data set for the full period, because of the period where sulfur dioxide is missing. When we read up on AQI, we learned that there may be several ways of calculating the AQI. We will leave the details for later research.

# Look for missing periods
ac_dt_tidy[
  variable %in% c("arithmetic_mean", "first_maximum_value"),
  ggplot(.SD,
         aes(date_local,
             y = value,
             color = variable)) +
    geom_line() +
    facet_wrap( ~ parameter_name, scales = 'free') +
    theme_bw() +
    labs(caption = "Source: EPA Monitor 09-001-0017")
]

Wind is Lowest during the Summer

We had hoped to look at the wind speeds during Hurricane Sandy, which hit us hard, but apparently the monitor was knocked out: there are no measurements for that date or for several months afterwards, so we may not do a lot with the wind data. Because it is hard to find much in the charts above, we averaged the values by month. We might have guessed it, but hadn’t thought that wind was as seasonal as it seems to be below.

ac_dt_tidy[
  str_detect(parameter_name, "Wind") &
    variable %in% c("first_maximum_value", "arithmetic_mean"),
  .(avg_speed = mean(value)),
  by = .(month(date_local), parameter_name, variable)
][,
  ggplot(.SD, aes(month, avg_speed, color = variable)) +
    geom_line() +
    facet_wrap( ~ parameter_name, scales = "free_y") +
    theme_bw() +
    labs(
      x = "Month",
      y = "Wind Speed",
      caption = "Source: EPA Monitor 09-001-0017"
    )
]
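
The monthly averaging step pools the same calendar month across all years. A small base R sketch of that idea, on invented readings rather than the monitor's data, might look like this:

```r
# Toy daily wind readings (hypothetical values, not from the monitor).
wind <- data.frame(
  date_local = as.Date(c("2019-01-05", "2019-01-20", "2019-07-04", "2020-07-15")),
  value      = c(9.0, 11.0, 4.0, 6.0)
)

# Month number (1-12), pooled across years.
wind$month <- as.integer(format(wind$date_local, "%m"))

# Average the readings within each calendar month.
monthly <- aggregate(value ~ month, data = wind, FUN = mean)
names(monthly)[2] <- "avg_speed"
monthly
```

Note that the two July readings come from different years but land in the same group, which is what makes the seasonal pattern visible.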

Air Quality Has Improved

The data set actually records 4 measurements for ozone, the average and maximum values by hour, and separately, for 8-hour periods. The EPA sets a threshold for the level of Ozone to be avoided at 0.064, and days above this are shown in red. It looks like the “first_maximum_value” very often registers undesirable levels, although the hourly reading does much less so. We can see that there are clearly fewer unhealthy days over time, and only two unhealthy days based on the hourly arithmetic average since 2003. We can also see that the low end of the readings has been moving up over time, even though well in the healthy zone.

ac_dt_tidy[
  parameter_name == "Ozone" &
    variable %in% c("arithmetic_mean", "first_maximum_value")
][,
  ggplot(.SD,
         aes(date_local,
             y = value)) +
    geom_point(aes(color = cut(
      value,
      breaks = c(0, 0.064, 0.3),
      labels = c("Good", "Unhealthy")
    )),
    size = 1) +
    scale_color_manual(
      name = "Ozone",
      values = c("Good" = "green1",
                 "Unhealthy" = "red1")) +
    theme_bw() +
    labs(x = "Year",
         y = "Ozone",
         caption = "Source: EPA Monitor 09-001-0017") +
    facet_wrap(~ variable + duration_description, scales = "free_y")
]
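
The colouring relies on cut(), whose default intervals are open on the left and closed on the right. The short sketch below, using made-up readings, shows how values at and outside the break points are classified with the same breaks as the chart; the include.lowest variant is our suggestion, not something the original code uses.

```r
ozone <- c(0, 0.030, 0.064, 0.070, 0.350)  # hypothetical readings

category <- cut(
  ozone,
  breaks = c(0, 0.064, 0.3),
  labels = c("Good", "Unhealthy")
)
as.character(category)
# "Good" covers (0, 0.064] and "Unhealthy" covers (0.064, 0.3];
# readings of exactly 0 or above 0.3 fall outside the breaks and become NA.

# Optional variant: include.lowest = TRUE keeps exact zeros in the first bin.
category_closed <- cut(
  ozone,
  breaks = c(0, 0.064, 0.3),
  labels = c("Good", "Unhealthy"),
  include.lowest = TRUE
)
as.character(category_closed)
```

Points that become NA are drawn in ggplot2's default NA colour rather than green or red, so it is worth checking the range of the data against the breaks.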

We can see in the chart of Air Quality based on the “Ozone 8-hour 2015” parameter below that, if the EPA calculates it, it doesn’t report AQI during the winter months. That probably makes sense, because people are not out and air quality appears to be worst in the summer. Sometimes we get the iPhone messages about Air Quality and naturally worry, but when we look at the daily AQI over the last 30 years, we can see that the number of “Unhealthy” days has been declining, similar to what we saw above with Ozone, and the last “Very Unhealthy” day was in 2006. The same trend of the low end of the AQI rising a little over time is apparent.

ac_dt_tidy[
  variable == "aqi" &
    pollutant_standard == "Ozone 8-hour 2015"
][,
  ggplot(.SD,
         aes(date_local,
             y = value)) +
    geom_point(aes(color = cut(
      value,
      breaks = c(0, 50, 100, 150, 200, 300),
      labels = c(
        "Good",
        "Moderate",
        "Unhealthy - Sensitive",
        "Unhealthy",
        "Very Unhealthy"
      )
    )),
    size = 1) +
    scale_color_manual(
      name = "AQI",
      values = c(
        "Good" = "green1",
        "Moderate" = "yellow",
        "Unhealthy - Sensitive" = "orange",
        "Unhealthy" = "red",
        "Very Unhealthy" = "violetred4"
      )
    ) +
    theme_bw() +
    labs(x = "Year",
         y = "Air Quality Indicator (AQI)",
         caption = "Source: EPA Monitor 09-001-0017"
    )
]

A heatmap is another way to look at the Ozone which better shows the time dimension. The y-axis shows the day of the year, so the most unhealthy air quality is between days 175-225, or the end of June through the first half of August. We can also see that “Unhealthy” days might even have outnumbered healthy days back in the early 1990s, but we rarely see above “moderate” now.

breaks <- c(0, 50, 100, 150, 200, 300, 1000)
labels <- c("Good", "Moderate", "Unhealthy - Sensitive Groups", "Unhealthy", "Very Unhealthy", "Hazardous")

ac_dt[parameter_name == "Ozone" &
        exceptional_data_type == "None", .(
          year,
          day_in_year_local,
          observation_count,
          duration_description,
          date_local,
          aqi = as.integer(str_extract(aqi, "\\d*")),
          parameter_name,
          `Air Quality` = cut(
            as.integer(str_extract(aqi, "\\d*")),
            breaks = breaks,
            labels = labels
          )
        )][!is.na(`Air Quality`) &
             day_in_year_local %in% c(90:260),
           ggplot(.SD, aes(year, day_in_year_local, fill = `Air Quality`)) +
             geom_tile() +
             theme_bw() +
             labs(
               x = "Year",
               y = "Day of Year",
               caption = "Source: EPA Monitor 09-001-0017"
             )]

Temperature

We can see the Outside Temperature over the period in the dot plot. Hot days are defined as above 85 and very hot as above 95, while the coldest category is 32 and below. There isn’t much of a trend visible in the middle of the graphs. As might be expected, the daily first maximum highs and lows tend to be significantly above the daily average levels. All in all, if there is change, it is less definitive than in the air quality data when looking at it this way.

ac_dt_tidy[
  parameter_name == "Outdoor Temperature" &
    variable %in% c("arithmetic_mean", "first_maximum_value")
][,
  ggplot(.SD,
         aes(date_local,
             y = value)) +
    geom_point(aes(
      color = cut(value,
                  breaks = c(-20, 32, 50, 65, 85, 95, 120),
                  labels = c("Very Cold", "Cold", "Cool", "Moderate", "Hot", "Very Hot"))),
      size = 1) +
    scale_color_manual(
      name = "Outside Temperature",
      values = c(
        "Very Cold" = "blue",
        "Cold" = "yellow",
        "Cool" = "green1",
        "Moderate" = "green4",
        "Hot" = "orange",
        "Very Hot" = "red"
      )
    ) +
    theme_bw() +
    labs(
      x = "Year",
      y = "Outside Temperature",
      caption = "Source: EPA Monitor 09-001-0017"
    ) +
    facet_wrap(~ variable)
]

We also tried to look at the change in temperature over the period versus the first five years (1990-1994). By doing this, we probably learned more about heat maps than about the temperature. It does look like the bigger changes in temperature have happened more at the beginning and the end of the year. Movements in the maximum temperatures seem more pronounced than in the averages, but again it makes sense that this would be the case.

temperature <-
  ac_dt[parameter_name == "Outdoor Temperature",
        c("year",
          "day_in_year_local",
          "arithmetic_mean",
          "first_maximum_value")]

# Baseline: average for each day of the year over 1990-1994
baseline <-
  temperature[year < 1995,
              .(base_mean = mean(arithmetic_mean),
                base_max = mean(first_maximum_value)), day_in_year_local]

# Join the baseline onto later years and compute the change
temperature <-
  baseline[temperature[year > 1994], on = c("day_in_year_local")][,
    `:=`(change_avg = arithmetic_mean - base_mean,
         change_max = first_maximum_value - base_max)]

temperature <-
  temperature[, melt(
    .SD,
    id.vars = c("day_in_year_local", "year"),
    measure.vars = c("change_max", "change_avg")
  )]

temperature[
  year %in% c(1995:2019) &
    !is.na(value),
  ggplot(.SD,
         aes(year,
             day_in_year_local,
             fill = cut(
               value,
               breaks = c(-100, -15, -5, 5, 15, 100),
               labels = c("Much Colder", "Colder", "Similar", "Warmer", "Much Warmer")
             ))) +
    geom_tile() +
    scale_fill_manual(name = "Temp. Change",
                      values = c("skyblue4", "skyblue", "green", "red", "red4")) +
    theme_bw() +
    labs(
      title = "Days Compared to 1990-1994 Average Temp. on That Day",
      subtitle = "Hotter Days Shown Redder",
      x = "Year",
      y = "Day of Year",
      caption = "Source: EPA"
    ) +
    facet_wrap(~ variable)
]
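
The baseline comparison is easier to follow on a tiny example. The base R sketch below uses invented temperatures for two days of the year: it averages the reference years per day, joins that baseline onto the later years, and takes the difference. merge() stands in for the data.table join used above.

```r
# Toy temperatures (hypothetical): two days of the year, four years.
temps <- data.frame(
  year              = rep(c(1990, 1994, 2000, 2019), each = 2),
  day_in_year_local = rep(c(1, 200), times = 4),
  arithmetic_mean   = c(30, 74, 34, 78, 33, 80, 36, 79)
)

# Baseline: mean per day of year over the reference years (1990-1994).
baseline <- aggregate(
  arithmetic_mean ~ day_in_year_local,
  data = temps[temps$year < 1995, ],
  FUN = mean
)
names(baseline)[2] <- "base_mean"

# Attach the baseline to later years and compute the change.
later <- merge(temps[temps$year > 1994, ], baseline, by = "day_in_year_local")
later$change_avg <- later$arithmetic_mean - later$base_mean
later <- later[order(later$year, later$day_in_year_local), ]
later
```

With only a handful of reference years, each daily baseline is an average of very few readings, so individual changes can be noisy.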

Thoughts on Heatmaps

The interesting thing we learned about heat maps is how much we could control the perception of the chart through our decisions about the size of the groupings and the color choices. Dark colors on the days with the biggest temperature increases could flood the chart with red. If we chose equal-sized groups for the cut-offs, there would be a lot more days for the average which were colder (as shown below), but a lot more for the max which were hotter. It made us more wary of heat maps.

# Uneven hand selected cutoffs to find more balanced counts
lapply(list(
  cut(temperature[variable == "change_avg" & !is.na(value)]$value,
      c(-100, -15, -5, 5, 15, 100)),
  cut(temperature[variable == "change_max" & !is.na(value)]$value,
      c(-100, -15, -5, 5, 15, 100))
), summary)

[[1]]
(-100,-15]   (-15,-5]     (-5,5]     (5,15]   (15,100]
       169       1489       4929       1786         65

[[2]]
(-100,-15]   (-15,-5]     (-5,5]     (5,15]   (15,100]
       286       2016       4155       1821        160

# Even range limits with less even counts
lapply(list(
  cut(temperature[variable == "change_avg" & !is.na(value)]$value, 5),
  cut(temperature[variable == "change_max" & !is.na(value)]$value, 5)
), summary)

[[1]]
    (-38,-25]   (-25,-12.1] (-12.1,0.827]  (0.827,13.8]   (13.8,26.8]
            5           305          4253          3767           108

[[2]]
(-28.8,-15.8] (-15.8,-2.95]  (-2.95,9.95]   (9.95,22.9]   (22.9,35.8]
          215          2858          4659           686            20

Conclusion

That wraps up this quick exploration of an EPA monitor in Connecticut. We still have many questions about air quality, temperature and wind speed. We wonder why the EPA chose to put the monitor down by the edge of the water away from the heavy traffic of I-95 and Route 1 and the bulk of the population. We didn’t have a lot of time to spend, and acknowledge that we may have misread or misinterpreted some of the data, but now we at least know what to look for. The purpose of this blog is to explore, learn and get better, faster and more accurate in data analysis. If you are interested in learning R and would like to see what it takes to learn R, we invite you to download our free data science cheat sheet and join our free R webinar on January 20, 2021.

Author: David Lucy, Founder of Redwall Analytics. David spent 25 years working in institutional global equity research with several top investment banking firms.


To leave a comment for the author, please follow the link and comment on their blog: business-science.io.


The post 30 Year Weather Data Analysis first appeared on R-bloggers.

Predicting the Winner of Super Bowl LV


[This article was first published on R | JLaw's R Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

TL;DR

  • Using Pythagorean expectation we should expect the Baltimore Ravens to be Super Bowl Champions
  • Using a Bradley-Terry model we should expect the Kansas City Chiefs to be Super Bowl champions
  • Seems like it will be a good year for the AFC

It’s playoff time in the NFL! While my team has unfortunately missed the playoffs, I wanted to take advantage of the season to try to predict who will win the Super Bowl this year through two different mechanisms:

  1. Pythagorean Expectation
  2. Simulation using Bradley-Terry Models

Getting the Data

While ideally having more historical data would be better, I’m going to keep this exercise quick and dirty by only using the data from the 2020 NFL Regular Season which recently concluded. Data for this season can be easily imported using the nflfastR package. By using the fast_scraper_schedules function, I can quickly get all the games and their results for the 2020 season.

library(tidyverse)
library(nflfastR)
library(scales)

#Get Season 2020 Schedule and results
nfl_games <- fast_scraper_schedules(2020) %>%
  #Weeks Beyond Week 17 Are the Playoffs
  filter(week <= 17)

knitr::kable(head(nfl_games, 3))
game_id          season  game_type  week  gameday     weekday   gametime  away_team  home_team  away_score  home_score  home_result  stadium                location  roof      surface  old_game_id
2020_01_HOU_KC   2020    REG        1     2020-09-10  Thursday  20:20     HOU        KC         20          34          14           Arrowhead Stadium      Home      outdoors  NA       2020091000
2020_01_SEA_ATL  2020    REG        1     2020-09-13  Sunday    13:00     SEA        ATL        38          25          -13          Mercedes-Benz Stadium  Home      NA        NA       2020091300
2020_01_CLE_BAL  2020    REG        1     2020-09-13  Sunday    13:00     CLE        BAL        6           38          32           M&T Bank Stadium       Home      outdoors  NA       2020091301

The package returned not only the data I’m looking for but also a lot of additional data that could be used if necessary (day of week, dome vs. outdoor, etc.).

Method 1: Pythagorean expectation

Pythagorean expectation was developed by Bill James for Baseball and estimates the % of games that a team “should win” based on runs scored and runs allowed.

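It was adapted for Pro Football by Football Outsiders to use the following formula, where 16 is the number of regular-season games:

$$\text{Pythagorean wins} = \frac{\text{Points For}^{2.37}}{\text{Points For}^{2.37} + \text{Points Against}^{2.37}} \times 16$$

The Football Outsiders Almanac stated in 2011 that “From 1988 through 2004, 11 of 16 Super Bowls were won by the team that led the NFL in Pythagorean wins, while only seven were won by the team with the most actual victories”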

A little data manipulation is needed to get the NFL schedule data into a format for calculating the Pythagorean expectation, most notably splitting each game into two rows of data to capture information on both the home and away teams.

p_wins <- nfl_games %>%
  pivot_longer(
    cols = c(contains('team')),
    names_to = "category",
    values_to = 'team'
  ) %>%
  mutate(points_for = (category=='home_team')*home_score +
           (category=='away_team')*away_score,
         points_against = (category=='away_team')*home_score +
           (category=='home_team')*away_score
  ) %>%
  group_by(team) %>%
  summarize(pf = sum(points_for, na.rm = TRUE),
            pa = sum(points_against, na.rm = TRUE),
            actual_wins = sum(points_for > points_against, na.rm = TRUE),
            .groups = 'drop'
  ) %>%
  mutate(p_expectation = pf^2.37/(pf^2.37+pa^2.37)*16)

By pythagorean expectation the top 3 teams in the NFL are:

team  points_for  points_against  actual_wins  expected_wins
BAL   468         303             11           11.8
NO    482         337             12           11.2
TB    492         355             11           10.9

According to Pythagorean Expectation, the Baltimore Ravens are the best team in the NFL while the formula would say that the Kansas City Chiefs, the team with the most actual wins, “should” have only had 10.5 wins vs. the 14 actual wins they had.
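
As a quick check of the calculation, the expectation can be wrapped in a small standalone function. This is only a sketch: pythagorean_wins() is a name chosen here, with the 2.37 exponent and 16-game season taken from the code above as defaults.

```r
# Pythagorean expected wins; exponent 2.37 and 16 games follow the post's code.
pythagorean_wins <- function(points_for, points_against,
                             exponent = 2.37, games = 16) {
  pf <- points_for^exponent
  pa <- points_against^exponent
  games * pf / (pf + pa)
}

# Baltimore's 2020 totals from the table above: 468 scored, 303 allowed.
round(pythagorean_wins(468, 303), 1)

# A team that scores exactly what it allows is expected to win half its games.
pythagorean_wins(400, 400)
```

Because only the ratio of points scored to points allowed matters, two teams with very different totals but the same ratio get the same expected wins.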

An aside: Who “outkicked their coverage”?

The concept of “Expected Wins” allows us to see who outperformed their expectation vs. under-performed. The following plot shows actual wins on the x-axis and expected wins on the y-axis.

library(ggrepel)

p_wins %>%
  mutate(diff_from_exp = actual_wins - p_expectation) %>%
  ggplot(aes(x = actual_wins, y = p_expectation, fill = diff_from_exp)) +
    geom_label_repel(aes(label = team)) +
    geom_abline(lty = 2) +
    annotate("label", x = 1, y = 10, hjust = 'left', label = "Underachievers") +
    annotate("label", x = 10, y = 5, hjust = 'left', label = "Overachievers") +
    labs(x = "Actual Wins", y = "Expected Wins",
         title = "What NFL Teams Over/Under Performed?",
         caption = "Expected Wins Based on Pythagorean Expectation") +
    scale_fill_gradient2(guide = "none") +
    cowplot::theme_cowplot()

The largest over-achievers appear to be Kansas City and Cleveland, while the largest under-achievers were Atlanta and Jacksonville.

Method #2: Simulation with Bradley-Terry Models

Bradley-Terry Models are probability models to predict the outcomes of paired comparisons (such as sporting events or ranking items in a competition).

In this case, to predict the future winner of Super Bowl LV, I’ll be using regular season data to estimate “ability parameters” for each team and then using those parameters to run simulations to estimate the winners of the NFL playoff match-ups.

The Bradley-Terry Model can be fit using the BradleyTerry2 package.

Step 1: Reshaping the Data

The BradleyTerry2 package can take data in a number of different ways but it is opinionated about the structure so we’ll need to reshape the data to get it into a format that the package wants.

Specifically, it can take in data similar to how glm() can use counts to fit a logistic regression. In this case it would be similar to:

BTm(cbind(win1, win2), team1, team2, ~ team, id = "team", data = sports.data)

The inclusion of only team in the formula means that only the “team” factors are used to estimate abilities. Other predictors, such as a home-field advantage, can be added, but considering the nature of the 2020 season, I’m going to assume there was no home-field advantage. The id = "team" argument tells the function how to label factors in the output. For example, the team “NYG” will become the “teamNYG” predictor.

Given the nature of the NFL schedule there shouldn’t be any repeats of Home/Away combinations. But to be sure we can group_by() and summarize().

Since the package used for modeling requires that each team variable has the same factor levels, I’ll recode home_team and away_team with new levels.

#Get List of All Teams
all_teams <- sort(unique(nfl_games$home_team))

nfl_shaped <- nfl_games %>%
  mutate(
    home_team = factor(home_team, levels = all_teams),
    away_team = factor(away_team, levels = all_teams),
    home_wins = if_else(home_score > away_score, 1, 0),
    away_wins = if_else(home_score < away_score, 1, 0)
  ) %>%
  group_by(home_team, away_team) %>%
  summarize(home_wins = sum(home_wins),
            away_wins = sum(away_wins),
            .groups = 'drop')

knitr::kable(head(nfl_shaped, 3), align = 'c')
home_team  away_team  home_wins  away_wins
ARI        BUF        1          0
ARI        DET        0          1
ARI        LA         0          1

Step 2: Fitting the Bradley-Terry Model

The Bradley-Terry model can be fit similar to how other models like glm() are fit. By default, the first factor alphabetically becomes the reference factor and takes a coefficient of zero. All other coefficients are relative to that factor.

library(BradleyTerry2)

base_model <- BTm(cbind(home_wins, away_wins), home_team, away_team,
                  data = nfl_shaped, id = "team")

The summary() function will provide information on residuals, coefficients, and statistical significance, but for brevity, I’ll skip that output.

Step 3: Extracting the Team Abilities

The package contains a BTabilities() function to extract the abilities and their standard errors, and the qvcalc() function will output the abilities along with quasi-standard errors. The advantage of quasi-standard errors is that, while the reference category’s ability estimate and standard error are both fixed at 0, its quasi-standard error is non-zero. Quasi-standard errors therefore allow comparisons between any pair of teams, including the reference team.

base_abilities <- qvcalc(BTabilities(base_model)) %>%
  .[["qvframe"]] %>%
  as_tibble(rownames = 'team') %>%
  janitor::clean_names()

knitr::kable(base_abilities %>%
               mutate(across(where(is.numeric), ~ round(.x, 2))) %>%
               head(3),
             align = 'c')
team  estimate  se    quasi_se  quasi_var
ARI   0.00      0.00  0.57      0.32
ATL   -0.91     0.88  0.64      0.41
BAL   1.06      0.89  0.65      0.42
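
To get a feel for what these ability estimates mean, note that in a standard Bradley-Terry model the abilities sit on the log-odds scale, so the probability that team 1 beats team 2 is exp(a1) / (exp(a1) + exp(a2)), which equals plogis(a1 - a2). The sketch below applies this to the rounded values in the table; bt_win_prob() is a helper name chosen here, and the result is a point estimate that ignores the uncertainty in the abilities.

```r
# Point-estimate win probability under a Bradley-Terry model:
# P(team 1 beats team 2) = exp(a1) / (exp(a1) + exp(a2)) = plogis(a1 - a2)
bt_win_prob <- function(ability_1, ability_2) {
  plogis(ability_1 - ability_2)
}

# Rounded ability estimates from the table above.
abilities <- c(ARI = 0.00, ATL = -0.91, BAL = 1.06)

bt_win_prob(abilities[["BAL"]], abilities[["ATL"]])
bt_win_prob(abilities[["ARI"]], abilities[["ARI"]])  # equal abilities give 0.5
```

Only the difference in abilities matters, which is why fixing the reference team (ARI) at zero loses no information.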

Step 4: Simulating Playoff Matchups

To determine each team’s likelihood of winning their match-up, I run 1,000 simulations, drawing from a distribution of the ability scores that uses the team’s ability estimate and standard error as parameters. The percent of those 1,000 simulations won by each team represents its likelihood of winning that match-up.

To generate the 1,000 simulations I use the tidyr::crossing() function to replicate each row 1,000 times; then using dplyr to summarize over all simulations.

Since running this for any arbitrary combination of teams isn’t too time consuming, I’ll generate every combination of playoff team across the NFC and AFC even though at least half of these comparisons will be impossible in practice.

playoff_teams = c('BAL', 'BUF', 'CHI', 'CLE', 'GB', 'IND', 'KC', 'LA', 'NO',
                  'PIT', 'SEA', 'TB', 'TEN', 'WAS')

comparisons <- base_abilities %>%
  filter(team %in% playoff_teams)

#Generate All Potential Combination of Playoff Teams
comparisons <- comparisons %>%
  rename_with(~paste0("t1_", .x)) %>%
  crossing(comparisons %>% rename_with(~paste0("t2_", .x))) %>%
  filter(t1_team != t2_team)

#Run 1000 Simulations per comparison
set.seed(20210107)

#Draw from Ability Distribution
simulations <- comparisons %>%
  crossing(simulation = 1:1000) %>%
  mutate(
    t1_val = rnorm(n(), t1_estimate, t1_quasi_se),
    t2_val = rnorm(n(), t2_estimate, t2_quasi_se),
    t1_win = t1_val > t2_val,
    t2_win = t2_val > t1_val
  )

#Roll up the 1000 Results
sim_summary <- simulations %>%
  group_by(t1_team, t2_team, t1_estimate, t2_estimate) %>%
  summarize(t1_wins_pct = mean(t1_win), #Long-Term Average Winning % for Team 1
            t2_wins_pct = mean(t2_win), #Long-Term Average Winning % for Team 2
            .groups = 'drop') %>%
  mutate(
    #Create a label for the winner
    winner = if_else(t1_wins_pct > t2_wins_pct, t1_team, t2_team)
  )
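
For a single pairing, the simulation boils down to drawing two normal values and comparing them. The base R sketch below does this for Baltimore against Atlanta, using the rounded estimates and quasi-standard errors from the abilities table (Atlanta is not a playoff team; it is used only because its values are shown). It also compares the simulated share with the closed-form probability for the difference of two independent normals, which is an addition of ours rather than part of the original approach.

```r
set.seed(20210107)
n_sims <- 10000

# Rounded estimates and quasi-standard errors from the abilities table.
bal <- c(estimate = 1.06, quasi_se = 0.65)
atl <- c(estimate = -0.91, quasi_se = 0.64)

# Draw an ability for each team in every simulation and count BAL's wins.
bal_draw <- rnorm(n_sims, bal[["estimate"]], bal[["quasi_se"]])
atl_draw <- rnorm(n_sims, atl[["estimate"]], atl[["quasi_se"]])
simulated <- mean(bal_draw > atl_draw)

# Closed form: the difference of two independent normals is itself normal.
analytic <- pnorm(
  (bal[["estimate"]] - atl[["estimate"]]) /
    sqrt(bal[["quasi_se"]]^2 + atl[["quasi_se"]]^2)
)
c(simulated = simulated, analytic = analytic)
```

The two numbers should agree closely, which suggests the 1,000-draw simulation mainly adds sampling noise around a probability that could be computed directly under these assumptions.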

Step 5: And the winner is….

Now since we have all potential combinations we can step through each of the games on the schedule to determine the likelihood of winning that match-up. For rounds after the initial wild-card round, the teams are re-seeded so the #1 seed will play whatever the lowest winning seed is (can be anywhere from #4 to #7). While initially I wanted to look at each team’s likelihood of winning the Super Bowl, I couldn’t quite figure out how to easily determine the probability of each scenario given the re-seeding process. So I will just step through each round based on the result of the previous round.

For simplicity, I define a function that takes in the two teams and returns their names, their simulated win probabilities, and the predicted winner from the simulations above.

winners <- function(t1, t2){
  dt = sim_summary %>% filter(t1_team == t1 & t2_team == t2) %>%
    inner_join(
      nflfastR::teams_colors_logos %>%
        filter(team_abbr == t1) %>%
        select(t1_team = team_abbr, t1_name = team_name),
      by = "t1_team"
    ) %>%
    inner_join(
      nflfastR::teams_colors_logos %>%
        filter(team_abbr == t2) %>%
        select(t2_team = team_abbr, t2_name = team_name),
      by = "t2_team"
    )

  return(
    list(
      team1 = dt$t1_name,
      team1_prob = dt$t1_wins_pct,
      team2 = dt$t2_name,
      team2_prob = dt$t2_wins_pct,
      winner = if_else(dt$winner == dt$t1_team, dt$t1_name, dt$t2_name)
    )
  )
}

NFC

Wild-Card Round

#2. New Orleans Saints (95%) vs. #7. Chicago Bears (5%)

Winner: New Orleans Saints

#3. Seattle Seahawks (71%) vs. #6. Los Angeles Rams (29%)

Winner: Seattle Seahawks

#4. Washington Football Team (4%) vs. #5. Tampa Bay Buccaneers (96%)

Winner: Tampa Bay Buccaneers

Divisional Round

#1. Green Bay Packers (66%) vs. #5. Tampa Bay Buccaneers (34%)

Winner: Green Bay Packers

#2. New Orleans Saints (60%) vs. #3. Seattle Seahawks (40%)

Winner: New Orleans Saints

NFC Championship Game

#1. Green Bay Packers (55%) vs. #2. New Orleans Saints (45%)

The Green Bay Packers are heading to the Super Bowl!

AFC

Wild-Card Round

#2. Buffalo Bills (91%) vs. #7. Indianapolis Colts (9%)

Winner: Buffalo Bills

#3. Pittsburgh Steelers (68%) vs. #6. Cleveland Browns (32%)

Winner: Pittsburgh Steelers

#4. Tennessee Titans (47%) vs. #5. Baltimore Ravens (53%)

Winner: Baltimore Ravens

Divisional Round

#1. Kansas City Chiefs (89%) vs. #5. Baltimore Ravens (11%)

Winner: Kansas City Chiefs

#2. Buffalo Bills (76%) vs. #3. Pittsburgh Steelers (24%)

Winner: Buffalo Bills

AFC Championship Game

#1. Kansas City Chiefs (64%) vs. #2. Buffalo Bills (36%)

The Kansas City Chiefs are heading to the Super Bowl!

Super Bowl LV

#1. Green Bay Packers (18%) vs. #1. Kansas City Chiefs (82%)

Apparently the NFC and AFC alternate who the home team is and since the Chiefs were the home team in Super Bowl LIV, the NFC representative will be the home team in Super Bowl LV.

Your Super Bowl LV Champions… the Kansas City Chiefs


To leave a comment for the author, please follow the link and comment on their blog: R | JLaw's R Blog.


The post Predicting the Winner of Super Bowl LV first appeared on R-bloggers.


Daniel Aleman – The Key Metric for your Forecast is… TRUST


[This article was first published on Why R? Foundation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Three months ago we finished the Why R? 2020 conference. Among its memorable moments were the invited highlighted talks! Today we would like to remind you of the talk by Daniel Aleman (from Global BPM at Archroma). The video recording is at the end of the post.

Data scientists usually focus on questions such as: Which language? Which package? What type of problem to solve? Which technique to apply? Which metrics to use to validate the model? But my own experience and studies have shown that people (the final users) are distrustful of automated forecasting models. In this presentation, I will not only share my experience using different techniques to get the best forecast analysis, but will also describe the journey to gain the trust of the business over the course of the project.


To leave a comment for the author, please follow the link and comment on their blog: Why R? Foundation.


The post Daniel Aleman - The Key Metric for your Forecast is... TRUST first appeared on R-bloggers.

Isovists using uniform ray casting in R


[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Isovists are polygons of the area visible from a point. They exclude views that are blocked by objects, typically buildings. They can be used to understand the existing impact of urban design features that can change people’s behaviour (e.g. advertising boards, security cameras or trees), or to decide where to place them. Here I present a custom function that creates a visibility polygon (isovist) using a uniform ray casting “physical” algorithm in R. First we load the required packages (use install.packages() first if these are not already installed in R):
library(sf)
library(dplyr)
library(ggplot2)

Data generation

First we create and plot an example footway with viewpoints and a set of buildings that block views. All data used should be in the same Coordinate Reference System (CRS). We generate one viewpoint every 50 m (note that density here is expressed in the units of st_crs(), in this case meters).
library(sf)

footway <- st_sfc(st_linestring(rbind(c(-50,0),c(150,0))))
st_crs(footway) = 3035
viewpoints <- st_line_sample(footway, density = 1/50)
viewpoints <- st_cast(viewpoints,"POINT")

# Each row is x, y, building id; every building ring is closed by repeating its first vertex
buildings <- rbind(c(1,7,1),c(1,31,1),c(23,31,1),c(23,7,1),c(1,7,1),
                   c(2,-24,2),c(2,-10,2),c(14,-10,2),c(14,-24,2),c(2,-24,2),
                   c(21,-18,3),c(21,-10,3),c(29,-10,3),c(29,-18,3),c(21,-18,3),
                   c(27,7,4),c(27,17,4),c(36,17,4),c(36,7,4),c(27,7,4),
                   c(18,44,5), c(18,60,5),c(35,60,5),c(35,44,5),c(18,44,5),
                   c(49,-32,6),c(49,-20,6),c(62,-20,6),c(62,-32,6),c(49,-32,6),
                   c(34,-32,7),c(34,-10,7),c(46,-10,7),c(46,-32,7),c(34,-32,7),
                   c(63,9,8),c(63,40,8),c(91,40,8),c(91,9,8),c(63,9,8),
                   c(133,-71,9),c(133,-45,9),c(156,-45,9),c(156,-71,9),c(133,-71,9),
                   c(152,10,10),c(152,22,10),c(164,22,10),c(164,10,10),c(152,10,10),
                   c(44,8,11),c(44,24,11),c(59,24,11),c(59,8,11),c(44,8,11),
                   c(3,-56,12),c(3,-35,12),c(27,-35,12),c(27,-56,12),c(3,-56,12),
                   c(117,11,13),c(117,35,13),c(123,35,13),c(123,11,13),c(117,11,13),
                   c(66,50,14),c(66,55,14),c(86,55,14),c(86,50,14),c(66,50,14),
                   c(67,-27,15),c(67,-11,15),c(91,-11,15),c(91,-27,15),c(67,-27,15))
buildings <- lapply( split( buildings[,1:2], buildings[,3] ), matrix, ncol=2)
buildings   <- lapply(X = 1:length(buildings), FUN = function(x) {
  st_polygon(buildings[x])
})
buildings <- st_sfc(buildings)
st_crs(buildings) = 3035

# plot raw data
ggplot() +
  geom_sf(data = buildings,colour = "transparent",aes(fill = 'Building')) +
  geom_sf(data = footway, aes(color = 'Footway')) +
  geom_sf(data = viewpoints, aes(color = 'Viewpoint')) +
  scale_fill_manual(values = c("Building" = "grey50"),
                    guide = guide_legend(override.aes = list(linetype = c("blank"),
                                         shape = c(NA)))) +
  scale_color_manual(values = c("Footway" = "black",
                                "Viewpoint" = "red",
                                "Visible area" = "red"),
                     labels = c("Footway", "Viewpoint","Visible area"))+
  guides(color = guide_legend(
    order = 1,
    override.aes = list(
      color = c("black","red"),
      fill  = c("transparent","transparent"),
      linetype = c("solid","blank"),
      shape = c(NA,16))))+
  theme_minimal()+
  coord_sf(datum = NA)+
  theme(legend.title=element_blank())

Isovist function

Function inputs

Buildings should be cast to "POLYGON" if they are not already
buildings <- st_cast(buildings,"POLYGON")

Creating the function

A few parameters can be set before running the function. rayno is the number of observer view angles from the viewpoint. More rays give a more precise isovist but slow down processing. raydist is the maximum view distance. The function takes sfc_POLYGON and sfc_POINT objects as inputs for the buildings and the viewpoint respectively.

If points have a variable view distance, the function can be modified by creating a vector of view distances of length(viewpoints) and then selecting raydist[x] in st_buffer below.

Each ray is intersected with the building data within its raycast distance, creating one or more ray line segments. The ray line segment closest to the viewpoint is then extracted, and the vertex of this line segment furthest from the viewpoint is taken as a boundary vertex for the isovist. The boundary vertices are joined in a clockwise direction to create an isovist.
st_isovist <- function(
  buildings,
  viewpoint,

  # Defaults
  rayno = 20,
  raydist = 100) {

  # Input checks
  if(!class(buildings)[1]=="sfc_POLYGON") stop('Buildings must be sfc_POLYGON')
  if(!class(viewpoint)[1]=="sfc_POINT") stop('Viewpoint must be sfc_POINT')

  rayends     <- st_buffer(viewpoint,dist = raydist,nQuadSegs = (rayno-1)/4)
  rayvertices <- st_cast(rayends,"POINT")

  # Buildings in raydist
  buildintersections <- st_intersects(buildings,rayends,sparse = FALSE)

  # If no buildings block max view, return view
  if (!TRUE %in% buildintersections){
    isovist <- rayends
  }

  # Calculate isovist if buildings block view from viewpoint
  if (TRUE %in% buildintersections){

    rays <- lapply(X = 1:length(rayvertices), FUN = function(x) {
      pair      <- st_combine(c(rayvertices[x],viewpoint))
      line      <- st_cast(pair, "LINESTRING")
      return(line)
    })

    rays <- do.call(c,rays)
    rays <- st_sf(geometry = rays,
                  id = 1:length(rays))

    buildsinmaxview <- buildings[buildintersections]
    buildsinmaxview <- st_union(buildsinmaxview)
    raysioutsidebuilding <- st_difference(rays,buildsinmaxview)

    # Getting each ray segment closest to viewpoint
    multilines  <- dplyr::filter(raysioutsidebuilding, st_is(geometry, c("MULTILINESTRING")))
    singlelines <- dplyr::filter(raysioutsidebuilding, st_is(geometry, c("LINESTRING")))
    multilines  <- st_cast(multilines,"MULTIPOINT")
    multilines  <- st_cast(multilines,"POINT")
    singlelines <- st_cast(singlelines,"POINT")

    # Getting furthest vertex of ray segment closest to view point
    singlelines <- singlelines %>%
      group_by(id) %>%
      dplyr::slice_tail(n = 2) %>%
      dplyr::slice_head(n = 1) %>%
      summarise(do_union = FALSE,.groups = 'drop') %>%
      st_cast("POINT")

    multilines  <- multilines %>%
      group_by(id) %>%
      dplyr::slice_tail(n = 2) %>%
      dplyr::slice_head(n = 1) %>%
      summarise(do_union = FALSE,.groups = 'drop') %>%
      st_cast("POINT")

    # Combining vertices, ordering clockwise by ray angle and casting to polygon
    alllines <- rbind(singlelines,multilines)
    alllines <- alllines[order(alllines$id),]
    isovist  <- st_cast(st_combine(alllines),"POLYGON")
  }
  isovist
}
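To illustrate what "uniform" ray casting means, here is a minimal base-R sketch that computes evenly spaced ray end points around a viewpoint. It is not how st_isovist() obtains them (the function above takes the vertices of an st_buffer() circle); the helper name ray_endpoints and its arguments are hypothetical.

```r
# Evenly spaced ray end points around a viewpoint (base R only, illustrative).
ray_endpoints <- function(x0, y0, rayno = 20, raydist = 100) {
  # One angle per ray, measured clockwise from north; drop the duplicate at 2*pi
  angles <- seq(0, 2 * pi, length.out = rayno + 1)[-(rayno + 1)]
  data.frame(
    id = seq_len(rayno),
    x  = x0 + raydist * sin(angles),
    y  = y0 + raydist * cos(angles)
  )
}

# Eight rays of length 100 from the origin
ends <- ray_endpoints(0, 0, rayno = 8, raydist = 100)
```

Each row is the far end of one ray. Ordering by id walks around the viewpoint clockwise, which is the same ordering idea the function relies on when it joins the boundary vertices into a polygon.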

Running the function in a loop

It is possible to wrap the function in a loop to get multiple isovists for a multirow sfc_POINT object. There is no need to heed the repeating attributes for all sub-geometries warning as we want that to happen in this case.
isovists   <- lapply(X = 1:length(viewpoints), FUN = function(x) {
  viewpoint   <- viewpoints[x]
  st_isovist(buildings = buildings,
             viewpoint = viewpoint,
             rayno = 41,
             raydist = 100)
})
All isovists are unioned to create a visible area polygon, which can be seen plotted over the original path, viewpoint and building data below.
isovists <- do.call(c,isovists)
visareapoly <- st_union(isovists)

ggplot() +
  geom_sf(data = buildings,colour = "transparent",aes(fill = 'Building')) +
  geom_sf(data = footway, aes(color = 'Footway')) +
  geom_sf(data = viewpoints, aes(color = 'Viewpoint')) +
  geom_sf(data = visareapoly,fill="transparent",aes(color = 'Visible area')) +
  scale_fill_manual(values = c("Building" = "grey50"),
                    guide = guide_legend(override.aes = list(linetype = c("blank"),
                                         shape = c(NA)))) +
  scale_color_manual(values = c("Footway" = "black",
                                "Viewpoint" = "red",
                                "Visible area" = "red"),
                     labels = c("Footway", "Viewpoint","Visible area"))+
  guides( color = guide_legend(
    order = 1,
    override.aes = list(
      color = c("black","red","red"),
      fill  = c("transparent","transparent","white"),
      linetype = c("solid","blank", "solid"),
      shape = c(NA,16,NA))))+
  theme_minimal()+
  coord_sf(datum = NA)+
  theme(legend.title=element_blank())


Isovists using uniform ray casting in R was first posted on January 6, 2021 at 4:46 pm.

To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.


The post Isovists using uniform ray casting in R first appeared on R-bloggers.

RObservations #7 – #TidyTuesday – Analysing Coffee Ratings Data


[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Introduction

Around four years ago I was given a copy of Time Magazine’s specialty issue on Coffee together with a French press as a gift. At the time, I was satisfied with a regular instant cup of joe and did not know much about the vastness and culture of the industry. However, it was thanks to these gifts that I was able to learn a lot about coffee, such as the two major species of beans (Arabica and Robusta), the tasting process used by connoisseurs to rank various coffees (called “cupping”), and the altitudes, climates and countries in which various coffees grow around the world. If you read this specialty issue by Time, you probably not only had a more expensive interest piqued (if you hadn’t already), but also learned enough to hold your own with the best of the coffee snobs out there.

(PSA- this blog is not sponsored by Time Magazine, but I won’t say no if I got an offer!)

In this blog post we’re going to examine the coffee_ratings dataset released at the beginning of July 2020 in the Tidy Tuesday Project by R4DS. I initially started analyzing this dataset seeking to answer a lot of questions. But because there is so much to discover and analyze in this relatively small dataset, I thought it best to focus on one very simple question:

Where in the world can I find the best coffee beans?

While this question seems simple enough, there is a lot to uncover in order to answer it.

Our Data (Some Exploratory Data Analysis)

Loading our data

I am loading the data with the tidytuesdayR package; if you prefer, you can load the raw data with the readr package’s read_csv() function instead.

A Quick Glimpse

library(tidyverse)

# The loading step is not shown in the original post; this line is an assumption
# based on the text above (tidytuesdayR, dataset released in the first week of July 2020)
tuesdata <- tidytuesdayR::tt_load('2020-07-07')

coffee_ratings<-tuesdata$coffee_ratings
glimpse(coffee_ratings)

## Rows: 1,339
## Columns: 43
## $ total_cup_points      <dbl> 90.58, 89.92, 89.75, 89.00, 88.83, 88.83, 88.75, 88.67, 88.42, 88.25, 88.08, 87.92, 87.92, 87.92, 87.8...
## $ species               <chr> "Arabica", "Arabica", "Arabica", "Arabica", "Arabica", "Arabica", "Arabica", "Arabica", "Arabica", "Ar...
## $ owner                 <chr> "metad plc", "metad plc", "grounds for health admin", "yidnekachew dabessa", "metad plc", "ji-ae ahn",...
## $ country_of_origin     <chr> "Ethiopia", "Ethiopia", "Guatemala", "Ethiopia", "Ethiopia", "Brazil", "Peru", "Ethiopia", "Ethiopia",...
## $ farm_name             <chr> "metad plc", "metad plc", "san marcos barrancas \"san cristobal cuch", "yidnekachew dabessa coffee pla...
## $ lot_number            <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "YNC-06114", NA, NA, NA, NA, N...
## $ mill                  <chr> "metad plc", "metad plc", NA, "wolensu", "metad plc", NA, "hvc", "c.p.w.e", "c.p.w.e", "tulla coffee f...
## $ ico_number            <chr> "2014/2015", "2014/2015", NA, NA, "2014/2015", NA, NA, "010/0338", "010/0338", "2014/15", NA, "unknown...
## $ company               <chr> "metad agricultural developmet plc", "metad agricultural developmet plc", NA, "yidnekachew debessa cof...
## $ altitude              <chr> "1950-2200", "1950-2200", "1600 - 1800 m", "1800-2200", "1950-2200", NA, NA, "1570-1700", "1570-1700",...
## $ region                <chr> "guji-hambela", "guji-hambela", NA, "oromia", "guji-hambela", NA, NA, "oromia", "oromiya", "snnp/kaffa...
## $ producer              <chr> "METAD PLC", "METAD PLC", NA, "Yidnekachew Dabessa Coffee Plantation", "METAD PLC", NA, "HVC", "Bazen ...
## $ number_of_bags        <dbl> 300, 300, 5, 320, 300, 100, 100, 300, 300, 50, 300, 10, 10, 1, 300, 10, 1, 150, 3, 250, 10, 250, 14, 1...
## $ bag_weight            <chr> "60 kg", "60 kg", "1", "60 kg", "60 kg", "30 kg", "69 kg", "60 kg", "60 kg", "60 kg", "60 kg", "1 kg",...
## $ in_country_partner    <chr> "METAD Agricultural Development plc", "METAD Agricultural Development plc", "Specialty Coffee Associat...
## $ harvest_year          <chr> "2014", "2014", NA, "2014", "2014", "2013", "2012", "March 2010", "March 2010", "2014", "2014", "2014"...
## $ grading_date          <chr> "April 4th, 2015", "April 4th, 2015", "May 31st, 2010", "March 26th, 2015", "April 4th, 2015", "Septem...
## $ owner_1               <chr> "metad plc", "metad plc", "Grounds for Health Admin", "Yidnekachew Dabessa", "metad plc", "Ji-Ae Ahn",...
## $ variety               <chr> NA, "Other", "Bourbon", NA, "Other", NA, "Other", NA, NA, "Other", NA, "Other", "Other", NA, NA, "Othe...
## $ processing_method     <chr> "Washed / Wet", "Washed / Wet", NA, "Natural / Dry", "Washed / Wet", "Natural / Dry", "Washed / Wet", ...
## $ aroma                 <dbl> 8.67, 8.75, 8.42, 8.17, 8.25, 8.58, 8.42, 8.25, 8.67, 8.08, 8.17, 8.25, 8.08, 8.33, 8.25, 8.00, 8.33, ...
## $ flavor                <dbl> 8.83, 8.67, 8.50, 8.58, 8.50, 8.42, 8.50, 8.33, 8.67, 8.58, 8.67, 8.42, 8.67, 8.42, 8.33, 8.50, 8.25, ...
## $ aftertaste            <dbl> 8.67, 8.50, 8.42, 8.42, 8.25, 8.42, 8.33, 8.50, 8.58, 8.50, 8.25, 8.17, 8.33, 8.08, 8.50, 8.58, 7.83, ...
## $ acidity               <dbl> 8.75, 8.58, 8.42, 8.42, 8.50, 8.50, 8.50, 8.42, 8.42, 8.50, 8.50, 8.33, 8.42, 8.25, 8.25, 8.17, 7.75, ...
## $ body                  <dbl> 8.50, 8.42, 8.33, 8.50, 8.42, 8.25, 8.25, 8.33, 8.33, 7.67, 7.75, 8.08, 8.00, 8.25, 8.58, 8.17, 8.50, ...
## $ balance               <dbl> 8.42, 8.42, 8.42, 8.25, 8.33, 8.33, 8.25, 8.50, 8.42, 8.42, 8.17, 8.17, 8.08, 8.00, 8.75, 8.00, 8.42, ...
## $ uniformity            <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 9.33, 10.00, 10.00, 10.00, 10.00, 10.00, 9.33,...
## $ clean_cup             <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10...
## $ sweetness             <dbl> 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 10.00, 9.33, 9.33, 10.00, 10.00, 10.00, 10.00, 10.00, 9.33, ...
## $ cupper_points         <dbl> 8.75, 8.58, 9.25, 8.67, 8.58, 8.33, 8.50, 9.00, 8.67, 8.50, 8.58, 8.50, 8.33, 8.58, 8.50, 8.17, 8.33, ...
## $ moisture              <dbl> 0.12, 0.12, 0.00, 0.11, 0.12, 0.11, 0.11, 0.03, 0.03, 0.10, 0.10, 0.00, 0.00, 0.00, 0.05, 0.00, 0.03, ...
## $ category_one_defects  <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ quakers               <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ color                 <chr> "Green", "Green", NA, "Green", "Green", "Bluish-Green", "Bluish-Green", NA, NA, "Green", NA, NA, NA, N...
## $ category_two_defects  <dbl> 0, 1, 0, 2, 2, 1, 0, 0, 0, 4, 1, 0, 0, 2, 2, 0, 0, 2, 0, 8, 0, 2, 0, 0, 1, 2, 2, 1, 3, 0, 2, 1, 2, 0, ...
## $ expiration            <chr> "April 3rd, 2016", "April 3rd, 2016", "May 31st, 2011", "March 25th, 2016", "April 3rd, 2016", "Septem...
## $ certification_body    <chr> "METAD Agricultural Development plc", "METAD Agricultural Development plc", "Specialty Coffee Associat...
## $ certification_address <chr> "309fcf77415a3661ae83e027f7e5f05dad786e44", "309fcf77415a3661ae83e027f7e5f05dad786e44", "36d0d00a37243...
## $ certification_contact <chr> "19fef5a731de2db57d16da10287413f5f99bc2dd", "19fef5a731de2db57d16da10287413f5f99bc2dd", "0878a7d4b9d35...
## $ unit_of_measurement   <chr> "m", "m", "m", "m", "m", "m", "m", "m", "m", "m", "m", "m", "m", "ft", "m", "m", "m", "m", "m", "m", "...
## $ altitude_low_meters   <dbl> 1950.0, 1950.0, 1600.0, 1800.0, 1950.0, NA, NA, 1570.0, 1570.0, 1795.0, 1855.0, 1872.0, 1943.0, 609.6,...
## $ altitude_high_meters  <dbl> 2200.0, 2200.0, 1800.0, 2200.0, 2200.0, NA, NA, 1700.0, 1700.0, 1850.0, 1955.0, 1872.0, 1943.0, 609.6,...
## $ altitude_mean_meters  <dbl> 2075.0, 2075.0, 1700.0, 2000.0, 2075.0, NA, NA, 1635.0, 1635.0, 1822.5, 1905.0, 1872.0, 1943.0, 609.6,...

A quick glimpse of our data (no pun intended) is enough to indicate that our dataset is far from clean. It also looks like there is missing data everywhere. Let’s see how much.

Missing Data

library(naniar)
vis_miss(coffee_ratings)

Thankfully, it’s not as bad as I thought it was going to be. Given the nature of my question, I am only going to use the total_cup_points, country_of_origin, grading_date and species variables, which all have little to no missing data. (I thought this would be more of an issue, but looking back I’m thankful it isn’t in this case.)

Quantities of Coffee per Country

As stated in the description of our dataset (see the readme.md)

“These data were collected from the Coffee Quality Institute’s review pages in January 2018.”

(I am not sure how grammatical that phrase is but ok.)

To better understand our data, let’s look at how often each country appears in our dataset. Because there is only one instance of missing data, we will remove it from our plots for aesthetic reasons.

library(ggthemes)

# Need to make a new transformed dataset for this visualization
(
  country_table<-coffee_ratings %>%
    count(country_of_origin = factor(country_of_origin)) %>%
    mutate(pct = prop.table(n)) %>%
    arrange(-pct) %>%
    tibble()
)

## # A tibble: 37 x 3
##    country_of_origin                n    pct
##    <fct>                        <int>  <dbl>
##  1 Mexico                         236 0.176
##  2 Colombia                       183 0.137
##  3 Guatemala                      181 0.135
##  4 Brazil                         132 0.0986
##  5 Taiwan                          75 0.0560
##  6 United States (Hawaii)          73 0.0545
##  7 Honduras                        53 0.0396
##  8 Costa Rica                      51 0.0381
##  9 Ethiopia                        44 0.0329
## 10 Tanzania, United Republic Of    40 0.0299
## # ... with 27 more rows

# Together with my knowledge of ggplot and google, these visualizations became possible
ggplot(
  country_table %>% filter(country_of_origin != "NA"),
  mapping = aes(
    x = reorder(country_of_origin, n),
    y = pct,
    group = 1,
    label = scales::percent(pct)
  )) +
  theme_fivethirtyeight() +
  geom_bar(stat = "identity",
           fill = "#634832") +
  geom_text(position = position_dodge(width = 0.9),  # move to center of bars
            hjust = -0.05,                           # Have text just past the end of the bars
            size = 2.5) +
  labs(x = "Country of Origin",
       y = "Proportion of Dataset") +
  theme(axis.text.x = element_text(
    angle = 90,
    vjust = 0.5,
    hjust = 1
  )) +
  ggtitle("Country of Origin Listed in Coffee Ratings Dataset " ) +   # This Emoji messes up this line in R markdown but hey, it
  scale_y_continuous(labels = scales::percent) +                        # looks good.
  coord_flip()

From a brief look at our table and bar chart we see that over 54% of our dataset consists of coffees from Mexico, Colombia, Guatemala and Brazil. But this only tells us part of the story: what species of coffee do we have in our dataset from each country?
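The "over 54%" figure can be checked by hand from the counts printed in the country table and the row count reported by glimpse(). This is only a worked arithmetic sketch with the numbers copied from the output above; the variable names are made up for illustration.

```r
# Counts copied from the country table; 1,339 is the row count from glimpse()
top_four   <- c(Mexico = 236, Colombia = 183, Guatemala = 181, Brazil = 132)
total_rows <- 1339

# Combined share of the four most frequent countries
top_four_share <- sum(top_four) / total_rows
round(100 * top_four_share, 1)
```

The same pattern (count divided by total rows) is what prop.table(n) computes for every country in the table.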

Before looking at that, let’s look at the overall Arabica/Robusta proportion in our dataset:

# Need to make a new transformed dataset for this visualization
species_table<-coffee_ratings %>%
  count(species = factor(species)) %>%
  mutate(pct = prop.table(n)) %>%
  tibble()

ggplot(species_table,mapping=aes(x=species,y=pct,group=1,label=scales::percent(pct)))+
  theme_fivethirtyeight()+
  geom_bar(stat="identity",
           fill=c("#634832","#3b2f2f"))+
  geom_text(position = position_dodge(width=0.9),    # move to center of bars
            vjust=-0.5,                              # Have text just above bars
            size = 3)+
  scale_y_continuous(labels = scales::percent)+
  ggtitle("Arabica vs Robusta Proportion in Dataset ")

Wow! Only 2% of the coffee in our dataset is from Robusta beans! But if you think about this in context, this shouldn’t be too much of a surprise. Robusta coffee is primarily used in instant coffee, espresso and filler for coffee blends. Robusta beans are not graded in the same proportion as Arabica beans because the quality of these bitter, earthy beans is usually not as desirable to coffee drinkers as that of their smoother, richer Arabica counterparts.

With that in mind, let’s see the proportional breakdown per country:

# Need to make new transformed datasets for this visualization
(
  arabica_countries<-coffee_ratings %>%
    filter(species =="Arabica") %>%
    count(species=factor(species),
          country=country_of_origin) %>%
    mutate(pct = prop.table(n)) %>%
    arrange(-n) %>%
    tibble()
)

## # A tibble: 37 x 4
##    species country                          n    pct
##    <fct>   <chr>                        <int>  <dbl>
##  1 Arabica Mexico                         236 0.180
##  2 Arabica Colombia                       183 0.140
##  3 Arabica Guatemala                      181 0.138
##  4 Arabica Brazil                         132 0.101
##  5 Arabica Taiwan                          75 0.0572
##  6 Arabica United States (Hawaii)          73 0.0557
##  7 Arabica Honduras                        53 0.0404
##  8 Arabica Costa Rica                      51 0.0389
##  9 Arabica Ethiopia                        44 0.0336
## 10 Arabica Tanzania, United Republic Of    40 0.0305
## # ... with 27 more rows

ggplot(arabica_countries %>% filter(country!="NA"),
       mapping=aes(x=reorder(country,n),y=pct,group=1,label=scales::percent(pct))) +
  theme_fivethirtyeight()+
  geom_bar(stat="identity",
           fill="#634832")+
  geom_text(position = position_dodge(width = 0.9),  # move to center of bars
            hjust = -0.05,                           # Have text just past the end of the bars
            size = 2.5) +
  ggtitle("Arabica Coffee Countries (for our dataset) ") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip()

(
  robusta_countries<-coffee_ratings %>%
    filter(species =="Robusta") %>%
    count(species = factor(species),
          country=country_of_origin) %>%
    mutate(pct = prop.table(n)) %>%
    arrange(-n) %>%
    tibble()
)

## # A tibble: 5 x 4
##   species country           n    pct
##   <fct>   <chr>         <int>  <dbl>
## 1 Robusta India            13 0.464
## 2 Robusta Uganda           10 0.357
## 3 Robusta Ecuador           2 0.0714
## 4 Robusta United States     2 0.0714
## 5 Robusta Vietnam           1 0.0357

ggplot(robusta_countries %>% filter(country!="NA"),
       mapping=aes(x=reorder(country,n),y=pct,group=1,label=scales::percent(pct))) +
  theme_fivethirtyeight()+
  geom_bar(stat="identity",
           fill="#3b2f2f")+
  geom_text(position = position_dodge(width = 0.9),  # move to center of bars
            hjust = -0.05,                           # Have text just past the end of the bars
            size = 2.5) +
  ggtitle("Robusta Coffee Countries (for our dataset) ") +
  scale_y_continuous(labels = scales::percent) +
  coord_flip()

The Robusta coffees in this dataset are mostly from India and Uganda, with a few from Ecuador, the United States and Vietnam. With that known, let’s look at the Arabica/Robusta ratio for the countries that we have Robusta data on.

coffee_ratings %>%
  filter(country_of_origin %in% c("India","Uganda","Ecuador","United States","Vietnam")) %>%
  count(country_of_origin,species) %>%
  group_by(country_of_origin)

## # A tibble: 10 x 3
## # Groups:   country_of_origin [5]
##    country_of_origin species     n
##    <chr>             <chr>   <int>
##  1 Ecuador           Arabica     1
##  2 Ecuador           Robusta     2
##  3 India             Arabica     1
##  4 India             Robusta    13
##  5 Uganda            Arabica    26
##  6 Uganda            Robusta    10
##  7 United States     Arabica     8
##  8 United States     Robusta     2
##  9 Vietnam           Arabica     7
## 10 Vietnam           Robusta     1

ggplot(coffee_ratings %>% filter(country_of_origin %in% c("India","Uganda","Ecuador","United States","Vietnam")),
       mapping=aes(x=country_of_origin,fill=species))+
  theme_fivethirtyeight()+
  geom_bar(position="fill")+
  scale_fill_manual(values=c("#BE9B7B", "#3b2f2f"))+
  theme(legend.title = element_blank())+
  ggtitle("Arabica/Robusta Ratio from countries with Robusta data ")

Now that we have a better understanding of where our coffees come from, we can try to answer the question of where the best coffee beans in the world are.

Well, it depends.

What type? What year?

It would be nice to just pick out the highest rated coffee and be done with it, but that wouldn’t tell us anything (or really motivate a blog post). We need to consider when a given coffee was graded. That can tell us about a given country’s performance over time. Additionally, we need to consider the species of bean: where is the best-ranked Arabica coffee from? Where is the best Robusta coffee from?

Before we can answer this question, we need to clean the grading_date values and convert them into a date format. Thankfully, the lubridate package makes this relatively easy. After that we will shape our dataset with the dplyr package to get the data into the form we need for our visualization.
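To make the date cleaning concrete, here is a base-R sketch of the same idea on a few strings in the grading_date format. It is an alternative to lubridate's mdy(), not what the post uses, and it assumes English month names, so the time locale is set to "C" first. The vector names are illustrative.

```r
# Month names are parsed with %B, so use the C (English) time locale
invisible(Sys.setlocale("LC_TIME", "C"))

# Example strings in the same shape as grading_date
grading_date <- c("April 4th, 2015", "May 31st, 2010", "March 26th, 2015")

# Drop the ordinal suffix (st, nd, rd, th) that follows the day number
cleaned <- sub("([0-9]+)(st|nd|rd|th),", "\\1,", grading_date)

# Parse to Date, then pull out the year as an integer
new_dates  <- as.Date(cleaned, format = "%B %d, %Y")
score_year <- as.integer(format(new_dates, "%Y"))
```

mdy() handles the ordinal suffixes in one step, which is why the post can call it directly on the raw column.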

library(lubridate)

# Getting the year data
coffee_ratings$new_dates<-coffee_ratings$grading_date %>% mdy()
coffee_ratings$score_year<- coffee_ratings$new_dates %>% year()

# Dataset for visualizations
(
  top_annual_score<- coffee_ratings %>%
    group_by(species,
             score_year,
             country_of_origin) %>%
    summarise(max_points = max(total_cup_points)) %>%
    filter(max_points == max(max_points)) %>%
    arrange(-max_points)
)

## # A tibble: 15 x 4
## # Groups:   species, score_year [15]
##    species score_year country_of_origin max_points
##    <chr>        <dbl> <chr>                  <dbl>
##  1 Arabica       2015 Ethiopia                90.6
##  2 Arabica       2010 Guatemala               89.8
##  3 Arabica       2013 Brazil                  88.8
##  4 Arabica       2012 Peru                    88.8
##  5 Arabica       2016 China                   87.2
##  6 Arabica       2014 Costa Rica              87.2
##  7 Arabica       2011 Brazil                  86.9
##  8 Arabica       2017 Honduras                86.7
##  9 Arabica       2018 Kenya                   84.6
## 10 Robusta       2014 Uganda                  83.8
## 11 Robusta       2017 India                   83.5
## 12 Robusta       2015 India                   83.2
## 13 Robusta       2012 India                   82.8
## 14 Robusta       2016 India                   82.5
## 15 Robusta       2013 India                   81.2

ggplot(top_annual_score,
       mapping=aes(x=score_year,
                   y=max_points,
                   label=paste0(score_year,"\n",country_of_origin,"\n", max_points),
                   color=country_of_origin))+
  theme_fivethirtyeight()+
  geom_text(position = position_dodge(width = 0.9),
            hjust =-0.2,                             # Have text just to the right of the points
            size =3.5) +
  geom_point(size=4,
             alpha=0.8)+
  theme(legend.position = "none")+
  facet_wrap(~species)+
  ggtitle(" Top Scoring Coffees by Year - Faceted on Species ")

From the visualization and table, we see that for Arabica beans the country with the top-scoring coffee varied from year to year. For Robusta, however, India dominated, taking the top score in every year from 2012 to 2017 except 2014, when Uganda beat it.

Overall, across the timespan of our dataset, Ethiopia had the highest-scoring Arabica coffee (and the highest score in the entire dataset) at 90.58, while Uganda had the highest-scoring Robusta at 83.75.

The overall distribution of scores for Arabica and Robusta beans across the years is plotted below with boxplots, with each species' mean score and its confidence interval drawn as solid and dashed lines.

(
  arabica_robusta_average_score <- coffee_ratings %>%
    group_by(species) %>%
    summarise(average_score = mean(total_cup_points),
              lower_ci = mean(total_cup_points) - 1.96 * sqrt(var(total_cup_points) / length(total_cup_points)),
              upper_ci = mean(total_cup_points) + 1.96 * sqrt(var(total_cup_points) / length(total_cup_points)))
)
## # A tibble: 2 x 4
##   species average_score lower_ci upper_ci
##   <chr>           <dbl>    <dbl>    <dbl>
## 1 Arabica          82.1     81.9     82.3
## 2 Robusta          80.9     80.0     81.8

ggplot(coffee_ratings, mapping = aes(x = score_year, y = total_cup_points, group = score_year)) +
  theme_fivethirtyeight() +
  geom_boxplot(color = "#3b2f2f") +
  coord_flip() +
  facet_wrap(~species) +
  geom_hline(data = arabica_robusta_average_score,
             mapping = aes(yintercept = average_score),
             size = 0.5) +
  geom_hline(data = arabica_robusta_average_score,
             mapping = aes(yintercept = lower_ci),
             linetype = "dashed",
             size = 0.5) +
  geom_hline(data = arabica_robusta_average_score,
             mapping = aes(yintercept = upper_ci),
             linetype = "dashed",
             size = 0.5) +
  ggtitle("Boxplots of Arabica and Robusta Beans from 2010-2018 \n           with confidence intervals plotted")
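For readers who want the interval spelled out: the lower_ci and upper_ci columns computed in the summarise() call above amount to a normal-approximation 95% confidence interval for the mean score of each species. The symbols below are notation introduced here for illustration, not taken from the original post: n is the number of graded coffees for a species, x_i is the total_cup_points of coffee i, and s^2 is the sample variance as returned by R's var(), which divides by n - 1.

```latex
\[
\bar{x} \pm 1.96\,\sqrt{\frac{s^{2}}{n}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,
\qquad
s^{2} = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^{2}
\]
```

The constant 1.96 is the 97.5th percentile of the standard normal distribution, so this is an approximation that relies on the sample mean being roughly normally distributed. A wider interval, as for Robusta in the table above, reflects a larger standard error (fewer samples, more variable scores, or both).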

Apart from some outliers at the lower end of the scoring range, most coffees in this dataset score around 80 or above. This implies that the coffees graded by the Coffee Quality Institute are usually ones already assumed to be of high quality. This shouldn’t be a surprise, because the beans graded by the CQI are those that are submitted for grading, as the site’s banner says:

Welcome to the Coffee Quality Institute (CQI) database, which allows users to submit a sample for Q Grading

Conclusion

It’s not surprising that in our dataset Robusta beans scored lower than their Arabica counterparts. Anyone with some background in coffee will tell you that Arabica is generally more desirable to coffee drinkers, while Robusta is usually used for instant coffee, espresso, and as filler in coffee blends.

What is more telling about this dataset is that, for Arabica beans, the best coffee is not specific to any one country. For Robusta beans, however, India and Uganda being the only top annual scorers seems to indicate higher-quality Robusta in these countries.

Additionally, the fact that the average score for both Arabica and Robusta beans is in the low 80s suggests that the beans submitted for grading are those already assumed to be of higher quality. Does this mean we have an overarching view of the worldwide coffee quality range? Certainly not.

But it could tell you something about the quality of the coffee you’re having the next time you visit a coffee shop that serves coffee graded by the CQI.

(Let me know how it tastes!)

Thanks for reading!


To leave a comment for the author, please follow the link and comment on their blog: r – bensstats.


The post RObservations #7 – #TidyTuesday – Analysing Coffee Ratings Data first appeared on R-bloggers.

rOpenSci 2020 Code of Conduct Transparency Report


[This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers.]

The rOpenSci community is supported by our Code of Conduct with a clear description of unacceptable behaviors, instructions on how to make a report, and information on how reports are handled. We, the Code of Conduct Committee, are responsible for receiving, investigating, deciding, enforcing and reporting on all reports of potential violations of our Code. We are committed to transparency with our community while upholding the privacy of victims and people who report incidents.

In 2020, we did not receive any Code of Conduct incident reports.

We acknowledge that a lack of incident reports does not mean we’re perfect. If in doubt, please contact us at conduct at ropensci.org. We welcome feedback.


To leave a comment for the author, please follow the link and comment on their blog: rOpenSci - open tools for open science.


The post rOpenSci 2020 Code of Conduct Transparency Report first appeared on R-bloggers.

rOpenSci Code of Conduct Annual Review


[This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers.]

Our community is our best asset. It’s so important to us, it’s in our mission statement. We recognize that communities are not inclusive by default; they require deliberate attention, including an enforceable Code of Conduct.

rOpenSci is committed to providing a safe, inclusive, welcoming, and harassment-free experience for everyone. We welcome people of all backgrounds and identities, including but not limited to gender, gender identity and expression, age, sexual orientation, disability, physical appearance, body size, race, ethnicity, religion (or lack thereof), or technology choices. We are anti-racist. We welcome anyone, no matter their technical expertise, career stage, or work sector.

We are all supported by rOpenSci’s Code of Conduct that applies to all people participating in the rOpenSci community, including rOpenSci staff and leadership. It applies to all modes of interaction online including GitHub project repositories, the rOpenSci discussion forum, Slack, Community Calls, and in person at rOpenSci-hosted events or events officially endorsed by rOpenSci, including social gatherings affiliated with the event. It is developed and enforced by a committee including rOpenSci staff and an independent community member.

Here we report on our annual review of rOpenSci’s Code of Conduct, reporting process, and internal guidelines for handling reports and enforcement.

Updates

  1. A change to the committee. After serving on the Code of Conduct Committee for two years, Scott Chamberlain is stepping down and Mark Padgham has joined. Committee members for 2021 are Stefanie Butland (rOpenSci Community Manager), Mark Padgham (rOpenSci Software Research Scientist) and Kara Woo (independent community member). We are responsible for receiving, investigating, deciding, enforcing and reporting on all reports of potential violations of our Code.
  2. No changes have been made to the text of the Code.
  3. The Code of Conduct now also applies to packages in the rOpenSci GitHub organization, as described in our package development guide.

With this change of committee members, this will be version 2.2, dated January 5, 2021.

We welcome your feedback by email to conduct at ropensci.org, and we thank you for continuing to work with us to make rOpenSci a safe, enjoyable, friendly and enriching experience for everyone who participates.


To leave a comment for the author, please follow the link and comment on their blog: rOpenSci - open tools for open science.


The post rOpenSci Code of Conduct Annual Review first appeared on R-bloggers.
