A COVID Small Multiple

[This article was first published on R on kieranhealy.org, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

John Burn-Murdoch has been doing very good work at the Financial Times producing various visualizations of the progress of COVID-19. One of his recent images is a small-multiple plot of cases by country, showing the trajectory of the outbreak for a large number of countries, with a the background of each small-multiple panel also showing (in grey) the trajectory of every other country for comparison. It’s a useful technique. In this example, I’ll draw a version of it in R and ggplot. The main difference is that instead of ordering the panels alphabetically by country, I’ll order them from highest to lowest current reported cases.

Here’s the figure we’ll end up with:

Cumulative reported COVID-19 cases to date, top 50 Countries

There are two small tricks. First, getting all the data to show (in grey) in each panel while highlighting just one country. Second, for reasons of space, moving the panel labels (in ggplot’s terminology, the strip labels) inside the panels, in order to tighten up the space a bit. Doing this is really the same trick both times, viz, creating a some mini-datasets to use for particular layers of the plot.

The code for this (including code to pull the data) is in my COVID GitHub repository. See the repo for details on downloading and cleaning it. Just this morning the ECDC changed how it’s supplying its data, moving from an Excel file to your choice of JSON, CSV, or XML, so this earlier post walking through the process for the Excel file is already out of date for the downloading step. There’s a new function in the repo, though.

We’ll start with the data mostly cleaned and organized.

 1 2 3 4 5 6 7 8 91011121314151617

>cov_case_curve# A tibble: 1,165 x 9# Groups:   iso3 [94]datecnameiso3casesdeathscu_casescu_deathsdays_elapsedend_label<date><chr><chr><dbl><dbl><dbl><dbl><drtn><chr>12020-01-19ChinaCHN136121630daysNA22020-01-20ChinaCHN19023531daysNA32020-01-21ChinaCHN151338662daysNA42020-01-22ChinaCHN14011526173daysNA52020-01-23ChinaCHN970623174daysNA62020-01-24ChinaCHN2599882265daysNA72020-01-25ChinaCHN441151323416daysNA82020-01-26ChinaCHN665151988567daysNA92020-01-27ChinaCHN787252775818daysNA102020-01-28ChinaCHN17532545281069daysNA# … with 1,155 more rows

Then we pick out the top 50 countries, isolating their maximum case value. The code here is a bit inefficient as I keep having to recode some of the country names in the mini-datasets. There are other inefficiencies too, but oh well. I’ll clean them up later.

 1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132

top_50<-cov_case_curve%>%group_by(cname)%>%filter(cu_cases==max(cu_cases))%>%ungroup()%>%top_n(50,cu_cases)%>%select(iso3,cname,cu_cases)%>%mutate(days_elapsed=1,cu_cases=max(cov_case_curve$cu_cases)-1e4,cname=recode(cname,`United States`="USA",`Iran, Islamic Republic of`="Iran",`Korea, Republic of`="South Korea",`United Kingdom`="UK"))top_50# A tibble: 50 x 4iso3cnamecu_casesdays_elapsed<chr><chr><dbl><dbl>1ARGArgentina7599112AUSAustralia7599113AUTAustria7599114BELBelgium7599115BRABrazil7599116CANCanada7599117CHLChile7599118CHNChina7599119CZECzechRepublic75991110DNKDenmark759911# … with 40 more rows

This gives us our label layer. We’ve set days_elapsed and cu_cases values to the same thing for every country, because these are the x and y locations where the country labels will go.

Next, a data layer for the grey line traces and a data layer for the little endpoints at the current case-count value.

 1 2 3 4 5 6 7 8 9101112131415

cov_case_curve_bg<-cov_case_curve%>%select(-cname)%>%filter(iso3%in%top_50$iso3)cov_case_curve_endpoints<-cov_case_curve%>%filter(iso3%in%top_50$iso3)%>%mutate(cname=recode(cname,`United States`="USA",`Iran, Islamic Republic of`="Iran",`Korea, Republic of`="South Korea",`United Kingdom`="UK"))%>%group_by(iso3)%>%filter(cu_cases==max(cu_cases))%>%select(cname,iso3,days_elapsed,cu_cases)%>%ungroup()

We drop cname in the cov_case_curve_bg layer, because we’re going to facet by that value with the main dataset in a moment. That’s the trick that allows the traces for all the countries to appear in each panel.

And now we can draw the plot. I really need to fix that country recode—a prime example of DRY.

 1 2 3 4 5 6 7 8 91011121314151617181920212223242526272829303132333435363738394041424344454647484950515253545556

cov_case_sm<-cov_case_curve%>%filter(iso3%in%top_50$iso3)%>%mutate(cname=recode(cname,`United States`="USA",`Iran, Islamic Republic of`="Iran",`Korea, Republic of`="South Korea",`United Kingdom`="UK"))%>%ggplot(mapping=aes(x=days_elapsed,y=cu_cases))+# The line traces for every country, in every panelgeom_line(data=cov_case_curve_bg,aes(group=iso3),size=0.15,color="gray80")+# The line trace in red, for the country in any given panelgeom_line(color="firebrick",lineend="round")+# The point at the end. Bonus trick: some points can have fills!geom_point(data=cov_case_curve_endpoints,size=1.1,shape=21,color="firebrick",fill="firebrick2")+# The country label inside the panel, in lieu of the strip labelgeom_text(data=top_50,mapping=aes(label=cname),vjust="inward",hjust="inward",fontface="bold",color="firebrick",size=2.1)+# Log transform and friendly labelsscale_y_log10(labels=scales::label_number_si())+# Facet by country, order from high to lowfacet_wrap(~reorder(cname,-cu_cases),ncol=5)+labs(x="Days Since 100th Confirmed Case",y="Cumulative Number of Cases (log10 scale)",title="Cumulative Number of Reported Cases of COVID-19: Top 50 Countries",subtitle=paste("Data as of",format(max(cov_curve$date),"%A, %B %e, %Y")),caption="Kieran Healy @kjhealy / Data: https://www.ecdc.europa.eu/")+theme(plot.title=element_text(size=rel(1),face="bold"),plot.subtitle=element_text(size=rel(0.7)),plot.caption=element_text(size=rel(1)),# turn off the strip label and tighten the panel spacingstrip.text=element_blank(),panel.spacing.x=unit(-0.05,"lines"),panel.spacing.y=unit(0.3,"lines"),axis.text.y=element_text(size=rel(0.5)),axis.title.x=element_text(size=rel(1)),axis.title.y=element_text(size=rel(1)),axis.text.x=element_text(size=rel(0.5)),legend.text=element_text(size=rel(1)))ggsave("figures/cov_case_sm.png",cov_case_sm,width=10,height=12,dpi=300)

var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) {var s = d.createElement(t); s.type = 'text/javascript'; s.async = true;s.src = '//cdn.viglink.com/api/vglnk.js';var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

To leave a comment for the author, please follow the link and comment on their blog: R on kieranhealy.org.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A COVID Small Multiple

Trending Articles

Practice Sheet of Right form of verbs for HSC Students

Download: FK ft Shenky – Nakuyewa ”Prod by: Shenky”

How to win at Markstrat (Markstrat Tips and Tricks) – Vodites

Ominde Commission Report and Recommendations – Ominde Report of 1964

Bureau of Internal Revenue: Regional Offices (Directory)

GO 53 on Enhancement of Ex-gratia upto 5 Lakhs Toddy Tappers in Telangana

Cakewalk CA-2A Leveling Amplifier v2.0.1.97 WiN, v2.0.1.96 OSX Incl Keygen

Mp3 Download: Mdu - Kunjenjenjena

How the kill the job , when DTP request running for long hours.

Microsoft Intune から展開しているアプリのアップデートについて

18-year-old girl was beaten for half an hour by two Northampton men in 'an...

Car crash in Dunton Bassett leaves driver in critical condition

Macky 2, Two Others In Road Accident

Application log 00000000000000089514: Could not convert queue DLVST90CLNT

Detroit mafia: D’Anna Brothers agree to plea deal

Delivery block field greyed out using VA02

Muloraki Au

【個人撮影】スマホのプライベート映像♪「中に出さないで///」カラオケ屋での生ハメ撮りが流出ｗ【リベンジポルノ】＠PornHub

BREAKING NEWS: Diamond Platnumz Is Reported Dead After Ghastly Car Accident

FIAT 500 B0111 B0112