Quantcast
Channel: R-bloggers
Viewing all articles
Browse latest Browse all 12094

Examining POTUS Executive Orders

$
0
0

(This article was first published on R – rud.is, and kindly contributed to R-bloggers)

This week’s edition of Data is Plural had two really fun data sets. One is serious fun (the first comprehensive data set on U.S. evictions, and the other I knew about but had forgotten: The Federal Register Executive Order (EO) data set(s).

The EO data is also comprehensive as the summary JSON (or CSV) files have links to more metadata and even more links to the full-text in various formats.

What follows is a quick post to help bootstrap folks who may want to do some tidy text mining on this data. We’ll look at EOs-per-year (per-POTUS) and also take a look at the “top 5 ‘first words’” in the titles of the EOS (also by POTUS).

Ingesting the Data

The EO main page has a list of EO JSON files by POTUS. We’re going to scrape this so we can classify the EOs by POTUS (we could also just use the Federal Register API since @thosjleeper wrote a spiffy package to access it):

library(rvest)library(stringi)library(pluralize) # devtools::install_github("hrbrmstr/pluralize")library(hrbrthemes)library(tidyverse)#' Retrieve the Federal Register main EO page so we can get the links for each POTUSpg <- read_html("https://www.federalregister.gov/executive-orders") #' Find the POTUS EO data nodes, excluding the one for "All"html_nodes(pg, "ul.bulk-files") %>%   html_nodes(xpath = ".//li[span[a[contains(@href, 'json')]] and                             not(span[contains(., 'All')])]") -> potus_nodes#' Turn the POTUS info into a data frame with the POTUS name and EO JSON link,#' then retrieve the JSON file and make a data frame of individual data elementsdata_frame(  potus = html_nodes(potus_nodes, "span:nth-of-type(1)") %>% html_text(),  eo_link = html_nodes(potus_nodes, "a[href *= 'json']") %>%     html_attr("href") %>%     sprintf("https://www.federalregister.gov%s", .)) %>%   mutate(eo = map(eo_link, jsonlite::fromJSON)) %>%   mutate(eo = map(eo, "results")) %>%   unnest() -> eo_dfglimpse(eo_df)## Observations: 887## Variables: 16## $ potus                   "Donald Trump", "Donald Trump", "Donald Trump", "Donald Trump", "Donald Trump", "D...## $ eo_link                 "https://www.federalregister.gov/documents/search.json?conditions%5Bcorrection%5D=...## $ citation                "82 FR 8351", "82 FR 8657", "82 FR 8793", "82 FR 8799", "82 FR 8977", "82 FR 9333"...## $ document_number         "2017-01799", "2017-02029", "2017-02095", "2017-02102", "2017-02281", "2017-02450"...## $ end_page                8352, 8658, 8797, 8803, 8982, 9338, 9341, 9966, 10693, 10696, 10698, 10700, 12287,...## $ executive_order_notes   NA, "See: EO 13807, August 15, 2017", NA, NA, "See: EO 13780, March 6, 2017", "Sup...## $ executive_order_number  13765, 13766, 13767, 13768, 13769, 13770, 13771, 13772, 13773, 13774, 13775, 13776...## $ html_url                "https://www.federalregister.gov/documents/2017/01/24/2017-01799/minimizing-the-ec...## $ pdf_url                 "https://www.gpo.gov/fdsys/pkg/FR-2017-01-24/pdf/2017-01799.pdf", "https://www.gpo...## $ publication_date        "2017-01-24", "2017-01-30", "2017-01-30", "2017-01-30", "2017-02-01", "2017-02-03"...## $ signing_date            "2017-01-20", "2017-01-24", "2017-01-25", "2017-01-25", "2017-01-27", "2017-01-28"...## $ start_page              8351, 8657, 8793, 8799, 8977, 9333, 9339, 9965, 10691, 10695, 10697, 10699, 12285,...## $ title                   "Minimizing the Economic Burden of the Patient Protection and Affordable Care Act ...## $ full_text_xml_url       "https://www.federalregister.gov/documents/full_text/xml/2017/01/24/2017-01799.xml...## $ body_html_url           "https://www.federalregister.gov/documents/full_text/html/2017/01/24/2017-01799.ht...## $ json_url                "https://www.federalregister.gov/api/v1/documents/2017-01799.json", "https://www.f...

EOs By Year

To see how many EOs were signed per-year, per-POTUS, we’ll convert the signing_date into a year (and return it back to a Date object so we get spiffier plot labels), factor order the POTUS names and mark the start of each POTUS term. I’m not usually a fan of stacked bar charts, but since there will only be — at most — two segments, I think they work well and it also shows just how many EOs are established in year one of a POTUS term:

mutate(eo_df, year = lubridate::year(signing_date)) %>%   mutate(year = as.Date(sprintf("%s-01-01", year))) %>%   count(year, potus) %>%  mutate(    potus = factor(      potus,       levels = c("Donald Trump", "Barack Obama", "George W. Bush", "William J. Clinton")    )  ) %>%  ggplot(aes(year, n, group=potus)) +  geom_col(position = "stack", aes(fill = potus)) +  scale_x_date(    name = NULL,    expand = c(0,0),    breaks = as.Date(c("1993-01-01", "2001-01-01", "2009-01-01", "2017-01-01")),    date_labels = "%Y",    limits = as.Date(c("1992-01-01", "2020-12-31"))  ) +  scale_y_comma(name = "# EOs") +  scale_fill_ipsum(name = NULL) +  labs(    title = "Number of Executive Orders Signed Per-Year, Per-POTUS",    subtitle = "1993-Present",    caption = "Source: Federal Register "  ) +  theme_ipsum_rc(grid = "Y") +  theme(legend.position = "bottom")

Favourite First (Title) Words

I’ll let some eager tidy text miners go-to-town on the full text links and just focus on one aspect of the EO titles: the “first” words. These are generally words like “Amending”, “Establishing”, “Promoting”, etc. to give citizens a quick idea of what’s the order is supposed to be doing. We’ll remove common words, turn plurals into singulars and also get rid of years/dates to make the data a bit more useful and focus on the “top 5” first words used by each POTUS (and show all the first words across each POTUS). I’m using raw counts here (since this is a quick post) but another view normalized by percent of all POTUS EOs might prove more interesting/valuable:

mutate(titles_df, first_word = singularize(first_word)) %>%   count(potus, first_word, sort=TRUE) %>%   filter(!stri_detect_regex(first_word, "President|Federal|National")) %>%  mutate(first_word = stri_replace_all_fixed(first_word, "Establishment", "Establishing")) %>%   mutate(first_word = stri_replace_all_fixed(first_word, "Amendment", "Amending")) -> first_wordsgroup_by(first_words, potus) %>%     top_n(5) %>%      ungroup() %>%     distinct(first_word) %>%     pull(first_word) -> all_first_wordsfilter(first_words, first_word %in% all_first_words) %>%   mutate(    potus = factor(      potus,       levels = c("Donald Trump", "Barack Obama", "George W. Bush", "William J. Clinton")    )  ) %>%   mutate(    first_word = factor(      first_word,       levels = rev(sort(unique(first_word)))    )  ) -> first_dfggplot(first_df, aes(n, first_word)) +  geom_segment(aes(xend=0, yend=first_word, color=potus), size=4) +  scale_x_comma(limits=c(0,40)) +  scale_y_discrete(limits = sort(unique(first_df$first_word))) +  facet_wrap(~potus, scales = "free", ncol = 2) +  labs(    x = "# EOs",    y = NULL,    title = "Top 5 Executive Order 'First Words' by POTUS",    subtitle = "1993-Present",    caption = "Source: Federal Register "  ) +  theme_ipsum_rc(grid="X", strip_text_face = "bold") +  theme(panel.spacing.x = unit(5, "lines")) +  theme(legend.position="none")

FWIW I expected more “Revocation”/”Removing” from the current tangerine-in-chief, but there’s plenty “Enforcing” and “Blocking” to make up for it (being the “tough guy” that he likes to pretend he is).

FIN

There’s way more that can be done with this data set and hopefully folks will take it for a spin and come up with their own interesting views. If you do, drop a note in the comments with a link to your creation(s)!

The code blocks are all combined into this gist.

var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) {var s = d.createElement(t); s.type = 'text/javascript'; s.async = true;s.src = '//cdn.viglink.com/api/vglnk.js';var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...


Viewing all articles
Browse latest Browse all 12094

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>