
Comments on the New R OOP System, R7

[This article was first published on Mad (Data) Scientist, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Object-Oriented Programming (OOP) is more than just a programming style; it’s a philosophy. R has offered various forms of OOP, starting with S3, then (among others) S4, reference classes, and R6, and now R7. The latter has been under development by a team broadly drawn from the R community leadership, including not only the “directors” of R development, the R Core Team, but also the prominent R services firm RStudio.

I’ll start this report with a summary, followed by details (definition of OOP, my “safety” concerns etc.). The reader need not have an OOP background for this material; an overview will be given here (though I dare say some readers who have this background may learn something too).

This will not be a tutorial on how to use R7, nor an evaluation of its specific features. Instead, I’ll first discuss the goals of the S3 and S4 OOP systems, which R7 replaces, especially in terms of whether OOP is the best way to meet those goals. These comments then apply to R7 as well.

SUMMARY

Simply put, R7 does a very nice job of implementing something I’ve never liked very much. I do like two of the main OOP tenets, encapsulation and polymorphism, but S3 offers those and is good enough for me. And though I agree in principle with another tenet of OOP, “safety,” I fear that it often results in a net LOSS of safety. R7 does a good job of combining S3 and S4 (3 + 4 = 7), but my concerns about complexity and a net loss of safety regarding S4 remain in full.

OOP OVERVIEW

The first OOP language in wide use was C++, an extension of C that was originally called C with Classes. The first widely used language designed to be OOP “from the ground up” was Python. R’s OOP offerings have been limited.

Encapsulation:

This simply means organizing several related variables into one convenient package. R’s list structure has always done that. S3 classes then tack on a class name as an attribute.

Polymorphism:

The term, meaning “many forms,” refers to the fact that the same function will take different actions when applied to different kinds of objects.

For example, consider a sorting operation. We would like this function to do a numeric sort if it is applied to a vector of (real) numbers, but an alphabetical sort on character vectors. Meanwhile, we would like to use the same function name, say ‘sort’.
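To make this concrete, here is the same sort() call applied to two different kinds of vectors:

```r
## the same sort() call, different behavior by input type
sort(c(10, 2, 33))                  # numeric sort
sort(c("pear", "apple", "banana"))  # alphabetical sort
```

The first call returns 2 10 33, the second "apple" "banana" "pear"; one name, two behaviors.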

S3 accomplishes this via generic functions. Even beginning R users have likely made calls to generic functions without knowing it. For instance, consider the (seemingly) ordinary plot() function. Say we call this function on a vector x; a graph of x will be displayed. But if we call lm() on some data, then call plot() on the output lmout, R will display some graphs depicting that output:

mtc <- mtcars
plot(mtc$mpg) # plots mpg[i] against i
lmout <- lm(mpg ~ .,data=mtc)
plot(lmout)  # plots several graphs, e.g. residuals

The “magic” behind this is dispatch. The R interpreter will route a nominal call to plot() to a class-specific function. In the lm() example, for instance, lm() returns an S3 object of class ‘lm’, so the call plot(lmout) will actually be passed on to another function, plot.lm().

Other well-known generics are print(), summary(), predict() and coef().
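Writing your own generic is easy; here is a minimal sketch (the function and class names are my own, purely illustrative):

```r
## define a generic; UseMethod() performs the dispatch
summarize <- function(x, ...) UseMethod("summarize")

## fallback method for objects of no special class
summarize.default <- function(x, ...) length(x)

## method dispatched for objects of the (hypothetical) class 'orders'
summarize.orders <- function(x, ...) sprintf("%d orders", length(x$items))

ord <- structure(list(items = c(12, 7, 3)), class = "orders")
summarize(ord)        # dispatches to summarize.orders
summarize(c(1, 2, 3)) # falls back to summarize.default
```

The naming convention generic.class is all the "registration" S3 needs.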

Note that the fact that R and Python are not strongly typed made polymorphism easy to implement. C++, on the other hand, is strongly typed, and the programmer will likely need to use templates, which can be quite painful.

By the way, I always tell beginning and intermediate R users that a good way to learn about functions written by others (including in R itself) is to run the function through R’s debug() function. In our case here, they may find it instructive to run debug(plot) and then plot(lmout) to see dispatch up close.

Inheritance:

Say the domain is pets. We might have dogs named Norm, Norma, Frank and Hadley, cats named JJ, Joe, Yihui and Susan, and more anticipated in the future.

To keep track of them, we might construct a class ‘pets’, with fields for name and birthdate. But we could then also construct subclasses ‘dogs’ and ‘cats’. Each subclass would have all the fields of the top class, plus others specific to dogs or cats. We might then also construct a sub-subclass, ‘gender.’
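In S3, inheritance is expressed simply by making the class attribute a vector of class names, most specific first. A minimal sketch (the field names and the describe generic are illustrative, not part of any package):

```r
## the class attribute lists 'dogs' first, then its parent class 'pets'
norm <- structure(list(name = "Norm", birthdate = as.Date("2015-06-01")),
                  class = c("dogs", "pets"))

describe <- function(x) UseMethod("describe")
describe.pets <- function(x) paste(x$name, "is a pet")
describe.dogs <- function(x) paste(x$name, "is a dog")

describe(norm)          # dispatches to describe.dogs first
inherits(norm, "pets")  # TRUE: 'dogs' inherits from 'pets'
```

If no dogs-specific method existed, dispatch would fall through to describe.pets.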

“Safety”:

Say you have a function taking two numeric arguments, which returns TRUE if the first is less than the second:

f <- function(x,y) x < y

But you accidentally call the function with two character strings as arguments. This should produce an error, but won’t; R will happily do a lexicographic comparison instead.

In a mission-critical setting, this could be costly. If the app processes incoming sales orders, say, there would be downtime while restarting the app, possibly lost orders, etc.

If you are worried about this, you could add error-checking code, e.g.

> f
function(x,y) {
   if (!is.numeric(x) || !is.numeric(y))
      stop('non-numeric arguments')
   x < y
}

More sophisticated OOP systems such as S4 can catch such errors for you. There is no free lunch, though: the machinery to even set up your function becomes more complex, and you still have to tell S4 that x and y above must be numeric. But arguably the S4 version is cleaner-looking than having a stop() call and so on.
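For a flavor of what that S4 machinery looks like, here is a minimal sketch (the class and generic names are my own, illustrative only). Slot types are declared up front, so constructing an object with character slots fails automatically, with no explicit stop() call:

```r
library(methods)

## declare the slot types once; S4 enforces them from then on
setClass("Pair", slots = c(x = "numeric", y = "numeric"))

setGeneric("lessThan", function(p) standardGeneric("lessThan"))
setMethod("lessThan", "Pair", function(p) p@x < p@y)

p <- new("Pair", x = 3, y = 5)
lessThan(p)  # TRUE

## wrong slot types are rejected at construction time
res <- tryCatch(new("Pair", x = "a", y = "b"),
                error = function(e) "caught")
```

Note how much setup is needed relative to the one-line S3 version, which is exactly the complexity trade-off discussed above.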

Consider another type of calamity: As noted, S3 objects are R lists. Say one of the list elements has the name partNumber, but later, in assigning a new value to that element, you misspell it as partnumber:

myS3object$partnumber <- 8888  # silently creates a new element

Here we would seem to have no function within which to check for misspelling etc. Thus S4 or some other “safe” OOP system would seem to be a must–unless we create functions to read or write elements of our object. And it turns out that that is exactly what OOP people advocate anyway (e.g. even in S4 etc.), in the form of getters and setters.

In the above example, for instance, say we have a class ‘Orders’, one of whose fields is partNumber. In S3, the getter would be named get_partNumber, and for a particular sales order thisOrder, one would fetch the part number via

get_partNumber(thisOrder)

rather than the more direct way of accessing an R list:

pn <- thisOrder$partNumber

The reader may think it’s silly to write special functions for list read and write, and many would agree. But the OOP philosophy is that we don’t touch objects directly, and instead have functions to act as intermediaries. At any rate, we could place our error-checking code in the getters and setters. (Although there still would be no way under S3 to prevent direct access.)
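A minimal sketch of such a getter/setter pair for the hypothetical ‘Orders’ class (the constructor and field names here are illustrative); note that the error check lives inside the setter:

```r
## hypothetical constructor for an S3 'Orders' object
new_order <- function(partNumber, qty) {
  structure(list(partNumber = partNumber, qty = qty), class = "Orders")
}

get_partNumber <- function(order) order$partNumber

set_partNumber <- function(order, value) {
  ## the error-checking code lives in the setter
  stopifnot(is.character(value), length(value) == 1)
  order$partNumber <- value
  order  # R objects are copied, so return the modified object
}

thisOrder <- new_order("A-100", 2)
get_partNumber(thisOrder)                       # "A-100"
thisOrder <- set_partNumber(thisOrder, "B-200") # checked assignment
```

A misspelled direct assignment like thisOrder$partnumber would still slip through, since S3 cannot forbid direct access.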

ANALYSIS

I use OOP rather sparingly in R: S3 in my own code, and S4, reference classes or R6 when needed for a package that I obtain from CRAN (e.g. EBImage for S4). In Python, I use OOP again for library access, e.g. threading, and to some degree just for fun, as I like Python’s class structure.

But mostly, I have never been a fan of OOP. In particular, I never have been impressed by the “safety” argument. Here’s why:

Safety vs. Complexity

Of course, OOP does not do anything to prevent code logic errors, which are far more prevalent than, say, misspellings. And, most important:

  • There is a direct relation between safety and code complexity.
  • There is a direct relation between code logic errors and code complexity.

One of my favorite R people is John Chambers, “Father of the S Language” and thus the “Grandfather of R.” In his book, Software for Data Analysis, p.335, he warns that “Defining [an S4] class is a more serious piece of programming …than in previous chapters…[even though] the number of lines is not large…” He warns that things are even more difficult for the user of a class than it was for the author in designing it, with “advance contemplation” of what problems users may encounter. And, “You may want to try several different versions [of the class] before committing to one.”

In other words, safety in terms of misspellings etc. comes at possibly major expense in logic errors. There is no avoiding this.

There Are Other Ways to Achieve Safety:

As noted above, we do have alternatives to OOP in this regard, in the form of inserting our own error-checking code. (Note too that error-checking may be important in the middle of your code, using stopifnot().) Indeed, this can be superior to using OOP, as one has much more flexibility, allowing for more sophisticated checks.
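As an illustration of this flexibility, hand-rolled checks via stopifnot() can go well beyond type declarations; this sketch checks types, lengths, and even a domain-specific condition that no OOP type system would express:

```r
## checks can be arbitrarily sophisticated, not just type assertions
f <- function(x, y) {
  stopifnot(
    is.numeric(x), is.numeric(y),
    length(x) == length(y),   # structural check
    all(is.finite(x))         # domain-specific check
  )
  x < y
}

f(3, 5)  # TRUE
msg <- tryCatch(f("a", "b"), error = function(e) conditionMessage(e))
msg      # the error names the failed check, e.g. is.numeric(x)
```

Checks like these can also sit in the middle of a long computation, not just at function entry.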

Why the Push for R7 Now?

Very few of the most prominent developers of R packages use S4 as of now. One must conclude that either there is no general urgency for safety, or that authors find safety more easily and effectively achieved through alternative means, per the above discussion.

As to encapsulation and inheritance, S3 already does a reasonably good job there. Why, then, push for R7?

The impetus seems to be a desire to modernize/professionalize R, moving closer to status as a general-purpose language. Arguably, OOP had been a weak point of R in that sense, and now R can hold its head high in the community of languages.

That’s great, but as usual I am concerned about the impact on the teaching of R to learners without prior programming experience. I’ve been a major critic of the tidyverse in that regard, as Tidy emphasizes “modern” functional programming/loop avoidance to students who barely know what a function is. Will R beginners be taught R7? That would be a big mistake, and I hope those who tend to be enthralled with The New, New Thing resist such a temptation.

Me, well as mentioned, I’m not much of an OOP fan, and don’t anticipate using R7. But the development team has done a bang-up job in creating R7, and for those who feel the need for a strong OOP paradigm, I strongly recommend it.


Efficient list recursion in R with {rrapply}

[This article was first published on R-bloggers | A Random Walk, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)


Introduction

The nested list below shows a small extract from the Mathematics Genealogy Project highlighting the advisor/student genealogy of several famous mathematicians. The mathematician’s given names are present in the "given" attribute of each list element. The numeric values at the leaf elements are the total number of student descendants according to the website as of June 2022. If no descendants are available there is a missing value present at the leaf node.

students <- list(
  Bernoulli = structure(list(
    Bernoulli = structure(list(
      Bernoulli = structure(1L, given = "Daniel"),
      Euler = structure(list(
        Euler = structure(NA, given = "Johann"),
        Lagrange = structure(list(
          Fourier = structure(73788L, given = "Jean-Baptiste"),
          Plana = structure(NA, given = "Giovanni"),
          Poisson = structure(128235L, given = "Simeon")
        ), given = "Joseph")
      ), given = "Leonhard")
    ), given = "Johann"),
    Bernoulli = structure(NA, given = "Nikolaus")
  ), given = "Jacob")
)

str(students, give.attr = FALSE)
#> List of 1
#>  $ Bernoulli:List of 2
#>   ..$ Bernoulli:List of 2
#>   .. ..$ Bernoulli: int 1
#>   .. ..$ Euler    :List of 2
#>   .. .. ..$ Euler   : logi NA
#>   .. .. ..$ Lagrange:List of 3
#>   .. .. .. ..$ Fourier: int 73788
#>   .. .. .. ..$ Plana  : logi NA
#>   .. .. .. ..$ Poisson: int 128235
#>   ..$ Bernoulli: logi NA

As an exercise in list recursion, consider the following simple data exploration question:

Filter all descendants of ‘Leonhard Euler’ and replace all missing values by zero while maintaining the list structure.

Here is a possible (not so efficient) base R solution using recursion with the Recall() function:

filter_desc_euler <- \(x) {
  i <- 1
  while(i <= length(x)) {
    if(identical(names(x)[i], "Euler") & identical(attr(x[[i]], "given"), "Leonhard")) {
      x[[i]] <- rapply(x[[i]], f = \(x) replace(x, is.na(x), 0), how = "replace")
      i <- i + 1
    } else {
      if(is.list(x[[i]])) {
        val <- Recall(x[[i]])
        x[[i]] <- val
        i <- i + !is.null(val)
      } else {
        x[[i]] <- NULL
      }
      if(all(sapply(x, is.null))) {
        x <- NULL
      }
    }
  }
  return(x)
}

str(filter_desc_euler(students), give.attr = FALSE)
#> List of 1
#>  $ Bernoulli:List of 1
#>   ..$ Bernoulli:List of 1
#>   .. ..$ Euler:List of 2
#>   .. .. ..$ Euler   : num 0
#>   .. .. ..$ Lagrange:List of 3
#>   .. .. .. ..$ Fourier: num 73788
#>   .. .. .. ..$ Plana  : num 0
#>   .. .. .. ..$ Poisson: num 128235

This works, but it is hardly the kind of code we would like to write for such a seemingly simple question. Moreover, the convoluted logic is not easy to follow, which can make updating or modifying it quite a time-consuming and error-prone task.

An alternative approach would be to unnest the list into a more manageable (e.g. rectangular) format or use specialized packages, such as igraph or data.tree, to make pruning or modifying node entries more straightforward. Note that attention must be paid to correctly include the node attributes in the transformed object as the node names themselves are not unique in this example. This is a sensible approach and usually the way to go when cleaning or tidying up the data, but for fast prototyping and data exploration tasks we may want to keep the list in its original format to reduce the number of processing steps and minimize the code complexity. Another reason to maintain a nested data structure may be that we wish to use a certain data visualization or data exporting function and the function expects its input in a nested format.

The recursive function above makes use of base rapply(), a member of the apply family of functions in R, which allows us to apply a function recursively to the elements of a nested list and to decide how the returned result is structured. Although sometimes useful, the rapply() function is not sufficiently flexible for many list recursion tasks in practice, as also demonstrated in the above example. In this context, the rrapply() function in the minimal rrapply package attempts to revisit and extend base rapply() to make it more generally applicable for list recursion in the wild. The rrapply() function builds upon R’s native C implementation of rapply() and for this reason requires no other external dependencies.
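For reference, here is what base rapply() can already do on a small toy list of my own: apply a function to every leaf while keeping the nested structure intact. The limitation is that pruning, filtering, or custom predicates are out of reach.

```r
## base rapply() at a glance: transform every leaf, keep the structure
x <- list(a = 1, b = list(c = NA, d = 3))
rapply(x, f = \(v) replace(v, is.na(v), 0), how = "replace")
```

This returns the same nested shape with the NA leaf replaced by 0; everything beyond this pattern is where rrapply() comes in.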

When to use rrapply()

Below, we make use of the two datasets renewable_energy_by_country and pokedex included in the rrapply-package.

  • renewable_energy_by_country is a nested list containing the renewable energy shares per country (% of total energy consumption) in 2016. The data is publicly available at the United Nations Open SDG Data Hub. The 249 countries and areas are structured based on their geographical locations according to the United Nations M49 standard. The numeric values listed for each country are percentages; if no data is available, the country’s value is NA.

  • pokedex is a nested list containing various property values for each of the 151 original Pokémon available (in .json) from https://github.com/Biuni/PokemonGO-Pokedex.

library(rrapply)
data("renewable_energy_by_country")

For convenience, we subset only the values for countries and areas in Oceania from renewable_energy_by_country,

renewable_oceania <- renewable_energy_by_country[["World"]]["Oceania"]
str(renewable_oceania, list.len = 3, give.attr = FALSE)
#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 6
#>   .. ..$ Australia                        : num 9.32
#>   .. ..$ Christmas Island                 : logi NA
#>   .. ..$ Cocos (Keeling) Islands          : logi NA
#>   .. .. [list output truncated]
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 8
#>   .. ..$ Guam                                : num 3.03
#>   .. ..$ Kiribati                            : num 45.4
#>   .. ..$ Marshall Islands                    : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

List pruning and unnesting

how = "prune"

With base rapply(), there is no convenient way to prune or filter elements from the input list. The rrapply() function adds an option how = "prune" to prune all list elements not subject to application of the function f from a nested list. The original list structure is retained, similar to the non-pruned versions how = "replace" and how = "list". Using how = "prune" and the same syntax as in rapply(), we can easily drop all missing values from the list while preserving the nested list structure:

## drop all logical NA's while preserving list structure
rrapply(
  renewable_oceania,
  f = \(x) x,
  classes = "numeric",
  how = "prune"
) |>
  str(list.len = 3, give.attr = FALSE)
#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 2
#>   .. ..$ Australia  : num 9.32
#>   .. ..$ New Zealand: num 32.8
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 7
#>   .. ..$ Guam                            : num 3.03
#>   .. ..$ Kiribati                        : num 45.4
#>   .. ..$ Marshall Islands                : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

Remark: if the f function is missing, it defaults to the identity function. That is, the f argument can be dropped when no (non-trivial) function is applied to the list elements.

how = "flatten"

Instead, we can set how = "flatten" to return a flattened unnested version of the pruned list. This is more efficient than first returning the pruned list with how = "prune" and unlisting or flattening the list in a subsequent step.

## drop all logical NA's and return unnested list
rrapply(
  renewable_oceania,
  classes = "numeric",
  how = "flatten"
) |>
  head(n = 10)
#>        Australia      New Zealand             Fiji    New Caledonia 
#>             9.32            32.76            24.36             4.03 
#> Papua New Guinea  Solomon Islands          Vanuatu             Guam 
#>            50.34            65.73            33.67             3.03 
#>         Kiribati Marshall Islands 
#>            45.43            11.75

Hint: the options argument allows us to tune several options specific to certain choices of how. With how = "flatten", we can choose to not coerce the flattened list to a vector and/or to include all parent list names in the result, similar to how = "unlist" but with a custom name separator.

## flatten to simple list with full names
rrapply(
  renewable_oceania,
  classes = "numeric",
  how = "flatten",
  options = list(namesep = ".", simplify = FALSE)
) |>
  str(list.len = 10, give.attr = FALSE)
#> List of 22
#>  $ Oceania.Australia and New Zealand.Australia        : num 9.32
#>  $ Oceania.Australia and New Zealand.New Zealand      : num 32.8
#>  $ Oceania.Melanesia.Fiji                             : num 24.4
#>  $ Oceania.Melanesia.New Caledonia                    : num 4.03
#>  $ Oceania.Melanesia.Papua New Guinea                 : num 50.3
#>  $ Oceania.Melanesia.Solomon Islands                  : num 65.7
#>  $ Oceania.Melanesia.Vanuatu                          : num 33.7
#>  $ Oceania.Micronesia.Guam                            : num 3.03
#>  $ Oceania.Micronesia.Kiribati                        : num 45.4
#>  $ Oceania.Micronesia.Marshall Islands                : num 11.8
#>   [list output truncated]

how = "melt"

Using how = "melt", we can return a melted data.frame of the pruned list similar in format to reshape2::melt() applied to a nested list. The rows of the melted data.frame contain the parent node paths of the elements in the pruned list. The "value" column contains the values of the terminal or leaf nodes analogous to the flattened list returned by how = "flatten".

## drop all logical NA's and return melted data.frame
oceania_melt <- rrapply(
  renewable_oceania,
  classes = "numeric",
  how = "melt"
)

head(oceania_melt, n = 10)
#>         L1                        L2               L3 value
#> 1  Oceania Australia and New Zealand        Australia  9.32
#> 2  Oceania Australia and New Zealand      New Zealand 32.76
#> 3  Oceania                 Melanesia             Fiji 24.36
#> 4  Oceania                 Melanesia    New Caledonia  4.03
#> 5  Oceania                 Melanesia Papua New Guinea 50.34
#> 6  Oceania                 Melanesia  Solomon Islands 65.73
#> 7  Oceania                 Melanesia          Vanuatu 33.67
#> 8  Oceania                Micronesia             Guam  3.03
#> 9  Oceania                Micronesia         Kiribati 45.43
#> 10 Oceania                Micronesia Marshall Islands 11.75

Remark: if no names are present in a certain sublist of the input list, how = "melt" replaces the names in the melted data.frame by list element indices "1", "2", etc.

## drop some area names
renewable_oceania1 <- renewable_oceania
renewable_oceania1[[1]] <- unname(renewable_oceania[[1]])

## drop all logical NA's and return melted data.frame
rrapply(
  renewable_oceania1,
  classes = "numeric",
  how = "melt"
) |>
  head(n = 10)
#>         L1 L2               L3 value
#> 1  Oceania  1        Australia  9.32
#> 2  Oceania  1      New Zealand 32.76
#> 3  Oceania  2             Fiji 24.36
#> 4  Oceania  2    New Caledonia  4.03
#> 5  Oceania  2 Papua New Guinea 50.34
#> 6  Oceania  2  Solomon Islands 65.73
#> 7  Oceania  2          Vanuatu 33.67
#> 8  Oceania  3             Guam  3.03
#> 9  Oceania  3         Kiribati 45.43
#> 10 Oceania  3 Marshall Islands 11.75

A melted data.frame can be used to reconstruct a nested list with how = "unmelt". No skeleton object (as required by e.g. relist()) is needed, only an ordinary data.frame in the format returned by how = "melt". This option can be convenient to construct nested lists from a rectangular data.frame format without having to resort to recursive function definitions.

## reconstruct nested list from melted data.frame
rrapply(oceania_melt, how = "unmelt") |>
  str(list.len = 3, give.attr = FALSE)
#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 2
#>   .. ..$ Australia  : num 9.32
#>   .. ..$ New Zealand: num 32.8
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 7
#>   .. ..$ Guam                            : num 3.03
#>   .. ..$ Kiribati                        : num 45.4
#>   .. ..$ Marshall Islands                : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

how = "bind"

Nested lists containing repeated observations can be unnested with how = "bind". Each repeated sublist is expanded as a single row in a wide data.frame, and identical sublist component names are aligned as individual columns. By default, the list layer containing the repeated observations is identified by the minimal depth detected across leaf elements, but this can also be overridden using the coldepth option in the options argument. Note that the returned data.frame is similar in format to repeated application of tidyr::unnest_wider() to a nested data.frame, with the same coercion rules applied to the individual columns as how = "unlist".

data("pokedex")
str(pokedex, list.len = 3)
#> List of 1
#>  $ pokemon:List of 151
#>   ..$ :List of 16
#>   .. ..$ id            : int 1
#>   .. ..$ num           : chr "001"
#>   .. ..$ name          : chr "Bulbasaur"
#>   .. .. [list output truncated]
#>   ..$ :List of 17
#>   .. ..$ id            : int 2
#>   .. ..$ num           : chr "002"
#>   .. ..$ name          : chr "Ivysaur"
#>   .. .. [list output truncated]
#>   ..$ :List of 15
#>   .. ..$ id            : int 3
#>   .. ..$ num           : chr "003"
#>   .. ..$ name          : chr "Venusaur"
#>   .. .. [list output truncated]
#>   .. [list output truncated]

## unnest list to wide data.frame
rrapply(pokedex, how = "bind")[, c(1:3, 5:8)] |>
  head(n = 10)
#>    id num       name          type height   weight            candy
#> 1   1 001  Bulbasaur Grass, Poison 0.71 m   6.9 kg  Bulbasaur Candy
#> 2   2 002    Ivysaur Grass, Poison 0.99 m  13.0 kg  Bulbasaur Candy
#> 3   3 003   Venusaur Grass, Poison 2.01 m 100.0 kg  Bulbasaur Candy
#> 4   4 004 Charmander          Fire 0.61 m   8.5 kg Charmander Candy
#> 5   5 005 Charmeleon          Fire 1.09 m  19.0 kg Charmander Candy
#> 6   6 006  Charizard  Fire, Flying 1.70 m  90.5 kg Charmander Candy
#> 7   7 007   Squirtle         Water 0.51 m   9.0 kg   Squirtle Candy
#> 8   8 008  Wartortle         Water 0.99 m  22.5 kg   Squirtle Candy
#> 9   9 009  Blastoise         Water 1.60 m  85.5 kg   Squirtle Candy
#> 10 10 010   Caterpie           Bug 0.30 m   2.9 kg   Caterpie Candy

Hint: setting namecols = TRUE in the options argument includes the parent list names associated with each row in the wide data.frame as individual columns L1, L2, etc.

## bind to data.frame including parent columns
pokemon_evolutions <- rrapply(
  pokedex,
  how = "bind",
  options = list(namecols = TRUE, coldepth = 5)
)

head(pokemon_evolutions, n = 10)
#>         L1 L2             L3 L4 num       name
#> 1  pokemon  1 next_evolution  1 002    Ivysaur
#> 2  pokemon  1 next_evolution  2 003   Venusaur
#> 3  pokemon  2 prev_evolution  1 001  Bulbasaur
#> 4  pokemon  2 next_evolution  1 003   Venusaur
#> 5  pokemon  3 prev_evolution  1 001  Bulbasaur
#> 6  pokemon  3 prev_evolution  2 002    Ivysaur
#> 7  pokemon  4 next_evolution  1 005 Charmeleon
#> 8  pokemon  4 next_evolution  2 006  Charizard
#> 9  pokemon  5 prev_evolution  1 004 Charmander
#> 10 pokemon  5 next_evolution  1 006  Charizard

This can be useful to unnest repeated list elements at multiple nested list levels and join the results into a single data.frame:

## merge pokemon evolutions with pokemon names
rrapply(
  pokedex,
  how = "bind",
  options = list(namecols = TRUE)
)[, c("L1", "L2", "name")] |>
  merge(
    pokemon_evolutions[, c("L1", "L2", "L3", "name")],
    by = c("L1", "L2"),
    suffixes = c("", ".evolution")
  ) |>
  head(n = 10)
#>         L1  L2      name             L3 name.evolution
#> 1  pokemon   1 Bulbasaur next_evolution        Ivysaur
#> 2  pokemon   1 Bulbasaur next_evolution       Venusaur
#> 3  pokemon  10  Caterpie next_evolution        Metapod
#> 4  pokemon  10  Caterpie next_evolution     Butterfree
#> 5  pokemon 100   Voltorb next_evolution      Electrode
#> 6  pokemon 101 Electrode prev_evolution        Voltorb
#> 7  pokemon 102 Exeggcute next_evolution      Exeggutor
#> 8  pokemon 103 Exeggutor prev_evolution      Exeggcute
#> 9  pokemon 104    Cubone next_evolution        Marowak
#> 10 pokemon 105   Marowak prev_evolution         Cubone

Condition function

Base rapply() allows us to apply a function f to list elements of certain types or classes via the classes argument. rrapply() generalizes this concept via an additional condition argument, which accepts any function to use as a condition or predicate for applying f to a subset of list elements. Conceptually, the f function is applied to all leaf elements for which the condition function exactly evaluates to TRUE, similar to isTRUE(). If the condition argument is missing, f is applied to all leaf elements. In combination with how = "prune", the condition function provides additional flexibility in selecting and filtering elements from a nested list.

## drop all NA's using condition function
rrapply(
  renewable_oceania,
  condition = \(x) !is.na(x),
  how = "prune"
) |>
  str(list.len = 3, give.attr = FALSE)
#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 2
#>   .. ..$ Australia  : num 9.32
#>   .. ..$ New Zealand: num 32.8
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 7
#>   .. ..$ Guam                            : num 3.03
#>   .. ..$ Kiribati                        : num 45.4
#>   .. ..$ Marshall Islands                : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

More interesting is to consider a condition that cannot also be defined using the classes argument. For instance, we can filter all countries with values that satisfy a certain numeric condition:

## filter all countries with values above 85%
rrapply(
  renewable_energy_by_country,
  condition = \(x) x > 85,
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ World:List of 1
#>   ..$ Africa:List of 1
#>   .. ..$ Sub-Saharan Africa:List of 3
#>   .. .. ..$ Eastern Africa:List of 7
#>   .. .. .. ..$ Burundi                    : num 89.2
#>   .. .. .. ..$ Ethiopia                   : num 91.9
#>   .. .. .. ..$ Rwanda                     : num 86
#>   .. .. .. ..$ Somalia                    : num 94.7
#>   .. .. .. ..$ Uganda                     : num 88.6
#>   .. .. .. ..$ United Republic of Tanzania: num 86.1
#>   .. .. .. ..$ Zambia                     : num 88.5
#>   .. .. ..$ Middle Africa :List of 2
#>   .. .. .. ..$ Chad                            : num 85.3
#>   .. .. .. ..$ Democratic Republic of the Congo: num 97
#>   .. .. ..$ Western Africa:List of 1
#>   .. .. .. ..$ Guinea-Bissau: num 86.5

## or by passing arguments to condition via ...
rrapply(
  renewable_energy_by_country,
  condition = "==",
  e2 = 0,
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ World:List of 4
#>   ..$ Americas:List of 1
#>   .. ..$ Latin America and the Caribbean:List of 1
#>   .. .. ..$ Caribbean:List of 1
#>   .. .. .. ..$ Antigua and Barbuda: num 0
#>   ..$ Asia    :List of 1
#>   .. ..$ Western Asia:List of 4
#>   .. .. ..$ Bahrain: num 0
#>   .. .. ..$ Kuwait : num 0
#>   .. .. ..$ Oman   : num 0
#>   .. .. ..$ Qatar  : num 0
#>   ..$ Europe  :List of 2
#>   .. ..$ Northern Europe:List of 1
#>   .. .. ..$ Channel Islands:List of 1
#>   .. .. .. ..$ Guernsey: num 0
#>   .. ..$ Southern Europe:List of 1
#>   .. .. ..$ Gibraltar: num 0
#>   ..$ Oceania :List of 2
#>   .. ..$ Micronesia:List of 1
#>   .. .. ..$ Northern Mariana Islands: num 0
#>   .. ..$ Polynesia :List of 1
#>   .. .. ..$ Wallis and Futuna Islands: num 0

Note that the NA elements are not returned, as the condition function does not evaluate to TRUE for NA values.

As the condition function is a generalization of the classes argument, it remains possible to use deflt together with how = "list" or how = "unlist" to set a default value to all leaf elements for which the condition is not TRUE:

## replace all NA elements by zero
rrapply(
  renewable_oceania,
  condition = Negate(is.na),
  deflt = 0,
  how = "list"
) |>
  str(list.len = 3, give.attr = FALSE)
#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 6
#>   .. ..$ Australia                        : num 9.32
#>   .. ..$ Christmas Island                 : num 0
#>   .. ..$ Cocos (Keeling) Islands          : num 0
#>   .. .. [list output truncated]
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 8
#>   .. ..$ Guam                                : num 3.03
#>   .. ..$ Kiribati                            : num 45.4
#>   .. ..$ Marshall Islands                    : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

To be consistent with base rapply(), the deflt argument can still only be used in combination with how = "list" or how = "unlist".
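As a quick illustration of this restriction, here is a sketch reusing the renewable_oceania list from earlier, flattening it to a named vector while substituting a default for the missing values (the -1 default is a hypothetical choice for illustration; output is omitted as it depends on the data):

```r
## hypothetical sketch: flatten to a named vector,
## replacing all NA leaf elements by -1
rrapply(
  renewable_oceania,
  condition = Negate(is.na),  # TRUE for non-missing leaf elements
  deflt = -1,                 # default for elements failing the condition
  how = "unlist"
) |>
  head(n = 3)
```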

Using the ... argument

The first argument to f always evaluates to the content of the list element to which f is applied. Any further arguments that are independent of the list content (besides the special arguments .xname, .xpos, .xparents and .xsiblings discussed below) can be supplied via the ... argument. Since rrapply() accepts a function in two of its arguments, f and condition, any argument passed via ... must be included as a function argument in both f and condition (if both are given), even if it is not used in one of them.

To clarify, consider the following example which replaces all missing values by a value defined in a separate argument newvalue:

## this is not ok!
tryCatch({
  rrapply(
    renewable_oceania, 
    condition = is.na, 
    f = \(x, newvalue) newvalue, 
    newvalue = 0, 
    how = "replace"
  )
}, error = function(error) error$message)
#> [1] "2 arguments passed to 'is.na' which requires 1"

## this is ok
rrapply(
  renewable_oceania, 
  condition = \(x, newvalue) is.na(x), 
  f = \(x, newvalue) newvalue, 
  newvalue = 0, 
  how = "replace"
) |>
  str(list.len = 3, give.attr = FALSE)
#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 6
#>   .. ..$ Australia                        : num 9.32
#>   .. ..$ Christmas Island                 : num 0
#>   .. ..$ Cocos (Keeling) Islands          : num 0
#>   .. .. [list output truncated]
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 8
#>   .. ..$ Guam                                : num 3.03
#>   .. ..$ Kiribati                            : num 45.4
#>   .. ..$ Marshall Islands                    : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

Special arguments .xname, .xpos, .xparents and .xsiblings

With base rapply(), the f function only has access to the content of the list element under evaluation, and there is no convenient way to access its name or location in the nested list from inside the f function. To overcome this limitation, rrapply() defines the special arguments .xname, .xpos, .xparents and .xsiblings inside the f and condition functions (in addition to the principal function argument):

  • .xname evaluates to the name of the list element;
  • .xpos evaluates to the position of the element in the nested list structured as an integer vector;
  • .xparents evaluates to a vector of parent list names in the path to the current list element;
  • .xsiblings evaluates to the parent list containing the current list element and its direct siblings.

Using the .xname and .xpos arguments, we can transform or filter list elements based on their names and/or positions in the nested list:

## apply f based on element's name
rrapply(
  renewable_oceania,
  condition = \(x) !is.na(x),
  f = \(x, .xname) sprintf("Renewable energy in %s: %.2f%%", .xname, x),
  how = "flatten"
) |>
  head(n = 5)
#>                                      Australia 
#>         "Renewable energy in Australia: 9.32%" 
#>                                    New Zealand 
#>      "Renewable energy in New Zealand: 32.76%" 
#>                                           Fiji 
#>             "Renewable energy in Fiji: 24.36%" 
#>                                  New Caledonia 
#>     "Renewable energy in New Caledonia: 4.03%" 
#>                               Papua New Guinea 
#> "Renewable energy in Papua New Guinea: 50.34%"

## filter elements by name
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xname) .xname %in% c("Belgium", "Netherlands", "Luxembourg"),
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ World:List of 1
#>   ..$ Europe:List of 1
#>   .. ..$ Western Europe:List of 3
#>   .. .. ..$ Belgium    : num 9.14
#>   .. .. ..$ Luxembourg : num 13.5
#>   .. .. ..$ Netherlands: num 5.78

Knowing that Europe is located at renewable_energy_by_country[[c(1, 5)]], we can filter all European countries with a renewable energy share above 50% using the .xpos argument as follows,

## filter European countries > 50% using .xpos
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xpos) identical(.xpos[1:2], c(1L, 5L)) && x > 50,
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ World:List of 1
#>   ..$ Europe:List of 2
#>   .. ..$ Northern Europe:List of 3
#>   .. .. ..$ Iceland: num 78.1
#>   .. .. ..$ Norway : num 59.5
#>   .. .. ..$ Sweden : num 51.4
#>   .. ..$ Western Europe :List of 1
#>   .. .. ..$ Liechtenstein: num 62.9

This can be done more conveniently using the .xparents argument, which does not require looking up the location of Europe in the nested list,

## filter European countries > 50% using .xparents
rrapply(
  renewable_energy_by_country,
  condition = function(x, .xparents) "Europe" %in% .xparents && x > 50,
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ World:List of 1
#>   ..$ Europe:List of 2
#>   .. ..$ Northern Europe:List of 3
#>   .. .. ..$ Iceland: num 78.1
#>   .. .. ..$ Norway : num 59.5
#>   .. .. ..$ Sweden : num 51.4
#>   .. ..$ Western Europe :List of 1
#>   .. .. ..$ Liechtenstein: num 62.9

Using the .xpos argument, we can quickly look up the position of a specific element in the nested list,

## return position of Sweden in list
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xname) .xname == "Sweden",
  f = \(x, .xpos) .xpos,
  how = "flatten"
)
#> $Sweden
#> [1]  1  5  2 14

Using the .xsiblings argument, we can look up the direct neighbors of an element in the nested list,

## look up neighbors of Sweden in list
rrapply(
  renewable_energy_by_country,
  condition = \(x, .xsiblings) "Sweden" %in% names(.xsiblings),
  how = "flatten"
) |>
  head(n = 10)
#> Aland Islands       Denmark       Estonia Faroe Islands       Finland 
#>            NA         33.06         26.55          4.24         42.03 
#>       Iceland       Ireland   Isle of Man        Latvia     Lithuania 
#>         78.07          8.65          4.30         38.48         31.42

We can also use the .xpos argument to determine the maximum depth of the list or the length of the longest sublist as follows,

## maximum list depth
rrapply(
  renewable_energy_by_country, 
  f = \(x, .xpos) length(.xpos), 
  how = "unlist"
) |>
  max()
#> [1] 5

## longest sublist length
rrapply(
  renewable_energy_by_country, 
  f = \(x, .xpos) max(.xpos), 
  how = "unlist"
) |>
  max()
#> [1] 28

When unnesting nested lists with how = "bind", the .xname, .xpos or .xparents arguments can be useful to decide which list elements to include in the unnested data.frame:

## filter elements and unnest list
rrapply(
  pokedex,
  condition = \(x, .xpos, .xname) length(.xpos) < 4 & .xname %in% c("num", "name", "type"),
  how = "bind"
) |>
  head()
#>   num       name          type
#> 1 001  Bulbasaur Grass, Poison
#> 2 002    Ivysaur Grass, Poison
#> 3 003   Venusaur Grass, Poison
#> 4 004 Charmander          Fire
#> 5 005 Charmeleon          Fire
#> 6 006  Charizard  Fire, Flying

Modifying list elements

By default, both base rapply() and rrapply() recurse into any list-like element. Setting classes = "list" in rrapply() overrides this behavior: the f function is applied to any list element (i.e. a sublist) that satisfies the condition argument, and if the condition is not satisfied for a list element, rrapply() recurses further into the sublist, applies f to the elements that satisfy condition, and so on. Since classes = "list" is what signals rrapply() not to descend into list objects, this behavior can only be triggered via the classes argument and not through e.g. condition = is.list.

The mode classes = "list" can be useful to e.g. collapse sublists or calculate summary statistics across elements in a nested list:

## calculate mean value of Europe
rrapply(
  renewable_energy_by_country,  
  condition = \(x, .xname) .xname == "Europe",
  f = \(x) mean(unlist(x), na.rm = TRUE),
  classes = "list",
  how = "flatten"
)
#>   Europe 
#> 22.36565

Note that the principal argument in the f function now evaluates to a list. For this reason, we first have to unlist the sublist before calculating the mean.

To calculate the mean renewable energy shares for each continent, we can make use of the fact that the .xpos vector of each continent has length (i.e. depth) 2:

## calculate mean value for each continent
## (Antarctica's value is missing)
rrapply(
  renewable_energy_by_country, 
  condition = \(x, .xpos) length(.xpos) == 2,
  f = \(x) mean(unlist(x), na.rm = TRUE),
  classes = "list"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ World:List of 6
#>   ..$ Africa    : num 54.3
#>   ..$ Americas  : num 18.2
#>   ..$ Antarctica: logi NA
#>   ..$ Asia      : num 17.9
#>   ..$ Europe    : num 22.4
#>   ..$ Oceania   : num 17.8

Remark: if classes = "list", the f function is only applied to the (non-terminal) list elements. To apply f to both terminal and non-terminal elements in the nested list, we can include additional classes, such as classes = c("list", "numeric", "character"). To apply f to any terminal and non-terminal element in the nested list, we can even combine classes = c("list", "ANY"). To illustrate, we search across all list elements for the country or region with M49-code "155":

## filter country or region by M49-code
rrapply(
  renewable_energy_by_country,
  condition = \(x) attr(x, "M49-code") == "155",
  f = \(x, .xname) .xname,
  classes = c("list", "ANY"), 
  how = "unlist"
)
#> World.Europe.Western Europe 
#>            "Western Europe"

As a more complex example, we unnest the Pokémon evolutions in pokedex into a wide data.frame by returning the sublists with Pokémon evolutions as character vectors:

## simplify pokemon evolutions to character vectors
rrapply(
  pokedex,
  condition = \(x, .xname) .xname %in% c("name", "next_evolution", "prev_evolution"), 
  f = \(x) if (is.list(x)) sapply(x, `[[`, "name") else x,
  classes = c("list", "character"),
  how = "bind"
) |>
  head(n = 9)
#>         name        next_evolution         prev_evolution
#> 1  Bulbasaur     Ivysaur, Venusaur                     NA
#> 2    Ivysaur              Venusaur              Bulbasaur
#> 3   Venusaur                    NA     Bulbasaur, Ivysaur
#> 4 Charmander Charmeleon, Charizard                     NA
#> 5 Charmeleon             Charizard             Charmander
#> 6  Charizard                    NA Charmander, Charmeleon
#> 7   Squirtle  Wartortle, Blastoise                     NA
#> 8  Wartortle             Blastoise               Squirtle
#> 9  Blastoise                    NA    Squirtle, Wartortle

Hint: as data.frames are also list-like objects, rrapply() applies f to individual data.frame columns by default. Set classes = "data.frame" to avoid this behavior and apply the f and condition functions to complete data.frame objects instead of individual data.frame columns.

## create a nested list of data.frames
oceania_df <- rrapply(
  renewable_oceania,
  condition = \(x, .xpos) length(.xpos) == 2,
  f = \(x) data.frame(name = names(x), value = unlist(x)),
  classes = "list",
  how = "replace"
)

## this does not work!
tryCatch({
  rrapply(
    oceania_df,
    f = function(x) subset(x, !is.na(value)), ## filter NA-rows of data.frame
    how = "replace"
  )
}, error = function(error) error$message)
#> [1] "object 'value' not found"

## this does work
rrapply(
  oceania_df,
  f = function(x) subset(x, !is.na(value)),
  classes = "data.frame",
  how = "replace"
)[[1]][1:2]
#> $`Australia and New Zealand`
#>                    name value
#> Australia     Australia  9.32
#> New Zealand New Zealand 32.76
#> 
#> $Melanesia
#>                              name value
#> Fiji                         Fiji 24.36
#> New Caledonia       New Caledonia  4.03
#> Papua New Guinea Papua New Guinea 50.34
#> Solomon Islands   Solomon Islands 65.73
#> Vanuatu                   Vanuatu 33.67

Recursive list updating

how = "recurse"

If classes = "list" and how = "recurse", rrapply() applies the f function to any list element that satisfies the condition argument, but recurses further into any updated list element after application of f. This can be useful to e.g. recursively update the class or other attributes of all elements in a nested list:

## recursively remove all list attributes
rrapply(
  renewable_oceania,
  f = \(x) c(x),
  classes = c("list", "ANY"),
  how = "recurse"
) |>
  str(list.len = 3, give.attr = TRUE)
#> List of 1
#>  $ Oceania:List of 4
#>   ..$ Australia and New Zealand:List of 6
#>   .. ..$ Australia                        : num 9.32
#>   .. ..$ Christmas Island                 : logi NA
#>   .. ..$ Cocos (Keeling) Islands          : logi NA
#>   .. .. [list output truncated]
#>   ..$ Melanesia                :List of 5
#>   .. ..$ Fiji            : num 24.4
#>   .. ..$ New Caledonia   : num 4.03
#>   .. ..$ Papua New Guinea: num 50.3
#>   .. .. [list output truncated]
#>   ..$ Micronesia               :List of 8
#>   .. ..$ Guam                                : num 3.03
#>   .. ..$ Kiribati                            : num 45.4
#>   .. ..$ Marshall Islands                    : num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

how = "names"

The option how = "names" is a special case of how = "recurse", where the value of f is used to replace the name of the evaluated list element instead of its content (as with all other how options). By default, how = "names" uses classes = c("list", "ANY") in order to allow updating of all names in the nested list.

## recursively replace all names by M49-codes
rrapply(
  renewable_oceania,
  f = \(x) attr(x, "M49-code"),
  how = "names"
) |>
  str(list.len = 3, give.attr = FALSE)
#> List of 1
#>  $ 009:List of 4
#>   ..$ 053:List of 6
#>   .. ..$ 036: num 9.32
#>   .. ..$ 162: logi NA
#>   .. ..$ 166: logi NA
#>   .. .. [list output truncated]
#>   ..$ 054:List of 5
#>   .. ..$ 242: num 24.4
#>   .. ..$ 540: num 4.03
#>   .. ..$ 598: num 50.3
#>   .. .. [list output truncated]
#>   ..$ 057:List of 8
#>   .. ..$ 316: num 3.03
#>   .. ..$ 296: num 45.4
#>   .. ..$ 584: num 11.8
#>   .. .. [list output truncated]
#>   .. [list output truncated]

Conclusion

To conclude, we return to the list recursion exercise in the first section. Using rrapply(), one possible solution is to split the question into two steps as follows:

## look up position of Euler (Leonhard)
euler <- rrapply(
  students,
  condition = \(x, .xname) .xname == "Euler" && attr(x, "given") == "Leonhard",
  f = \(x, .xpos) .xpos,
  classes = "list",
  how = "flatten"
)[["Euler"]]

## filter descendants of Euler (Leonhard) and replace missing values by zero
rrapply(
  students,
  condition = \(x, .xpos) identical(.xpos[seq_along(euler)], euler), 
  f = \(x) replace(x, is.na(x), 0),
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ Bernoulli:List of 1
#>   ..$ Bernoulli:List of 1
#>   .. ..$ Euler:List of 2
#>   .. .. ..$ Euler   : num 0
#>   .. .. ..$ Lagrange:List of 3
#>   .. .. .. ..$ Fourier: num 73788
#>   .. .. .. ..$ Plana  : num 0
#>   .. .. .. ..$ Poisson: num 128235

Knowing that Johann Euler is a descendant of Leonhard Euler, we can further simplify this into a single function call using the .xparents argument:

## filter descendants of Euler (Leonhard) and replace missing values by zero
rrapply(
  students,
  condition = \(x, .xparents) "Euler" %in% .xparents,
  f = \(x) replace(x, is.na(x), 0),
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ Bernoulli:List of 1
#>   ..$ Bernoulli:List of 1
#>   .. ..$ Euler:List of 2
#>   .. .. ..$ Euler   : num 0
#>   .. .. ..$ Lagrange:List of 3
#>   .. .. .. ..$ Fourier: num 73788
#>   .. .. .. ..$ Plana  : num 0
#>   .. .. .. ..$ Poisson: num 128235

Or alternatively, we could first update the names of the elements in the nested list to include both first and last names and then prune the list in a second step:

## include first names in list element names
students_fullnames <- rrapply(
  students, 
  f = \(x, .xname) paste(attr(x, "given"), .xname),
  how = "names"
)

## filter descendants of Euler (Leonhard) and replace missing values by zero
rrapply(
  students_fullnames,
  condition = \(x, .xparents) "Leonhard Euler" %in% .xparents,
  f = \(x) replace(x, is.na(x), 0),
  how = "prune"
) |>
  str(give.attr = FALSE)
#> List of 1
#>  $ Jacob Bernoulli:List of 1
#>   ..$ Johann Bernoulli:List of 1
#>   .. ..$ Leonhard Euler:List of 2
#>   .. .. ..$ Johann Euler   : num 0
#>   .. .. ..$ Joseph Lagrange:List of 3
#>   .. .. .. ..$ Jean-Baptiste Fourier: num 73788
#>   .. .. .. ..$ Giovanni Plana       : num 0
#>   .. .. .. ..$ Simeon Poisson       : num 128235

Additional details

The latest stable version of the rrapply-package is available on CRAN. Additional details and examples on how to use the rrapply() function can be found at https://jorischau.github.io/rrapply/ and a quick reference sheet can be downloaded from the github repository at https://github.com/JorisChau/rrapply/.

Session Info

sessionInfo()
#> R version 4.2.1 (2022-06-23)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.4 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] rrapply_1.2.5
#> 
#> loaded via a namespace (and not attached):
#>  [1] bookdown_0.27   digest_0.6.29   R6_2.5.1        jsonlite_1.8.0 
#>  [5] magrittr_2.0.3  evaluate_0.15   blogdown_1.10   stringi_1.7.8  
#>  [9] rlang_1.0.4     cli_3.3.0       rstudioapi_0.13 jquerylib_0.1.4
#> [13] bslib_0.3.1     rmarkdown_2.14  tools_4.2.1     stringr_1.4.0  
#> [17] xfun_0.31       yaml_2.3.5      fastmap_1.1.0   compiler_4.2.1 
#> [21] htmltools_0.5.2 knitr_1.39      sass_0.4.1
To leave a comment for the author, please follow the link and comment on their blog: R-bloggers | A Random Walk.


{rspm}: easy access to RSPM binary packages with automatic management of system requirements

[This article was first published on R – Enchufa2, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

There are many community projects out there that provide binary R packages for various distributions. You may know Michael Rutter’s legendary c2d4u.team/c2d4u4.0+ PPA, but this situation has been greatly improved more recently with Detlef Steuer’s autoCRAN OBS repo for OpenSUSE, my iucar/cran Copr repo for Fedora, and Dirk Eddelbuettel’s r2u repo, again, for Ubuntu. These have obvious advantages that come with the system package management layer, such as lightning-fast installations and updates, with automatic dependency management, reversibility, and multitenancy (several users sharing the same set of packages), among others (see this paper for further details). Moreover, the {bspm} package adds the integration layer that we were lacking for all these years, enabling a bridge to the system package manager that doesn’t require admin rights or for you to leave your beloved R console (i.e. the Windows experience on Linux).

However, it may be noticed that CentOS/RHEL was the great forgotten here, but there are quite a lot of users out there tied to this distro for different reasons. Moreover, such reasons usually imply that they don’t have access to admin rights at all, not even for the setup.

So here I announce that I created the {rspm} package, which, as its name indicates, enables easy access to RStudio Public Package Manager (i.e., it does the repo setup for you), but also monitors and scans every installation to automatically detect, download, install and configure any missing system requirements. Most importantly, this is done in full user-mode (i.e., system requirements are installed into the user home) in a dynamic way (no need to restart, no need to manage environment variables). It is definitely not as fast as the other projects, but it is complementary in the sense that it may be compatible with {renv}/{pak}-based workflows (I haven't tried yet, but it should require few, if any, adjustments).

I made this primarily targeted at CentOS Stream 8, but for nearly the same price I added support for Ubuntu (bionic, focal, jammy) too (although note that this requires the installation of the apt-file utility). Please give it a spin if you feel like it, and let me know how it goes. Here’s a demo of the installation of {sf}, which, as you may know, has quite a number of system requirements:
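In text form, a minimal session along the lines of that demo might look as follows. This is a sketch under the assumption that the package exposes enable()/disable() helpers as described in its documentation; check the package README for the exact API:

```r
# one-time setup and demo, run from a regular (non-admin) user account
install.packages("rspm")  # from CRAN
rspm::enable()            # configure the RSPM repo and start tracking installs
install.packages("sf")    # binary install; missing system libs go to the user home
rspm::disable()           # restore the previous repository settings
```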

Finally, I would like to thank RStudio for their investment in providing this extremely useful resource for the Linux R community.

Article originally published in Enchufa2.es: {rspm}: easy access to RSPM binary packages with automatic management of system requirements.

New Package yfR

[This article was first published on rOpenSci - open tools for open science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Package yfR recently passed peer review at rOpenSci and is all about downloading stock price data from Yahoo Finance (YF). I wrote this package to solve a particular problem I had as a teacher: I needed a large volume of clean stock price data to use in my classes, either for explaining how financial markets work or for class exercises. While there are several R packages to import raw data from YF, none solved my problem.

Package yfR facilitates the importation of data, organizing it in the tidy format and speeding up the process using a cache system and parallel computing. yfR is a backwards-incompatible substitute of BatchGetSymbols, released in 2016 (see vignette yfR and BatchGetSymbols for details).

Introducing yfR

Yahoo Finance provides a vast repository of stock price data around the globe. It covers a significant number of markets and assets, and is therefore used extensively in academic research and teaching. In order to import the financial data from YF, all you need is a ticker (id of a stock, e.g. “GM” for General Motors) and a time period – first and last date.

Features of yfR

Package yfR distinguishes itself from other similar packages with the following features:

  • Fetches daily/weekly/monthly/annual stock prices/returns from yahoo finance and outputs a dataframe (tibble) in the long format (stacked data);

  • A feature called collections facilitates download of multiple tickers from a particular market/index. You can, for example, download data for all stocks in the SP500 index with a simple call to yf_collection_get("SP500");

  • A session-persistent smart cache system is available by default. This means that the data is saved locally and only missing portions are downloaded, if needed.

  • All dates are compared to a benchmark index such as SP500 (^GSPC) and, whenever an individual asset does not have a sufficient number of dates, the software drops it from the output. This means you can choose to ignore tickers with a high proportion of missing dates.

  • A customized function called yf_convert_to_wide() can transform the long dataframe into a wide format (tickers as columns), which is much used in portfolio optimization. The output is a list where each element is a different target variable (prices, returns, volumes).

  • Parallel computing with package furrr is available, speeding up the data importation process.
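The collection and wide-format features listed above can be sketched as follows. This is a hypothetical session: the download can take a while for a full index, and the exact structure of the outputs follows the package documentation:

```r
library(yfR)

# download price data for every stock in the SP500 index (cached locally,
# so re-runs only fetch missing portions)
df_sp500 <- yf_collection_get("SP500")

# reshape the long (stacked) dataframe into wide matrices,
# one list element per target variable (prices, returns, volumes)
l_wide <- yf_convert_to_wide(df_sp500)
names(l_wide)
```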

Available columns

The main function of the package, yfR::yf_get, returns a dataframe with the financial data. All price data is measured at the unit of the financial exchange. For example, price data for GM (NASDAQ/US) is measured in US dollars, while price data for PETR3.SA (B3/BR) is measured in Reais (Brazilian currency).

The returned data contains the following columns:

ticker: The requested tickers (ids of stocks);

ref_date: The reference day (this can also be year/month/week when using argument freq_data);

price_open: The opening price of the day/period;

price_high: The highest price of the day/period;

price_close: The closing/last price of the day/period;

volume: The financial volume of the day/period, in the unit of the exchange;

price_adjusted: The stock price adjusted for corporate events such as splits, dividends and others – this is usually what you want/need for studying stocks as it represents the real financial performance of stockholders;

ret_adjusted_prices: The arithmetic or log return (see input type_return) for the adjusted stock prices;

ret_closing_prices: The arithmetic or log return (see input type_return) for the closing stock prices;

cumret_adjusted_prices: The accumulated arithmetic/log return for the period (starts at 100%).

Installation

Package yfR is available in its stable version in CRAN, but you can also find the latest features and bug fixes in GitHub and rOpenSci repository. Below you can find the R commands for installation in each case.

# CRAN (stable)
install.packages('yfR')

# GitHub (dev version)
devtools::install_github('ropensci/yfR')

# rOpenSci
install.packages("yfR", repos = c("https://ropensci.r-universe.dev", "https://cloud.r-project.org"))

Examples of usage

The SP500 historical performance

In this example we are going to download price data for the SP500 index from 1950 to today (2022-07-25), analyze its financial performance and also visualize its prices using ggplot2.

library(yfR)
library(lubridate) # for date manipulations
library(dplyr)     # for data manipulations

# set options for algorithm
my_ticker <- '^GSPC'
first_date <- "1950-01-01"
last_date <- Sys.Date()

# fetch data
df_yf <- yf_get(tickers = my_ticker,
                first_date = first_date,
                last_date = last_date)

# output is a tibble with data
glimpse(df_yf)
Rows: 18,257
Columns: 11
$ ticker                 <chr> "^GSPC", "^GSPC", "^GSPC", "^GSPC", "^GSPC", "^…
$ ref_date               <date> 1950-01-03, 1950-01-04, 1950-01-05, 1950-01-06…
$ price_open             <dbl> 16.66, 16.85, 16.93, 16.98, 17.08, 17.03, 17.09…
$ price_high             <dbl> 16.66, 16.85, 16.93, 16.98, 17.08, 17.03, 17.09…
$ price_low              <dbl> 16.66, 16.85, 16.93, 16.98, 17.08, 17.03, 17.09…
$ price_close            <dbl> 16.66, 16.85, 16.93, 16.98, 17.08, 17.03, 17.09…
$ volume                 <dbl> 1260000, 1890000, 2550000, 2010000, 2520000, 21…
$ price_adjusted         <dbl> 16.66, 16.85, 16.93, 16.98, 17.08, 17.03, 17.09…
$ ret_adjusted_prices    <dbl> NA, 0.0114045618, 0.0047477745, 0.0029533373, 0…
$ ret_closing_prices     <dbl> NA, 0.0114045618, 0.0047477745, 0.0029533373, 0…
$ cumret_adjusted_prices <dbl> 1.000000, 1.011405, 1.016206, 1.019208, 1.02521…

The output of yfR is a tibble (dataframe) with the stock price data. We can use it to 1) get the number of years within the data, and 2) calculate the annual financial performance of the index:

n_years <- interval(min(df_yf$ref_date),
                    max(df_yf$ref_date))/years(1)
total_return <- last(df_yf$price_adjusted)/first(df_yf$price_adjusted) - 1
cat(paste0("n_years = ", n_years, "\n",
           "total_return = ", total_return))
n_years = 72.5479452054795
total_return = 236.792910144058

On 1950-01-03, the index was valued at 16.66. Today (2022-07-25), after roughly 72 years, the value of the index is 3961.63. The total return for the SP500, without accounting for inflation, is equivalent to an impressive 23,679%! Overall, anyone holding stocks for that long has done very well financially.

Additionally, we can also calculate performance as the compounded annual return, which is the figure usually reported when looking at stocks over the long run:

ret_comp <- (1 + total_return)^(1/n_years) - 1
cat(paste0("Comp Return = ",
           scales::percent(ret_comp, accuracy = 0.01)))
Comp Return = 7.83%

Over its 72 years of existence, the SP500 index delivered a compounded annual return of 7.83%. This is quite in line with the roughly 8% per year reported in the media.

To visualize the data, we can use a log plot and see the value of the SP500 index over time:

library(ggplot2)
p <- ggplot(df_yf, aes(x = ref_date, y = price_adjusted)) +
  geom_line() +
  labs(title = paste0("SP500 Index Value (",
                      year(min(df_yf$ref_date)), ' - ',
                      year(max(df_yf$ref_date)), ")"),
       x = "Time",
       y = "Index Value",
       caption = "Data from Yahoo Finance <https://finance.yahoo.com/>") +
  theme_light() +
  scale_y_log10()
p
(Figure: black and white line graph showing the SP500 index value increasing over time. The x axis is time from 1950 to 2020; the y axis, on a log scale, shows index values increasing from below 30 to above 3000.)

SP500 index value since 1950

Performance of many stocks

In this second example, instead of using a single stock/index, we will investigate the financial performance of a set of ten stocks using dplyr. First, let’s download the current composition of the SP500 index and select 10 random stocks.

set.seed(20220713)
n_tickers <- 10
df_sp500 <- yf_index_composition("SP500")
✔ Got SP500 composition with 503 rows
rnd_tickers <- sample(df_sp500$ticker, n_tickers)
cat(paste0("The selected tickers are: ",
           paste0(rnd_tickers, collapse = ", ")))
The selected tickers are: AAPL, DHI, AMZN, CMS, FCX, NRG, EXR, CFG, CI, AWK

And now we fetch the data using yfR::yf_get:

df_yf <- yf_get(tickers = rnd_tickers,
                first_date = '2010-01-01',
                last_date = Sys.Date())

Out of the 10 stocks, one was left out due to the high number of missing days. Internally, yf_get compares every ticker to a benchmark time series, in this case the SP500 index itself (see yf_get’s argument bench_ticker). Whenever the proportion of missing days is higher than the default case (thresh_bad_data = 0.75), the algorithm drops the ticker from the output. In the end, we are left with just nine stocks.
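A sketch of relaxing this filter via the thresh_bad_data argument described above (the threshold value here is hypothetical; a lower threshold keeps tickers with a higher share of missing dates):

```r
df_yf_relaxed <- yf_get(
  tickers = rnd_tickers,
  first_date = "2010-01-01",
  last_date = Sys.Date(),
  thresh_bad_data = 0.5  # keep tickers with at least 50% of benchmark dates
)
```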

First, let’s look at their accumulated return over time:

library(ggplot2)
p <- ggplot(df_yf,
            aes(x = ref_date,
                y = cumret_adjusted_prices,
                color = ticker)) +
  geom_line() +
  labs(title = paste0("SP500 Index Value (",
                      year(min(df_yf$ref_date)), ' - ',
                      year(max(df_yf$ref_date)), ")"),
       x = "Time",
       y = "Accumulated Return (from 100%)",
       caption = "Data from Yahoo Finance <https://finance.yahoo.com/>") +
  theme_light() +
  scale_y_log10()
p
(Figure: line graph showing the accumulated returns of 9 stocks on the SP500 index. The x axis shows time from 2010 to 2022; the y axis shows accumulated return (from 100%) on a log scale, ranging from 0.1 to above 10. Three stocks show sharply increasing patterns, four show moderately increasing patterns and two show fluctuating horizontal trends.)

Accumulated Return of 9 stocks

As we can see, some stocks, such as AMZN and AAPL, did much better than others. We can check this numerically by reporting their compounded return over the period:

library(dplyr)
library(lubridate)  # for interval() and years()

tab_perf <- df_yf |>
  group_by(ticker) |>
  summarise(n_years = interval(min(ref_date), max(ref_date)) / years(1),
            total_ret = last(price_adjusted) / first(price_adjusted) - 1,
            ret_comp = (1 + total_ret)^(1 / n_years) - 1)

tab_perf |>
  mutate(n_years = floor(n_years),
         total_ret = scales::percent(total_ret),
         ret_comp = scales::percent(ret_comp)) |>
  knitr::kable(caption = "Financial Performance of Several Stocks")

Table 1: Financial Performance of Several Stocks

|ticker | n_years|total_ret |ret_comp |
|:------|-------:|:---------|:--------|
|AAPL   |      12|2 258%    |28.65%   |
|AMZN   |      12|1 729%    |26.07%   |
|AWK    |      12|776%      |18.88%   |
|CI     |      12|665%      |17.61%   |
|CMS    |      12|526%      |15.74%   |
|DHI    |      12|696%      |17.98%   |
|EXR    |      12|2 146%    |28.15%   |
|FCX    |      12|-12%      |-0.99%   |
|NRG    |      12|81%       |4.85%    |
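As a quick sanity check of the compounding formula above (ret_comp = (1 + total_ret)^(1/n_years) - 1), we can plug in AAPL's numbers. Note that the table floors n_years to 12 for display, while the actual span is roughly 12.5 years; using the approximate true span recovers the reported annualized return:

```r
# AAPL: total return of 2 258% over roughly 12.5 years
total_ret <- 22.58   # 2 258% expressed as a proportion
n_years   <- 12.5    # approximate actual span (floored to 12 in the table)
ret_comp  <- (1 + total_ret)^(1 / n_years) - 1
round(100 * ret_comp, 2)  # about 28.8%, close to the reported 28.65%
```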

Final thoughts

Package yfR was created to facilitate the import and organization of Yahoo Finance data sets. The examples in this post show how easy it is to download the data and compute some simple performance statistics. We have only scratched the surface; there are many more ways to analyze stock data beyond financial performance.

Acknowledgements

Package yfR was reviewed by Alexander Fischer and Nic Crane, and I’m very grateful for their feedback, which improved the package significantly. I’m also grateful to Joshua Ulrich, the maintainer of quantmod, who wrote quantmod::getSymbols, the main function used by yfR::yf_get.

To leave a comment for the author, please follow the link and comment on their blog: rOpenSci - open tools for open science.

Continue reading: New Package yfR

How to Create Summary Tables in R

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to Create Summary Tables in R appeared first on Data Science Tutorials

How to Create Summary Tables in R? The describe() and describeBy() functions from the psych package are the simplest way to create summary tables in R.


library(psych)

Let’s create a summary table

describe(df)

We can now create a summary table that is organized by a certain variable.

describeBy(df, group=df$var_name)

The practical application of these features is demonstrated in the examples that follow.

Example 1:- Create a simple summary table

Let’s say we have the R data frame shown below:

Make a data frame:

df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1'),
points=c(150, 222, 229, 421, 330, 211, 219),
rebounds=c(17, 28, 36, 16, 17, 29, 15),
steals=c(11, 151, 152, 73, 85, 79, 58))

Now we can view the data frame

df
   team points rebounds steals
1   P1    150       17     11
2   P1    222       28    151
3   P1    229       36    152
4   P2    421       16     73
5   P2    330       17     85
6   P2    211       29     79
7   P1    219       15     58

For each variable in the data frame, a summary table can be made using the describe() function.


library(psych)

Now we will create a summary table:

describe(df)
vars n   mean    sd median trimmed   mad min max range skew kurtosis
team*       1 7   1.43  0.53      1    1.43  0.00   1   2     1 0.23    -2.20
points      2 7 254.57 90.56    222  254.57 16.31 150 421   271 0.71    -1.03
rebounds    3 7  22.57  8.30     17   22.57  2.97  15  36    21 0.44    -1.73
steals      4 7  87.00 50.34     79   87.00 31.13  11 152   141 0.08    -1.47
            se
team*     0.20
points   34.23
rebounds  3.14
steals   19.03

Here’s how to interpret each value in the output:

vars: column number

n: Number of valid cases

mean: The mean value

median: The median value

trimmed: The trimmed mean (default trims 10% of observations from each end)

mad: The median absolute deviation (from the median)

min: The minimum value

max: The maximum value

range: The range of values (max – min)

skew: The skewness

kurtosis: The kurtosis

se: The standard error

Any variable marked with an asterisk (*) has been converted from a categorical or logical variable to a numeric one, with values representing the ordering of the original values.
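For instance, here is a small sketch of how a character column such as team gets coerced, reproducing the starred row in the output above:

```r
# psych converts a character/factor column to the numeric order of its
# levels (alphabetical by default), then summarizes those numbers.
team <- c('P1', 'P1', 'P1', 'P2', 'P2', 'P2', 'P1')
as.numeric(as.factor(team))
# P1 -> 1, P2 -> 2; the mean of these codes is the 1.43 shown for team*
round(mean(as.numeric(as.factor(team))), 2)
```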


We should not interpret the summary statistics for the variable “team”, since it has been coerced to a numerical variable.

Also note that setting fast=TRUE computes only the most common summary statistics.

Now we can create a smaller summary table

describe(df, fast=TRUE)
         vars n   mean    sd min  max range    se
team        1 7    NaN    NA Inf -Inf  -Inf    NA
points      2 7 254.57 90.56 150  421   271 34.23
rebounds    3 7  22.57  8.30  15   36    21  3.14
steals      4 7  87.00 50.34  11  152   141 19.03

Additionally, we have the option of only computing the summary statistics for a subset of the data frame’s variables:

make a summary table using only the columns “points” and “rebounds”

describe(df[ , c('points', 'rebounds')], fast=TRUE)
         vars n   mean    sd min max range    se
points      1 7 254.57 90.56 150 421   271 34.23
rebounds    2 7  22.57  8.30  15  36    21  3.14

Example 2: Make a summary table that is grouped by a certain variable.

The describeBy() function can be used to group the data frame’s summary table by the variable “team” using the following code.

Build the summary table with team as the grouping variable:


describeBy(df, group=df$team, fast=TRUE)

Descriptive statistics by group

group: P1
         vars n mean    sd min  max range    se
team        1 4  NaN    NA Inf -Inf  -Inf    NA
points      2 4  205 36.91 150  229    79 18.45
rebounds    3 4   24  9.83  15   36    21  4.92
steals      4 4   93 70.22  11  152   141 35.11
-------------------------------------------------------------
group: P2
         vars n   mean     sd min  max range    se
team        1 3    NaN     NA Inf -Inf  -Inf    NA
points      2 3 320.67 105.31 211  421   210 60.80
rebounds    3 3  20.67   7.23  16   29    13  4.18
steals      4 3  79.00   6.00  73   85    12  3.46

The summary statistics for each of the two teams in the data frame are displayed in the output.



To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.

Continue reading: How to Create Summary Tables in R

R Ladies Philly is Making a Difference with its Annual Datathon Focused on Local Issues

$
0
0
[This article was first published on R Consortium, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Alice Walsh and Karla Fettich of the R Ladies Philly talked to the R Consortium about the thriving R Community in Philadelphia. The group has broadened its reach both locally and internationally during the pandemic. However, they have a deep commitment to the local community and remain focused on local issues. Every year, the group partners with local non-profit organizations to host a Datathon to promote learning while contributing to the local community. 

Alice Walsh is a founding organizer of the R Ladies Philly. She works at Pathos, an Oncology Therapeutics company, using data to position cancer drugs. Alice got her Ph.D. in Bioengineering from the University of Pennsylvania. Her work is at the intersection of bioinformatics, molecular biology, and data science.

Karla Fettich is a co-organizer of the R Ladies Philly. She works as a Senior Data Scientist at AmeriHealth Caritas, where she builds identification and stratification solutions for different populations in the healthcare industry, and coordinates larger data science efforts. Karla got her Ph.D. in Psychology and Neuroscience from Temple University.


What is the R community like in Philadelphia?

Alice: The R community in Philadelphia is very vibrant, I would say. Even though it’s not known as a technology hub, Philadelphia is a city with a lot of data and tech happening. It’s not like the Bay Area or Silicon Valley, but there’s a very vibrant data science tech community in Philadelphia. 

We know other R-User groups and Data Science groups in Philadelphia, and we have collaborated with them. There’s the Data Philly Group and also the Philly R-User Group, which took a hiatus during the pandemic and is back now. There are also some Python groups. 

The Healthcare industry in Philadelphia is robust and several members of our group are working in healthcare. We also have several members who work in media because Comcast is a large local employer. Overall, the Philadelphia R community is characterized by a focus on specific industries. 

Karla: Not only vibrant but it’s also a great, supportive, and fun community. We have been to a couple of Python events and they just really don’t have that vibe. In the R community, people are keen to learn. Users at all levels are happy to share their knowledge and learn from others. There’s always a lot of excitement and everyone’s just really eager to work together. So the R community in Philadelphia has been very collaborative. 

How has COVID affected your ability to connect with members?

Alice: Our pre-pandemic events were always in person and for many people, it was difficult to commute. Our events are now online, and we have been able to reach a lot more people. We also have an audience joining our events internationally, and it has been amazing to broaden our reach during the pandemic. It has also become easier to share our events because we record them and upload them on our YouTube channel. 

But now we are figuring out that some events are better online than in person. We don’t do a lot of speaking events, and most of our events are interactive workshops. And I have actually found that these work very well in an online format.

We have also realized that it is really difficult to do networking online. And that is also something which was an important part of our mission. Connecting people to mentors who are in the industry can help them with career moves and things like that. We have done online networking events, but I think that’s something we have to do in person. So from now on, we are trying to be very strategic about when we have an in-person event versus when we have an online event. We want to pick the format that best suits the content and makes us reach a maximum number of people. We are still figuring it out. 

Karla: I just wanted to add to the comment about how we can now reach an international or broader audience. It’s been great. Not necessarily globally, but also reaching the people in the area who might not commute to a physical location. We have been able to reach more of those people. But I think the challenge we have encountered is trying to stay true to our mission, which is to focus on the local community. So we love and appreciate having a global community join us. But it has made it tricky to figure out how we can still keep the local essence of our chapter.

In the past year, did you have to change your techniques to connect and collaborate with members? For example, did you use GitHub, video conferencing, online discussion groups more? Can these techniques be used to make your group more inclusive to people that are unable to attend physical events in the future?  

Alice: Maybe a good example of how we use different tools would be our collaborative community data project we have every year, which is Karla’s brainchild. We pair up with a local nonprofit to help them work with their data. Volunteers from our community work with them to show them what they can do with their data. The volunteers get training to work with an actual dataset, and the training partner gets to learn something useful to take forward. Maybe they hire data scientists or maybe they decide there’s more potential to use their data. 

So that project then involves a lot of collaboration. We use Zoom to do the actual kickoff meeting with the partner, and we use tools like Slido for organizing Q&A during live events. We use Google Docs for additional Q&A, usually to capture questions and answers asynchronously. People type in their questions and when the partner has time, they can go in and answer them. We also have a Slack workspace where teams can have their own channels. In the past, they would meet in a coffee shop and work on it together. Instead, now they meet up in the Slack channel or have a Zoom meeting to discuss what they are working on. 

I think that’s a good example of how we use a lot of different tools for one project. And then we aggregate all the code and results to GitHub. We always have a repo for each year’s projects and that way we can bring everything together in a final report.

I think the plan is that we will continue to have a mix of online and in-person events. Right now, I think it is a challenge for small groups like us, without a big budget, to host a hybrid event. That requires a media team because your speaker needs a microphone and someone needs to film. We would love to have a technology solution to make it happen.

For now, we will host in-person events when we feel it can be done better in person. We will also try to have a lot of programming workshops online, so that we can kind of have the best of both worlds. Recently, when we tried to move back to in-person events, many people asked us if we will be a hybrid event. But we don’t have the technical capability to do that at the moment. I think other people are figuring it out, so maybe we can learn from them, but it’s definitely a challenge. 

Karla: For the Datathons, we used to have the kickoff and conclusion meetings in person. People could come and present their findings and have the partners involved. It was also a good way to ask questions, get everybody involved, and network. These meetings have moved online, and it has been easy to record and save for people to refer back to. While everything else went really well online, I feel that the first and last meetings worked better in person.

Can you tell us about one recent presentation or speaker that was especially interesting and what was the topic and why was it so interesting? 

Karla: I would like to mention this year’s Datathon which went from February to March. Our partner this year was a non-profit organization that helps elderly or disabled people in the community and connects them with the services they need. They hadn’t really explored their data, and they really wanted to know their impact on their community. 

So we split our team of volunteers into three different groups, each tackling a different aspect of the question. One group focused on the impact on the community and another group focused on data visualization and helping in putting it together for decision making. That’s something we know is really useful in the industry, but not everyone takes advantage of that. Another group focused on opportunities for further growth and comparisons between current impact and what else is out there. 

So it’s been a very insightful Datathon because each team dug really deep into the data. They presented the data in a way that was clear and would help the organization. The organization has really taken this report to heart, and they have been working on it. It helped them rethink how they relate to the data and what ‌data collection they should do going forward so that they can leverage data better in the future. I know their board is currently discussing the results as well, so they are planning on taking action based on the results. It has been a really fascinating Datathon, just like the ones in the past. Each Datathon uncovers something really interesting and leads to other projects afterward. 

Alice: The thing that I find really special about doing this work is that we always focus on issues that are important to our local community and local groups. So this year we were working with a local nonprofit that works with the local senior citizens and folks who need help. It is very meaningful, and that’s been our mission as well. So while we love having reached a broader audience now, we want to make sure that we can focus on what makes Philadelphia unique and try to tap into that.

What trends do you see in R language affecting your organization over the next year?

Alice: We see trends in the topics people are interested in. Most of our events in the past have been very educational. We usually have people do a workshop on a specific package or an intermediate or advanced R topic because that’s been very popular. So we have trends over time on what’s popular, and the workshops people are interested in attending and presenting.

For example, we are doing a book club this summer to talk about using tidymodels. The tidymodels framework is relatively new to R, and machine learning has always been a big topic that people enjoy. I think it’s mostly because it’s broadly relevant across all industries. I do oncology research, and there are applications there, but also in manufacturing, geospatial and other fields. So when new packages or developments come out related to these core topics, they will influence our programming. Over time, there are changes in the R landscape which bring changes in what we talk about. 

Karla: I think recently, machine learning and data visualization have been popular. I think geospatial stuff has also been very popular in the past. We are actively listening to our community and seeing what they are interested in learning and doing, and we try to accommodate that with workshops. We try to encourage people who are interested in a topic to lead a talk or a workshop, and they don’t have to be experts. They can either do it themselves or they can find experts to do that. We encourage people to speak up about what they are interested in and then we tailor our events.

Do you know of any data journalism efforts by your members? If not, are there particular data journalism projects that you’ve seen in the last year that you feel had a positive impact on society?

Alice: I can’t think of something super specific by our members. In the past, some of our Datathon efforts have touched on local issues that have received journalistic coverage. For example, the opioid epidemic is very important in Philadelphia because we have been hit very hard by opioid use disorder, and we did a project on that a couple of years ago.

Karla: Another project emerged from that Datathon. One thing we were focusing on during that project was mapping treatment locations for opioid use disorder. They have taken that idea forward and worked towards putting up a website, which was released recently.

When is your next event? Please give details!

Our next event should be on our Meetup. We are doing a community-wide book club around tidymodels and that is coming up in August.


How do I Join?

R Consortium’s R User Group and Small Conference Support Program (RUGS) provides grants to help R groups around the world organize, share information and support each other. We have given grants over the past four years, encompassing over 65,000 members in 35 countries. We would like to include you! Cash grants and meetup.com accounts are awarded based on the intended use of the funds and the amount of money available to distribute. We are now accepting applications!

The post R Ladies Philly is Making a Difference with its Annual Datathon Focused on Local Issues appeared first on R Consortium.

To leave a comment for the author, please follow the link and comment on their blog: R Consortium.

Continue reading: R Ladies Philly is Making a Difference with its Annual Datathon Focused on Local Issues

RStudio is becoming Posit

[This article was first published on RStudio | Open source & professional software for data science teams on RStudio, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Today we are very excited to announce that RStudio has a new name: Posit. This is a big change, and below we’ll talk about exactly why we are doing this and what it means. But first—why Posit? Posit is a real word that means to put forth an idea for discussion. Data scientists spend much of their day positing claims that they then evaluate with data. When considering a new name for the company we wanted something that reflects both the work our community engages in (testing hypotheses!) as well as the scientific aspiration to build ever-greater levels of knowledge and understanding.

The R and RStudio communities have become something very special. We’ve helped people pose and answer difficult and consequential questions with data. We’ve built open source tools to make “code-first” data science accessible and approachable to millions of people, and established reproducibility as a baseline expectation for analysis and communication. And around all of this we’ve seen the development of an inclusive, supportive, diverse community, sincerely interested in empowering each other to do more.

One of the central ideas that this community has rallied behind is the belief that it’s imperative to use open source software for scientific work. Scientific work needs to be reproducible, resilient (not captive to a software vendor), and must encourage broad participation in the creation of the tools themselves. At the same time, it is challenging to secure long-term, sustainable funding for the open source software needed to make this happen.

As the community has grown and we’ve seen the impact of our collective efforts, we have realized that one of the most important problems that RStudio has solved is melding its core mission of creating open source software with the imperatives of sustaining a commercial enterprise. This is a tricky business, and especially so today, as corporations are frequently forced into doing whatever it takes to sustain growth and provide returns to shareholders, even against the interests of their own customers! To avoid this problem and codify our mission into our company charter, we re-incorporated as a Public Benefit Corporation in 2019.

Our charter defines our mission as the creation of free and open source software for data science, scientific research, and technical communication. This mission intentionally goes beyond “R for Data Science”—we hope to take the approach that’s succeeded with R and apply it more broadly. We want to build a company that is around in 100 years time that continues to have a positive impact on science and technical communication. We’ve only just started along this road: we’re experimenting with tools for Python and our new Quarto project aims to impact scientific communication far beyond data science.

In many ways we are at the outset of a new phase of RStudio’s development. For the first phase, we made the potentially confusing decision of naming our company after our IDE that was initially focused on R users. We kept that name even as our offerings grew to much more than just an IDE, and served many languages apart from R. While that made sense at the time, it’s become increasingly challenging to keep that name as our charter has grown broader.

While we of course feel sad moving away from the RStudio name that’s served us so well, we also feel excited about the future of Posit. We’re thrilled that we found a name that we think so accurately captures what people do with our tools and we’re excited to make our broader mission more clear to the outside world. We’re also happy that the RStudio name will live on, retaining its original purpose: identifying the best IDE for data science with R.

What does the new name mean for our commercial software? In many ways, nothing: our commercial products have supported Python for over 2 years. But we will rename them to Posit Connect, Posit Workbench, and Posit Package Manager so it’s easier for folks to understand that we support more than just R. What about our open source software? Similarly, not much is changing: our open source software is and will continue to be predominantly for R. That said, over the past few years we’ve already been investing in other languages like reticulate (calling Python from R), Python features for the IDE, and support for Python and Julia within Quarto. You can expect to see more multilanguage experiments in the future.

So while you will see our name change in a bunch of places (including our main corporate website), we are still continuing on the same path. That path has widened as we have succeeded in the original mission, and we are excited at the chance to bring what we all love so much about the R community to everyone.

To leave a comment for the author, please follow the link and comment on their blog: RStudio | Open source & professional software for data science teams on RStudio.

Continue reading: RStudio is becoming Posit

How to Set Up Quarto with Docker, Part 1: Static Content

[This article was first published on R - Hosting Data Apps, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Quarto is an open-source scientific and technical publishing system built on Pandoc. It is a cross-platform tool to create dynamic content with Python, R, Julia, and Observable.


Quarto documents follow literate programming principles, where code “chunks” are woven together with text chunks. From an R programming perspective, Quarto documents with a .qmd file extension are very similar to R Markdown documents (.Rmd).

Both .Rmd and .qmd files have a YAML header between the triple dashes. The main difference between the two is how the options are specified. Here is a Quarto example:

---
title: "Quarto Demo"
format:
  html:
    code-fold: true
---

## Air Quality

@fig-airquality further explores the impact of temperature on ozone level.

```{r}
#| label: fig-airquality
#| fig-cap: Temperature and ozone level.
#| warning: false
library(ggplot2)
ggplot(airquality, aes(Temp, Ozone)) +
  geom_point() +
  geom_smooth(method = "loess")
```

In this post, I am going to walk you through how to containerize Quarto documents. You will learn how to install Quarto inside a Docker container and how to use the command line tool to render and eventually serve static Quarto documents.

This post builds on a previous R Markdown-focused article:

Containerizing Interactive R Markdown Documents
R Markdown is a reproducible authoring format supporting dozens of static and dynamic output formats. Let’s review why and how you should containerize Rmd files.

Prerequisites

The code from this post can be found in the analythium/quarto-docker-examples GitHub repository:

GitHub – analythium/quarto-docker-examples: Quarto Examples with Docker
Quarto Examples with Docker. Contribute to analythium/quarto-docker-examples development by creating an account on GitHub.

You will also need Docker Desktop installed.

If you want Quarto to be installed on your local machine, follow these two links to get started: Quarto docs, and RStudio install resources.

Create a Quarto parent image

We build a parent image with Quarto installed so that we can use this image in subsequent FROM instructions in the Dockerfiles. See the Dockerfile.base in the repository. The image is based on the eddelbuettel/r2u image using Ubuntu 20.04.

FROM eddelbuettel/r2u:20.04

RUN apt-get update && apt-get install -y --no-install-recommends \
    pandoc \
    pandoc-citeproc \
    curl \
    gdebi-core \
    && rm -rf /var/lib/apt/lists/*

RUN install.r \
    shiny \
    jsonlite \
    ggplot2 \
    htmltools \
    remotes \
    renv \
    knitr \
    rmarkdown \
    quarto

RUN curl -LO https://quarto.org/download/latest/quarto-linux-amd64.deb
RUN gdebi --non-interactive quarto-linux-amd64.deb

CMD ["bash"]

The important bits are to have curl and gdebi installed so that we can grab the quarto-linux-amd64.deb file and install the Quarto command line tool. This will install the latest version.

If you want a specific Quarto version (like 0.9.522), use the following lines instead, and change the version to the one you want:

ARG QUARTO_VERSION="0.9.522"
RUN curl -o quarto-linux-amd64.deb -L https://github.com/quarto-dev/quarto-cli/releases/download/v${QUARTO_VERSION}/quarto-${QUARTO_VERSION}-linux-amd64.deb
RUN gdebi --non-interactive quarto-linux-amd64.deb

Now we can build the image:

docker build \
    -f Dockerfile.base \
    -t analythium/r2u-quarto:20.04 .

We can run the container interactively to check the installation:

docker run -it --rm analythium/r2u-quarto:20.04 bash

Type quarto check and you should see check marks:

root@d8377016be7f:/# quarto check
[✓] Checking Quarto installation......OK
      Version: 1.0.36
      Path: /opt/quarto/bin

[✓] Checking basic markdown render....OK

[✓] Checking Python 3 installation....OK
      Version: 3.8.10
      Path: /usr/bin/python3
      Jupyter: (None)

      Jupyter is not available in this Python installation.
      Install with python3 -m pip install jupyter

[✓] Checking R installation...........OK
      Version: 4.2.1
      Path: /usr/lib/R
      LibPaths:
        - /usr/local/lib/R/site-library
        - /usr/lib/R/site-library
        - /usr/lib/R/library
      rmarkdown: 2.14

[✓] Checking Knitr engine render......OK

We are missing Jupyter, but we won't need it for this tutorial. If you want to learn more about the available quarto commands and options, type quarto help in the shell to see what is available.

Type exit to quit the session.
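Once the image checks out, a typical next step is rendering a document with it. Here is a minimal sketch of such a command; the file name index.qmd and the mount path are hypothetical, and the post's follow-up parts may use a different workflow:

```
docker run --rm \
    -v "$(pwd)":/work \
    -w /work \
    analythium/r2u-quarto:20.04 \
    quarto render index.qmd
```

This mounts the current directory into the container and, if the render succeeds, leaves the output (e.g. an HTML file) next to the source on the host.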

To leave a comment for the author, please follow the link and comment on their blog: R - Hosting Data Apps.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Continue reading: How to Set Up Quarto with Docker, Part 1: Static Content

Error in rbind(deparse.level, …) : numbers of columns of arguments do not match

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post Error in rbind(deparse.level, …) : numbers of columns of arguments do not match appeared first on finnstats.

If you are interested to learn more about data science, you can find more articles here finnstats.

The error "Error in rbind(deparse.level, …) : numbers of columns of arguments do not match" occurs when you try to row-bind two or more data frames together in R using the rbind() function, but the data frames do not all have the same number of columns.


This guide explains in detail how to resolve this issue.

How to Reproduce the Error?

Suppose we have the two R data frames shown below.

Let’s create the first data frame

df1 <- data.frame(x=c(11, 14, 14, 25, 13),
                  y=c(24, 24, 12, 28, 20))
df1
  x  y
1 11 24
2 14 24
3 14 12
4 25 28
5 13 20

Now we can create a second data frame

df2 <- data.frame(x=c(12, 21, 12, 50, 17),
                  y=c(31, 61, 20, 20, 10),
                  z=c(21, 17, 17, 18, 25))
df2
  x  y  z
1 12 31 21
2 21 61 17
3 12 20 17
4 50 20 18
5 17 10 25

Now imagine that we try to row-bind these two data frames into a single data frame using rbind:

Attempt to row-bind the two data frames together:

rbind(df1, df2)
Error in rbind(deparse.level, ...) :
  numbers of columns of arguments do not match

Because the two data frames do not have the same number of columns, we get an error.
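Before fixing anything, it can help to confirm exactly which columns differ. This quick diagnostic sketch is not part of the original post:

```
# Compare column counts and find the extra column name
ncol(df1)  # 2
ncol(df2)  # 3
setdiff(colnames(df2), colnames(df1))  # "z" is the column df1 lacks
```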

How to correct the issue?

There are two solutions to this issue:

Approach 1: Use rbind() on Common Columns

One way to solve this issue is to use the intersect() function to identify the column names shared by the data frames, and then row-bind only those columns.

Find common column names

common <- intersect(colnames(df1), colnames(df2))

Now row-bind only on common column names

df3 <- rbind(df1[common], df2[common])

Now we can view the result

df3
   x  y
1  11 24
2  14 24
3  14 12
4  25 28
5  13 20
6  12 31
7  21 61
8  12 20
9  50 20
10 17 10

Approach 2: Use bind_rows() from dplyr

Another way to solve this issue is to use the bind_rows() function from the dplyr package, which automatically fills in NA values for columns that do not appear in every data frame:

library(dplyr)

Let’s bind together the two data frames

df3 <- bind_rows(df1, df2)

Let’s view the result

df3
   x  y  z
1  11 24 NA
2  14 24 NA
3  14 12 NA
4  25 28 NA
5  13 20 NA
6  12 31 21
7  21 61 17
8  12 20 17
9  50 20 18
10 17 10 25

Because df1 has no column z, NA values have been filled in for the rows that came from df1.
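For completeness, the same NA-filling behaviour can be reproduced in base R without dplyr. This is a sketch that is not part of the original post, and pad_cols is a hypothetical helper name:

```
# Add any columns a data frame is missing (filled with NA),
# then put the columns in a common order before rbind()
pad_cols <- function(df, all_cols) {
  for (m in setdiff(all_cols, colnames(df))) df[[m]] <- NA
  df[all_cols]
}
all_cols <- union(colnames(df1), colnames(df2))
df3 <- rbind(pad_cols(df1, all_cols), pad_cols(df2, all_cols))
```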


To leave a comment for the author, please follow the link and comment on their blog: Data Analysis in R.


How to Create an Interaction Plot in R?

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to Create an Interaction Plot in R? appeared first on Data Science Tutorials

A two-way ANOVA is performed to find out whether the means of three or more independent groups, split along two factors, differ.

When we want to determine whether two distinct factors have an impact on a certain response variable, we employ a two-way ANOVA.

The interpretation of the link between the factors and the response variable may be affected, though, when there is an interaction effect between the two factors.

For instance, we might be interested in finding out how gender and exercise affect the response variable weight loss.

While both factors may have an impact on weight loss, it is also possible that they interact.

For instance, men and women may lose weight through exercise at different rates. In that case, exercise and gender have an interaction effect.

An interaction plot is the most effective tool for spotting and comprehending the effects of interactions between two variables.

This particular plot style shows the values of the first factor on the x-axis and the fitted values of the response variable on the y-axis.

The values of the second component of interest are depicted by the lines in the plot.

An interaction plot in R can be made and read using the instructions in this tutorial.

Example: How to Create an Interaction Plot in R

Let’s say researchers want to know if gender and activity volume affect weight loss.

To test this, they enlist 30 men and 30 women, and 10 of each gender are randomly assigned to follow a program of either no exercise, light exercise, or intense exercise for one month.

To examine the interaction effect between exercise and gender, use the following steps to generate a data frame in R, fit a two-way ANOVA, and create an interaction plot.

Step 1: Create the data.

The code below demonstrates how to make a data frame in R:

Make this example reproducible.

set.seed(123)

Now we can create a data frame

data <- data.frame(gender = rep(c("Male", "Female"), each = 30),
                   exercise = rep(c("None", "Light", "Intense"), each = 10, times = 2),
                   weight_loss = c(runif(10, -3, 3), runif(10, 0, 5), runif(10, 5, 9),
                                   runif(10, -4, 2), runif(10, 0, 3), runif(10, 3, 8)))

Let’s view the first six rows of the data frame

head(data)
  gender exercise weight_loss
1   Male     None  -1.2745349
2   Male     None   1.7298308
3   Male     None  -0.5461385
4   Male     None   2.2981044
5   Male     None   2.6428037
6   Male     None  -2.7266610

Step 2: Fit the two-way ANOVA model.

The code below demonstrates how to fit a two-way ANOVA to the data:

Fit the two-way ANOVA model

model <- aov(weight_loss ~ gender * exercise, data = data)

Now view the model summary

summary(model)
                Df Sum Sq Mean Sq F value   Pr(>F)    
gender           1   43.7   43.72  19.032 5.83e-05 ***
exercise         2  438.9  219.43  95.515  < 2e-16 ***
gender:exercise  2    2.9    1.46   0.634    0.535    
Residuals       54  124.1    2.30                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Note that the p-value (0.535) for the interaction term between exercise and gender is not statistically significant, which indicates that there is no significant interaction effect between the two factors.

Step 3: Create the interaction plot.

The code below demonstrates how to make an interaction plot for gender and exercise.

interaction.plot(x.factor = data$exercise,      # x-axis variable
                 trace.factor = data$gender,    # variable for lines
                 response = data$weight_loss,   # y-axis variable
                 fun = median,                  # metric to plot
                 ylab = "Weight Loss",
                 xlab = "Exercise Intensity",
                 col = c("pink", "blue"),
                 lty = 1,                       # line type
                 lwd = 2,                       # line width
                 trace.label = "Gender")

Interaction plot in R

In general, there is no interaction effect if the two lines on the interaction plot are parallel. However, there will probably be an interaction effect if the lines cross.

As we can see in this plot, there is no intersection between the lines for men and women, suggesting that there is no interaction between the variables of exercise intensity and gender.

This is consistent with the p-value in the ANOVA table’s output, which indicated that the interaction term in the ANOVA model was not statistically significant.
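The same figure can be approximated with ggplot2 by plotting group medians. This sketch is not part of the original post and assumes the data frame from Step 1 is still loaded:

```
library(ggplot2)

# Plot the median weight loss per exercise level, one line per gender
ggplot(data, aes(x = exercise, y = weight_loss,
                 colour = gender, group = gender)) +
  stat_summary(fun = median, geom = "point") +
  stat_summary(fun = median, geom = "line") +
  labs(x = "Exercise Intensity", y = "Weight Loss", colour = "Gender")
```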


To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.


How to Use the sweep Function in R?

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to Use the sweep Function in R? appeared first on finnstats.

If you are interested to learn more about data science, you can find more articles here finnstats.

The sweep() function in R allows you to carry out various operations on the rows or columns of a matrix.

The basic syntax used by this function is as follows.

sweep(x, MARGIN, STATS, FUN)

where:

x: Name of the matrix

MARGIN: The margin on which to run the function (1 = rows, 2 = columns)

STATS: The value(s) to use in the function

FUN: The function to perform

How to Use the sweep Function in R

The examples that follow show how to use sweep() in several common settings.

Example 1: Perform an operation on rows using sweep()

The following code uses sweep() to add a particular number to the values in each row of a matrix.

Let’s define a matrix

mat <- matrix(1:15, nrow=5)

Now we can view the matrix

mat
     [,1] [,2] [,3]
[1,]    1    6   11
[2,]    2    7   12
[3,]    3    8   13
[4,]    4    9   14
[5,]    5   10   15

add specific numbers to each row

sweep(mat, 1, c(5, 10, 10, 10, 10), "+")
     [,1] [,2] [,3]
[1,]    6   11   16
[2,]   12   17   22
[3,]   13   18   23
[4,]   14   19   24
[5,]   15   20   25

Here’s how the sweep() function worked in this scenario.

5 was added to each value in the first row

10 was added to each value in the second row and so on.

Note that we used addition (+) in this example, but we could have chosen a different operation.

For instance, the code below demonstrates how to multiply the values in each row by specific amounts.

multiply values in each row by a certain amount

sweep(mat, 1, c(2, 2, 2, 2, 2), "*")
     [,1] [,2] [,3]
[1,]    2   12   22
[2,]    4   14   24
[3,]    6   16   26
[4,]    8   18   28
[5,]   10   20   30

Example 2: Perform an operation on columns by using sweep()

The following code demonstrates how to use sweep() to increase the values in each column of the matrix by a certain amount.

Add specific numbers to each column

sweep(mat, 2, c(1, 1, 1), "+")
     [,1] [,2] [,3]
[1,]    2    7   12
[2,]    3    8   13
[3,]    4    9   14
[4,]    5   10   15
[5,]    6   11   16

In this case, the sweep() call works as follows.

1 was added to each value in the first column.

1 was added to each value in the second column.

1 was added to each value in the third column.
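Beyond adding constants, a common practical use of sweep() is column-wise standardization. This short sketch is not from the original post:

```
# Subtract each column's mean, then divide by its standard deviation
centered <- sweep(mat, 2, colMeans(mat), "-")
standardized <- sweep(centered, 2, apply(mat, 2, sd), "/")
colMeans(standardized)  # each column mean is now (numerically) zero
```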


To leave a comment for the author, please follow the link and comment on their blog: Data Analysis in R.


Charity R Workshops in support of Ukraine

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Learn R (as well as Python and other tools for data analysis) and contribute to charity at the same time. 

Since April, we have been running a series of weekly workshops on R and other tools for data analysis, all proceeds from which go to support Ukraine. Our workshops cover topics for people with different prior levels of experience in R: from complete beginners to experienced users. All workshops are recorded, so you can register even if you are not able to attend in person!

Introduction to R Shiny

Our next workshop on R will take place on August 4th and will cover Introduction to R Shiny. You will (a) learn how to set up basic statistical simulations, (b) learn how to create data-based applications, and (c) cover the nuts and bolts of the user interface in Shiny.

To register, you can donate 20 euros here or here and fill in this registration form, attaching the donation confirmation which will be emailed to you. If you are a student who is not able to pay the registration fee, you can sign up for the waiting list here.

If you are not interested in this workshop but would like to make a donation and support students in learning, you can sponsor a student by donating 20 euros per student here or here and filling in this form. You can read more about this workshop here.

Introduction to Quarto

We will also have a workshop on Introduction to Quarto on August 11th. You will learn how to create and publish documents using Quarto, a modern platform for creating professional articles, slide decks, websites, and other publications. By way of an introductory example, participants will be walked through the process of crafting and publishing their own personal professional website.

To register, you can donate 20 euros here or here and fill in this registration form, attaching the donation confirmation which will be emailed to you. If you are a student who is not able to pay the registration fee, you can sign up for the waiting list here.

If you are not interested in this workshop but would like to make a donation and support students in learning, you can sponsor a student by donating 20 euros per student here or here and filling in this form. You can read more about this workshop here.

Previous workshops

If you make a donation, you can also get recordings and materials of any of our previous workshops. We have a wide range of workshops in R available, from Introduction to R in Tidyverse and Data Visualization with ggplot to Text Data Analysis in R, Web Scraping in R and Introduction to Spatial Data Analysis in R. You can read more information about all of our past workshops and find out how to get access to the recordings and the materials here (scroll to the 'previous workshops' section).

More information

You can find more information about any of our upcoming workshops here. You can also subscribe to our mailing list to get updates about future workshops here. If you experience any issues with the registration process or have any questions or suggestions, feel free to email me at dariia.mykhailyshyn2@unibo.it
Charity R Workshops in support of Ukraine was first posted on July 27, 2022 at 7:48 pm.
To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.


Time to upskill in R? EARL’s workshop lineup has something for every data practitioner.

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
It’s well-documented that data skills are in high demand, making the industry even more competitive for employers looking for experienced data analysts, data scientists and data engineers – the fastest-growing job roles in the UK. In support of this demand, it’s great to see the government taking action to address the data skills gap as detailed in their newly launched Digital Strategy. The range of workshops available at EARL 2022 is designed to help data practitioners extend their skills via a series of practical challenges. Led by specialists in Shiny, Purrr, Plumber, ML and time series visualisation, you’ll leave with tips and skills you can immediately apply to your commercial scenarios.

The EARL workshop lineup.

Time Series Visualisation in R

How does time affect our perception of data? Is the timescale important? Is the direction of time relevant? Sometimes cumulative effects are not visible with traditional statistical methods, because smaller increments stay under the radar. When a time component is present, it's likely that the current state of our problem depends on the previous states. With time series visualisations we can capture changes that may otherwise go undetected. Find out more.

Explainable Machine Learning

Explaining how your ML products make decisions empowers people on the receiving end to question and appeal these decisions. Explainable AI is one of the many tools you need to ensure you're using ML responsibly. AI and, more broadly, data can be a dangerous accelerator of discrimination and biases: skin diseases were found to be less effectively diagnosed on black skin by AI-powered software, and search engines advertised lower-paid jobs to women. Staying away from it might sound like a safer choice, but this would mean missing out on the huge potential it offers. Find out more.

Introduction to Plumber APIs

90% of ML models don't make it into production. With API building skills in your DS toolbox, you should be able to beat this statistic in your own projects. As the field of data science matures, much emphasis is placed on moving beyond scripts and notebooks and into software development and deployment. Plumber is an excellent tool to make the results from your R scripts available on the web. Find out more.

Functional Programming with Purrr

Iteration is a very common task in Data Science. A loop in R programming is of course one option – but purrr (a package from the tidyverse) allows you to tackle iteration in a functional way, leading to cleaner and more readable code. Find out more.

How to Make a Game with Shiny

Shiny is only meant to be used to develop dashboards, right? Or is it possible to develop more complex applications with Shiny? What would be the main limitations? Could R and Shiny be used as a general-purpose framework to develop web applications? Find out more.

Sound interesting? Check out the full details – our workshop spaces traditionally go fast, so get yourself and your team booked in while there are still seats available. Book your Workshop Day Pass tickets now.
Time to upskill in R? EARL’s workshop lineup has something for every data practitioner. was first posted on July 27, 2022 at 7:48 pm.
To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.


How to Standardize Data in R?

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to Standardize Data in R? appeared first on Data Science Tutorials

How to Standardize Data in R? Standardizing a dataset means scaling it so that its mean is 0 and its standard deviation is 1.

The most common way to do this is z-score standardization, which scales each value using the following formula:


(xi – xbar) / s

where:

xi: The ith value in the dataset

xbar: The sample mean

s: The sample standard deviation

The examples below demonstrate how to scale one or more variables in a data frame using the z-score standardization in R by using the scale() function and the dplyr package.
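Before reaching for any packages, it can help to see the formula at work. This minimal base-R sketch (using a made-up vector) confirms that scale() applies exactly the z-score transformation above:

```r
# Manual z-score standardization in base R
x <- c(4, 8, 15, 16, 23, 42)
z <- (x - mean(x)) / sd(x)   # (xi - xbar) / s

# scale() applies the same formula; as.vector() drops the matrix attributes
all.equal(z, as.vector(scale(x)))

# The standardized values have mean 0 and standard deviation 1
round(mean(z), 10)
sd(z)
```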

Standardize just one variable

In a data frame containing three variables, the following code demonstrates how to scale just one of the variables.

library(dplyr)

# Make this example reproducible
set.seed(123)

# Create the original data frame
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# View the original data frame
df
        var1      var2      var3
1  14.378876 19.223000 27.238483
2  39.415257 10.160015 22.320085
3  20.448846 14.196271 21.012670
4  44.150870 12.307401 29.856744
5  47.023364  3.852644 21.392645
6   2.277825 18.196849 22.713262
7  26.405274  6.429579 18.601651
8  44.620952  2.757072 19.853551
9  27.571751  7.902573 12.228993
10 22.830737 19.181066  8.677841

# Scale var1 to have mean = 0 and standard deviation = 1

df2 <- df %>% mutate_at(c('var1'), ~(scale(.) %>% as.vector))
df2
         var1      var2      var3
1  -0.98619132 19.223000 27.238483
2   0.71268801 10.160015 22.320085
3  -0.57430484 14.196271 21.012670
4   1.03402981 12.307401 29.856744
5   1.22894699  3.852644 21.392645
6  -1.80732540 18.196849 22.713262
7  -0.17012290  6.429579 18.601651
8   1.06592790  2.757072 19.853551
9  -0.09096999  7.902573 12.228993
10 -0.41267825 19.181066  8.677841

You’ll notice that the other two variables didn’t change; only the first variable was scaled.

The new scaled variable has a mean value of 0, and a standard deviation of 1, as we can immediately confirm.

Bind together two data frames by their rows or columns in R (datasciencetut.com)

# Compute the scaled variable's mean
mean(df2$var1)
[1] 2.638406e-17  # essentially zero

# Compute the scaled variable's standard deviation
sd(df2$var1)
[1] 1

Standardize Multiple Variables

Multiple variables in a data frame can be scaled simultaneously using the code provided below:

# Scale var1 and var2 to have mean = 0 and standard deviation = 1

df3 <- df %>% mutate_at(c('var1', 'var2'), ~(scale(.) %>% as.vector))
df3
       var1       var2      var3
1  -0.98619132  1.2570692 27.238483
2   0.71268801 -0.2031057 22.320085
3  -0.57430484  0.4471923 21.012670
4   1.03402981  0.1428686 29.856744
5   1.22894699 -1.2193121 21.392645
6  -1.80732540  1.0917418 22.713262
7  -0.17012290 -0.8041315 18.601651
8   1.06592790 -1.3958243 19.853551
9  -0.09096999 -0.5668114 12.228993
10 -0.41267825  1.2503130  8.677841

Standardize All Variables

Using the mutate_all function, the following code demonstrates how to scale each variable in a data frame.

# Scale all variables to have mean = 0 and standard deviation = 1

How to Rank by Group in R? – Data Science Tutorials

df4 <- df %>% mutate_all(~(scale(.) %>% as.vector))
df4
        var1       var2        var3
1  -0.98619132  1.2570692  1.09158171
2   0.71268801 -0.2031057  0.30768348
3  -0.57430484  0.4471923  0.09930665
4   1.03402981  0.1428686  1.50888235
5   1.22894699 -1.2193121  0.15986731
6  -1.80732540  1.0917418  0.37034828
7  -0.17012290 -0.8041315 -0.28496363
8   1.06592790 -1.3958243 -0.08543481
9  -0.09096999 -0.5668114 -1.30064291
10 -0.41267825  1.2503130 -1.86662844
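If you prefer not to depend on dplyr, the same all-column standardization can be done in base R with lapply() — a sketch that rebuilds the df created above:

```r
# Recreate the example data frame
set.seed(123)
df <- data.frame(var1 = runif(10, 0, 50),
                 var2 = runif(10, 2, 20),
                 var3 = runif(10, 5, 30))

# Standardize every column without dplyr; as.vector() drops scale()'s
# matrix attributes so each column stays a plain numeric vector
df4_base <- as.data.frame(lapply(df, function(col) as.vector(scale(col))))

# Every column now has mean 0 and standard deviation 1
round(colMeans(df4_base), 10)
sapply(df4_base, sd)
```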



Posit – Why Rstudio is changing its name

[This article was first published on Tag: r - Appsilon | Enterprise R Shiny Dashboards, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
RStudio rebrand, name change to Posit

RStudio has officially announced a name change from RStudio to Posit

Why did RStudio change its name to Posit?

For the past few years, Posit (formerly RStudio) has been shifting from R-exclusive tooling to a language agnostic ecosystem. Much to our enjoyment, we’ve seen the RStudio IDE grow to be more Python-friendly and the Posit data science ecosystem become “A Single Home for R & Python.”

The name that was synonymous with open source R development is re-branding to better represent the business as a whole. 

So, what does this mean for RStudio product users? Well, besides a brighter future filled with more capabilities – not much in the short-term. There will be a rebranding of tools and commercial products:

  • RStudio Connect = Posit Connect
  • RStudio Workbench = Posit Workbench
  • RStudio Package Manager = Posit Package Manager

But overall, Posit will not be shifting away from R. So don’t worry, RStudio IDE will still be around and the leaders in open source R development aren’t slowing down.

In an article by journalist Sharon Machlis, Hadley Wickham, RStudio’s chief scientist, was quoted as saying, “We’re not pivoting from R to Python.”

Hadley put us all at ease by continuing, “…I’m not going to stop writing R code,” and later declaring, “I’m not going to learn Python.”

And although RStudio is looking to balance out the share of engineers working on R with other development over time, the majority of work will remain R-related. 

What is Posit?

Posit, PBC is the new corporate name of the company formerly known as RStudio, PBC. It is a rebranding that reflects the expansion into Python and VS Code, among other things. 

The new name opens up the company to step out of its superficial typecasting as an R-only company. Posit can continue to grow the RStudio IDE while moving out of the beloved shadow that cemented ‘RStudio’ as the premier R IDE.

With the rebranding, Posit is moving faster toward what RStudio was already well on its way to becoming: 

The bridge to better data science.

With a new name and tantalizing plans for the future, that bridge is open for expansion and here at Appsilon, we are so excited. 

Posit’s services and commercial software are still the same, powerful tools as before; with the same support for more than just R. If your enterprise is looking to scale up and build better data products, Posit is the answer. 

Contact Appsilon, a Posit (RStudio) Certified Partner and reseller to learn more.

The post Posit – Why Rstudio is changing its name appeared first on Appsilon | Enterprise R Shiny Dashboards.


An introductory workshop in Shiny, August 3rd to 5th

[This article was first published on Pachá, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

This workshop aims to introduce people with basic R knowledge to develop interactive web applications with the Shiny framework.

The course consists of a one-hour session, where we will demonstrate basic UI, reactive UI, CSS personalization and dashboard creation. Questions are super welcome!

The course will be delivered online using Zoom at different times.

  1. August 3rd from 17.30 to 18.30 https://www.buymeacoffee.com/pacha/e/81922
  2. August 4th from 17.30 to 18.30 https://www.buymeacoffee.com/pacha/e/81923
  3. August 5th from 17.30 to 18.30 https://www.buymeacoffee.com/pacha/e/81924

Check the timezone. For this workshop, it is New York Time (https://www.timeanddate.com/worldclock/usa/new-york)

Previous knowledge required: Basic R (examples: reading a CSV file, transforming columns and making graphs using ggplot2).

Finally, here’s a short demo of a part of what this workshop covers https://youtu.be/DW-HPfohfwg.


Announcing Quarto, a new scientific and technical publishing system

[This article was first published on RStudio | Open source & professional software for data science teams on RStudio, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Today we’re excited to announce Quarto, a new open-source scientific and technical publishing system. Quarto is the next generation of R Markdown, and has been re-built from the ground up to support more languages and environments, as well as to take what we’ve learned from 10 years of R Markdown and weave it into a more complete, cohesive whole. While Quarto is a “new” system, it’s important to note that it’s highly compatible with what’s come before. Like R Markdown, Quarto is also based on Knitr and Pandoc, and despite the fact that Quarto does some things differently, most existing R Markdown documents can be rendered unmodified with Quarto. Quarto also supports Jupyter as an alternate computational engine to Knitr, and can also render existing Jupyter notebooks unmodified.

Some highlights and features of note:

  • Choose from multiple computational engines (Knitr, Jupyter, and Observable), which makes it easy to use Quarto with R, Python, Julia, JavaScript, and many other languages.
  • Author documents as plain text markdown or Jupyter notebooks, using a variety of tools including RStudio, VS Code, Jupyter Lab, or any notebook or text editor you like.
  • Publish high-quality reports, presentations, websites, blogs, books, and journal articles in HTML, PDF, MS Word, ePub, and more.
  • Write with scientific markdown extensions, including equations, citations, crossrefs, diagrams, figure panels, callouts, advanced layout, and more.

Now is a great time to start learning Quarto as we recently released version 1.0, our first stable release after nearly two years of development. Get started by heading to https://quarto.org.

If you are a dedicated R Markdown user, fear not, R Markdown is by no means going away! See our FAQ for R Markdown Users or Yihui Xie’s blog post on Quarto for additional details on the future of R Markdown.

Below we’ll go into more depth on why we decided to create a new system as well as talk more about Quarto’s support for the Jupyter ecosystem.

Why a new system?

The goal of Quarto is to make the process of creating and collaborating on scientific and technical documents dramatically better. Quarto combines the functionality of R Markdown, bookdown, distill, xaringan, etc. into a single consistent system with “batteries included” that reflects everything we’ve learned from R Markdown over the past 10 years.

The number of languages and runtimes used for scientific discourse is very large and the Jupyter ecosystem in particular is extraordinarily popular. Quarto is, at its core, multi-language and multi-engine, supporting Knitr, Jupyter, and Observable today and potentially other engines tomorrow.

While R Markdown is fundamentally tied to R, which severely limits the number of practitioners it can benefit, Quarto is RStudio’s attempt to bring R Markdown to everyone! Unlike R Markdown, Quarto doesn’t require or depend on R. Quarto was designed to be multilingual, beginning with R, Python, Javascript, and Julia, with the idea that it will work even for languages that don’t yet exist.

While creating a new system has given us the opportunity for a fresh look at things, we have also tried to be as compatible as possible with existing investments in learning, content, and code. If you know R Markdown well, you already know Quarto well, and many of your documents are already compatible with Quarto.

Quarto and Jupyter

While the R community has mostly focused on plain text R Markdown for literate programming, the Python community has a very strong tradition of using Jupyter notebooks for interactive computing and the interweaving of narrative, code, and output. With Quarto we are hoping to bring what we’ve learned about publishing dynamic documents with R to the Jupyter ecosystem.

One compelling benefit of Quarto supporting both Knitr and Jupyter is that you can create websites and books that include content from both systems in a single project. Whether users prefer to author in plain markdown, computational markdown, or Jupyter notebooks, they can all contribute to the same project. Similarly, code written in R, Python, Julia, and other languages can co-exist in the same project. We believe that providing a common set of tools will facilitate collaboration and make it much easier to weave together contributions from diverse participants into a cohesive whole.

We also want to enable the many tools built around Jupyter to have access to state of the art scientific publishing capabilities. A great example of this is some recent work we’ve done with https://fast.ai to help integrate Quarto with the nbdev literate programming system. nbdev enables the development of Python libraries within Jupyter Notebooks, putting all code, tests and documentation in one place. In nbdev 2, library documentation written in notebooks can be used to automatically create a Quarto website for the library with a single function call.

Getting more involved with Jupyter as part of working on Quarto has been a great experience. We’re excited to do more with the Jupyter community and to continue supporting the ecosystem as a sponsor of NumFOCUS.

Learning more

Here are some resources that will help you learn more about Quarto:

We’re excited to begin the journey of making Quarto the very best scientific publishing system we can, and look forward to sharing many more developments in the months and years ahead.


R Quarto Tutorial – How To Create Interactive Markdown Documents

[This article was first published on Tag: r - Appsilon | Enterprise R Shiny Dashboards, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

R Quarto is a next-gen version of R Markdown. The best thing is – it’s not limited to R programming language. It’s also available in Python, Julia, and Observable. In this R Quarto tutorial, we’ll stick with the most popular statistical language and create Markdown documents directly in RStudio.

With Quarto, you can easily create high-quality articles, reports, presentations, PDFs, books, Word documents, ePubs, and even entire websites. For example, the entire Hands-On Programming with R book by Garrett Grolemund is written in Quarto. Talk about scalability!

We’ll kick off today’s article by installing and configuring Quarto, and then we’ll dive into the good stuff.

R Markdown and PowerPoint presentations? Learn to create slideshows with R.



How to Get Started with R Quarto

First things first, head over to their website and hit the big blue “Get Started” button. You’ll be presented with the following screen:

Image 1 – Getting started with R Quarto

From there, download a CLI that matches your operating system – either Windows, Linux, or Mac. We’re on macOS, and Quarto CLI downloads as a .pkg file which installs easily with a couple of mouse clicks.

Once installed, launch RStudio (you can also use VSCode or any other text editor), and create a new text file, just as it’s shown below:

Image 2 – Creating a new text document in RStudio

Important note: Make sure to give the .qmd extension to the Quarto file. We’ve named ours quarto.qmd, for reference.

Almost there! The final step is to install two R packages – tidyverse and palmerpenguins. Install them from the R console by running the following commands:

install.packages("tidyverse")
install.packages("palmerpenguins")

And that’s it! You’re ready to create your first R Quarto document.

Render Code, Tables, and Charts with R Quarto

As with any Markdown document, a good practice is to include a YAML header to provide some context around your document. The entire header has to be demarcated by three dashes (---) on both ends.

The example below shows how to include the document title, author, and date to the YAML header of an R Quarto file:

---
title: "Quarto Demo"
author: "Appsilon"
date: "2022-5-24"
---

Once you write yours, hit the “Render” button to preview the document. It’s located in the toolbar just below the document name:

Image 3 – Toolbar for rendering R Quarto documents

Here’s what the rendered document looks like on our end:

Image 4 – Rendered R Quarto document

You can also check the “Render on Save” checkbox so you don’t have to manually render the document each time you make a change. Optionally, you can also render Quarto documents through the R console:

install.packages("quarto")
quarto::quarto_render("notebook.Rmd")

We’ll stick to the first option. Now it’s time for the good stuff. We’ll embed code snippets right into the Markdown documents to show the MTCars dataset, both as a table and as a chart.

Render Table in Quarto

First, let’s add a new section and show the first six rows of the dataset by calling the head() function:

## MTCars

The table below shows the first six rows from the MTCars dataset:

```{r}
head(mtcars)
```

Here’s what the document looks like after rerendering:

Image 5 – Rendered R Quarto document (2)

Amazing! Both code and table are rendered with beautiful stylings. If you hover over the code block, you’ll see an option to copy it with a single click. That feature is particularly useful for longer code snippets, and you don’t have to lift a finger to implement it.

Render Chart with Quarto

Let’s also see how to render a chart with Quarto. The procedure is identical – just add the code – but there are a couple of things you can tweak. First, it’s a good practice to reference a figure. In Quarto, that’s done by writing @figure-name in the text and then assigning labels to the figure in the code. The other good practice is to add a caption to your figures. Sure, you can add it directly through ggplot2, but adding it through Quarto will automatically match the text to document styles.

Long story short, here’s how to add a scatter plot to a Quarto document:

@fig-mtscatter shows a relationship between `wt` and `mpg` features in the MTCars dataset.

```{r}
#| label: fig-mtscatter
#| fig-cap: Weight of vehicle per 1000lbs (wt) vs. Miles/Gallon (mpg)
library(ggplot2)
ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point(color = "#0099f9", size = 5)
```

And here’s what it looks like when rendered:

Image 6 – Rendered R Quarto document (3)

Are you new to data visualization in R? Here’s a complete guide to scatter plots.

And that’s how you can create a basic Quarto document! Next, let’s see how you can export it.

How to Export R Quarto Documents

There are numerous ways of exporting Quarto documents. Some options include HTML, ePub, Open Office, Web, Word, and PDF. We’ll show you how to work with the last two options, as these are the most common.

First, let’s address Microsoft Word. All you have to do is to add a couple of lines to the YAML header of the Markdown file. The example below shows how to add a table of contents with a custom title alongside some other options. Here’s a full reference manual, if you’re interested to learn more.

---
title: "Quarto Demo"
author: "Appsilon"
date: "2022-5-24"
format:
    docx:
        toc: true
        toc-depth: 2
        toc-title: Table of contents
        number-sections: true
        highlight-style: github
---

After rendering, the MS Word document will be opened automatically. It’s also saved to the same directory where your .qmd document is located. Here’s what it looks like on our end:

Image 7 – Rendered Word document

It’s read-only by default, but you can always duplicate it to make changes. Onto the PDF now. Before starting, you’ll need to install a distribution of TeX. TinyTeX is recommended by Quarto authors, so that’s what we’ll use. Run this line from the shell:

quarto tools install tinytex

Once installed, modify the YAML header accordingly. Here’s an entire formatting reference.

---
title: "Quarto Demo"
author: "Appsilon"
date: "2022-5-24"
format:
    pdf:
        toc: true
        toc-depth: 2
        toc-title: Table of contents
        number-sections: true
        highlight-style: github
---

As you can see, the only change you had to make is to replace docx with pdf; all arguments are identical. Here’s what the document looks like:

Image 8 – Rendered PDF document

And that’s how you can export R Quarto Markdown files to Word and PDF documents. As mentioned, other options are available, but we won’t cover them today.


Summing up the R Quarto tutorial

Today you’ve learned the basics of creating highly-customizable Markdown documents in R Quarto. The package is a breeze to work with, and the documentation is easy to get around with.

If you’re developing packages for R, Python, or Julia, there’s no reason not to use Quarto to create amazing documentation around your product. It’s much more capable than other alternatives, and also has all the exporting options you’ll need.

Thinking about a career in R and R Shiny? Here’s everything you need to know to land your first job.

If you decide to give R Quarto a try, make sure to let us know. Share your Markdown documents with us on Twitter – @appsilon. We’d love to see what you come up with.

The post R Quarto Tutorial – How To Create Interactive Markdown Documents appeared first on Appsilon | Enterprise R Shiny Dashboards.


How to Set Axis Limits in ggplot2?

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to Set Axis Limits in ggplot2? appeared first on finnstats.

If you are interested to learn more about data science, you can find more articles here finnstats.

How to Set Axis Limits in ggplot2? You’ll frequently need to set custom axis limits on a ggplot2 plot. The following functions make this simple:

xlim(): sets the lower and upper limits of the x-axis.

ylim(): sets the lower and upper limits of the y-axis.

Keep in mind that both of these approaches remove any data points that fall outside the limits, which can occasionally have unintended consequences.

To change the axis bounds without losing data, use coord_cartesian() instead: it zooms to the given x- and y-axis bounds without removing any observations.

How to Set Axis Limits in ggplot2

Using the scatterplot below, which was created using the built-in R dataset mtcars, this tutorial demonstrates various uses for these functions.

Let’s load the library ggplot2

library(ggplot2)

Now we can create a simple scatterplot

ggplot(mtcars, aes(mpg, wt)) +  geom_point()

Example 1: Set X-Axis Limits Using xlim()

Using the xlim() method, the scatterplot’s x-axis boundaries can be defined as seen in the following code:

# Make a scatterplot with an x-axis from 10 to 40

ggplot(mtcars, aes(mpg, wt)) +  geom_point() +  xlim(10, 40)

Additionally, you can use NA to merely provide the upper limit of the x-axis and let ggplot2 determine the lower limit for you.

Now we can create a scatterplot with x-axis upper limit at 40

ggplot(mtcars, aes(mpg, wt)) +  geom_point() +  xlim(NA, 40)

Example 2: Set Y-Axis Limits Using ylim()

Using the ylim() method, the scatterplot’s y-axis boundaries can be defined as seen in the following code.

# Make a scatterplot with a y-axis from 2 to 4

ggplot(mtcars, aes(mpg, wt)) +  geom_point() +  ylim(2, 4)
Warning message:
Removed 8 rows containing missing values (geom_point).
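The eight dropped rows in that warning are simply the observations whose wt falls outside [2, 4]; you can count them directly in base R:

```r
# Count the mtcars rows outside wt in [2, 4] -- these are exactly the
# observations that ylim(2, 4) silently drops before plotting
sum(mtcars$wt < 2 | mtcars$wt > 4)
```

The result, 8, matches the number of rows reported as removed in the warning.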

Now let’s try to create a scatterplot with the y-axis lower limit at 2:

ggplot(mtcars, aes(mpg, wt)) +  geom_point() +  ylim(2, NA)

Example 3: Set Axis Limits Using coord_cartesian()

The coord_cartesian() method sets both axis bounds on the scatterplot without dropping any observations, as shown in the following code.

# Zoom to x in [10, 30] and y in [2, 4]

ggplot(mtcars, aes(mpg, wt)) +  geom_point() +  coord_cartesian(xlim = c(10, 30), ylim = c(2, 4))

ggplot2 Guide

If you are interested in learning more about data science, you can find more articles at finnstats.

The post How to Set Axis Limits in ggplot2? appeared first on finnstats.

To leave a comment for the author, please follow the link and comment on their blog: Data Analysis in R.


How to convert characters from upper to lower case in R?

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to convert characters from upper to lower case in R? appeared first on Data Science Tutorials

This article discusses how to change a character string's case in R from upper to lower and vice versa.

The lesson includes examples of the R functions tolower(), toupper(), casefold(), and chartr().

How to convert characters from upper to lower case or vice versa in R?

We must first generate a character string in R before proceeding to the examples. We’ll use the character string “Datasciencetut.com” throughout this tutorial:

Let’s create an example character string

string <- "Datasciencetut.com"
string
[1] "Datasciencetut.com"

Example 1: tolower & toupper R Functions

The first example describes how to use the tolower and toupper R functions.

With the tolower command, we may change every character in our string to lower case:

tolower(string)                          
[1] "datasciencetut.com"

Conversely, the toupper command changes every character to upper case.

toupper(string)                               
[1] "DATASCIENCETUT.COM"

Example 2: casefold Function

Conversion to lower- or uppercase letters is also possible with the casefold R function.

If we use the casefold function with the upper = FALSE (default option) argument, our characters are changed to lower case.

casefold(string, upper = FALSE)            
[1] "datasciencetut.com"

…or to upper case if we specify upper = TRUE:

casefold(string, upper = TRUE)                
[1] "DATASCIENCETUT.COM"

In fact, the casefold function is simply a wrapper around the tolower and toupper functions.
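This equivalence is easy to check directly (base R documents casefold() as a wrapper provided for S-PLUS compatibility):

```r
x <- "Datasciencetut.com"

# casefold() with the default upper = FALSE matches tolower() ...
identical(casefold(x, upper = FALSE), tolower(x))
# [1] TRUE

# ... and with upper = TRUE it matches toupper()
identical(casefold(x, upper = TRUE), toupper(x))
# [1] TRUE
```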

Example 3: chartr Function

A whole character string can be converted to lowercase or uppercase using the tolower, toupper, and casefold functions.

The chartr function can be used to convert some characters to lower case and others to upper case:

chartr(old = "Datasciencetut.com", new = "DataScienceTut.COM", string)    
[1] "DataSCienCetut.COM"

The chartr function takes an old character pattern and a new one; each character in old is then replaced by the character at the same position in new, wherever it occurs in the input.
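The per-character mapping is easiest to see with a small made-up example, where every a becomes x, every b becomes y, and every c becomes z:

```r
# a -> x, b -> y, c -> z, applied to every occurrence in the input
chartr(old = "abc", new = "xyz", x = "aabbcc")
# [1] "xxyyzz"
```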



To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.


