Channel: R-bloggers

Great Britain Railway Network

[This article was first published on R - datawookie, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
The {blimey} logo.

Introducing the nascent R package {blimey} (repository). At this stage it contains only the following data:

  • railways: latitude and longitude segments along railway lines (wide format);
  • railways_pivot: latitude and longitude segments along railway lines (long format); and
  • railway_stations: codes, names and locations of railway stations.
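
The difference between the wide and long layouts can be sketched with a tiny made-up data frame. The wide column names below mirror those of railways, but the long-format column names (end, lat, lon) are assumptions for illustration and may not match what railways_pivot actually contains.

```r
# Two made-up segments in wide format: one row per segment.
segments_wide <- data.frame(
  fid       = c(0, 1),
  lat_start = c(50.3, 50.2),
  lat_end   = c(50.3, 50.1),
  lon_start = c(-5.06, -5.50),
  lon_end   = c(-4.83, -5.45)
)

# Long format: one row per segment end point (hypothetical column names).
segments_long <- rbind(
  data.frame(fid = segments_wide$fid, end = "start",
             lat = segments_wide$lat_start, lon = segments_wide$lon_start),
  data.frame(fid = segments_wide$fid, end = "end",
             lat = segments_wide$lat_end, lon = segments_wide$lon_end)
)

# Keep the two end points of each segment next to each other.
segments_long <- segments_long[order(segments_long$fid), ]

print(segments_long)
```

The wide layout suits geom_segment(), which wants start and end coordinates on the same row, while a long layout is more natural for path-style geoms.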

Load a couple of core packages.

library(dplyr)
library(ggmap)

Load the {blimey} package.

library(blimey)

Let’s take a look at the railways data.

head(railways)
# A tibble: 6 × 7
    fid elr    trid lat_start lat_end lon_end lon_start
  <dbl> <chr> <dbl>     <dbl>   <dbl>   <dbl>     <dbl>
1     0 MLN3   1100      50.3    50.3   -5.06     -4.83
2     1 SDS    3100      50.3    50.3   -4.83     -4.83
3     2 MLN3   1600      50.3    50.3   -5.07     -5.07
4     3 MLN4   3700      50.2    50.2   -5.45     -5.45
5     4 MLN4   3601      50.1    50.1   -5.53     -5.53
6     5 MLN4   1100      50.2    50.1   -5.50     -5.45

We’re going to plot those data out on a map. Define the bounding box for the map.

bb <- c(left = -8, bottom = 49.80, right = 3, top = 59.50)

Use {ggmap} to load Stamen Map tiles for the toner lite map.

map_toner <- get_stamenmap(bb, zoom = 8, maptype = "toner-lite")

Define some colours for plotting the lines and stations.

COLOUR_MUTED_BLUE <- "#1f77b4"
COLOUR_SAFETY_ORANGE <- "#ff7f0e"

Create a function which will tweak the map appearance.

theme_map <- function(plot) {
  plot +
    coord_map() +
    scale_x_continuous(expand = expansion(0, 0)) +
    scale_y_continuous(expand = expansion(0, 0)) +
    theme(
      legend.position = "none",
      axis.title = element_blank(),
      axis.text = element_blank(),
      axis.ticks = element_blank(),
      text = element_text(size = 12)
    )
}

Show the railway line segments over the toner lite map.

map <- ggmap(map_toner, extent = "normal") +
  geom_segment(
    data = railways,
    aes(
      x = lon_start,
      y = lat_start,
      xend = lon_end,
      yend = lat_end
    ),
    size = 1, alpha = 0.5, col = COLOUR_MUTED_BLUE
  )

(map <- theme_map(map))

Now add in the railway stations.

(ggmap(map_toner, extent = "normal") +
  geom_segment(
    data = railways,
    aes(
      x = lon_start,
      y = lat_start,
      xend = lon_end,
      yend = lat_end
    ),
    size = 2, alpha = 0.5, col = COLOUR_MUTED_BLUE
  ) +
  geom_point(
    data = railway_stations,
    aes(
      x = lon,
      y = lat
    ),
    size = 2, alpha = 0.5, col = COLOUR_SAFETY_ORANGE
  )) %>%
  theme_map()

The maps are fairly large. Open them in a separate tab so that you can zoom in on the details.

To leave a comment for the author, please follow the link and comment on their blog: R - datawookie.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Mapping a marathon with {rStrava}

[This article was first published on R on Nicola Rennie, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

After a long run in the Forest of Bowland when visiting Lancaster for a few days, I decided to try out the {rStrava} package to make some maps of where I’d been. This tutorial blog will walk through the process of getting the data from Strava, making the map, and animating it with {gganimate}.

Last week's route mapped out in #rstats using {rStrava} and {gganimate} 🏃‍♀️🏃‍♀️🏃‍♀️ https://t.co/O510G8E6hr pic.twitter.com/WgUAFV6eFA

— Nicola Rennie (@nrennie35) July 15, 2022

What is {rStrava}?

Strava is an app for tracking physical activities, mostly used for running and cycling. The {rStrava} package lets you access data through the Strava API.

Since {rStrava} is available on CRAN, you can install it in the usual way using:

install.packages("rStrava")
library(rStrava)

There are two levels of data you can get using {rStrava} depending on whether or not you have an authentication token. If you don’t have a token, you can access some basic summary information on athletes with public profiles and their recent activities. You don’t even need a Strava account to access this data. You can see which functions don’t require a token by running:

help.search('notoken', package = 'rStrava')

However, if you want detailed data on your activities or if your profile is private (like mine) you need an authentication token to get the data into R. To get an authentication token you do need a Strava account of your own. The instructions on the {rStrava} README are pretty easy to follow to set up your authentication token.

Setting up an authentication token

On the Strava website, in the settings, you can make an API application. There are three pieces of information you need to fill out:

  • Application Name: the name of your app (this can be almost anything). I used the blog title.
  • Website: Must be a valid URL, but can otherwise be pretty much anything.
  • Authorization Callback Domain: change to localhost or any domain. If deploying an app that uses the Strava API, you’ll need to update this.

After you click “Create”, you’ll be prompted to upload an icon (can be any image), and this will generate a token for you.

Now, you need to add this token into R. You can do this using the config() function from {httr}, and the strava_oauth() function from {rStrava}. The strava_oauth() function needs four pieces of information, all provided as character strings.

strava_token <- httr::config(token = strava_oauth(app_name,
                                                  app_client_id,
                                                  app_secret,
                                                  app_scope = "activity:read_all"))

The app_name is the name you gave to the app when making your token on the Strava website. The app_client_id and app_secret were generated after you clicked “Create” on the Strava website, and you can simply pass these in. You will also perhaps want to change the app_scope argument. By default, this is set to "public", but you may want to get information on your activities which are not public. You can save the token as a variable, to pass into the {rStrava} functions. I’ve called it strava_token.
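
Since app_client_id and app_secret are credentials, you may prefer not to type them directly into a script. One possible approach, sketched here as an assumption rather than anything {rStrava} requires, is to read them from environment variables. The helper read_strava_credentials() and the variable names STRAVA_APP_NAME, STRAVA_CLIENT_ID and STRAVA_CLIENT_SECRET are hypothetical.

```r
# Hypothetical helper: read Strava API credentials from environment variables.
read_strava_credentials <- function() {
  vars <- c(app_name      = "STRAVA_APP_NAME",
            app_client_id = "STRAVA_CLIENT_ID",
            app_secret    = "STRAVA_CLIENT_SECRET")

  # unset = NA lets us tell a missing variable apart from an empty one.
  values <- Sys.getenv(vars, unset = NA)
  names(values) <- names(vars)

  unset_vars <- vars[is.na(values)]
  if (length(unset_vars) > 0) {
    stop("Missing environment variables: ", paste(unset_vars, collapse = ", "))
  }
  as.list(values)
}

# Demonstration with placeholder values; real values would normally be set
# outside the script, for example in ~/.Renviron.
Sys.setenv(STRAVA_APP_NAME = "my-app",
           STRAVA_CLIENT_ID = "12345",
           STRAVA_CLIENT_SECRET = "not-a-real-secret")

creds <- read_strava_credentials()
print(names(creds))
```

The elements creds$app_name, creds$app_client_id and creds$app_secret could then be passed to strava_oauth() in place of literal strings.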

Reading in the data

With the authentication token set you can now begin to get data into R, directly from the Strava API. First of all, I grabbed the data on my activities using the get_activity_list() function, for which I need to pass in my Strava token. I then use the get_activity_streams() function to get detailed information on a specific activity. Here the id is the activity id i.e., the number that comes at the end of the URL string for the activity: https://www.strava.com/activities/{id}.

my_acts <- get_activity_list(strava_token)
id = {id}
strava_data <- get_activity_streams(my_acts,
                                    strava_token,
                                    id = id)
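
If you only have the activity URL to hand, the id can be pulled out with a regular expression. This is a small base R sketch; the URL is made up from the URL pattern described in the post and the id that appears in its sample data.

```r
# Hypothetical activity URL following the pattern .../activities/{id}.
activity_url <- "https://www.strava.com/activities/7419225187"

# Keep only the run of digits that follows "/activities/".
id <- sub("^.*/activities/([0-9]+).*$", "\\1", activity_url)

print(id)
```

The result is kept as a character string here, which avoids any loss of precision if ids grow very large.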

This is what the output of strava_data looks like:

  altitude cadence distance grade_smooth heartrate      lat       lng moving time velocity_smooth         id
1     24.8      84   0.0027          2.0       105 54.04575 -2.798552  FALSE    0          0.0000 7419225187
2     24.9      85   0.0066          1.3       112 54.04572 -2.798607   TRUE    3          4.6548 7419225187
3     24.9      85   0.0078          1.0       117 54.04572 -2.798626   TRUE    4          4.5180 7419225187
4     24.9      86   0.0078          0.8       117 54.04570 -2.798638  FALSE    5          3.6144 7419225187
5     24.9      86   0.0102          0.9       118 54.04567 -2.798653   TRUE    6          4.4892 7419225187
6     24.9      85   0.0130          1.1       119 54.04565 -2.798678   TRUE    7          5.2812 7419225187

There are some nice built-in mapping functions in {rStrava} that I recommend checking out, but since I’m going to build my own here, I don’t need to use {rStrava} again. I saved the data as a CSV file so that I could go back and work on it again without having to re-download it using {rStrava}.

write.csv(strava_data, "strava_data.csv", row.names = FALSE)
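
Reading the file back in a later session is then a single read.csv() call. The sketch below uses a made-up stand-in for strava_data with a few of the same columns, and a temporary file path (an assumption, so that the example leaves nothing behind).

```r
# Made-up stand-in for strava_data with a few of the same columns.
strava_data <- data.frame(
  altitude = c(24.8, 24.9, 24.9),
  lat      = c(54.04575, 54.04572, 54.04572),
  lng      = c(-2.798552, -2.798607, -2.798626)
)

# Write to a temporary file instead of the working directory.
csv_path <- tempfile(fileext = ".csv")
write.csv(strava_data, csv_path, row.names = FALSE)

# Later: read the saved data back without calling the Strava API again.
strava_reloaded <- read.csv(csv_path)
str(strava_reloaded)
```

Setting row.names = FALSE matters here: without it, the row numbers would be written as an extra unnamed column and come back as a column called X.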

Data wrangling

The data that comes out of the get_activity_streams() function is already very clean, so the data wrangling for this example is minimal. In fact, I only used two functions, neither of which was really necessary. I converted the data frame to a tibble using as_tibble() because I prefer working with tibbles. Since all the data is for a single activity in this case, the id column is a bit redundant, so I also used select() from {dplyr} to remove it.

library(tidyverse)

strava_data <- strava_data %>%
  as_tibble() %>%
  select(-id)

Background maps

Now it’s finally time to start building a map! Here, I loaded the rest of the R packages I’ll be using for mapping and animating.

library(sf)
library(ggmap)
library(osmdata)
library(rcartocolor)
library(gganimate)

Here, {sf} isn’t technically necessary but is useful if you want to make a geometry object in R (more on that later). {ggmap} and {osmdata} are used for creating a background map. {ggplot2} was already loaded earlier with the rest of the tidyverse and, along with {rcartocolor} for a nice colour scheme, will plot the main map. Then, {gganimate} is used for animating the map.

Before I actually mapped my run, I wanted to get a background map. I used the getbb() (bounding box) function from {osmdata} to get the approximate coordinates around where I started my run using the place name as input.

getbb("Lancaster, UK")
        min       max
x -2.983647 -2.458735
y 53.918066 54.239557

I then played around to get the exact rectangle I wanted, and specified it manually. Now, bb specifies the minimum and maximum latitude and longitude of the area my background map should cover.

bb <- matrix(c(-2.9, -2.53, 53.95, 54.10),
             ncol = 2,
             nrow = 2,
             byrow = TRUE,
             dimnames = list(c("x", "y"), c("min", "max")))
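
Because the matrix has row and column names, its corners can be read by name, which makes it easy to sanity-check a hand-written bounding box. This is a base R sketch; the variable names west, east, south, north and centre are introduced only for illustration.

```r
bb <- matrix(c(-2.9, -2.53, 53.95, 54.10),
             ncol = 2,
             nrow = 2,
             byrow = TRUE,
             dimnames = list(c("x", "y"), c("min", "max")))

# Read each edge of the box by name rather than by position.
west  <- bb["x", "min"]
east  <- bb["x", "max"]
south <- bb["y", "min"]
north <- bb["y", "max"]

# The minimum must be below the maximum on both axes.
stopifnot(west < east, south < north)

# Centre of the box: a quick way to see where the map is focused.
centre <- c(lon = mean(bb["x", ]), lat = mean(bb["y", ]))
print(centre)
```

With byrow = TRUE, the first two numbers fill the x row (longitude) and the last two fill the y row (latitude), matching the layout that getbb() returns.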

This bounding box can be passed into get_map() from {ggmap} to get the background map. By default, {ggmap} uses Google Maps, for which an API key is required. Setting the source = "stamen" means that you don’t have to register a Google API key. You can also choose a maptype, and here I chose "toner-hybrid". I’d recommend playing around with the different types to see which one you like - use ?get_map() for a list of options. You can also choose whether or not you want a colour or black and white background. I opted for a black and white ("bw") background map, as I later found it difficult to get enough contrast between my data points and the background map otherwise.

bg_map <- get_map(bb,
                  source = "stamen",
                  maptype = "toner-hybrid",
                  color = "bw")

The background map can be visualised using ggmap().

ggmap(bg_map)

Overlaying the activity data

I’m simply going to use {ggplot2} to overlay the data in strava_data on top of my background map. Using {ggplot2}, there are (at least) two different ways we could add the data: either using geom_point() or geom_sf(). We’ll start with geom_point().

g <- ggmap(bg_map) +
  geom_point(data = strava_data,
             inherit.aes = FALSE,
             aes(x = lng,
                 y = lat,
                 colour = altitude),
             size = 1)

Here, we specify strava_data as the data argument in geom_point(). Note that there is no ggplot() call here, as it’s hidden inside the ggmap() function. Therefore, we also want to specify inherit.aes = FALSE to make sure that the hidden aesthetics carried through by ggmap() don’t interfere with our point data. I specify the x and y coordinates as the longitude and latitude, respectively, and colour the points based on the altitude. I also played around with the size of the points until it looked the way I wanted it to. Note that, alternatively you could use geom_line() in exactly the same way.

Since longitude and latitude are geographic data, it may make sense to instead convert them to a geometry object using the {sf} package. This may be necessary if your background map and coordinate data use different coordinate systems. In this case, it doesn’t actually matter. But I’ll show you anyway, just in case you need it. First, we convert our strava_data tibble into an sf object using st_as_sf(). We also specify which columns from strava_data are the longitude and latitude, with the longitude column coming first. We set the coordinate reference system (crs) as 4326 to match the coordinate system used. Setting remove = FALSE also keeps the original latitude and longitude columns in the tibble, even after converting to an sf object.

strava_sf <- st_as_sf(strava_data,
                      coords = c("lng", "lat"),
                      crs = 4326,
                      remove = FALSE)

The strava_sf object is now an sf object so it can be used with geom_sf() instead of geom_point(). Here, we don’t need to specify the x and y aesthetics as they are automatically detected from the sf object. You may get the warning "Coordinate system already present. Adding new coordinate system, which will replace the existing one." This is because geom_sf() and ggmap() are both trying to set the (same) coordinate system.

g <- ggmap(bg_map) +
  geom_sf(data = strava_sf,
          inherit.aes = FALSE,
          aes(colour = altitude),
          size = 1)
g

The maps returned using geom_point() and geom_sf() are essentially the same in this case.

Styling the map

The initial map looks okay, but we can add some styling to make it look better. I’m a big fan of {rcartocolor} for colour palettes. I can get the hex codes of the "SunsetDark" palette, and use the same hex codes for the title font later.

my_colors <- carto_pal(7, "SunsetDark")
my_colors

I change the colour of my points using scale_colour_carto_c() from {rcartocolor}, and change the title that appears in the legend at the same time. I also add a caption using the labs() function. Finally, I edit the theme. The theme_void() function is really useful for maps because it removes most of the theme elements which aren’t very useful on maps like this, e.g. axis labels, axis ticks and grid lines. I use the theme() function to bring the legend and the plot caption (used as a title here) inside the plot area. This creates a little bit of white space at the bottom of the plot, so I remove it using plot.margin. I also edit the colour and size of the caption text.

g <- g +
  scale_colour_carto_c(name = "Altitude (m)", palette = "SunsetDark") +
  labs(caption = "Lancaster - Forest of Bowland ") +
  theme_void() +
  theme(legend.position = c(0.85, 0.7),
        legend.title = element_text(face = "bold", hjust = 0.5),
        plot.caption = element_text(colour = "#dc3977", face = "bold", size = 16,
                                    vjust = 10),
        plot.margin = unit(c(0, 0, -0.75, 0), unit = "cm"))
g

Animating with {gganimate}

I was pretty happy with the final static image, but why not animate it? {gganimate} makes it really easy to animate ggplot objects. For this example, I’d strongly recommend using the geom_point() version of the map.

g <- ggmap(bg_map) +
  geom_point(data = strava_data,
             inherit.aes = FALSE,
             aes(colour = altitude,
                 x = lng,
                 y = lat),
             size = 1) +
  scale_colour_carto_c(name = "Altitude (m)", palette = "SunsetDark") +
  labs(caption = "Lancaster - Forest of Bowland ") +
  theme_void() +
  theme(legend.position = c(0.85, 0.7),
        axis.title = element_blank(),
        legend.title = element_text(face = "bold", hjust = 0.5),
        plot.caption = element_text(colour = "#dc3977", face = "bold", size = 16,
                                    vjust = 10),
        plot.margin = unit(c(0, 0, -0.75, 0), unit = "cm"))

Although you can animate plots with sf data using {gganimate}, it’s a little bit trickier and it takes longer to render. So why not make our lives a little easier? There are two functions we need to animate our map:

  • transition_time() specifies which variable in strava_data we want to animate over.
  • shadow_mark() means the animation plots points cumulatively over time rather than just plotting a single point for each time.

g = g +
  transition_time(time = time) +
  shadow_mark()

The animate() function then actually builds the animation. Usually renderer = gifski_renderer() should be the default, but I kept getting individual images instead of a gif unless I specified it manually - to investigate later. Here, I also specified the width and height (using a little bit of trial and error to avoid white space caused by the fixed ratio from ggmap()). anim_save() then saves the gif to a file (analogously to ggsave() from {ggplot2}).

animate(g, renderer = gifski_renderer(), height = 372, width = 538, units = "px")
anim_save("mapping_marathon.gif")

And that’s it! You now have an animated map of your Strava recorded run (or cycle, or walk, or …)! If you want to create a map of your own, you can find the R code used in this blog on my website. Thanks very much to the creators of {rStrava} for such an easy to use package!


R Read and Write xlsx Files

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

xlsx is a file format used for Microsoft Excel spreadsheets, which store tabular data.

With the xlsx package, R can easily read and write xlsx files.


Sample xlsx File

To demonstrate how we read xlsx files in R, let's suppose we have an Excel file named studentinfo.xlsx with the following data:

We will read these data with functions from the xlsx package.


Install and Load xlsx Package

In order to read, write, and format Excel files in R, we first need to install and load the xlsx package as:

# install xlsx package 
install.packages("xlsx")

# load xlsx package
library("xlsx")

Here, we have successfully installed and loaded the xlsx package.
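
Installation only needs to happen once per machine, so a script that is run repeatedly may want to install the package only when it is missing. Here is a minimal sketch of that pattern; the helper name ensure_package() is hypothetical, and it is demonstrated with the stats package, which ships with R, so that nothing is downloaded when the sketch runs.

```r
# Hypothetical helper: install a package only if it is not already available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # Report (invisibly) whether the package can now be loaded.
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" is always installed, so this call does not trigger a download.
ok <- ensure_package("stats")
print(ok)
```

In a real script you would call ensure_package("xlsx") and then library("xlsx").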

Now, we are able to read data from an xlsx file.


Read an xlsx File in R

In R, we use the read.xlsx() function to read an xlsx file available in our current directory. For example,

# install xlsx package 
install.packages("xlsx")

# load xlsx package
library("xlsx")

# read studentinfo.xlsx file from our current directory
read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas
5    Sammy      20     Economics        Ohio
6    Pam        21     Arts             Arizona

In the above example, we have read the studentinfo.xlsx file that is available in our current directory. Notice the code,

read_data <- read.xlsx("studentinfo.xlsx", sheetIndex = 1)

Here,

  • read.xlsx() - reads the xlsx file studentinfo.xlsx and creates a dataframe that is stored in the read_data variable.
  • sheetIndex = 1 - reads specified worksheet i.e. 1

Note:

  • If the file is in some other location, we have to specify the path along with the file name as: read.xlsx("D:/folder1/studentinfo.xlsx", sheetIndex = 1).
  • We can also use the read.xlsx2() function if the dataset we are working on is larger.

xlsx rowIndex and colIndex Argument in R

In R, we can also read a specific range of data from Excel files. We can pass the rowIndex and colIndex arguments inside read.xlsx() to read a specific range.

  • rowIndex - reads a specific range of rows
  • colIndex - reads a specific range of columns

Example: Read Range of Rows

# install xlsx package 
install.packages("xlsx")

# load xlsx package
library("xlsx")

# read first five rows of xlsx file
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  rowIndex = 1:5
)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas

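In the above example, we have passed rowIndex = 1:5 inside read.xlsx() so the function reads only the first five rows of the sheet from the studentinfo.xlsx file. The first of those rows is the header, which is why the output contains four rows of data.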
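
The relationship between sheet rows and data rows can be illustrated without an Excel file. The sketch below is only an analogy in base R: it treats each line of some CSV text (made up to match the sample data) as one sheet row and uses read.csv() in place of read.xlsx().

```r
# Each element stands in for one row of the worksheet; the first is the header.
sheet_rows <- c(
  "Name,Age,Faculty,State",
  "Abby,24,Business,Florida",
  "Hazzle,23,Engineering,Arizona",
  "Cathy,20,Engineering,Colorado",
  "Paterson,22,Arts,Texas",
  "Sammy,20,Economics,Ohio",
  "Pam,21,Arts,Arizona"
)

# Keep sheet rows 1 to 5: one header row plus four data rows.
first_five <- read.csv(text = paste(sheet_rows[1:5], collapse = "\n"))

print(nrow(first_five))
print(first_five)
```

Five sheet rows go in, and a data frame with four data rows comes out, because the header row becomes the column names.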

Example: Read Range of Columns

# install xlsx package 
install.packages("xlsx")

# load xlsx package
library("xlsx")

# read first three columns of xlsx file
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  colIndex = 1:3
)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           
1    Abby       24     Business         
2    Hazzle     23     Engineering   
3    Cathy      20     Engineering    
4    Paterson   22     Arts                
5    Sammy      20     Economics      
6    Pam        21     Arts                

Here, colIndex = 1:3 inside read.xlsx() reads only the first three columns from the studentinfo.xlsx file.


xlsx startRow Argument in R

Sometimes the Excel file may contain a title or other text at the top that we may not want to include. For example,

Here, the 1st row of the Excel file contains a title, and the 2nd row is empty. So we don't want to include these two rows.

To start reading data from a specific row in the Excel worksheet, we pass the startRow argument inside read.xlsx().

Let's take a look at an example,

# install xlsx package 
install.packages("xlsx")

# load xlsx package
library("xlsx")

# start reading from 3rd row
read_data <- read.xlsx("studentinfo.xlsx", 
  sheetIndex = 1,
  startRow = 3
)

# display xlsx file
print(read_data)

Output

      Name      Age   Faculty           State
1    Abby       24     Business         Florida
2    Hazzle     23     Engineering      Arizona
3    Cathy      20     Engineering      Colorado
4    Paterson   22     Arts             Texas
5    Sammy      20     Economics        Ohio
6    Pam        21     Arts             Arizona

In the above example, we have used the startRow argument inside the read.xlsx() function to start reading from the specified row.

startRow = 3 means the first two rows are ignored and read.xlsx() starts reading data from the 3rd row.
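
The same idea exists in base R's text readers, where the skip argument says how many lines to ignore before reading starts. The sketch below is an analogy rather than xlsx code: it uses read.csv() on made-up text whose first line is a title and whose second line is empty.

```r
# Made-up file contents: a title row, an empty row, then the actual table.
sheet_rows <- c(
  "Student information",
  "",
  "Name,Age,Faculty,State",
  "Abby,24,Business,Florida",
  "Hazzle,23,Engineering,Arizona"
)

# skip = 2 plays the role of startRow = 3: reading begins at the 3rd row.
students <- read.csv(text = paste(sheet_rows, collapse = "\n"), skip = 2)

print(students)
```

Note the off-by-one difference between the two conventions: startRow names the first row to read, while skip counts the rows to ignore.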


Write Into xlsx File in R

In R, we use the write.xlsx() function to write into an xlsx file. We pass the data in the form of dataframe. For example,

# install xlsx package 
install.packages("xlsx")

# load xlsx package
library("xlsx")

# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))

# write dataframe1 into file1 xlsx file
write.xlsx(dataframe1, "file1.xlsx")

In the above example, we have used the write.xlsx() function to export a data frame named dataframe1 to an xlsx file. Notice the arguments passed inside write.xlsx(),

write.xlsx(dataframe1, "file1.xlsx")

Here,

  • dataframe1 - name of the data frame we want to export
  • file1.xlsx - name of the xlsx file

Finally, the file1.xlsx file would look like this in our directory:


Rename Current Worksheet

We can rename the current worksheet by using the sheetName argument inside the write.xlsx() function. For example,

# install xlsx package 
install.packages("xlsx")

# load xlsx package
library("xlsx")

# create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))

# name current worksheet
write.xlsx(dataframe1, "file1.xlsx",
  sheetName = "Voting Eligibility"
)

Here, we have passed sheetName = "Voting Eligibility" inside write.xlsx(), so the name of the sheet is changed to "Voting Eligibility".

So the file1.xlsx looks like this:


R Matrix

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

A matrix is a two-dimensional data structure where data are arranged into rows and columns. For example,

{IMAGE: 2 * 3 matrix with integer data}

Here, the above matrix is a 2 * 3 (pronounced “two by three”) matrix because it has 2 rows and 3 columns.


Create a Matrix in R

In R, we use the matrix() function to create a matrix.

The syntax of the matrix() function is

matrix(vector, nrow, ncol, byrow)

Here,

  • vector – the data items of same type
  • nrow – number of rows
  • ncol – number of columns
  • byrow (optional) – if TRUE, the matrix is filled row-wise. By default, the matrix is filled column-wise.

Let's see an example,

# create a 2 by 3 matrix
matrix1 <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

print(matrix1)

Output

    [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6

In the above example, we have used the matrix() function to create a matrix named matrix1.

matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)

Here, we have passed data items of numeric type and used c() to combine them into a vector. And nrow = 2 and ncol = 3 means the matrix has 2 rows and 3 columns.

Since we have passed byrow = TRUE, the data items in the matrix are filled row-wise. If we didn't pass byrow argument as

matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

The output would be

    [,1]  [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6
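
The two fill orders can also be compared in code. In this short sketch, the names by_row and by_col are introduced only for illustration; it checks that both matrices have the same shape but a different arrangement, and that the row-wise fill is the transpose of a column-wise fill with the dimensions swapped.

```r
by_row <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3, byrow = TRUE)
by_col <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Same values and same shape ...
print(dim(by_row))
print(dim(by_col))

# ... but not the same arrangement.
same_arrangement <- all(by_row == by_col)
print(same_arrangement)

# Filling a 3 by 2 matrix column-wise and transposing it gives the row-wise fill.
transposed <- t(matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, ncol = 2))
matches_by_row <- all(by_row == transposed)
print(matches_by_row)
```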

Access Matrix Elements in R

We use the vector index operator [ ] to access specific elements of a matrix in R.

The syntax to access a matrix element is

matrix[n1, n2]

Here,

  • n1 - specifies the row position
  • n2 - specifies the column position

Let's see an example,

matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

print(matrix1)

# access element at 1st row, 2nd column
cat("\nDesired Element:", matrix1[1, 2])

Output

      [,1]          [,2]   
[1,] "Sabby" "Larry"
[2,] "Cathy" "Harry"

Desired Element: Larry

In the above example, we have created a 2 by 2 matrix named matrix1 with 4 character strings. Notice the use of the index operator [],

matrix1[1, 2]

Here, [1, 2] specifies we are trying to access element present at 1st row, 2nd column i.e. "Larry".

Access Entire Row or Column

In R, we can also access an entire row or column based on the value passed inside [].

  • [n, ] - returns all elements of the nth row.
  • [ ,n] - returns all elements of the nth column.

For example,

matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

print(matrix1)

# access entire element at 1st row
cat("\n1st Row:", matrix1[1, ])

# access entire element at 2nd column
cat("\n2nd Column:", matrix1[, 2])

Output

       [,1]         [,2]   
[1,] "Sabby" "Larry"
[2,] "Cathy" "Harry"

1st Row: Sabby Larry
2nd Column: Larry Harry

Here,

  • matrix1[1, ] - accesses all elements of the 1st row i.e. Sabby and Larry
  • matrix1[ ,2] - accesses all elements of the 2nd column i.e. Larry and Harry
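
One detail worth knowing: when a single row or column is selected, R returns a plain vector rather than a matrix. Passing drop = FALSE inside [] keeps the result as a matrix. A short sketch (the names row_vector and row_matrix are only for illustration):

```r
matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

# A single row comes back as a plain character vector ...
row_vector <- matrix1[1, ]
print(is.matrix(row_vector))

# ... unless drop = FALSE is passed, which keeps it as a 1 by 2 matrix.
row_matrix <- matrix1[1, , drop = FALSE]
print(is.matrix(row_matrix))
print(dim(row_matrix))
```

This matters mostly in code that later calls nrow(), ncol() or matrix arithmetic on the result.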

Access More Than One Row or Column

We can access more than one row or column in R using the c() function.

  • [c(n1,n2), ] - returns all elements of rows n1 and n2.
  • [ ,c(n1,n2)] - returns all elements of columns n1 and n2.

For example,

# create 2 by 3 matrix
matrix1 <- matrix(c(10, 20, 30, 40, 50, 60), nrow = 2, ncol = 3)

print(matrix1)

# access all elements of 1st and 2nd row
cat("\n1st and 2nd Row:", matrix1[c(1,2), ])

# access all elements of 2nd and 3rd column
cat("\n2nd and 3rd Column:", matrix1[  ,c(2,3)])

Output

       [,1] [,2]  [,3]
[1,]   10   30   50
[2,]   20   40   60

1st and 2nd Row: 10 20 30 40 50 60
2nd and 3rd Column: 30 40 50 60

Here,

  • [c(1,2), ] - returns all elements of the 1st and 2nd rows.
  • [ ,c(2,3)] - returns all elements of the 2nd and 3rd columns.
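
The indices passed to c() must exist in the matrix, and they do not have to be adjacent. As a sketch, here is a hypothetical 3 by 3 matrix (matrix2 is not part of the example above) from which the 1st and 3rd rows are selected, followed by a request for a row that does not exist.

```r
# Hypothetical 3 by 3 matrix, filled column-wise.
matrix2 <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9), nrow = 3, ncol = 3)

# 1st and 3rd rows: the result is itself a 2 by 3 matrix.
rows_1_3 <- matrix2[c(1, 3), ]
print(rows_1_3)

# Asking for a 4th row is an error; it is caught here so the script continues.
result <- tryCatch(
  matrix2[c(1, 4), ],
  error = function(e) "row index out of range"
)
print(result)
```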

Modify Matrix Element in R

We use the vector index operator [] to modify the specified element. For example,

matrix1[1,2] = 140

Here, the element present at 1st row, 2nd column is changed to 140.

Let's see an example,

# create 2 by 2 matrix
matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# print original matrix
print(matrix1)

# change value at 1st row, 2nd column to 5 
matrix1[1,2] = 5

# print updated matrix
print(matrix1)

Output

    [,1] [,2]
[1,]    1    3
[2,]    2    4

     [,1] [,2]
[1,]    1    5
[2,]    2    4
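
The same assignment syntax also works on a whole row, a whole column, or every element that meets a condition. A brief sketch, going slightly beyond the single-element case above:

```r
matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2)

# Replace the whole 2nd row.
matrix1[2, ] <- c(20, 40)

# Replace every element greater than 10 with 0, using a logical condition.
matrix1[matrix1 > 10] <- 0

print(matrix1)
```

After both steps the first row still holds 1 and 3, while the second row has been set to 20 and 40 and then zeroed by the condition.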

Combine Two Matrices in R

In R, we use the cbind() and the rbind() function to combine two matrices together.

  • cbind() - combines two matrices by columns
  • rbind() - combines two matrices by rows

For cbind(), the two matrices must have the same number of rows; for rbind(), they must have the same number of columns. For example,

# create two 2 by 2 matrices 
even_numbers <- matrix(c(2, 4, 6, 8), nrow = 2, ncol = 2)
odd_numbers <- matrix(c(1, 3, 5, 7), nrow = 2, ncol = 2)

# combine two matrices by column
total1 <- cbind(even_numbers, odd_numbers)
print(total1)

# combine two matrices by row
total2 <- rbind(even_numbers, odd_numbers)
print(total2)

Output

    [,1] [,2] [,3] [,4]
[1,]    2    6    1    5
[2,]    4    8    3    7
     [,1] [,2]
[1,]    2    6
[2,]    4    8
[3,]    1    5
[4,]    3    7

Here, first we have used the cbind() function to combine the two matrices: even_numbers and odd_numbers by column. And rbind() to combine two matrices by row.
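
What happens when the shapes do not line up can be seen with two matrices of different sizes. In this sketch, two_rows and three_rows are hypothetical matrices that share a column count but not a row count, so only one of the two functions succeeds.

```r
two_rows   <- matrix(c(2, 4, 6, 8), nrow = 2, ncol = 2)
three_rows <- matrix(c(1, 3, 5, 7, 9, 11), nrow = 3, ncol = 2)

# Both matrices have 2 columns, so stacking by row works: 2 + 3 = 5 rows.
stacked <- rbind(two_rows, three_rows)
print(dim(stacked))

# Their row counts differ (2 vs 3), so cbind() fails; the error is caught here.
outcome <- tryCatch(
  cbind(two_rows, three_rows),
  error = function(e) "cbind failed: row counts differ"
)
print(outcome)
```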


Check if Element Exists in R Matrix

In R, we use the %in% operator to check whether the specified element is present in the matrix. It returns a boolean value:

  • TRUE - if specified element is present in the matrix
  • FALSE - if specified element is not present in the matrix

For example,

matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

"Larry" %in% matrix1 # TRUE

"Kinsley" %in% matrix1 # FALSE

Output

TRUE
FALSE

Here,

  • "Larry" is present in matrix1, so the expression returns TRUE
  • "Kinsley" is not present in matrix1, so the expression returns FALSE
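
%in% only says whether an element is present. If the position is also needed, one option (a sketch, not part of the tutorial's original examples) is which() with arr.ind = TRUE, which reports the row and column of each match.

```r
matrix1 <- matrix(c("Sabby", "Cathy", "Larry", "Harry"), nrow = 2, ncol = 2)

# which() with arr.ind = TRUE returns one row per match, with "row" and "col".
position <- which(matrix1 == "Larry", arr.ind = TRUE)

print(position)
```

For "Larry" this gives row 1 and column 2, matching matrix1[1, 2] from the earlier section.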


Part 2 of 3: 300+ milestone for Big Book of R

[This article was first published on R programming – Oscar Baruffa, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is part 2 of a 3-part series highlighting a selection of 35 new entries in Big Book of R.

Read part 1.

The site now has well over 300 free R programming titles.

Onto the second batch of new books! Sign up to the newsletter for the next update happening in a few days.


Handling Strings With R

by Gaston Sanchez

Handling character strings in R? Wait a second… you exclaim, R is not a scripting language like Perl, Python, or Ruby. Why would you want to use R for handling and processing text? Well, because sooner or later (I would say sooner than later) you will have to deal with some kind of string manipulation for your data analysis. So it’s better to be prepared for such tasks and know how to perform them inside the R environment.

https://www.bigbookofr.com/getting-cleaning-and-wrangling-data.html#handling-strings-with-r

Data Science: Theories, Models, Algorithms, and Analytics

by Sanjiv Ranjan Das

I developed these class notes for my Machine Learning with R course. It traces my evolution as a data scientist into redundancy, I expect I will be replaced by a machine soon! 

https://www.bigbookofr.com/machine-learning.html#data-science-theories-models-algorithms-and-analytics

Pack YouR Code

by Gaston Sanchez

The ultimate goal of this book is to teach you how to create a relatively simple R package based on the so-called S3 classes.

https://www.bigbookofr.com/r-package-development.html#pack-your-code

rlist Tutorial

by Kun Ren

rlist is a set of tools for working with list objects. Its goal is to make it easier to work with lists by providing a wide range of functions on non-tabular data stored in them. This package supports filtering, mapping, grouping, sorting, updating, searching and many other functions. It is pipe-friendly and strongly recommends functional programming style in list operations. This tutorial serves as complete guide to using rlist functionality to work with non-tabular data.

https://www.bigbookofr.com/packages.html#rlist-tutorial

Doing Bayesian Data Analysis in brms and the tidyverse

by A Solomon Kurz

Kruschke began his text with “This book explains how to actually do Bayesian data analysis, by real people (like you), for realistic data (like yours).” In the same way, this project is designed to help those real people do Bayesian data analysis.

https://www.bigbookofr.com/statistics.html#doing-bayesian-data-analysis-in-brms-and-the-tidyverse

CSSS 508 Introduction to R for Social Scientists

by Charles Lanfear, Rebecca Ferrell

Course material with Youtube Video

https://www.bigbookofr.com/social-science.html#csss-508-introduction-to-r-for-social-scientists

Principles of Econometrics with R

by Constantin Colonescu

R supplementary resource for the “Principles of Econometrics” textbook by Carter Hill, William Griffiths and Guay Lim, 4th edition

https://www.bigbookofr.com/finance.html#principles-of-econometrics-with-r

Using R for Introductory Econometrics

by Florian Heiss

An R book supplement to Wooldridge’s “Introductory Econometrics” textbook

https://www.bigbookofr.com/finance.html#using-r-for-introductory-econometrics

Introduction to R for Econometrics

by Kieran Marray

This is a short introduction to R to go with the first year econometrics courses at the Tinbergen Institute. It is aimed at people who are relatively new to R, or programming in general.

The goal is to give you enough of knowledge of the fundamentals of R to write and adapt code to fit econometric models to data, and to simulate your own data, working alone or with others. You will be able to: read data from csv files, plot it, manipulate it into the form you want, use sets of functions others have built (packages), write your own functions to compute estimators, simulate data to test the performance of estimators, and present the results in a nice format.

Most importantly, when things inevitably go wrong, you will be able to begin to interpret error messages and adapt others’ solutions to fit your needs.

https://www.bigbookofr.com/finance.html#introduction-to-r-for-econometrics

Introduction to Econometrics with R

by Florian Oswald, Vincent Viers, Jean-Marc Robin, Pierre Villedieu, Gustave Kenedi

Welcome to Introductory Econometrics for 2nd year undergraduates at ScPo! On this page we outline the course and present the Syllabus. 2018/2019 was the first time that we taught this course in this format, so we are in year 3 now.

https://www.bigbookofr.com/finance.html#introduction-to-econometrics-with-r-1


Get tickets for An introductory course in Shiny on July 19th & 21st at 15 USD

[This article was first published on Pachá, and kindly contributed to R-bloggers].

An introductory course in Shiny

This course aims to teach people with basic R knowledge how to develop interactive web applications using the Shiny framework.

The course runs over two days, with a one-hour session per day, in which we will discuss topics such as the user interface (UI), server-side logic (tables and graphs that respond to user selection), dashboard components, and the creation of modular components. Questions are super welcome!

The course will be held online (Zoom) from 17.30 to 18.30 and from 19.00 to 20.00 (Eastern Time, i.e. New York, Boston and Toronto time), two days a week.

Previous knowledge required: Basic R (examples: reading a CSV file, transforming columns and making graphs using ggplot2).

Course organization:

  • Day 1: Building a working Shiny app of the modular kind.
  • Day 2: Good practice and robustness checks aimed at creating an easy-to-maintain real-life app.

Maximum number of attendees: 5.

Buy tickets at buymeacoffee.com

An introductory course in Shiny on July 19th and 21st (19.00 to 20.00): https://www.buymeacoffee.com/pacha/e/80408


Generalized Linear Models, Part I: The Logistic Model

[This article was first published on Pachá, and kindly contributed to R-bloggers].

Context

Let’s say we are interested in predicting the gender of a candidate in the 1992 British General Election, using the political party as a predictor. We have the following data:

library(dplyr)
library(tidyr)

elections <- tibble(
  party = c("Tories", "Labour", "LibDem", "Green", "Other"),
  women = c(57,126,136,60,135),
  men = c(577,508,496,192,546)
)

elections
## # A tibble: 5 × 3
##   party  women   men
##   <chr>  <dbl> <dbl>
## 1 Tories    57   577
## 2 Labour   126   508
## 3 LibDem   136   496
## 4 Green     60   192
## 5 Other    135   546

Since the dependent variable is categorical, we propose a logistic model.

Let \(Y_i \mid \pi_i \sim Bin(n, \pi_i)\). If \(n=1\), \(Y_i\) indicates whether a candidate is a woman or a man.

In this case the Generalized Linear Model describes the probability of success (i.e., the probability of the candidate being a woman, if we define \(Y_i=1\) in that case and zero otherwise).

A good reference for all the mathematical details is McCullagh and Nelder (1983).

Model Specification

Before proceeding, we need to reshape the data.

elections_long <- elections %>% 
  pivot_longer(-party, names_to = "gender", values_to = "candidates") %>% 
  mutate(
    gender_bin = case_when(
      gender == "women" ~ 1L,
      TRUE ~ 0L
    )
  ) %>% 
  mutate_if(is.character, as.factor)

elections_long
## # A tibble: 10 × 4
##    party  gender candidates gender_bin
##    <fct>  <fct>       <dbl>      <int>
##  1 Tories women          57          1
##  2 Tories men           577          0
##  3 Labour women         126          1
##  4 Labour men           508          0
##  5 LibDem women         136          1
##  6 LibDem men           496          0
##  7 Green  women          60          1
##  8 Green  men           192          0
##  9 Other  women         135          1
## 10 Other  men           546          0

To specify a Generalized Linear Model that considers Gender (i.e., 1: female, 0: male) as the response and the Political Party as the predictor, we fit the proposed model in R.

fit <- glm(gender_bin ~ party,
           weights = candidates,
           family = binomial(link = "logit"),
           data = elections_long)

summary(fit)
## 
## Call:
## glm(formula = gender_bin ~ party, family = binomial(link = "logit"), 
##     data = elections_long, weights = candidates)
## 
## Deviance Residuals: 
##      1       2       3       4       5       6       7       8       9      10  
##  16.57  -10.43   20.18  -15.00   20.44  -15.50   13.12  -10.22   20.90  -15.53  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -1.1632     0.1479  -7.864 3.71e-15 ***
## partyLabour  -0.2310     0.1783  -1.296    0.195    
## partyLibDem  -0.1308     0.1768  -0.740    0.459    
## partyOther   -0.2342     0.1764  -1.328    0.184    
## partyTories  -1.1516     0.2028  -5.678 1.37e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 2683.2  on 9  degrees of freedom
## Residual deviance: 2628.7  on 5  degrees of freedom
## AIC: 2638.7
## 
## Number of Fisher Scoring iterations: 5
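As a cross-check on the fit above, the same model can be sketched in grouped-binomial form, where the response is a two-column matrix of women and men counts. This is an alternative formulation rather than the author's code, and fit_grouped is a name chosen for illustration. It should reproduce the coefficient estimates above, although the deviance and degrees of freedom differ because there are five grouped observations instead of ten weighted rows.

```r
elections <- data.frame(
  party = c("Tories", "Labour", "LibDem", "Green", "Other"),
  women = c(57, 126, 136, 60, 135),
  men   = c(577, 508, 496, 192, 546)
)

# grouped binomial: the response is a matrix of (successes, failures)
fit_grouped <- glm(cbind(women, men) ~ party,
                   family = binomial(link = "logit"),
                   data = elections)

# the intercept is the log-odds for the reference level
# (Green, the first level alphabetically)
print(coef(fit_grouped))

# plogis() is the inverse logit, so this gives the share of women
# among Green candidates, i.e. 60 / (60 + 192)
print(plogis(coef(fit_grouped)[["(Intercept)"]]))
```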

Odds Ratios

We can obtain the odds ratio of a woman candidate for the Tories relative to the Liberal Democrats. This corresponds to

\[ \frac{\frac{\pi(x = \text{Tories})}{1 - \pi(x = \text{Tories})}}{\frac{\pi(x = \text{LibDem})}{1 - \pi(x = \text{LibDem})}} \]

From the model, we have \(\text{logit}(\pi) = \beta_0 + \beta_1 x\), which is the same as \[ \log \left[ \frac{\pi}{1 - \pi} \right] = \beta_0 + \beta_1 x \implies \frac{\pi}{1 - \pi} = \exp[\beta_0 + \beta_1 x]. \]

In R, we obtain the odds ratio by exponentiating the difference between the estimated coefficients No. 5 (Tories) and No. 3 (LibDem). That is, \(\exp[\beta_5 - \beta_3]\).

exp(coef(fit)[5] - coef(fit)[3])
## partyTories 
##   0.3602814

This means that the odds of having a woman candidate drop by around 64% when moving from Liberal Democrats to Tories.

The opposite exercise tells us that the odds are around 2.8 times higher (an increase of roughly 178%) when moving from Tories to Liberal Democrats.

exp(coef(fit)[3] - coef(fit)[5])
## partyLibDem 
##    2.775608
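Because party is the only predictor, the fitted odds for each party equal the observed odds, so the two ratios can be verified by hand from the counts in the table. The variable names below are illustrative.

```r
# observed odds of a woman candidate, straight from the counts
odds_tories <- 57 / 577
odds_libdem <- 136 / 496

# odds ratio of Tories relative to LibDem
or_tories_vs_libdem <- odds_tories / odds_libdem
print(or_tories_vs_libdem)

# the reciprocal is the odds ratio of LibDem relative to Tories
or_libdem_vs_tories <- 1 / or_tories_vs_libdem
print(or_libdem_vs_tories)
```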

Hypothesis Testing

Consider the following hypotheses:

  1. \(H_0: \beta = 0\)
  2. \(H_0: \beta_{LABOUR} = \beta_{LIBDEM}\)
  3. \(H_0: \beta_{LABOUR} = \beta_{GREEN}\)

To test these hypotheses we can estimate the contrasts, followed by their exponentials and the respective confidence intervals. The function to use in this case is glht() (General Linear Hypotheses) from the multcomp package.

For \(H_0: \beta = 0\) we have

library(multcomp)

summary(glht(fit, mcp(party = "Tukey")), test = Chisqtest())
## 
##   General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Linear Hypotheses:
##                       Estimate
## Labour - Green == 0  -0.231049
## LibDem - Green == 0  -0.130770
## Other - Green == 0   -0.234193
## Tories - Green == 0  -1.151640
## LibDem - Labour == 0  0.100278
## Other - Labour == 0  -0.003145
## Tories - Labour == 0 -0.920591
## Other - LibDem == 0  -0.103423
## Tories - LibDem == 0 -1.020870
## Tories - Other == 0  -0.917447
## 
## Global Test:
##   Chisq DF Pr(>Chisq)
## 1 45.75  4  2.773e-09

The global test returns a p-value (2.773e-09) below the significance level of 0.05, that is, \(p_{CALCULATED} < p_{CRITICAL}\) with \(p_{CRITICAL} = 0.05\), therefore we reject this hypothesis.

For the other hypotheses we have

summary(glht(fit, mcp(party = "Tukey")))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: glm(formula = gender_bin ~ party, family = binomial(link = "logit"), 
##     data = elections_long, weights = candidates)
## 
## Linear Hypotheses:
##                       Estimate Std. Error z value Pr(>|z|)    
## Labour - Green == 0  -0.231049   0.178269  -1.296    0.689    
## LibDem - Green == 0  -0.130770   0.176760  -0.740    0.946    
## Other - Green == 0   -0.234193   0.176391  -1.328    0.669    
## Tories - Green == 0  -1.151640   0.202841  -5.678   <1e-04 ***
## LibDem - Labour == 0  0.100278   0.138831   0.722    0.950    
## Other - Labour == 0  -0.003145   0.138361  -0.023    1.000    
## Tories - Labour == 0 -0.920591   0.170806  -5.390   <1e-04 ***
## Other - LibDem == 0  -0.103423   0.136411  -0.758    0.941    
## Tories - LibDem == 0 -1.020870   0.169230  -6.032   <1e-04 ***
## Tories - Other == 0  -0.917447   0.168845  -5.434   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

exp(confint(glht(fit, mcp(party = "Tukey")))[[10]])
##                  Estimate       lwr       upr
## Labour - Green  0.7937008 0.4888605 1.2886313
## LibDem - Green  0.8774194 0.5426472 1.4187205
## Other - Green   0.7912088 0.4898201 1.2780433
## Tories - Green  0.3161179 0.1821238 0.5486956
## LibDem - Labour 1.1054788 0.7579508 1.6123517
## Other - Labour  0.9968603 0.6843517 1.4520757
## Tories - Labour 0.3982834 0.2503412 0.6336541
## Other - LibDem  0.9017453 0.6223457 1.3065802
## Tories - LibDem 0.3602814 0.2274274 0.5707435
## Tories - Other  0.3995379 0.2524721 0.6322699
## attr(,"conf.level")
## [1] 0.95
## attr(,"calpha")
## [1] 2.718522

Here, the differences that are not statistically significant show that some parties are similar to each other (in the gender dimension). This is the case for Labour vs LibDem and Labour vs Green, so we do not reject hypotheses 2 and 3, but it is not the case for Green vs Tories.
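If only the two contrasts in hypotheses 2 and 3 are of interest, they can be passed to mcp() by name instead of requesting all Tukey pairs. The following is a sketch under assumptions: it uses a grouped-binomial fit (fit_grouped, a name chosen here) to stay self-contained, and the adjusted p-values will differ from the Tukey table above because only two comparisons are tested.

```r
library(multcomp)

# party is made a factor explicitly because mcp() expects a factor
elections <- data.frame(
  party = factor(c("Tories", "Labour", "LibDem", "Green", "Other")),
  women = c(57, 126, 136, 60, 135),
  men   = c(577, 508, 496, 192, 546)
)

fit_grouped <- glm(cbind(women, men) ~ party,
                   family = binomial(link = "logit"),
                   data = elections)

# each string names one contrast between two levels of party
two_contrasts <- glht(fit_grouped,
                      linfct = mcp(party = c("Labour - LibDem = 0",
                                             "Labour - Green = 0")))

print(summary(two_contrasts))
```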

Changing the Reference Factor

Green is the reference level in the previous model, because factor levels are ordered alphabetically by default. We can change the reference to the Tories or any other party.

library(forcats)

elections_long$party <- fct_relevel(elections_long$party, "Tories", after = 0L)

fit <- glm(gender_bin ~ party,
           weights = candidates,
           family = binomial(link = "logit"),
           data = elections_long)

fit
## 
## Call:  glm(formula = gender_bin ~ party, family = binomial(link = "logit"), 
##     data = elections_long, weights = candidates)
## 
## Coefficients:
## (Intercept)   partyGreen  partyLabour  partyLibDem   partyOther  
##     -2.3148       1.1516       0.9206       1.0209       0.9174  
## 
## Degrees of Freedom: 9 Total (i.e. Null);  5 Residual
## Null Deviance:       2683 
## Residual Deviance: 2629  AIC: 2639
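For readers who prefer to avoid an extra package, base R's relevel() achieves the same change of reference level. A minimal sketch on a standalone factor (the names party and party_tories are illustrative):

```r
party <- factor(c("Tories", "Labour", "LibDem", "Green", "Other"))

# levels are sorted alphabetically by default, so Green comes first
print(levels(party))

# relevel() moves the chosen level to the front, making it the reference
party_tories <- relevel(party, ref = "Tories")
print(levels(party_tories))
```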

Now we can compute the differences again.

summary(glht(fit, mcp(party = "Tukey")))
## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: glm(formula = gender_bin ~ party, family = binomial(link = "logit"), 
##     data = elections_long, weights = candidates)
## 
## Linear Hypotheses:
##                       Estimate Std. Error z value Pr(>|z|)    
## Green - Tories == 0   1.151640   0.202841   5.678   <1e-04 ***
## Labour - Tories == 0  0.920591   0.170806   5.390   <1e-04 ***
## LibDem - Tories == 0  1.020870   0.169230   6.032   <1e-04 ***
## Other - Tories == 0   0.917447   0.168845   5.434   <1e-04 ***
## Labour - Green == 0  -0.231049   0.178269  -1.296    0.689    
## LibDem - Green == 0  -0.130770   0.176760  -0.740    0.946    
## Other - Green == 0   -0.234193   0.176391  -1.328    0.669    
## LibDem - Labour == 0  0.100278   0.138831   0.722    0.950    
## Other - Labour == 0  -0.003145   0.138361  -0.023    1.000    
## Other - LibDem == 0  -0.103423   0.136411  -0.758    0.941    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Green and Tories are still different, but the sign is reversed, because the contrast is now Green - Tories instead of Tories - Green!


R Read and Write CSV Files

[This article was first published on R feed, and kindly contributed to R-bloggers].

A CSV (Comma-Separated Values) file is a plain text file that uses commas to separate values.

R has built-in functionality that makes it easy to read and write CSV files.


Sample CSV File

To demonstrate how we read CSV files in R, let's suppose we have a CSV file named airtravel.csv with the following data:

Month,  1958,   1959,   1960
JAN,    340,    360,    417
FEB,    318,    342,    391
MAR,    362,    406,    419
APR,    348,    396,    461
MAY,    363,    420,    472
JUN,    435,    472,    535
JUL,    491,    548,    622
AUG,    505,    559,    606
SEP,    404,    463,    508
OCT,    359,    407,    461
NOV,    310,    362,    390
DEC,    337,    405,    432

The CSV file above contains sample data on monthly air travel, in thousands of passengers, for 1958-1960.

Now, let's try to read data from this CSV File using R's built-in functions.


Read a CSV File in R

In R, we use the read.csv() function to read a CSV file available in our current directory. For example,

# read airtravel.csv file from our current directory
read_data <- read.csv("airtravel.csv")

# display csv file
print(read_data)

Output

   Month X1958 X1959 X1960
1    JAN   340   360   417
2    FEB   318   342   391
3    MAR   362   406   419
4    APR   348   396   461
5    MAY   363   420   472
6    JUN   435   472   535
7    JUL   491   548   622
8    AUG   505   559   606
9    SEP   404   463   508
10   OCT   359   407   461
11   NOV   310   362   390
12   DEC   337   405   432

In the above example, we have read the airtravel.csv file that is available in our current directory. Notice the code,

read_data <- read.csv("airtravel.csv")

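Here, read.csv() reads the CSV file airtravel.csv and creates a data frame, which is stored in the read_data variable. By default, read.csv() converts headers that are not valid R names, so a column named 1958 in the file becomes X1958 in the data frame.

Finally, the data frame is displayed using print().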

Note: If the file is in some other location, we have to specify the path along with the file name as: read.csv("D:/folder1/airtravel.csv").
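The following sketch shows how read.csv() treats headers that start with a digit. It writes a two-row sample to a temporary file (csv_path is a made-up name) so that it runs without airtravel.csv being present.

```r
# write a small sample to a temporary file so the example is self-contained
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Month,1958,1959,1960",
             "JAN,340,360,417",
             "FEB,318,342,391"), csv_path)

# default: names that are not valid R names are repaired (1958 becomes X1958)
default_names <- names(read.csv(csv_path))
print(default_names)

# check.names = FALSE keeps the header as written;
# such columns then need backticks when accessed with $
raw <- read.csv(csv_path, check.names = FALSE)
print(raw$`1958`)
```

Keeping the default is usually more convenient, since X1958 can be used without backticks.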


Number of Rows and Columns of CSV File in R

We use the ncol() and nrow() functions to get the total number of columns and rows of a CSV file read into R. For example,

# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")

# print total number of columns
cat("Total Columns:", ncol(read_data), "\n")

# print total number of rows
cat("Total Rows:", nrow(read_data), "\n")

Output

Total Columns: 4
Total Rows: 12 

In the above example, we have used the ncol() and nrow() functions to find the total number of columns and rows in the airtravel.csv file.

Here,

  • ncol(read_data) - returns total number of columns i.e. 4
  • nrow(read_data) - returns total number of rows i.e. 12

Using min() and max() With CSV Files

In R, we can also find the minimum and maximum values in a column of a CSV file using the min() and max() functions. Because read.csv() prefixes column names that start with a digit with X by default, the 1960 column is accessed as X1960. For example,

# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")

# return minimum value of 1960 column of airtravel.csv
min_data <- min(read_data$X1960)
print(min_data)  # 390

# return maximum value of 1958 column of airtravel.csv
max_data <- max(read_data$X1958)
print(max_data)  # 505

Output

[1] 390
[1] 505

Here, we have used the min() and max() functions to find the minimum and maximum values of the 1960 and 1958 columns of the airtravel.csv file, respectively.

  • min(read_data$X1960) - returns the minimum value from the 1960 column i.e. 390
  • max(read_data$X1958) - returns the maximum value from the 1958 column i.e. 505
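To find which month an extreme value belongs to, which.min() and which.max() return the row index rather than the value. The sketch below uses a three-row data frame named air as a stand-in for the file contents; both the name and the reduced data are assumptions for illustration.

```r
# a small stand-in for the data read from the CSV file
air <- data.frame(
  Month = c("JAN", "FEB", "MAR"),
  X1958 = c(340, 318, 362),
  X1960 = c(417, 391, 419)
)

# which.min() and which.max() give the row position of the extreme value,
# which can then be used to look up the matching month
lowest_month <- air$Month[which.min(air$X1960)]
highest_month <- air$Month[which.max(air$X1958)]

print(lowest_month)
print(highest_month)
```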

Subset of a CSV File in R

In R, we use the subset() function to return all the rows of a CSV file that satisfy a specified condition. Column names that start with a digit are prefixed with X by read.csv(), so the 1958 column is referred to as X1958. For example,

# read airtravel.csv file from our directory
read_data <- read.csv("airtravel.csv")

# return subset of csv where number of air 
# travelers in 1958 is greater than 400
sub_data <- subset(read_data, X1958 > 400) 

print(sub_data)

Output

  Month X1958 X1959 X1960
6   JUN   435   472   535
7   JUL   491   548   622
8   AUG   505   559   606
9   SEP   404   463   508

In the above example, we have specified a condition inside the subset() function to extract data from a CSV file.

subset(read_data, X1958 > 400)

Here, subset() creates a subset of airtravel.csv containing the rows whose X1958 value is greater than 400 and stores it in the sub_data data frame.

Since column X1958 has values greater than 400 in the 6th, 7th, 8th, and 9th rows, only these rows are displayed.
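The same filter can be written with logical indexing, and the sketch also shows why the condition must name a real column. The data frame air is a made-up four-row stand-in for the file contents.

```r
air <- data.frame(
  Month = c("MAY", "JUN", "JUL", "OCT"),
  X1958 = c(363, 435, 491, 359)
)

# subset() evaluates the condition using the data frame's columns
with_subset <- subset(air, X1958 > 400)

# the equivalent with logical indexing: rows where the condition is TRUE
with_brackets <- air[air$X1958 > 400, ]

print(with_subset)
print(with_brackets)

# a bare number is not a column reference: 1958 > 400 is simply TRUE,
# so every row is kept
all_rows <- subset(air, 1958 > 400)
print(nrow(all_rows))
```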


Write Into CSV File in R

In R, we use the write.csv() function to write to a CSV file. We pass the data in the form of a data frame. For example,

# Create a data frame
dataframe1 <- data.frame (
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE))

# write dataframe1 into file1 csv file
write.csv(dataframe1, "file1.csv")

In the above example, we have used the write.csv() function to export a data frame named dataframe1 to a CSV file. Notice the arguments passed inside write.csv(),

write.csv(dataframe1, "file1.csv")

Here,

  • dataframe1 - name of the data frame we want to export
  • file1.csv - name of the csv file

Finally, file1.csv is created in our current directory. By default, character values and column names in the file are wrapped in double quotes.

If we pass quote = FALSE to write.csv() as:

write.csv(dataframe1, "file1.csv",
  quote = FALSE
)

then file1.csv is written without the double quotes " " that otherwise wrap character values and column names.
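To inspect exactly what write.csv() produces, the file can be read back as raw text with readLines(). This sketch writes to a temporary file (out_path is a made-up name) and also passes row.names = FALSE, an option not used in the tutorial above, to drop the leading column of row numbers.

```r
dataframe1 <- data.frame(
  Name = c("Juan", "Alcaraz", "Simantha"),
  Age = c(22, 15, 19),
  Vote = c(TRUE, FALSE, TRUE)
)

out_path <- tempfile(fileext = ".csv")

# quote = FALSE removes the double quotes;
# row.names = FALSE omits the column of row numbers
write.csv(dataframe1, out_path, row.names = FALSE, quote = FALSE)

# readLines() returns the file contents line by line as plain text
written <- readLines(out_path)
print(written)
```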


2022 Projections & FFA App Updates

[This article was first published on R Archives - Fantasy Football Analytics, and kindly contributed to R-bloggers].

This offseason, we worked hard to update our FFA Projection and Lineup Optimizer apps. We are excited to provide the apps to our subscribers to test out and provide feedback. You can access the new app at the link below:

https://apps.fantasyfootballanalytics.net/

Preview of new web app

App Improvements

We rebuilt the app using Golem in R/Shiny to create a much faster and more dynamic app experience during drafts. We worked hard to optimize the app to improve loading and computation times, which should result in an overall smoother experience and make it easier to use and to update lineups quickly.

Features

The new app has a feature set similar to the original FFA apps, but it should load faster, and projections should update more quickly via the “Calculate Projections” button when scoring and lineup settings are changed.

Feedback

We have greatly improved the apps over the years in response to feedback from our users. Give it a test run and let us know your thoughts in the comments (or via our Google form for bug reporting)!

-FFA Team


How to Use Spread Function in R?-tidyr Part1

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers].

To “spread” a key-value pair across multiple columns, use the spread() function from the tidyr package.

The basic syntax used by this function is as follows.

spread(data, key, value)

where:

data: name of the data frame

key: column whose values will serve as the names of the new variables

value: column whose values will populate the new variables

How to Use Spread Function in R?

The practical application of this function is demonstrated in the examples that follow.


Example 1: Divide Values Between Two Columns

Let’s say we have the R data frame shown below.

Let’s create a data frame

df <- data.frame(player=rep(c('P1', 'P2'), each=4),
year=rep(c(1, 1, 2, 2), times=2),
stat=rep(c('points', 'assists'), times=4),
amount=c(125, 142, 145, 157, 134, 213, 125, 214))

Now we can view the data frame

df
  player year    stat amount
1     P1    1  points    125
2     P1    1 assists    142
3     P1    2  points    145
4     P1    2 assists    157
5     P2    1  points    134
6     P2    1 assists    213
7     P2    2  points    125
8     P2    2 assists    214

The stat column’s values can be separated into separate columns using the spread() function.

library(tidyr)

Dividing the stats column into several columns

spread(df, key=stat, value=amount)
  player year assists points
1     P1    1     142    125
2     P1    2     157    145
3     P2    1     213    134
4     P2    2     214    125

Example 2: Values Should Be Spread Across More Than Two Columns

Let’s say we have the R data frame shown below:

Imagine we have the following data frame

df2 <- data.frame(player=rep(c('P1'), times=8),
year=rep(c(1, 2), each=4),
stat=rep(c('points', 'assists', 'steals', 'blocks'), times=2),
amount=c(115, 116, 212, 211, 229, 319, 213, 314))

Now we can view the data frame

df2
  player year    stat amount
1     P1    1  points    115
2     P1    1 assists    116
3     P1    1  steals    212
4     P1    1  blocks    211
5     P1    2  points    229
6     P1    2 assists    319
7     P1    2  steals    213
8     P1    2  blocks    314

The spread() function can be used to create four additional columns from the stat column’s four distinct values.

library(tidyr)

Dividing the stats column into several columns

spread(df2, key=stat, value=amount)
  player year assists blocks points steals
1     P1    1     116    211    115    212
2     P1    2     319    314    229    213
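The tidyr documentation marks spread() as superseded by pivot_wider(), so new code may prefer the newer function. A minimal sketch of the same reshaping for the second example, with the result stored under the illustrative name wide:

```r
library(tidyr)

df2 <- data.frame(
  player = rep("P1", times = 8),
  year = rep(c(1, 2), each = 4),
  stat = rep(c("points", "assists", "steals", "blocks"), times = 2),
  amount = c(115, 116, 212, 211, 229, 319, 213, 314)
)

# names_from plays the role of key, values_from the role of value
wide <- pivot_wider(df2, names_from = stat, values_from = amount)
print(wide)
```

The result has one row per player and year, with one column for each distinct value of stat.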
.mailpoet_hp_email_label{display:none!important;}#mailpoet_form_1 .mailpoet_form { } #mailpoet_form_1 form { margin-bottom: 0; } #mailpoet_form_1 h1.mailpoet-heading { margin: 0 0 20px; } #mailpoet_form_1 p.mailpoet_form_paragraph.last { margin-bottom: 5px; } #mailpoet_form_1 .mailpoet_column_with_background { padding: 10px; } #mailpoet_form_1 .mailpoet_form_column:not(:first-child) { margin-left: 20px; } #mailpoet_form_1 .mailpoet_paragraph { line-height: 20px; margin-bottom: 20px; } #mailpoet_form_1 .mailpoet_segment_label, #mailpoet_form_1 .mailpoet_text_label, #mailpoet_form_1 .mailpoet_textarea_label, #mailpoet_form_1 .mailpoet_select_label, #mailpoet_form_1 .mailpoet_radio_label, #mailpoet_form_1 .mailpoet_checkbox_label, #mailpoet_form_1 .mailpoet_list_label, #mailpoet_form_1 .mailpoet_date_label { display: block; font-weight: normal; } #mailpoet_form_1 .mailpoet_text, #mailpoet_form_1 .mailpoet_textarea, #mailpoet_form_1 .mailpoet_select, #mailpoet_form_1 .mailpoet_date_month, #mailpoet_form_1 .mailpoet_date_day, #mailpoet_form_1 .mailpoet_date_year, #mailpoet_form_1 .mailpoet_date { display: block; } #mailpoet_form_1 .mailpoet_text, #mailpoet_form_1 .mailpoet_textarea { width: 200px; } #mailpoet_form_1 .mailpoet_checkbox { } #mailpoet_form_1 .mailpoet_submit { } #mailpoet_form_1 .mailpoet_divider { } #mailpoet_form_1 .mailpoet_message { } #mailpoet_form_1 .mailpoet_form_loading { width: 30px; text-align: center; line-height: normal; } #mailpoet_form_1 .mailpoet_form_loading > span { width: 5px; height: 5px; background-color: #5b5b5b; }#mailpoet_form_1{border-radius: 16px;background: #ffffff;color: #313131;text-align: left;}#mailpoet_form_1 form.mailpoet_form {padding: 16px;}#mailpoet_form_1{width: 100%;}#mailpoet_form_1 .mailpoet_message {margin: 0; padding: 0 20px;} #mailpoet_form_1 .mailpoet_validate_success {color: #00d084} #mailpoet_form_1 input.parsley-success {color: #00d084} #mailpoet_form_1 select.parsley-success {color: #00d084} #mailpoet_form_1 
textarea.parsley-success {color: #00d084} #mailpoet_form_1 .mailpoet_validate_error {color: #cf2e2e} #mailpoet_form_1 input.parsley-error {color: #cf2e2e} #mailpoet_form_1 select.parsley-error {color: #cf2e2e} #mailpoet_form_1 textarea.textarea.parsley-error {color: #cf2e2e} #mailpoet_form_1 .parsley-errors-list {color: #cf2e2e} #mailpoet_form_1 .parsley-required {color: #cf2e2e} #mailpoet_form_1 .parsley-custom-error-message {color: #cf2e2e} #mailpoet_form_1 .mailpoet_paragraph.last {margin-bottom: 0} @media (max-width: 500px) {#mailpoet_form_1 {background: #ffffff;}} @media (min-width: 500px) {#mailpoet_form_1 .last .mailpoet_paragraph:last-child {margin-bottom: 0}} @media (max-width: 500px) {#mailpoet_form_1 .mailpoet_form_column:last-child .mailpoet_paragraph:last-child {margin-bottom: 0}} Please leave this field empty
input[name="data[form_field_MGI0Nzk2NWMxZTIzX2VtYWls]"]::placeholder{color:#abb8c3;opacity: 1;}Email Address *

Check your inbox or spam folder to confirm your subscription.

How to Group and Summarize Data in R – Data Science Tutorials


The post How to Use Spread Function in R?-tidyr Part1 appeared first on Data Science Tutorials

To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.

Continue reading: How to Use Spread Function in R?-tidyr Part1

Refine your R Skills with Free Access to DataCamp

[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Introducing Free Week 

DataCamp is excited to announce their Free Week commencing Monday, 18 July at 8 AM ET. Anyone interested in developing their R programming skills or improving their data literacy can enjoy unlimited free access for a week until 24 July at 11:59 PM. 

For those who are new to R or are supporting teams using R, now is the time to dig deeper into the programming language. DataCamp offers R courses from introductory courses to more advanced topics, meaning you’ll find learning opportunities no matter your level. 

Free Access for Individuals 

To access DataCamp Free Week for individuals, you’ll only need your email address when signing up on their Free Week page.

Once you’ve signed up, you will have access to their entire library of 387 courses, 83 projects, 55 practice sessions, and 15 assessments across Python, R, SQL, Power BI, Tableau, and more. If you are an R loyalist, you’ll be happy to hear about their 149 courses specializing in your favorite programming language.

Free Week access for individuals starts at 8 AM ET on 18 July and finishes at 11:59 PM ET on 24 July. Not only that, but to ensure you finish the week off with confidence, you also have access to the following resources:

Moreover, for intermediate and advanced learners solely interested in developing their R skills, the skill and career tracks are a good fit. They have skill and career tracks specially curated for developing R programming skills where you get to develop your knowledge in the following areas:

Free Access for Teams

DataCamp recognizes and appreciates the increasing dependence that companies have on data-driven decisions. With that in mind, they created DataCamp for Business to help businesses upskill their workforce to become data-driven. In DataCamp for Business, you and your team will have access to the following resources:
  • Build learning programs with their team to create custom learning paths just for your team (e.g., if your team specializes in R, their team will help create R-specialized learning pathways)
  • Report on your ROI using DataCamp’s visualization and spreadsheet tools to thoroughly understand your team’s progress
  • Upskill your company by investing in professional development for all roles and skill levels
  • Get started and scale using DataCamp’s integrations
If you want to start developing your team today, sign up to DataCamp Teams during Free Week and receive seven days of free access from the point of sign-up; your payment will automatically renew after that period. DataCamp cannot wait for you to join the 2,500 other companies that have upskilled their teams with DataCamp. They are proud to have a diverse list of clients, including top companies from consulting, the FTSE 1000, and 180+ government agencies.
Refine your R Skills with Free Access to DataCamp was first posted on July 19, 2022 at 5:20 am.
To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.

Continue reading: Refine your R Skills with Free Access to DataCamp

Leaving a PhD Program

[This article was first published on Pachá, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Abstract

Purpose

The purpose of this post is to offer the author’s view on what it means to be a successful Ph.D. student and on some of the challenges Ibero-American students face in having an impact.

Approach

This post is a personal account based on the author’s experience and what I’ve learned from friends and acquaintances.

Findings

The post presents a short critique of the conceptualization and measurement of scholarly impact and distorted self-metrics.

Introduction

I will first present my background so the reader can be aware of potential biases in this account. I am a Chilean and studied as an undergraduate in Chile. My major was in Business and Economics, and then I obtained the equivalent of a Master of Arts in Trade Policy and a Master of Science in Statistics.

I finally decided to pursue a Ph.D. after working as a practitioner for several years. I followed that path to develop expertise in critical thinking, project management, and writing. To date, I find that I’m awful at writing.

The worst advice I ever got was “your master’s was in Statistics, it is natural to continue a Ph.D. in Statistics.”

During my master’s, I wrote a thesis in International Trade. I am a statistician interested in applying statistical methods to address specific policy-relevant questions, particularly in international trade, migration, investments and theoretically founded empirical work. My work before starting the Ph.D. was focused on three interrelated areas (I hope it keeps that way):

  1. Gravity modelling and the role of information on the implications of models
  2. Gravity estimation and efficient software implementations
  3. Methodology and data compilation practices

Shortly after the master’s, I applied to a top Ph.D. program in Statistics at a highly ranked, world-class university. Back then, I applied to one program at one university, something that applicants should never do.

Now I’m at another top program, but this time I’ve learned that realizing one’s strengths and weaknesses is essential.

Main challenges

European Ph.D. programs focus on a specific topic, while US/Canada programs feature 1-2 years of general training. Yet, in both cases choosing a field is not a trivial task. I heard before applying that experience acquired before a Ph.D., such as internships or a master’s degree, is critical, and that you need to find a ‘passion.’

In my own experience, after one year in the program, I found all the things I’ve learned at P. U. Católica de Chile to be sound. But that is not limited to course contents. At PUC, we were told, “We must work well, work with dedication, work at all times. Our work will be contributing directly to the development of society.” That is exactly what you do in a Ph.D.

One thing is to be able to write the IELTS test and obtain a good band. Something very different is to be able to process and elaborate your ideas in a language different from yours.

In my case, I translated books, which was easy because I was taking the same order in which somebody else decided to organize their ideas.

There is also the idea of using scholarly metrics from industrialized countries. Those like me, who come from South America, know that research that impacts our region has to focus on problems that are no longer part of academic fashion in the developed world.

A Ph.D. program is, in my opinion, becoming an expert planner. During my first year, I had to face advanced mathematics courses, which involved high-level abstraction and theoretical discussion that I hadn’t previously accessed.

Because of my background, I had to study day and night to level up, only to find that I didn’t have the required intelligence, nor could I draw level with my genius-level classmates. In my program, I was, by definition, in the lower tier because I had classmates who were true geniuses.

I never thought about anything significant, like solving P = NP. I want to develop structural methods to translate observed trade policy changes and changes in trade costs into policy guidelines at the national and sectoral levels.

A Ph.D. also involves developing the ability to think in ‘linear’ terms, unlike most college and master’s courses, where project completion is highly rewarded. Doctoral-level courses care about many little details. If you can look at abstract problems with a magnifying glass, you might be a successful Ph.D. student, but there are problems and problems.

In my case, I was interested in problems that ended up poorly fitting with the discipline and the program. I was aware that the issues I’m interested in are at the intersection of Computer Science, Economics and Statistics.

One statistical aspect I love is reading articles and writing software to implement the new methods I find, which has led me to two published articles. Writing software is helpful but won’t count towards points to obtain your Ph.D. or give you a tenure-track position. In short, your GitHub profile has a net value of zero in academia. In industry, it’s a very different story.

Distorted self-metrics

As I’ve mentioned above, a Ph.D. means being a professional planner. In my case, I started a journey of poor diet, insufficient sleep, and studying Monday to Sunday. I fell into the opposite of good planning, drinking coffee at 2 AM on a Wednesday, saying I’ll recover sleep during the weekend, eating a lot of pizza thinking about eating healthy later.

Even when I knew I would never be like the geniuses in the program, the Ph.D. program involves a lot of consistency and self-discipline. You can obtain your degree with dedication and aligned goals for sure. I repeatedly told myself that I was there “Because I Want to Fit In.”

That idea of fitting in led me to have all my meals in front of the computer or with open books, usually working on a proof while trying to fill all my background gaps.

What I’m describing is self-deceit, and it’s not productive. It leads to nowhere.

One of the challenges in gravity modelling (my area of interest) is to produce theoretically consistent models that are also computationally efficient. Still, all those abstract courses I took were not pushing me in a direction to do it better. Another self-deceit is when you consciously ignore that the program is not suitable for you, because you desperately want to obtain the degree at any cost.

Leaving a program

I am older than some of my professors, and I’ve heard many stories about why people leave. Leaving a program is a decision that has influences from all sides. You may feel it is a failure because you do not fit in, or you can feel it will end up in a conflict with your parents or wife.

Some people leave because of not having the right feeling about the program focus or courses. Others face different issues, such as antisemitism or sexism, which shouldn’t happen in an educational context or anywhere.

I wanted to devise a proper measurement of trade frictions, which is crucial for reliable analysis of the impact of international trade and trade policy on welfare and all other economic outcomes of interest to academics and policymakers. Switching programs was the best I could do to achieve that goal.

In my case, I live with a disability, which led me deeply into self-deceit because I didn’t want to leave the program and send a weak signal. I’ve often been infantilized or marginalized because of living with a disability, which should enforce the idea of “I don’t have no time for no monkey business” instead of the concept of fitting in.

Unexpected factors

If you apply to a Ph.D. program, many variables affect the result of the application. In my case, I wrote to and interviewed with different professors and discarded all the options that were not as good in my opinion back then, with limited information or because of the lack of funding.

A Ph.D. program consists of four or more years. During those years, you might (hopefully not!) become seriously ill, get married, move to another city or get an unexpected fantastic offer in the industry. This is a summary of some of the things I’ve heard.

The Ph.D. studies are subject to many external variables as well. We are still in the middle of a pandemic, and COVID-19 influences educational decisions and academic performance.

To leave a comment for the author, please follow the link and comment on their blog: Pachá.

Continue reading: Leaving a PhD Program

How to handle missing data in r

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to handle missing data in r appeared first on finnstats.

If you’ve ever conducted any research involving measurements taken in the real world, you are aware that the data is frequently messy.

The quality of the data can be controlled in a lab, but this is not always the case in the real world. There are occasions when events outside of your control can result in data gaps.

How to handle missing data in r

In R, there are numerous methods for handling missing data. The is.na() function can be used to simply detect it.

Another function in R called na.omit() removes any rows in the data frame that have missing data. NA is used to indicate missing data so that it may be quickly identified.
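
The two functions mentioned above can be tried on a tiny example. This is only a minimal sketch: the vector scores and the data frame df are made-up illustration data, not part of the original post.

```r
# Illustrative data (assumed for this sketch): one value is missing
scores <- c(10, NA, 30)
df <- data.frame(id = 1:3, score = scores)

# is.na() returns TRUE wherever a value is missing
missing_flags <- is.na(scores)
print(missing_flags)

# Summing the logical vector counts the missing values
n_missing <- sum(missing_flags)
print(n_missing)

# na.omit() drops every row that contains at least one NA
complete_rows <- na.omit(df)
print(complete_rows)
```

Note that na.omit() removes the whole row, so the non-missing id in that row is lost as well.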

NA values are accepted by data.frame() without complaint, and cbind() accepts them too; a warning from cbind() typically points to mismatched argument lengths rather than to NA values.

Many functions that operate on data frames can handle missing data through the na.rm logical parameter.

Remove NA values in R

An NA value cannot be incorporated into calculations because it is only a placeholder and not a real numeric value.

Therefore, it must be eliminated from the calculations in some way to produce a useful result. If an NA value is included in a calculation, the result is NA.

While this might be OK in some circumstances, in others you require a number. The na.omit() function, which deletes the entire row, and the na.rm logical parameter, which instructs the function to skip that value, are the two methods used in R to eliminate NA values.

What does na.rm mean in R?

When utilizing a dataframe function, the logical argument na.rm specifies whether or not NA values should be removed from the calculation. Literally, it means remove NA.

It is not an operation or a function. It is merely a parameter that many dataframe functions accept, including colSums(), rowSums(), colMeans(), and rowMeans().

The function skips over any NA values if na.rm is TRUE. However, if na.rm is FALSE, any row or column that contains an NA yields NA.
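
The same behaviour can be seen on a plain vector with mean(), which also accepts na.rm. A small sketch, with a made-up vector called values:

```r
# Illustrative vector (assumed for this sketch) with one missing value
values <- c(4, 8, NA, 12)

# na.rm defaults to FALSE, so the NA propagates into the result
mean_default <- mean(values)
print(mean_default)

# With na.rm = TRUE the NA is skipped and the mean uses the other three values
mean_no_na <- mean(values, na.rm = TRUE)
print(mean_no_na)
```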

na.rm examples in R

We need to set up a dataframe before we can begin our examples.

x<-data.frame(a=c(22,45,51,78),b=c(21,16,18,NA),c=c(110,234,126,511))
x
  a  b   c
1 22 21 110
2 45 16 234
3 51 18 126
4 78 NA 511

For these examples, the missing value is the NA in row 4, column b.

colMeans(x, na.rm = TRUE, dims = 1)
   a         b         c
 49.00000  18.33333 245.25000
rowSums(x, na.rm = FALSE, dims = 1)
[1] 153 295 195  NA

rowSums(x, na.rm = TRUE, dims = 1)

[1] 153 295 195 589

The second and third examples are identical except that the second uses na.rm = FALSE. That one change alters the result for row 4: NA with na.rm = FALSE, and 589 with na.rm = TRUE.

Correct data science requires dealing with missing data from a data set. R is used so frequently in statistical research because it makes handling this missing data so simple.



To leave a comment for the author, please follow the link and comment on their blog: Data Analysis in R.

Continue reading: How to handle missing data in r

R Functions

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Introduction to R Functions

A function is just a block of code that you can call and run from any part of your program. Functions are used to break code into simple parts and to avoid repeating code.

You can pass data into functions with the help of parameters and return some other data as a result. You can use the function reserved keyword to create a function in R. The syntax is:

func_name <- function (parameters) {
statement
}

Here, func_name is the name of the function. For example,

# define a function to compute power
power <- function(a, b) {
    print(paste("a raised to the power b is: ", a^b))
}

Here, we have defined a function called power which takes two parameters - a and b. Inside the function, we have included code that prints the value of a raised to the power b.


Call the Function

After you have defined the function, you can call the function using the function name and arguments. For example,

# define a function to compute power
power <- function(a, b) {
    print(paste("a raised to the power b is: ", a^b))
}

# call the power function with arguments
power(2, 3)

Output

[1] "a raised to the power b is:  8"

Here, we have called the function with two arguments - 2 and 3. This will print the value of 2 raised to the power 3 which is 8.

The arguments declared in the function definition are called formal arguments. They are also called parameters. The values passed to the function while calling the function are called actual arguments.
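
As a quick check on this terminology, base R's formals() function lists the formal arguments of a function. The sketch below reuses the power() function from above; the variable name param_names is only for illustration.

```r
# define a function to compute power
power <- function(a, b) {
    print(paste("a raised to the power b is: ", a^b))
}

# formals() returns the formal arguments (parameters) of the definition
param_names <- names(formals(power))
print(param_names)

# in this call, 2 and 3 are the actual arguments
power(2, 3)
```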


Named Arguments

In the above call to the power() function, the arguments must be passed in the same order as the parameters in the function declaration.

This means that when we call power(2, 3), the value 2 is assigned to a and 3 is assigned to b. If you want to change the order of arguments to be passed, you can use named arguments. For example,

# define a function to compute power
power <- function(a, b) {
    print(paste("a raised to the power b is: ", a^b))
}

# call the power function with arguments
power(b=3, a=2)

Output

[1] "a raised to the power b is:  8"

Here, the result is the same irrespective of the order of arguments that you pass during the function call.

You can also use a mix of named and unnamed arguments. For example,

# define a function to compute power
power <- function(a, b) {
    print(paste("a raised to the power b is: ", a^b))
}

# call the power function with arguments
power(b=3, 2)

Output

[1] "a raised to the power b is:  8"

Default Parameter Values

You can assign default parameter values to functions. To do so, you can specify an appropriate value to the function parameters during function definition.

When you call a function without an argument, the default value is used. For example,

# define a function to compute power
power <- function(a = 2, b) {
    print(paste("a raised to the power b is: ", a^b))
}

# call the power function with arguments
power(2, 3)

# call function with default arguments
power(b=3)

Output

[1] "a raised to the power b is:  8"
[1] "a raised to the power b is:  8"

Here, in the second call to power() function, we have only specified the b argument as a named argument. In such a case, it uses the default value for a provided in the function definition.
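
A default can also be placed on the last parameter, so that a plain positional call works without naming anything. The function power_with_default() below is a hypothetical variant written for this sketch, not part of the original tutorial.

```r
# Hypothetical variant of power(): the exponent b defaults to 2
power_with_default <- function(a, b = 2) {
    return (a^b)
}

# b is omitted, so the default b = 2 is used
squared <- power_with_default(5)
print(squared)

# b is supplied, so the default is overridden
cubed <- power_with_default(5, 3)
print(cubed)
```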


Return Values

You can use the return() function to return values from a function. For example,

# define a function to compute power
power <- function(a, b) {
    return (a^b)
}

# call the power function with arguments
print(paste("a raised to the power b is: ", power(2, 3)))

Output

[1] "a raised to the power b is:  8"

Here, instead of printing the result inside the function, we have returned a^b. When we call the power() function with arguments, the result is returned which can be printed during the call.
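
Because the value is returned rather than printed, it can be stored and reused in later calculations. A short sketch along those lines, where the variable names squared and total are only illustrative:

```r
# define a function to compute power
power <- function(a, b) {
    return (a^b)
}

# store the returned value in a variable
squared <- power(3, 2)

# reuse it together with another call in a larger expression
total <- squared + power(2, 3)
print(total)
```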


Nested Function

In R, you can create a nested function in 2 different ways.

  • By calling a function inside another function call.
  • By writing a function inside another function.

Example 1: Call a Function Inside Another Function Call

Consider the example below to create a function to add two numbers.

# define a function to compute addition
add <- function(a, b) {
    return (a + b)
}

# nested call of the add function 
print(add(add(1, 2), add(3, 4)))

Output

[1] 10

Here, we have created a function called add() to add two numbers. But during the function call, the arguments are calls to the add() function.

First, add(1, 2) and add(3, 4) are computed and the results are passed as arguments to the outer add() function. Hence, the result is the sum of all four numbers.


Example 2: Write a Function Inside Another Function

Let's create a nested function by writing a function inside another function.

# define a function to compute power
power <- function(a) {
    exponent <- function(b) {
        return (a^b)
    }
    return (exponent)
}

# call nested function 
result <- power(2)
print(result(3))

Output

[1] 8

Here, calling power(2) does not compute a result straight away. Instead, it returns the inner exponent() function, which remembers the value of a.

Hence, we first call the outer function with the argument a and assign the result to a variable. This variable now acts as a function to which we pass the next argument b.
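
One way to see why this is useful is to build several functions from the same outer function. The sketch below reuses power() from Example 2; the names powers_of_two and powers_of_ten are made up for illustration.

```r
# define a function to compute power
power <- function(a) {
    exponent <- function(b) {
        return (a^b)
    }
    return (exponent)
}

# each call to power() returns a new function with its own value of a
powers_of_two <- power(2)
powers_of_ten <- power(10)

two_to_five <- powers_of_two(5)
ten_cubed <- powers_of_ten(3)
print(two_to_five)
print(ten_cubed)

# the two calls can also be chained without an intermediate variable
three_squared <- power(3)(2)
print(three_squared)
```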

To leave a comment for the author, please follow the link and comment on their blog: R feed.

Continue reading: R Functions

R Histogram

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

A histogram is a graphical display of data using bars of different heights.

A histogram is used to summarize discrete or continuous data that are measured on an interval scale.


Create Histogram in R

In R, we use the hist() function to create Histograms. For example,

temperatures <- c(67 ,72 ,74 ,62 ,76 ,66 ,65 ,59 ,61 ,69 )

# histogram of temperatures vector
result <- hist(temperatures)

print(result)

Output

In the above example, we have used the hist() function to create a histogram of the temperatures vector.

The histogram we have created above is plain and simple. We can customize it in several ways.
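
Since the examples store the result of hist() in a variable, it may help to look at what that object holds. The sketch below uses the plot = FALSE argument of hist() to compute the bins without drawing; the variable name bins is only for illustration.

```r
temperatures <- c(67, 72, 74, 62, 76, 66, 65, 59, 61, 69)

# plot = FALSE computes the bins without drawing the histogram
bins <- hist(temperatures, plot = FALSE)

# bin boundaries chosen by hist()
print(bins$breaks)

# number of observations that fall into each bin
print(bins$counts)

# every observation is counted exactly once
print(sum(bins$counts) == length(temperatures))
```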


Add Title and Label to a Histogram in R

To add a title and a label to our Histogram in R, we pass the main and the xlab parameter respectively inside the hist() function. For example,

temperatures <- c(67 ,72 ,74 ,62 ,76 ,66 ,65 ,59 ,61 ,69 )

# histogram of temperatures vector
result <- hist(temperatures,
  main = "Histogram of Temperature",
  xlab = "Temperature in degrees Fahrenheit"
)

print(result)

Output

In the above figure, we can see that we have added a title and a label to the Histogram of the temperatures vector.

hist(temperatures, 
  main = "Histogram of Temperature",
  xlab = "Temperature in degrees Fahrenheit")

Here,

  • main - adds the title "Histogram of Temperature"
  • xlab - adds the label "Temperature in degrees Fahrenheit"

Change Bar Color of Histogram in R

In R, we pass the col parameter inside hist() to change the color of bars. For example,

temperatures <- c(67 ,72 ,74 ,62 ,76 ,66 ,65 ,59 ,61 ,69 )

# histogram of temperatures vector
result <- hist(temperatures,
  main = "Histogram of Temperature",
  xlab = "Temperature in degrees Fahrenheit",
  col = "red")

print(result)

Output

In the above example, we have used the col parameter inside hist() to change the color of bars.

result <- hist(temperatures, 
  ...
  col = "red"
)

Here, col = "red" changes the color of bars to red.


Range of Axes in R

To provide a range of the axes in R, we pass the xlim and the ylim parameter inside hist(). For example,

temperatures <- c(67 ,72 ,74 ,62 ,76 ,66 ,65 ,59 ,61 ,69 )

# histogram of temperatures vector
result <- hist(temperatures,
  main = "Histogram of Temperature",
  xlab = "Temperature in degrees Fahrenheit",
  col = "red",
  xlim = c(50,100),
  ylim = c(0, 5))

print(result)

Output

In the above example, we have used the xlim and the ylim parameter inside hist() to provide a range of x-axis and y-axis respectively.

result <- hist(temperatures, 
  ...
  xlim = c(50,100),
  ylim = c(0, 5)
)

Here,

  • x-axis ranges from 50 to 100
  • y-axis ranges from 0 to 5

To leave a comment for the author, please follow the link and comment on their blog: R feed.

Continue reading: R Histogram

R Pie Chart

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

A pie chart is a circular statistical graphic, which is divided into slices to illustrate numerical proportion.

Pie charts represent data visually as fractional parts of a whole, which can be an effective communication tool.


Create Pie Plot in R

In R, we use the pie() function to create a pie chart. For example,

expenditure <- c(600, 300, 150, 100, 200)

# pie chart of expenditure vector
result <- pie(expenditure)

print(result)

Output

In the above example, we have used the pie() function to create a pie chart of the expenditure vector.

The pie chart we have created above is plain and simple. We can customize it in several ways.


Add Title to a Pie Chart in R

To add a title to our pie chart in R, we pass the main parameter inside the pie() function. For example,

expenditure <- c(600, 300, 150, 100, 200)

result <- pie(expenditure,
  main = "Monthly Expenditure Breakdown"  
)

print(result)

Output

In the above figure, we can see that we have added a title to the pie chart of the expenditure vector.

result <- pie(expenditure,
  main = "Monthly Expenditure Breakdown"  
)

Here, the main parameter adds the title "Monthly Expenditure Breakdown" to our pie chart.


Add Labels to Each Pie Chart Slice in R

We pass the labels parameter inside pie() to provide labels to each slice of a pie chart in R.

For example,

expenditure <- c(600, 300, 150, 100, 200)

result <- pie(expenditure,
  main = "Monthly Expenditure Breakdown",
  labels = c("Housing", "Food", "Clothes", "Entertainment", "Other")
)

print(result)

Output

In the above example, we have used the labels parameter to provide names to each slice of pie chart. Notice the code,

pie(expenditure,
  labels = c("Housing", "Food", "Clothes", "Entertainment", "Other")
)

Here, we have assigned "Housing" to the first vector item 600, "Food" to the second vector item 300 and so on.
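
Labels do not have to be fixed strings. As a sketch of one possible extension, the percentage share of each slice can be computed and appended to its name; the variable names categories, percent, and slice_labels are assumptions made for this example.

```r
expenditure <- c(600, 300, 150, 100, 200)
categories <- c("Housing", "Food", "Clothes", "Entertainment", "Other")

# share of each category, rounded to a whole percent
percent <- round(100 * expenditure / sum(expenditure))

# build labels such as "Housing (44%)"
slice_labels <- paste0(categories, " (", percent, "%)")
print(slice_labels)

pie(expenditure,
  main = "Monthly Expenditure Breakdown",
  labels = slice_labels
)
```

Because each share is rounded separately, the displayed percentages may not always add up to exactly 100.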


Change Color of Pie Slices in R

In R, we pass the col parameter inside pie() to change the color of each pie slice. For example,

expenditure <- c(600, 300, 150, 100, 200)

pie(expenditure,
  main = "Monthly Expenditure Breakdown",
  labels = c("Housing", "Food", "Clothes", "Entertainment", "Other"),
  col = c("red", "orange", "yellow", "blue", "green")
)

Output

In the above example, we have used the col parameter inside pie() to change the color of each slice of a pie chart.

pie(expenditure,
  ...
  labels = c("Housing", "Food", "Clothes", "Entertainment", "Other"),
  col = c("red", "orange", "yellow", "blue", "green")
)

Here, we have provided a vector of colors, one for each slice of the pie chart.
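
When there are many slices, typing each color by hand becomes tedious. As a hedged alternative, the sketch below generates one color per slice with the base R rainbow() function; the name slice_cols is illustrative and the choice of palette is only an assumption about what looks acceptable.

```r
expenditure <- c(600, 300, 150, 100, 200)

# one color per slice, generated instead of typed by hand
slice_cols <- rainbow(length(expenditure))

pie(expenditure,
  main = "Monthly Expenditure Breakdown",
  col = slice_cols
)
```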


Create a 3D Pie Chart in R

In order to create a 3D pie chart, we first need to load the plotrix package. Then, we use the pie3D() function to create a 3D pie chart. For example,

# load plotrix to use pie3D()
# (install it once with install.packages("plotrix") if needed)
library(plotrix)

expenditure <- c(600, 300, 150, 100, 200)

result <- pie3D(expenditure,
  main = "Monthly Expenditure Breakdown",
  labels = c("Housing", "Food", "Clothes", "Entertainment", "Other"),
  col = c("red", "orange", "yellow", "blue", "green")
)

print(result)

Output

Here, we have used the pie3D() function to create a 3D pie chart.



R Boxplot

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

A boxplot is a graph that gives us a good indication of how the values in the data are spread out.

Box plots also give some indication of the data's symmetry and skewness.


Dataset to Create Boxplot

In R, we first need a dataset to plot.

In this tutorial, we will be using the built-in dataset named mtcars to create a boxplot.

Let's see the first six rows of the dataset we will be using,

# use head() to display the first six rows of the mtcars dataset
head(mtcars)

Output

We will be creating a boxplot of this dataset.


Create boxplot in R

In R, we use the boxplot() function to create a boxplot. For example,

# boxplot for the mpg reading of the mtcars dataset
boxplot(mtcars$mpg)

Output

In the above example, we have used the boxplot() function and the $ operator to create a boxplot of the mpg reading of the mtcars dataset.
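
To see the numbers behind the picture, the base R function boxplot.stats() returns the values a boxplot is drawn from. This is a small sketch added for illustration; the name mpg_stats is not from the original tutorial.

```r
# the five values a boxplot is drawn from:
# lower whisker, lower hinge, median, upper hinge, upper whisker
mpg_stats <- boxplot.stats(mtcars$mpg)

print(mpg_stats$stats)

# points beyond the whiskers, which would be drawn individually
print(mpg_stats$out)
```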

We can pass additional parameters to control the way our plot looks.


Add Title, Label, New Color to a Boxplot in R

We can add titles, provide labels for the axes, and change the color of the boxplot in R. For example,

# add title, label, new color to boxplot
boxplot(mtcars$mpg,
  main="Mileage Data Boxplot",
  ylab="Miles Per Gallon(mpg)",
  xlab="No. of Cylinders",
  col="orange")

Output

In the above figure, we can see that we have added a title, a label to the x-axis and y-axis, and changed the color of the boxplot.

Here,

  • main– adds the title "Mileage Data Boxplot"
  • xlab– adds the label "No. of Cylinders" for the x-axis
  • ylab– adds the label "Miles Per Gallon(mpg)" for the y-axis
  • col = "orange"– changes the color of the boxplot to orange

Boxplot Formula in R

In R, the function boxplot() can also take in formulas of the form y~x where y is a numeric vector which is grouped according to the value of x.

For example, in our dataset mtcars, the mileage per gallon mpg is grouped according to the number of cylinders cyl present in cars.

Let's take a look at an example,

boxplot(mpg ~ cyl, data = mtcars,
  main = "Mileage Data Boxplot",
  ylab = "Miles Per Gallon(mpg)",
  xlab = "No. of Cylinders",
  col = "orange")

Output

In the above example, we have created a boxplot for the relation between mpg and cyl. Notice the code

boxplot(mpg ~ cyl, data = mtcars,
  ...
)

Here,

  • mpg ~ cyl– mileage per gallon mpg is grouped according to the number of cylinders cyl in cars
  • data = mtcars– data is taken from mtcars dataset

The figure above suggests that cars with fewer cylinders tend to get more miles per gallon.
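
The same pattern can be checked numerically. As a rough sketch (the name group_medians is illustrative), tapply() computes the median mpg within each cyl group:

```r
# median miles per gallon for each number of cylinders
group_medians <- tapply(mtcars$mpg, mtcars$cyl, median)

print(group_medians)
```

The medians fall as the number of cylinders rises, which matches the ordering of the boxes in the grouped boxplot.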


Add Notch to Boxplot in R

In R, we add notches to a boxplot to compare the medians of different groups. For example,

boxplot(mpg ~ cyl, data = mtcars,
  main ="Mileage Data Boxplot",
  ylab ="Miles Per Gallon(mpg)",
  xlab ="No. of Cylinders",
  col ="orange",
  notch = TRUE)

Output

In the above example, we have added notches to the boxplot to compare the medians of the different groups.

Note: If the notches of two boxes do not overlap, this is strong evidence that their medians differ. Overlapping notches do not show that the medians are equal, only that the data do not clearly separate them.
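
The notch limits can also be read off without looking at the plot. The sketch below relies on boxplot() returning its statistics when called with plot = FALSE; the helper notches_overlap() is a hypothetical name introduced here for illustration.

```r
# plot = FALSE returns the statistics without drawing anything
b <- boxplot(mpg ~ cyl, data = mtcars, plot = FALSE)

# each column holds the lower and upper end of one group's notch
notch_limits <- b$conf
colnames(notch_limits) <- b$names
print(notch_limits)

# TRUE if the notches of groups i and j share any values
notches_overlap <- function(i, j) {
  notch_limits[1, i] <= notch_limits[2, j] &&
    notch_limits[1, j] <= notch_limits[2, i]
}

print(notches_overlap("4", "8"))
```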


R strip Chart

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

A strip chart is a type of chart that displays numerical data along a single strip.

A strip chart can be used to visualize dozens of time series at once.


Dataset to Create Strip Chart

In R, we first need a dataset to plot.

In this tutorial, we will be using the built-in dataset named airquality to create a strip chart.

Let’s see the first six rows of the dataset we will be using,

# use head() to display the first six rows of the airquality dataset
head(airquality)

Output

We will be creating a strip chart of this dataset.


Create Strip Chart in R

In R, we use the stripchart() function to create a strip chart. For example,

# strip chart for ozone reading of airquality dataset
stripchart(airquality$Ozone)

Output

In the above example, we have used the stripchart() function and the $ operator to create a strip chart of the Ozone reading of the airquality dataset.

We can pass additional parameters to control the way our plot looks.


Add Title, Label, New Color to a Strip Chart in R

We can add titles, provide labels for the axes, and change the color of the strip chart in R. For example,

# add title, label, new color to strip chart
stripchart(airquality$Ozone,
  main="Mean ozone in parts per billion at Roosevelt Island",
  xlab="Parts Per Billion",
  ylab="Ozone",
  col="orange")

Output

In the above figure, we can see that we have added a title, a label to the x-axis and y-axis, and changed the color of the strip.

Here,

  • main– adds the title "Mean ozone in parts per billion at Roosevelt Island"
  • xlab– adds the label "Parts Per Billion" for the x-axis
  • ylab– adds the label "Ozone" for the y-axis
  • col = "orange"– changes the color of the strip to orange

Jitter Plot in R

A jitter plot is a variant of the strip plot that gives a better view of overlapping data points. It is useful when there are large clusters of data points.

We pass method = "jitter" inside the stripchart() function to spread out overlapping points. For example,

stripchart(airquality$Ozone,
  main="Mean ozone in parts per billion at Roosevelt Island",
  xlab="Parts Per Billion",
  ylab="Ozone",
  col="orange",
  method = "jitter")

Output

In the above example, we have used the method parameter inside stripchart() to create a jitter plot.

stripchart(airquality$Ozone,
  ...
  method = "jitter")

Here, method = "jitter" adds a small random offset to each point, so that coincident points are spread out instead of being drawn on top of one another.
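
To get a feel for why jittering matters for this data, the sketch below counts how many ozone readings repeat an earlier value, then draws the chart with a wider spread through the jitter argument of stripchart(). The names ozone and n_repeats and the value 0.2 are illustrative choices, not part of the original tutorial.

```r
ozone <- airquality$Ozone

# number of non-missing readings that repeat an earlier value
n_repeats <- sum(duplicated(na.omit(ozone)))
print(n_repeats)

# jitter sets the amount of random spread applied to the points
stripchart(ozone,
  method = "jitter",
  jitter = 0.2,
  xlab = "Parts Per Billion",
  col = "orange")
```

Because the offsets are random, the exact position of each point changes from one run to the next.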


Multiple Strip Charts in R

We can draw multiple strip charts in a single plot, by passing in a list of numeric vectors. For example,

# create list of ozone and solar radiation reading of airquality dataset
list1 <- list("Ozone" = airquality$Ozone,  "Solar Radiations" = airquality$Solar.R)

stripchart(list1,
  main="Mean ozone in parts per billion at Roosevelt Island",
  xlab="Parts Per Billion",
  col= c("orange","brown"),
  method = "jitter")

Output

In the above example, we have passed a list named list1 with two vectors, the Ozone and Solar.R readings of the airquality dataset, inside stripchart() to create multiple strips.

We have also provided two colors to represent the two strip charts:

  • "orange" - to represent Ozone readings
  • "brown" - to represent Solar.R readings
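
A list of numeric vectors does not have to be typed out by hand. As an illustrative sketch, split() can build one vector per group, here one strip of ozone readings for each month in airquality; the name ozone_by_month and the axis labels are assumptions made for this example.

```r
# split() returns a list with one numeric vector per month (5 to 9)
ozone_by_month <- split(airquality$Ozone, airquality$Month)

stripchart(ozone_by_month,
  main = "Ozone readings by month",
  xlab = "Parts Per Billion",
  ylab = "Month",
  method = "jitter",
  col = "orange")
```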

R break and next

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

We use the R break and next statements to alter the flow of a program. These are also known as jump statements in programming:

  • break– terminates the loop
  • next– skips the current iteration of the loop

R break Statement

You can use a break statement inside a loop (for, while, repeat) to terminate the execution of the loop. This will stop any further iterations.

The syntax of the break statement is:

if (test_expression) {
  break
}

The break statement is often used inside a conditional (if...else) statement in a loop. If test_expression evaluates to TRUE, the break statement is executed. For example,

# vector to be iterated over
x = c(1, 2, 3, 4, 5, 6, 7)

# for loop with break statement
for(i in x) {
    
    # if condition with break
    if(i == 4) {
        break
    }
    
    print(i)
}

Output

[1] 1
[1] 2
[1] 3

Here, we have defined a vector of numbers from 1 to 7. Inside the for loop, we check if the current number is 4 using an if statement.

If yes, then the break statement is executed and no further iterations are carried out. Hence, only numbers from 1 to 3 are printed.
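
A common use of this pattern is to stop searching as soon as a match is found. The following sketch, with the illustrative names readings and first_large, records the first value above 10 and then leaves the loop:

```r
readings <- c(3, 8, 12, 5, 20)
first_large <- NA

for (value in readings) {

    # stop at the first value greater than 10
    if (value > 10) {
        first_large <- value
        break
    }
}

print(first_large)
```

The later value 20 is never examined, because the loop has already ended at 12.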


break Statement in Nested Loop

If you have a nested loop and the break statement is inside the inner loop, then the execution of only the inner loop will be terminated.

Let's check out a program to use break statements in a nested loop.

# vector to be iterated over
x = c(1, 2, 3)
y = c(1, 2, 3)

# nested for loop with break statement
for(i in x) {
    for (j in y) {
        if (i == 2 && j == 2) {
            break
        }
        print(paste(i, j))
    }
}

Output

[1] "1 1"
[1] "1 2"
[1] "1 3"
[1] "2 1"
[1] "3 1"
[1] "3 2"
[1] "3 3"

Here, we have a break statement inside the inner loop.

We have used it inside a conditional statement such that if both the numbers are equal to 2, the inner loop gets terminated.

The flow then moves to the next iteration of the outer loop. Hence, the combinations (2, 2) and (2, 3) are never printed.
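
Since break only leaves the innermost loop, stopping both loops needs an extra step. One possible approach, sketched here with the hypothetical flag stop_all, is to set a flag in the inner loop and test it in the outer loop:

```r
visited <- character(0)
stop_all <- FALSE

for (i in 1:3) {
    for (j in 1:3) {
        if (i == 2 && j == 2) {
            # remember that the outer loop should stop as well
            stop_all <- TRUE
            break
        }
        visited <- c(visited, paste(i, j))
    }

    # leave the outer loop once the flag has been set
    if (stop_all) {
        break
    }
}

print(visited)
```

With the flag in place, nothing after the combination "2 1" is recorded.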


R next Statement

In R, the next statement skips the current iteration of the loop and starts the loop from the next iteration.

The syntax of the next statement is:

if (test_condition) {
  next
}

If the program encounters the next statement, any further execution of code from the current iteration is skipped, and the next iteration begins.

Let's check out a program to print only even numbers from a vector of numbers.

# vector to be iterated over
x = c(1, 2, 3, 4, 5, 6, 7, 8)

# for loop with next statement
for(i in x) {
    
    # if condition with next
    if(i %% 2 != 0) {
        next
    }
    
    print(i)
}

Output

[1] 2
[1] 4
[1] 6
[1] 8

Here, we have used an if statement to check whether the current number in the loop is odd or not.

If yes, the next statement inside the if block is executed, and the current iteration is skipped.
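
For a simple filter like this one, R's vectorized indexing gives the same values without a loop. This is offered as an aside rather than a replacement for next, which remains useful when the loop body does more than select values; the name evens is illustrative.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8)

# keep only the elements whose remainder on division by 2 is zero
evens <- x[x %% 2 == 0]

print(evens)
```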


R repeat Loop

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

You can use the repeat loop in R to execute a block of code multiple times. However, the repeat loop does not have any condition to terminate the loop. You need to put an exit condition explicitly with a break statement inside the loop.

The syntax of repeat loop is:

repeat {
    # statements
    if (stop_condition) {
        break
    }
}

Here, we have used the repeat keyword to create a repeat loop. It is different from the for and while loops because it has no built-in condition for exiting the loop.


Example 1: R repeat Loop

Let's see an example that will print numbers using a repeat loop and will execute until the break statement is executed.

x = 1

# Repeat loop
repeat {

    print(x)
    
    # Break statement to terminate if x > 4
    if (x > 4) {
        break
    } 
    
    # Increment x by 1
    x = x + 1
    
}

Output

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

Here, we have used a repeat loop to print numbers from 1 to 5. We have used an if statement to provide a breaking condition which breaks the loop if the value of x is greater than 4.
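
Because the exit test sits inside the body, a repeat loop always runs its body at least once. The sketch below contrasts this with a while loop that starts from the same value; the counters repeat_passes and while_passes are names introduced only for this comparison.

```r
# repeat: the body runs once before the exit condition is checked
x <- 10
repeat_passes <- 0
repeat {
    repeat_passes <- repeat_passes + 1
    if (x > 4) {
        break
    }
    x <- x + 1
}

# while: the condition is checked first, so the body never runs
y <- 10
while_passes <- 0
while (y <= 4) {
    while_passes <- while_passes + 1
    y <- y + 1
}

print(c(repeat_passes, while_passes))
```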


Example 2: Infinite repeat Loop

If you fail to put a break statement inside a repeat loop, it will lead to an infinite loop. For example,

x = 1
sum = 0

# Repeat loop
repeat {

    # Calculate sum
    sum = sum + x
    
    # Print sum
    print(sum)
    
    # Increment x by 1
    x = x + 1
    
}

Output

[1] 1
[1] 3
[1] 6
[1] 10
.
.
.

In the above program, since we have not included any break statement with an exit condition, the program keeps printing the running sum indefinitely until it is interrupted manually.
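
One way to guard against this, shown here only as a sketch, is to combine the real exit condition with a cap on the number of iterations. The limit of 50 and the cap max_iter are arbitrary values chosen for illustration, and total is used in place of sum to avoid masking the built-in sum() function.

```r
x <- 1
total <- 0
max_iter <- 1000

repeat {

    # add the current number to the running total
    total <- total + x

    # exit once the total passes 50, or if the safety cap is reached
    if (total > 50 || x >= max_iter) {
        break
    }

    x <- x + 1
}

print(c(x, total))
```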


Example 3: repeat Loop with next Statement

You can also use a next statement inside a repeat loop to skip an iteration. For example,

x = 1

repeat {
    
    # Break if x = 4
    if ( x == 4) {
        break
    } 
    
    # Skip if x == 2
    if ( x == 2 ) {
        # Increment x by 1 and skip
        x = x + 1
        next
    }
    
    # Print x and increment x by 1
    print(x)
    x = x + 1
    
}

Output

[1] 1
[1] 3

Here, we have a repeat loop where we break the loop if x is equal to 4. We skip the iteration where x is equal to 2, so 2 is never printed.
