Channel: R-bloggers

How to change the column positions in R?

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The post How to change the column positions in R? appeared first on Data Science Tutorials

To change column positions in R, you can rearrange the columns of a data frame using the relocate() function from the dplyr package.

The following techniques can be used to alter the column positions.

Method 1: Move One Column to Front

move ‘x’ column to the front

df %>% relocate(x)

Method 2: Move Several Columns to Front

move ‘x’ and ‘y’ columns to the front


df %>% relocate(x, y)

Method 3: Move a Column After Another Column

move ‘x’ column to the position after ‘y’ column

df %>% relocate(x, .after=y)

Method 4: Move a Column Before Another Column

move ‘x’ column to position before ‘y’ column

df %>% relocate(x, .before=y)

The examples that follow demonstrate how to use each technique with the given data frame.


Let’s make a dataset

df <- data.frame(team=c('P1', 'P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
points=c(110, 112, 123, 154, 215, 146, 87),
assists=c(81, 75, 22, 33, 52, 29, 70),
rebounds=c(46, 56, 18, 19, 87, 80, 93))

Now we can view the dataset

df
   team points assists rebounds
1   P1    110      81       46
2   P1    112      75       56
3   P1    123      22       18
4   P1    154      33       19
5   P2    215      52       87
6   P2    146      29       80
7   P2     87      70       93

Example 1: Move One Column to Front

The relocate() function can be used to move one column to the front as demonstrated by the code below.


column “assists” to the front

df %>% relocate(assists)
   assists team points rebounds
1      81   P1    110       46
2      75   P1    112       56
3      22   P1    123       18
4      33   P1    154       19
5      52   P2    215       87
6      29   P2    146       80
7      70   P2     87       93

Example 2: Move Several Columns to Front

The relocate() function can be used to move multiple columns to the front, as shown in the following code.


move “points” and “assists” to the front

df %>% relocate(points, assists)
  points assists team rebounds
1    110      81   P1       46
2    112      75   P1       56
3    123      22   P1       18
4    154      33   P1       19
5    215      52   P2       87
6    146      29   P2       80
7     87      70   P2       93

Example 3: Move a Column After Another Column

The relocate() function can be used to position one column after another column, as shown in the following code.


place the “team” column after the “assists” column

df %>% relocate(team, .after=assists)
  points assists team rebounds
1    110      81   P1       46
2    112      75   P1       56
3    123      22   P1       18
4    154      33   P1       19
5    215      52   P2       87
6    146      29   P2       80
7     87      70   P2       93

Example 4: Move a Column Before Another Column

The relocate() function can be used to move one column before another column using the following code.


place the “team” column before the “rebounds” column.

df %>% relocate(team, .before=rebounds)
   points assists team rebounds
1    110      81   P1       46
2    112      75   P1       56
3    123      22   P1       18
4    154      33   P1       19
5    215      52   P2       87
6    146      29   P2       80
7     87      70   P2       93
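Beyond naming columns directly, relocate() also accepts tidyselect helpers such as where() and last_col(). A short sketch of two such variations (illustrative additions, not from the original post):

```r
library(dplyr)

df <- data.frame(team=c('P1', 'P2'),
                 points=c(110, 215),
                 assists=c(81, 52),
                 rebounds=c(46, 87))

# move the 'team' column to the very end
df %>% relocate(team, .after = last_col())

# move every numeric column to the front
df %>% relocate(where(is.numeric))
```

Both calls return the columns in the order points, assists, rebounds, team.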


To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Beta version of NIMBLE with automatic differentiation, including HMC sampling and Laplace approximation

[This article was first published on R – NIMBLE, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

We’re excited to announce that NIMBLE now supports automatic differentiation (AD), also known as algorithmic differentiation, in a beta version available on our website. In this beta version, NIMBLE now provides:

  • Hamiltonian Monte Carlo (HMC) sampling for an entire parameter vector or arbitrary subsets of the parameter vector (i.e., combined with other samplers for the remaining parameters). 
  • Laplace approximation for approximate integration over latent states in a model, allowing maximum likelihood estimation and MCMC based on the marginal likelihood (via the RW_llFunction samplers).
  • The ability for users and algorithm developers to write nimbleFunctions that calculate derivatives of functions, including many but not all mathematical operations that are supported in the NIMBLE language.

We’re making this beta release available to allow our users to test and evaluate the AD functionality and the new algorithms, but it is not recommended for production use at this stage. So please give it a try, and let us know of any problems or suggestions you have, either via the nimble-users list, bug reports to our GitHub repository, or email to nimble.stats@gmail.com

You can download the beta version and view an extensive draft manual for the AD functionality.

We plan to release this functionality in the next NIMBLE release on CRAN in the coming months. 


How to Change Background Color in ggplot2?

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to Change Background Color in ggplot2? appeared first on finnstats.

To change the background color of different elements in a ggplot2 plot, use the syntax below.

p + theme(panel.background = element_rect(fill = 'lightblue', color = 'purple'),
          panel.grid.major = element_line(color = 'red', linetype = 'dotted'),
          panel.grid.minor = element_line(color = 'green', size = 2))

Alternatively, you can use built-in ggplot2 themes, which set the background color for you. The following are some of the most common themes.

p + theme_bw()      #white background and grey gridlines
p + theme_minimal() #no background annotations
p + theme_classic() #axis lines but no gridlines

These examples demonstrate how to apply this syntax in real-world situations.

Example 1: Specify Custom Background Color

The code below demonstrates how to make a straightforward scatterplot in ggplot2 with the standard grey background.

library(ggplot2)

Let’s create a data frame

df <- data.frame(x=c(1, 3, 3, 4, 5, 5, 6, 9, 12, 15),
                 y=c(13, 14, 14, 12, 17, 21, 22, 28, 30, 31))
df
    x  y
1   1 13
2   3 14
3   3 14
4   4 12
5   5 17
6   5 21
7   6 22
8   9 28
9  12 30
10 15 31

Now let’s create a scatterplot

p <- ggplot(df, aes(x=x, y=y)) +
       geom_point()

Now we can display the scatterplot

p

The panel’s background color, as well as the major and minor gridlines, can all be changed using the following code.

p + theme(panel.background = element_rect(fill = 'lightblue', color = 'purple'),
          panel.grid.major = element_line(color = 'red', linetype = 'dotted'),
          panel.grid.minor = element_line(color = 'green', size = 2))

Example 2: Change the Background Color Using Built-in Themes

The following code demonstrates how to use different pre-built ggplot2 themes to modify the plots’ backgrounds automatically.

p + theme_bw() #white background and grey gridlines
p + theme_minimal() #no background annotations
p + theme_classic() #axis lines but no gridlines
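ggplot2 ships several more complete themes beyond the three shown above. A quick sketch on a minimal plot (the theme functions are standard ggplot2 exports; the data here is made up for illustration):

```r
library(ggplot2)

# a minimal plot to apply themes to
df <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 6))
p <- ggplot(df, aes(x = x, y = y)) + geom_point()

p + theme_light()   # light grey gridlines and axes
p + theme_dark()    # dark grey panel background
p + theme_void()    # completely empty background, no axes or gridlines
```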





ggplot2 Transparent Background Quick Guide

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post ggplot2 Transparent Background Quick Guide appeared first on finnstats.

The ggplot2 syntax for giving a plot a transparent background is as follows.

p + theme(
    panel.background = element_rect(fill='transparent'), #transparent panel bg
    plot.background = element_rect(fill='transparent', color=NA), #transparent plot bg
    panel.grid.major = element_blank(), #remove major gridlines
    panel.grid.minor = element_blank(), #remove minor gridlines
    legend.background = element_rect(fill='transparent'), #transparent legend bg
    legend.box.background = element_rect(fill='transparent') #transparent legend panel
  )

Make sure to indicate that the background should be transparent when exporting the plot with ggsave().

ggsave('myplot.png', p, bg='transparent')

The usage of this syntax in practice is demonstrated by the example that follows.

How to Use ggplot2 Transparent Background

The ggplot2 code below demonstrates how to make a straightforward grouped boxplot:

library(ggplot2)

Make this example reproducible

set.seed(123)

Now we can create a dataset

df <- data.frame(team=rep(c('P1', 'P2', 'P3'), each=50),
                 program=rep(c('LOW', 'MEDIUM'), each=25),
                 values=seq(1:150)+sample(1:100, 150, replace=TRUE))
head(df)
  team program values
1   P1     LOW     84
2   P1     LOW     58
3   P1     LOW     42
4   P1     LOW     72
5   P1     LOW      6
6   P1     LOW     46

Now let’s create a boxplot

ggplot(df, aes(x=team, y=values, fill=program)) +
  geom_boxplot()

To give the plot a transparent background, we can use the following code.

p <- ggplot(df, aes(x=team, y=values, fill=program)) +
       geom_boxplot() +
       theme(
         panel.background = element_rect(fill='transparent'),
         plot.background = element_rect(fill='transparent', color=NA),
         panel.grid.major = element_blank(),
         panel.grid.minor = element_blank(),
         legend.background = element_rect(fill='transparent'),
         legend.box.background = element_rect(fill='transparent')
       )

Now we can display the boxplot

p

We can then export this plot to a PNG file, specifying that the background should be transparent in the exported image.

ggsave('boxplot.png', p, bg='transparent')






A Side-by-Side Boxplot in R: How to Do It

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post A Side-by-Side Boxplot in R: How to Do It appeared first on Data Science Tutorials

When a data point or dataset is displayed as a graph, such as a vertical or horizontal boxplot, rather than as a list of numbers, it is often much easier to spot patterns in the data.

There are many distinct kinds of graphs, and each one can display various kinds of connections and patterns.


The base R boxplot is a graph that displays more information than just the location of each value within the sample.

Comparative boxplots

A side-by-side boxplot in R gives the user an easy way to compare the features of different data sets: the maximum, minimum, range, center, quartiles, interquartile range, variance, and skewness, among other characteristics.

It can display relationships between two or more related data sets, or among the data points within a single data set.


This kind of graph takes the shape of a box displaying the quartiles and lines displaying the remaining range of the data set.

The visual comparison, which lets you examine features such as outliers, the whiskers, sample size, and log scaling, can speak volumes when comparing similar data sets.

A side-by-side boxplot example

In R, the boxplot() function, called as boxplot(data sets), creates a vertical or horizontal side-by-side boxplot of the data sets it is applied to.

The function also accepts a number of optional parameters:


  • main – the main title of the graph.
  • names – labels for each of the data sets.
  • xlab – label for the x-axis.
  • ylab – label for the y-axis.
  • col – color of the boxes.
  • border – color of the border.
  • horizontal – determines the orientation of the graph.
  • notch – the appearance of the boxes (notched or not).
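A short sketch combining several of these options in one call (the titles and colors below are made-up values for illustration):

```r
# toy data
y <- c(1, 4, 5, 6, 9)

boxplot(y,
        main = "Example boxplot",  # main title
        ylab = "Value",            # y-axis label
        col = "lightblue",         # box fill color
        border = "darkblue",       # border color
        horizontal = FALSE,        # vertical orientation
        notch = FALSE)             # unnotched boxes
```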

Boxplot r

Here is a simple illustration of the boxplot() function. Here the values of x are evenly distributed. If you run this code, you will see a balanced boxplot graph.


set.seed(123)
x = 1:10
boxplot(x)

y = c(1,4,5,6,9)
boxplot(y)

Here is another straightforward example of the boxplot() function, this time applied to y. If you run this code, you’ll see a boxplot with a somewhat smaller box than the one above.

Applications

There are several uses for R’s boxplot function. Here is an example of the code for comparing the fuel efficiency of 4 and 8-cylinder automobiles.

How to create a side-by-side boxplot in R

The first two boxplot() calls draw the two graphs side by side. The final boxplot() call puts both boxplots in the same graph.


It also illustrates some of the optional parameters of this function that you can use when learning how to create a boxplot in R.

cyl4 = mtcars$mpg[which(mtcars$cyl==4)]
cyl8 = mtcars$mpg[which(mtcars$cyl==8)]

par(mfrow=c(1,2))
boxplot(cyl4)
boxplot(cyl8)

par(mfrow=c(1,1))
boxplot(cyl4, cyl8,
        main = "4 cylinders versus 8",
        ylab = "Miles per gallon",
        names = c("4 cylinders", "8 cylinders"))

The boxplot() function is an extremely useful graphing tool that many programming languages lack. It serves as an example of why R is a useful tool in data science.




Goats do room

[This article was first published on R – Xi'an's Og, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The riddle of the week is about 10 goats sequentially moving to their room, which they have chosen at random and independently (among ten rooms), unless another goat already occupies the room, in which case they move to the first free room with a higher number or fail. What is the probability that all goats end up in a room?

Coding the experiment is straightforward:

N = 10
M = 1e5          # number of simulated repetitions
f = 0            # count of failed attempts
for (m in 1:M) {
  g = sample(1:N, N, rep = TRUE)  # each goat picks a room at random
  o = 0 * g                       # room occupancy indicator
  for (i in 1:N) {
    if (min(o[g[i]:N])) {         # rooms g[i]..N all taken: failure
      f = f + 1
      break
    } else {                      # take the first free room numbered >= g[i]
      o[min(which(!o[g[i]:N])) + g[i] - 1] = 1
    }
  }
}
f / M

returning an estimated failure probability of approximately 0.764, i.e. a success probability of about 0.236.

As I had some free time during the early mornings at ISBA 2022, I tried to reformulate the question as a continuous event on uniform order statistics, turning out to be at most one uniform larger than (N-1)/N, at most two larger than (N-2)/N, and so on… Asking the question on math.stackexchange quickly produced an answer that reverse-engineered my formulation back to the goats (or parking lot) problem, with a generic success probability of

\dfrac{(N+1)^{N-1}}{N^N}

which of course coincides with the Monte Carlo approximation!
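Evaluating the closed form for N = 10 confirms this (note it gives the success probability; the simulation above tallied failures):

```r
N <- 10
p_success <- (N + 1)^(N - 1) / N^N  # parking-function probability
p_success      # about 0.2358: probability all goats find a room
1 - p_success  # about 0.7642: the failure rate seen in the Monte Carlo run
```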

As an aside, I once drank South-African wines named Goats-do-Roam and Goat-Roti at my friends Jim and Maria’s place,  and they were quite enjoyable!


RObservations #35 : Predicting Rubik’s Cube Rotations with CNNs

[This article was first published on r – bensstats, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Disclaimer: While working on this project on my local machine I noticed that the code was making my computer heat up. To avoid the risk of overheating my computer I opted to use a Kaggle notebook. As a bonus, I got to use some GPU computing which made training this model much faster than it would be on my machine! Feel free to run the code on your machine or fork the notebook!

Introduction

In my previous blog I explored sentiment prediction using LSTM networks and their implementation with keras and R. In this blog I am going to share how to predict the rotation of Rubik's cubes with convolutional neural networks (CNNs). For this challenge, I had the opportunity to do some basic image preprocessing and construct a CNN that predicts a continuous output, as opposed to the categorical outputs more common among CNN application examples. Since the goal is to predict a continuous value, the aim is to make the margin of error between the predicted and the true values as small as possible.

The Data

The data consists of two folders. The training folder has a .csv file that lists the file names of the Rubik's cube images among the training images and their respective rotations. The images consist of 5000 512×512-pixel color (RGB) images of rotated Rubik's cubes. The other folder has testing images whose angles of rotation are not given. Since the test data does not have any labels, it is not going to be very helpful for our purposes. So for this blog we are going to work with just the training data and its labels.

The labels for the training data look like this:

training_labels <- readr::read_csv("../input/rubix-cube/training/training/labels.csv")
head(training_labels)

  filename       xRot
  000000.jpg 336.8389
  000001.jpg 148.4844
  000002.jpg 244.8217
  000003.jpg 222.7006
  000004.jpg 172.3581
  000005.jpg 205.6921

Thanks to the imager package it is possible to convert the images into matrix form, which can then be converted into the array format that keras likes. One of the questions I encountered was related to converting a list of 3D arrays to a 4D array, which I was able to figure out thanks to this stackoverflow question. While the solution is not elegant, it works.

Due to the size of the dataset, after preprocessing the data needed to be converted and used in groups, and the model needed to be trained in even smaller batches. An example of processing the first 3 images in the dataset would be:

library(tidyverse)
library(imager)

images <- lapply(
  training_labels[["filename"]][1:3],
  function(x) paste0("../input/rubix-cube/training/training/images/", x) %>%
    load.image() %>%
    as.cimg()
) %>%
  lapply(function(x) x[,,,])

# Source: https://stackoverflow.com/questions/62060430/image-list-of-3d-arrays-to-one-4d-array
images_array <- array(NA, dim = c(length(images), 512, 512, 3))
for (j in 1:length(images)) {
  images_array[j,,,] <- images[[j]]
}

They can also be plotted visually with purrr:

par(mfrow = c(1, 3))
images_array[1:3,,,] %>%
  purrr::array_tree(1) %>%
  purrr::set_names(training_labels[["xRot"]][1:3]) %>%
  purrr::map(as.raster) %>%
  purrr::iwalk(~{ plot(.x); title(.y) })

With this, the images are preprocessed and able to be used for training our model.

The Model

As far as modelling is concerned, I created a convolutional neural network where the first layer's input shape matches the dimensions of the images. The subsequent layers pretty much follow the code used in the CNN example on RStudio's website. For the loss function I opted for mean squared error, with mean squared error and mean absolute error as metrics.

library(keras)

model <- keras_model_sequential() %>%
  layer_conv_2d(filters = 512,
                kernel_size = c(3, 3),
                activation = "relu",
                input_shape = c(512, 512, 3)) %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 256, kernel_size = c(3, 3), activation = "relu") %>%
  layer_max_pooling_2d(pool_size = c(2, 2)) %>%
  layer_conv_2d(filters = 128, kernel_size = c(3, 3), activation = "relu") %>%
  layer_flatten() %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1, activation = "relu")

# Compile the model
model %>% compile(
  optimizer = "adam",
  loss = "mean_squared_error",
  metrics = c("mean_squared_error",
              "mean_absolute_error")
)

Due to the magnitude of the data, it cannot be preprocessed and trained on in a single step. In lieu of this, the data is processed and trained on in groups of 100 images, with a batch size of 2.

set.seed(1234)

# Using chunks
history <- list()
for (i in 0:49) {
  start_index <- i * 100 + 1
  end_index <- (i + 1) * 100
  images <- lapply(
    training_labels[["filename"]][start_index:end_index],
    function(x) paste0("../input/rubix-cube/training/training/images/", x) %>%
      load.image() %>%
      as.cimg()
  ) %>%
    lapply(function(x) x[,,,])

  # Source: https://stackoverflow.com/questions/62060430/image-list-of-3d-arrays-to-one-4d-array
  images_array <- array(NA, dim = c(length(images), 512, 512, 3))
  for (j in 1:length(images)) {
    images_array[j,,,] <- images[[j]]
  }

  # Split data into train-test groups
  labels <- training_labels[["xRot"]][start_index:end_index]
  smp_size <- floor(0.75 * length(images))
  train_ind <- sample(seq_len(length(images)), size = smp_size)
  train_x <- images_array[train_ind,,,]
  test_x <- images_array[-train_ind,,,]
  train_y <- labels[train_ind]
  test_y <- labels[-train_ind]

  # Training the model
  history[[i + 1]] <- model %>% fit(
    x = train_x,
    y = train_y,
    epochs = 10,
    batch_size = 2,
    verbose = getOption("keras.fit_verbose", default = 1),
    validation_split = 0.25,
    validation_data = list(test_x, test_y)
  )

  # Free up unused RAM
  gc()
}

history[[50]]

Final epoch (plot to see history):
                   loss: 2.339
     mean_squared_error: 2.339
    mean_absolute_error: 1.303
               val_loss: 6.561
 val_mean_squared_error: 6.561
val_mean_absolute_error: 1.865

plot(history[[50]])

From the final iteration of the model history, the validation MSE is 6.561, which isn't bad. But if we want to make something more production-worthy, a better model is definitely required.

If you know how to make this model better, or know of a better approach, please let me know! I would love to learn how to get better at making machine learning models!

Conclusion

There we have it! It was really interesting getting to preprocess images and deal with the quirks of having to deal with processing limitations and still managing to train the model. I will definitely keep this blog handy for my next image classification project.

Thank you for reading!

Want to see more of my content?

Be sure to subscribe and never miss an update!


To leave a comment for the author, please follow the link and comment on their blog: r – bensstats.


Expanding a polynomial with ‘caracas’, part 2

[This article was first published on Saturn Elephant, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Last month, I posted an article showing a way to expand a polynomial in R when the coefficients of the polynomial contain some literal values, with the help of the caracas package. Today I wanted to apply it to a highly factorized polynomial expression of about 500 characters. The method took more than 30 minutes, so I looked for a more efficient one.

Thanks to some help on StackOverflow, I came to the following method, which is more efficient. It consists of splitting the expression into its additive terms and working on each term, instead of expanding the whole polynomial. In the example below I take the polynomial expression defining the isosurface equation of a Solid Möbius strip.

library(caracas)
sympy <- get_sympy()

# define the variables x,y,z and the constants a,b
def_sym(x, y, z, a, b)

# define expression
expr <- sympy$parse_expr(
  "((x*x+y*y+1)*(a*x*x+b*y*y)+z*z*(b*x*x+a*y*y)-2*(a-b)*x*y*z-a*b*(x*x+y*y))**2-4*(x*x+y*y)*(a*x*x+b*y*y-x*y*z*(a-b))**2"
)

# extraction of monomials in the 'povray' list
povray <- list()
terms <- sympy$Add$make_args(expr)
for (term in terms) {
  f <- term$expand()
  fterms <- sympy$Add$make_args(f)
  for (fterm in fterms) {
    decomp  <- fterm$as_coeff_mul(x$pyobj, y$pyobj, z$pyobj)
    coef    <- decomp[[1]]
    mono    <- decomp[[2]]
    polexpr <- sympy$Mul$fromiter(mono)
    poly    <- polexpr$as_poly(x$pyobj, y$pyobj, z$pyobj)
    degree  <- toString(poly$monoms()[[1]])
    if (degree %in% names(povray)) {
      povray[[degree]] <- sympy$Add(povray[[degree]], coef)
    } else {
      povray[[degree]] <- coef
    }
  }
}

polynomial <- vapply(names(povray), function(degree) {
  coeff <- povray[[degree]] |>
    gsub("([ab])\\*\\*(\\d+)", "pow(\\1,\\2)", x = _)
  sprintf("xyz(%s): %s,", degree, coeff)
}, character(1L))

cat(polynomial, sep = "\n", file = "SolidMobiusStrip.txt")

At the last step I use gsub to replace powers like a**2 with their POV-Ray syntax pow(a,2). The above code writes this POV-Ray code:

xyz(4, 0, 0): pow(a,2)*pow(b,2) - 2*pow(a,2)*b + pow(a,2),
xyz(8, 0, 0): pow(a,2),
xyz(0, 4, 0): pow(a,2)*pow(b,2) - 2*a*pow(b,2) + pow(b,2),
xyz(0, 8, 0): pow(b,2),
xyz(6, 0, 0): -2*pow(a,2)*b - 2*pow(a,2),
xyz(0, 6, 0): -2*a*pow(b,2) - 2*pow(b,2),
xyz(4, 4, 0): pow(a,2) + 4*a*b + pow(b,2),
xyz(0, 4, 4): pow(a,2),
xyz(4, 0, 4): pow(b,2),
xyz(4, 2, 0): -4*pow(a,2)*b - 2*pow(a,2) - 2*a*pow(b,2) - 4*a*b,
xyz(6, 2, 0): 2*pow(a,2) + 2*a*b,
xyz(2, 4, 0): -2*pow(a,2)*b - 4*a*pow(b,2) - 4*a*b - 2*pow(b,2),
xyz(2, 6, 0): 2*a*b + 2*pow(b,2),
xyz(1, 3, 3): -4*pow(a,2) + 4*a*b,
xyz(3, 1, 1): 4*pow(a,2)*b - 4*pow(a,2) - 4*a*pow(b,2) + 4*a*b,
xyz(5, 1, 1): 4*pow(a,2) - 4*a*b,
xyz(3, 3, 1): 4*pow(a,2) - 4*pow(b,2),
xyz(2, 2, 0): 2*pow(a,2)*pow(b,2) - 2*pow(a,2)*b - 2*a*pow(b,2) + 2*a*b,
xyz(4, 0, 2): -2*a*pow(b,2) + 2*a*b,
xyz(0, 4, 2): -2*pow(a,2)*b + 2*a*b,
xyz(6, 0, 2): 2*a*b,
xyz(0, 6, 2): 2*a*b,
xyz(2, 4, 2): -2*pow(a,2) + 10*a*b - 2*pow(b,2),
xyz(4, 2, 2): -2*pow(a,2) + 10*a*b - 2*pow(b,2),
xyz(1, 3, 1): 4*pow(a,2)*b - 4*a*pow(b,2) - 4*a*b + 4*pow(b,2),
xyz(1, 5, 1): 4*a*b - 4*pow(b,2),
xyz(3, 1, 3): -4*a*b + 4*pow(b,2),
xyz(2, 2, 2): -2*pow(a,2)*b + 6*pow(a,2) - 2*a*pow(b,2) - 8*a*b + 6*pow(b,2),
xyz(2, 2, 4): 2*a*b,
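The pow() substitution can be checked in isolation; this is just the regex from the code above applied to a plain string (the input string here is an invented example):

```r
# Rewrite Python-style powers a**2, b**3 into POV-Ray's pow() syntax
s <- "a**2*b - 4*b**3"
gsub("([ab])\\*\\*(\\d+)", "pow(\\1,\\2)", s)
# [1] "pow(a,2)*b - 4*pow(b,3)"
```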

This is very fast for this example, but it still took 20 minutes in my case, which is a slight modification of an animation by ‘ICN5D’; here it is:

The difference with the original animation is that this one uses an isoclinic rotation for the animation.

To leave a comment for the author, please follow the link and comment on their blog: Saturn Elephant.


Best Books to learn Tensorflow

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post Best Books to learn Tensorflow appeared first on Data Science Tutorials

Best Books to learn Tensorflow. Are you interested in learning Tensorflow and searching for the best resources to do so? If so, you are in the right place. In this article we have compiled a comprehensive list of the top Tensorflow learning materials.

So take a moment to look through the best resources for learning Tensorflow. You can bookmark this article to access it later.

Best Books on Data Science with Python – Data Science Tutorials

Best Books to learn Tensorflow


1. Hands-On Neural Networks with TensorFlow 2.0

Key Features:-

1. Recognize the fundamentals of machine learning and the power of deep learning and neural networks.

2. Examine the TensorFlow framework’s structure and learn how to switch to TF 2.0.

3. Create neural network-based solutions for any deep learning issue using TF 2.0.

Best Books to Learn R Programming – Data Science Tutorials

2. TensorFlow 2 Reinforcement Learning Cookbook

Key Features:-

1. Create and implement solutions for production pipelines, goods, and services based on deep reinforcement learning

2. Examine popular reinforcement learning algorithms, including Q-learning, SARSA, and the actor-critic method.

3. Create and modify RL-based programs to carry out tasks in the real world.

3. TensorFlow 2 Pocket Reference

Key Features:-

1. Be familiar with TensorFlow model patterns and ML workflows’ best practices.

2. When creating TensorFlow models and workflows, use code snippets as templates.

3. Reduce development time by utilizing the TensorFlow Hub to integrate prebuilt models.

4. Make well-informed design choices about data ingestion, training paradigms, model saving, and inference.

5. Talk about typical situations such as model design, data import methodology, model training, and model tuning.

5 Free Books to Learn Statistics For Data Science – Data Science Tutorials

4. Python Machine Learning: Machine Learning and Deep Learning with Python, scikit-learn, and TensorFlow 2

Key Features:-

1. The third edition of the highly regarded, best-selling Python machine learning book.

2. You may go deeper into the theory and application of Python machine learning thanks to clear and understandable explanations.

3. Completely revised and expanded to include TensorFlow 2, models for generative adversarial networks, reinforcement learning, and best practices

5. Learning Deep Learning: Theory and Practice of Neural Networks, Computer Vision, Natural Language Processing, and Transformers Using TensorFlow

Key Features:-

1. Learn about and become proficient in the following fundamental ideas: perceptrons, gradient-based learning, sigmoid neurons, and backpropagation.

2. Examine how DL frameworks facilitate the creation of increasingly intricate and practical neural networks.

3. Understand how convolutional neural networks (CNNs) are revolutionizing image analysis and categorization.

4. Use long short-term memory (LSTM) and recurrent neural networks (RNNs) to process text and other variable-length sequences.

5. Master NLP using Transformer architecture and sequence-to-sequence networks

6. Create software for image captioning and natural language translation.

6. TinyML: Machine Learning with TensorFlow Lite

Key Features:-

1. Create a magic wand that responds to movements, a camera that can detect individuals, and a voice recognizer.

2. Use ultra-low-power microcontrollers and Arduino.

3. Discover the fundamentals of machine learning and how to create your own models.

4. Develop models that can comprehend data from accelerometers, images, and audio.

Test for Normal Distribution in R-Quick Guide – Data Science Tutorials

7. TensorFlow 2.0 Computer Vision Cookbook

Key Features:-

1. Create, train, and use TensorFlow-based deep learning algorithms for computer vision applications.

2. Find workable solutions to common problems that arise when creating computer vision models.

3. Enable machines to comprehend images at a human level in order to identify and evaluate digital photos and videos.

8. TensorFlow for Deep Learning: From Linear Regression to Reinforcement Learning

Key Features:-

1. Understand the principles of TensorFlow, including how to carry out simple computations

2. Create straightforward educational programs to comprehend their mathematical underpinnings

3. Explore completely connected deep networks that are utilized by countless applications.

4. Use hyperparameter optimization to transform prototypes into superior models.

5. Use convolutional neural networks to process images

9. Machine Learning Using TensorFlow Cookbook

Key Features:-

1. Deep Learning solutions from Google Developer Experts and Kaggle Masters

2. Understand the core concepts, such as variables, matrices, and data sources.

3. Acquire cutting-edge skills to improve the speed and precision of your algorithms.

10. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow

Key Features:-

1. Examine the landscape of machine learning, especially neural networks

2. Follow a sample machine learning project from beginning to conclusion using Scikit-Learn.

3. Examine several training models, such as ensemble techniques, random forests, decision trees, and support vector machines.

4. Create and train neural networks using the Tensor Flow framework.

5. Learn about several neural network architectures, such as convolutional nets, recurrent nets, and deep reinforcement learning.

6. Acquire knowledge of deep neural network scaling and training methods.

This is the end of the list. I’m hoping that these tools will assist you in learning and mastering Tensorflow. I advise you to save this post as a bookmark for further use.



To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.


Create groups based on the lowest and highest values in R?

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post Create groups based on the lowest and highest values in R? appeared first on finnstats.

To create groups based on the lowest and highest values in R, i.e. to divide an input vector into n buckets, use the ntile() function from the dplyr package.

The basic syntax used by this function is as follows.

ntile(x, n)

where:

x: Input vector

n: Number of buckets

Note: The bucket sizes might vary by up to one.
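As a rough illustration of what ntile() does under the hood (a sketch, not dplyr's actual implementation): it ranks the values and then cuts the ranks into n nearly equal groups. The base-R approximation below matches ntile() exactly when length(x) is a multiple of n; otherwise dplyr puts the larger buckets first.

```r
x <- c(5, 1, 3, 2, 4, 6)
n <- 3
# Rank the values (ties broken by position), then map ranks to buckets
buckets <- ceiling(rank(x, ties.method = "first") * n / length(x))
buckets
# [1] 3 1 2 1 2 3
```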

Create groups based on the lowest and highest values in R

The practical application of this function is demonstrated in the examples that follow.

Example 1: Use ntile() with a Vector

The ntile() function can be used to divide a vector of 11 elements into 5 groups using the following code.

library(dplyr)

Let’s create a vector

x <- c(10, 13, 14, 26, 27, 18, 11, 12, 15, 20, 13)
x
[1] 10 13 14 26 27 18 11 12 15 20 13

and divide the vector into five buckets.

ntile(x, 5)
[1] 1 2 3 5 5 4 1 1 3 4 2

We can see from the result that each component of the original vector has been assigned to one of five bins.

Bucket 1 contains the smallest values, while bucket 5 contains the largest.

For instance:

The lowest values (10, 11, and 12) are assigned to bucket 1.

The highest values (26 and 27) are assigned to bucket 5.

Example 2: Use ntile() with a Data Frame

Consider the following R data frame, which displays the points scored by different basketball players:

Let’s create a data frame

df <- data.frame(player=LETTERS[1:9],
                 points=c(102, 109, 57, 122, 824, 528, 125, 159, 195))

Now we can view the data frame

df
   player points
1      A    102
2      B    109
3      C     57
4      D    122
5      E    824
6      F    528
7      G    125
8      H    159
9      I    195

The following code demonstrates how to add a new column to the data frame using the ntile() function that places each player into one of three buckets based on their total number of points.

add a new column that sorts players according to their point totals.

df$bucket <- ntile(df$points, 3)

Let’s view the updated data frame

df
  player points bucket
1      A    102      1
2      B    109      1
3      C     57      1
4      D    122      2
5      E    824      3
6      F    528      3
7      G    125      2
8      H    159      2
9      I    195      3

Each player is given a value between 1 and 3 in the new bucket column.

Players who have the fewest points are assigned a value of 1, while those who have the most points are assigned a value of 3.



If you are interested in learning more about data science, you can find more articles at finnstats.


To leave a comment for the author, please follow the link and comment on their blog: Data Analysis in R.


Part 1 of 3: 300+ milestone for Big Book of R

[This article was first published on R programming – Oscar Baruffa, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

We’ve done it folks!

Over 300 free R programming books are now available at www.BigBookofR.com.

Of the 343 entries available, 20 are paid products and the rest are 100% free. Thanks to all the authors, contributors, users, readers, and cheerleaders who are helping build a rich ecosystem of material.

This latest edition of new entries is a whopper. 35 new entries in one go. To do them justice, I’m highlighting them in 3 separate posts over the next few days (that’s the plan anyway!).

Revisit this page later or sign up to my newsletter to be notified of the next two posts.

If you’re on twitter, you might want to follow the Big Book of R twitter account which posts a random entry from the collection every couple of hours.


A big thank you to Ondrej Pekacek, Lluís Revilla, Daniel Sánchez, Soumya Ray for their contributions. I must also give a special thanks to a mysterious stranger identified only as “Gary”. In one fell swoop Gary submitted 26 books – the second highest contributor of all time :).

Without further ado, here’s the first 10 of the 35 entries recently added. Enjoy!

Complex Surveys: A Guide to Analysis Using R

by Thomas Lumley

Complex Surveys is a practical guide to the analysis of survey data using R, the freely available and downloadable statistical programming language. As creator of the specific survey package for R, the author provides the ultimate presentation of how to successfully use the software for analyzing data from complex surveys while also utilizing the most current data from health and social sciences studies to demonstrate the application of survey research methods in these fields.

https://www.bigbookofr.com/social-science.html#complex-surveys-a-guide-to-analysis-using-r

An introduction to psychometric theory with applications in R

by William Revelle

My course in psychometric theory, on which much of this book is based, was inspired by a course of the same name by Warren Norman. The organizational structure of this text owes a great deal to the structure of Warren’s course. Warren introduced me, as well as a generation of graduate students at the University of Michigan, to the role of theory and measurement in the study of psychology. 

https://www.bigbookofr.com/social-science.html#an-introduction-to-psychometric-theory-with-applications-in-r

An Introduction to Bayesian Reasoning and Methods

by Kevin Ross

Focus on statistical inference, the process of using data analysis to draw conclusions about a population or process beyond the existing data. “Traditional” hypothesis tests and confidence intervals that you are familiar with are components of “frequentist” statistics. This book will introduce aspects of “Bayesian” statistics. We will focus on analyzing data, developing models, drawing conclusions, and communicating results from a Bayesian perspective. We will also discuss some similarities and differences between frequentist and Bayesian approaches, and some advantages and disadvantages of each approach.

https://www.bigbookofr.com/statistics.html#an-introduction-to-bayesian-reasoning-and-methods

R bookdownplus Textbook

by Peng Zhao

‘bookdownplus’ is an extension of ‘bookdown’. It is a collection of multiple templates, which I have been collecting since years ago on the basis of LaTeX, and have been tailoring them so that I can work happily under the umbrella of ‘bookdown’. ‘bookdownplus’ helps you (and me) write varied types of books and documents. This book you are reading at the moment was exactly produced by ‘bookdownplus’.

https://www.bigbookofr.com/packages.html#r-bookdownplus-textbook

Surrogates – Gaussian process modeling, design and optimization for the applied sciences

by Robert B. Gramacy

Surrogates is a graduate textbook, or professional handbook, on topics at the interface between machine learning, spatial statistics, computer simulation, meta-modeling (i.e., emulation), design of experiments, and optimization. Experimentation through simulation, “human out-of-the-loop” statistical support, management of dynamic processes, online and real-time analysis, automation, and practical application are at the forefront.

https://www.bigbookofr.com/statistics.html#surrogates—gaussian-process-modeling-design-and-optimization-for-the-applied-sciences-1

The R Series by CRC Press

This book series reflects the recent rapid growth in the development and application of R, the programming language and software environment for statistical computing and graphics.

https://www.bigbookofr.com/other-compendiums.html#the-r-series-by-crc-press

Introduction to Computational Finance and Financial Econometrics with R

by Eric Zivot

This book is based on my University of Washington sponsored Coursera course Introduction to Computational Finance and Financial Econometrics that has been running every quarter on Coursera since 2013. This Coursera course is based on the Summer 2013 offering of my University of Washington advanced undergraduate economics course of the same name. At the time, my UW course was part of a three course summer certificate in Fundamentals of Quantitative Finance offered by the Professional Masters Program in Computational Finance & Risk Management that was video-recorded and available for online students. An edited version of this course became the Coursera course. The popularity of the course encouraged me to convert the class notes for the course into a short book.

https://www.bigbookofr.com/finance.html#introduction-to-computational-finance-and-financial-econometrics-with-r

Data Wrangling Essentials

by Mark Banghart

The R and Python communities have developed a set of tools in the tidyverse and the pandas packages respectively designed to wrangle table data. The intuitive nature of these packages makes learning to use them easy and the code easy to read and understand. These tools allow researchers to quickly and accurately complete data preparation for a wide variety of analysis. It is the application of these packages and their approaches to wrangling that are the subject of this book.

The Data Wrangling Essentials title was chosen to emphasize both the use of these new tools and the importance of the work of gathering and preparing data.

https://www.bigbookofr.com/getting-cleaning-and-wrangling-data.html#data-wrangling-essentials

Data Integration, Manipulation and Visualization of Phylogenetic Trees

by Guangchuang Yu

A guide for data integration, manipulation and visualization of phylogenetic trees using a suite of R packages, tidytree, treeio, ggtree and ggtreeExtra.

https://www.bigbookofr.com/life-sciences.html#data-integration-manipulation-and-visualization-of-phylogenetic-trees

The Saga of PLS

by Gaston Sanchez

The main motivating trigger behind this book has been my long standing obsession to understand the historical development of Partial Least Squares methods in order to find the who’s, why’s, what’s, when’s, and how’s. It is the result of an intermittent 10 year quest, tracking bits and pieces of information in order to assemble the story of such methods. Moreover, this text is my third iteration on the subject, following two of my previous works.

https://www.bigbookofr.com/statistics.html#the-saga-of-pls


That’s it for update 1 of 3. Subscribe to my newsletter to be notified of the next two updates.


Subscribe for updates. I write about R, data and careers.

Subscribers get a free copy of Project Management Fundamentals for Data Analysts worth $12


The post Part 1 of 3: 300+ milestone for Big Book of R appeared first on Oscar Baruffa.

To leave a comment for the author, please follow the link and comment on their blog: R programming – Oscar Baruffa.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Continue reading: Part 1 of 3: 300+ milestone for Big Book of R

Get tickets for An introductory course in Shiny on July 11th & 13th or 12th & 14th at 30 USD

[This article was first published on Pachá, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

An introductory course in Shiny

This course aims to teach people with basic R knowledge to develop interactive web applications using the Shiny framework.

The course consists of two days, one-hour session per day, where we will discuss topics such as user interface (UI), server-side logic (tables and graphs that respond to user selection), dashboard components, and the creation of modular components. Questions are super welcome!

The course will be held online (Zoom) from 17.30 to 18.30 and 19.00 to 20.00 (Eastern Time, which is New York, Boston, Toronto time), two days a week.

Previous knowledge required: Basic R (examples: reading a CSV file, transforming columns and making graphs using ggplot2).

Course organization:

  • Day 1: Building a working Shiny app, which will be of the modular kind.
  • Day 2: Good practices and robustness checks aimed at creating an easy-to-maintain real-life app.

Maximum number of attendees: 5.

Buy tickets at buymeacoffee.com

An introductory course in Shiny on July 18th and 20th (17.30 to 18.30): https://www.buymeacoffee.com/pacha/e/79504

An introductory course in Shiny on July 18th and 20th (19.00 to 20.00): https://www.buymeacoffee.com/pacha/e/79503

To leave a comment for the author, please follow the link and comment on their blog: Pachá.


TBATS Time Series Modelling in R

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post TBATS Time Series Modelling in R appeared first on finnstats.

TBATS Time Series Modelling in R, The term “TBATS” refers to a well-liked time series forecasting technique and stands for

  1. Trigonometric seasonality
  2. Box-Cox transformation
  3. ARMA errors
  4. Trend
  5. Seasonal components

The procedure automatically considers models both with and without each of the following components:

  1. Seasonality
  2. A Box-Cox transformation
  3. ARMA(p, q) process
  4. Various trends
  5. Various seasonal effects

The final model in this procedure will be the one with the lowest Akaike Information Criterion (AIC) score.

Using the tbats function from the forecast package is the simplest way to fit a TBATS model to a time series dataset in R.

How to actually use this function is demonstrated in the example that follows.

How to Fit a TBATS Time Series Modelling in R

We’ll use USAccDeaths, a built-in R dataset that contains values for the total monthly accidental fatalities in the USA from 1973 to 1978, for this example.

Now we can view the USAccDeaths dataset

USAccDeaths
       Jan   Feb   Mar   Apr   May   Jun   Jul   Aug   Sep   Oct   Nov   Dec
1973  9007  8106  8928  9137 10017 10826 11317 10744  9713  9938  9161  8927
1974  7750  6981  8038  8422  8714  9512 10120  9823  8743  9129  8710  8680
1975  8162  7306  8124  7870  9387  9556 10093  9620  8285  8466  8160  8034
1976  7717  7461  7767  7925  8623  8945 10078  9179  8037  8488  7874  8647
1977  7792  6957  7726  8106  8890  9299 10625  9302  8314  8850  8265  8796
1978  7836  6892  7791  8192  9115  9434 10484  9827  9110  9070  8633  9240

To fit a TBATS model to this dataset and forecast the values of upcoming months, use the following code.

library(forecast)

Yes, now we can fit the TBATS model

TBATSfit <- tbats(USAccDeaths)

Let’s use the model to make the predictions

predict <- predict(TBATSfit)

We can view the predictions     

predict
        Point Forecast     Lo 80     Hi 80    Lo 95     Hi 95
Jan 1979       8307.597  7982.943  8632.251 7811.081  8804.113
Feb 1979       7533.680  7165.539  7901.822 6970.656  8096.704
Mar 1979       8305.196  7882.740  8727.651 7659.106  8951.286
Apr 1979       8616.921  8150.753  9083.089 7903.978  9329.864
May 1979       9430.088  8924.028  9936.147 8656.137 10204.038
Jun 1979       9946.448  9403.364 10489.532 9115.873 10777.023
Jul 1979      10744.690 10167.936 11321.445 9862.621 11626.760
Aug 1979      10108.781  9499.282 10718.280 9176.632 11040.929
Sep 1979       9034.784  8395.710  9673.857 8057.405 10012.162
Oct 1979       9336.862  8668.087 10005.636 8314.060 10359.664
Nov 1979       8819.681  8124.604  9514.759 7756.652  9882.711
Dec 1979       9099.344  8376.864  9821.824 7994.407 10204.282
Jan 1980       8307.597  7563.245  9051.950 7169.208  9445.986
Feb 1980       7533.680  6769.358  8298.002 6364.750  8702.610
Mar 1980       8305.196  7513.281  9097.111 7094.067  9516.325
Apr 1980       8616.921  7800.849  9432.993 7368.847  9864.995
May 1980       9430.088  8590.590 10269.585 8146.187 10713.988
Jun 1980       9946.448  9084.125 10808.771 8627.639 11265.257
Jul 1980      10744.690  9860.776 11628.605 9392.859 12096.522
Aug 1980      10108.781  9203.160 11014.402 8723.753 11493.809
Sep 1980       9034.784  8109.000  9960.567 7618.920 10450.647
Oct 1980       9336.862  8390.331 10283.392 7889.269 10784.455
Nov 1980       8819.681  7854.387  9784.976 7343.391 10295.972
Dec 1980       9099.344  8114.135 10084.554 7592.597 10606.092

The predicted death toll for the following months is displayed along with the confidence ranges of 80% and 95%.

For instance, the forecasts are as follows for January 1979:

Deaths are expected to total 8,307.597.

The 80 percent confidence interval for the number of deaths is [7,982.943, 8,632.251].

The 95 percent confidence interval for the number of deaths is [7,811.081, 8,804.113].

We can also plot these anticipated future values using the plot() function:

Now we can plot the predicted values

plot(forecast(TBATSfit))

Future anticipated values are represented by the blue line, while the confidence interval bounds are shown by the grey bands.
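The forecast horizon can also be set explicitly, and the fitted decomposition can be inspected. A minimal sketch, assuming the forecast package is loaded and TBATSfit was created as above:

```r
# Forecast a specific horizon, e.g. 12 months ahead
fc12 <- forecast(TBATSfit, h = 12)
plot(fc12)

# Inspect the decomposed components (level, slope, seasonal) of the fitted model
comp <- tbats.components(TBATSfit)
plot(comp)
```

Plotting the components is a quick way to see how much of the series the trend and seasonal terms each explain.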

Naive Approach Forecasting Example »


If you are interested to learn more about data science, you can find more articles here finnstats.

The post TBATS Time Series Modelling in R appeared first on finnstats.

To leave a comment for the author, please follow the link and comment on their blog: Data Analysis in R.


How to apply a transformation to multiple columns in R?

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to apply a transformation to multiple columns in R? appeared first on Data Science Tutorials

How to apply a transformation to multiple columns in R?, To apply a transformation to many columns, use R’s across() function from the dplyr package.

How to apply a transformation to multiple columns in R?

There are innumerable applications for this function; the following examples highlight some typical ones:

First Approach: Apply Function to Several Columns

Multiply values in col1 and col2 by 2

df %>%  mutate(across(c(col1, col2), function(x) x*2))

Second Approach: One Summary Statistic for Multiple Columns can be Calculated

calculate the mean of col1 and col2

df %>%  summarise(across(c(col1, col2), mean, na.rm=TRUE))

Third Approach: Multiple Summary Statistics to be Calculated for Multiple Columns

Calculate the mean and standard deviation for col1 and col2

df %>%  summarise(across(c(col1, col2), list(mean=mean, sd=sd), na.rm=TRUE))

The examples below demonstrate each technique using the given data frame.

Subset rows based on their integer locations

Let’s create a data frame

df <- data.frame(team=c('P1', 'P1', 'P1', 'P2', 'P2', 'P2'),
points=c(26, 22, 28, 15, 32, 28),
rebounds=c(16, 15, 16, 12, 13, 10))

Now we can view the data frame

df
   team points rebounds
1   P1     26       16
2   P1     22       15
3   P1     28       16
4   P2     15       12
5   P2     32       13
6   P2     28       10

Example 1: Apply Function to Multiple Columns

The values in the columns for points and rebounds can be multiplied by 2 using the across() function by using the following code.

library(dplyr)

Multiply the values in the points and rebounds columns by two.

df %>%  mutate(across(c(points, rebounds), function(x) x*2))
  team points rebounds
1   P1     52       32
2   P1     44       30
3   P1     56       32
4   P2     30       24
5   P2     64       26
6   P2     56       20

Example 2: One Summary Statistic for Multiple Columns can be Calculated

The across() function can be used to determine the mean value for both the points and rebound columns using the following sample code.

How to do Conditional Mutate in R? – Data Science Tutorials

the average value of the columns for points and rebounds.

df %>%  summarise(across(c(points, rebounds), mean, na.rm=TRUE))
    points rebounds
1 25.16667 13.66667

Note that we can also use where(is.numeric) to compute the summary statistic automatically for every numeric column in the data frame.

Calculate the mean value for each column of numbers in the data frame.

df %>%  summarise(across(where(is.numeric), mean, na.rm=TRUE))
  points rebounds
1 25.16667 13.66667

Example 3: Multiple Summary Statistics to be Calculated for Multiple Columns

The across() function may be used to determine the mean and standard deviation of the points and rebounds columns using the following code.

Compute the mean and standard deviation for the columns of points and rebounds.

df %>%  summarise(across(c(points, rebounds), list(mean=mean, sd=sd), na.rm=TRUE))
    points_mean points_sd rebounds_mean rebounds_sd 
1    25.16667  5.946988      13.66667     2.42212 
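Note that passing extra arguments such as na.rm=TRUE through across()'s ... is deprecated in recent dplyr versions (1.1.0 and later). A sketch of the equivalent call using purrr-style anonymous functions instead, assuming the df and dplyr setup from above:

```r
df %>%
  summarise(across(c(points, rebounds),
                   list(mean = ~ mean(.x, na.rm = TRUE),
                        sd   = ~ sd(.x, na.rm = TRUE))))
```

This produces the same points_mean, points_sd, rebounds_mean, and rebounds_sd columns without the deprecation warning.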

Now we are almost complete with dplyr package techniques. We will discuss transmute() function in an upcoming post.

How to change the column positions in R? – Data Science Tutorials


The post How to apply a transformation to multiple columns in R? appeared first on Data Science Tutorials

To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.


Academicons: my first quarto extension

[This article was first published on schochastics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

I have been following the development of quarto for a while now and I am pretty excited about it. Not only its features but also its rich and detailed documentation will make me transition from Rmarkdown to Quarto in the long run. While moving my personal webpage, I realized though that I am still missing some features. Quarto is still in its early stages so it is no surprise that some features from Rmarkdown do not yet exist in quarto.

A few days ago, however, I noticed that a very exciting feature was added. Custom extensions.

Quarto Extensions are a powerful way to modify or extend the behavior of Quarto, and can be created and distributed by anyone

Extensions can be shortcodes, filters, and formats, all for different purposes and all very well explained in the docs. Note that extensions are, at the time of writing, a new feature of Quarto. Make sure to install at least v1.0.15 if you want to use them.

Adding social media accounts to my web page

When I transitioned my personal page to Quarto, I was missing an easy way to add my social media accounts using fontawesome icons. I was using the Hugo Apéro theme for blogdown before, and there it was just a matter of adding the usernames in the yaml header. As far as I understood, this is not possible with Quarto (yet).

So at that time, I kind of hacked my way to what I wanted using some simple lua scripts.

function twitter(handle)
  local output = '<a href="https://twitter.com/' .. pandoc.utils.stringify(handle) .. '"><i class="bi bi-twitter" ></i></a>'
  return pandoc.RawBlock('html', output)
end

function github(handle)
  local output = '<a href="https://github.com/' .. pandoc.utils.stringify(handle) .. '"><i class="bi bi-github" ></i></a>'
  return pandoc.RawBlock('html', output)
end

function scholar(handle)
  local output = '<a href="https://scholar.google.de/citations?user=' .. pandoc.utils.stringify(handle) .. '&hl=en"><i class="ai ai-google-scholar" ></i></a>'
  return pandoc.RawBlock('html', output)
end

function orcid(handle)
  local output = '<a href="https://orcid.org/' .. pandoc.utils.stringify(handle) .. '"><i class="ai ai-orcid" ></i></a>'
  return pandoc.RawBlock('html', output)
end

The lua script defines shortcodes that can be used like this

{{< twitter schochastics >}}
{{< github schochastics >}}
{{< scholar MFlgHdcAAAAJ >}}
{{< orcid 0000-0003-2952-4812 >}}

and here is what it looks like on my page.

You can find the full code of my page on github.

The academicons extension

Quarto extensions are a great way to easily add shortcodes to your quarto projects without the need of adding lua scripts to the yaml header. To add fontawesome icon support you install the extension in your project

quarto install extension quarto-ext/fontawesome

and then you can use any kind of free icon via the shortcode {{< fa >}}.

A similar library to fontawesome (but much smaller) is academicons, which provides support for, well, academic icons. Since the library is very similar to fontawesome, it was quite straightforward to build a quarto extension that gives a shortcode to use the icons. To install it in your project just do

quarto install extension schochastics/academicons

and to embed an icon, you can use the {{< ai >}} shortcode. All available icons can be found here:

https://jpswalsh.github.io/academicons/

Here is the source code for a minimal example: example.qmd This is the output of example.qmd for HTML.

To leave a comment for the author, please follow the link and comment on their blog: schochastics.


How to Calculate Percentiles in R

[This article was first published on Data Analysis in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post How to Calculate Percentiles in R appeared first on finnstats.

How to Calculate Percentiles in R, Although percentages and percentiles are different concepts, they are comparable in many ways and occasionally used interchangeably.

A percentile is the percentage of data points in a data collection that are below a given point, whereas a percentage reflects a fraction. Although they are not the same, percentages and percentile values both offer helpful details about a set of data.

Both are employed in various types of order statistics to identify various metrics and calculate probabilities within the distribution of a dataset.

They are most frequently utilized in a continuous variable’s conventional normal distribution, from the lowest data value to the biggest value.

Within a data frame or dataset, a percentile statistic and numerous other probability statistics can be used to account for each data point.

They may assist you in discovering various statistics such as mean, median, z-score, standard deviation, regression, interquartile range, outliers, correlation coefficient, and more.

The percentiles in a typical normal distribution are well-defined, making it simple to identify significant bell curve values like the 80th and 95th percentiles.

Quantiles

A data set’s three quartiles are the values whose percentiles correspond to the data set’s quarter points: specifically, the 25th, 50th, and 75th percentile values.

The region between the 25th and 75th percentiles is called the interquartile range (IQR). The methodology for this calculation is the same as that used to determine any percentile value.

How to Calculate Percentiles in R

How then may percentiles be found in R? You can calculate a percentile using R’s quantile() function, which returns the percentile values along with their corresponding percentages.

x<-c(15,20,22,25,30,34,37,40,45)
quantile(x)
  0%  25%  50%  75% 100%
  15   22   30   37   45

The 0th percentile, 25th percentile, 50th percentile, 75th percentile, and 100th percentile are produced by this function’s default form.

x<-c(15,20,22,25,30,34,37,40,45)
quantile(x, probs = c(0.125,0.375,0.625,0.875))
  12.5% 37.5% 62.5% 87.5%
   20    25    34    40

The probs (probability) option, which enables you to set various percentages, is present here.
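The probs argument also accepts a single value, which is handy for looking up one specific percentile:

```r
x <- c(15, 20, 22, 25, 30, 34, 37, 40, 45)

# 90th percentile of x
quantile(x, probs = 0.90)
# returns the 90% value (41 with quantile()'s default type-7 interpolation)
```

The type argument of quantile() can be used to switch between the nine interpolation methods R supports; the default (type 7) is what the outputs above use.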

Applications.

Finding a percentile in R has a variety of uses. Here is a good illustration of a large dataset with 7,980 data points.

how to use treering data to find percentiles in R

quantile(treering)
  0%   25%   50%   75%  100% 
0.000 0.837 1.034 1.197 1.908 

The quantiles, as well as the minimum and maximum values, are shown above. It demonstrates that these tree rings have a tendency to be clustered in the middle, for example.

Since the range is 1.908 and the IQR is 0.36, only around 19% of the range of the data set is covered by the IQR.
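These two figures can be computed directly from the treering data, using the values in the quantile output above as a check:

```r
IQR(treering)          # 75th minus 25th percentile: 1.197 - 0.837 = 0.36
diff(range(treering))  # full range: 1.908 - 0.000 = 1.908

# share of the data's range covered by the IQR
IQR(treering) / diff(range(treering))  # roughly 0.19
```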

You can learn a lot about a percentage by identifying the numbers in a data set that correspond to it. It can inform you of how skewed and concentrated the values are. It serves as an illustration of the data science tool R.

QQ-plots in R: Quantile-Quantile Plots-Quick Start Guide »


If you are interested to learn more about data science, you can find more articles here finnstats.

The post How to Calculate Percentiles in R appeared first on finnstats.

To leave a comment for the author, please follow the link and comment on their blog: Data Analysis in R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Extracting numbers from a stacked density plot

[This article was first published on R – The Shape of Code, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

A month or so ago, I found a graph showing the percentage of PCs having a given range of memory installed, between March 2000 and April 2020, on a TechTalk page of PC Matic; it had the form of a stacked density plot. This kind of installed-memory data is rare; how could I get the underlying values (a previous post covers extracting data from a heatmap)?

The plot below is the image on PC Matic’s site:

Percentage of PC having a given amount of installed memory, from 2000 to 2020.

The change of colors creates a distinct boundary between different memory capacity ranges, and it ought to be possible to find the y-axis location of each color change, for a given x-axis location (with location measured in pixels).

The image was a png file, so I loaded R’s png package, and a call to readPNG created the required 2-D array of pixel information.

library("png")
img=readPNG("../rc_mem_memrange_all.png")

Next, the horizontal and vertical pixel boundaries of the colored data needed to be found. The rectangle of data is surrounded by white pixels. The number of white pixels (actually all ones corresponding to the RGB values) along each horizontal and vertical line dramatically drops at the data image boundary. The following code counts the number of pixels matching col along each horizontal line (used to find the y-axis bounds):

horizontal_line=function(a_img, col)
{
n_lines=dim(a_img)[1] # number of pixel rows in the image
lines_col=sapply(1:n_lines, function(X) sum((a_img[X, , 1]==col[1]) &
                                            (a_img[X, , 2]==col[2]) &
                                            (a_img[X, , 3]==col[3]))
                )
return(lines_col)
}

white=c(1, 1, 1)
n_cols=dim(img)[2]

# Find where fraction of white points on a line changes dramatically
white_horiz=horizontal_line(img, white)

# handle when upper boundary is missing
ylim=c(0, which(abs(diff(white_horiz/n_cols)) > 0.5))
ylim=ylim[2:3]

Next, for each vertical column of pixels, at each x-axis pixel location, the sought-after y value occurs at the change-of-color boundary in the corresponding vertical column. This boundary includes a 1-pixel-wide separation color, which creates a run of 2 or 3 consecutive pixel color changes.

The color change is easily found using the duplicated function.

# Return y position of vertical color changes at x_pos
y_col_change=function(x_pos)
{
# Good enough technique to generate a unique value per RGB color
col_change=which(!duplicated(img[y_range, x_pos, 1]+
                          10*img[y_range, x_pos, 2]+
                         100*img[y_range, x_pos, 3]))

# Handle a 1-pixel separation line between colors.
# Diff is used to find these consecutive sequences.
y_change=c(1, col_change[which(diff(col_change) > 1)+1])

# Always return a vector containing max_vals elements.
return(c(y_change, rep(NA, max_vals-length(y_change))))
}

Next, we need to group together the sequence of points that delimit a particular boundary. The points along the same boundary are all associated with the same two colors, i.e., the ones below/above the boundary (plus a possible boundary color).

The plot below shows all the detected boundary points, in black, overwritten by colors denoting the points associated with the same below/above colors (code):

Colored points showing detected area color boundaries.

The visible black pluses show that the algorithm is not perfect. The few points here and there can be ignored, but the two blocks at the top of the original image have thrown a spanner in the works for some range of points (this could be fixed manually, or perhaps it is possible to tweak the color extraction formula to work around them).

How well does this approach work with other stacked density plots? No idea, but I am on the lookout for other interesting examples.

To leave a comment for the author, please follow the link and comment on their blog: R – The Shape of Code.


Add new calculated variables to a data frame and drop all existing variables

[This article was first published on Data Science Tutorials, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

The post Add new calculated variables to a data frame and drop all existing variables appeared first on Data Science Tutorials

Add new calculated variables to a data frame and drop all existing variables. I hope you enjoyed reading about the dplyr package’s magic in earlier posts; here is the latest update on using the dplyr package.

With R’s transmute() function, you can drop all of the existing variables and add new calculated variables to a data frame.

The basic syntax used by this function is as follows.

df %>% transmute(var_new = var1 * 2)

In this example, the existing variable var1 will be multiplied by 2 to produce a new variable called var_new.

With the following data frame in R, the ensuing examples demonstrate how to utilize the transmute() function.

Let’s create a data frame

df <- data.frame(team=c('P1', 'P2', 'P3', 'P4', 'P5'),
points=c(129, 110, 115, 128, 412),
assists=c(313, 238, 331, 339, 234),
rebounds=c(230, 128, 324, 124, 228))

Now we can view the data frame

df
   team points assists rebounds
1   P1    129     313      230
2   P2    110     238      128
3   P3    115     331      324
4   P4    128     339      124
5   P5    412     234      228

Example 1: Use transmute() to Create One New Variable

The following code uses transmute() to create one new variable.

library(dplyr)

Let’s create a new variable called mypoint

df %>% transmute(mypoint = points * 2)
   mypoint
1     258
2     220
3     230
4     256
5     824

The original values in the points column multiplied by two give the values of mypoint.

Note that the original data frame is not actually modified by the transmute() function.

You must store the output of the transmute() function in a variable in order to save it in a new data frame.

library(dplyr)

Store the results of transmute() in a variable

mypoint<- df %>% transmute(mypoint = points * 2)

Now we can view the results

mypoint
   mypoint
1     258
2     220
3     230
4     256
5     824

The output of transmute() is now stored in a new data frame.
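For comparison, dplyr also provides mutate(), which creates the same calculated columns but keeps the existing ones. A minimal sketch using the same df (the column name mypoint simply mirrors the example above):

```r
library(dplyr)

df <- data.frame(team = c('P1', 'P2', 'P3', 'P4', 'P5'),
                 points = c(129, 110, 115, 128, 412))

# mutate() appends mypoint while keeping team and points
df %>% mutate(mypoint = points * 2)

# transmute() returns only mypoint
df %>% transmute(mypoint = points * 2)
```

Use transmute() when only the derived columns are needed downstream, and mutate() when the originals should be preserved.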

Example 2: Create several new variables with transmute()

Transmute() can be used to generate numerous new variables from a single set of existing variables. See the example code below.

Let’s create multiple new variables

df %>%
 transmute(
  mypoint = points * 2,
  rebounds_squared = rebounds^2,
  assists_half = assists / 2,
  team_name = paste0('team_', team)
)
  mypoint rebounds_squared assists_half team_name
1     258            52900        156.5   team_P1
2     220            16384        119.0   team_P2
3     230           104976        165.5   team_P3
4     256            15376        169.5   team_P4
5     824            51984        117.0   team_P5

Four additional variables have been added, as you can see.



The post Add new calculated variables to a data frame and drop all existing variables appeared first on Data Science Tutorials

To leave a comment for the author, please follow the link and comment on their blog: Data Science Tutorials.


Draft position for players in the NBA for the 2020-21 season

[This article was first published on R – Statistical Odds & Ends, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

When the 2022 NBA draft happened almost a month ago, I thought to myself: do players picked earlier in the draft (i.e. higher-ranked) actually end up having better/longer careers?

If data wasn’t an issue, the way I would do it would be to look at players chosen in the draft (top 60 picks) in the past 10/20 years. For each player, I would look at how many years he played in the NBA and see if there was a correlation between that and draft position. (Here, the number of years in the NBA is a proxy for how successful an NBA career is. There are other possible ways to define success, e.g. minutes played or points scored.)

Unfortunately data is an issue, so I ended up looking at a related question: what are the draft positions of players currently in the NBA? If players picked earlier in the draft are more successful, then we would expect to see more of them in the mix. I had wanted to do this analysis for the season that just happened (2021-2022) but could not find the data, so I’m doing it for the 2020-21 season.

Importing the data

I got the list of players that played in the 2020-21 season from Basketball Reference. I got the draft position of the players from Wyatt Walsh on Kaggle. (Walsh provides a lot more data than just draft position: it’s worth a look!)

The code below loads the draft data (file path will depend on where the datasets are saved on your machine). We only look at draft data from 2002 onwards as none of the players from earlier drafts were still playing in the 2020-21 season.

library(DBI)
library(RSQLite)
library(tidyverse)

sql_file <- "basketball.sqlite"

# load the whole draft data frame
mydb <- dbConnect(SQLite(), sql_file)
draft_df <- dbGetQuery(mydb, "SELECT * FROM Draft")
dbDisconnect(mydb)

# get just the columns we want
draft_df <- draft_df %>%
  select(year = yearDraft,
         number_pick_overall = numberPickOverall,
         player = namePlayer) %>%
  filter(year >= 2002)

head(draft_df)
#   year number_pick_overall           player
# 1 2020                   1  Anthony Edwards
# 2 2020                   2    James Wiseman
# 3 2020                   3      LaMelo Ball
# 4 2020                   4 Patrick Williams
# 5 2020                   5      Isaac Okoro
# 6 2020                   6   Onyeka Okongwu

The next block gets the list of players for the 2020-21 season. We have to do some deduplication as players who play for multiple teams in the season have more than one row in the dataset.

players_file <- "nba_players_2020-21.csv"

# read players file, just get unique player names
players_df <- read.csv(players_file)
players <- unique(players_df$Player)
length(players)
# [1] 540

Looks like there were 540 players who played in this season.

Cleaning and joining the data

The names of players weren’t consistent across the two datasets and so I had to do a bit of manual cleaning. The main inconsistency was for players with names that had accents on some letters. There were also a handful of differences due to suffixes and abbreviations.

There might be a better way to do this cleaning: I would love to hear if there are better alternatives! It’s also possible that I missed out some differences.
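One possible alternative to the long chain of gsub() calls is transliterating all accented characters to ASCII in one step. This is a sketch, not a drop-in replacement: iconv() results can vary by platform and locale, and stri_trans_general() requires the stringi package.

```r
# Base R: transliterate to ASCII (behavior can vary by platform/locale)
players <- iconv(players, from = "UTF-8", to = "ASCII//TRANSLIT")

# stringi: generally more portable transliteration
library(stringi)
players <- stri_trans_general(players, "Latin-ASCII")
```

The handful of suffix/abbreviation fixes (e.g. "Frank Mason III") would still need explicit gsub() calls.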

# some data wrangling to get player names to match across the two
# data sources
players <- gsub("ā", "a", players)
players <- gsub("ã", "a", players)
players <- gsub("á", "a", players)
players <- gsub("ć", "c", players)
players <- gsub("Č", "C", players)
players <- gsub("č", "c", players)
players <- gsub("é", "e", players)
players <- gsub("ģ", "g", players)
players <- gsub("ņ", "n", players)
players <- gsub("ó", "o", players)
players <- gsub("ò", "o", players)
players <- gsub("ö", "o", players)
players <- gsub("Š", "S", players)
players <- gsub("š", "s", players)
players <- gsub("ū", "u", players)
players <- gsub("ý", "y", players)
players <- gsub("ž", "z", players)
players <- gsub("Frank Mason III", "Frank Mason", players)
players <- gsub("J.J. Redick", "JJ Redick", players)
players <- gsub("Xavier Tillman Sr.", "Xavier Tillman", players)

df <- data.frame(player = players)

Let’s join the data:

joined_df <- df %>% left_join(draft_df, by = "player")
head(joined_df)
#              player year number_pick_overall
# 1  Precious Achiuwa 2020                  20
# 2      Jaylen Adams   NA                  NA
# 3      Steven Adams 2013                  12
# 4       Bam Adebayo 2017                  14
# 5 LaMarcus Aldridge 2006                   2
# 6 Ty-Shon Alexander   NA                  NA

An NA in the number_pick_overall column means that the player was undrafted. The year column refers to the year the player was drafted.

Analysis

The first surprise I had was how many NBA players were undrafted:

sum(is.na(joined_df$number_pick_overall))
# [1] 145

145 out of 540 players, or almost 27% of players were undrafted! (It’s possible that the number is slightly smaller due to inadequate data cleaning on my part. If you spot any mistakes, let me know!)

The second surprise is how many drafted players are no longer playing in the NBA. The earliest drafted player in this dataset was in 2003 and the latest was in 2020, meaning that in this period, 60 × (2020 − 2003 + 1) = 1080 players were drafted in total. Of these, only 540 − 145 = 395, or just under 37%, are still playing!

The percentage is still pretty low even if you restrict the computation to players drafted in the last 10 years. From 2011 to 2020, 600 players were drafted. From the code below, only 335 of them (about 56%) played in the 2020-21 season.

joined_df %>% filter(year >= 2011) %>%
  nrow()
# [1] 335

Next, let’s make a plot of the number of players for each pick position. If the order of the draft means anything, we should see more players at higher pick positions (smaller numbers). That’s roughly what we see.

theme_set(theme_bw())
ggplot(joined_df) +
  geom_bar(aes(x = number_pick_overall)) +
  labs(x = "Pick number", y = "# of players",
       title = "# of players who played in 2020-21 at each pick number")

Let’s look at the same histogram, except where we aggregate the draft position into groups of 5.

joined_df$pick_group <- (joined_df$number_pick_overall - 1) %/% 5 + 1
factor_levels <- paste(0:11 * 5 + 1, 1:12 * 5, sep = "-")
joined_df$pick_group <- factor(joined_df$pick_group, labels = factor_levels)

ggplot(filter(joined_df, !is.na(pick_group))) +
  geom_bar(aes(x = pick_group)) +
  labs(x = "Pick number", y = "# of players",
       title = "# of players who played in 2020-21 at each pick number")

There’s a clear trend: there are fewer players at larger draft numbers, as one might expect. The trend is clearly decreasing until roughly pick 31, which is where the second round of the draft begins.

Here’s that same plot but with NAs (i.e. undrafted players) included:

Next, we have a scatterplot that shows the players who played in 2020-21 by their pick number and year drafted. There is no overplotting here (i.e. dots on top of each other) since there is exactly one player for each pick and year combination.

ggplot(joined_df) +
  geom_point(aes(x = year, y = number_pick_overall)) +
  labs(x = "Year", y = "Pick number",
       title = "Players by pick number and year drafted")

As one might expect, players drafted earlier who are still playing in the league tend to have been picked early in the draft.

The thing about static plots is that it’s a bit hard to probe the data further. For example, upon seeing this chart, I was really interested in knowing which player each point corresponded to, especially those in the top-left corner (drafted low but have lasted in the NBA). The plotly package makes it easy to do this by providing data on the point when hovering over it. Unfortunately I can’t insert the plot in WordPress, but if you run the code below on your machine, you can get information just like the screenshot below the code.

library(plotly)
plot_ly(data = joined_df, x = ~year, y = ~number_pick_overall,
        text = joined_df$player)

Finally, let’s end off with the list of #1 picks. 14 number-one picks were still playing in the 2020-21 season. (Only one #1 pick from the last decade was not playing: Anthony Bennett from the 2013 draft.)

joined_df %>% filter(number_pick_overall == 1) %>%
  arrange(year) %>%
  select(player, year)
#                player year
# 1        LeBron James 2003
# 2       Dwight Howard 2004
# 3        Derrick Rose 2008
# 4       Blake Griffin 2009
# 5           John Wall 2010
# 6        Kyrie Irving 2011
# 7       Anthony Davis 2012
# 8      Andrew Wiggins 2014
# 9  Karl-Anthony Towns 2015
# 10        Ben Simmons 2016
# 11     Markelle Fultz 2017
# 12      Deandre Ayton 2018
# 13    Zion Williamson 2019
# 14    Anthony Edwards 2020
To leave a comment for the author, please follow the link and comment on their blog: R – Statistical Odds & Ends.


R Date

[This article was first published on R feed, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Depending on what purposes we're using R for, we may want to deal with data containing dates and times.

R provides various functions to deal with dates and times.
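For example, character strings can be converted to Date objects with as.Date(), using format codes such as %d (day), %m (month), and %Y (four-digit year). The variables d1 and d2 here are just for illustration:

```r
# ISO-formatted strings parse by default
d1 <- as.Date("2022-07-11")

# other formats need an explicit format string
d2 <- as.Date("11/07/2022", format = "%d/%m/%Y")

d1 - d2  # Time difference of 0 days
```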


Get Current System Date, and Time in R

In R, we use Sys.Date() and Sys.time() to get the current date and time respectively, based on the local system. For example,

# get current system date
Sys.Date() 

# get current system time
Sys.time()

Output

[1] "2022-07-11"
[1] "2022-07-11 04:16:52 UTC"

In the above example, we have used different functions to get the current date, and time based on the local system.

Here,

  • Sys.Date() – returns the current date, i.e. 2022-07-11
  • Sys.time() – returns the current date, time, and timezone, i.e. 2022-07-11 04:16:52 UTC
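These values can also be formatted as strings with format(), using strftime-style % codes. For example (the exact output depends on when this is run and on the locale):

```r
# e.g. "11 July 2022"
format(Sys.Date(), "%d %B %Y")

# e.g. "04:16"
format(Sys.time(), "%H:%M")
```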

Using R lubridate Package

The lubridate package in R makes the extraction and manipulation of some parts of the date value more efficient.

There are various functions under this package that can be used to deal with dates.

But first, in order to access the lubridate package, we first need to import the package as:

# access lubridate package
library(lubridate) 

Here, we have successfully imported the lubridate package.

1. Get Current Date Using R lubridate Package

# access lubridate package
library(lubridate) 

# get current date with time and timezone
now()

# Output: "2022-07-11 04:34:23 UTC"

Here, we have used the now() function provided by the lubridate package to get the current date with time and timezone.


2. Extracting Years, Months, and Days from Multiple Date Values in R

In R, we use the year(), month(), and mday() functions provided by the lubridate package to extract years, months, and days respectively from multiple date values. For example,

#  import lubridate package
library(lubridate)

dates <- c("2022-07-11", "2012-04-19", "2017-03-08")

# extract years from dates
year(dates)

# extract months from dates
month(dates)

# extract days from dates
mday(dates)

Output

[1] 2022 2012 2017
[1] 7 4 3
[1] 11 19 8

Here,

  • year(dates) - returns all years from dates, i.e. 2022 2012 2017
  • month(dates) - returns all months from dates, i.e. 7 4 3
  • mday(dates) - returns all days from dates, i.e. 11 19 8
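lubridate provides similar accessors for other date components, for example wday() for the day of the week and yday() for the day of the year:

```r
library(lubridate)

dates <- c("2022-07-11", "2012-04-19", "2017-03-08")

# day of the week as a labelled factor (2022-07-11 was a Monday)
wday(dates, label = TRUE)

# day of the year (1-366)
yday(dates)
```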

3. Manipulate Multiple Date Values in R

The lubridate package in R allows us to manipulate multiple date values all at once. For example,

#  import lubridate package
library(lubridate)

# convert the strings to Date objects with ymd() so that
# period arithmetic works
dates <- ymd(c("2022-07-11", "2012-04-19", "2017-03-08"))

# increase each year by one year
print(dates + years(1))

# increase each month by one month
print(dates + months(1))

# update days
mday(dates) <- c(22, 18, 15)
print(dates)

Output

[1] "2023-07-11" "2013-04-19" "2018-03-08"
[1] "2022-08-11" "2012-05-19" "2017-04-08"
[1] "2022-07-22" "2012-04-18" "2017-03-15"

Here,

  • dates + years(1) - increases each year in dates by one year
  • dates + months(1) - increases each month in dates by one month
  • mday(dates) <- c(22, 18, 15) - updates each day in dates with a new day.

4. Using update() to Update Multiple Date Values in R

In R, we can use the update() function to update multiple date values all at once. For example,

#  import lubridate package
library(lubridate)

# convert the strings to Date objects with ymd() so update() applies
dates <- ymd(c("2022-07-11", "2012-04-19", "2017-03-08"))

# update all date values using update()
new_dates <- update(dates,
  year = c(2022, 2015, 2019),
  month = c(9, 12, 1),
  day = c(21, 2, 13)
)
print(new_dates)

Output

[1] "2022-09-21" "2015-12-02" "2019-01-13"

In the above example, we have used the update() function to update the dates vector containing years, months, and days values with new values.

  • year = c(2022, 2015, 2019) - updates current years of dates with new years.
  • month = c(9, 12, 1) - updates current months with new months.
  • day = c(21, 2, 13) - updates current days with new ones.
To leave a comment for the author, please follow the link and comment on their blog: R feed.
