NV INBRE and the Nevada Bioinformatics Center are pleased to announce the R for Data Science Bootcamp. R is a complete, flexible, and open source system for statistical analysis and graphics, and has become a tool of choice for biologists and biomedical scientists.

These five sessions will introduce R and RStudio through hands-on learning activities. No prior data analysis experience is necessary! Participants will receive example data sets to practice data manipulation, organization, and graphical exploration. Participants are also encouraged to bring their own data and data challenges. In addition, major concepts of data science such as reproducibility will be addressed. Participants will gain the skills necessary to manage and organize data, run basic analyses, and generate professional documents and figures in R.

Audience and Prerequisites

No prior knowledge of data analytics or coding are necessary! This bootcamp is addressed to beginners wanting to become familiar with the R syntax, environment, and the most common commands to start using R to explore, interpret, and present their data.

Although there are no prerequisites for this bootcamp, you are encouraged to go through some of the R documentation available here.

Please also note this course is NOT a training on statistics but rather a training on how to use R to perform different tasks.

Application to Participate

Application is required to be considered to participate. Students and faculty from TMCC, WNC, SNU, and UNR interested in R are encouraged to apply. Applicants may sign up for individual workshops, but priority will be given to those that sign up for the whole series.

For students:

Click here to be taken to the application form

Application requires:

Application is open until May 14th at 3pm, but will close as soon as spots are filled with qualifying participants.

You will receive an email to confirm participation confirmation by no later than Monday, May 17th. Upon reception of the confirmation email, participants will be asked to confirm their attendance within 2 days.

For faculty:

Please contact Dr. Juli Petereit directly at (priority will be given to students, but we are planning a faculty-specific workshop, and your interest/feedback would be appreciated).

Additional Information

Coordinator: Juli Petereit

Non-UNR affiliates will receive a parking pass sponsored by NV INBRE.

Participants attending all section will receive a certificate of completion.

Important:


Schedule

Workshop Date and Time Speaker(s)
Introduction to R Part 1 May 20, 2021, 1-4pm Alex, Lucas, and Juli
Introduction to R Part 2 May 27, 2021, 1-4pm Alex, Lucas, and Juli
Reproducible Data Science June 3, 2021, 1-4pm Alex, Lucas, and Juli
Tidy R June 10, 2021, 1-4pm Alex, Lucas, and Juli
Producing Clean Documents, Graphs, and Tables June 17, 2021, 1-4pm Alex, Lucas, and Juli
Open R Lab June 24, 2021, 1-4pm Lucas and Juli

Details

Introduction to R Part 1

This workshop serves as a gentle introduction to programming with R, with a focus on application to data science and research. R basics will be covered, such as using the command line as a calculator, storing values in a variable, basic data structures (vector, matrix, list, data frame, etc.), and data types. We will show how to load basic data sets and access rows/columns, and summarize data with statistics and plots. We will also cover some considerations when choosing R over another language.

Objectives

Participants will be able to:

  • open the R command line interface (CLI)
  • access the built-in help docs
  • perform basic arithmetic using R
  • store data in variables
  • list the environment variables
  • understand the basic data types in R
  • convert between the different data types
  • understand and use the basic data structures in R
  • access rows, columns, and elements of matrices and data frames
  • read data from a csv into a data frame
  • fit simple linear models
  • summarize data and linear models
  • create simple plots of 1-D and 2-D data
  • understand the similarities and differences between R, Python, Julia, and other programming languages

Background Knowledge

Participants should know:

  • how to operate a computer
  • arithmetic and order of operations
  • basic statistics (summary statistics, simple linear models)
  • basic ideas of plots and charts

Introduction to R Part 2

This workshop introduces scripting in R, including control flow, writing functions, and installing/loading libraries (and the differences between CRAN, GitHub, and Bioconductor packages). We will go over the -apply family of functions, simple graphs, and good practices in R (style guide, organization, etc.).

Objectives

Participants will be able to:

  • open up RStudio and identify key parts of the IDE
  • create, open, and save R scripts
  • run R scripts line by line or all at once
  • understand and use the -apply family of functions
  • use control flow to execute complex tasks
  • write functions and understand when to use them
  • install and load libraries
  • understand the differences between CRAN, Bioconductor, and GitHub
  • understand what a style guide is and why it is important
  • use good practices for writing and organizing code

Background Knowledge

Participants should know:

  • how to open an application on a computer
  • the basic functions in R
  • the basic ideas of conditional statements

Reproducible Data Science

We will discuss the problems we commonly see and introduce what it means to conduct reproducible data science. During the workshop, we will go through an example of setting up a new project folder, loading “messy” data, organizing scripts, and creating basic Rmarkdown reports. In reference to the messy data, we will go over what “tidy” data is, and how to use tidyr and other tidyverse libraries.

Objectives

Participants will be able to:

  • understand the principles of reproducible [data] science
  • identify bad project management habits
  • organize a new project folder
  • write code that is human-readable
  • identify helpful libraries for data wrangling
  • understand and implement the principles of immutable data
  • create reports and other media using Rmarkdown
  • use GitHub to track project changes

Background Knowledge

Participants should know:

  • scripting with R or a similar programming language
  • how to create directories and folders
  • basic git or a git GUI
  • basic markdown syntax

Tidy R

This will be a workshop fully devoted to data wrangling, including loading and cleaning “dirty” data, formatting “messy” data, and transforming “tidy” data to suit the task. There will be an emphasis on tidyverse libraries, but we will mention data.table as a faster alternative with fewer dependencies but with the trade-off of being less “readable”.

Objectives

Participants will be able to:

  • understand what “tidy” data is
  • understand and use the various tidyverse packages
  • read in data from various formats
  • clean data using the tidyverse
  • reshape wide data into long format, and long data into wide format

Background Knowledge

Participants should know:

  • how to program with R or some other scripting language
  • how to navigate RStudio
  • how to access the help documentation in R

Producing Clean Documents, Graphs, and Tables

In this workshop, we will go over some principles of the visual display of quantitative data, and go in-depth with plotting in R with base and ggplot2. Additionally, since not all data is well represented in a graph, we will go over creating pretty tables using the kable and kableExtra packages.

Objectives

Participants will be able to

  • understand the basic principles of the visual display of quantitative information
  • produce clean and professional graphs using base R
  • produce clean and professional graphs using ggplot2
  • produce clean tables using kable and kableExtra
  • create clean documents using Rmarkdown

Background Knowledge

Participants should know:

  • how to program with R or some other scripting language
  • how to navigate RStudio
  • basic markdown syntax
  • some LaTex or mathjax
  • how to search for additional information

Acknowledgements

This bootcamp was made possible by a grant from the National Institute of General Medical Sciences (GM103440) from the National Institutes of Health.