Why would we even bother with creating a new project folder/file?
Writing a script allows one to re-use and re-write code. It means that you can also share your code, keep track of changes, and string together more complex commands to carry out an analysis.
In RStudio, you can open a new script with ctrl+shift+N. You can run the entire script at once using the “source” button (ctrl+shift+S or ctrl+shift+enter), or you can run it line-by-line (ctrl+enter).
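As a small illustration of running a whole script at once, the sketch below writes a two-line script to a temporary file and runs it with `source()`, which is roughly what the “source” button does. The script contents and the names `script_path` and `total` are made up for this example.

```r
# Write a tiny, hypothetical script to a temporary file
script_path <- tempfile(fileext = ".R")
writeLines(c("x <- 1:10", "total <- sum(x)"), script_path)

# Run every line of the script, in order
source(script_path)

# Objects created by the script are now available in the session
total
```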
A control flow statement is a statement that results in a choice being made as to which of two or more paths to follow.
The simplest control flow is the if-else kind.
# An if statement can stand alone
if (7 > 3) {
  cat("hello from inside the 'if' block\n")
}
## hello from inside the 'if' block
# You can have an if and an else
do_first_condition <- FALSE
if (do_first_condition) {
  cat("This shouldn't print\n")
} else {
  cat("This is the fallback option\n")
}
## This is the fallback option
# If, else if, else
x <- 4
if (x < 0) {
  cat(x, " is less than 0\n")
} else if (isTRUE(all.equal(x, 0))) {
  cat(x, " is equal to 0\n")
} else {
  cat(x, " is greater than 0\n") # is this true? ;)
}
## 4 is greater than 0
# An `else` block is not required
if (FALSE) {
  cat("This won't print\n")
} else if (7 < 4) {
  cat("Neither will this\n")
}
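The `isTRUE(all.equal(x, 0))` test in the example above may look roundabout compared with `x == 0`. A minimal sketch of why it can matter for non-integer values is shown below; the variable names are chosen only for this illustration.

```r
a <- 0.1 + 0.2
b <- 0.3

# Exact comparison fails because of floating-point rounding
exact_match <- a == b

# all.equal() compares up to a small numerical tolerance
close_match <- isTRUE(all.equal(a, b))

exact_match   # FALSE
close_match   # TRUE

# all.equal() returns a description (not FALSE) when values differ,
# which is why it is wrapped in isTRUE() inside an if condition
all.equal(1, 2)
```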
There are two types of loops (technically one, but practically two), and they are the while-loop and the for-loop.
while-loop
some_condition_is_true <- TRUE
some_counter <- 0
number_of_iterations <- 12
while (some_condition_is_true) {
  # do stuff
  cat("This is iteration:\t", some_counter, "\n")
  if (some_counter == 7) {
    some_condition_is_true <- FALSE
  }
  some_counter <- some_counter + 1
}
## This is iteration: 0
## This is iteration: 1
## This is iteration: 2
## This is iteration: 3
## This is iteration: 4
## This is iteration: 5
## This is iteration: 6
## This is iteration: 7
We can also further refine the behavior of a loop with the next and break commands.
# Skip if a number is divisible by 7
# Print "fizz" if a number is divisible by 3
# Print "buzz" if a number is divisible by 5
# End the loop if the number reaches 23
number <- 0
while (TRUE) {
  number <- number + 1
  if (number == 23) {
    break
  }
  if (number %% 7 == 0) {
    next
  } else if (number %% 15 == 0) {
    print("fizzbuzz")
  } else if (number %% 3 == 0) {
    print("fizz")
  } else if (number %% 5 == 0) {
    print("buzz")
  } else {
    print(number)
  }
}
## [1] 1
## [1] 2
## [1] "fizz"
## [1] 4
## [1] "buzz"
## [1] "fizz"
## [1] 8
## [1] "fizz"
## [1] "buzz"
## [1] 11
## [1] "fizz"
## [1] 13
## [1] "fizzbuzz"
## [1] 16
## [1] 17
## [1] "fizz"
## [1] 19
## [1] "buzz"
## [1] 22
Please note that the above code is a poor implementation of the fizzbuzz test.
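For comparison, here is a minimal sketch of the conventional fizzbuzz problem (no skipped multiples of 7 and no early exit at 23), written without an explicit loop. The function name `fizzbuzz()` and its argument `n` are assumptions made for this example, not part of the original code.

```r
# Return the fizzbuzz sequence for 1..n as a character vector
fizzbuzz <- function(n) {
  idx <- seq_len(n)
  out <- as.character(idx)
  out[idx %% 3 == 0] <- "fizz"
  out[idx %% 5 == 0] <- "buzz"
  # Assigned last so that multiples of both 3 and 5 win
  out[idx %% 15 == 0] <- "fizzbuzz"
  out
}

fizzbuzz(15)
```

Assigning the labels with logical indexing avoids the chain of `else if` branches, at the cost of depending on the order of the three assignments.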
for-loop
When we know ahead of time how many iterations we are going to do, we can instead use a for-loop, which will take care of incrementing for us. We can also use a for-loop to iterate over elements in a vector.
x <- 3:7
# print the number doubled
# note that the alias for each element in 'x' can be any variable name
for (some_alias_for_element in x) {
  print(some_alias_for_element * 2)
}
## [1] 6
## [1] 8
## [1] 10
## [1] 12
## [1] 14
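The loop above iterates over the elements themselves. When the results need to be stored rather than printed, one common pattern is to iterate over positions instead. The sketch below assumes a result vector named `doubled`, which is not part of the original example.

```r
x <- 3:7

# Pre-allocate the result vector instead of growing it inside the loop
doubled <- numeric(length(x))

# seq_along(x) gives 1, 2, ..., length(x); unlike 1:length(x),
# it yields an empty sequence when x is empty
for (i in seq_along(x)) {
  doubled[i] <- x[i] * 2
}

doubled
```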
A switch statement is a compact way to find a match over multiple conditions.
x <- rnorm(n = 30, mean = pi, sd = 0.4)
stat <- "mean"
switch(stat,
  mean = mean(x),
  sd = sd(x),
  var = var(x),
  round = round(x, 2),
  "Default option: no match found"
)
## [1] 3.246515
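A `switch` is often wrapped in a function so that the choice can be passed as an argument. The sketch below is one way to do that; the function name `summarise_with()`, the `spread` alias, and the input values are hypothetical. It also shows two details of `switch`: an empty option falls through to the next non-empty one, and the final unnamed argument acts as the default.

```r
summarise_with <- function(x, stat) {
  switch(stat,
    mean = mean(x),
    sd = ,          # empty: falls through to the next non-empty option
    spread = sd(x),
    stop("Unknown stat: ", stat)   # default when nothing matches
  )
}

values <- c(2, 4, 4, 4, 5, 5, 7, 9)
summarise_with(values, "mean")
summarise_with(values, "sd")
summarise_with(values, "spread")   # same result as "sd"
```

Raising an error in the default branch, rather than returning a string, makes a misspelled option fail loudly instead of silently producing a value of a different type.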
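R also provides the apply family of functions: apply(), lapply(), sapply(), tapply(), vapply(), and rapply(). sapply() is a wrapper around lapply(), and replicate() is a wrapper around sapply() that evaluates an expression n times.
The Huber loss function (or just Huber function, for short) is defined as: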
\[ \psi(x) = \begin{cases} x^2 & \text{if } |x| \leq 1 \\ 2|x| - 1 & \text{if } |x| > 1 \end{cases} \]
# write a function `huber()` that takes as an input a number, x,
# and returns the Huber value
huber <- function(x) {
  if (abs(x) <= 1) {
    x^2
  } else {
    2 * abs(x) - 1
  }
}
The Huber function can be modified so that the transition from quadratic to linear happens at an arbitrary cutoff value \(a\), as in:
\[ \psi_a(x) = \begin{cases} x^2 & \text{if } |x| \leq a \\ 2a|x| - a^2 & \text{if } |x| > a \end{cases} \]
Starting with the code above, update the huber() function so that it takes two arguments: \(x\), a number at which to evaluate the loss, and \(a\), a number representing the cutoff value. It should now return \(\psi_a(x)\), as defined above. Check that huber(3, 2) returns 8, and huber(3, 4) returns 9.
huber <- function(x, a) {
  if (abs(x) <= a) {
    x^2
  } else {
    2 * a * abs(x) - a^2
  }
}
huber(3, 2)
## [1] 8
huber(3, 4)
## [1] 9
Update the huber() function so that the default value of the cutoff \(a\) is 1. Check that huber(3) returns 5.
huber <- function(x, a = 1) {
  if (abs(x) <= a) {
    x^2
  } else {
    2 * a * abs(x) - a^2
  }
}
huber(3)
## [1] 5
Check that huber(a = 1, x = 3) returns 5. Check that huber(1, 3) returns 1. Why are these different?
huber(a = 1, x = 3)
## [1] 5
huber(1, 3)
## [1] 1
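The difference comes from how R matches arguments: named arguments are matched by name, and the remaining unnamed arguments are matched by position. The sketch below repeats the `huber()` definition from above so that it runs on its own, and spells out which value ends up in `x` and which in `a`.

```r
huber <- function(x, a = 1) {
  if (abs(x) <= a) {
    x^2
  } else {
    2 * a * abs(x) - a^2
  }
}

# Named arguments: order does not matter, so x = 3 and a = 1
huber(a = 1, x = 3)   # |3| > 1, so 2 * 1 * 3 - 1^2 = 5

# Unnamed arguments: matched by position, so x = 1 and a = 3
huber(1, 3)           # |1| <= 3, so 1^2 = 1

# The positional call written out with names
huber(x = 1, a = 3)
```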
Finally, we can vectorize this function over a set of inputs in two different ways:
# ifelse()
huber_ifelse <- function(x, a) {
  ifelse(abs(x) <= a, x^2, 2 * a * abs(x) - a^2)
}
# Vectorize()
huber_vec <- Vectorize(huber, vectorize.args = "x", USE.NAMES = TRUE)
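As a quick check, the sketch below applies both vectorized versions to the same inputs, and adds a third option, `vapply()` from the apply family, for comparison. The definitions are repeated so that the block runs on its own, and the input vector `xs` is made up for this example. The scalar `huber()` cannot be called on `xs` directly, because an `if` condition must have length one (in R 4.2.0 and later a longer condition is an error).

```r
huber <- function(x, a = 1) {
  if (abs(x) <= a) {
    x^2
  } else {
    2 * a * abs(x) - a^2
  }
}

huber_ifelse <- function(x, a) {
  ifelse(abs(x) <= a, x^2, 2 * a * abs(x) - a^2)
}

huber_vec <- Vectorize(huber, vectorize.args = "x", USE.NAMES = TRUE)

xs <- c(-3, 0.5, 3)

res_ifelse <- huber_ifelse(xs, a = 1)
res_vec <- huber_vec(xs, a = 1)

# vapply() calls huber() once per element and checks that each
# result is a single number
res_vapply <- vapply(xs, huber, numeric(1), a = 1)

res_ifelse   # 5.00 0.25 5.00
```

All three give the same values here. `ifelse()` evaluates both branches for the whole vector at once, whereas `Vectorize()` and `vapply()` call the scalar function element by element.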
On the subject of code reuse, we can stand on the shoulders of giants (or other researchers) and use the code that they’ve decided to share on CRAN or GitHub (or Bioconductor, or …). We use their code through something called a library or package.
# load the `readxl` package using the `library()` function
library(readxl)
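`library()` stops with an error if the package is not installed. A minimal sketch of checking first is shown below; it uses `requireNamespace()`, which returns TRUE or FALSE without attaching the package. The variable names and messages are made up for this example.

```r
pkg <- "readxl"

# TRUE if the package can be loaded, FALSE otherwise
is_installed <- requireNamespace(pkg, quietly = TRUE)

if (is_installed) {
  message(pkg, " is installed, so library(", pkg, ") should work")
} else {
  message(pkg, " is not installed; try install.packages(\"", pkg, "\")")
}
```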
Sometimes two packages will export two different functions by the same name. When this happens, we get what is called a namespace conflict. In those cases (and in general), it is best to be explicit about which package’s function you are using:
# load the `dplyr` package and observe the 'onload' message
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Consider filter from package:stats. stats is a default R package that is attached on startup. dplyr also exports an object called filter, which masks the stats version on the search path.
# This should work for time series
x <- 1:100
filter(x, rep(1, 3))
## Error in UseMethod("filter"): no applicable method for 'filter' applied to an object of class "c('integer', 'numeric')"
# fix the above error by being explicit about the namespace
stats::filter(x, rep(1, 3))
## Time Series:
## Start = 1
## End = 100
## Frequency = 1
## [1] NA 6 9 12 15 18 21 24 27 30 33 36 39 42 45 48 51 54
## [19] 57 60 63 66 69 72 75 78 81 84 87 90 93 96 99 102 105 108
## [37] 111 114 117 120 123 126 129 132 135 138 141 144 147 150 153 156 159 162
## [55] 165 168 171 174 177 180 183 186 189 192 195 198 201 204 207 210 213 216
## [73] 219 222 225 228 231 234 237 240 243 246 249 252 255 258 261 264 267 270
## [91] 273 276 279 282 285 288 291 294 297 NA
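The same effect can be reproduced without dplyr. In the sketch below, a hypothetical stand-in function named `filter` is defined in the workspace to mimic the conflict; the namespace-qualified call still reaches the stats version.

```r
x <- 1:5

# Stand-in for a package that masks `filter` (hypothetical, for illustration)
filter <- function(...) stop("this is not stats::filter()")

# Being explicit about the namespace bypasses the masking definition
smoothed <- stats::filter(x, rep(1, 3))
as.numeric(smoothed)   # NA 6 9 12 NA

# Remove the stand-in so that `filter` resolves to stats::filter() again
rm(filter)
```

The `package::function` form also works for packages that have not been attached with `library()`, as long as they are installed.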
Take some time to read through the first three sections of the Tidyverse Style Guide. The developers of RStudio also develop a suite of packages called the Tidyverse, and they use this style guide to make code more readable and uniform.