We aren’t computer scientists and that’s okay!
We make lots of mistakes. Mistakes are funny. You can laugh with us.
Let’s go, Simba, Pumbaa, and Timon!
We will focus on the stringr package for string manipulation.
All functions from stringr start with str_.
stringr cheatsheet
https://evoldyn.gitlab.io/evomics-2018/ref-sheets/R_strings.pdf
Use str_length() to find the number of characters in a string.
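Here is a small sketch of str_length() in action. The `fruits` vector is made-up example data, not something from the workshop files.

```r
library(stringr)

# str_length() is vectorised: it returns one count per input string
fruits <- c("apple", "kiwi", NA)   # hypothetical example data
str_length(fruits)                 # 5 4 NA (a missing value stays missing)
str_length("Hakuna Matata")        # 13 (the space counts as a character)
```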
Use str_c() to combine two or more strings into one.
a <- 1
str_c("./output/aq-", a, ".csv")
str_c(10, 7, 2020, sep = "/")
# if input is a vector
x <- c("a", "b")
str_c(x, collapse = "") # specify collapse to combine input vectors into a single string
# two or more vectors
y <- c("1", "2", "3", "4")
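str_c(x, y) # in stringr before 1.5.0, the shorter vector is recycled to the length of the longer one
# note: stringr 1.5.0 and later recycle only length-1 vectors, so this call is expected to raise an error there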
Use str_sub() to extract parts of a string based on index.
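A quick sketch of str_sub(), assuming a made-up file name stored in `s`. The start and end positions are inclusive, and negative positions count backwards from the end of the string.

```r
library(stringr)

s <- "ozone_samples_demo.csv"   # hypothetical example string
str_sub(s, 1, 5)     # "ozone": characters 1 through 5
str_sub(s, -3, -1)   # "csv": the last three characters
str_sub(s, 7)        # "samples_demo.csv": from character 7 to the end
```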
Regular expressions, or regexps, are a concise language for describing patterns in strings.
In R, we write regular expressions as strings.
We will use str_view() to learn regular expressions.
# install and load the htmlwidgets package
install.packages("htmlwidgets")
library(htmlwidgets)
# The simplest patterns match exact characters:
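x <- c("apple", "cherry", "strawberry") # example vector (an assumption; the original definition of x is not shown)
str_view(x, 'rry') # 'rry' is the pattern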
# The next is special characters.
# For example, dot . matches any character (except a newline):
str_view(x, ".e.")
# But if . matches any character, how do you match the character .?
# Regexps use the backslash \ to escape special behaviour, so the regexp that matches a literal . is \.
# However, \ is also the escape character in R strings, so you must write the regexp \. as the string "\\.".
# str_view(c("abc", "a.c", "bef"), "\.") # error: '\.' is an unrecognized escape in an R string
str_view(c("abc", "a.c", "bef"), "\\.")
# ^ to match the start of the string
str_view(x, "^a")
# $ to match the end of the string.
str_view(x, "y$")
# To remember which is which, try this: if you begin with power (^), you end up with money ($).
x <- c("apple pie", "apple", "apple cake")
str_view(x, "^apple$")
For more regular expression patterns, please check the stringr cheatsheet.
Use str_extract() to extract the actual text of the first match in each string. Use str_extract_all() to extract every match; it returns a list.
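A small sketch of extracting matched text with str_extract() and str_extract_all(). The `files` vector is invented for illustration, and the pattern `\\d+` (one or more digits) comes from the cheatsheet rather than from the patterns covered above.

```r
library(stringr)

files <- c("aq-2019.csv", "aq-2020-v2.csv", "notes.txt")  # hypothetical example data

# str_extract() returns the first match per string as a character vector
first_match <- str_extract(files, "\\d+")      # "2019" "2020" NA

# str_extract_all() returns every match, one list element per input string
all_matches <- str_extract_all(files, "\\d+")  # "2019"; "2020" "2"; character(0)
```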
Use str_detect() to determine whether each element of a character vector matches a pattern. It returns a logical vector.
You can also combine str_detect() and filter() to select rows in a dataframe.
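Something like the following shows the idea. It assumes the dplyr package is installed, and the `desserts` data frame with its `name` and `price` columns is made up for this sketch.

```r
library(stringr)
library(dplyr)

# hypothetical example data
desserts <- data.frame(
  name  = c("apple pie", "apple", "banana cake"),
  price = c(4, 1, 5)
)

# str_detect() gives one TRUE/FALSE per element
str_detect(desserts$name, "^apple")   # TRUE TRUE FALSE

# filter() keeps the rows where str_detect() is TRUE
apple_rows <- filter(desserts, str_detect(name, "^apple"))
apple_rows
```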
Use str_replace() and str_replace_all() to replace matches with new strings.
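A minimal sketch of the difference between stringr's two replacement functions, str_replace() and str_replace_all(), using a made-up `snacks` vector: the first changes only the first match in each string, the second changes every match.

```r
library(stringr)

snacks <- c("apple pie", "banana cake")   # hypothetical example data

first_only <- str_replace(snacks, "a", "A")      # "Apple pie" "bAnana cake"
every_one  <- str_replace_all(snacks, "a", "A")  # "Apple pie" "bAnAnA cAke"
```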
Use str_split() to split a string up into pieces.
filename <- "D:/LearnR/input/ozone_samples_demo.csv"
output <- str_split(filename, '/')
output[[1]][4] # extract the file name
output1 <- unlist(output) # use unlist() to produce a vector from a list
output1[4]
Exercise
Extract the date from a PDF file.
Obtain the PDF text using the following code.
string basics
regular expressions
match strings