Hello!



We aren’t computer scientists and that’s okay!

We make lots of mistakes. Mistakes are funny. You can laugh with us.

Let’s go, Simba, Pumbaa, and Timon!





1 Add packages



What is an R package?

A package is a small add-on for R, much like an app on your phone. Packages add capabilities such as statistical functions, mapping tools, and special charts to R.

To install multiple packages: install.packages(c("pkg1", "pkg2"))
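A minimal sketch of checking which packages still need installing before calling install.packages(). The helper name missing_packages() is made up for this example (it is not part of R), and readr and dplyr are just sample package names.

```r
# Hypothetical helper: return the names in `pkgs` that are not installed yet.
# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Example package names; swap in the ones you need.
to_install <- missing_packages(c("readr", "dplyr"))
print(to_install)  # an empty result means everything is already installed

# Run this line yourself to install whatever is missing (needs internet access):
# install.packages(to_install)
```

Installing happens once per computer. After that, library(pkg1) loads the package in each new R session.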

Tidyverse packages: a collection of packages for tidy data https://www.tidyverse.org/packages/

Exercise 1

Install two packages: data.table and janitor

2 Import data


2.1 Load data from csv files

One of the most common data formats used with R is CSV (comma-separated values). As plain text it looks squished together, but that’s okay! When read into R, the text becomes a familiar-looking table with columns and rows.

Sometimes there are a few lines of metadata at the top of the file. You can use skip = n to skip the first n lines.

The data might not have column names. You can use col_names = FALSE to tell read_csv() not to treat the first row as headings and instead label the columns sequentially from X1 to Xn.
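Here is a small sketch of skip and col_names working together. It assumes the readr package is installed. The file and its contents are made up for the demo, so nothing here comes from the workshop data.

```r
library(readr)

# Write a small made-up file: two metadata lines, then data with no header row.
demo_file <- tempfile(fileext = ".csv")
writeLines(c(
  "Site: demo station",
  "Units: ppb",
  "2024-01-01,31",
  "2024-01-02,28"
), demo_file)

# skip = 2 drops the two metadata lines.
# col_names = FALSE labels the columns X1 and X2 instead of using the first row.
air <- read_csv(demo_file, skip = 2, col_names = FALSE)
print(air)
```

If the file did have a header row after the metadata, leaving out col_names = FALSE would use that row for the column names.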

Data import cheatsheet: https://rawgit.com/rstudio/cheatsheets/master/data-import.pdf

2.2 Load data from other text files

We use the function read_delim(file, delim) to read a text file with any delimiter. See the readr cheatsheet for examples.
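A quick sketch of read_delim() on a semicolon-separated file, assuming readr is installed. The file is a made-up stand-in written to a temporary location. The exercise file may use a different delimiter, so open it in a text editor first to check.

```r
library(readr)

# Write a small made-up text file that uses ";" between values.
demo_txt <- tempfile(fileext = ".txt")
writeLines(c(
  "site;ozone",
  "A;31",
  "B;28"
), demo_txt)

# delim tells read_delim() which character separates the columns.
samples <- read_delim(demo_txt, delim = ";")
print(samples)
```

For a tab-separated file you would use delim = "\t", and for a pipe-separated file delim = "|".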

Exercise 2

Read data from ozone_samples_demo.txt

2.5 Read data from a SQL database

https://www.statmethods.net/input/dbinterface.html
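As one possible illustration, the sketch below uses the DBI and RSQLite packages with a temporary in-memory database. This is an assumption made for the example: the page linked above describes other interfaces, and a real database would need its own driver and connection details. The table and its values are made up.

```r
library(DBI)

# Open a temporary in-memory SQLite database (nothing is written to disk).
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Put a small made-up table into the database so there is something to query.
demo_samples <- data.frame(site = c("A", "B", "C"), ozone = c(31, 28, 45))
dbWriteTable(con, "samples", demo_samples)

# Send a SQL query and get the result back as a regular data frame.
high_ozone <- dbGetQuery(
  con,
  "SELECT site, ozone FROM samples WHERE ozone > 30 ORDER BY site"
)
print(high_ozone)

# Close the connection when you are finished.
dbDisconnect(con)
```

The same dbConnect(), dbGetQuery(), and dbDisconnect() pattern applies to other databases; only the driver passed to dbConnect() changes.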

2.6 Write to a file

Write to a text file
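A minimal sketch using write_delim() from readr, assuming readr is installed. The data frame demo_data is made up for the example, and the file goes to a temporary location.

```r
library(readr)

# A small made-up data frame to save.
demo_data <- data.frame(site = c("A", "B"), ozone = c(31, 28))

# Save it as plain text with ";" between the values in each row.
out_file <- tempfile(fileext = ".txt")
write_delim(demo_data, out_file, delim = ";")

# Peek at the file to see what was written.
written_lines <- readLines(out_file)
print(written_lines)
```

Changing the delim argument changes the separator. write_csv() is a shortcut for comma-separated files.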

Write to an Excel file
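One way to do this is the write_xlsx() function from the writexl package. Using writexl is an assumption for this sketch, since other packages can also write Excel files. The data frame is made up.

```r
library(writexl)

# A small made-up data frame to save.
demo_data <- data.frame(site = c("A", "B"), ozone = c(31, 28))

# Write the data frame to an .xlsx file in a temporary location.
xlsx_file <- tempfile(fileext = ".xlsx")
write_xlsx(demo_data, xlsx_file)

# Confirm the file was created.
print(file.exists(xlsx_file))
```

To keep the file, replace the temporary path with a name such as "my_data.xlsx" in your working directory.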

Exercise 3

Save air_data as a plain text file with the values in each row separated by |
Write air_data to an Excel file

3 Explore data
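A short sketch of the checks listed in the recap (data frame info, missing values, duplicated values), using only base R. The data frame is made up so the example runs on its own.

```r
# A small made-up data frame with one missing value and one repeated row.
demo_data <- data.frame(
  site  = c("A", "B", "B", "C"),
  ozone = c(31, 28, 28, NA)
)

# View data frame info: column types, then basic summaries.
str(demo_data)
summary(demo_data)

# Count missing values in each column.
na_per_column <- colSums(is.na(demo_data))
print(na_per_column)

# Count rows that exactly repeat an earlier row.
n_duplicates <- sum(duplicated(demo_data))
print(n_duplicates)

# Keep only the first copy of each row.
unique_rows <- demo_data[!duplicated(demo_data), ]
print(unique_rows)
```

duplicated() marks the second and later copies of a row, so the first copy is always kept.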


4 Recap


  • packages

  • read and write files: plain text, Excel

  • view data frame info

  • missing values and duplicated values