Working with the database in R

The ISRaD R package offers:
1) ISRaD_data: The complete collection of data, reported from publications, compiled in one place
2) ISRaD_extra: An augmented dataset, with useful global variables and radiocarbon calculations completed for you
3) Tools: Options to compile your own dataset to compare to ISRaD, and functions to work with the data

How to install the database

Official Release:

# COMING SOON!!!

Latest development version:

# 1) Install and load the 'devtools' and 'rcrossref' packages
# 'devtools' allows you to install the ISRaD package in its "beta"/development form
# 'rcrossref' is used by our QAQC tool, but sometimes ISRaD won't install without it
install.packages("devtools")
library(devtools) 
install.packages("rcrossref")
library("rcrossref")

# 2) Install package 'ISRaD' from the github repository:
devtools::install_github("International-Soil-Radiocarbon-Database/ISRaD", ref="master")
library(ISRaD) # load the package
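
If some of these packages may already be installed, a small check avoids reinstalling them every session. The sketch below shows one way to do this in base R; the helper name find_missing_packages is hypothetical and is not part of ISRaD or devtools.

```r
# Hypothetical helper (not part of ISRaD): return the packages in `pkgs`
# that cannot be loaded from the current R library.
find_missing_packages <- function(pkgs) {
  # requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

missing_pkgs <- find_missing_packages(c("devtools", "rcrossref"))
print(missing_pkgs)

# Install only what is missing (uncomment to run):
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

The install line is left commented out so that running the sketch never changes your library unless you choose to.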

Some other useful packages for working with the data:

The tidyverse, which includes dplyr, is useful for exploring ISRaD data. A helpful data-wrangling cheatsheet is available here: https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf

# Load the tidyverse (attaches dplyr, ggplot2, and related packages)
library(tidyverse)

Structure of data

Once installed, the ISRaD package provides a pre-compiled version of the database associated with the most recent data release. These data are formatted as a list called ISRaD_data, which comprises 8 data.frames (metadata, site, profile, flux, layer, interstitial, fraction, and incubation) that correspond to the template data tables. ISRaD_extra follows the same structure. Here is a conceptual overview of the data structure.

# Check that the package is loaded by listing its data sets; "ISRaD_data" should appear
data(package = "ISRaD") # should open a new tab

# View a single table (e.g. site, profile, layer, etc.):
# This will show you all the sites compiled in the database
View(ISRaD_data$site)
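
To get a quick overview of a list of tables without opening a viewer, base R functions such as names, nrow, and str are enough. The sketch below uses a small made-up list, toy_db, as a stand-in for ISRaD_data so that it runs without the package; the column names and values are illustrative assumptions.

```r
# Toy stand-in for ISRaD_data: a named list of data.frames (values are made up)
toy_db <- list(
  metadata = data.frame(entry_name = c("A_2018", "B_2019")),
  site     = data.frame(entry_name = c("A_2018", "A_2018", "B_2019"),
                        site_name  = c("s1", "s2", "s3")),
  layer    = data.frame(entry_name = c("A_2018", "A_2018", "A_2018", "B_2019"),
                        site_name  = c("s1", "s1", "s2", "s3"),
                        lyr_name   = c("l1", "l2", "l3", "l4"))
)

names(toy_db)                                   # table names in the list
table_sizes <- vapply(toy_db, nrow, integer(1)) # number of rows in each table
print(table_sizes)
str(toy_db$site)                                # columns and types of one table
```

With the package loaded, the same calls should work after replacing toy_db with ISRaD_data.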

Creating Your Own Data Frame of Interest

Ready to start your analysis? You can create your own data frame using dplyr's left_join function.
Here we join the fraction data with the relevant data from the higher levels of the hierarchy:

#Merge fraction data with all upstream data in an object called "frc_data"
frc_data <- ISRaD_extra$fraction %>% #Start with fraction data
  left_join(ISRaD_extra$layer) %>% #Join to layer data
  left_join(ISRaD_extra$profile) %>% #Join to profile data
  left_join(ISRaD_extra$site) %>% #Join to site data
  left_join(ISRaD_extra$metadata) #Join to metadata

#Take a look at it:
View(frc_data)
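
When left_join is called without a by argument, dplyr matches on every column name the two tables share and prints a message listing them. The sketch below makes the keys explicit on made-up data; the tables toy_flux and toy_site and their values are assumptions for illustration, not ISRaD records.

```r
library(dplyr)

# Made-up tables that share the key columns entry_name and site_name
toy_site <- data.frame(
  entry_name = c("A_2018", "A_2018"),
  site_name  = c("s1", "s2"),
  site_lat   = c(40.1, 52.3)
)
toy_flux <- data.frame(
  entry_name = c("A_2018", "A_2018", "A_2018"),
  site_name  = c("s1", "s1", "s2"),
  flx_14c    = c(10.5, -3.2, 25.0)
)

# Every flux row is kept; site columns are copied onto the matching rows
toy_joined <- toy_flux %>%
  left_join(toy_site, by = c("entry_name", "site_name"))

print(toy_joined)
```

Naming the keys guards against an accidental match on a column that happens to share a name in both tables.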

Or join with a more limited set of information:

#Merge flux data with site information only
flx_data <- ISRaD_data$flux %>% #Start with flux data
  left_join(ISRaD_data$site) #Join to site data

#Take a look at it:
View(flx_data)

Alternatively, you can use the function flatten to create a data frame with all relevant layers of the hierarchy. You can create a flat data frame with data from the flux, layer, interstitial, fraction or incubation tables. You can flatten data from ISRaD_data, ISRaD_extra, or your own compiled data with a similar structure.

#Use flatten function to create a data frame with layer data and all higher levels of the hierarchy
#(the ISRaD:: prefix avoids a clash with purrr::flatten, which the tidyverse attaches)
lyr_data <- ISRaD::flatten(ISRaD_data, 'layer')

#Take a look at it:
View(lyr_data)
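
Conceptually, a flat table is what you get by joining one table to each level above it in turn. The sketch below illustrates that idea in base R on made-up data. It is not the package's implementation, and flatten_toy, toy_db, and all values are hypothetical.

```r
# Made-up hierarchical data: metadata > site > profile > layer
toy_db <- list(
  metadata = data.frame(entry_name = "A_2018", doi = "10.0000/toy"),
  site     = data.frame(entry_name = "A_2018", site_name = c("s1", "s2")),
  profile  = data.frame(entry_name = "A_2018",
                        site_name  = c("s1", "s1", "s2"),
                        pro_name   = c("p1", "p2", "p3")),
  layer    = data.frame(entry_name = "A_2018",
                        site_name  = c("s1", "s1", "s2", "s2"),
                        pro_name   = c("p1", "p2", "p3", "p3"),
                        lyr_name   = c("l1", "l2", "l3", "l4"),
                        lyr_bot    = c(10, 30, 5, 15))
)

# Hypothetical helper: merge the lowest table with each higher level in turn.
# merge() joins on the columns the two tables share; all.x = TRUE keeps every
# row of the lower-level table even when no match is found.
flatten_toy <- function(db, levels = c("layer", "profile", "site", "metadata")) {
  Reduce(function(lower, upper) merge(lower, upper, all.x = TRUE), db[levels])
}

toy_flat <- flatten_toy(toy_db)
print(toy_flat)
```

Each layer row ends up carrying the columns of its profile, site, and entry, which is the shape the text above describes for a flattened table.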

Filtering the Data & Summary Statistics

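# Summarise fraction radiocarbon by land cover, using only shallow layers
frc_data %>%
  filter(lyr_bot < 20) %>% # keep layers whose bottom depth is less than 20
  group_by(pro_land_cover) %>% # one group per land cover class
  summarise(num_data_points = n(), # number of rows in each group
            mean_frc_14c = mean(frc_14c, na.rm = TRUE)) # mean frc_14c, ignoring missing values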
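
One detail worth checking in a summary like this is that n() counts rows, including rows where frc_14c is missing, while the mean with na.rm = TRUE ignores them. The sketch below shows the difference on made-up data; toy_frc and its values are assumptions, and the column num_with_14c is added only for illustration.

```r
library(dplyr)

# Made-up fraction data with one missing radiocarbon value
toy_frc <- data.frame(
  pro_land_cover = c("forest", "forest", "cultivated", "cultivated", "forest"),
  lyr_bot        = c(10, 15, 5, 40, 10),
  frc_14c        = c(50, NA, -20, -100, 30)
)

toy_summary <- toy_frc %>%
  filter(lyr_bot < 20) %>%
  group_by(pro_land_cover) %>%
  summarise(num_data_points = n(),                 # all rows in the group
            num_with_14c = sum(!is.na(frc_14c)),   # rows that have a value
            mean_frc_14c = mean(frc_14c, na.rm = TRUE))

print(toy_summary)
```

Reporting both counts makes it clear how many observations actually contribute to each mean.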

Simple visualization of the data

Quick and simple visualization is possible by calling a Shiny app.

ISRaD.shiny()

How to compile user data locally

Some users may wish to compile their own data locally in order to view it in the context of the larger database. (Note that this operation is not the same as submitting your data for ingest.)

The compile function is used to QA/QC additional datasets and assemble those that pass the QA/QC test into a new list, which can later be merged with ISRaD_data.

To run compile on a set of user-specified data entries, the user must create a local folder whose path is passed as dataset_directory. This folder must contain only the entries to be compiled, in .xlsx format. If other file types exist in the directory, compile will fail. (Note: the entries cannot be open in Excel while compile runs.)
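
Because stray files cause a failure, it can help to inspect the folder first. The sketch below is a base R pre-check, not part of ISRaD; the helper name check_dataset_directory is hypothetical. It also flags files starting with "~$", which Excel creates as lock files while a workbook is open, and lists subfolders separately since the text above does not say how compile treats them.

```r
# Hypothetical pre-check (not part of ISRaD): list anything in the folder
# that is not a plain .xlsx entry.
check_dataset_directory <- function(dataset_directory) {
  entries <- list.files(dataset_directory)
  is_dir  <- dir.exists(file.path(dataset_directory, entries))
  files   <- entries[!is_dir]
  is_xlsx <- tolower(tools::file_ext(files)) == "xlsx"
  is_lock <- startsWith(files, "~$") # Excel lock files for open workbooks
  list(not_xlsx         = files[!is_xlsx],
       excel_lock_files = files[is_lock],
       subfolders       = entries[is_dir])
}

# Demonstration on a temporary folder with made-up file names
demo_dir <- tempfile("israd_demo_")
dir.create(demo_dir)
invisible(file.create(file.path(demo_dir, c("Entry_2018.xlsx", "notes.txt", "~$Entry_2018.xlsx"))))

check_result <- check_dataset_directory(demo_dir)
print(check_result)
```

An empty not_xlsx and an empty excel_lock_files suggest the folder matches the requirements stated above.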

compiled <- compile(dataset_directory = "~/Directory/to/data/", write_report = TRUE, write_out = TRUE, return = "list")

The parameter return determines the format of the returned object. It should be set to “list” unless the user prefers a flattened version of the database formatted as a single data.frame.

When set to TRUE, the parameter write_out triggers the creation of several output files:

Description | Location | File name
Report files that identify issues with the files in dataset_directory | dataset_directory/QAQC | QAQC_*.txt (* corresponds to the dataset file names)
Flattened database file | dataset_directory/database | ISRaD_flat.csv
List-structured database file in the same format as the template | dataset_directory/database | ISRaD_list.xlsx
Log file generated by the compile function. Most importantly, it tells you which files passed QAQC. | dataset_directory/database | ISRaD_log.txt
Summary statistics for the datasets compiled into the database | dataset_directory/database | ISRaD_summary.csv
QAQC check on the compiled database | dataset_directory/database | QAQC_ISRaD_list.txt

Merging a user compiled list with ISRaD_data

The function mapply can be used to merge the user compiled list with ISRaD_data as follows:

merged_data<-mapply(rbind, ISRaD_data, compiled, SIMPLIFY=FALSE)
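
Note that mapply pairs the elements of the two lists by position, not by name, and rbind expects the paired tables to have the same columns. The sketch below shows the pattern on two made-up lists with a name check before merging; toy_israd, toy_compiled, and their contents are assumptions for illustration.

```r
# Two made-up lists with the same tables in the same order
toy_israd <- list(
  site  = data.frame(entry_name = "A_2018", site_name = c("s1", "s2")),
  layer = data.frame(entry_name = "A_2018", lyr_name = c("l1", "l2", "l3"))
)
toy_compiled <- list(
  site  = data.frame(entry_name = "Mine_2020", site_name = "s3"),
  layer = data.frame(entry_name = "Mine_2020", lyr_name = c("l4", "l5"))
)

# mapply() pairs list elements by position, so confirm the table order matches
stopifnot(identical(names(toy_israd), names(toy_compiled)))

# Row-bind each pair of tables; SIMPLIFY = FALSE keeps the result as a list
toy_merged <- mapply(rbind, toy_israd, toy_compiled, SIMPLIFY = FALSE)

merged_sizes <- vapply(toy_merged, nrow, integer(1))
print(merged_sizes)
```

The result keeps the table names of the first list, so it can be used in the same way as the original list.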

How to subset the database

We use the dplyr package for filtering data and ggplot2 for plotting. Both packages are included in the tidyverse.

library(tidyverse)
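
As a minimal sketch of that workflow, the example below filters a made-up table with dplyr and plots the result with ggplot2. The data frame toy_frc and its values are assumptions for illustration; only the column names follow those used earlier on this page. It loads dplyr and ggplot2 directly, which library(tidyverse) would also attach.

```r
library(dplyr)
library(ggplot2)

# Made-up fraction data (not ISRaD records)
toy_frc <- data.frame(
  pro_land_cover = c("forest", "forest", "forest", "cultivated", "forest"),
  lyr_bot        = c(5, 15, 30, 10, 60),
  frc_14c        = c(80, 40, -10, 20, -150)
)

# Subset: forest rows whose layer bottom depth is at most 30
shallow_forest <- toy_frc %>%
  filter(pro_land_cover == "forest", lyr_bot <= 30)

# Plot radiocarbon against depth, with depth increasing downward
p <- ggplot(shallow_forest, aes(x = frc_14c, y = lyr_bot)) +
  geom_point() +
  scale_y_reverse() +
  labs(x = "frc_14c", y = "lyr_bot")

print(p)
```

The same filter and plot calls could be applied to a joined or flattened ISRaD table in place of toy_frc.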

to be continued…