DATASETS AND R CODE


This page was last updated March 24, 2025.

Many of the datasets below have been compiled as .csv files. Click on the link to automatically download the file. If this doesn't work, please email rls@email.unc.edu and I'll try to fix it.

Software

ismev
extRemes
evdbayes

Climate extremes data (current ongoing project)

Weblink for the project
Annual maximum temperatures in Kelowna, BC
Daily temperatures at Kelowna Airport (the annual maxima are derived from this; original source of data is here)
Annual maximum temperatures at London Heathrow Airport - source
Maximum 7-day precipitations at Houston Hobby Airport - source
Global mean temperature anomalies
Pacific Northwest regional summer mean anomalies
Northern Europe regional summer mean anomalies
Gulf of Mexico sea surface temperature mean
The global mean, Pacific Northwest and Northern Europe anomalies are all derived from the HadCRUT5 dataset. The Gulf of Mexico SST means were calculated from HadISST1 data.
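
As a rough illustration of how annual maxima can be derived from a daily series such as the Kelowna Airport record, here is a minimal sketch in base R. It runs on simulated temperatures, and the column names date and tmax are assumptions for illustration, not the layout of the actual file.

```r
# Sketch only: simulated daily temperatures stand in for the real file.
# The column names `date` and `tmax` are hypothetical.
set.seed(42)
dates <- seq(as.Date("2001-01-01"), as.Date("2003-12-31"), by = "day")
doy <- as.numeric(format(dates, "%j"))

daily <- data.frame(
  date = dates,
  # crude seasonal cycle (coldest in mid-January) plus noise
  tmax = 12 - 14 * cos(2 * pi * (doy - 15) / 365.25) +
    rnorm(length(dates), sd = 4)
)

# Extract the calendar year, then take the maximum within each year
daily$year <- as.integer(format(daily$date, "%Y"))
annual_max <- aggregate(tmax ~ year, data = daily, FUN = max)
print(annual_max)
```

With a real file, the data frame would come from read.csv() and the date column would need converting with as.Date() first. Years with many missing days may need to be dropped before taking maxima.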

World Weather Attribution analysis of the 2021 Pacific Northwest heatwave

Source data from Climate Explorer
Annual maximum temperatures
Smoothed GMST from WWA. Source: https://climexp.knmi.nl/data/igiss_al_gl_a_4yrlo.dat. The last eight values represent conventions for 1.5, 2.0, 3.0 and 4.0 degree worlds relative to pre-industrial and to 1880-1900.
Regional mean anomalies
Code for R data analysis (update of 3/20/25)

R code generated during the class:

Subroutine for adaptive Metropolis sampler
Code for Jenkinson's Hartford dataset (updated 1/29/25)
Revised Code for Jenkinson's Hartford dataset (includes tests of fit for GEV model) (updated 2/10/25)
Kelowna GEV code

Venice sea levels analysis:

Data (csv file)
R code (updated 1/29/25)
Revised R code to include tests of fit (updated 2/10/25)

River Nidd data (originally analyzed by Davison and Smith, 1990; updated to 2018)

Nidd annual maxima data
Nidd POT data
Nidd POT data with headers
Sample R code using extRemes package

Women's track times (Robinson and Tawn 1995, Smith 1997, Applied Statistics):

Women's 1500m. data (best 5 times per year)
Women's 3000m. data (best 5 times per year)
R code for records analysis
Stored data from MCMC output

Chicago Marathon Data:

This dataset has been used for a forthcoming paper by Richard Smith and Abigail Mabe, motivated by the controversy over the women's marathon record set by the Kenyan runner Ruth Chepngetich in the 2024 Chicago Marathon. The dataset consists of the 25 best women's times, in minutes, from each Chicago Marathon from 1998 to 2024 (no results for 2020). The objective of the forthcoming paper is to use the data from 1998-2023 to predict the winning time in 2024 using the "r largest order statistics" method.

Public lecture about the analysis

To load the dataset into R: read.csv('https://rls.sites.oasis.unc.edu/s834-2023/Data/ChicagoMarathonData.csv', header=TRUE)

R code used to fit the models and analyze goodness of fit
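
For readers unfamiliar with the r largest order statistics method, the following base R sketch shows the general shape of the likelihood and a fit by numerical optimization. It is not the code used in the paper. The data are simulated, the parameter values are arbitrary, and the function name rlarg_nll is hypothetical. Because marathon times are "best" when smallest, one would presumably negate the times before applying a model for largest values; that step is an assumption and is not shown here.

```r
# Negative log-likelihood for the r largest order statistics model.
# x: matrix with one row per year, each row sorted in decreasing order.
# par: (mu, log sigma, xi); sigma is kept positive via the log scale.
rlarg_nll <- function(par, x) {
  mu <- par[1]; sigma <- exp(par[2]); xi <- par[3]
  if (abs(xi) < 1e-6) return(1e10)   # sketch: the xi = 0 limit is not handled
  z <- 1 + xi * (x - mu) / sigma
  if (any(z <= 0)) return(1e10)      # outside the support of the model
  r <- ncol(x)
  sum(z[, r]^(-1 / xi)) + nrow(x) * r * log(sigma) +
    (1 / xi + 1) * sum(log(z))
}

# Simulate from the limiting model: if G_1 < G_2 < ... are cumulative sums
# of unit exponentials, then mu + sigma * (G_k^(-xi) - 1) / xi gives the
# k-th largest value. All values below are arbitrary illustrations.
set.seed(1)
n_years <- 26; r <- 5
mu0 <- 0; sigma0 <- 1; xi0 <- -0.2
gam <- t(apply(matrix(rexp(n_years * r), n_years, r), 1, cumsum))
x <- mu0 + sigma0 * (gam^(-xi0) - 1) / xi0

# Fit by Nelder-Mead from crude starting values
start <- c(mean(x[, 1]), log(sd(x[, 1])), 0.1)
fit <- optim(start, rlarg_nll, x = x)
est <- c(mu = fit$par[1], sigma = exp(fit$par[2]), xi = fit$par[3])
print(est)
```

With r = 1 the likelihood reduces to the ordinary GEV likelihood for annual maxima. Larger r uses more data per year but leans harder on the asymptotic approximation, which is why goodness-of-fit checks matter when choosing r.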

Insurance Data:

Simulated Insurance Dataset. Designed to reproduce the main features of the dataset analyzed by Smith and Goodman 2000 (132 exceedances over 2.5; 132 observations in 6 types)
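
A threshold-exceedance analysis of data like these typically fits a generalized Pareto distribution (GPD) to the amounts by which observations exceed the threshold. The sketch below does this in base R on freshly simulated values. The threshold 2.5 and the count of 132 exceedances come from the description above; the scale and shape used for simulation and the function name gpd_nll are hypothetical, and this is not the procedure used to build the dataset.

```r
# Negative log-likelihood of the GPD for excesses y > 0 over a threshold.
# par: (log sigma, xi); sigma is kept positive via the log scale.
gpd_nll <- function(par, y) {
  sigma <- exp(par[1]); xi <- par[2]
  if (abs(xi) < 1e-6) return(1e10)   # sketch: exponential limit not handled
  z <- 1 + xi * y / sigma
  if (any(z <= 0)) return(1e10)      # outside the support of the model
  length(y) * log(sigma) + (1 / xi + 1) * sum(log(z))
}

set.seed(2)
threshold <- 2.5
n_exc <- 132
sigma0 <- 3; xi0 <- 0.5              # arbitrary values for illustration

# Inverse-CDF simulation of GPD excesses, shifted to sit above the threshold
u <- runif(n_exc)
losses <- threshold + sigma0 * (u^(-xi0) - 1) / xi0

# Fit the GPD to the excesses over the threshold
y <- losses - threshold
start <- c(log(mean(y)), 0.1)
fit <- optim(start, gpd_nll, y = y)
est <- c(sigma = exp(fit$par[1]), xi = fit$par[2])
print(est)
```

The extRemes and ismev packages listed under Software provide fitted-model diagnostics and standard errors that this sketch omits.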

Return to Richard Smith's page