Many of the datasets below have been compiled as .csv files. Click on the link to automatically download the file. If this doesn't work, please email rls@email.unc.edu and I'll try to fix it.
Weblink for the project
Annual maximum temperatures in Kelowna, BC
Daily temperatures at Kelowna Airport
(the annual maxima are derived from this; the original source of the data is here)
Annual maximum temperatures at London Heathrow Airport (source)
Maximum 7-day precipitation totals at Houston Hobby Airport (source)
Global mean temperature anomalies
Pacific Northwest regional summer mean anomalies
Northern Europe regional summer mean anomalies
Gulf of Mexico sea surface temperature mean
The global mean, Pacific Northwest and Northern Europe anomalies are all derived from the HadCRUT5 dataset. The Gulf of Mexico SST means were calculated from HadISST1 data.
Source data from Climate Explorer
Annual maximum temperatures
Smoothed GMST (global mean surface temperature) from WWA (World Weather Attribution). Source: https://climexp.knmi.nl/data/igiss_al_gl_a_4yrlo.dat. The last eight values are the conventional values for 1.5, 2.0, 3.0 and 4.0 degree worlds, relative to pre-industrial and to 1880-1900.
Regional mean anomalies
Code for R data analysis (updated 3/20/25)
Subroutine for adaptive Metropolis sampler
Code for Jenkinson's Hartford dataset (updated 1/29/25)
Revised Code for Jenkinson's Hartford dataset (includes tests of fit for GEV model) (updated 2/10/25)
Kelowna GEV code
Data (csv file)
R code (updated 1/29/25)
Revised R code to include tests of fit (updated 2/10/25)
Nidd annual maxima data
Nidd POT data
Nidd POT data with headers
Sample R code using extRemes package
Women's 1500 m data (best 5 times per year)
Women's 3000 m data (best 5 times per year)
R code for records analysis
Stored data from MCMC output
This dataset has been used for a forthcoming paper by Richard Smith and Abigail Mabe, motivated by the controversy over the women's marathon record set by the Kenyan runner Ruth Chepngetich in the 2024 Chicago Marathon. The dataset consists of the 25 best women's times (in minutes) in the Chicago Marathon for each year from 1998 to 2024 (there are no results for 2020). The objective of the paper is to use the data from 1998-2023 to predict the winning time in 2024 using the "r largest order statistics" method.
Public lecture about the analysis
To load the dataset into R: chicago <- read.csv('https://rls.sites.oasis.unc.edu/s834-2023/Data/ChicagoMarathonData.csv', header = TRUE)
R code used to fit the models and analyze goodness of fit
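The "r largest order statistics" method named above models the r best values in each year jointly, using the GEV-based likelihood described in Coles (2001, Section 3.5). The sketch below is a minimal illustration of that likelihood, not the code used for the paper: the function name rlarg_nll, the log-sigma parameterization, and the simulated data are all assumptions made for this example.

```r
# Hypothetical helper (not the course's code): negative log-likelihood of the
# r largest order statistics GEV model.
# z:   numeric matrix, one row per year, columns holding that year's r largest
#      values in decreasing order.
# par: c(mu, log(sigma), xi); sigma is log-transformed so optim() can search
#      without constraints.
rlarg_nll <- function(par, z) {
  mu <- par[1]
  sigma <- exp(par[2])
  xi <- par[3]
  r <- ncol(z)
  y <- (z - mu) / sigma
  if (abs(xi) < 1e-6) {
    # Gumbel limit as xi -> 0
    ll <- -exp(-y[, r]) - r * log(sigma) - rowSums(y)
  } else {
    w <- 1 + xi * y
    if (any(w <= 0)) return(Inf)  # some data fall outside the support
    ll <- -w[, r]^(-1 / xi) - r * log(sigma) - (1 / xi + 1) * rowSums(log(w))
  }
  -sum(ll)
}

# Illustration on simulated data (not the marathon data): 26 "years",
# each contributing the 5 largest of 200 standard normal draws.
set.seed(1)
x <- matrix(rnorm(26 * 200), nrow = 26)
z <- t(apply(x, 1, function(v) sort(v, decreasing = TRUE)[1:5]))
start <- c(mean(z[, 1]), log(sd(z[, 1])), 0.1)
fit <- optim(start, rlarg_nll, z = z)  # Nelder-Mead by default
est <- c(mu = fit$par[1], sigma = exp(fit$par[2]), xi = fit$par[3])
print(est)
```

Because marathon times are minima (smaller is better), one would apply such a model to the negated times, e.g. z <- -times, where times is a years-by-r matrix with each row sorted fastest first. The layout of the CSV file is not described on this page, so the reshaping step is left as an assumption.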
Simulated insurance dataset, designed to reproduce the main features of the dataset analyzed by Smith and Goodman (2000): 132 exceedances over the threshold 2.5, divided among 6 types.