This vignette demonstrates how to use the ffaframework
package to perform flood frequency analysis (model selection, parameter
estimation, uncertainty quantification, and model performance
assessment) under the assumption of stationarity.
The results are expressed in terms of return periods and return levels:
This vignette will explore data from the Athabasca River at Athabasca
(07BE001) hydrological monitoring station. A statistical analysis of the
data from this station does not reveal any evidence of trends in the
mean or variability of the data. Data for this station is provided as
Application_1.csv
in the ffaframework
package.
library(ffaframework)
csv_path <- system.file("extdata", "Application_1.csv", package = "ffaframework")
df <- read.csv(csv_path, comment.char = "#")
df <- subset(df, !is.na(max)) # Remove missing values
head(df)
#> year max
#> 14 1913 1670
#> 15 1914 3090
#> 16 1915 2760
#> 17 1916 2080
#> 18 1917 2490
#> 19 1918 1470
plot_ams_data(df$max, df$year, title = "Athabasca River at Athabasca (07BE001)")
First, a suitable probability distribution for the data is selected using the L-moments.
select_ldistance
chooses the distribution with
theoretical L-skewness (\(\tau_3\)) and
L-kurtosis (\(\tau_4\)) closest to the
sample, based on Euclidean distance.select_lkurtosis
selects the distribution with the
smallest difference between sample and theoretical L-kurtosis (for
three-parameter distributions only).select_zstatistic
uses a fitted 4-parameter Kappa
distribution to estimate the sampling distribution of the L-kurtosis and
selects the distribution with the smallest z-statistic.selection <- select_ldistance(df$max)
print(selection$recommendation)
#> [1] "GEV"
plot_lmom_diagram(selection)
Recommendation: Use the generalized extreme value (GEV) distribution.
Note: For information about the other distributions,
see the selection$metrics
item.
You can find more information about the probability distributions supported by the framework here.
The ffaframework
package provides three methods for
parameter estimation. See here
for more information.
fit_lmom_*
: L-moments parameter estimation for each
distribution.fit_maximum_likelihood
: Maximum likelihood and
generalized maximum likelihood.params <- fit_lmom_gev(df$max)$params
print(params)
#> [1] 1600.219872 616.666030 0.120747
Conclusion: The GEV distribution with parameters
location = 1600
, scale = 616
, and
shape = 0.12
will be used.
Once a probability distribution is fitted, return levels (design
events) can be readily estimated. However, point estimates alone can be
misleading –it is more informative to report confidence intervals to
reflect estimation uncertainty. The uncertainty_bootstrap
function performs uncertainty quantification using the parametric
bootstrap method. It requires three arguments:
data
: A vector of annual maximum series data.model
: A three-letter code for a probability
distribution (ex. "GEV"
).method
: A parameter estimation method. Must be
"L-moments"
, "MLE"
, or
"GMLE"
.By default, return levels are computed 2-, 5-, 10-, 20-, 50-, and 100- year return periods.
uncertainty <- uncertainty_bootstrap(df$max, "GEV", "L-moments")
print(uncertainty[[1]]$estimates)
#> [1] 1831.312 2614.238 3194.788 3803.340 4673.826 5393.422
plot_sffa(uncertainty)
Example Conclusion: Every 10 years, we can expect a flood of roughly \(3{\small,}200\text{m}^3/\text{s}\) or greater.
Model performance is assessed using model_diagnostics
,
which reports a collection of assessment statistics about the flood
frequency analysis. plot_model_diagnostics
compares the
empirical plotting positions (“Observed Quantiles”) and the predictions
of the parametric model (“Model Quantiles”). The black line represents a
perfect 1:1 correspondence between the model and the data.
diagnostics <- model_diagnostics(df$max, "GEV", params, uncertainty)
plot_model_diagnostics(diagnostics)
Conclusion: The parametric model generally matches the plotting positions. There is a small positive bias from \(3{\small,}500\text{m}^3/\text{s}\) to \(5{\small,}000\text{m}^3/\text{s}\).