User Manual
All configuration is done through YAML files. The default configuration file is source/config.yml, although it is possible to create additional configuration files and pass them to the scripts using the --config command line argument.
Folder Configuration
data_folder: An absolute path to a directory containing the CSV files you would like to run the framework on. The framework may not work if you use a relative path.
- Alternatively, data_folder: null will use the default ffa-framework/data folder.
csv_files: A list of one or more CSV files located in data_folder.
- Alternatively, csv_files: null will run the framework on every file in data_folder.
report_folder: An absolute path to a directory which will be used to store EDA and FFA reports. The directory must exist. Again, the framework may not work if you use a relative path.
- Alternatively, report_folder: null will use the default ffa-framework/reports folder.
Example Configuration (1)
data_folder: "~/Desktop/ffa-data"
csv_files:
- "dataset-1.csv"
- "dataset-2.csv"
report_folder: "~/Desktop/ffa-reports"
Example Configuration (2)
data_folder: null
csv_files: null
report_folder: null
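Because report_folder must already exist and the folder options expect absolute paths, it can help to prepare the directories before editing the configuration. The following is a minimal shell sketch, not part of the framework: the function name prepare_folders is hypothetical, and the ffa-data and ffa-reports directory names are simply borrowed from Example Configuration (1).

```shell
#!/bin/sh
# Hypothetical helper (not part of the framework): create the data and report
# directories under a base directory and print absolute paths for the YAML file.
prepare_folders() {
    base="$1"
    # mkdir -p creates missing parents and succeeds if the directory exists.
    mkdir -p "$base/ffa-data" "$base/ffa-reports" || return 1
    # cd + pwd resolves the base to an absolute path, even if it was relative.
    abs_base=$(cd "$base" && pwd) || return 1
    printf 'data_folder: "%s/ffa-data"\n' "$abs_base"
    printf 'report_folder: "%s/ffa-reports"\n' "$abs_base"
}
```

Calling prepare_folders "$HOME/Desktop" prints two lines that can be pasted into a configuration file.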
Exploratory Data Analysis (EDA)
For more information about the statistical tests used for EDA, see here.
The configuration options for the EDA portion of the framework are as follows:
mode: The mode in which to run EDA (complete framework only). Must be one of:
- "preset": Specify split points ahead of time using the split_points option.
- "automatic": Automatically determine split points using change point tests.
- "manual": Manually confirm the split points identified using change point tests.
split_points: Used with mode: "preset" to set split points ahead of time (as a list of years).
- Alternatively, split_points: null will disable splitting entirely.
alpha: The significance level used for statistical tests (must be between \(0.01\) and \(0.10\)).
bbmk_repetitions: The number of bootstrap samples to use for the BB-MK Test.
window_length: The size of the moving window used to compute the streamflow variance.
window_step: The step size of the moving window used to compute the streamflow variance.
show_trend: Whether or not to draw a trend line between data points (true or false).
generate_report: Whether or not to generate a report (true or false).
report_format: A list of report formats ("md_document", "html_document", or "pdf_document").
Example Configuration
mode: "manual"
split_points: null
alpha: 0.05
bbmk_repetitions: 10000
window_length: 10
window_step: 5
show_trend: false
generate_report: true
report_format: "html_document"
Example (Disable Splitting)
mode: "preset"
split_points: null
Example (Preset Split Points)
mode: "preset"
split_points:
- 1950
- 1980
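The range restriction on alpha can be checked before starting a long run. The sketch below is a standalone shell helper, not part of the framework: the function name check_alpha is hypothetical, and treating both bounds as inclusive is an assumption, since the manual only says "between".

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the argument lies in [0.01, 0.10].
# Inclusive bounds are an assumption; the manual only says "between".
check_alpha() {
    # awk does the floating-point comparison; input that does not start
    # with a number evaluates to 0 and is therefore rejected.
    awk -v a="$1" 'BEGIN { exit !(a + 0 >= 0.01 && a + 0 <= 0.10) }'
}

if check_alpha 0.05; then
    echo "alpha is within range"
else
    echo "alpha must be between 0.01 and 0.10" >&2
fi
```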
Running the Complete EDA Framework
The source/eda.R script is used to run the complete EDA framework. The optional command line argument --config can be used to specify a custom configuration file. If --config is not specified, the framework will use config.yml as a default.
Example Usage (1)
Rscript eda.R
Example Usage (2)
Rscript eda.R --config='custom-config.yml'
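A small wrapper can guard against a mistyped configuration path before R starts. This is a sketch rather than part of the framework: run_eda and the DRY_RUN variable are hypothetical names, and the script is assumed to be invoked from the source directory as in the usage examples above.

```shell
#!/bin/sh
# Hypothetical wrapper around the eda.R invocations shown above.
# Set DRY_RUN=1 to print the command instead of executing it.
run_eda() {
    config="${1:-}"
    if [ -z "$config" ]; then
        # No argument: let the framework fall back to config.yml.
        set -- Rscript eda.R
    elif [ -f "$config" ]; then
        set -- Rscript eda.R "--config=$config"
    else
        echo "configuration file not found: $config" >&2
        return 1
    fi
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

With this in place, run_eda custom-config.yml refuses to start R when the file does not exist, and run_eda with no argument behaves like Example Usage (1).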
Running Individual Statistical Tests
The source/stats.R script is used to run individual statistical tests.
- --name (required) selects the statistical test to run. It must be one of "bbmk", "kpss", "mk", "mks", "mwmk", "pettitt", "pp", "sens-mean", "sens-variance", "spearman", or "white".
- --config (optional) can be used to specify a custom configuration file.
Example Usage (1)
Rscript stats.R --name='mks'
Example Usage (2)
Rscript stats.R --name='white' --config='custom-config.yml'
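To run every test in turn, the list of accepted names can be looped over in the shell. The following sketch is not part of the framework: run_all_tests, STAT_TESTS, and DRY_RUN are hypothetical names, and the script is assumed to be invoked from the source directory as in the usage examples above.

```shell
#!/bin/sh
# Test names accepted by --name, as listed in the manual.
STAT_TESTS="bbmk kpss mk mks mwmk pettitt pp sens-mean sens-variance spearman white"

# Hypothetical helper: run stats.R once per test name.
# Set DRY_RUN=1 to print the commands instead of executing them.
run_all_tests() {
    for name in $STAT_TESTS; do
        if [ -n "${DRY_RUN:-}" ]; then
            echo "Rscript stats.R --name=$name"
        else
            # Keep going if one test fails, but report it on stderr.
            Rscript stats.R "--name=$name" || echo "test failed: $name" >&2
        fi
    done
}
```

Running DRY_RUN=1 run_all_tests first is a cheap way to review the commands before committing to the full set.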
Frequency Analysis
TBD