SindbadOptimization Module
The SindbadOptimization package provides tools for optimizing SINDBAD models, including parameter estimation, model calibration, and cost function evaluation. It integrates various optimization algorithms and utilities to streamline the optimization workflow for SINDBAD experiments.
Purpose:
This package is designed to support optimization tasks in SINDBAD, such as calibrating model parameters to match observations or minimizing cost functions. It leverages multiple optimization libraries and provides a unified interface for running optimization routines.
Dependencies:
- CMAEvolutionStrategy: Provides the CMA-ES (Covariance Matrix Adaptation Evolution Strategy) algorithm for global optimization.
- Evolutionary: Supplies evolutionary algorithms for optimization, useful for non-convex problems.
- ForwardDiff: Enables automatic differentiation for gradient-based optimization methods.
- MultistartOptimization: Implements multistart optimization for finding global optima by running multiple local optimizations.
- NLopt: Provides a collection of nonlinear optimization algorithms, including derivative-free methods.
- Optim: Supplies optimization algorithms such as BFGS and LBFGS for gradient-based optimization.
- Optimization: A unified interface for various optimization backends, simplifying the integration of multiple libraries.
- OptimizationOptimJL: Integrates the Optim library into the Optimization interface.
- OptimizationBBO: Provides black-box optimization methods for derivative-free optimization.
- OptimizationGCMAES: Implements the GCMA-ES (Gradient-based Covariance Matrix Adaptation Evolution Strategy) algorithm.
- OptimizationCMAEvolutionStrategy: Integrates CMA-ES into the Optimization interface.
- QuasiMonteCarlo: Provides quasi-Monte Carlo methods for optimization, useful for high-dimensional problems.
- StableRNGs: Supplies stable random number generators for reproducible optimization results.
- GlobalSensitivity: Provides tools for global sensitivity analysis, including Sobol indices and variance-based sensitivity analysis.
- Sindbad: Provides the core SINDBAD models and types.
- SindbadUtils: Provides utility functions for handling NamedTuples and other helper tasks for spatial and temporal operations.
- SindbadSetup: Provides the SINDBAD setup.
- SindbadTEM: Provides the SINDBAD Terrestrial Ecosystem Model (TEM) as the target for optimization tasks.
- SindbadMetrics: Supplies metrics for evaluating model performance, which are used in cost function calculations.
Included Files:
- prepOpti.jl: Prepares the necessary inputs and configurations for running optimization routines.
- optimizer.jl: Implements the core optimization logic, including merging algorithm options and selecting optimization methods.
- cost.jl: Defines cost functions for evaluating the loss of SINDBAD models against observations.
- optimizeTEM.jl:
  - Provides functions for optimizing SINDBAD TEM parameters for single locations or small spatial grids.
  - Handles optimization using large-scale 3D YAXArrays data cubes, enabling parameter calibration across spatial dimensions.
- sensitivityAnalysis.jl: Provides functions for performing sensitivity analysis on SINDBAD models, including global sensitivity analysis and local sensitivity analysis.
Note
- The package integrates multiple optimization libraries, allowing users to choose the most suitable algorithm for their problem.
- It is designed to be modular and extensible, enabling users to customize optimization workflows for specific use cases.
- It supports both gradient-based and derivative-free optimization methods, ensuring flexibility for different types of cost functions.
Examples:
- Running an experiment:
using SindbadExperiment
# Set up experiment parameters
experiment_config = ...
# Run the experiment
runExperimentOpti(experiment_config)
- Running a CMA-ES optimization:
using SindbadOptimization
optimized_params = optimizer(cost_function, default_values, lower_bounds, upper_bounds, algo_options, CMAEvolutionStrategyCMAES())
Exported
SindbadOptimization.cost Function
cost(parameter_vector, default_values, selected_models, space_forcing, space_spinup_forcing, loc_forcing_t, output_array, space_output, space_land, tem_info, observations, parameter_updater, cost_options, multi_constraint_method, parameter_scaling_type, cost_method <: CostMethod)
Calculate the cost for a parameter vector.
Arguments
- parameter_vector: Vector of parameter values to be optimized
- default_values: Default values for model parameters
- selected_models: Collection of selected models for simulation
- space_forcing: Forcing data for the main simulation period
- space_spinup_forcing: Forcing data for the spin-up period
- loc_forcing_t: Time-specific forcing data
- output_array: Array to store simulation outputs
- space_output: Spatial output configuration
- space_land: Land surface characteristics
- tem_info: Information needed to run the SINDBAD TEM simulation
- observations: Observed data for comparison
- parameter_updater: Function to update parameters
- cost_options: Options for cost function calculation
- multi_constraint_method: Method for handling multiple constraints
- parameter_scaling_type: Type of parameter scaling
- sindbad_cost_method <: CostMethod: A type parameter indicating the cost calculation method
Returns
Cost value representing the difference between model outputs and observations
sindbad_cost_method:
CostMethod
Abstract type for cost calculation methods in SINDBAD
Available methods/subtypes:
- CostModelObs: cost calculation between model output and observations
- CostModelObsLandTS: cost calculation between land model output and time series observations
- CostModelObsMT: multi-threaded cost calculation between model output and observations
- CostModelObsPriors: cost calculation between model output, observations, and priors. NOTE THAT THIS METHOD IS JUST A PLACEHOLDER AND DOES NOT CALCULATE PRIOR COST PROPERLY YET
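The last argument of cost selects the calculation through its type. The following is a minimal sketch of that dispatch pattern only. The types DemoCostMethod, DemoCostSerial, and DemoCostThreaded and the function demo_cost are hypothetical stand-ins, not the package's own definitions, and the sum of squared errors is an assumed placeholder metric.

```julia
# Hypothetical stand-ins; the real CostMethod subtypes live in SINDBAD.
abstract type DemoCostMethod end
struct DemoCostSerial <: DemoCostMethod end
struct DemoCostThreaded <: DemoCostMethod end

# Sum of squared errors, evaluated site by site in a plain loop.
function demo_cost(model_output, observations, ::DemoCostSerial)
    total = 0.0
    for site in eachindex(model_output)
        total += sum(abs2, model_output[site] .- observations[site])
    end
    return total
end

# Same cost, with the sites distributed across the available threads.
function demo_cost(model_output, observations, ::DemoCostThreaded)
    partial = zeros(length(model_output))
    Threads.@threads for site in eachindex(model_output)
        partial[site] = sum(abs2, model_output[site] .- observations[site])
    end
    return sum(partial)
end

model_output = [[1.0, 2.0, 3.0], [4.0, 5.0]]
observations = [[1.5, 2.0, 2.0], [4.0, 6.0]]

# The method instance passed last decides which implementation runs.
cost_serial = demo_cost(model_output, observations, DemoCostSerial())
cost_threaded = demo_cost(model_output, observations, DemoCostThreaded())
```

Both calls return the same value here; only the execution strategy differs, which is the point of selecting the method by type.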
SindbadOptimization.costLand Function
costLand(parameter_vector::AbstractArray, selected_models, forcing, spinup_forcing, loc_forcing_t, land_timeseries, land_init, tem_info, observations, parameter_updater, cost_options, multi_constraint_method, parameter_scaling_type)
costLand(parameter_vector::AbstractArray, selected_models, forcing, spinup_forcing, loc_forcing_t, _, land_init, tem_info, observations, parameter_updater, cost_options, multi_constraint_method, parameter_scaling_type)
Calculates the cost of SINDBAD model simulations for a single location by comparing model outputs, stored as a collection of SINDBAD land NamedTuples, with observations using the specified metrics and constraints.
In the first variant, land_timeseries is preallocated for computational efficiency. In the second variant, runTEM stacks the land using the map function, so preallocation is not necessary.
Arguments:
- parameter_vector::AbstractArray: A vector of model parameter values to be optimized.
- selected_models: A tuple of selected SINDBAD models in the given model structure, the parameters of which are optimized.
- forcing: A forcing NamedTuple containing the time series of environmental drivers for the simulation.
- spinup_forcing: A forcing NamedTuple for the spinup phase, used to initialize the model to a steady state.
- loc_forcing_t: A forcing NamedTuple for a single location and a single time step.
- land_timeseries: A preallocated vector to store the land state for each time step during the simulation.
- land_init: The initial SINDBAD land NamedTuple containing all fields and subfields.
- tem_info: A nested NamedTuple containing necessary information for running SINDBAD TEM, including helpers, models, and spinup configurations.
- observations: A NamedTuple or vector of arrays containing observational data, uncertainties, and masks for calculating performance metrics.
- parameter_updater: A function to update model parameters based on the parameter_vector.
- cost_options: A table specifying how each observation constraint should be used to calculate the cost or performance metric.
- multi_constraint_method: A method for combining the vector of costs into a single cost value or vector, as required by the optimization algorithm.
- parameter_scaling_type: Specifies the type of scaling applied to the parameters during optimization.
Returns:
- cost_metric: A scalar or vector representing the cost, calculated by comparing model outputs with observations using the specified metrics and constraints.
Note
- The function updates the selected models using the parameter_vector and parameter_updater.
- It runs the SINDBAD TEM simulation for the specified location using runTEM.
- The model outputs are compared with observations using metricVector, which calculates the performance metrics.
- The resulting cost vector is combined into a single cost value or vector using combineMetric and the specified multi_constraint_method.
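The four steps in the note above can be sketched with a toy model. Everything in this block is an illustrative assumption: demo_update, demo_run, demo_metric_vector, and demo_combine are hypothetical helpers standing in for parameter_updater, runTEM, metricVector, and combineMetric, and the straight-line "model" and mean squared error metric are placeholders.

```julia
# Step 1: write the candidate parameter values into the model settings.
demo_update(models::NamedTuple, parameter_vector) =
    merge(models, (; slope=parameter_vector[1], intercept=parameter_vector[2]))

# Step 2: run a toy "model" (a straight line) over the forcing time series.
demo_run(models, forcing) = models.slope .* forcing.x .+ models.intercept

# Step 3: one metric per observation constraint (here: mean squared error),
# skipping time steps that are masked out.
function demo_metric_vector(simulated, observations)
    return [sum(abs2, (simulated .- obs.data)[obs.mask]) / count(obs.mask)
            for obs in observations]
end

# Step 4: collapse the metric vector into a single cost value.
demo_combine(metrics) = sum(metrics)

function demo_cost_land(parameter_vector, models, forcing, observations)
    updated = demo_update(models, parameter_vector)
    simulated = demo_run(updated, forcing)
    metrics = demo_metric_vector(simulated, observations)
    return demo_combine(metrics)
end

models = (; slope=1.0, intercept=0.0)
forcing = (; x=[0.0, 1.0, 2.0, 3.0])
observations = [
    (; data=[1.0, 3.0, 5.0, 7.0], mask=[true, true, true, true]),
    (; data=[1.0, 3.0, 5.0, 99.0], mask=[true, true, true, false]),
]

# Parameters that reproduce the observations give zero cost.
cost_good = demo_cost_land([2.0, 1.0], models, forcing, observations)
cost_bad = demo_cost_land([1.0, 0.0], models, forcing, observations)
```

The mask in the second constraint excludes the outlier at the last time step, so it does not contribute to the cost.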
Examples:
- Calculating cost for a single location:
cost = costLand(parameter_vector, selected_models, forcing, spinup_forcing, loc_forcing_t, land_timeseries, land_init, tem_info, observations, parameter_updater, cost_options, multi_constraint_method, parameter_scaling_type)
- Using a custom multi-constraint method:
custom_method = CustomConstraintMethod()
cost = costLand(parameter_vector, selected_models, forcing, spinup_forcing, loc_forcing_t, land_timeseries, land_init, tem_info, observations, parameter_updater, cost_options, custom_method, parameter_scaling_type)
- Handling observational uncertainties:
- Observations can include uncertainties and masks to refine the cost calculation, ensuring robust model evaluation.
SindbadOptimization.getCostVectorSize Function
getCostVectorSize(algo_options, parameter_vector, ::OptimizationMethod || GSAMethod)
Calculates the size of the cost vector required for a specific optimization or sensitivity analysis method.
Arguments:
- algo_options: A NamedTuple or dictionary containing algorithm-specific options (e.g., population size, number of trajectories).
- parameter_vector: A vector of parameters used in the optimization or sensitivity analysis.
- ::OptimizationMethod: The optimization or sensitivity analysis method. Supported methods include:
  - CMAEvolutionStrategyCMAES: Covariance Matrix Adaptation Evolution Strategy.
  - GSAMorris: Morris method for global sensitivity analysis.
  - GSASobol: Sobol method for global sensitivity analysis.
  - GSASobolDM: Sobol method with Design Matrices.
Returns:
- An integer representing the size of the cost vector required for the specified method.
Notes:
- For CMAEvolutionStrategyCMAES, the size is determined by the population size or a default formula based on the parameter vector length.
- For GSAMorris, the size is calculated as the product of the number of trajectories and the length of the design matrix.
- For GSASobol, the size is determined by the number of parameters and the number of samples.
- For GSASobolDM, the size is equivalent to that of GSASobol.
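As an illustration of the CMA-ES case, the sketch below uses the commonly cited CMA-ES default population size, 4 + floor(3 ln n) for n parameters. Whether the package uses exactly this formula is an assumption, as are the helper name demo_cost_vector_size and the option key popsize.

```julia
# Hypothetical helper, not the package's getCostVectorSize.
function demo_cost_vector_size(algo_options::NamedTuple, parameter_vector)
    # A user-supplied population size takes precedence (assumed option name).
    if haskey(algo_options, :popsize)
        return algo_options.popsize
    end
    # Otherwise fall back to the standard CMA-ES default for n parameters.
    n = length(parameter_vector)
    return 4 + floor(Int, 3 * log(n))
end

size_default = demo_cost_vector_size((;), zeros(10))
size_custom = demo_cost_vector_size((; popsize=32), zeros(10))
```

Knowing this size in advance makes it possible to preallocate a cost vector with one entry per candidate parameter set.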
SindbadOptimization.globalSensitivity Function
globalSensitivity(cost_function, method_options, p_bounds, ::GSAMethod; batch=true)
Performs global sensitivity analysis using the specified method.
Arguments:
- cost_function: A function that computes the cost or output of the model based on input parameters.
- method_options: A set of options specific to the chosen sensitivity analysis method.
- p_bounds: A vector or matrix specifying the bounds of the parameters for sensitivity analysis.
- ::GSAMethod: The sensitivity analysis method to use.
- batch: A boolean flag indicating whether to perform batch processing (default: true).
Returns:
- A results object containing the sensitivity indices and other relevant outputs for the specified method.
algorithm:
GSAMethod
Abstract type for global sensitivity analysis methods in SINDBAD
Available methods/subtypes:
- GSAMorris: Morris method for global sensitivity analysis
- GSASobol: Sobol method for global sensitivity analysis
- GSASobolDM: Sobol method with derivative-based measures for global sensitivity analysis
Extended help
Notes:
- The function internally calls the gsa function from the GlobalSensitivity.jl package with the specified method and options.
- The cost_function should be defined to compute the model output based on the input parameters.
- The method_options argument allows fine-tuning of the sensitivity analysis process for each method.
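With batch processing, the cost function is typically handed many parameter sets at once. The sketch below assumes the column-wise convention (one parameter set per matrix column, one cost per column returned) and does not call GlobalSensitivity.jl itself. The names demo_cost and demo_batch_cost and the quadratic toy cost are hypothetical.

```julia
# Toy scalar cost for a single parameter vector (hypothetical).
demo_cost(p) = (p[1] - 1.0)^2 + 10.0 * (p[2] + 0.5)^2

# Batch wrapper: one column per parameter set in, one cost per column out.
demo_batch_cost(P::AbstractMatrix) = [demo_cost(col) for col in eachcol(P)]

# Bounds as one [lower, upper] pair per parameter.
p_bounds = [[-2.0, 2.0], [-1.0, 1.0]]

# Three parameter sets stored column-wise: lower bounds, midpoints, upper bounds.
lower = first.(p_bounds)
upper = last.(p_bounds)
P = hcat(lower, (lower .+ upper) ./ 2, upper)

costs = demo_batch_cost(P)
```

A wrapper of this shape lets a single-parameter-vector cost function be reused unchanged when the analysis method samples many parameter sets per call.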
SindbadOptimization.optimizeTEM Function
optimizeTEM(forcing::NamedTuple, observations, info::NamedTuple)
Arguments:
- forcing: a forcing NamedTuple that contains the forcing time series set for ALL locations
- observations: a NamedTuple or a vector of arrays of observations, their uncertainties, and the mask to use for calculating the performance metric/loss
- info: a SINDBAD NamedTuple that includes all information needed for setup and execution of an experiment
SindbadOptimization.optimizeTEMYax Method
optimizeTEMYax(forcing::NamedTuple, output::NamedTuple, tem::NamedTuple, optim::NamedTuple, observations::NamedTuple; max_cache=1e9)
Optimizes the Terrestrial Ecosystem Model (TEM) parameters for each pixel by mapping over the YAXcube(s).
Arguments
- forcing::NamedTuple: Input forcing data for the TEM model
- output::NamedTuple: Output configuration settings
- tem::NamedTuple: TEM model parameters and settings
- optim::NamedTuple: Optimization parameters and settings
- observations::NamedTuple: Observed data for model calibration
Keywords
- max_cache::Float64=1e9: Maximum cache size for optimization process
Returns
Optimized TEM parameters cube
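The idea of optimizing each pixel independently and collecting the results into a parameter map can be sketched on a plain array. This is a toy under stated assumptions: it uses a base Julia array instead of a YAXArrays cube, a hypothetical demo_fit_pixel helper, and a closed-form least-squares fit of a single gain parameter in place of a real optimizer.

```julia
# Toy cube with dimensions (time, lon, lat).
forcing = reshape(collect(1.0:24.0), 4, 3, 2)
true_gain = 2.5
observations = true_gain .* forcing

# Closed-form least-squares fit of obs ≈ gain * forcing for one pixel.
demo_fit_pixel(f, o) = sum(f .* o) / sum(abs2, f)

# Loop over the spatial dimensions; each pixel is fitted independently
# from its own time series.
gain_map = [demo_fit_pixel(forcing[:, i, j], observations[:, i, j])
            for i in axes(forcing, 2), j in axes(forcing, 3)]
```

The result has one fitted value per (lon, lat) pixel, which is the array analogue of a parameter cube.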
SindbadOptimization.optimizer Function
optimizer(cost_function, default_values, lower_bounds, upper_bounds, algo_options, algorithm <: OptimizationMethod)
Optimize model parameters using various optimization algorithms.
Arguments:
- cost_function: A function handle that takes a parameter vector as input and calculates a cost/loss (scalar or vector).
- default_values: A vector of default parameter values to initialize the optimization.
- lower_bounds: A vector of lower bounds for the parameters.
- upper_bounds: A vector of upper bounds for the parameters.
- algo_options: A set of options specific to the chosen optimization algorithm.
- algorithm: The optimization algorithm to be used.
Returns:
- optim_para: A vector of optimized parameter values.
algorithm:
OptimizationMethod
Abstract type for optimization methods in SINDBAD
Available methods/subtypes:
- BayesOptKMaternARD5: Bayesian Optimization using Matern 5/2 kernel with Automatic Relevance Determination from BayesOpt.jl
- CMAEvolutionStrategyCMAES: Covariance Matrix Adaptation Evolution Strategy (CMA-ES) from CMAEvolutionStrategy.jl
- EvolutionaryCMAES: Evolutionary version of CMA-ES optimization from Evolutionary.jl
- OptimBFGS: Broyden-Fletcher-Goldfarb-Shanno (BFGS) from Optim.jl
- OptimLBFGS: Limited-memory Broyden-Fletcher-Goldfarb-Shanno (L-BFGS) from Optim.jl
- OptimizationBBOadaptive: Black Box Optimization with adaptive parameters from Optimization.jl
- OptimizationBBOxnes: Black Box Optimization using Natural Evolution Strategy (xNES) from Optimization.jl
- OptimizationBFGS: BFGS optimization with box constraints from Optimization.jl
- OptimizationFminboxGradientDescent: Gradient descent optimization with box constraints from Optimization.jl
- OptimizationFminboxGradientDescentFD: Gradient descent optimization with box constraints using forward differentiation from Optimization.jl
- OptimizationGCMAESDef: Gradient-based CMA-ES (GCMA-ES) optimization with default settings from Optimization.jl
- OptimizationGCMAESFD: Gradient-based CMA-ES (GCMA-ES) optimization using forward differentiation from Optimization.jl
- OptimizationMultistartOptimization: Multi-start optimization to find global optimum from Optimization.jl
- OptimizationNelderMead: Nelder-Mead simplex optimization method from Optimization.jl
- OptimizationQuadDirect: Quadratic Direct optimization method from Optimization.jl
Extended help
Notes:
- The function supports a wide range of optimization algorithms, each tailored for specific use cases.
- Some methods do not require bounds for optimization, while others do.
- The cost_function should be defined by the user to calculate the loss based on the model output and observations. It is defined in cost.jl.
- The algo_options argument allows fine-tuning of the optimization process for each algorithm.
- Some algorithms (e.g., BayesOptKMaternARD5, OptimizationBBOxnes) require additional configuration steps, such as setting kernels or merging default and user-defined options.
Examples:
- Using CMAES from CMAEvolutionStrategy.jl:
optim_para = optimizer(cost_function, default_values, lower_bounds, upper_bounds, algo_options, CMAEvolutionStrategyCMAES())
- Using BFGS from Optim.jl:
optim_para = optimizer(cost_function, default_values, lower_bounds, upper_bounds, algo_options, OptimBFGS())
- Using Black Box Optimization (xNES) from Optimization.jl:
optim_para = optimizer(cost_function, default_values, lower_bounds, upper_bounds, algo_options, OptimizationBBOxnes())
Implementation Details:
- The function internally calls the appropriate optimization library and algorithm based on the algorithm argument.
- Each algorithm has its own implementation details, such as handling bounds, configuring options, and solving the optimization problem.
- The results are processed to extract the optimized parameter vector (optim_para), which is returned to the user.
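The way the algorithm argument selects an implementation can be illustrated with a self-contained sketch that follows the same argument order. It is not the package's code: DemoOptimizationMethod, DemoRandomSearch, and demo_optimizer are hypothetical, the option keys maxiters and seed are assumptions, and plain uniform random search stands in for a real optimization library.

```julia
using Random

# Hypothetical stand-ins for OptimizationMethod and one of its subtypes.
abstract type DemoOptimizationMethod end
struct DemoRandomSearch <: DemoOptimizationMethod end

function demo_optimizer(cost_function, default_values, lower_bounds, upper_bounds,
                        algo_options, ::DemoRandomSearch)
    rng = MersenneTwister(algo_options.seed)
    # Start from the defaults, clamped into the feasible box.
    best_para = clamp.(default_values, lower_bounds, upper_bounds)
    best_cost = cost_function(best_para)
    for _ in 1:algo_options.maxiters
        # Draw a candidate uniformly inside the box defined by the bounds.
        candidate = lower_bounds .+
                    rand(rng, length(default_values)) .* (upper_bounds .- lower_bounds)
        candidate_cost = cost_function(candidate)
        if candidate_cost < best_cost
            best_para, best_cost = candidate, candidate_cost
        end
    end
    return best_para
end

cost_function(p) = sum(abs2, p .- [0.3, -0.2])
default_values = [0.0, 0.0]
lower_bounds = [-1.0, -1.0]
upper_bounds = [1.0, 1.0]
algo_options = (; maxiters=2000, seed=42)

optim_para = demo_optimizer(cost_function, default_values, lower_bounds, upper_bounds,
                            algo_options, DemoRandomSearch())
```

Adding another algorithm under this pattern means adding one more subtype and one more method, without touching the calling code.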
SindbadOptimization.prepOpti Function
prepOpti(forcing, observations, info, cost_method::CostModelObs)
Prepares optimization parameters, settings, and helper functions based on the provided inputs.
Arguments:
- forcing: Input forcing data used for the optimization process.
- observations: Observed data used for comparison or calibration during optimization.
- info: A SINDBAD NamedTuple containing all information needed for setup and execution of the experiment.
- cost_method: The method used to calculate the cost function.
Returns:
- A NamedTuple opti_helpers containing:
  - parameter_table: Processed model parameters for optimization.
  - cost_function: A function to compute the cost for optimization.
  - cost_options: Options and settings for the cost function.
  - default_values: Default parameter values for the models.
  - lower_bounds: Lower bounds for the parameters.
  - upper_bounds: Upper bounds for the parameters.
  - run_helpers: Helper information for running the optimization.
cost_method:
CostMethod
Abstract type for cost calculation methods in SINDBAD
Available methods/subtypes:
- CostModelObs: cost calculation between model output and observations
- CostModelObsLandTS: cost calculation between land model output and time series observations
- CostModelObsMT: multi-threaded cost calculation between model output and observations
- CostModelObsPriors: cost calculation between model output, observations, and priors. NOTE THAT THIS METHOD IS JUST A PLACEHOLDER AND DOES NOT CALCULATE PRIOR COST PROPERLY YET
Extended help
Notes:
The function processes the input data and configuration to set up the optimization problem.
It prepares model parameters, cost options, and helper functions required for the optimization process.
Depending on the cost_method, the cost function is customized to handle specific data types or computation methods.
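One way to read the returned helpers is that cost_function is a closure: it captures the forcing, observations, and settings, so an optimizer only ever sees a parameter vector. The sketch below shows that pattern with a hypothetical demo_prep function, a toy linear model, and an assumed table layout with default, lower, and upper fields. It covers only a subset of the fields listed above.

```julia
# Hypothetical stand-in for the preparation step.
function demo_prep(forcing, observations, parameter_table)
    # Everything except the parameter vector is captured by the closure.
    cost_function = p -> sum(abs2, p[1] .* forcing .+ p[2] .- observations)
    return (;
        parameter_table,
        cost_function,
        default_values=[row.default for row in parameter_table],
        lower_bounds=[row.lower for row in parameter_table],
        upper_bounds=[row.upper for row in parameter_table],
    )
end

parameter_table = [
    (; name=:slope, default=1.0, lower=0.0, upper=5.0),
    (; name=:intercept, default=0.0, lower=-2.0, upper=2.0),
]
forcing = [0.0, 1.0, 2.0]
observations = [1.0, 3.0, 5.0]

opti_helpers = demo_prep(forcing, observations, parameter_table)

# The optimizer-facing call needs nothing but a parameter vector.
cost_at_default = opti_helpers.cost_function(opti_helpers.default_values)
cost_at_truth = opti_helpers.cost_function([2.0, 1.0])
```

Bundling the closure with the defaults and bounds in one NamedTuple keeps the later call to an optimizer short and free of experiment-specific arguments.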
SindbadOptimization.prepParameters Method
prepParameters(parameter_table, parameter_scaling)
Prepares model parameters for optimization by processing the default values and bounds of the parameters to be optimized.
Arguments
- parameter_table: Table of the parameters to be optimized
- parameter_scaling: Scaling method/type for parameter optimization
Returns
A tuple containing processed parameters ready for optimization
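To make the role of a scaling type concrete, the sketch below rescales defaults and bounds to the unit interval and back. The scaling options actually offered by the package are not listed here, so the types DemoScaleNone and DemoScaleBounds, the functions demo_prep_parameters and demo_unscale, and the table layout are all hypothetical.

```julia
# Hypothetical scaling types, selected by dispatch.
abstract type DemoScaling end
struct DemoScaleNone <: DemoScaling end
struct DemoScaleBounds <: DemoScaling end

# No scaling: pass defaults and bounds through unchanged.
demo_prep_parameters(table, ::DemoScaleNone) =
    (table.default, table.lower, table.upper)

# Scale by bounds: map each parameter range onto [0, 1].
function demo_prep_parameters(table, ::DemoScaleBounds)
    span = table.upper .- table.lower
    scaled_default = (table.default .- table.lower) ./ span
    return (scaled_default, zero(span), one.(span))
end

# Back-transform a scaled vector to physical parameter values.
demo_unscale(scaled, table, ::DemoScaleBounds) =
    table.lower .+ scaled .* (table.upper .- table.lower)

table = (; default=[2.0, 0.5], lower=[0.0, 0.0], upper=[8.0, 1.0])
default_values, lower_bounds, upper_bounds =
    demo_prep_parameters(table, DemoScaleBounds())
restored = demo_unscale(default_values, table, DemoScaleBounds())
```

Scaling of this kind puts parameters with very different magnitudes on a comparable footing for the optimizer, with the back-transform applied before the model is run.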
Internal
SindbadOptimization.optimizeYax Method
optimizeYax(map_cubes...; out::NamedTuple, tem::NamedTuple, optim::NamedTuple, forcing_vars::AbstractArray, obs_vars::AbstractArray)
A helper function to optimize parameters for each pixel by mapping over the YAXcube(s).
Arguments
- map_cubes...: Variadic input of cube maps to be optimized
- out::NamedTuple: Output configuration parameters
- tem::NamedTuple: TEM (Terrestrial Ecosystem Model) configuration parameters
- optim::NamedTuple: Optimization configuration parameters
- forcing_vars::AbstractArray: Array of forcing variables used in optimization
- obs_vars::AbstractArray: Array of observation variables used in optimization
SindbadOptimization.unpackYaxOpti Method
unpackYaxOpti(args; forcing_vars::AbstractArray)
Unpacks the variables for the mapCube function
Arguments
- all_cubes: Collection of cubes containing input, output and optimization/observation variables
- forcing_vars::AbstractArray: Array specifying which variables should be forced/constrained
Returns
Unpacked data arrays