SindbadML Module
julia
SindbadML

The SindbadML package provides the core functionality for integrating machine learning (ML) and hybrid modeling capabilities into the SINDBAD framework. It enables the use of neural networks and other ML models alongside process-based models for parameter learning and, potentially, hybrid modeling and advanced optimization.

Purpose

This package brings together all components required for hybrid (process-based + ML) modeling in SINDBAD, including data preparation, model construction, training routines, gradient computation, and optimizer management. It supports flexible configuration, cross-validation, and seamless integration with SINDBAD's process-based modeling workflows.

Dependencies

  • Distributed: Parallel and distributed computing utilities (nworkers, pmap, workers, nprocs, CachingPool).

  • Sindbad, SindbadTEM, SindbadSetup: Core SINDBAD modules for process-based modeling and setup.

  • SindbadData.YAXArrays, SindbadData.Zarr, SindbadData.AxisKeys, SindbadData: Data handling, array, and cube utilities.

  • SindbadMetrics: Metrics for model performance/loss evaluation.

  • Enzyme, Zygote, ForwardDiff, FiniteDiff, FiniteDifferences, PolyesterForwardDiff: Automatic and numerical differentiation libraries for gradient-based learning.

  • Flux: Neural network layers and training utilities for ML models.

  • Optimisers: Optimizers for training neural networks.

  • Statistics: Statistical utilities.

  • ProgressMeter: Progress bars for ML training and evaluation (@showprogress, Progress, next!, progress_pmap, progress_map).

  • PreallocationTools: Tools for efficient memory allocation.

  • Base.Iterators: Iterators for batching and repetition (repeated, partition).

  • Random: Random number utilities.

  • JLD2: For saving and loading model checkpoints and fold indices.

Included Files

  • utilsML.jl: Utility functions for ML workflows.

  • diffCaches.jl: Caching utilities for differentiation.

  • activationFunctions.jl: Implements various activation functions, including custom and Flux-provided activations.

  • mlModels.jl: Constructors and utilities for building neural network models and other ML architectures.

  • mlOptimizers.jl: Functions for creating and configuring optimizers for ML training.

  • loss.jl: Loss functions and utilities for evaluating model performance and computing gradients.

  • prepHybrid.jl: Prepares all data structures, loss functions, and ML components required for hybrid modeling, including data splits and feature extraction.

  • mlGradient.jl: Routines for computing gradients using different libraries and methods, supporting both automatic and finite difference differentiation.

  • mlTrain.jl: Training routines for ML and hybrid models, including batching, checkpointing, and evaluation.

  • neuralNetwork.jl: Neural network utilities and architectures.

  • siteLosses.jl: Site-specific loss calculation utilities.

  • oneHots.jl: One-hot encoding utilities.

  • loadCovariates.jl: Functions for loading and handling covariate data.

Notes

  • The package is modular and extensible, allowing users to add new ML models, optimizers, activation functions, and training methods.

  • It is tightly integrated with the SINDBAD ecosystem, ensuring consistent data handling and reproducibility across hybrid and process-based modeling workflows.

Exported

SindbadML.JoinDenseNN Method
julia
JoinDenseNN(models::Tuple)

Arguments:

  • models :: a tuple of models, e.g. (m1, m2)

Returns:

  • all parameters as a vector or matrix (multiple samples)

Example

julia
using SindbadML
using Flux
using Random
Random.seed!(123)

m_big = Chain(Dense(4 => 5, relu), Dense(5 => 3), Flux.sigmoid)
m_eta = Dense(1=>1, Flux.sigmoid)

x_big_a = rand(Float32, 4, 10)
x_small_a = rand(Float32, 1, 10)

model = JoinDenseNN((m_big, m_eta))
model((x_big_a, x_small_a))
SindbadML.activationFunction Function
julia
activationFunction(model_options, act::AbstractActivation)

Return the activation function corresponding to the specified activation type and model options.

This function dispatches on the activation type to provide the appropriate activation function for use in neural network layers. For custom activation types, relevant parameters can be passed via model_options.

Arguments

  • model_options: A struct or NamedTuple containing model options, including parameters for custom activation functions (e.g., k_σ for CustomSigmoid).

  • act: An activation type specifying the desired activation function. Supported types include:

    • FluxRelu: Rectified Linear Unit (ReLU) activation.

    • FluxTanh: Hyperbolic Tangent (tanh) activation.

    • FluxSigmoid: Sigmoid activation.

    • CustomSigmoid: Custom sigmoid activation with steepness parameter k_σ.

Returns

  • A callable activation function suitable for use in neural network layers.

Example

julia
act_fn = activationFunction(model_options, FluxRelu())
y = act_fn(x)
SindbadML.denseNN Method
julia
denseNN(in_dim::Int, n_neurons::Int, out_dim::Int; extra_hlayers=0, activation_hidden=Flux.relu, activation_out= Flux.sigmoid, seed=1618)

Arguments

  • in_dim: input dimension

  • n_neurons: number of neurons in each hidden layer

  • out_dim: output dimension

  • extra_hlayers=0: controls the number of extra hidden layers, default is zero

  • activation_hidden=Flux.relu: activation function within hidden layers, default is ReLU

  • activation_out= Flux.sigmoid: activation of output layer, default is sigmoid

  • seed=1618: random seed; the default 1618 is a nod to the golden ratio (1+√5)/2 ≈ 1.618

Returns a Flux.Chain neural network.
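
Example

A minimal sketch of building and calling a dense network (the dimensions and keyword values here are illustrative):

julia
using SindbadML
using Flux

# 8 input features, 32 neurons per hidden layer, 5 output parameters
nn = denseNN(8, 32, 5; extra_hlayers=1, seed=1618)
y = nn(rand(Float32, 8))   # 5 values in (0, 1) from the sigmoid output layer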

SindbadML.destructureNN Method
julia
destructureNN(model; nn_opt=Optimisers.Adam())

Given a model, returns a flat vector with all weights, a re object for reconstructing the neural network, and the optimiser state.

Arguments

  • model: a Flux.Chain neural network.

  • nn_opt: Optimiser, the default is Optimisers.Adam().

Returns:

  • flat :: a flat vector with all network weights

  • re :: an object containing the model structure, used later to reconstruct the neural network

  • opt_state :: the state of the optimiser
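
Example

A minimal sketch (the model and optimiser are illustrative; the re(flat) call assumes re follows the Flux.destructure convention):

julia
using SindbadML
using Flux
using Optimisers

m = Chain(Dense(4 => 5, relu), Dense(5 => 3), Flux.sigmoid)
flat, re, opt_state = destructureNN(m; nn_opt=Optimisers.Adam(0.01))
m_rebuilt = re(flat)   # rebuild the network from the flat weight vector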

SindbadML.epochLossComponents Method
julia
epochLossComponents(loss_functions::F, loss_array_sites, loss_array_components, epoch_number, scaled_params, sites_list) where {F}

Compute and store the loss metrics and loss components for each site in parallel for a given training epoch.

This function evaluates the provided loss functions for each site using the current scaled parameters, and stores the resulting scalar loss metrics and loss component vectors in the corresponding arrays for the specified epoch. Parallel execution is used to accelerate computation across sites.

Arguments

  • loss_functions::F: An array or KeyedArray of loss functions, one per site (where F is a subtype of AbstractArray{<:Function}).

  • loss_array_sites: A matrix to store the scalar loss metric for each site and epoch (dimensions: site × epoch).

  • loss_array_components: A 3D tensor to store the loss components for each site, component, and epoch (dimensions: site × component × epoch).

  • epoch_number: The current epoch number (integer).

  • scaled_params: A callable or array providing the scaled parameters for each site (e.g., scaled_params(site=site_name)).

  • sites_list: List or array of site identifiers to process.

Notes

  • The function uses Julia's threading (Threads.@spawn) to compute losses for multiple sites in parallel.

  • Each site's loss metric and components are stored at the corresponding index for the current epoch.

  • Designed for use within training loops to track loss evolution over epochs.

Example

julia
epochLossComponents(loss_functions, loss_array_sites, loss_array_components, epoch, scaled_params, sites)
SindbadML.getCacheFromOutput Function
julia
getCacheFromOutput(loc_output, ::MLGradType)
getCacheFromOutput(loc_output, ::ForwardDiffGrad)
getCacheFromOutput(loc_output, ::PolyesterForwardDiffGrad)

Returns the appropriate Cache type based on the automatic differentiation or finite differences package being used.

Arguments

  • loc_output: The local output

  • Second argument specifies the differentiation method:

    • ForwardDiffGrad: Uses ForwardDiff.jl for automatic differentiation

    • MLGradType: Fallback for all other libraries used for gradient calculations, e.g., FiniteDiff.jl, FiniteDifferences.jl

    • PolyesterForwardDiffGrad: Uses PolyesterForwardDiff.jl for automatic differentiation

SindbadML.getIndicesSplit Function
julia
getIndicesSplit(info, sites, fold_type)

Determine the indices for training, validation, and testing site splits for hybrid (ML) modeling in SINDBAD.

This function dispatches on the fold_type argument to either load precomputed folds from file or to compute the splits on-the-fly based on the provided split ratios and number of folds.

Arguments

  • info: The SINDBAD experiment info structure, containing hybrid modeling configuration.

  • sites: Array of site identifiers (e.g., site names or indices).

  • fold_type: Determines the splitting strategy. Use LoadFoldFromFile() to load folds from file, or CalcFoldFromSplit() to compute splits dynamically.

Returns

  • indices_training: Indices of sites assigned to the training set.

  • indices_validation: Indices of sites assigned to the validation set.

  • indices_testing: Indices of sites assigned to the testing set.

Notes

  • When using LoadFoldFromFile, the function loads fold indices from the file specified in info.hybrid.fold.fold_path.

  • When using CalcFoldFromSplit, the function splits the sites according to the ratios and number of folds specified in info.hybrid.ml_training.options.

  • Ensures reproducibility by using the random seed from info.hybrid.random_seed when shuffling sites.

Example

julia
indices_train, indices_val, indices_test = getIndicesSplit(info, sites, info.hybrid.fold.fold_type)
SindbadML.getInnerArgs Method
julia
getInnerArgs(idx, grads_lib, scaled_params_batch, selected_models, space_forcing, space_spinup_forcing, loc_forcing_t, space_output, loc_land, tem_info, parameter_to_index, parameter_scaling_type, space_observations, cost_options, constraint_method, indices_batch, sites_batch)

Function to get inner arguments for the loss function.

Arguments

  • idx: index batch value

  • grads_lib: gradient library

  • scaled_params_batch: scaled parameters batch

  • selected_models: selected models

  • space_forcing: forcing data location

  • space_spinup_forcing: spinup forcing data location

  • loc_forcing_t: forcing data time for one time step.

  • space_output: output data location

  • loc_land: initial land state

  • tem_info: model information

  • parameter_to_index: parameter to index

  • parameter_scaling_type: type determining parameter scaling

  • space_observations: observation data location

  • cost_options: cost options

  • constraint_method: constraint method

  • indices_batch: indices batch

  • sites_batch: sites batch

SindbadML.getLossForSites Method
julia
getLossForSites(gradient_lib, loss_function::F, loss_array_sites, loss_array_split, epoch_number, scaled_params, sites_list, indices_sites, models, space_forcing, space_spinup_forcing, loc_forcing_t, space_output, loc_land, tem_info, parameter_to_index, parameter_scaling_type, space_observations, cost_options, constraint_method) where {F}

Calculates the loss for all sites using the loss_function and updates the loss_array_sites and loss_array_split arrays in place: loss_array_sites stores the loss values for each site and epoch, while loss_array_split stores the loss values for each model output and epoch.

Arguments

  • gradient_lib: gradient library

  • loss_function: loss function

  • loss_array_sites: array to store the loss values for each site and epoch

  • loss_array_split: array to store the loss values for each model output and epoch

  • epoch_number: epoch number

  • scaled_params: scaled parameters

  • sites_list: list of sites

  • indices_sites: indices of sites

  • models: list of models

  • space_forcing: forcing data location

  • space_spinup_forcing: spinup forcing data location

  • loc_forcing_t: forcing data time for one time step.

  • space_output: output data location

  • loc_land: initial land state

  • tem_info: model information

  • parameter_to_index: parameter to index

  • parameter_scaling_type: type determining parameter scaling

  • space_observations: observation data location

  • cost_options: cost options

  • constraint_method: constraint method

SindbadML.getLossFunctionHandles Method
julia
getLossFunctionHandles(info, run_helpers, sites)

Construct loss function handles for each site for use in hybrid (ML) modeling in SINDBAD.

This function generates callable loss functions and loss component functions for each site, encapsulating all necessary arguments and configuration from the experiment info and runtime helpers. These handles are used during training and evaluation to compute the loss and its components for each site efficiently.

Arguments

  • info: The SINDBAD experiment info structure, containing model, optimization, and hybrid configuration.

  • run_helpers: Helper object returned by prepTEM, containing prepared model, forcing, observation, and output structures.

  • sites: Array of site indices or identifiers for which to build loss functions.

Returns

  • loss_functions: A KeyedArray of callable loss functions, one per site. Each function takes model parameters as input and returns the scalar loss for that site.

  • loss_component_functions: A KeyedArray of callable functions, one per site, that return the vector of loss components (e.g., for multi-objective or constraint-based loss).

Notes

  • Each loss function is closed over all required data and options for its site, including model structure, parameter indices, scaling, forcing, observations, output cache, cost options, and hybrid/optimization settings.

  • The returned arrays are keyed by site for convenient lookup and iteration.

Example

julia
loss_functions, loss_component_functions = getLossFunctionHandles(info, run_helpers, sites)
site_loss = loss_functions[site_index](params)
site_loss_components = loss_component_functions[site_index](params)
SindbadML.getOutputFromCache Function
julia
getOutputFromCache(loc_output, _, ::MLGradType)
getOutputFromCache(loc_output, new_params, ::ForwardDiffGrad)
getOutputFromCache(loc_output, new_params, ::PolyesterForwardDiffGrad)

Retrieves output values from Cache based on the differentiation method being used.

Arguments

  • loc_output: The cached output values

  • _ or new_params: Additional parameters (only used with ForwardDiff)

  • Third argument specifies the differentiation method:

    • MLGradType: Returns cached output directly when using other libraries, e.g., FiniteDiff.jl, FiniteDifferences.jl, etc.

    • ForwardDiffGrad: Processes cached output with new parameters when using ForwardDiff.jl, returns get_tmp.(loc_output, (new_params,))

    • PolyesterForwardDiffGrad: Calls cached output with new parameters using ForwardDiff.jl

SindbadML.getParamsAct Method
julia
getParamsAct(x, parameter_table)

Scales x values in the [0,1] interval to the lower and upper bounds given in parameter_table.

Arguments

  • x: vector array

  • parameter_table: a Table with input fields default, lower and upper that match the x vector.

Returns a vector array with new values scaled into the new interval [lower, upper].
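
Example

A minimal sketch, assuming parameter_table is the parameter table from info.optimization with lower and upper columns:

julia
x01 = rand(Float32, length(parameter_table.lower))   # e.g., ML outputs in [0, 1]
params = getParamsAct(x01, parameter_table)          # rescaled to [lower, upper] per parameter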

SindbadML.getPullback Function
julia
getPullback(flat, re, features::AbstractArray)
getPullback(flat, re, features::Tuple)

Arguments:

  • flat :: weight parameters.

  • re :: model structure (vanilla Chain Dense Layers).

  • features :: n predictors and s samples.

    • A vector of predictors

    • A matrix of predictors: (p_n x s)

    • A tuple of predictor vectors: (p1, p2)

    • A tuple of predictor matrices: ((p1_n x s), (p2_n x s))

Returns:

  • new parameters and pullback function

Example

Here we use a single input feature vector or matrix.

julia
using SindbadML
using Flux
# model
m = Chain(Dense(4 => 5, relu), Dense(5 => 3), Flux.sigmoid)
# features
_feat = rand(Float32, 4)
# apply
flat, re = destructureNN(m)
# Zygote
new_params, pullback_func = getPullback(flat, re, _feat)
# or, with multiple samples
_feat_ns = rand(Float32, 4, 3) # `n` predictors and `s` samples.
new_params, pullback_func = getPullback(flat, re, _feat_ns)

Example

Here we use multiple input feature vectors or matrices.

julia
using SindbadML
using Flux
# model
m1 = Chain(Dense(4 => 5, relu), Dense(5 => 3), Flux.sigmoid)
m2 = Dense(2=>1, Flux.sigmoid)
combo_ms = JoinDenseNN((m1, m2))
# features
_feat1 = rand(Float32, 4)
_feat2 = rand(Float32, 2)
# apply
flat, re = destructureNN(combo_ms)
# Zygote
new_params, pullback_func = getPullback(flat, re, (_feat1, _feat2))
# or, with multiple samples
_feat1_ns = rand(Float32, 4, 3) # `n` predictors and `s` samples.
_feat2_ns = rand(Float32, 2, 3) # `n` predictors and `s` samples.
new_params, pullback_func = getPullback(flat, re, (_feat1_ns, _feat2_ns))
SindbadML.gradientBatch! Function
julia
gradientBatch!(grads_lib, grads_batch, chunk_size::Int, loss_f::Function, get_inner_args::Function, input_args...; showprog=false)
gradientBatch!(grads_lib, grads_batch, gradient_options::NamedTuple, loss_functions, scaled_params_batch, sites_batch; showprog=false)

Compute gradients for a batch of samples in hybrid (ML) modeling in SINDBAD.

This function computes the gradients of the loss function with respect to model parameters for a batch of sites or samples, using the specified gradient library. It supports both distributed and multi-threaded execution, and can handle different gradient computation backends (e.g., PolyesterForwardDiff, ForwardDiff, FiniteDiff, etc.).

Arguments

  • grads_lib: Gradient computation library or method. Supported types include:

    • PolyesterForwardDiffGrad: Uses PolyesterForwardDiff.jl for multi-threaded chunked gradients.

    • Other MLGradType subtypes: Use their respective backend.

  • grads_batch: Pre-allocated array for storing batched gradients (size: n_parameters × n_samples).

  • chunk_size: (Optional) Chunk size for threaded gradient computation (used by PolyesterForwardDiffGrad).

  • gradient_options: (Optional) NamedTuple of gradient options (e.g., chunk size).

  • loss_f: Loss function to be applied (for all samples).

  • get_inner_args: Function to obtain inner arguments for the loss function.

  • input_args: Global input arguments for the batch.

  • loss_functions: Array or KeyedArray of loss functions, one per site.

  • scaled_params_batch: Callable or array providing scaled parameters for each site.

  • sites_batch: List or array of site identifiers for the batch.

  • showprog: (Optional) If true, display a progress bar during computation (default: false).

Returns

  • Updates grads_batch in-place with computed gradients for each sample in the batch.

Notes

  • The function automatically selects between distributed (pmap) and multi-threaded (Threads.@spawn) execution depending on the backend and arguments.

  • Designed for use within training loops for efficient batch gradient computation.

Example

julia
gradientBatch!(grads_lib, grads_batch, (chunk_size=4,), loss_functions, scaled_params_batch, sites_batch; showprog=true)
SindbadML.gradientSite Function
julia
gradientSite(grads_lib, x_vals, chunk_size::Int, loss_f::Function, args...)
gradientSite(grads_lib, x_vals, gradient_options::NamedTuple, loss_f::Function)
gradientSite(grads_lib, x_vals::AbstractArray, gradient_options::NamedTuple, loss_f::Function)

Compute gradients of the loss function with respect to model parameters for a single site using the specified gradient library.

This function dispatches on the type of grads_lib to select the appropriate differentiation backend (e.g., PolyesterForwardDiff, ForwardDiff, FiniteDiff, FiniteDifferences, Zygote, or Enzyme). It supports both threaded and single-threaded computation, as well as chunked evaluation for memory and speed trade-offs.

Arguments

  • grads_lib: Gradient computation library or method. Supported types include:

    • PolyesterForwardDiffGrad: Uses PolyesterForwardDiff.jl for multi-threaded chunked gradients.

    • ForwardDiffGrad: Uses ForwardDiff.jl for automatic differentiation.

    • FiniteDiffGrad: Uses FiniteDiff.jl for finite difference gradients.

    • FiniteDifferencesGrad: Uses FiniteDifferences.jl for finite difference gradients.

    • ZygoteGrad: Uses Zygote.jl for reverse-mode automatic differentiation.

    • EnzymeGrad: Uses Enzyme.jl for AD (experimental).

  • x_vals: Parameter values for which to compute gradients.

  • chunk_size: (Optional) Chunk size for threaded gradient computation (used by PolyesterForwardDiffGrad).

  • gradient_options: (Optional) NamedTuple of gradient options (e.g., chunk size).

  • loss_f: Loss function to be differentiated.

  • args...: Additional arguments to be passed to the loss function.

Returns

  • ∇x: Array of gradients of the loss function with respect to x_vals.

Notes

  • On Apple M1 systems, PolyesterForwardDiffGrad falls back to single-threaded ForwardDiff due to closure issues.

  • The function is used internally for both site-level and batch-level gradient computation in hybrid ML training.

Example

julia
grads = gradientSite(ForwardDiffGrad(), x_vals, (chunk_size=4,), loss_f)
SindbadML.gradsNaNCheck! Method
julia
gradsNaNCheck!(grads_batch, _params_batch, sites_batch, parameter_table; show_params_for_nan=false)

Utility function to check whether any computed gradients are NaN (if found, double-check your approach). Any NaNs are replaced with 0.0f0.

Arguments

  • grads_batch: gradients array.

  • _params_batch: parameters values.

  • sites_batch: sites names.

  • parameter_table: parameters table.

  • show_params_for_nan=false: if true, it will show the parameters that caused the NaNs.

SindbadML.lcKAoneHotbatch Method
julia
lcKAoneHotbatch(lc_data, up_bound, lc_name, ka_labels)

Arguments

  • lc_data: Vector array

  • up_bound: last class index; the valid range is 1:up_bound, and any value outside that range is mapped to up_bound. Use 17 for PFT and 32 for KG.

  • lc_name: land cover approach, either KG or PFT.

  • ka_labels: KeyedArray labels, i.e. site names

SindbadML.loadCovariates Method
julia
loadCovariates(sites_forcing; kind="all")

Use the kind argument to select different sets of covariates.

Arguments

  • sites_forcing: names of forcing sites

  • kind: defaults to "all"

Other options

  • PFT

  • KG

  • KG_PFT

  • PFT_ABCNOPSWB

  • KG_ABCNOPSWB

  • ABCNOPSWB

  • veg_all

  • veg

  • KG_veg

  • veg_ABCNOPSWB
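
Example

A sketch of typical usage; the site names are placeholders and assume the corresponding covariate data are available to SINDBAD:

julia
sites_forcing = ["DE-Hai", "FI-Hyy"]
covariates_all = loadCovariates(sites_forcing)             # kind="all" (default)
covariates_pft = loadCovariates(sites_forcing; kind="PFT")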

SindbadML.loadTrainedNN Method
julia
loadTrainedNN(path_model)

Arguments

  • path_model: path to the model.

SindbadML.loss Method
julia
loss(params, models, parameter_to_index, parameter_scaling_type, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, loc_obs, cost_options, constraint_method, gradient_lib, ::LossModelObsML)

Calculates the scalar loss for a given site in hybrid (ML) modeling in SINDBAD.

This function computes the loss value for a given site by first calling lossVector to obtain the vector of loss components, and then combining them into a scalar loss using the combineMetric function and the specified constraint method.

Arguments

  • params: Model parameters (typically output from an ML model).

  • models: List of process-based models.

  • parameter_to_index: Mapping from parameter names to indices.

  • parameter_scaling_type: Parameter scaling configuration.

  • loc_forcing: Forcing data for the site.

  • loc_spinup_forcing: Spinup forcing data for the site.

  • loc_forcing_t: Forcing data for a single time step.

  • loc_output: Output data structure for the site.

  • land_init: Initial land state.

  • tem_info: Model information and configuration.

  • loc_obs: Observation data for the site.

  • cost_options: Cost function and metric configuration.

  • constraint_method: Constraint method for combining metrics.

  • gradient_lib: Gradient computation library or method.

  • ::LossModelObsML: Type dispatch for loss model with observations and machine learning.

Returns

  • t_loss: Scalar loss value for the site.

Notes

  • This function is used internally by higher-level training and evaluation routines.

  • The loss is computed by aggregating the loss vector using the specified constraint method.

Example

julia
t_loss = loss(params, models, parameter_to_index, parameter_scaling_type, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, loc_obs, cost_options, constraint_method, gradient_lib, LossModelObsML())
SindbadML.lossSite Method
julia
lossSite(new_params, gradient_lib, models, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, parameter_to_index, parameter_scaling_type, loc_obs, cost_options, constraint_method; optim_mode=true)

Function to calculate the loss for a given site. It is used for optimization, hence the optim_mode argument is set to true by default. A gradient library must be provided, as well as new parameters to update the models.

Arguments

  • new_params: new parameters

  • gradient_lib: gradient library

  • models: list of models

  • loc_forcing: forcing data location

  • loc_spinup_forcing: spinup forcing data location

  • loc_forcing_t: forcing data time for one time step.

  • loc_output: output data location

  • land_init: initial land state

  • tem_info: model information

  • parameter_to_index: parameter to index

  • parameter_scaling_type: type determining parameter scaling

  • loc_obs: observation data location

  • cost_options: cost options

  • constraint_method: constraint method

SindbadML.lossVector Method
julia
lossVector(params, models, parameter_to_index, parameter_scaling_type, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, loc_obs, cost_options, constraint_method, gradient_lib, ::LossModelObsML)

Calculate the loss vector for a given site in hybrid (ML) modeling in SINDBAD.

This function runs the core TEM model with the provided parameters, forcing data, initial land state, and model information, then computes the loss vector using the specified cost options and metrics. It is typically used for site-level loss evaluation during training and validation.

Arguments

  • params: Model parameters (in this case, output from an ML model).

  • models: List of process-based models.

  • parameter_to_index: Mapping from parameter names to indices.

  • parameter_scaling_type: Parameter scaling configuration.

  • loc_forcing: Forcing data for the site.

  • loc_spinup_forcing: Spinup forcing data for the site.

  • loc_forcing_t: Forcing data for a single time step.

  • loc_output: Output data structure for the site.

  • land_init: Initial land state.

  • tem_info: Model information and configuration.

  • loc_obs: Observation data for the site.

  • cost_options: Cost function and metric configuration.

  • constraint_method: Constraint method for combining metrics.

  • gradient_lib: Gradient computation library or method.

  • ::LossModelObsML: Type dispatch for loss model with observations and machine learning.

Returns

  • loss_vector: Vector of loss components for the site.

  • loss_indices: Indices corresponding to each loss component.

Notes

  • This function is used internally by higher-level loss and training routines.

  • The loss vector is typically combined into a scalar loss using combineMetric.

Example

julia
loss_vec, loss_idx = lossVector(params, models, parameter_to_index, parameter_scaling_type, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, loc_obs, cost_options, constraint_method, gradient_lib, LossModelObsML())
SindbadML.mixedGradientTraining Method
julia
mixedGradientTraining(grads_lib, nn_model, train_refs, test_val_refs, loss_fargs, forward_args; n_epochs=3, optimizer=Optimisers.Adam(), path_experiment="/")

Training function that computes model parameters using a neural network, which are then used by process-based models (PBMs) to estimate parameter gradients. Neural network weights are updated using the product of these gradients with the neural network's Jacobian.

Arguments

  • grads_lib: Library to compute PBMs parameter gradients.

  • nn_model: A Flux.Chain neural network.

  • train_refs: training data features.

  • test_val_refs: test and validation data features.

  • loss_fargs: functions used to calculate the loss.

  • forward_args: arguments to evaluate the PBMs.

  • path_experiment="/": save model to path.

SindbadML.mlModel Function
julia
mlModel(info, n_features, ::MLModelType)

Builds a Flux dense neural network model, initialized based on the provided info and n_features.

Arguments

  • info: The experiment information containing model options and parameters.

  • n_features: The number of features in the input data.

  • ::MLModelType: Type dispatch for the machine learning model type.

Supported MLModelType:

  • ::FluxDenseNN: A simple dense neural network model implemented in Flux.jl.

Returns

The initialized machine learning model.
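
Example

A minimal sketch, assuming info comes from the experiment setup and n_features from the loaded covariates:

julia
ml_model = mlModel(info, n_features, FluxDenseNN())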

SindbadML.mlOptimizer Function
julia
mlOptimizer(optimizer_options, ::MLOptimizerType)

Create an ML optimizer of the given type; the provided options are passed to the optimizer's constructor.

Arguments:

  • optimizer_options: A dictionary or NamedTuple containing options for the optimizer.

  • ::MLOptimizerType: The type used to determine which optimizer to create. Supported types include:

    • OptimisersAdam: For Adam optimizer.

    • OptimisersDescent: For Descent optimizer.

Returns:

  • An ML optimizer object that can be used to optimize machine learning models.

SindbadML.oneHotPFT Method
julia
oneHotPFT(pft, up_bound, veg_class)

Arguments

  • pft: Plant Functional Type value. Any entry not in 1:17 is set to the last index (this includes NaN); the last index represents water/NaN.

  • up_bound: last index class, the range goes from 1:up_bound, and any case not in that range uses the up_bound value. For PFT use 17.

  • veg_class: true or false.

Returns a vector.

SindbadML.partitionBatches Method
julia
partitionBatches(n; batch_size=32)

Return an Iterator partitioning a dataset into batches.

Arguments

  • n: number of samples

  • batch_size: batch size
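
Example

A minimal sketch; assuming Iterators.partition-like semantics, 100 samples with batch_size=32 yield index batches 1:32, 33:64, 65:96, and 97:100:

julia
for batch in partitionBatches(100; batch_size=32)
    # batch holds the sample indices of one batch
end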

SindbadML.prepHybrid Method
julia
prepHybrid(forcing, observations, info, ::MLTrainingType)

Prepare all data structures, loss functions, and machine learning components required for hybrid (process-based + machine learning) modeling in SINDBAD.

This function orchestrates the setup for hybrid modeling by:

  • Initializing model helpers and runtime structures.

  • Building loss function handles for each site.

  • Splitting sites into training, validation, and testing sets according to the hybrid configuration.

  • Loading covariate features for all sites.

  • Building the machine learning model as specified in the configuration.

  • Preparing arrays for storing losses and loss components during training and evaluation.

  • Initializing the optimizer for ML training.

  • Collecting all relevant metadata and configuration into a single hybrid_helpers NamedTuple for downstream training routines.

Arguments

  • forcing: Forcing data structure as required by the process-based model.

  • observations: Observational data structure.

  • info: The SINDBAD experiment info structure, containing all configuration and runtime options.

  • ::MLTrainingType: Type specifying the ML training method to use (e.g., MixedGradient).

Returns

  • hybrid_helpers: A NamedTuple containing all prepared data, models, loss functions, indices, features, optimizers, and arrays needed for hybrid ML training and evaluation.

Fields of hybrid_helpers

  • run_helpers: Output of prepTEM, containing prepared model, forcing, observation, and output structures.

  • sites: NamedTuple with training, validation, and testing site arrays.

  • indices: NamedTuple with indices for training, validation, and testing sites.

  • features: NamedTuple with n_features and data (covariate features for all sites).

  • ml_model: The machine learning model instance (e.g., a Flux neural network).

  • options: The info.hybrid configuration NamedTuple.

  • checkpoint_path: Path for saving checkpoints during training.

  • parameter_table: Parameter table from info.optimization.

  • loss_functions: KeyedArray of callable loss functions, one per site.

  • loss_component_functions: KeyedArray of callable loss component functions, one per site.

  • training_optimizer: The optimizer object for ML training.

  • loss_array: NamedTuple of arrays to store scalar losses for training, validation, and testing.

  • loss_array_components: NamedTuple of arrays to store loss components for training, validation, and testing.

  • metadata_global: Global metadata from the output configuration.

Notes

  • This function is typically called once at the start of a hybrid modeling experiment to set up all necessary components.

  • The returned hybrid_helpers is designed to be passed directly to training routines such as trainML.

Example

julia
hybrid_helpers = prepHybrid(forcing, observations, info, MixedGradient())
trainML(hybrid_helpers, MixedGradient())
SindbadML.shuffleBatches Method
julia
shuffleBatches(list, bs; seed=1)

Arguments

  • list: an array of samples

  • bs: batch size

  • seed: Int

Returns shuffled partitioned batches.
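
Example

A minimal sketch:

julia
batches = shuffleBatches(collect(1:100), 32; seed=1)   # shuffled index batches of size 32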

SindbadML.shuffleList Method
julia
shuffleList(list; seed=123)

Arguments

  • list: an array of samples

  • seed: Int
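
Example

A minimal sketch:

julia
shuffled = shuffleList(collect(1:10); seed=123)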

SindbadML.siteNameToID Method
julia
siteNameToID(site_name, sites_list)

Returns the index of site_name in sites_list.

Arguments

  • site_name: site name

  • sites_list: list of site names
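
Example

A minimal sketch with placeholder site names:

julia
sites = ["AU-How", "DE-Hai", "FI-Hyy"]
siteNameToID("DE-Hai", sites)   # 2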

SindbadML.toClass Method
julia
toClass(x::Number; vegetation_rules)

Arguments

  • x: a key (Number) from vegetation_rules

  • vegetation_rules: the mapping from numeric keys to vegetation classes

SindbadML.trainML Method
julia
trainML(hybrid_helpers, ::MLTrainingType)

Train a machine learning (ML) or hybrid model in SINDBAD using the specified training method.

This function performs the training loop for the ML model, handling batching, gradient computation, optimizer updates, loss calculation, and checkpointing. It supports hybrid modeling workflows where ML-derived parameters are used in process-based models, and is designed to work with the data structures prepared by prepHybrid.

Arguments

  • hybrid_helpers: NamedTuple containing all prepared data, models, loss functions, indices, features, optimizers, and arrays needed for ML training and evaluation (as returned by prepHybrid).

  • ::MLTrainingType: Type specifying the ML training method to use (e.g., MixedGradient).

Workflow

  • Iterates over epochs and batches of training sites.

  • For each batch:

    • Extracts features and computes model parameters.

    • Computes gradients using the specified gradient method.

    • Checks for NaNs in gradients and replaces them if needed.

    • Updates model parameters using the optimizer.

  • After each epoch:

    • Computes and stores losses and loss components for training, validation, and testing sets.

    • Saves model checkpoints and loss arrays to disk if a checkpoint path is specified.

Notes

  • The function is extensible to support different training strategies via dispatch on MLTrainingType.

  • Designed for use with hybrid modeling, where ML models provide parameters to process-based models.

  • Checkpointing enables resuming or analyzing training progress.

Example

julia
hybrid_helpers = prepHybrid(forcing, observations, info, MixedGradient())
trainML(hybrid_helpers, MixedGradient())
SindbadML.vegKAoneHotbatch Method
julia
vegKAoneHotbatch(pft_data, ka_labels)

Arguments

  • pft_data: Vector array

  • ka_labels: KeyedArray labels, i.e. site names

SindbadML.vegOneHot Method
julia
vegOneHot(v_class; vegetation_labels)

Arguments

  • v_class: get it by doing toClass(x; vegetation_rules).

  • vegetation_labels: see them by typing vegetation_labels.

SindbadML.vegOneHotbatch Method
julia
vegOneHotbatch(veg_classes; vegetation_labels)

Arguments

  • veg_classes: get these from toClass.([x1, x2,...])

  • vegetation_labels: see them by typing vegetation_labels
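
Example

A combined sketch of the vegetation one-hot utilities; it assumes vegetation_rules and vegetation_labels are available in the session, as referenced above:

julia
v_class = toClass(4; vegetation_rules=vegetation_rules)               # map a numeric key to its vegetation class
v_onehot = vegOneHot(v_class; vegetation_labels=vegetation_labels)    # one-hot vector for that class
v_batch = vegOneHotbatch(toClass.([1, 4, 12]; vegetation_rules=vegetation_rules); vegetation_labels=vegetation_labels)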

Internal

SindbadML.batchShuffler Method
julia
batchShuffler(x_forcings, ids_forcings, batch_size; bs_seed=1456)

Shuffles the batches of forcings and their corresponding indices.

SindbadML.getLoss Method
julia
getLoss(models, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, loc_obs, cost_options, constraint_method; optim_mode=true)

Calculates the loss for a given site. At this stage, model parameters should already have been set. The loss is calculated using the metricVector and combineMetric functions: metricVector calculates the loss for each model output, and combineMetric combines the losses into a single value.

Arguments

  • models: list of models

  • loc_forcing: forcing data location

  • loc_spinup_forcing: spinup forcing data location

  • loc_forcing_t: forcing data time for one time step.

  • loc_output: output data location

  • land_init: initial land state

  • tem_info: model information

  • loc_obs: observation data location

  • cost_options: cost options

  • constraint_method: constraint method

When the optional argument optim_mode is true (the default), only the scalar loss value is returned; otherwise, the loss value, the loss vector, and the loss indices are returned.
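
Example

A sketch of the two return modes, assuming the positional arguments above have been prepared (e.g., via the run helpers):

julia
# scalar loss only (default)
t_loss = getLoss(models, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, loc_obs, cost_options, constraint_method)
# scalar loss together with the loss vector and indices
t_loss, loss_vec, loss_idx = getLoss(models, loc_forcing, loc_spinup_forcing, loc_forcing_t, loc_output, land_init, tem_info, loc_obs, cost_options, constraint_method; optim_mode=false)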

SindbadML.getNFolds Method
julia
getNFolds(sites, train_ratio, val_ratio, test_ratio, n_folds, batch_size; seed=1234)

Partition a list of sites into training, validation, and testing sets for k-fold cross-validation in hybrid (ML) modeling.

This function shuffles the input sites array using the provided random seed for reproducibility, then splits the sites into n_folds folds. It computes the number of sites for each partition based on the provided ratios, ensuring the training set size is a multiple of batch_size. The function returns the indices for training, validation, and testing sets, as well as the full list of folds.

Arguments

  • sites: Array of site identifiers (e.g., site names or indices).

  • train_ratio: Fraction of sites to assign to the training set.

  • val_ratio: Fraction of sites to assign to the validation set.

  • test_ratio: Fraction of sites to assign to the testing set.

  • n_folds: Number of folds for cross-validation.

  • batch_size: Batch size for training; training set size will be rounded down to a multiple of this value.

  • seed: (Optional) Random seed for reproducibility (default: 1234).

Returns

  • train_indices: Array of sites assigned to the training set.

  • val_indices: Array of sites assigned to the validation set.

  • test_indices: Array of sites assigned to the testing set.

  • folds: Vector of arrays, each containing the sites for one fold.

Notes

  • The sum of train_ratio, val_ratio, and test_ratio must be approximately 1.0.

  • The returned folds can be used for further cross-validation or analysis.

Example

julia
train_indices, val_indices, test_indices, folds = getNFolds(sites, 0.7, 0.15, 0.15, 5, 32; seed=42)
SindbadML.scaleToBounds Method
julia
scaleToBounds(x, lo_b, up_b)

Scales values in the [0,1] interval to some given lower lo_b and upper up_b bounds.

Arguments

  • x: vector array

  • lo_b: lower bound

  • up_b: upper bound
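
Example

A minimal sketch, assuming a linear rescaling from [0, 1] to [lo_b, up_b]:

julia
scaleToBounds([0.0, 0.5, 1.0], 2.0, 10.0)   # expected: [2.0, 6.0, 10.0]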