Designing a SINDBAD Experiment

This guide explains how to design and configure a SINDBAD experiment, using an example setup. The specific settings will differ from one experiment to another.

Overview

A SINDBAD experiment consists of several configuration files that define:

Basic experiment settings
Model structure and parameters
Forcing data configuration
Optimization settings
Execution rules and flags

Configuration Files

1. Experiment Configuration (`experiment.json`)

The main configuration file. It defines the basic experiment settings and lists the other configuration files to load:

json

{
  "basics": {
    "config_files": {
      "forcing": "forcing.json",
      "model_structure": "model_structure.json",
      "optimization": "optimization.json"
    },
    "domain": "FLUXNET",
    "name": "WROASTED",
    "time": {
      "date_begin": "1979-01-01",
      "date_end": "2017-12-31",
      "temporal_resolution": "day"
    }
  },
  "exe_rules": {
    "input_array_type": "keyed_array",
    "input_data_backend": "netcdf",
    "land_output_type": "array",
    "model_array_type": "static_array",
    "model_number_type": "Float32",
    "parallelization": "threads"
  },
  "flags": {
    "calc_cost": true,
    "run_forward": true,
    "run_optimization": true,
    "spinup_TEM": true
  }
}
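
As a quick sanity check of the `basics.time` block, the sketch below rebuilds the daily time axis implied by `date_begin`, `date_end` and `temporal_resolution`, using only the Julia `Dates` standard library. The `time_cfg` dictionary is a hypothetical stand-in for the parsed JSON. SINDBAD's own handling of the time settings is not shown here. The sketch only covers daily resolution.

```julia
using Dates

# Hypothetical mirror of the "basics.time" block in experiment.json
time_cfg = Dict(
    "date_begin" => "1979-01-01",
    "date_end" => "2017-12-31",
    "temporal_resolution" => "day",
)

date_begin = Date(time_cfg["date_begin"])
date_end = Date(time_cfg["date_end"])
@assert date_end >= date_begin "date_end must not precede date_begin"

# This sketch only handles daily resolution
time_cfg["temporal_resolution"] == "day" || error("only daily resolution is handled here")

# Daily time axis covering the whole period, both ends included
time_axis = date_begin:Day(1):date_end
println("Number of daily time steps: ", length(time_axis))  # 14245
```

Checking the expected length up front makes it easier to spot a forcing file that does not cover the configured period.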

2. Model Structure (`model_structure.json`)

Defines the model components and their approaches:

json

{
  "default_model": {
    "implicit_t_repeat": 1,
    "use_in_spinup": true
  },
  "models": {
    "autoRespiration": {
      "approach": "Thornley2000A"
    },
    "cCycle": {
      "approach": "GSI"
    },
    "gpp": {
      "approach": "coupled"
    }
  },
  "pools": {
    "carbon": {
      "combine": "cEco",
      "components": {
        "cVeg": {
          "Root": [1, 25.0],
          "Wood": [1, 25.0],
          "Leaf": [1, 25.0]
        }
      }
    }
  }
}
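
The nested `pools` block can be read as a tree whose leaves are combined into a single state vector, here `cEco`. The sketch below flattens that tree. It makes three assumptions that are not confirmed by this guide:

- each leaf `[1, 25.0]` means `[number_of_layers, initial_value]`;
- pool names are formed by concatenating parent and child names (for example, `cVegRoot`);
- `flatten_pools` is a hypothetical helper, not a SINDBAD function.

```julia
# Hypothetical mirror of the "pools" block in model_structure.json.
# ASSUMPTION: each leaf is [number_of_layers, initial_value].
pools = Dict(
    "carbon" => Dict(
        "combine" => "cEco",
        "components" => Dict(
            "cVeg" => Dict("Root" => [1, 25.0], "Wood" => [1, 25.0], "Leaf" => [1, 25.0]),
        ),
    ),
)

# Hypothetical helper: recursively flatten nested components into name => spec pairs,
# concatenating parent and child names (e.g. "cVeg" * "Root" -> "cVegRoot").
# Keys are sorted because Julia's Dict does not keep the JSON order.
function flatten_pools(prefix::String, node::Dict)
    out = Pair{String,Vector{Float64}}[]
    for key in sort(collect(keys(node)))
        child = node[key]
        if child isa Dict
            append!(out, flatten_pools(prefix * key, child))
        else
            push!(out, (prefix * key) => Float64.(child))
        end
    end
    return out
end

carbon = pools["carbon"]
leaves = flatten_pools("", carbon["components"])

# Expand each pool to its number of layers, filled with its initial value,
# and concatenate everything into the vector named by "combine"
combined = reduce(vcat, [fill(spec[2], Int(spec[1])) for (_, spec) in leaves])
println(carbon["combine"], ": ", [name for (name, _) in leaves], " => ", combined)
```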

3. Forcing Configuration (`forcing.json`)

Defines the input data sources and their properties:

json

{
  "data_dimension": {
    "time": "time",
    "permute": ["time", "longitude", "latitude"],
    "space": ["longitude", "latitude"]
  },
  "default_forcing": {
    "data_path": "../data/BE-Vie.1979.2017.daily.nc",
    "source_product": "FLUXNET"
  },
  "variables": {
    "f_ambient_CO2": {
      "bounds": [200, 500],
      "standard_name": "ambient_CO2",
      "sindbad_unit": "ppm",
      "source_unit": "ppm"
    }
  }
}
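
The `bounds` and unit fields of a forcing variable lend themselves to a simple plausibility check before a run. In the sketch below:

- `var_cfg` is a hypothetical copy of the `f_ambient_CO2` entry;
- the raw values are made up for illustration;
- treating out-of-bounds values as `NaN` is an assumption, since SINDBAD may handle them differently (for example, by clamping).

```julia
# Hypothetical mirror of one entry under "variables" in forcing.json
var_cfg = Dict(
    "bounds" => [200, 500],
    "standard_name" => "ambient_CO2",
    "sindbad_unit" => "ppm",
    "source_unit" => "ppm",
)

# Made-up raw values, as they might be read from the data file
raw = [350.2, 351.0, 612.0, 180.5, 352.4]

lo, hi = var_cfg["bounds"]
# ASSUMPTION: values outside the bounds are marked invalid (NaN)
checked = [lo <= x <= hi ? x : NaN for x in raw]

# Source and SINDBAD units agree here, so no conversion is needed
needs_conversion = var_cfg["source_unit"] != var_cfg["sindbad_unit"]
println(checked, "  needs conversion: ", needs_conversion)
```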

4. Optimization Configuration (`optimization.json`)

Defines optimization parameters and observational constraints:

json

{
  "algorithm_optimization": "opti_algorithms/CMAEvolutionStrategy_CMAES.json",
  "model_parameters_to_optimize": {
    "autoRespiration,RMN": null,
    "gppAirT,opt_airT": null
  },
  "multi_constraint_method": "metric_sum",
  "observational_constraints": [
    "gpp",
    "nee",
    "reco"
  ],
  "observations": {
    "default_cost": {
      "cost_metric": "NSE_inv",
      "cost_weight": 1.0
    }
  }
}
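
Each key in `model_parameters_to_optimize` names a model and one of its parameters, separated by a comma. The sketch below splits these keys. It assumes that a `null` value (Julia `nothing`) means "use the default value and bounds defined in the model". This reading is an assumption, not something stated in this guide.

```julia
# Keys of "model_parameters_to_optimize" as given in optimization.json.
# ASSUMPTION: `nothing` (JSON null) means "use the model's default value and bounds".
params_to_optimize = Dict("autoRespiration,RMN" => nothing, "gppAirT,opt_airT" => nothing)

# Split each "model,parameter" key into its two parts (sorted for a stable order)
targets = sort([Tuple(split(key, ",")) for key in keys(params_to_optimize)])
for (model, param) in targets
    println("optimize parameter ", param, " of model ", model)
end
```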

Key Components

1. Experiment Basics

Domain: Geographic or thematic scope
Time Period: Start and end dates
Temporal Resolution: Time step (e.g., day, hour)
Name: Unique experiment identifier

2. Model Configuration

Model Approaches: The implementation selected for each process (e.g., Thornley2000A for autoRespiration)
Pools: State variables and their components
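
The `default_model` block supplies settings shared by all models. The sketch below shows one way to resolve them, using Julia's `merge`, in which later dictionaries take precedence. That per-model keys override the defaults is an assumption made for illustration. The dictionaries are hypothetical copies of the example configuration.

```julia
# Hypothetical mirror of model_structure.json: shared defaults plus per-model entries
default_model = Dict{String,Any}("implicit_t_repeat" => 1, "use_in_spinup" => true)
models = Dict(
    "autoRespiration" => Dict("approach" => "Thornley2000A"),
    "cCycle" => Dict("approach" => "GSI"),
    "gpp" => Dict("approach" => "coupled"),
)

# ASSUMPTION: per-model keys take precedence over default_model
# (merge keeps the value from the last dictionary that defines a key)
resolved = Dict(name => merge(default_model, cfg) for (name, cfg) in models)
for name in sort(collect(keys(resolved)))
    println(name, ": ", resolved[name])
end
```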

3. Forcing Data

Data Sources: Input data files and variables
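Units and Conversions: Units in the source data (source_unit) and the units SINDBAD expects (sindbad_unit)
Spatial and Temporal Dimensions: Names of the time and space dimensions and the order in which they are arranged (permute)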
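
The `permute` entry in `data_dimension` describes the order in which dimensions are arranged. The sketch below reorders an array from an assumed on-disk order `(longitude, latitude, time)` to the configured `(time, longitude, latitude)` order using `permutedims`. The stored order and the data are made up for illustration. SINDBAD performs this step itself when reading the data.

```julia
# ASSUMPTION: the array is stored on disk as (longitude, latitude, time)
stored_dims = ["longitude", "latitude", "time"]
data = reshape(collect(1.0:24.0), 2, 3, 4)   # 2 lon x 3 lat x 4 time steps (made up)

# Target order, taken from "permute" in forcing.json
permute = ["time", "longitude", "latitude"]

# Position of each target dimension in the stored order
perm = [findfirst(==(d), stored_dims) for d in permute]
permuted = permutedims(data, perm)   # permuted[t, lon, lat] == data[lon, lat, t]
println(perm, " ", size(permuted))
```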

4. Optimization Settings

Algorithm: Optimization method (e.g., CMA-ES)
Parameters: Parameters to be optimized
Constraints: Observational constraints and metrics
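Cost Function: How model performance is scored for each constraint (cost_metric, cost_weight) and combined across constraints (multi_constraint_method)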
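
To make the cost settings concrete, the sketch below computes a Nash-Sutcliffe efficiency (NSE) per constraint and combines the constraints with a weighted sum. It rests on several assumptions:

- `NSE_inv` is taken to be `1 - NSE`, so that a perfect fit scores 0 and the optimizer minimizes it;
- `metric_sum` is taken to mean a sum of `cost_weight`-weighted costs;
- `NaN` observations are skipped;
- the series are made up.

The helper names `nse`, `nse_inv` and `total_cost` are hypothetical. They are not taken from SINDBAD.

```julia
# Nash-Sutcliffe efficiency, skipping time steps where the observation is NaN
function nse(obs::AbstractVector, sim::AbstractVector)
    valid = .!isnan.(obs)
    o, s = obs[valid], sim[valid]
    o_mean = sum(o) / length(o)
    return 1 - sum((o .- s) .^ 2) / sum((o .- o_mean) .^ 2)
end

# ASSUMPTION: "NSE_inv" = 1 - NSE (0 is a perfect fit)
nse_inv(obs, sim) = 1 - nse(obs, sim)

# ASSUMPTION: "metric_sum" = sum of weighted per-constraint costs
total_cost(costs, weights) = sum(weights .* costs)

# Made-up observed and simulated series for two constraints
gpp_obs = [1.0, 2.0, NaN, 4.0]; gpp_sim = [1.1, 1.9, 3.0, 3.8]
nee_obs = [0.5, -0.2, 0.1, 0.0]; nee_sim = [0.4, -0.1, 0.2, 0.1]

costs = [nse_inv(gpp_obs, gpp_sim), nse_inv(nee_obs, nee_sim)]
println("total cost: ", total_cost(costs, [1.0, 1.0]))
```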

5. Execution Rules

Data Types: Array types and number precision
Parallelization: Threading or other parallel execution
Output Format: Data storage format (e.g., NetCDF, Zarr)
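
The sketch below illustrates two of the execution rules in plain Julia: converting data to the configured number precision and looping over sites with `Threads.@threads`. Threads are enabled when starting Julia, for example with `julia --threads 4`. Note the following:

- the `number_types` lookup table is hypothetical;
- the forcing is made up;
- SINDBAD's internal parallelization may be organized differently.

```julia
# Hypothetical lookup from the "model_number_type" string to a Julia type
number_types = Dict("Float32" => Float32, "Float64" => Float64)
T = number_types["Float32"]

# Made-up forcing for 4 sites x 5 time steps, converted to the chosen precision
forcing = T.(reshape(collect(1:20), 4, 5))

# "parallelization": "threads" -> iterations over sites are spread across threads.
# Each iteration writes only its own slot, so no locking is needed.
site_means = zeros(T, size(forcing, 1))
Threads.@threads for site in 1:size(forcing, 1)
    site_means[site] = sum(forcing[site, :]) / size(forcing, 2)
end
println(eltype(site_means), " ", site_means)
```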

Best Practices

Experiment Design

Start with a clear research question
Choose relevant model components and processes
Identify the data and methods needed to answer it

Model Configuration

Select appropriate model approaches
Define necessary pools and components
Choose a reasonable list of parameters to optimize and set sensible ranges for them

Data Management

Ensure data consistency and quality; input forcing should have no gaps
Specify units and conversions based on the units in the source data and the units SINDBAD expects
Handle missing data in constraints appropriately
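
Before running an experiment, it can help to scan the forcing for gaps in the time axis and for missing values. The sketch below does both for a made-up daily series, using only the `Dates` standard library. It is an illustrative check, not a SINDBAD utility.

```julia
using Dates

# Made-up daily series with one missing date (Jan 3) and one NaN value (Jan 2)
dates = [Date(2000, 1, 1), Date(2000, 1, 2), Date(2000, 1, 4), Date(2000, 1, 5)]
vals = [3.1, NaN, 2.8, 3.0]

# Gaps in the time axis: consecutive dates more than one day apart
gap_after = [dates[i] for i in 1:length(dates)-1 if dates[i+1] - dates[i] > Day(1)]
# Dates with missing (NaN) values
nan_dates = dates[isnan.(vals)]
println("gap after: ", gap_after, "; NaN on: ", nan_dates)
```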

Optimization

Choose a suitable optimization algorithm
Define relevant observational constraints
Set appropriate cost metrics and weights

Performance

Use appropriate parallelization
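Choose array types and number precision (model_array_type, model_number_type) with memory use in mind
Consider computational cost when choosing the period, resolution, and optimization settings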

Example Workflow

Setup

julia

using SindbadExperiment
# Path to the main experiment configuration file
experiment_json = "path/to/experiment.json"

Configuration

The main configuration is loaded from the JSON files. Individual settings can be overwritten at run time with replace_info, whose keys are dotted paths to the settings being replaced.

julia

# Each key is a dotted path to a setting; its value replaces the one from the JSON files
replace_info = Dict(
    "experiment.basics.time.date_begin" => "1979-01-01",
    "experiment.basics.time.date_end" => "2017-12-31",
    "experiment.flags.run_optimization" => true
)
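
To illustrate how a dotted key addresses a nested setting, the sketch below applies overrides to a small nested `Dict`. Here `config` and `set_dotted!` are hypothetical stand-ins. This shows the idea behind replace_info, not SINDBAD's implementation, and the override value for `date_begin` is chosen only to make the effect visible.

```julia
# Hypothetical nested configuration standing in for the loaded JSON files
config = Dict{String,Any}(
    "experiment" => Dict{String,Any}(
        "basics" => Dict{String,Any}(
            "time" => Dict{String,Any}("date_begin" => "1979-01-01", "date_end" => "2017-12-31"),
        ),
        "flags" => Dict{String,Any}("run_optimization" => false),
    ),
)

# Hypothetical helper: walk the dotted path and overwrite the last key
function set_dotted!(cfg::Dict, path::AbstractString, value)
    parts = split(path, ".")
    node = cfg
    for k in parts[1:end-1]
        node = node[String(k)]
    end
    node[String(parts[end])] = value
    return cfg
end

overrides = Dict(
    "experiment.basics.time.date_begin" => "2000-01-01",   # illustrative value
    "experiment.flags.run_optimization" => true,
)
for (path, value) in overrides
    set_dotted!(config, path, value)
end
println(config["experiment"]["basics"]["time"], " ", config["experiment"]["flags"])
```

Settings that are not named in the overrides, such as `date_end` here, keep their values from the configuration files.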

Run Experiment

julia

out_opti = runExperimentOpti(experiment_json; replace_info=replace_info)

Analysis

julia

# Access results
forcing = out_opti.forcing
observations = out_opti.observation
output = out_opti.output

Experiment Running Functions

SINDBAD provides several functions for running experiments with different configurations and purposes. To list all available experiment methods and their descriptions, use:

julia

using Sindbad
showMethodsOf(RunFlag)

This will display a formatted list of all experiment methods and their descriptions.