Skip to content
SindbadData Module
julia
SindbadData

The SindbadData package provides tools for handling and processing SINDBAD-related input data and processing. It supports reading, cleaning, masking, and managing data for SINDBAD experiments, with a focus on spatial and temporal dimensions.

Purpose:

This package is designed to streamline the ingestion and preprocessing of input data for SINDBAD experiments.

Dependencies:

  • Sindbad: Provides the core SINDBAD models and types.

  • SindbadUtils: Provides utility functions for handling NamedTuple, spatial operations, and other helper tasks for spatial and temporal operations.

  • AxisKeys: Enables labeled multidimensional arrays (KeyedArray) for managing data with explicit axis labels.

  • FillArrays: Provides efficient representations of arrays filled with a single value, useful for initializing data structures.

  • DimensionalData: Facilitates working with multidimensional data, particularly for indexing and slicing along spatial and temporal dimensions.

  • NCDatasets: Provides tools for reading and writing NetCDF files, a common format for scientific data.

  • NetCDF: Re-exported for convenience, enabling users to work with NetCDF files directly.

  • YAXArrays: Supports handling of multidimensional arrays, particularly for managing spatial and temporal data in SINDBAD experiments.

  • Zarr: Re-exported for handling hierarchical, chunked, and compressed data arrays, useful for large datasets.

  • YAXArrayBase: Provides base functionality for working with YAXArrays.

Included Files:

  1. utilsData.jl:
  • Contains utility functions for data preprocessing, including cleaning, masking, and checking bounds.
  1. spatialSubset.jl:
  • Implements spatial operations, such as extracting subsets of data based on spatial dimensions.
  1. getForcing.jl:
  • Provides functions for extracting and processing forcing data, such as environmental drivers, for SINDBAD experiments.
  1. getObservation.jl:
  • Implements utilities for reading and processing observational data, enabling model validation and performance evaluation.

Notes:

  • The package re-exports key packages (NetCDF, YAXArrays, Zarr) for convenience, allowing users to access their functionality directly through SindbadData.

  • Designed to handle large datasets efficiently, leveraging chunked and compressed data formats like NetCDF and Zarr.

  • Ensures compatibility with SINDBAD's experimental framework by integrating spatial and temporal data management tools.

Exported

SindbadData.AllNaN Type
julia
AllNaN <: YAXArrays.DAT.ProcFilter

Specialized filter for YAXArrays to skip pixels with all NaN or missing values.

Description

This struct is used as a specialized filter in data processing pipelines to identify or handle cases where all values in a data segment are NaN (Not a Number).

SindbadData.getForcing Method
julia
getForcing(info::NamedTuple)

Reads forcing data from the data_path specified in the experiment configuration and returns a NamedTuple with the forcing data.

Arguments:

  • info: A SINDBAD NamedTuple containing all information needed for setup and execution of an experiment.

Returns:

  • A NamedTuple forcing containing:
    • data: The processed input cubes.

    • dims: The dimensions of the forcing data.

    • variables: The names of the forcing variables.

    • f_types: The types of the forcing data (e.g., ForcingWithTime or ForcingWithoutTime).

    • helpers: Helper information for the forcing data.

Notes:

  • Reads forcing data from the specified data path and processes it using the SINDBAD framework.

  • Handles spatiotemporal and spatial-only forcing data.

  • Applies masks and subsets to the forcing data if specified in the configuration.

SindbadData.getNumberOfTimeSteps Method
julia
getNumberOfTimeSteps(incubes, time_name)

Returns the number of time steps in the input data cubes.

Arguments

  • incubes: Input data cubes containing temporal information

  • time_name: Name of the time dimension/variable

Returns

Integer representing the total number of time steps in the data

SindbadData.getObservation Method
julia
getObservation(info::NamedTuple, forcing_helpers::NamedTuple)

Processes observation data and returns a NamedTuple containing the observation data, dimensions, and variables.

Arguments:

  • info: A SINDBAD NamedTuple containing all information needed for setup and execution of an experiment.

  • forcing_helpers: A SINDBAD NamedTuple containing helper information for forcing data.

Returns:

  • A NamedTuple with the following fields:
    • data: The processed observation data as an input array.

    • dims: The dimensions of the observation data.

    • variables: A tuple of variable names for the observation data.

Notes:

  • Reads observation data from the path specified in the experiment configuration.

  • Handles quality flags, uncertainty, spatial weights, and selection masks for each observation variable.

  • Subsets and harmonizes the observation data based on the target dimensions and masks.

SindbadData.getSpatialSubset Method
julia
getSpatialSubset(ss, v)

Extracts a spatial subset of data based on specified spatial subsetting type/strategy.

Arguments

  • ss: Spatial subset parameters or geometry defining the region of interest

  • v: Data to be spatially subset

Returns

Spatially subset data according to the specified parameters

Note

The function assumes input data and spatial parameters are in compatible formats

SindbadData.mapCleanData Method

Maps and cleans data based on quality control parameters and fills missing values.

Arguments

  • _data: Raw input data to be cleaned

  • _data_qc: Quality control data corresponding to input data

  • _data_fill: Fill values for replacing invalid/missing data

  • bounds_qc: Quality control bounds/thresholds

  • _data_info: Additional information about the data

  • ::Val{T}: Value type parameter for dispatch

Returns

Cleaned and mapped data with invalid values replaced according to QC criteria

Note

This function performs quality control checks and data cleaning based on the provided bounds and fill values. The exact behavior depends on the value type T.

SindbadData.subsetAndProcessYax Method
julia
subsetAndProcessYax(yax, forcing_mask, tar_dims, _data_info, info, ::Val{num_type}; clean_data=true, fill_nan=false, yax_qc=nothing, bounds_qc=nothing) where {num_type}

Subset and process YAX data according to specified parameters and quality control criteria.

Arguments

  • yax: YAX data to be processed

  • forcing_mask: Mask to apply to the data

  • tar_dims: Target dimensions

  • _data_info: Data information

  • info: a SINDBAD NT that includes all information needed for setup and execution of an experiment

  • ::Val{num_type}: Value type parameter for numerical type specification

  • clean_data=true: Boolean flag to enable/disable data cleaning

  • fill_nan=false: Boolean flag to control NaN filling

  • yax_qc=nothing: Optional quality control parameters for YAX data

  • bounds_qc=nothing: Optional boundary quality control parameters

Returns

Processed and subset YAX data according to specified parameters and quality controls.

Type Parameters

  • num_type: Numerical type specification for the processed data
SindbadData.toDimStackArray Method

Convert a stacked array into a DimensionalArray with specified dimensions and metadata.

Arguments

  • stackArr: The input stacked array to be converted

  • time_interval: Time interval information for temporal dimension

  • p_names: Names of pools/variables

  • name: Optional keyword argument to specify the name of the dimension (default: :pools)

Returns

A DimensionalArray with proper dimensions and labels.

This function is useful for converting raw stacked arrays into properly dimensioned arrays with metadata, particularly for time series data with multiple pools or variables.

SindbadData.yaxCubeToKeyedArray Method
julia
yaxCubeToKeyedArray(c)

Convert a YAXArray cube to a KeyedArray.

Arguments

  • c: YAXArray input cube to be converted

Returns

KeyedArray representation of the input YAXArray cube

Description

Transforms a YAXArray data cube into a KeyedArray format, preserving the dimensional structure and associated metadata of the original cube.

Internal

SindbadData.applyQCBound Method
julia
applyQCBound(_data, data_qc, bounds_qc, _data_fill)

Apply quality control bounds to data values.

Arguments

  • _data: Input data array to be quality controlled

  • data_qc: Quality control flags associated with the data

  • bounds_qc: Bounds/thresholds for quality control checks

  • _data_fill: Fill value to use for data points that fail QC

Returns

The quality controlled data array with values outside bounds replaced by fill value

SindbadData.applyUnitConversion Function
julia
applyUnitConversion(_data, conversion, isadditive=false)

Applies a simple factor to the input, either additively or multiplicatively depending on isadditive flag

Arguments

  • _data: Input data to be converted

  • conversion: Conversion factor or function to be applied

  • isadditive: Boolean flag indicating whether the conversion is additive (default: false) or multiplicative

Returns

Converted data with the applied unit transformation

SindbadData.cleanData Method
julia
cleanData(_data, _data_fill, _data_info, ::Val{T}) where {T}

Applies a series of cleaning steps to the data, including replacing invalid data, applying unit conversion, and clamping to bounds.

Arguments

  • _data: The raw data to be cleaned

  • _data_fill: Fill values or parameters for handling missing/invalid data

  • _data_info: Information about the data structure and cleaning requirements

  • ::Val{T}: Value type parameter for dispatch

Returns

Cleaned data according to the specified type parameter T

SindbadData.collectForcingHelpers Method
julia
collectForcingHelpers(info, f_sizes, f_dimensions)

Generates a NamedTuple of helper information for forcing data.

Arguments:

  • info: A SINDBAD NamedTuple containing all information needed for setup and execution of an experiment.

  • f_sizes: A NamedTuple containing the sizes of forcing dimensions.

  • f_dimensions: A NamedTuple containing the dimensions of the forcing data.

Returns:

  • A NamedTuple f_helpers containing helper information for forcing data.

Notes:

  • Includes dimensions, axes, subset information, and sizes for the forcing data.
SindbadData.collectForcingSizes Method
julia
collectForcingSizes(info, in_yax)

Collects the sizes of forcing dimensions from the input YAXArray.

Arguments:

  • info: A SINDBAD NamedTuple containing all information needed for setup and execution of an experiment.

  • in_yax: The input YAXArray containing forcing data.

Returns:

  • A NamedTuple f_sizes where each dimension name is paired with its size.

Notes:

  • The function retrieves the size of the time dimension and spatial dimensions specified in the experiment configuration.

  • If the dimension is not directly accessible, it uses DimensionalData.lookup to retrieve the size.

SindbadData.createForcingNamedTuple Method
julia
createForcingNamedTuple(incubes, f_sizes, f_dimensions, info)

Creates a NamedTuple containing forcing data and metadata.

Arguments:

  • incubes: A collection of input cubes (YAXArray) containing forcing data.

  • f_sizes: A NamedTuple containing the sizes of forcing dimensions.

  • f_dimensions: A NamedTuple containing the dimensions of the forcing data.

  • info: A SINDBAD NamedTuple containing all information needed for setup and execution of an experiment.

Returns:

  • A NamedTuple forcing containing:
    • data: The processed input cubes.

    • dims: The dimensions of the forcing data.

    • variables: The names of the forcing variables.

    • f_types: The types of the forcing data (e.g., ForcingWithTime or ForcingWithoutTime).

    • helpers: Helper information for the forcing data.

Notes:

  • Processes the input cubes to determine their types and dimensions.

  • Helper information is generated using collectForcingHelpers.

SindbadData.getAllConstraintData Method
julia
getAllConstraintData(nc, data_backend, data_path, default_info, v_info, data_sub_field, info; yax=nothing, use_data_sub=true)

Reads data from the observation file and returns the data, YAXArray, variable info, and bounds for the observation constraint.

Arguments:

  • nc: The file or NetCDF object containing the observation data.

  • data_backend: The backend used to process the data (e.g., NetCDF, Zarr).

  • data_path: The path to the observation data file.

  • default_info: Default variable information for constraints.

  • v_info: Variable-specific information for the observation constraint, which can overwrite default_info.

  • data_sub_field: The subfield of the observation data to process (e.g., :data, :qflag, :unc).

  • info: A SINDBAD NamedTuple containing all information needed for setup and execution of an experiment.

  • yax: (Optional) The base observation YAXArray.

  • use_data_sub: A flag indicating whether to use the subfield of the observation constraint.

Returns:

  • nc_sub: The NetCDF object for the subfield.

  • yax_sub: The YAXArray for the subfield.

  • v_info_sub: The variable information for the subfield.

  • bounds_sub: The bounds for the subfield.

Notes:

  • If the subfield is not provided or use_data_sub is false, default values are used.

  • Handles quality flags, uncertainty, spatial weights, and selection masks for observation constraints.

SindbadData.getDataDims Method
julia
getDataDims(c, mappinginfo)

Retrieves the dimensions of data based on provided mapping information.

Arguments

  • c: The container or data structure to get dimensions from

  • mappinginfo: Information about how the data is mapped

Returns

The dimensions of the data specified by the mapping information.

SindbadData.getDimPermutation Method
julia
getDimPermutation(datDims, permDims)

Returns the permutation indices required to rearrange dimensions from datDims to match permDims.

Arguments

  • datDims: Array of current dimension names or indices

  • permDims: Array of target dimension names or indices in desired order

Returns

  • Array of indices representing the required permutation
SindbadData.getInputArrayOfType Function
julia
getInputArrayOfType(input_data, <: SindbadInputDataType)

Converts the provided input data into a specific input array type.

Arguments

  • input_data: The data to be converted into an input array

  • <: SindbadInputDataType: The specific input array type to convert the data into

    • ::InputArray: Specifies the input array type as a simple array

    • ::InputKeyedArray: Specifies the input array type as a keyed array

    • ::InputNamedDimsArray: Specifies the input array type as a named dims array

    • ::InputYaxArray: Specifies the input array type as a YAX array

Returns

Returns the input data converted to the specified input array type.

SindbadData.getSindbadDims Method
julia
getSindbadDims(c)

prepare the dimensions of data and name them appropriately for use in internal SINDBAD functions

Arguments

  • c: input data cube

Returns

Dimensions for use in SINDBAD

SindbadData.getTargetDimensionOrder Method
julia
getTargetDimensionOrder(info)

Retrieves the target dimension order to organize the forcing data from the provided information.

Arguments

  • info: Input information containing dimension order details.

Returns

The ordered sequence of dimensions for the target.

SindbadData.getYaxFromSource Function
julia
getYaxFromSource(nc, data_path, data_path_v, source_variable, info, <: DataFormatBackend)

Retrieve the data from a specified source.

Arguments

  • nc: The NetCDF file or object to read data from.

  • data_path: The path to the data within the NetCDF file.

  • data_path_v: The path to the variable within the NetCDF file.

  • source_variable: The name of the source variable to extract data for.

  • info: Additional information or metadata required for processing.

  • <: DataFormatBackend: Specifies the SINDBAD backend being used.

    • ::BackendNetcdf: Specifies that the function operates on a NetCDF backend.

    • ::BackendZarr: Specifies that the backend being used is Zarr.

Returns

  • The file object and extracted YAX data from the specified source.

Notes

  • Ensure that the nc object and paths provided are valid and accessible.

  • The functions are specific to the NetCDF and Zarr backend and may not work with other backends.

SindbadData.loadDataFile Method
julia
loadDataFile(data_path::String) -> Any

Load data from the specified file path.

Arguments

  • data_path::String: The path to the data file to be loaded.

Returns

  • The data loaded from the specified file. The return type depends on the file format and its contents.

Notes

  • Ensure that the file exists and is accessible at the given path.

  • The function assumes the file format is supported by the implementation.

SindbadData.loadDataFromPath Method
julia
loadDataFromPath(nc, data_path, data_path_v, source_variable)

Load data from specified NetCDF paths using given parameters.

Arguments

  • nc: NetCDF file handle

  • data_path: Path to the main data in NetCDF file

  • data_path_v: Path to the variable data in NetCDF file

  • source_variable: Name of the source variable to load

Returns

Data loaded from the specified paths in the NetCDF file.

SindbadData.spatialSubset Function
julia
spatialSubset(v, ss_range, <: SpatialSubsetter)

Extracts a spatial subset of the input data v based on the specified range and spatial dimension.

Arguments:

  • v: The input data from which a spatial subset is to be extracted.

  • ss_range: The range of indices or values to subset along the specified spatial dimension.

Returns:

  • A subset of the input data v corresponding to the specified spatial range and dimension.

SpatialSubsetter

Abstract type for spatial subsetting methods in SINDBAD

Available methods/subtypes:

  • SpaceID: Use site ID (all caps) for spatial subsetting

  • SpaceId: Use site ID (capitalized) for spatial subsetting

  • Spaceid: Use site ID for spatial subsetting

  • Spacelat: Use latitude for spatial subsetting

  • Spacelatitude: Use full latitude for spatial subsetting

  • Spacelon: Use longitude for spatial subsetting

  • Spacelongitude: Use full longitude for spatial subsetting

  • Spacesite: Use site location for spatial subsetting


Extended help

Notes:

  • The function dynamically selects the appropriate field in v based on the spatial type provided.

  • The spatial type determines the field name (e.g., site, lat, longitude, id, etc.) used for subsetting.

Examples:

  1. Subsetting by latitude:
julia
subset = spatialSubset(data, 10:20, Spacelat())
  1. Subsetting by longitude:
julia
subset = spatialSubset(data, 30:40, Spacelongitude())
  1. Subsetting by site ID:
julia
subset = spatialSubset(data, 1:5, Spaceid())