Data Designer Configuration

DataDesignerConfig is the main configuration object for building datasets with Data Designer. It declaratively defines the dataset you want to generate, column by column, and includes options for dataset post-processing, validation, and profiling.

Generally, you should use the DataDesignerConfigBuilder to build your configuration, but you can also build it manually by instantiating the DataDesignerConfig class directly.

Classes:

DataDesignerConfig: Configuration for NeMo Data Designer.

DataDesignerConfig

Bases: ExportableConfigBase

Configuration for NeMo Data Designer.

This class defines the main configuration structure for NeMo Data Designer, which orchestrates the generation of synthetic data.

Attributes:

columns (list[Annotated[ColumnConfigT, Field(discriminator='column_type')]]): Required list of column configurations defining how each column should be generated. Must contain at least one column.

model_configs (list[ModelConfig] | None): Optional list of model configurations for LLM-based generation. Each model config defines the model, provider, and inference parameters.

tool_configs (list[ToolConfig] | None): Optional list of tool configurations for MCP tool calling. Each tool config defines the provider, allowed tools, and execution limits.

seed_config (SeedConfig | None): Optional seed dataset settings to use for generation.

constraints (list[ColumnConstraintInputT] | None): Optional list of column constraints.

profilers (list[ColumnProfilerConfigT] | None): Optional list of column profilers for analyzing generated data characteristics.
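
The discriminator='column_type' annotation on columns indicates that each entry's column_type value selects which concrete column config class the entry is parsed into. The following is a minimal standard-library sketch of that dispatch idea only. It is not the library's implementation: the class names SamplerColumn and LLMTextColumn, the tag values, and the helper functions are all hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for real column config classes.
@dataclass
class SamplerColumn:
    name: str
    column_type: str = "sampler"


@dataclass
class LLMTextColumn:
    name: str
    prompt: str = ""
    column_type: str = "llm-text"


# Maps each discriminator value to the class that handles it (assumed tags).
_REGISTRY = {"sampler": SamplerColumn, "llm-text": LLMTextColumn}


def parse_column(raw: dict[str, Any]):
    """Pick the config class from the entry's column_type and build it."""
    try:
        cls = _REGISTRY[raw["column_type"]]
    except KeyError as exc:
        raise ValueError(f"missing or unknown column_type in {raw!r}") from exc
    return cls(**raw)


def parse_columns(raw_columns: list[dict[str, Any]]):
    """Parse every entry, enforcing the 'at least one column' rule."""
    if not raw_columns:
        raise ValueError("columns must contain at least one column")
    return [parse_column(raw) for raw in raw_columns]
```

In the real configuration the validation layer performs this selection, so user code normally only supplies the column entries.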

Methods:

fingerprint: Compute a deterministic content-addressable fingerprint of this config.

fingerprint()

Compute a deterministic content-addressable fingerprint of this config.

See data_designer.config.fingerprint.fingerprint_config for the full list of identity-relevant and excluded fields, and how custom column generators are identified.

Returns:

dict[str, str | int]: A dict with config_hash, config_hash_algo, and config_hash_version.

Source code in packages/data-designer-config/src/data_designer/config/data_designer_config.py
def fingerprint(self) -> dict[str, str | int]:
    """Compute a deterministic content-addressable fingerprint of this config.

    See `data_designer.config.fingerprint.fingerprint_config` for the full
    list of identity-relevant and excluded fields, and how custom column
    generators are identified.

    Returns:
        A dict with `config_hash`, `config_hash_algo`, and
        `config_hash_version`.
    """
    return fingerprint_config(self)
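
To illustrate what "deterministic" and "content-addressable" mean for the returned dict, here is a rough sketch that hashes a canonical JSON encoding of a plain dict. It is an assumption-laden illustration, not how fingerprint_config works: the choice of SHA-256, the version value 1, the sketch_fingerprint name, and the excluded parameter are all hypothetical. Only the three result keys come from the documentation above.

```python
import hashlib
import json
from typing import Any

HASH_ALGO = "sha256"  # assumed algorithm, for illustration only
HASH_VERSION = 1      # assumed version number, for illustration only


def sketch_fingerprint(
    config: dict[str, Any],
    excluded: frozenset[str] = frozenset(),
) -> dict[str, Any]:
    """Hash the identity-relevant part of a config in a stable way."""
    # Drop fields that should not affect identity (hypothetical mechanism).
    identity = {k: v for k, v in config.items() if k not in excluded}
    # Sorted keys and fixed separators make the encoding order-independent.
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {
        "config_hash": digest,
        "config_hash_algo": HASH_ALGO,
        "config_hash_version": HASH_VERSION,
    }
```

Under this sketch, two configs with the same content produce the same config_hash regardless of key order, and any change to an identity-relevant field changes the hash. That property is what makes such a fingerprint usable as a cache key or for detecting configuration drift.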