adc_toolkit.data.default_attributes

Factory functions for creating default catalog and validator instances.

This module provides factory functions that return sensible default implementations of the data catalog and data validator abstractions used throughout the adc-toolkit. These defaults follow a priority-based selection strategy and use lazy imports to avoid requiring all optional dependencies.

Default Selection Logic

Data Catalog: Always KedroDataCatalog, which provides YAML-based configuration and supports multiple data formats (CSV, Parquet, JSON, etc.). The kedro package must be installed; otherwise default_catalog() raises ImportError.

Data Validator: Follows a priority hierarchy:

1. GXValidator (Great Expectations) - preferred default if installed
2. PanderaValidator (Pandera) - fallback if GX is not installed
3. ImportError - raised if neither validation package is available
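The priority order can be sketched as a plain function. This is a minimal, hypothetical stand-in, not the toolkit's code: `pick_validator_backend` returns a backend name instead of a validator instance, and its `installed` predicate parameter is an assumption added so the fallback chain can be exercised without installing Great Expectations or Pandera.

```python
import warnings
from importlib.util import find_spec
from typing import Callable


def pick_validator_backend(
    installed: Callable[[str], bool] = lambda name: find_spec(name) is not None,
) -> str:
    """Return the backend the priority rules would select (hypothetical helper)."""
    if installed("great_expectations"):
        return "gx"  # 1. preferred default
    if installed("pandera"):
        # 2. fallback: warn that the preferred default is not being used
        warnings.warn(
            "Great Expectations is not installed. Using Pandera instead.",
            stacklevel=2,
        )
        return "pandera"
    # 3. neither backend is importable
    raise ImportError("Install either great_expectations or pandera.")


# Simulate an environment where only Pandera is installed.
print(pick_validator_backend(lambda name: name == "pandera"))  # pandera (after a UserWarning)
```

Injecting the predicate keeps the selection logic separate from package detection, which is what makes each branch testable in isolation.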

The lazy import mechanism ensures that only the implementation actually selected is imported, so users need to install only the optional dependencies they use.
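The availability check relies on `importlib.util.find_spec`, which locates a top-level package without executing it. In the sketch below the `is_installed` helper is illustrative (it is not part of the toolkit); it shows that probing a module leaves `sys.modules` unchanged, so an optional dependency costs nothing until it is actually imported.

```python
import sys
from importlib.util import find_spec


def is_installed(package: str) -> bool:
    """Return True if a top-level package is importable, without importing it."""
    # find_spec only consults the import finders; for a top-level name it does
    # not execute the module's code.
    return find_spec(package) is not None


already_loaded = "this" in sys.modules  # stdlib module that prints the Zen of Python on import
print(is_installed("this"))                 # True
print(is_installed("no_such_package_xyz"))  # False
print(("this" in sys.modules) == already_loaded)  # True: probing did not import it
```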

Functions

default_catalog(config_path): Return the default data catalog implementation (KedroDataCatalog).
default_validator(config_path): Return the default data validator implementation (GX or Pandera).

See Also

adc_toolkit.data.abs.DataCatalog: Abstract base class for data catalogs.
adc_toolkit.data.abs.DataValidator: Abstract base class for data validators.
adc_toolkit.data.ValidatedDataCatalog: Main validated data catalog abstraction.

Notes

These factory functions are primarily used by ValidatedDataCatalog.in_directory() to automatically construct a validated catalog with sensible defaults when the user doesn't explicitly specify catalog or validator implementations.
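A rough sketch of how a classmethod like `in_directory()` can build both halves from one configuration directory through factory functions. Everything here is a hypothetical stand-in: the real `ValidatedDataCatalog.in_directory()` signature is not documented on this page, and the `catalog_factory`/`validator_factory` parameters are assumptions. The stub factories take the place of `default_catalog()` and `default_validator()` so the sketch runs without Kedro, GX, or Pandera.

```python
from __future__ import annotations

from pathlib import Path
from typing import Any, Callable


class StubCatalog:
    """Hypothetical stand-in for a DataCatalog built from a config directory."""

    def __init__(self, config_path: Path) -> None:
        self.config_path = config_path

    def load(self, name: str) -> Any:
        return [1, 2, 3]  # every dataset resolves to the same toy list


class StubValidator:
    """Hypothetical stand-in for a DataValidator that accepts everything."""

    def validate(self, name: str, data: Any) -> Any:
        return data


def stub_default_catalog(config_path: str | Path) -> StubCatalog:
    return StubCatalog(Path(config_path))


def stub_default_validator(config_path: str | Path) -> StubValidator:
    return StubValidator()


class ValidatedCatalogSketch:
    """Hypothetical composition of a catalog and a validator."""

    def __init__(self, catalog: Any, validator: Any) -> None:
        self.catalog = catalog
        self.validator = validator

    @classmethod
    def in_directory(
        cls,
        path: str | Path,
        catalog_factory: Callable[..., Any] = stub_default_catalog,
        validator_factory: Callable[..., Any] = stub_default_validator,
    ) -> ValidatedCatalogSketch:
        # Both parts are built from the same configuration directory.
        return cls(catalog_factory(path), validator_factory(path))

    def load(self, name: str) -> Any:
        # Load through the catalog, then hand the result to the validator.
        return self.validator.validate(name, self.catalog.load(name))


validated = ValidatedCatalogSketch.in_directory("config/")
print(validated.load("my_dataset"))  # [1, 2, 3]
```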

Users can always bypass these defaults by directly instantiating specific implementations (e.g., KedroDataCatalog, GXValidator, PanderaValidator, or NoValidator).

Examples

The default factories are typically used indirectly through ValidatedDataCatalog:

>>> from adc_toolkit.data import ValidatedDataCatalog
>>> catalog = ValidatedDataCatalog.in_directory("config/")
>>> # This uses default_catalog() and default_validator() internally

Direct usage of the factory functions:

>>> from adc_toolkit.data.default_attributes import default_catalog, default_validator
>>> catalog = default_catalog("config/")
>>> validator = default_validator("config/")
>>> # Manually construct ValidatedDataCatalog with defaults
>>> from adc_toolkit.data import ValidatedDataCatalog
>>> validated_catalog = ValidatedDataCatalog(catalog, validator)
"""
Factory functions for creating default catalog and validator instances.

This module provides factory functions that return sensible default implementations
of the data catalog and data validator abstractions used throughout the adc-toolkit.
These defaults follow a priority-based selection strategy and use lazy imports to
avoid requiring all optional dependencies.

Default Selection Logic
-----------------------
**Data Catalog**: Always ``KedroDataCatalog``, which provides YAML-based
configuration and supports multiple data formats (CSV, Parquet, JSON, etc.).
The kedro package must be installed; otherwise ``default_catalog`` raises
``ImportError``.

**Data Validator**: Follows a priority hierarchy:
    1. ``GXValidator`` (Great Expectations) - preferred default if installed
    2. ``PanderaValidator`` (Pandera) - fallback if GX is not installed
    3. ``ImportError`` - raised if neither validation package is available

The lazy import mechanism ensures that only the implementation actually
selected is imported, so users need to install only the optional dependencies
they use.

Functions
---------
default_catalog(config_path)
    Return the default data catalog implementation (KedroDataCatalog).
default_validator(config_path)
    Return the default data validator implementation (GX or Pandera).

See Also
--------
adc_toolkit.data.abs.DataCatalog : Abstract base class for data catalogs.
adc_toolkit.data.abs.DataValidator : Abstract base class for data validators.
adc_toolkit.data.ValidatedDataCatalog : Main validated data catalog abstraction.

Notes
-----
These factory functions are primarily used by ``ValidatedDataCatalog.in_directory()``
to automatically construct a validated catalog with sensible defaults when the user
doesn't explicitly specify catalog or validator implementations.

Users can always bypass these defaults by directly instantiating specific
implementations (e.g., ``KedroDataCatalog``, ``GXValidator``, ``PanderaValidator``,
or ``NoValidator``).

Examples
--------
The default factories are typically used indirectly through ValidatedDataCatalog:

>>> from adc_toolkit.data import ValidatedDataCatalog
>>> catalog = ValidatedDataCatalog.in_directory("config/")
>>> # This uses default_catalog() and default_validator() internally

Direct usage of the factory functions:

>>> from adc_toolkit.data.default_attributes import default_catalog, default_validator
>>> catalog = default_catalog("config/")
>>> validator = default_validator("config/")
>>> # Manually construct ValidatedDataCatalog with defaults
>>> from adc_toolkit.data import ValidatedDataCatalog
>>> validated_catalog = ValidatedDataCatalog(catalog, validator)
"""

import warnings
from importlib.util import find_spec
from pathlib import Path

from adc_toolkit.data.abs import DataCatalog, DataValidator


def default_catalog(config_path: str | Path) -> DataCatalog:
    """
    Return the default data catalog implementation initialized from configuration.

    This factory function provides the default ``DataCatalog`` implementation for
    the adc-toolkit. It uses a lazy import mechanism to check for the kedro package
    and returns a ``KedroDataCatalog`` instance if available.

    The function performs runtime package detection using ``importlib.util.find_spec``
    to avoid hard dependencies on kedro. This allows users to install only the
    catalog implementations they need.

    Parameters
    ----------
    config_path : str or pathlib.Path
        Path to the configuration directory containing the data catalog YAML file.
        For ``KedroDataCatalog``, this directory should contain a ``catalog.yaml``
        file that defines dataset configurations in Kedro format. The path can be
        either absolute or relative to the current working directory.

    Returns
    -------
    DataCatalog
        An instance of ``KedroDataCatalog`` initialized with the configuration
        found in the specified directory. The returned object implements the
        ``DataCatalog`` abstract interface, providing ``load()`` and ``save()``
        methods for data I/O operations.

    Raises
    ------
    ImportError
        If the kedro package is not installed. The error message provides
        installation instructions using the uv package manager (formerly poetry).
        Users can install kedro by running ``uv sync --group kedro`` or implement
        their own custom ``DataCatalog`` subclass.

    See Also
    --------
    adc_toolkit.data.catalogs.kedro.KedroDataCatalog : The Kedro-based catalog implementation.
    adc_toolkit.data.abs.DataCatalog : Abstract base class for data catalogs.
    adc_toolkit.data.ValidatedDataCatalog : Validated catalog using this default.

    Notes
    -----
    **Lazy Import Mechanism**: The function uses ``importlib.util.find_spec`` to
    check for kedro's availability before importing. This allows the module to be
    imported even when kedro is not installed, with the ImportError only raised
    when the function is actually called.

    **Alternative Implementations**: Users who don't want to use Kedro can:
        1. Implement a custom ``DataCatalog`` subclass
        2. Directly instantiate their catalog and pass it to ``ValidatedDataCatalog``

    **Configuration Format**: The KedroDataCatalog expects a ``catalog.yaml`` file
    in the specified directory. See the Kedro documentation for the full
    specification of the catalog configuration format.

    Examples
    --------
    Basic usage to get a default catalog:

    >>> from adc_toolkit.data.default_attributes import default_catalog
    >>> catalog = default_catalog("path/to/config")
    >>> # catalog is now a KedroDataCatalog instance
    >>> df = catalog.load("my_dataset")

    Using with ValidatedDataCatalog (typical usage pattern):

    >>> from adc_toolkit.data import ValidatedDataCatalog
    >>> validated_cat = ValidatedDataCatalog.in_directory("config/")
    >>> # This internally calls default_catalog("config/")

    Handling the ImportError when kedro is not installed:

    >>> try:
    ...     catalog = default_catalog("config/")
    ... except ImportError:
    ...     print("Kedro not installed, using custom catalog")
    ...     catalog = MyCustomCatalog("config/")  # your own DataCatalog subclass

    Working with Path objects:

    >>> from pathlib import Path
    >>> config_dir = Path(__file__).parent / "config"
    >>> catalog = default_catalog(config_dir)
    """
    is_kedro_installed = find_spec("kedro") is not None
    if not is_kedro_installed:
        raise ImportError(
            "Default data catalog is KedroDataCatalog. "
            "You must install kedro to use KedroDataCatalog. "
            "Run `uv sync --group kedro` to do so. "
            "Alternatively, you can implement your own data catalog."
        )

    from adc_toolkit.data.catalogs.kedro import KedroDataCatalog

    return KedroDataCatalog(config_path)


def default_validator(config_path: str | Path) -> DataValidator:
    """
    Return the default data validator implementation with priority-based selection.

    This factory function provides the default ``DataValidator`` implementation for
    the adc-toolkit by attempting to load validation libraries in priority order.
    It implements a fallback chain: Great Expectations (preferred) → Pandera
    (fallback) → ImportError (if neither is available).

    The function uses lazy imports and runtime package detection to check for
    available validation libraries, allowing users to install only the validator
    they prefer. When Great Expectations is not available but Pandera is, a
    warning is issued to inform users they are using the fallback implementation.

    Priority Selection Logic
    ------------------------
    1. **GXValidator (Great Expectations)**: Preferred default. Provides comprehensive
       data validation with extensive built-in expectations, profiling capabilities,
       and data documentation features.

    2. **PanderaValidator (Pandera)**: Fallback option. Provides DataFrame schema
       validation with a more lightweight, Pythonic API. Used automatically when
       Great Expectations is not installed.

    3. **ImportError**: Raised when neither validation library is available, with
       detailed installation instructions.

    Parameters
    ----------
    config_path : str or pathlib.Path
        Path to the configuration directory containing validator configuration files.
        The expected file structure depends on the validator:

        - **GXValidator**: Expects a Great Expectations project structure with
          ``great_expectations.yml`` or expectations suite configurations.
        - **PanderaValidator**: Expects Pandera schema definition files (Python
          modules or YAML files depending on configuration).

        The path can be either absolute or relative to the current working directory.

    Returns
    -------
    DataValidator
        An instance of either ``GXValidator`` or ``PanderaValidator`` (in priority
        order), initialized with the configuration found in the specified directory.
        The returned object implements the ``DataValidator`` abstract interface,
        providing a ``validate()`` method for data quality checks.

    Raises
    ------
    ImportError
        Raised when neither the great_expectations nor pandera packages are
        installed. The error message provides detailed installation instructions
        for both options using the uv package manager, and also mentions the
        alternative of implementing a custom validator or using ``NoValidator``
        (though the latter is not recommended for production use).

    Warns
    -----
    UserWarning
        Issued when Great Expectations is not installed but Pandera is available.
        This warning informs users that they are using the fallback validator
        implementation rather than the preferred default. The warning includes
        stacklevel=2 to show the calling code location rather than the factory
        function itself.

    See Also
    --------
    adc_toolkit.data.validators.gx.GXValidator : Great Expectations validator implementation.
    adc_toolkit.data.validators.pandera.PanderaValidator : Pandera validator implementation.
    adc_toolkit.data.validators.no_validator.NoValidator : No-op validator (not recommended).
    adc_toolkit.data.abs.DataValidator : Abstract base class for data validators.
    adc_toolkit.data.ValidatedDataCatalog : Validated catalog using this default.

    Notes
    -----
    **Lazy Import Mechanism**: The function uses ``importlib.util.find_spec`` to
    check for package availability before importing. This allows the module to be
    imported even when validation libraries are not installed, with the ImportError
    only raised when the function is actually called.

    **Installation Options**: Users should install the validation library that best
    fits their needs:

    - For comprehensive validation and data documentation: ``uv sync --group gx``
    - For lightweight DataFrame validation: ``uv sync --group pandera``
    - For both (if needed): ``uv sync --group gx --group pandera``

    **Alternative Implementations**: Users who don't want to use the defaults can:

    1. Implement a custom ``DataValidator`` subclass
    2. Use the ``NoValidator`` class (bypasses all validation, not recommended)
    3. Directly instantiate a specific validator and pass it to ``ValidatedDataCatalog``

    **Warning Behavior**: The fallback warning uses ``stacklevel=2`` to ensure the
    warning appears to originate from the user's code that called this function,
    not from within the factory function itself. This makes it easier for users
    to identify where the fallback is being triggered.

    Examples
    --------
    Basic usage to get a default validator:

    >>> from adc_toolkit.data.default_attributes import default_validator
    >>> validator = default_validator("path/to/config")
    >>> # validator is either GXValidator or PanderaValidator
    >>> validated_df = validator.validate("my_dataset", df)  # df: the data to check

    Using with ValidatedDataCatalog (typical usage pattern):

    >>> from adc_toolkit.data import ValidatedDataCatalog
    >>> validated_cat = ValidatedDataCatalog.in_directory("config/")
    >>> # This internally calls default_validator("config/")
    >>> df = validated_cat.load("my_dataset")  # Validates after loading

    Handling the fallback warning:

    >>> import warnings
    >>> warnings.filterwarnings("ignore", message=".*PanderaValidator.*")
    >>> validator = default_validator("config/")
    >>> # Warning is suppressed if only Pandera is installed

    Explicitly choosing a validator to avoid the default behavior:

    >>> from adc_toolkit.data.validators.pandera import PanderaValidator
    >>> from adc_toolkit.data.validators.gx import GXValidator
    >>> # Choose GXValidator explicitly
    >>> validator = GXValidator.in_directory("config/")
    >>> # Or choose PanderaValidator explicitly
    >>> validator = PanderaValidator.in_directory("config/")

    Handling the ImportError when no validators are installed:

    >>> from adc_toolkit.data.validators.no_validator import NoValidator
    >>> try:
    ...     validator = default_validator("config/")
    ... except ImportError:
    ...     print("No validators installed, using NoValidator")
    ...     validator = NoValidator()

    Working with Path objects:

    >>> from pathlib import Path
    >>> config_dir = Path(__file__).parent / "config"
    >>> validator = default_validator(config_dir)
    """
    is_great_expectations_installed = find_spec("great_expectations") is not None
    is_pandera_installed = find_spec("pandera") is not None

    if is_great_expectations_installed:
        from adc_toolkit.data.validators.gx import GXValidator

        return GXValidator.in_directory(config_path)
    elif is_pandera_installed:
        warnings.warn(
            "Default data validator is GXValidator. "
            "Great Expectations is not installed. "
            "Using PanderaValidator instead.",
            stacklevel=2,
        )
        from adc_toolkit.data.validators.pandera import PanderaValidator

        return PanderaValidator.in_directory(config_path)
    else:
        raise ImportError(
            "Default data validators are GXValidator and PanderaValidator. "
            "You must install either great_expectations or pandera to use them. "
            "Neither package is installed. "
            "Run `uv sync --group gx` or "
            "`uv sync --group pandera` to do so. "
            "Alternatively, you can implement your own data validator. "
            "If you don't want to validate data, use NoValidator class (not recommended)."
        )
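The effect of `stacklevel=2` on the fallback warning can be checked with the standard `warnings` module. In this self-contained sketch, `fallback_factory` is a hypothetical stand-in for `default_validator()` that always takes the Pandera branch; the recorded warning's line number is the caller's line, not the `warnings.warn(...)` line. The second `catch_warnings` block is a scoped alternative to calling `warnings.filterwarnings` globally.

```python
import inspect
import warnings


def fallback_factory() -> str:
    """Hypothetical stand-in that always takes the Pandera fallback branch."""
    warnings.warn(
        "Default data validator is GXValidator. "
        "Great Expectations is not installed. "
        "Using PanderaValidator instead.",
        stacklevel=2,  # attribute the warning to the caller's line
    )
    return "pandera"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record even if an identical warning was seen before
    call_line = inspect.currentframe().f_lineno + 1
    fallback_factory()

# The recorded line is the call site, not the warnings.warn(...) line above.
print(caught[0].lineno == call_line)  # True

# Scoped suppression: the previous filters are restored when the block exits.
with warnings.catch_warnings():
    warnings.filterwarnings("ignore", message=".*PanderaValidator.*")
    fallback_factory()  # no warning is shown
```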
def default_catalog(config_path: str | pathlib.Path) -> adc_toolkit.data.abs.DataCatalog:

Return the default data catalog implementation initialized from configuration.

This factory function provides the default DataCatalog implementation for the adc-toolkit. It uses a lazy import mechanism to check for the kedro package and returns a KedroDataCatalog instance if available.

The function performs runtime package detection using importlib.util.find_spec to avoid hard dependencies on kedro. This allows users to install only the catalog implementations they need.

Parameters
  • config_path (str or pathlib.Path): Path to the configuration directory containing the data catalog YAML file. For KedroDataCatalog, this directory should contain a catalog.yaml file that defines dataset configurations in Kedro format. The path can be either absolute or relative to the current working directory.
Returns
  • DataCatalog: An instance of KedroDataCatalog initialized with the configuration found in the specified directory. The returned object implements the DataCatalog abstract interface, providing load() and save() methods for data I/O operations.
Raises
  • ImportError: If the kedro package is not installed. The error message provides installation instructions using the uv package manager (formerly poetry). Users can install kedro by running uv sync --group kedro or implement their own custom DataCatalog subclass.
See Also

adc_toolkit.data.catalogs.kedro.KedroDataCatalog: The Kedro-based catalog implementation.
adc_toolkit.data.abs.DataCatalog: Abstract base class for data catalogs.
adc_toolkit.data.ValidatedDataCatalog: Validated catalog using this default.

Notes

Lazy Import Mechanism: The function uses importlib.util.find_spec to check for kedro's availability before importing. This allows the module to be imported even when kedro is not installed, with the ImportError only raised when the function is actually called.

Alternative Implementations: Users who don't want to use Kedro can:

1. Implement a custom DataCatalog subclass.
2. Directly instantiate their catalog and pass it to ValidatedDataCatalog.
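As a sketch of the first option, here is an in-memory catalog. It is written against a stand-in abstract class because this page only states that `DataCatalog` provides `load()` and `save()`; the method signatures below, and the names `CatalogInterface` and `InMemoryCatalog`, are assumptions, so check the real `adc_toolkit.data.abs.DataCatalog` before subclassing it.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Any


class CatalogInterface(ABC):
    """Stand-in for adc_toolkit.data.abs.DataCatalog (assumed signatures)."""

    @abstractmethod
    def load(self, name: str) -> Any: ...

    @abstractmethod
    def save(self, name: str, data: Any) -> None: ...


class InMemoryCatalog(CatalogInterface):
    """Hypothetical catalog that keeps datasets in a dict instead of on disk."""

    def __init__(self) -> None:
        self._datasets: dict[str, Any] = {}

    def load(self, name: str) -> Any:
        try:
            return self._datasets[name]
        except KeyError:
            raise KeyError(f"Dataset {name!r} is not in the catalog") from None

    def save(self, name: str, data: Any) -> None:
        self._datasets[name] = data


catalog = InMemoryCatalog()
catalog.save("my_dataset", [1, 2, 3])
print(catalog.load("my_dataset"))  # [1, 2, 3]
```

Declaring `load` and `save` as abstract methods means a subclass that forgets one of them fails at instantiation time rather than on first use.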

Configuration Format: The KedroDataCatalog expects a catalog.yaml file in the specified directory. See the Kedro documentation for the full specification of the catalog configuration format.

Examples

Basic usage to get a default catalog:

>>> from adc_toolkit.data.default_attributes import default_catalog
>>> catalog = default_catalog("path/to/config")
>>> # catalog is now a KedroDataCatalog instance
>>> df = catalog.load("my_dataset")

Using with ValidatedDataCatalog (typical usage pattern):

>>> from adc_toolkit.data import ValidatedDataCatalog
>>> validated_cat = ValidatedDataCatalog.in_directory("config/")
>>> # This internally calls default_catalog("config/")

Handling the ImportError when kedro is not installed:

>>> try:
...     catalog = default_catalog("config/")
... except ImportError:
...     print("Kedro not installed, using custom catalog")
...     catalog = MyCustomCatalog("config/")  # your own DataCatalog subclass

Working with Path objects:

>>> from pathlib import Path
>>> config_dir = Path(__file__).parent / "config"
>>> catalog = default_catalog(config_dir)
def default_validator(config_path: str | pathlib.Path) -> adc_toolkit.data.abs.DataValidator:
342        )

Return the default data validator implementation with priority-based selection.

This factory function provides the default DataValidator implementation for the adc-toolkit by checking which validation libraries are installed, in priority order, and importing only the first one found. It implements a fallback chain: Great Expectations (preferred) → Pandera (fallback) → ImportError (if neither is available).

The function combines lazy imports with runtime package detection, so users can install only the validator they prefer. When Great Expectations is not available but Pandera is, a UserWarning is issued to tell users they are getting the fallback implementation.

Priority Selection Logic
  1. GXValidator (Great Expectations): Preferred default. Provides comprehensive data validation with extensive built-in expectations, profiling capabilities, and data documentation features.

  2. PanderaValidator (Pandera): Fallback option. Provides DataFrame schema validation with a more lightweight, Pythonic API. Used automatically when Great Expectations is not installed.

  3. ImportError: Raised when neither validation library is available, with detailed installation instructions.
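The priority chain above boils down to "return the first candidate whose package is installed". The following is a minimal, standard-library-only sketch of that pattern in table-driven form; select_first_available and the candidate table are hypothetical illustrations, not part of adc-toolkit, and the lambdas stand in for constructing GXValidator or PanderaValidator:

```python
from collections.abc import Callable, Sequence
from importlib.util import find_spec
from typing import TypeVar

T = TypeVar("T")


def select_first_available(candidates: Sequence[tuple[str, Callable[[], T]]]) -> T:
    """Return the result of the first factory whose package is installed.

    Hypothetical helper: each candidate pairs a top-level package name with a
    zero-argument factory that is only called if that package is found.
    """
    for package_name, factory in candidates:
        # find_spec locates a top-level package without importing it
        if find_spec(package_name) is not None:
            return factory()
    names = ", ".join(name for name, _ in candidates)
    raise ImportError(f"None of the candidate packages are installed: {names}")


# Stand-in candidates: a package that should not exist, then a stdlib module.
candidates = [
    ("surely_not_an_installed_package", lambda: "preferred"),
    ("json", lambda: "fallback"),
]
print(select_first_available(candidates))  # prints "fallback"
```

Unlike default_validator, this sketch does not warn when it falls back; keeping the order in a table puts the priority in one place if more candidates are ever added.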

Parameters
  • config_path (str or pathlib.Path): Path to the configuration directory containing validator configuration files. The expected file structure depends on the validator:

    • GXValidator: Expects a Great Expectations project structure with great_expectations.yml or Expectation Suite configurations.
    • PanderaValidator: Expects Pandera schema definition files (Python modules or YAML files depending on configuration).

    The path can be either absolute or relative to the current working directory.

Returns
  • DataValidator: GXValidator if Great Expectations is installed, otherwise PanderaValidator, initialized with the configuration found in the specified directory. The returned object implements the DataValidator abstract interface, including the validate() method used for data quality checks.

Raises
  • ImportError: Raised when neither the great_expectations nor the pandera package is installed. The error message gives installation instructions for both options using the uv package manager and mentions the alternatives of implementing a custom validator or using NoValidator (the latter is not recommended for production use).

Warns
  • UserWarning: Issued when Great Expectations is not installed but Pandera is available, to tell users they are getting the fallback validator rather than the preferred default. The warning uses stacklevel=2 so that it points at the calling code rather than at the factory function itself.

See Also

adc_toolkit.data.validators.gx.GXValidator: Great Expectations validator implementation.
adc_toolkit.data.validators.pandera.PanderaValidator: Pandera validator implementation.
adc_toolkit.data.validators.no_validator.NoValidator: No-op validator (not recommended).
adc_toolkit.data.abs.DataValidator: Abstract base class for data validators.
adc_toolkit.data.ValidatedDataCatalog: Validated catalog using this default.

Notes

Lazy Import Mechanism: The function uses importlib.util.find_spec to check for package availability before importing. This allows the module to be imported even when validation libraries are not installed, with the ImportError only raised when the function is actually called.
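The lazy-import pattern described above can be sketched as follows: a hypothetical load_optional helper checks availability with importlib.util.find_spec and only calls importlib.import_module inside the function body, so defining (or importing) the helper never fails and the ImportError surfaces at call time:

```python
import importlib
import sys
from importlib.util import find_spec
from types import ModuleType


def load_optional(package_name: str) -> ModuleType:
    """Import an optional package on demand (hypothetical helper).

    Nothing is imported when this function is defined; the package is only
    looked up, and imported, when the function is called.
    """
    if find_spec(package_name) is None:
        # Like default_validator: fail at call time with an actionable message
        raise ImportError(f"{package_name} is not installed; install it to use this feature.")
    return importlib.import_module(package_name)


# Checking availability alone does not import the package.
already_loaded = "colorsys" in sys.modules
find_spec("colorsys")
print(("colorsys" in sys.modules) == already_loaded)  # prints True

json_module = load_optional("json")
print(json_module.dumps({"ok": True}))  # prints {"ok": true}
```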

Installation Options: Users should install the validation library that best fits their needs:

  • For comprehensive validation and data documentation: uv sync --group gx
  • For lightweight DataFrame validation: uv sync --group pandera
  • For both (if needed): uv sync --group gx --group pandera

Alternative Implementations: Users who don't want to use the defaults can:

  1. Implement a custom DataValidator subclass
  2. Use the NoValidator class (bypasses all validation, not recommended)
  3. Directly instantiate a specific validator and pass it to ValidatedDataCatalog
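As a sketch of option 1, the following custom validator checks that required columns are present. It assumes, based only on the calls shown in the examples on this page, that a validator exposes in_directory(config_path) and validate(name, data); the real abstract methods are defined in adc_toolkit.data.abs.DataValidator and may differ, so DataValidatorStandIn, RequiredColumnsValidator, and the required_columns.json file are all hypothetical. Datasets are plain column-to-values mappings here to keep the example dependency-free:

```python
import json
import tempfile
from abc import ABC, abstractmethod
from collections.abc import Mapping, Sequence
from pathlib import Path
from typing import Any, Union


class DataValidatorStandIn(ABC):
    """Hypothetical stand-in for adc_toolkit.data.abs.DataValidator."""

    @classmethod
    @abstractmethod
    def in_directory(cls, config_path: Union[str, Path]) -> "DataValidatorStandIn":
        """Build a validator from files in a configuration directory."""

    @abstractmethod
    def validate(self, name: str, data: Any) -> Any:
        """Validate a named dataset and return it."""


class RequiredColumnsValidator(DataValidatorStandIn):
    """Example custom validator that checks required columns are present."""

    def __init__(self, required: Mapping[str, Sequence[str]]) -> None:
        self.required = {name: list(cols) for name, cols in required.items()}

    @classmethod
    def in_directory(cls, config_path: Union[str, Path]) -> "RequiredColumnsValidator":
        # Hypothetical config file, e.g. {"orders": ["id", "amount"]}
        config_file = Path(config_path) / "required_columns.json"
        return cls(json.loads(config_file.read_text()))

    def validate(self, name: str, data: Mapping[str, Any]) -> Mapping[str, Any]:
        # Datasets without an entry in the config have no required columns
        missing = [col for col in self.required.get(name, []) if col not in data]
        if missing:
            raise ValueError(f"{name}: missing required columns {missing}")
        return data  # pass valid data through unchanged


# Usage: write a throwaway config directory, then validate a column mapping.
with tempfile.TemporaryDirectory() as tmp:
    config = {"orders": ["id", "amount"]}
    (Path(tmp) / "required_columns.json").write_text(json.dumps(config))
    validator = RequiredColumnsValidator.in_directory(tmp)

orders = {"id": [1, 2], "amount": [9.5, 3.0]}
print(validator.validate("orders", orders) is orders)  # prints True
```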

Warning Behavior: The fallback warning uses stacklevel=2 to ensure the warning appears to originate from the user's code that called this function, not from within the factory function itself. This makes it easier for users to identify where the fallback is being triggered.
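The effect of stacklevel=2 can be checked with a small standard-library sketch: a hypothetical factory() warns with stacklevel=2, and the recorded warning's line number matches the caller's line rather than the warnings.warn line inside factory():

```python
import inspect
import warnings


def factory() -> str:
    # stacklevel=2 attributes the warning to whoever called factory()
    warnings.warn("Using the fallback implementation.", UserWarning, stacklevel=2)
    return "fallback"


def user_code() -> tuple[int, int]:
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record even repeated warnings
        call_line = inspect.currentframe().f_lineno + 1  # the next line
        factory()
    return call_line, caught[0].lineno


call_line, reported_line = user_code()
print(call_line == reported_line)  # prints True
```

With the default stacklevel=1, reported_line would instead be the line of the warnings.warn call inside factory().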

Examples

Basic usage to get a default validator:

>>> from adc_toolkit.data.default_attributes import default_validator
>>> validator = default_validator("path/to/config")
>>> # validator is either GXValidator or PanderaValidator
>>> # assumes df is a DataFrame that has already been loaded
>>> validated_df = validator.validate("my_dataset", df)

Using with ValidatedDataCatalog (typical usage pattern):

>>> from adc_toolkit.data import ValidatedDataCatalog
>>> validated_cat = ValidatedDataCatalog.in_directory("config/")
>>> # This internally calls default_validator("config/")
>>> df = validated_cat.load("my_dataset")  # Validates after loading

Handling the fallback warning:

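>>> import warnings
>>> with warnings.catch_warnings():
...     warnings.filterwarnings("ignore", message=".*PanderaValidator.*")
...     validator = default_validator("config/")
>>> # If only Pandera is installed, the fallback warning is suppressed;
>>> # the filter is undone when the with block exits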

Explicitly choosing a validator to avoid the default behavior:

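>>> # Choose GXValidator explicitly (needs great_expectations installed)
>>> from adc_toolkit.data.validators.gx import GXValidator
>>> validator = GXValidator.in_directory("config/")
>>> # Or choose PanderaValidator explicitly; this bypasses the fallback warning
>>> from adc_toolkit.data.validators.pandera import PanderaValidator
>>> validator = PanderaValidator.in_directory("config/")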

Handling the ImportError when no validators are installed:

>>> from adc_toolkit.data.validators.no_validator import NoValidator
>>> try:
...     validator = default_validator("config/")
... except ImportError:
...     print("No validators installed, using NoValidator")
...     validator = NoValidator()

Working with Path objects:

>>> from pathlib import Path
>>> config_dir = Path("config")  # inside a module: Path(__file__).parent / "config"
>>> validator = default_validator(config_dir)