Cookbook: Ship deterministic data with focused recipes

Apply task-sized patterns for scaling JSON, tuning pytest fixtures, freezing seeds, extending providers, and inspecting explain trees.

Recipe 1 — Shard massive JSON/JSONL outputs

You need predictable files when generating thousands of records.

pfg gen json ./models.py \
  --include app.models.User \
  --n 20000 \
  --jsonl \
  --shard-size 1000 \
  --out "artifacts/{model}/shard-{case_index}.jsonl" \
  --indent 0 \
  --orjson

Use --jsonl to stream one record per line.
--shard-size controls how many records land in each file; a final shard uses the remaining count.
--orjson activates the optional high-performance encoder (pip install 'pydantic-fixturegen[orjson]').
Template placeholders keep shards organised per model and shard index.

Validate the emitters by checking the metadata banner for seed, version, and digest.
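
To sanity-check a shard layout before a large run, the arithmetic behind --shard-size can be sketched in a few lines. This is a minimal illustration of the rule described above (full shards first, then one final shard with the remainder). The helper name shard_sizes is hypothetical and not part of the pfg CLI or library.

```python
def shard_sizes(total: int, shard_size: int) -> list[int]:
    """Return the record count of each shard, in order.

    Hypothetical helper: it mirrors the documented rule, not pfg internals.
    """
    if total < 0 or shard_size <= 0:
        raise ValueError("total must be >= 0 and shard_size must be > 0")
    full, remainder = divmod(total, shard_size)
    sizes = [shard_size] * full
    if remainder:
        sizes.append(remainder)  # the final shard holds the leftover records
    return sizes


# 20000 records at 1000 per shard divide evenly into 20 files.
sizes = shard_sizes(20000, 1000)
print(len(sizes), "shards; the last one holds", sizes[-1], "records")
```

Comparing the expected shard count with the number of files under the output directory is a cheap check to run in CI.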

Switch fixture ergonomics without editing generated code.

pfg gen fixtures ./models.py \
  --out tests/fixtures/test_models.py \
  --style factory \
  --scope session \
  --cases 5 \
  --return-type dict

--style accepts functions, factory, or class. Pick the format that matches your test suite guidelines.
--scope accepts function, module, or session. Choose wider scopes to reuse expensive models.
--cases parametrises fixtures with deterministic data; indexes become request.param.
--return-type dict converts fixtures into dictionaries for JSON-like assertions.

Pair this recipe with Ruff’s formatter or pytest --fixtures to verify output.

Keep generated data stable across machines by freezing per-model seeds.

pfg gen json ./models.py \
  --out ./out/users.json \
  --freeze-seeds \
  --freeze-seeds-file .pfg-seeds.json

The freeze file stores model digests and derived seeds.
When digests change, the CLI prints seed_freeze_stale warnings so you can review updated fixtures.
Combine with --preset boundary to explore edge cases while staying deterministic.

Review the freeze file:

{
  "version": 1,
  "models": {
    "app.models.User": {
      "seed": 412067183,
      "model_digest": "8d3db06f…"
    }
  }
}

Commit the file to source control when you want cross-team consistency.
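
When reviewing a change to the committed freeze file, it can help to list exactly which models moved. The sketch below assumes only the layout shown above (a top-level "models" mapping whose entries carry "seed" and "model_digest"). The function names are hypothetical and the digests in the usage example are placeholders.

```python
import json
from pathlib import Path


def load_freeze_file(path: Path) -> dict:
    """Read a freeze file laid out as {"version": ..., "models": {...}}."""
    return json.loads(path.read_text(encoding="utf-8"))


def changed_models(old: dict, new: dict) -> list[str]:
    """List models that were added, removed, or whose seed or digest differs."""
    old_models = old.get("models", {})
    new_models = new.get("models", {})
    names = sorted(set(old_models) | set(new_models))
    return [name for name in names if old_models.get(name) != new_models.get(name)]


# Placeholder data: the digests here are illustrative, not real values.
before = {"version": 1, "models": {"app.models.User": {"seed": 1, "model_digest": "aaaa"}}}
after = {"version": 1, "models": {"app.models.User": {"seed": 1, "model_digest": "bbbb"}}}
print(changed_models(before, after))
```

In practice, pass the result of load_freeze_file for the old and new revisions of .pfg-seeds.json.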

Extend generation without forking core code.

# plugins/nickname_boost.py
from pydantic import BaseModel
from pydantic_fixturegen.plugins.hookspecs import hookimpl


class NicknameBoost:
    @hookimpl
    def pfg_modify_strategy(self, model: type[BaseModel], field_name: str, strategy):
        # Lower the chance of None for User.nickname only.
        if model.__name__ == "User" and field_name == "nickname":
            return strategy.model_copy(update={"p_none": 0.1})
        # Returning None leaves every other field on its default strategy.
        return None


plugin = NicknameBoost()

# Manual registration, in a separate module
from pydantic_fixturegen.core.providers import create_default_registry
from pydantic_fixturegen.plugins.loader import register_plugin
from plugins.nickname_boost import plugin

registry = create_default_registry(load_plugins=False)
register_plugin(registry, plugin)

The hook returns a modified strategy for the matching field; return None to fall back to defaults.
Use entry points under pydantic_fixturegen for discovery in packages.
Inspect the effect with pfg gen explain --tree.
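
The "return a modified strategy or None" contract can be illustrated without the library. The sketch below uses a hypothetical Strategy dataclass and a hypothetical apply_hooks dispatcher. It shows the fallback behaviour only and is not how pydantic-fixturegen dispatches hooks internally. Chaining hooks in order is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Strategy:
    """Hypothetical stand-in for a field strategy; not the library's class."""
    p_none: float = 0.5


Hook = Callable[[str, str, Strategy], Optional[Strategy]]


def nickname_boost(model_name: str, field_name: str, strategy: Strategy) -> Optional[Strategy]:
    """Return an adjusted copy for User.nickname, or None to keep the default."""
    if model_name == "User" and field_name == "nickname":
        return replace(strategy, p_none=0.1)
    return None


def apply_hooks(hooks: list[Hook], model_name: str, field_name: str, strategy: Strategy) -> Strategy:
    """Run each hook in order; a None result leaves the strategy untouched."""
    for hook in hooks:
        result = hook(model_name, field_name, strategy)
        if result is not None:
            strategy = result
    return strategy


print(apply_hooks([nickname_boost], "User", "nickname", Strategy()))
print(apply_hooks([nickname_boost], "User", "email", Strategy()))
```

Returning a copy rather than mutating the input keeps hooks free of side effects, which matches the model_copy call in the plugin above.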

Expose provider choices to understand constraint handling.

pfg gen explain ./models.py --tree --max-depth 2 --include app.models.User

--tree prints an indented diagram showing providers, presets, and policy tweaks.
--json emits a machine-readable payload suitable for CI or dashboards.
--max-depth limits recursion when dealing with large nested models.

Use --json together with jq for targeted checks:

pfg gen explain ./models.py --json | jq '.models["app.models.User"].fields.nickname'

You see probability adjustments, active presets, and provider names, making policy mismatches easy to spot.
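
If jq is not available, the same lookup can be done in Python. The sketch below assumes only the nesting implied by the jq path above (models, then the model name, then fields, then the field name). The keys inside the sample field entry ("provider", "p_none") are made up for illustration and may not match the real payload.

```python
import json


def field_report(payload: dict, model: str, field: str) -> dict:
    """Equivalent of jq '.models[MODEL].fields.FIELD' on the explain payload."""
    return payload["models"][model]["fields"][field]


# Illustrative payload only; the keys inside "nickname" are assumptions.
sample = json.loads(
    """
    {
      "models": {
        "app.models.User": {
          "fields": {
            "nickname": {"provider": "string", "p_none": 0.1}
          }
        }
      }
    }
    """
)

print(field_report(sample, "app.models.User", "nickname"))
```

To use it on real output, pipe pfg gen explain ./models.py --json into a script and replace the sample with json.load(sys.stdin).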

Lean on the built-in pytest helper to keep JSON, fixtures, or schema outputs in sync with snapshots.

from pathlib import Path

from pydantic_fixturegen.testing import JsonSnapshotConfig


def test_user_snapshot(pfg_snapshot):
    # Compare freshly generated JSON for User against the stored snapshot.
    config = JsonSnapshotConfig(out=Path("tests/snapshots/users.json"), indent=2)
    pfg_snapshot.assert_artifacts(
        target="./models.py",
        json=config,
        include=["app.models.User"],
    )

The pfg_snapshot fixture ships via the pytest11 entry point; no manual plugin registration needed.
Pass one or more configs (JsonSnapshotConfig, FixturesSnapshotConfig, SchemaSnapshotConfig) to cover the artifacts you want to track.
Run pytest --pfg-update-snapshots=update or set PFG_SNAPSHOT_UPDATE=update to refresh snapshots when behaviour changes; by default the helper fails with the unified diff from pfg diff.
When pytest-regressions is installed, pytest --force-regen refreshes snapshots but still fails the test, and pytest --regen-all refreshes snapshots while letting the suite pass.
For one-off overrides, tag a test with @pytest.mark.pfg_snapshot_config(update="update", timeout=10) to opt into updates or tweak sandbox limits without affecting neighbouring tests.
All pfg diff knobs (include, exclude, seed, preset, freeze_seeds, and so on) are available, so snapshot updates stay deterministic.
Outside pytest, reach for the CLI helpers: pfg snapshot verify ... to assert snapshots in CI and pfg snapshot write ... to regenerate JSON/fixtures/schema artifacts in bulk.

Use this in tandem with Recipe 3 (freeze seeds) to keep fixtures stable across machines.
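
For readers unfamiliar with unified diffs, the sketch below shows roughly what a snapshot mismatch looks like. It uses the standard-library difflib module, not pfg diff. The snapshot_diff helper and the file labels are hypothetical.

```python
import difflib


def snapshot_diff(expected: str, actual: str, name: str = "users.json") -> str:
    """Return a unified diff between stored and regenerated text ('' if equal)."""
    lines = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=f"snapshot/{name}",
        tofile=f"generated/{name}",
    )
    return "".join(lines)


stored = '{"id": 1}\n'
regenerated = '{"id": 2}\n'
print(snapshot_diff(stored, regenerated))
```

An empty string means the artifact matches its snapshot. Any other result is what a failing assertion would surface.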

Reflect regional datasets by mapping models or fields to specific locales.

[tool.pydantic_fixturegen.locales]
"app.models.Customer.*" = "de_DE"
"app.models.Customer.email" = "en_GB"

Patterns accept glob wildcards or regex (re: prefix).
Field matches override broader model patterns, so the example keeps German defaults while forcing .email to the UK locale.
To blanket a model you can omit the trailing .* ("app.models.Customer") or use the bare class name ("Customer").
Combine with pfg gen json --seed 42 to verify deterministic outputs across locales.
See the configuration reference at docs/configuration.md for deeper details and validation rules.
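
The override rule can be pictured as "the most specific matching pattern wins". The sketch below is one possible reading, built on the standard-library fnmatch and re modules. The ranking in _specificity is an assumption, the helper names are hypothetical, and the bare model or class name shorthand is not handled here. The library's actual resolution order may differ.

```python
import fnmatch
import re
from typing import Optional

LOCALES = {
    "app.models.Customer.*": "de_DE",
    "app.models.Customer.email": "en_GB",
}


def _matches(pattern: str, path: str) -> bool:
    """Glob match by default; a 're:' prefix switches to a full regex match."""
    if pattern.startswith("re:"):
        return re.fullmatch(pattern[3:], path) is not None
    return fnmatch.fnmatchcase(path, pattern)


def _specificity(pattern: str) -> tuple[int, int]:
    """Assumed ranking: exact paths beat globs, globs beat regexes."""
    if pattern.startswith("re:"):
        return (0, len(pattern))
    has_wildcard = any(ch in pattern for ch in "*?[")
    return (1 if has_wildcard else 2, len(pattern))


def resolve_locale(path: str, mapping: dict[str, str], default: Optional[str] = None) -> Optional[str]:
    """Pick the locale of the most specific pattern that matches the field path."""
    matches = [pattern for pattern in mapping if _matches(pattern, path)]
    if not matches:
        return default
    return mapping[max(matches, key=_specificity)]


print(resolve_locale("app.models.Customer.email", LOCALES))
print(resolve_locale("app.models.Customer.name", LOCALES))
```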

Avoid runaway allocations by capping shapes and dtypes globally.

[tool.pydantic_fixturegen.arrays]
max_ndim = 2
max_side = 4
max_elements = 16
dtypes = ["float32", "int16"]

Requires the numpy extra (pip install 'pydantic-fixturegen[numpy]').
Fields annotated as numpy.ndarray or numpy.typing.NDArray[...] automatically use these caps.
Deterministic seeds flow through SeedManager.numpy_for, so array contents remain stable across runs.
Combine with field policies when you need to override other behaviours (for example p_none).
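
To see how the three caps interact, the sketch below checks candidate shapes against the values from the config above and draws a seeded shape that respects them. It uses only the standard library. The sampling strategy, the minimum side of 1, and the names shape_allowed and sample_shape are assumptions for illustration, not the library's SeedManager logic.

```python
import math
import random

CAPS = {"max_ndim": 2, "max_side": 4, "max_elements": 16}


def shape_allowed(shape: tuple[int, ...], caps: dict = CAPS) -> bool:
    """True when the shape respects the ndim, per-side, and total-element caps."""
    return (
        1 <= len(shape) <= caps["max_ndim"]
        and all(1 <= side <= caps["max_side"] for side in shape)
        and math.prod(shape) <= caps["max_elements"]
    )


def sample_shape(seed: int, caps: dict = CAPS) -> tuple[int, ...]:
    """Draw a shape from a seeded RNG, shrinking it until the element cap holds."""
    rng = random.Random(seed)
    ndim = rng.randint(1, caps["max_ndim"])
    shape = [rng.randint(1, caps["max_side"]) for _ in range(ndim)]
    while math.prod(shape) > caps["max_elements"]:
        largest = shape.index(max(shape))
        shape[largest] -= 1  # trim the largest side first
    return tuple(shape)


print(shape_allowed((4, 4)), shape_allowed((4, 5)), shape_allowed((2, 2, 2)))
print(sample_shape(42) == sample_shape(42))
```

With these values max_elements never binds on its own, because 4 x 4 is already 16. Lowering it to 8 would reject (4, 4) while still allowing (2, 4).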

Use the built-in privacy bundles when you need obviously synthetic datasets in CI but richer identifiers locally.

pfg gen json ./models.py --out snapshots/users.json --profile pii-safe

pfg diff ./models.py --json-out snapshots/users.json --profile realistic

pii-safe masks emails/URLs/IPs with reserved example values and makes optional PII fields far more likely to be None.
realistic keeps optional contact fields populated and re-enables full identifier distributions.
Set [tool.pydantic_fixturegen].profile = "pii-safe" for CI defaults, then override locally with PFG_PROFILE=realistic or --profile realistic.
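
The layering of profile sources can be sketched as a small precedence function. The order used here (command-line flag first, then PFG_PROFILE, then the configured default) is an assumption. The text above says both overrides beat the config value but does not say which of the two wins when both are set. resolve_profile is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def resolve_profile(
    cli_value: Optional[str],
    config_value: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick the active profile. Assumed order: CLI flag, PFG_PROFILE, config."""
    if cli_value:
        return cli_value
    env_value = env.get("PFG_PROFILE")
    if env_value:
        return env_value
    return config_value


# CI: nothing overrides the configured default.
print(resolve_profile(None, "pii-safe", env={}))
# Local shell with PFG_PROFILE=realistic exported.
print(resolve_profile(None, "pii-safe", env={"PFG_PROFILE": "realistic"}))
```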

Automate regeneration with watch mode: pfg gen fixtures ... --watch.
Diff outputs before committing: pfg diff models.py --fixtures-out tests/fixtures/test_models.py --show-diff.
Harden imports by running pfg doctor with --fail-on-gaps 0 and reviewing the structured report.