Reference docs for pydantic-fixturegen.
Cookbook: Ship deterministic data with focused recipes
Apply task-sized patterns for scaling JSON, tuning pytest fixtures, freezing seeds, extending providers, and inspecting explain trees.
Recipe 1 — Shard massive JSON/JSONL outputs
You need predictable files when generating thousands of records.
pfg gen json ./models.py \
--include app.models.User \
--n 20000 \
--jsonl \
--shard-size 1000 \
--out "artifacts/{model}/shard-{case_index}.jsonl" \
--indent 0 \
--orjson
- Use --jsonl to stream one record per line.
- --shard-size controls how many records land in each file; the final shard holds whatever records remain.
- --orjson activates the optional high-performance encoder (pip install 'pydantic-fixturegen[orjson]').
- Template placeholders keep shards organised per model and shard index.
Verify the emitted files by checking the metadata banner for seed, version, and digest.
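To predict how many files a run produces, here is a minimal stdlib sketch of the sharding arithmetic described above. The helper plan_shards is hypothetical (not part of pydantic-fixturegen), and it assumes {case_index} numbers shards from 1; check the CLI's real output for its numbering.

```python
def plan_shards(total: int, shard_size: int, template: str, model: str) -> list[tuple[str, int]]:
    """Split `total` records into full shards plus one smaller final shard when needed."""
    if total < 0 or shard_size <= 0:
        raise ValueError("total must be >= 0 and shard_size must be > 0")
    plan = []
    for start in range(0, total, shard_size):
        count = min(shard_size, total - start)
        shard_number = start // shard_size + 1  # assumption: shards are numbered from 1
        plan.append((template.format(model=model, case_index=shard_number), count))
    return plan


shards = plan_shards(20000, 1000, "artifacts/{model}/shard-{case_index}.jsonl", "User")
print(len(shards), shards[0])  # 20 ('artifacts/User/shard-1.jsonl', 1000)
print([count for _, count in plan_shards(2500, 1000, "{model}-{case_index}.jsonl", "User")])  # [1000, 1000, 500]
```

With --n 20000 and --shard-size 1000 every shard is full; any total that is not a multiple of the shard size leaves a smaller last file.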
Recipe 2 — Adjust pytest fixture style and scope
Switch fixture ergonomics without editing generated code.
pfg gen fixtures ./models.py \
--out tests/fixtures/test_models.py \
--style factory \
--scope session \
--cases 5 \
--return-type dict
- --style accepts functions, factory, or class. Pick the format that matches your test suite guidelines.
- --scope accepts function, module, or session. Choose wider scopes to reuse expensive models.
- --cases parametrises fixtures with deterministic data; each case index becomes request.param.
- --return-type dict converts fixtures into dictionaries for JSON-like assertions.
Pair this recipe with Ruff’s formatter or pytest --fixtures to verify output.
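The generated module is the authority on exact shapes, but a rough stdlib-only sketch can help when choosing a style. Everything here is hypothetical: User is a dataclass standing in for the Pydantic model, and CASES, user_function, user_factory, and as_dict are illustrative names, not generated code. It contrasts a function-style fixture (one ready-made record per case) with a factory-style fixture (a builder that accepts overrides), and shows what a dict return type amounts to.

```python
from dataclasses import asdict, dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class User:  # hypothetical stand-in for app.models.User
    id: int
    nickname: Optional[str]


# Deterministic per-case records. In pytest, a parametrised fixture would typically use
# @pytest.fixture(scope=..., params=range(3)) and read the case index from request.param.
CASES = [User(id=i, nickname=f"user-{i}") for i in range(3)]


def user_function(case_index: int) -> User:
    """Function style: hand back one ready-made record per case."""
    return CASES[case_index]


def user_factory(case_index: int) -> Callable[..., User]:
    """Factory style: hand back a builder that starts from the case and applies overrides."""
    def build(**overrides) -> User:
        return replace(CASES[case_index], **overrides)
    return build


def as_dict(user: User) -> dict:
    """Analogue of --return-type dict: tests receive a plain mapping, not a model."""
    return asdict(user)


make_user = user_factory(1)
print(as_dict(make_user(nickname=None)))  # {'id': 1, 'nickname': None}
```

Factories cost one extra call in each test but let a test tweak a single field without losing the deterministic baseline.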
Recipe 3 — Freeze seeds for reproducible CI diffs
Keep generated data stable across machines by freezing per-model seeds.
pfg gen json ./models.py \
--out ./out/users.json \
--freeze-seeds \
--freeze-seeds-file .pfg-seeds.json
- The freeze file stores model digests and derived seeds.
- When digests change, the CLI prints seed_freeze_stale warnings so you can review updated fixtures.
- Combine with --preset boundary to explore edge cases while staying deterministic.
Review the freeze file:
{
"version": 1,
"models": {
"app.models.User": {
"seed": 412067183,
"model_digest": "8d3db06f…"
}
}
}
Commit the file to source control when you want cross-team consistency.
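The stale-seed check boils down to comparing stored digests with freshly computed ones. The sketch below reads a freeze file shaped like the example above and reports which models drifted. How pydantic-fixturegen actually computes model_digest and derives seed is not documented here, so digest_schema and derive_seed are hypothetical helpers (SHA-256 over canonical JSON, and the first four bytes of a hash as an integer).

```python
import hashlib
import json


def digest_schema(schema: dict) -> str:
    """Hypothetical digest: SHA-256 over the schema serialised with sorted keys."""
    canonical = json.dumps(schema, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def derive_seed(model_name: str, digest: str) -> int:
    """Hypothetical derivation: first 4 bytes of SHA-256(name:digest) as an unsigned int."""
    raw = hashlib.sha256(f"{model_name}:{digest}".encode("utf-8")).digest()
    return int.from_bytes(raw[:4], "big")


def stale_models(freeze: dict, current_digests: dict[str, str]) -> list[str]:
    """Models whose stored digest differs from the current one (or that are missing)."""
    stored = freeze.get("models", {})
    return sorted(
        name
        for name, digest in current_digests.items()
        if stored.get(name, {}).get("model_digest") != digest
    )


schema = {"title": "User", "properties": {"nickname": {"type": "string"}}}
digest = digest_schema(schema)
freeze = {
    "version": 1,
    "models": {"app.models.User": {"seed": derive_seed("app.models.User", digest), "model_digest": digest}},
}
print(stale_models(freeze, {"app.models.User": digest}))  # []
```

Any schema edit changes the digest, so the model shows up as stale until the freeze file is refreshed and reviewed.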
Recipe 4 — Register a custom provider via Pluggy
Extend generation without forking core code.
# plugins/nickname_boost.py
from pydantic import BaseModel
from pydantic_fixturegen.plugins.hookspecs import hookimpl
class NicknameBoost:
    @hookimpl
    def pfg_modify_strategy(self, model: type[BaseModel], field_name: str, strategy):
        # Only adjust User.nickname: set its probability of generating None (p_none) to 0.1.
        if model.__name__ == "User" and field_name == "nickname":
            return strategy.model_copy(update={"p_none": 0.1})
        # Returning None keeps the default strategy for every other field.
        return None
plugin = NicknameBoost()
Register the plugin and run generation:
from pydantic_fixturegen.core.providers import create_default_registry
from pydantic_fixturegen.plugins.loader import register_plugin
from plugins.nickname_boost import plugin
registry = create_default_registry(load_plugins=False)
register_plugin(registry, plugin)
- The hook returns a modified strategy for the matching field; return None to fall back to defaults.
- Use entry points under pydantic_fixturegen for discovery in packages.
- Inspect the effect with pfg gen explain --tree.
Recipe 5 — Debug strategies with explain trees
Expose provider choices to understand constraint handling.
pfg gen explain ./models.py --tree --max-depth 2 --include app.models.User
- --tree prints an indented diagram showing providers, presets, and policy tweaks.
- --json emits a machine-readable payload suitable for CI or dashboards.
- --max-depth limits recursion when dealing with large nested models.
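To see what a depth limit does to an indented diagram, here is a small sketch that renders a nested mapping with a cut-off. The node layout (provider plus children) and the provider labels are hypothetical stand-ins, not the real explain payload, and render_tree is illustrative only.

```python
def render_tree(name: str, node: dict, max_depth: int, depth: int = 0) -> list[str]:
    """Indent each level by two spaces; summarise children of nodes at max_depth instead of recursing."""
    indent = "  " * depth
    lines = [f"{indent}{name}: {node.get('provider', '?')}"]
    children = node.get("children", {})
    if children and depth >= max_depth:
        lines.append(f"{indent}  ... ({len(children)} hidden)")
        return lines
    for child_name, child in children.items():
        lines.extend(render_tree(child_name, child, max_depth, depth + 1))
    return lines


user = {
    "provider": "model",
    "children": {
        "nickname": {"provider": "string"},
        "address": {
            "provider": "model",
            "children": {"geo": {"provider": "model", "children": {"lat": {"provider": "float"}}}},
        },
    },
}
print("\n".join(render_tree("User", user, max_depth=2)))
```

Collapsing deep branches into a count keeps the diagram readable while still signalling that more structure exists below.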
Use --json together with jq for targeted checks:
pfg gen explain ./models.py --json | jq '.models["app.models.User"].fields.nickname'
The output shows probability adjustments, active presets, and provider names, which makes policy mismatches easy to spot.
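If jq is not available on a CI image, the same lookup takes a few lines of Python. Only the models, model path, fields, field nesting comes from the jq filter above; field_entry and field_from_text are hypothetical helper names, and like jq they yield None (null) when a key is missing.

```python
import json
from typing import Any


def field_entry(payload: dict, model: str, field: str) -> Any:
    """Mirror jq '.models[MODEL].fields.FIELD'; missing keys give None, as jq gives null."""
    return payload.get("models", {}).get(model, {}).get("fields", {}).get(field)


def field_from_text(text: str, model: str = "app.models.User", field: str = "nickname") -> Any:
    """Parse the explain JSON (for example sys.stdin.read()) and return one field's entry."""
    return field_entry(json.loads(text), model, field)
```

Pipe pfg gen explain ./models.py --json into a script that calls field_from_text(sys.stdin.read()) and asserts on the result.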
Recipe 6 — Assert artifacts with pytest snapshots
Lean on the built-in pytest helper to keep JSON, fixtures, or schema outputs in sync with snapshots.
from pathlib import Path
from pydantic_fixturegen.testing import JsonSnapshotConfig
def test_user_snapshot(pfg_snapshot):
    # Compare freshly generated JSON for app.models.User with the stored snapshot.
    config = JsonSnapshotConfig(out=Path("tests/snapshots/users.json"), indent=2)
    pfg_snapshot.assert_artifacts(
        target="./models.py",
        json=config,
        include=["app.models.User"],
    )
- The pfg_snapshot fixture ships via the pytest11 entry point; no manual plugin registration is needed.
- Pass one or more configs (JsonSnapshotConfig, FixturesSnapshotConfig, SchemaSnapshotConfig) to cover the artifacts you want to track.
- Run pytest --pfg-update-snapshots=update or set PFG_SNAPSHOT_UPDATE=update to refresh snapshots when behaviour changes; by default the helper fails with the unified diff from pfg diff.
- When pytest-regressions is installed, pytest --force-regen refreshes snapshots but still fails the test, and pytest --regen-all refreshes snapshots while letting the suite pass.
- For one-off overrides, tag a test with @pytest.mark.pfg_snapshot_config(update="update", timeout=10) to opt into updates or tweak sandbox limits without affecting neighbouring tests.
- All pfg diff knobs are available (include, exclude, seed, preset, freeze_seeds, etc.), so update determinism is preserved.
- Outside pytest, reach for the CLI helpers: pfg snapshot verify ... to assert snapshots in CI and pfg snapshot write ... to regenerate JSON/fixtures/schema artifacts in bulk.
Use this in tandem with Recipe 3 (freeze seeds) to keep fixtures stable across machines.
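Conceptually, a snapshot assertion compares freshly generated text with a stored file and either rewrites it or fails with a diff. The stdlib sketch below shows that loop with difflib.unified_diff. assert_snapshot is a hypothetical helper, not the pfg_snapshot fixture's implementation; it only borrows the PFG_SNAPSHOT_UPDATE=update switch from the text above.

```python
import difflib
import os
import tempfile
from pathlib import Path
from typing import Optional


def assert_snapshot(path: Path, actual: str, update: Optional[bool] = None) -> None:
    """Pass if `actual` matches the stored file; rewrite it when updating; otherwise fail with a diff."""
    if update is None:
        # Mirrors the PFG_SNAPSHOT_UPDATE=update switch described above.
        update = os.environ.get("PFG_SNAPSHOT_UPDATE") == "update"
    expected = path.read_text(encoding="utf-8") if path.exists() else ""
    if actual == expected:
        return
    if update:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(actual, encoding="utf-8")
        return
    diff = difflib.unified_diff(
        expected.splitlines(keepends=True),
        actual.splitlines(keepends=True),
        fromfile=f"{path} (snapshot)",
        tofile=f"{path} (generated)",
    )
    raise AssertionError("snapshot mismatch:\n" + "".join(diff))


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "users.json"
    assert_snapshot(snapshot, '{"id": 1}\n', update=True)  # first run records the snapshot
    assert_snapshot(snapshot, '{"id": 1}\n', update=False)  # identical output passes
```

Failing by default and updating only on an explicit opt-in keeps accidental drift visible in CI.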
Recipe 7 — Mix Faker locales per model
Reflect regional datasets by mapping models or fields to specific locales.
[tool.pydantic_fixturegen.locales]
"app.models.Customer.*" = "de_DE"
"app.models.Customer.email" = "en_GB"
- Patterns accept glob wildcards or regular expressions (use the re: prefix).
- Field matches override broader model patterns, so the example keeps German defaults while forcing .email to the UK locale.
- To blanket a model, you can omit the trailing .* ("app.models.Customer") or use the bare class name ("Customer").
- Combine with pfg gen json --seed 42 to verify deterministic outputs across locales.
- See the configuration reference at docs/configuration.md for deeper details and validation rules.
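A rough model of the precedence rule helps when patterns overlap. The sketch below resolves a locale for one field: field-level patterns are tried first, then model-wide ones (Model.*, the full model path, or the bare class name). resolve_locale, the first-match-wins ordering within each pass, and the en_US fallback are assumptions for illustration; the configuration reference remains the authority on the real matching rules.

```python
import fnmatch
import re


def _matches(pattern: str, target: str) -> bool:
    """Glob match by default; a leading 're:' switches to a full regular-expression match."""
    if pattern.startswith("re:"):
        return re.fullmatch(pattern[3:], target) is not None
    return fnmatch.fnmatchcase(target, pattern)


def _model_wide(pattern: str) -> bool:
    return not pattern.startswith("re:") and pattern.endswith(".*")


def resolve_locale(model_path: str, field: str, mapping: dict[str, str], default: str = "en_US") -> str:
    field_path = f"{model_path}.{field}"
    class_name = model_path.rsplit(".", 1)[-1]
    # Pass 1: field-level rules (anything that is not a trailing ".*" glob), tried on the field path.
    for pattern, locale in mapping.items():
        if not _model_wide(pattern) and _matches(pattern, field_path):
            return locale
    # Pass 2: model-level rules: "Model.*", the full model path, or the bare class name.
    for pattern, locale in mapping.items():
        base = pattern[:-2] if _model_wide(pattern) else pattern
        if _matches(pattern, field_path) or _matches(base, model_path) or _matches(base, class_name):
            return locale
    return default


locales = {"app.models.Customer.*": "de_DE", "app.models.Customer.email": "en_GB"}
print(resolve_locale("app.models.Customer", "email", locales))  # en_GB
print(resolve_locale("app.models.Customer", "name", locales))  # de_DE
```

Splitting the lookup into two passes makes the outcome independent of the order in which the table lists its keys.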
Recipe 8 — Generate NumPy arrays safely
Avoid runaway allocations by capping shapes and dtypes globally.
[tool.pydantic_fixturegen.arrays]
max_ndim = 2
max_side = 4
max_elements = 16
dtypes = ["float32", "int16"]
- Requires the numpy extra (pip install 'pydantic-fixturegen[numpy]').
- Fields annotated as numpy.ndarray or numpy.typing.NDArray[...] automatically use these caps.
- Deterministic seeds flow through SeedManager.numpy_for, so array contents remain stable across runs.
- Combine with field policies when you need to override other behaviours (for example p_none).
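The three caps interact, so it helps to see one possible way of applying them together. The sketch below trims a requested shape to max_ndim, clamps each side to max_side, then shrinks the largest side until the element count fits max_elements, and picks a dtype with a seeded RNG. The trimming order and the helper names cap_shape and plan_array are assumptions; pydantic-fixturegen may sample shapes within the caps rather than clamp a requested one.

```python
import math
import random


def cap_shape(shape: tuple[int, ...], max_ndim: int = 2, max_side: int = 4, max_elements: int = 16) -> tuple[int, ...]:
    """Drop extra dimensions, clamp every side, then shrink the largest side until the element cap holds."""
    capped = [min(side, max_side) for side in shape[:max_ndim]]
    while math.prod(capped) > max_elements:
        largest = capped.index(max(capped))
        capped[largest] -= 1
    return tuple(capped)


def plan_array(seed: int, requested: tuple[int, ...], dtypes: tuple[str, ...] = ("float32", "int16")) -> tuple[tuple[int, ...], str]:
    """Seeded dtype choice: the same seed always yields the same plan."""
    # With NumPy installed, numpy.random.default_rng(seed) would play the same role for array contents.
    rng = random.Random(seed)
    return cap_shape(requested), rng.choice(dtypes)


print(cap_shape((10, 10, 3)))  # (4, 4)
```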
Recipe 9 — Enforce privacy profiles per environment
Use the built-in privacy bundles when you need obviously synthetic datasets in CI but richer identifiers locally.
pfg gen json ./models.py --out snapshots/users.json --profile pii-safe
pfg diff ./models.py --json-out snapshots/users.json --profile realistic
- pii-safe masks emails, URLs, and IPs with reserved example values and makes optional PII fields far more likely to be None.
- realistic keeps optional contact fields populated and re-enables full identifier distributions.
- Set profile = "pii-safe" under [tool.pydantic_fixturegen] for CI defaults, then override locally with PFG_PROFILE=realistic or --profile realistic.
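Reserved example values are documentation-only domains and address blocks that can never reach a real person or host. The sketch below derives them deterministically from a record index, using example.com (reserved by RFC 2606) and the 192.0.2.0/24 TEST-NET-1 block (RFC 5737). The helper names and exact value formats are hypothetical; the pii-safe profile's actual replacements may look different.

```python
import ipaddress

# Reserved for documentation: example.com (RFC 2606) and 192.0.2.0/24, TEST-NET-1 (RFC 5737).
TEST_NET_1 = ipaddress.ip_network("192.0.2.0/24")


def masked_email(index: int) -> str:
    return f"user{index}@example.com"


def masked_url(index: int) -> str:
    return f"https://example.com/users/{index}"


def masked_ipv4(index: int) -> str:
    """Cycle through the 254 usable host addresses 192.0.2.1 to 192.0.2.254."""
    return str(TEST_NET_1[1 + index % 254])


print(masked_email(7), masked_url(7), masked_ipv4(7))  # user7@example.com https://example.com/users/7 192.0.2.8
```

Deriving values from the index keeps records distinguishable in assertions while guaranteeing nothing resolves to a live system.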
More plays
- Automate regeneration with watch mode: pfg gen fixtures ... --watch.
- Diff outputs before committing: pfg diff models.py --fixtures-out tests/fixtures/test_models.py --show-diff.
- Harden imports by running pfg doctor with --fail-on-gaps 0 and reviewing the structured report.