Reference docs for pydantic-fixturegen.

Quickstart: deterministic data in minutes

Install the CLI, generate JSON/datasets/fixtures, diff them, and lock in deterministic snapshots.

Need ready-to-run snippets? Check the new examples page for a shared model module plus CLI/Python variants of every command.

0. Before you begin

python -m pip install --upgrade pip
pip install "pydantic-fixturegen"
pfg --version
  • Add extras that match your stack (openapi for schema ingestion, fastapi for smoke/mock commands, dataset for CSV/Parquet/Arrow, polyfactory if you already have ModelFactory classes, seed for SQLModel + Beanie). E.g., pip install "pydantic-fixturegen[openapi,fastapi,dataset]".
  • Run pfg --help once to ensure the entry point is on your PATH. All CLI examples assume a POSIX shell; on Windows, replace the trailing \ line continuation with ^ in cmd.exe (or a backtick in PowerShell) when you wrap long commands.

1. Scaffold a model module

Create models.py anywhere inside your repo:

from pydantic import BaseModel, Field


class Address(BaseModel):
    street: str
    city: str


class User(BaseModel):
    id: int
    name: str
    nickname: str | None = None
    address: Address
    # Pydantic v2 uses `pattern`; the v1 `regex` keyword was removed.
    email: str = Field(pattern=r".+@example\.org$")

Inspect it with pfg list models.py --include models.User to confirm discovery works. Need a config file? pfg init --seed 42 --json-indent 2 writes TOML/YAML templates so you can persist defaults (seed, presets, emitters).
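
To get a feel for which values the email constraint on User accepts, the pattern can be tried with Python's re module. This is only an approximation of the field check: it assumes the pattern is applied as a search over the whole string, it escapes the dot before "org" so that only a literal "." matches, and the sample addresses are made up.

```python
import re

# Same pattern as the User.email field, with the dot before "org" escaped
# so that it only matches a literal ".".
EMAIL_PATTERN = re.compile(r".+@example\.org$")


def satisfies_email_constraint(value: str) -> bool:
    """Return True if value would pass this pattern check."""
    return EMAIL_PATTERN.search(value) is not None


# Made-up sample values for illustration.
for candidate in ["ada@example.org", "ada@exampleXorg", "@example.org"]:
    print(candidate, satisfies_email_constraint(candidate))
```

Only the first candidate passes: the second has no literal dot, and the third has nothing in front of the @ sign.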

2. Generate JSON and datasets

JSON / JSONL / sharded payloads

pfg gen json ./models.py \
  --include models.User \
  --out artifacts/{model}/sample-{case_index}.json \
  --n 5 \
  --indent 2 \
  --seed 42 \
  --freeze-seeds \
  --preset boundary \
  --watch
  • --include narrows generation when the module exports multiple models.
  • --out accepts placeholders ({model}, {case_index}, {timestamp}) so you can route artifacts per model/shard.
  • Use --jsonl for newline-delimited output or --shard-size to split runs without buffering everything in memory.
  • --seed and --freeze-seeds keep runs deterministic even after adding/removing models; the header banner records seed, version, and model-digest.
  • --preset boundary applies opinionated constraints (higher optional None frequency, min/max numeric bias). Add --respect-validators if your models enforce @field_validator rules.
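
The placeholder routing described above can be pictured with plain string formatting. The sketch below is not fixturegen's implementation: it assumes placeholders are filled like str.format fields, and both the render_out_path helper and the timestamp layout are hypothetical.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def render_out_path(
    template: str, model: str, case_index: int, now: datetime
) -> PurePosixPath:
    """Fill {model}, {case_index} and {timestamp} in an --out style template."""
    return PurePosixPath(
        template.format(
            model=model,
            case_index=case_index,
            # Hypothetical timestamp layout; the real tool may format it differently.
            timestamp=now.strftime("%Y%m%dT%H%M%S"),
        )
    )


now = datetime(2024, 1, 1, tzinfo=timezone.utc)
paths = [
    render_out_path("artifacts/{model}/sample-{case_index}.json", "User", i, now)
    for i in range(3)
]
for path in paths:
    print(path)
```

Each case index lands in its own file under a per-model directory, which is the effect the template in the command above is after.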

CSV / Parquet / Arrow datasets

pfg gen dataset ./models.py \
  --include models.User \
  --format parquet \
  --compression zstd \
  --n 100000 \
  --shard-size 25000 \
  --out warehouse/users.parquet
  • Install the dataset extra to pull in PyArrow.
  • --format csv|parquet|arrow controls the sink; CSVs stream row-by-row (add --compression gzip for .csv.gz), while columnar formats flush in batches.
  • Shards/multiple files use the same templating rules as JSON emission, and every dataset includes a cycles column if recursion policies (max depth, cycle policy) fire.
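
As a rough picture of what --n 100000 combined with --shard-size 25000 implies, the sketch below batches a lazy row stream into fixed-size shards so that only one shard is held in memory at a time. The iter_shards helper is hypothetical and says nothing about how fixturegen names or writes its shard files.

```python
from itertools import islice
from typing import Iterable, Iterator


def iter_shards(rows: Iterable[dict], shard_size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most shard_size rows."""
    if shard_size <= 0:
        raise ValueError("shard_size must be positive")
    iterator = iter(rows)
    while True:
        shard = list(islice(iterator, shard_size))
        if not shard:
            return
        yield shard


# A generator stands in for the record stream; nothing is materialised up front.
rows = ({"id": i} for i in range(100_000))
shard_lengths = [len(shard) for shard in iter_shards(rows, 25_000)]
print(shard_lengths)
```

With these numbers the stream splits into four shards of 25,000 rows each; a row count that is not a multiple of the shard size would simply leave a shorter final shard.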

3. Emit pytest fixtures or seeds

pfg gen fixtures ./models.py \
  --include models.User \
  --out tests/fixtures/test_users.py \
  --style functions \
  --scope module \
  --cases 3 \
  --return-type model
  • Swap in --style factory or --style class to match the test style you prefer.
  • --return-type dict keeps fixtures JSON-serialisable.
  • The generated module includes a banner with seed + digest so diffs are easy to audit.

Need a populated database for integration tests?

pfg gen seed sqlmodel ./models.py \
  --database sqlite:///seed.db \
  --include models.User \
  --n 25 \
  --create-schema \
  --truncate \
  --rollback

For MongoDB stacks, replace sqlmodel with beanie and install the seed extra. Both commands honour the same determinism knobs (--seed, --preset, --link, --with-related, --respect-validators, --max-depth, --on-cycle, --rng-mode).
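
One way to see why a fixed seed can stay stable when models are added or removed (the goal of --seed and --freeze-seeds in step 2) is to give each model its own random stream, derived from the base seed and the model's name. This is an illustrative sketch, not fixturegen's algorithm; derive_seed and fake_ids are hypothetical helpers.

```python
import hashlib
import random


def derive_seed(base_seed: int, model_name: str) -> int:
    """Derive a stable per-model seed from the base seed and model name."""
    digest = hashlib.sha256(f"{base_seed}:{model_name}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def fake_ids(base_seed: int, model_name: str, count: int) -> list[int]:
    """Generate count pseudo-random ids from the model's own stream."""
    rng = random.Random(derive_seed(base_seed, model_name))
    return [rng.randint(1, 10_000) for _ in range(count)]


# Generating Address in between (or not at all) does not disturb User's values.
users_alone = fake_ids(42, "models.User", 3)
_ = fake_ids(42, "models.Address", 3)
users_after_address = fake_ids(42, "models.User", 3)
print(users_alone == users_after_address)
```

Because no stream is shared between models, the order and number of models processed cannot shift the values any single model receives.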

4. Diff & snapshot artifacts

pfg diff ./models.py \
  --json-out artifacts/users.json \
  --fixtures-out tests/fixtures/test_users.py \
  --schema-out schema \
  --show-diff \
  --seed 42 \
  --freeze-seeds

pfg diff reruns generation in memory and compares the result against the existing files. Combine it with pfg check to validate configs without writing new files.
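
Conceptually, the diff step regenerates content in memory and compares it with what is already on disk. Here is a minimal sketch of that comparison using difflib, assuming text artifacts. The diff_against_file helper and the sample payloads are hypothetical, and a temporary file stands in for artifacts/users.json.

```python
import difflib
import json
import tempfile
from pathlib import Path


def diff_against_file(path: Path, regenerated: str) -> list[str]:
    """Return unified-diff lines between the file on disk and fresh output."""
    existing = path.read_text(encoding="utf-8")
    return list(
        difflib.unified_diff(
            existing.splitlines(),
            regenerated.splitlines(),
            fromfile=f"{path} (on disk)",
            tofile=f"{path} (regenerated)",
            lineterm="",
        )
    )


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "users.json"
    artifact.write_text(json.dumps({"id": 1, "name": "Ada"}, indent=2), encoding="utf-8")

    # Same content regenerated: the diff is empty.
    unchanged = diff_against_file(artifact, json.dumps({"id": 1, "name": "Ada"}, indent=2))
    # A changed value shows up as removed and added lines.
    drifted = diff_against_file(artifact, json.dumps({"id": 2, "name": "Ada"}, indent=2))

print(len(unchanged))
print("\n".join(drifted))
```

An empty diff means the deterministic rerun reproduced the stored artifact; any output signals that the models, the seed, or the generator changed.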

Lock in deterministic reviews with snapshots:

pfg snapshot verify ./models.py \
  --json-out artifacts/users.json \
  --fixtures-out tests/fixtures/test_users.py \
  --seed 42
pfg snapshot write ./models.py \
  --json-out artifacts/users.json \
  --fixtures-out tests/fixtures/test_users.py \
  --seed 42

Pair snapshot commands with the pytest plugin to let pfg_snapshot assertions update on demand (pytest --pfg-update-snapshots=update) while CI runs pfg snapshot verify to block regressions.
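
The split between verifying and writing can be sketched as two small functions: one reports a mismatch when regenerated output differs from the stored snapshot, the other overwrites the snapshot. This only illustrates the workflow; verify_snapshot, write_snapshot, and the choice of SHA-256 are hypothetical and are not the pfg or pytest plugin API.

```python
import hashlib
import tempfile
from pathlib import Path


def digest(text: str) -> str:
    """Hash content so snapshots can be compared by digest."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def write_snapshot(path: Path, content: str) -> None:
    """Overwrite the stored snapshot (the 'update' path)."""
    path.write_text(content, encoding="utf-8")


def verify_snapshot(path: Path, content: str) -> bool:
    """Return True only if a snapshot exists and matches the fresh content."""
    if not path.exists():
        return False
    return digest(path.read_text(encoding="utf-8")) == digest(content)


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "users.json"
    write_snapshot(snapshot, '{"id": 1}')
    matches = verify_snapshot(snapshot, '{"id": 1}')
    drift = verify_snapshot(snapshot, '{"id": 2}')
    # Accepting the change means rewriting the snapshot, after which verify passes.
    write_snapshot(snapshot, '{"id": 2}')
    after_update = verify_snapshot(snapshot, '{"id": 2}')

print(matches, drift, after_update)
```

In this picture CI only ever calls the verify path, while the write path is reserved for a developer who has reviewed the change.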

Finally, capture a coverage manifest and enforce it in CI:

pfg lock ./models.py --lockfile .pfg-lock.json
pfg verify ./models.py --lockfile .pfg-lock.json

5. Layer on advanced workflows

Watch & explain

  • Add --watch --watch-debounce 0.5 to any generation command for live regeneration.
  • Use pfg gen explain ./models.py --tree --include models.User to visualise heuristic/provider choices when debugging deterministic output.
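
Debouncing means that a burst of file-change events inside the window triggers a single regeneration. Below is a minimal sketch of that idea, with timestamps passed in explicitly so it runs without a file watcher. The debounce function is hypothetical and is not how pfg implements --watch-debounce.

```python
def debounce(event_times: list[float], window: float) -> list[float]:
    """Return the times at which a trailing-edge debounce would fire.

    A run fires once `window` seconds pass with no new event, so events
    closer together than `window` collapse into a single run.
    """
    fire_times: list[float] = []
    last_event: float | None = None
    for t in sorted(event_times):
        if last_event is not None and t - last_event >= window:
            # The previous burst went quiet for a full window before this event.
            fire_times.append(last_event + window)
        last_event = t
    if last_event is not None:
        # The final burst fires once its own quiet window elapses.
        fire_times.append(last_event + window)
    return fire_times


# Three quick saves, then one later save: two regenerations in total.
print(debounce([0.0, 0.25, 0.5, 2.0], window=0.5))
```

A larger window trades responsiveness for fewer redundant runs when an editor writes several files in quick succession.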

Schema ingestion & OpenAPI examples

pfg gen json --schema contracts/user.schema.json --out artifacts/{model}.json
pfg gen openapi api.yaml --route "GET /users" --out openapi/{model}.json
pfg gen examples api.yaml --out api.examples.yaml

The openapi extra pulls in datamodel-code-generator and PyYAML so schema ingestion, OpenAPI fan-out, and example injection require no manual scaffolding.

FastAPI smoke tests & mock servers

pfg fastapi smoke app.main:app --out tests/test_fastapi_smoke.py
pfg fastapi serve app.main:app --port 8050 --seed 7

Dependency overrides (--dependency-override original=stub) let you bypass auth/session providers, and both commands reuse your deterministic settings so contract drift shows up as diff noise immediately.

Polyfactory interoperability

# auto-detect ModelFactory subclasses when the polyfactory extra is installed
pfg gen json ./models.py --include models.User --seed 15
# export wrapper factories that call fixturegen under the hood
pfg gen polyfactory ./models.py --out tests/factories_pfg.py --seed 15

Set [polyfactory] prefer_delegation = true in your config to let fixturegen defer to existing factories while still controlling seeds, presets, and relation wiring.

Datasets + anonymizer + FastAPI?

See Emitters for CSV/Parquet/Arrow details, anonymize for rule syntax, and features for FastAPI, Hypothesis, and seeding highlights.

Next steps

Because every command shares the same deterministic engine, once you lock in seeds and presets, any new workflow (snapshots, anonymizer, FastAPI smoke tests, Polyfactory exports) is only a copy-paste away.
