Documentation

Api

Reference docs for pydantic-fixturegen.

main

Python API reference

Use the same deterministic engine as the CLI without shelling out.

from pathlib import Path
from pydantic_fixturegen.api import (
    generate_json,
    generate_dataset,
    generate_fixtures,
    generate_schema,
    DatasetGenerationResult,
    JsonGenerationResult,
    FixturesGenerationResult,
    SchemaGenerationResult,
    anonymize_payloads,
    anonymize_from_rules,
)

All helpers return dataclasses defined in pydantic_fixturegen.api.models. Even though the project name emphasises “pydantic”, the Python API mirrors the CLI and happily drives stdlib @dataclass types and typing.TypedDict definitions alongside BaseModel subclasses—point target (or type_annotation) at any supported model family and overrides/presets behave the same way.

Every result exposes:

  • paths / path — filesystem locations written atomically.
  • base_output — the resolved template root (OutputTemplateContext).
  • configConfigSnapshot capturing seed, include/exclude, RNG mode, time anchor, etc.
  • warnings — tuple of warning strings (("provider_missing", "relation_warning", ...)).
  • constraint_summary — field-level reports (when applicable).
  • delegatedTrue when a plugin handled generation (e.g., Polyfactory delegate).

Because these are dataclasses you can dataclasses.asdict(result) for logging or JSON serialization.

Generation helpers

generate_json

result: JsonGenerationResult = generate_json(
    "./models.py",
    out="artifacts/{model}/sample-{case_index}.json",
    count=5,
    jsonl=True,
    indent=0,
    use_orjson=True,
    shard_size=1000,
    include=["app.models.User"],
    exclude=["app.models.Legacy*"],
    seed=42,
    preset="boundary",
    profile="pii-safe",
    freeze_seeds=True,
    freeze_seeds_file=".pfg-seeds.json",
    now="2025-11-08T12:00:00Z",
    type_annotation=None,
)
Parameter Type Notes
target str/Path/None Module file that exports supported models (Pydantic, dataclasses, TypedDicts). Set to None when using type_annotation.
out str/Path Templated OutputTemplate path ({model}, {case_index}, {timestamp} etc.).
count, jsonl, indent, use_orjson, shard_size Scalars Match the CLI flags (--n, --jsonl, --indent, --orjson, --shard-size).
include / exclude sequence Fully-qualified model globs.
seed, preset, profile scalars Deterministic knobs identical to the CLI.
freeze_seeds, freeze_seeds_file bool / path Manage per-model seeds via .pfg-seeds.json.
now ISO timestamp Overrides the deterministic “current time” anchor.
type_annotation, type_label any Generate data for arbitrary type expressions (mirrors pfg gen json --type). Cannot be combined with relation helpers or freeze files.
collection_min_items, collection_max_items, collection_distribution scalars Clamp collection lengths globally for the run (uniform, min-heavy, max-heavy). Defaults match [collections] config.

Return fields: paths (tuple of written files), base_output, model (when exactly one model was emitted), config, warnings.

generate_dataset

dataset: DatasetGenerationResult = generate_dataset(
    "./models.py",
    out="warehouse/{model}-{case_index}.parquet",
    count=250_000,
    format="parquet",
    shard_size=25_000,
    compression="zstd",
    include=["app.models.User"],
    seed=7,
    preset="boundary-max",
    freeze_seeds=True,
    preset="boundary-max",
    profile="realistic",
    respect_validators=True,
    validator_max_retries=3,
    relations={"app.models.Order.user_id": "app.models.User.id"},
    max_depth=4,
    cycle_policy="reuse",
    rng_mode="portable",
)
Parameter Type Notes
format str "csv", "parquet", or "arrow". Requires the dataset extra for PyArrow.
compression str/None "gzip" for CSV, "snappy", "zstd", "brotli", "lz4" for columnar formats.
relations Mapping[str, str] Equivalent to CLI --link.
respect_validators, validator_max_retries bool/int Enforce model validators with bounded retries.
max_depth, cycle_policy, rng_mode scalars Mirror CLI recursion + RNG controls.
collection_min_items, collection_max_items, collection_distribution scalars Same semantics as generate_json; governs list/set/tuple/mapping lengths before schema clamps.

The result mirrors JsonGenerationResult but includes dataset-specific metadata about shard counts and format.

generate_fixtures

fixtures = generate_fixtures(
    "./models.py",
    out="tests/fixtures/{model}_fixtures.py",
    style="factory",
    scope="session",
    cases=4,
    return_type="dict",
    seed=99,
    p_none=0.35,
    include=["app.models.User"],
    preset="boundary",
    profile="pii-safe",
    freeze_seeds=True,
)
Parameter Notes
style "functions", "factory", or "class". Mirrors CLI --style.
scope Pytest scope string ("function", "module", "session").
cases Number of parametrized cases per fixture.
return_type "model" or "dict".
p_none Overrides optional field probability.
collection_min_items, collection_max_items, collection_distribution Control how many elements collections inside fixtures contain (clamped by schema constraints).

Return values include path (written module), metadata (banner extras like digest, seed, style), and the resolved style/scope/return_type/cases.

generate_schema

schema = generate_schema(
    "./models.py",
    out="schema/{model}.json",
    indent=2,
    include=["app.models.User", "app.models.Address"],
    profile="pii-safe",
)

Writes JSON Schema files atomically. include, exclude, and profile behave like the CLI. Useful for embedding schema generation into build scripts or documentation tooling.

Anonymizer helpers

anonymize_payloads

from pathlib import Path
from pydantic_fixturegen.api import anonymize_payloads
sanitized = anonymize_payloads(
    payloads=[{"email": "alice@example.com", "name": "Alice"}],
    rules_path=Path("anonymize.toml"),
    profile="pii-safe",
    salt="rotation-2025-11",
    entity_field="account.id",
    max_required_misses=0,
    max_rule_failures=0,
)
Parameter Notes
payloads Iterable of dicts or JSON strings. Use anonymize_from_rules when you want the CLI-style file/directory behaviour.
rules_path TOML/YAML/JSON rules file describing matchers + strategies.
profile Optional preset layering (e.g., pii-safe, realistic).
salt, entity_field Control deterministic hashing / pseudonyms.
max_required_misses, max_rule_failures Override privacy budgets.

Returns a generator of sanitized payloads (same shape as input). Combine with the CLI to reuse reporting/diffing features.

anonymize_from_rules

anonymize_from_rules(
    source=Path("data/raw"),
    destination=Path("data/sanitized"),
    rules_path=Path("anonymize.toml"),
    profile="pii-safe",
    report_path=Path("reports/anonymize.json"),
    doctor_target=Path("models.py"),
)
  • Mirrors pfg anonymize behaviour: accepts files or directories, mirrors structure under destination, writes optional JSON reports, and can pipe sanitized output into pfg doctor automatically.
  • Use when you want the CLI ergonomics but need to call from Python (e.g., inside a custom build step or notebook).

Tips

  • Every helper accepts the same deterministic knobs as the CLI. When you add a new config key (seed, preset, privacy profile), you only need to set it once in pyproject.toml or pydantic-fixturegen.toml.
  • Results are dataclasses; call asdict(result) or result.model_dump() (via Pydantic) when you want to serialize metadata to JSON logs.
  • Combine these helpers with the logging reference to emit JSON logs from embedded runs, and with testing when you want to drive pytest snapshots from code instead of shell scripts.
Edit this page