Documentation
Api
Reference docs for pydantic-fixturegen.
Python API reference
Use the same deterministic engine as the CLI without shelling out.
from pathlib import Path
from pydantic_fixturegen.api import (
generate_json,
generate_dataset,
generate_fixtures,
generate_schema,
DatasetGenerationResult,
JsonGenerationResult,
FixturesGenerationResult,
SchemaGenerationResult,
anonymize_payloads,
anonymize_from_rules,
)
All helpers return dataclasses defined in pydantic_fixturegen.api.models. Even though the project name emphasises “pydantic”, the Python API mirrors the CLI and happily drives stdlib @dataclass types and typing.TypedDict definitions alongside BaseModel subclasses—point target (or type_annotation) at any supported model family and overrides/presets behave the same way.
Every result exposes:
paths/path— filesystem locations written atomically.base_output— the resolved template root (OutputTemplateContext).config—ConfigSnapshotcapturing seed, include/exclude, RNG mode, time anchor, etc.warnings— tuple of warning strings (("provider_missing", "relation_warning", ...)).constraint_summary— field-level reports (when applicable).delegated—Truewhen a plugin handled generation (e.g., Polyfactory delegate).
Because these are dataclasses you can dataclasses.asdict(result) for logging or JSON serialization.
Generation helpers
generate_json
result: JsonGenerationResult = generate_json(
"./models.py",
out="artifacts/{model}/sample-{case_index}.json",
count=5,
jsonl=True,
indent=0,
use_orjson=True,
shard_size=1000,
include=["app.models.User"],
exclude=["app.models.Legacy*"],
seed=42,
preset="boundary",
profile="pii-safe",
freeze_seeds=True,
freeze_seeds_file=".pfg-seeds.json",
now="2025-11-08T12:00:00Z",
type_annotation=None,
)
| Parameter | Type | Notes |
|---|---|---|
target |
str/Path/None |
Module file that exports supported models (Pydantic, dataclasses, TypedDicts). Set to None when using type_annotation. |
out |
str/Path |
Templated OutputTemplate path ({model}, {case_index}, {timestamp} etc.). |
count, jsonl, indent, use_orjson, shard_size |
Scalars | Match the CLI flags (--n, --jsonl, --indent, --orjson, --shard-size). |
include / exclude |
sequence | Fully-qualified model globs. |
seed, preset, profile |
scalars | Deterministic knobs identical to the CLI. |
freeze_seeds, freeze_seeds_file |
bool / path | Manage per-model seeds via .pfg-seeds.json. |
now |
ISO timestamp | Overrides the deterministic “current time” anchor. |
type_annotation, type_label |
any | Generate data for arbitrary type expressions (mirrors pfg gen json --type). Cannot be combined with relation helpers or freeze files. |
collection_min_items, collection_max_items, collection_distribution |
scalars | Clamp collection lengths globally for the run (uniform, min-heavy, max-heavy). Defaults match [collections] config. |
Return fields: paths (tuple of written files), base_output, model (when exactly one model was emitted), config, warnings.
generate_dataset
dataset: DatasetGenerationResult = generate_dataset(
"./models.py",
out="warehouse/{model}-{case_index}.parquet",
count=250_000,
format="parquet",
shard_size=25_000,
compression="zstd",
include=["app.models.User"],
seed=7,
preset="boundary-max",
freeze_seeds=True,
preset="boundary-max",
profile="realistic",
respect_validators=True,
validator_max_retries=3,
relations={"app.models.Order.user_id": "app.models.User.id"},
max_depth=4,
cycle_policy="reuse",
rng_mode="portable",
)
| Parameter | Type | Notes |
|---|---|---|
format |
str |
"csv", "parquet", or "arrow". Requires the dataset extra for PyArrow. |
compression |
str/None |
"gzip" for CSV, "snappy", "zstd", "brotli", "lz4" for columnar formats. |
relations |
Mapping[str, str] |
Equivalent to CLI --link. |
respect_validators, validator_max_retries |
bool/int | Enforce model validators with bounded retries. |
max_depth, cycle_policy, rng_mode |
scalars | Mirror CLI recursion + RNG controls. |
collection_min_items, collection_max_items, collection_distribution |
scalars | Same semantics as generate_json; governs list/set/tuple/mapping lengths before schema clamps. |
The result mirrors JsonGenerationResult but includes dataset-specific metadata about shard counts and format.
generate_fixtures
fixtures = generate_fixtures(
"./models.py",
out="tests/fixtures/{model}_fixtures.py",
style="factory",
scope="session",
cases=4,
return_type="dict",
seed=99,
p_none=0.35,
include=["app.models.User"],
preset="boundary",
profile="pii-safe",
freeze_seeds=True,
)
| Parameter | Notes |
|---|---|
style |
"functions", "factory", or "class". Mirrors CLI --style. |
scope |
Pytest scope string ("function", "module", "session"). |
cases |
Number of parametrized cases per fixture. |
return_type |
"model" or "dict". |
p_none |
Overrides optional field probability. |
collection_min_items, collection_max_items, collection_distribution |
Control how many elements collections inside fixtures contain (clamped by schema constraints). |
Return values include path (written module), metadata (banner extras like digest, seed, style), and the resolved style/scope/return_type/cases.
generate_schema
schema = generate_schema(
"./models.py",
out="schema/{model}.json",
indent=2,
include=["app.models.User", "app.models.Address"],
profile="pii-safe",
)
Writes JSON Schema files atomically. include, exclude, and profile behave like the CLI. Useful for embedding schema generation into build scripts or documentation tooling.
Anonymizer helpers
anonymize_payloads
from pathlib import Path
from pydantic_fixturegen.api import anonymize_payloads
sanitized = anonymize_payloads(
payloads=[{"email": "alice@example.com", "name": "Alice"}],
rules_path=Path("anonymize.toml"),
profile="pii-safe",
salt="rotation-2025-11",
entity_field="account.id",
max_required_misses=0,
max_rule_failures=0,
)
| Parameter | Notes |
|---|---|
payloads |
Iterable of dicts or JSON strings. Use anonymize_from_rules when you want the CLI-style file/directory behaviour. |
rules_path |
TOML/YAML/JSON rules file describing matchers + strategies. |
profile |
Optional preset layering (e.g., pii-safe, realistic). |
salt, entity_field |
Control deterministic hashing / pseudonyms. |
max_required_misses, max_rule_failures |
Override privacy budgets. |
Returns a generator of sanitized payloads (same shape as input). Combine with the CLI to reuse reporting/diffing features.
anonymize_from_rules
anonymize_from_rules(
source=Path("data/raw"),
destination=Path("data/sanitized"),
rules_path=Path("anonymize.toml"),
profile="pii-safe",
report_path=Path("reports/anonymize.json"),
doctor_target=Path("models.py"),
)
- Mirrors
pfg anonymizebehaviour: accepts files or directories, mirrors structure underdestination, writes optional JSON reports, and can pipe sanitized output intopfg doctorautomatically. - Use when you want the CLI ergonomics but need to call from Python (e.g., inside a custom build step or notebook).
Tips
- Every helper accepts the same deterministic knobs as the CLI. When you add a new config key (seed, preset, privacy profile), you only need to set it once in
pyproject.tomlorpydantic-fixturegen.toml. - Results are dataclasses; call
asdict(result)orresult.model_dump()(via Pydantic) when you want to serialize metadata to JSON logs. - Combine these helpers with the
loggingreference to emit JSON logs from embedded runs, and withtestingwhen you want to drive pytest snapshots from code instead of shell scripts.