Reference docs for pydantic-fixturegen.
pfg anonymize
Capabilities
pfg anonymize rewrites JSON/JSONL payloads (files or entire directory trees) according to declarative rule files. Rules describe matchers, faker-based replacements, hashing strategies, and required budgets; the CLI enforces determinism via salts and entity fields so repeated runs produce identical sanitized output.
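To make the determinism claim concrete, here is a minimal Python sketch of salted, entity-scoped hashing. It illustrates the general technique only: the helper name pseudonymize, the use of HMAC-SHA256, and the way the entity id is mixed in are all assumptions, not the tool's documented implementation.

```python
import hashlib
import hmac


def pseudonymize(value: str, salt: str, entity_id: str = "") -> str:
    """Return a stable, salted digest for `value` (hypothetical helper)."""
    # Keying the HMAC with the salt means rotating the salt changes every output.
    # Mixing in the entity id scopes the replacement to one logical entity.
    message = f"{entity_id}\x1f{value}".encode("utf-8")
    digest = hmac.new(salt.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return f"sha256:{digest}"


# Same value, salt, and entity: identical output on every run.
first = pseudonymize("taylor@company.com", salt="rotate-2025-11", entity_id="acct-001")
second = pseudonymize("taylor@company.com", salt="rotate-2025-11", entity_id="acct-001")

# A different salt yields a different pseudonym for the same input.
rotated = pseudonymize("taylor@company.com", salt="rotate-2026-01", entity_id="acct-001")
```

Because the output depends only on the value, the salt, and the entity id, two runs over the same data agree, while rotating the salt deliberately breaks linkage with earlier outputs.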
Typical use cases
- Strip PII from production payloads before importing them into local QA/test environments.
- Build redacted fixtures that still satisfy downstream validation because they are generated by the same providers.
- Enforce privacy budgets in CI and emit machine-readable reports for audits.
- Chain directly into pfg doctor after anonymizing to confirm coverage still holds.
Inputs & outputs
- Input argument: file or directory. Directories are walked recursively while preserving structure under the destination.
- Output: either --out/-o or a second positional argument. Required.
- Rules: --rules points to a TOML/YAML/JSON file describing field-level strategies (faker, mask, hash, redact, etc.).
- Report: optional JSON summary via --report capturing counts, anonymization outcomes, and privacy budget stats.
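The recursive walk over a directory input can be pictured as a mapping from each input file to a mirrored path under the destination. The following is a minimal Python sketch under stated assumptions: plan_outputs is a hypothetical helper, and filtering on the .json and .jsonl suffixes is an illustrative choice, not the CLI's documented detection logic.

```python
from pathlib import Path


def plan_outputs(source: Path, destination: Path) -> list[tuple[Path, Path]]:
    """Pair each JSON/JSONL file under `source` with its mirrored output path."""
    # A single file maps straight onto the destination path.
    if source.is_file():
        return [(source, destination)]

    pairs = []
    # rglob walks the tree recursively; sorting keeps the plan reproducible.
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.suffix in {".json", ".jsonl"}:
            # relative_to preserves the sub-directory layout under the destination.
            pairs.append((path, destination / path.relative_to(source)))
    return pairs
```

For example, plan_outputs(Path("data/raw"), Path("data/sanitized")) would pair data/raw/eu/orders.jsonl with data/sanitized/eu/orders.jsonl if such a file existed.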
Flag reference
Determinism & privacy
--profile: apply built-in profiles (for example pii-safe, realistic, edge, adversarial) before loading the rules file.
--salt: override the hash/pseudonym salt. Combine with --entity-field (dotted path like user.id) to derive deterministic replacements per logical entity.
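A dotted path such as user.id or account.id names a nested key in each record. As a rough illustration of how such a path could be resolved, here is a small Python sketch; resolve_dotted is a hypothetical helper, and returning a default for missing keys is an assumption rather than documented CLI behaviour.

```python
from typing import Any


def resolve_dotted(record: dict, dotted_path: str, default: Any = None) -> Any:
    """Follow a dotted path like 'account.id' through nested dicts."""
    current: Any = record
    for key in dotted_path.split("."):
        # Stop early if the path leaves the dict structure or a key is absent.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


record = {"email": "taylor@company.com", "account": {"id": "acct-001"}}
entity = resolve_dotted(record, "account.id")
missing = resolve_dotted(record, "user.id")
```

With the record from the before/after excerpt, entity holds the account id, while a path that does not exist in the record falls back to the default.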
Budget control
--max-required-misses: override the allowed number of rules marked required that did not match.
--max-rule-failures: cap the number of strategy failures before aborting.
Doctor integration
--doctor-target: optional module passed to pfg doctor after anonymization completes.
--doctor-include, --doctor-exclude, --doctor-timeout, --doctor-memory-mb: forwarded to the doctor run so you can scope gap analysis.
Diagnostics
--json-errors: surface structured errors (missing rules file, invalid output, etc.).
Example workflows
Redact a directory and emit a report
pfg anonymize data/raw data/sanitized \
--rules anonymize.toml --profile pii-safe \
--entity-field account.id --salt rotate-2025-11 \
--report reports/anonymize.json
Sample output
Anonymized 12,800 records across 4 file(s).
Wrote anonymization report to reports/anonymize.json
Before/after excerpt (data/raw/users.json → data/sanitized/users.json)
// input
{"email":"taylor@company.com","account":{"id":"acct-001"}}
// output
{"email":"sha256:3e6c...","account":{"id":"acct-001"},"name":"Perry Evans"}
Chain into doctor for coverage checks
pfg anonymize payloads.json payloads.clean.json \
--rules anonymize.yaml \
--doctor-target app/models.py --doctor-include app.models.User
Sample output
Anonymized 1,000 records across 1 file(s).
pfg doctor ./app/models.py --include app.models.User
Model app.models.User coverage OK (20/20 fields)
Report excerpt
{
"records_processed": 1000,
"strategies": {
"faker.email": 1000,
"hash.sha256": 1000
},
"required_rule_misses": 0
}
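A report like the excerpt above can be checked in CI with a few lines of Python. This is a sketch only: it relies solely on the three keys visible in the excerpt, and the helper name required_misses_within and its default threshold of zero are assumptions. In practice the text would be read from the file passed to --report.

```python
import json

# Inline copy of the report excerpt so the sketch runs on its own.
REPORT_TEXT = """
{
  "records_processed": 1000,
  "strategies": {"faker.email": 1000, "hash.sha256": 1000},
  "required_rule_misses": 0
}
"""


def required_misses_within(report: dict, allowed: int = 0) -> bool:
    """Return True when required-rule misses stay within the allowed count."""
    return report.get("required_rule_misses", 0) <= allowed


report = json.loads(REPORT_TEXT)
ok = required_misses_within(report)

# Strategy names sorted by how often they were applied, most frequent first.
by_usage = sorted(report["strategies"].items(), key=lambda item: (-item[1], item[0]))
```

A CI job could exit non-zero when ok is False, which keeps the audit trail and the gate in one place.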
Operational notes
- Input detection is format aware: JSON arrays produce .json, newline-delimited streams stay JSONL, and binary files are ignored with warnings.
- The anonymizer logs aggregate stats (files processed, records rewritten) and, with --report, persists structured telemetry suitable for auditing.
- Rule parsing errors surface as EmitError instances with pointers to the offending rule path and line number when available.
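The distinction between a JSON document and a newline-delimited stream can be illustrated with a simple heuristic. The Python sketch below is an assumption about how such detection might work, not the anonymizer's actual logic, and detect_format is a hypothetical name.

```python
import json


def detect_format(text: str) -> str:
    """Classify text as 'json', 'jsonl', or 'unknown' (illustrative heuristic)."""
    stripped = text.strip()
    if not stripped:
        return "unknown"

    # A payload that parses as one document is treated as plain JSON.
    # Note: a one-line JSONL stream is indistinguishable from JSON here.
    try:
        json.loads(stripped)
        return "json"
    except json.JSONDecodeError:
        pass

    # Otherwise every non-empty line must parse on its own to count as JSONL.
    lines = [line for line in stripped.splitlines() if line.strip()]
    try:
        for line in lines:
            json.loads(line)
    except json.JSONDecodeError:
        return "unknown"
    return "jsonl"
```

Content that fits neither shape is reported as unknown, which a caller could turn into a skip-with-warning path.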