Reference docs for pydantic-fixturegen.

pfg anonymize

Capabilities

pfg anonymize rewrites JSON/JSONL payloads (files or entire directory trees) according to declarative rule files. Rules describe matchers, faker-based replacements, hashing strategies, and required budgets; the CLI enforces determinism via salts and entity fields so repeated runs produce identical sanitized output.

Typical use cases

  • Strip PII from production payloads before importing them into local QA/test environments.
  • Build redacted fixtures that still satisfy downstream validation because they are generated by the same providers.
  • Enforce privacy budgets in CI and emit machine-readable reports for audits.
  • Chain directly into pfg doctor after anonymizing to confirm coverage still holds.

Inputs & outputs

  • Input argument: file or directory. Directories are walked recursively while preserving structure under the destination.
  • Output: required. Pass either --out/-o or a second positional argument.
  • Rules: --rules points to a TOML/YAML/JSON file describing field-level strategies (faker, mask, hash, redact, etc.).
  • Report: optional JSON summary via --report capturing counts, anonymization outcomes, and privacy budget stats.
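
The recursive mirroring described above can be pictured with a short Python sketch. It is an illustration only, not the tool's implementation: the `plan_outputs` helper and the `.json`/`.jsonl` suffix filter are assumptions made for this example.

```python
from pathlib import Path

# Assumption for this sketch: only these suffixes are treated as payloads.
PAYLOAD_SUFFIXES = {".json", ".jsonl"}


def plan_outputs(src: Path, dest: Path) -> dict[Path, Path]:
    """Map each input file to its mirrored location under dest."""
    if src.is_file():
        # A single input file maps straight to the destination path.
        return {src: dest}
    return {
        path: dest / path.relative_to(src)  # keep the relative layout
        for path in sorted(src.rglob("*"))
        if path.is_file() and path.suffix in PAYLOAD_SUFFIXES
    }
```

Under these assumptions, `plan_outputs(Path("data/raw"), Path("data/sanitized"))` would map `data/raw/eu/orders.jsonl` to `data/sanitized/eu/orders.jsonl`.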

Flag reference

Determinism & privacy

  • --profile: apply built-in profiles (for example pii-safe, realistic, edge, adversarial) before loading the rules file.
  • --salt: override the hash/pseudonym salt. Combine with --entity-field (dotted path like user.id) to derive deterministic replacements per logical entity.
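
To see why a salt plus an entity field yields repeatable replacements, consider the following Python sketch. It assumes an HMAC-SHA256 derivation, which may differ from what pfg anonymize does internally; `resolve_path` and `pseudonym` are hypothetical helpers, not part of the CLI.

```python
import hashlib
import hmac
from typing import Any


def resolve_path(record: dict[str, Any], dotted: str) -> Any:
    """Follow a dotted path such as 'account.id' through nested dicts."""
    value: Any = record
    for key in dotted.split("."):
        value = value[key]
    return value


def pseudonym(salt: str, entity: str, field: str, length: int = 12) -> str:
    """Derive a stable token from the salt, the entity key, and the field name.

    Assumption: HMAC-SHA256 keyed by the salt; the real tool may use another scheme.
    """
    message = f"{entity}|{field}".encode("utf-8")
    digest = hmac.new(salt.encode("utf-8"), message, hashlib.sha256).hexdigest()
    return digest[:length]


record = {"email": "taylor@company.com", "account": {"id": "acct-001"}}
entity = str(resolve_path(record, "account.id"))
first = pseudonym("rotate-2025-11", entity, "email")
second = pseudonym("rotate-2025-11", entity, "email")
print(first == second)  # True: same salt, entity, and field give the same token
```

Changing the salt changes the HMAC key, so rotating it yields a fresh set of tokens while each run under a given salt stays reproducible.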

Budget control

  • --max-required-misses: override how many rules marked required are allowed to go unmatched.
  • --max-rule-failures: cap the number of strategy failures before aborting.
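
The two budgets can be thought of as counters with different enforcement points. The sketch below is a guess at the general shape, not the tool's code: `BudgetTracker` and `BudgetExceeded` are hypothetical names, and the choice to abort only once a count exceeds its cap (rather than when it reaches it) is an assumption.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a configured budget is exhausted."""


@dataclass
class BudgetTracker:
    max_required_misses: int = 0
    max_rule_failures: int = 0
    required_misses: int = 0
    rule_failures: int = 0

    def record_failure(self, rule: str) -> None:
        """Count a strategy failure and abort as soon as the cap is exceeded."""
        self.rule_failures += 1
        if self.rule_failures > self.max_rule_failures:
            raise BudgetExceeded(
                f"rule {rule!r} raised failures to {self.rule_failures} "
                f"(cap {self.max_rule_failures})"
            )

    def record_required_miss(self, rule: str) -> None:
        """Count a required rule that matched nothing; checked in finish()."""
        self.required_misses += 1

    def finish(self) -> None:
        """Check the required-miss budget once all input has been processed."""
        if self.required_misses > self.max_required_misses:
            raise BudgetExceeded(
                f"{self.required_misses} required rule(s) unmatched "
                f"(budget {self.max_required_misses})"
            )
```

Required misses are checked at the end in this sketch because a rule can only be declared unmatched after every record has been seen, whereas failures can stop the run early.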

Doctor integration

  • --doctor-target: optional module passed to pfg doctor after anonymization completes.
  • --doctor-include, --doctor-exclude, --doctor-timeout, --doctor-memory-mb: forwarded to the doctor run so you can scope gap analysis.

Diagnostics

  • --json-errors: surface structured errors (missing rules file, invalid output, etc.).

Example workflows

Redact a directory and emit a report

pfg anonymize data/raw data/sanitized \
  --rules anonymize.toml --profile pii-safe \
  --entity-field account.id --salt rotate-2025-11 \
  --report reports/anonymize.json

Mirrors `data/raw` under `data/sanitized`, replaces sensitive fields as defined in `anonymize.toml`, and writes a JSON report with before/after samples and counters.

Sample output

Anonymized 12,800 records across 4 file(s).
Wrote anonymization report to reports/anonymize.json

Before/after excerpt (data/raw/users.json → data/sanitized/users.json)

// input
{"email":"taylor@company.com","account":{"id":"acct-001"}}
// output
{"email":"sha256:3e6c...","account":{"id":"acct-001"},"name":"Perry Evans"}

Chain into doctor for coverage checks

pfg anonymize payloads.json payloads.clean.json \
  --rules anonymize.yaml \
  --doctor-target app/models.py --doctor-include app.models.User

After anonymization completes, `pfg doctor` runs to verify the models referenced by the payloads still have coverage.

Sample output

Anonymized 1,000 records across 1 file(s).
pfg doctor ./app/models.py --include app.models.User
Model app.models.User coverage OK (20/20 fields)

Report excerpt

{
  "records_processed": 1000,
  "strategies": {
    "faker.email": 1000,
    "hash.sha256": 1000
  },
  "required_rule_misses": 0
}
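
A CI job could gate on a report like the one above. The Python sketch below relies only on the keys visible in the excerpt; the full report schema is not shown here, so treat `report_problems` and its checks as a hypothetical starting point.

```python
import json

# Report excerpt copied from the example above.
REPORT_TEXT = """
{
  "records_processed": 1000,
  "strategies": {"faker.email": 1000, "hash.sha256": 1000},
  "required_rule_misses": 0
}
"""


def report_problems(report: dict, max_required_misses: int = 0) -> list[str]:
    """Return human-readable problems found in a parsed report."""
    problems: list[str] = []
    if report.get("records_processed", 0) == 0:
        problems.append("no records were processed")
    misses = report.get("required_rule_misses", 0)
    if misses > max_required_misses:
        problems.append(
            f"{misses} required rule(s) missed, budget is {max_required_misses}"
        )
    return problems


problems = report_problems(json.loads(REPORT_TEXT))
print(problems or "report OK")  # prints: report OK
```

In a pipeline, the same function could read the file written by --report and exit non-zero when the returned list is not empty.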

Operational notes

  • Input detection is format aware: JSON arrays produce .json, newline-delimited streams stay JSONL, and binary files are ignored with warnings.
  • The anonymizer logs aggregate stats (files processed, records rewritten) and, with --report, persists structured telemetry suitable for auditing.
  • Rule parsing errors surface as EmitError instances with pointers to the offending rule path and line number when available.
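
The format-aware behaviour in the first note can be approximated as follows. This Python sketch is a heuristic written for illustration, not the detection logic the tool uses: `load_records` and `dump_records` are hypothetical helpers, and a lone pretty-printed JSON object is out of scope.

```python
import json
from typing import Any


def load_records(text: str) -> tuple[str, list[Any]]:
    """Return ('json', records) for a JSON array, else ('jsonl', records)."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        parsed = None  # not one JSON document, so try line by line
    if isinstance(parsed, list):
        return "json", parsed
    records = [json.loads(line) for line in text.splitlines() if line.strip()]
    return "jsonl", records


def dump_records(fmt: str, records: list[Any]) -> str:
    """Serialize records back in the format they arrived in."""
    if fmt == "json":
        return json.dumps(records, indent=2)
    return "".join(json.dumps(record) + "\n" for record in records)
```

Keeping the detected format alongside the records lets a writer emit arrays as `.json` and line-delimited input as JSONL, matching the behaviour the note describes.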