`pfg anonymize`

Capabilities

pfg anonymize rewrites JSON/JSONL payloads (single files or entire directory trees) according to declarative rule files. Rules describe matchers, faker-based replacements, hashing strategies, and which rules are required (enforced through privacy budgets). The CLI enforces determinism via salts and entity fields, so repeated runs over the same input with the same rules and salt produce identical sanitized output.
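The determinism guarantee can be illustrated with a minimal salted-hash sketch. It assumes an HMAC-SHA256 construction and a hypothetical `salted_hash` helper. pfg's real hashing strategy is configured in the rules file and may work differently. The `sha256:` prefix only mirrors the token format in the sample output on this page.

```python
import hashlib
import hmac


def salted_hash(value: str, salt: str) -> str:
    """Return a stable 'sha256:<hex>' token for value (hypothetical helper).

    HMAC-SHA256 keyed with the salt: the same (salt, value) pair always yields
    the same token, while rotating the salt changes every token.
    """
    digest = hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return "sha256:" + digest.hexdigest()


first = salted_hash("taylor@company.com", salt="rotate-2025-11")
again = salted_hash("taylor@company.com", salt="rotate-2025-11")
rotated = salted_hash("taylor@company.com", salt="rotate-2025-12")
print(first == again, first == rotated)  # True False
```

Keying the hash with the salt, instead of hashing the raw value alone, means tokens can't be matched against a precomputed table of common emails without knowing the salt.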

Typical use cases

Strip PII from production payloads before importing them into local QA/test environments.
Build redacted fixtures that still satisfy downstream validation because they are generated by the same providers.
Enforce privacy budgets in CI and emit machine-readable reports for audits.
Chain directly into pfg doctor after anonymizing to confirm coverage still holds.

Inputs & outputs

Input argument: file or directory. Directories are walked recursively while preserving structure under the destination.
Output: the destination, given either with --out/-o or as a second positional argument. Required.
Rules: --rules points to a TOML/YAML/JSON file describing field-level strategies (faker, mask, hash, redact, etc.).
Report: optional JSON summary via --report capturing counts, anonymization outcomes, and privacy budget stats.
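Mirroring a directory input under the destination can be sketched as follows. It uses a hypothetical `plan_outputs` helper and selects files by `.json`/`.jsonl` extension for brevity. The operational notes describe pfg's own input detection as format-aware, so treat the extension filter as a simplification.

```python
from pathlib import Path

# Simplification: pfg's detection is format-aware; this sketch filters by extension.
JSON_SUFFIXES = {".json", ".jsonl"}


def plan_outputs(src: Path, dest: Path) -> dict[Path, Path]:
    """Map each input file to its output path, mirroring directory structure."""
    if src.is_file():
        # A single input file maps straight to the destination path.
        return {src: dest}
    plan: dict[Path, Path] = {}
    for path in sorted(src.rglob("*")):
        if path.is_file() and path.suffix in JSON_SUFFIXES:
            # data/raw/a/b.json -> data/sanitized/a/b.json
            plan[path] = dest / path.relative_to(src)
    return plan
```

A caller would iterate over `plan_outputs(Path("data/raw"), Path("data/sanitized")).items()` and create `target.parent` with `mkdir(parents=True, exist_ok=True)` before writing each sanitized file.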

Flag reference

Determinism & privacy

--profile: apply built-in profiles (for example pii-safe, realistic, edge, adversarial) before loading the rules file.
--salt: override the hash/pseudonym salt. Combine with --entity-field (dotted path like user.id) to derive deterministic replacements per logical entity.
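The `--salt` plus `--entity-field` combination can be made concrete with the sketch below. It resolves a dotted path such as `account.id`, seeds a private random generator from the salt and that entity ID, and picks a replacement name. Everything here is hypothetical. `NAMES` stands in for a faker provider, and the seeding scheme is an assumption rather than pfg's implementation. It shows only that records sharing an entity ID receive the same replacement.

```python
import hashlib
import random
from typing import Any

# Hypothetical stand-in for a faker name provider.
NAMES = ["Alex Morgan", "Jordan Lee", "Casey Brooks", "Riley Chen"]


def resolve_dotted(record: dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'account.id'; return None if any segment is missing."""
    current: Any = record
    for segment in path.split("."):
        if not isinstance(current, dict) or segment not in current:
            return None
        current = current[segment]
    return current


def entity_rng(salt: str, entity_id: Any) -> random.Random:
    """Seed a private RNG from salt + entity ID so replacements repeat per entity."""
    seed_bytes = hashlib.sha256(f"{salt}\x1f{entity_id}".encode("utf-8")).digest()
    return random.Random(int.from_bytes(seed_bytes[:8], "big"))


def fake_name(record: dict[str, Any], *, salt: str, entity_field: str) -> str:
    # Records missing the entity field all resolve to None and share one seed.
    entity_id = resolve_dotted(record, entity_field)
    return entity_rng(salt, entity_id).choice(NAMES)


a = {"email": "taylor@company.com", "account": {"id": "acct-001"}}
b = {"email": "t.other@company.com", "account": {"id": "acct-001"}}
print(fake_name(a, salt="rotate-2025-11", entity_field="account.id")
      == fake_name(b, salt="rotate-2025-11", entity_field="account.id"))  # True
```

Seeding a per-entity `random.Random` instance, instead of the global generator, keeps each entity's replacements independent of record order.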

Budget control

--max-required-misses: override how many rules marked required are allowed to go unmatched.
--max-rule-failures: cap the number of strategy failures before aborting.
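The two budgets can be pictured as counters checked against their caps. This sketch makes two assumptions: both counters are checked incrementally, and a run aborts once a count exceeds its cap. pfg may instead evaluate required misses only at the end of a run. The `RunBudget` and `BudgetExceeded` names are hypothetical.

```python
from dataclasses import dataclass


class BudgetExceeded(RuntimeError):
    """Raised when a run goes over one of its budgets (hypothetical)."""


@dataclass
class RunBudget:
    """Hypothetical counters mirroring --max-required-misses / --max-rule-failures."""

    max_required_misses: int
    max_rule_failures: int
    required_misses: int = 0
    rule_failures: int = 0

    def required_miss(self, rule: str) -> None:
        """Record a required rule that matched nothing; raise once over budget."""
        self.required_misses += 1
        if self.required_misses > self.max_required_misses:
            raise BudgetExceeded(
                f"required rule {rule!r} unmatched "
                f"({self.required_misses} > {self.max_required_misses})"
            )

    def rule_failure(self, rule: str, error: Exception) -> None:
        """Record a strategy that raised; abort once failures exceed the cap."""
        self.rule_failures += 1
        if self.rule_failures > self.max_rule_failures:
            raise BudgetExceeded(f"rule {rule!r} failed: {error}") from error


budget = RunBudget(max_required_misses=1, max_rule_failures=0)
budget.required_miss("email")  # 1 miss: still within budget
try:
    budget.required_miss("phone")
except BudgetExceeded as exc:
    print(exc)  # required rule 'phone' unmatched (2 > 1)
```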

Doctor integration

--doctor-target: optional module path (for example app/models.py) passed to pfg doctor after anonymization completes.
--doctor-include, --doctor-exclude, --doctor-timeout, --doctor-memory-mb: forwarded to the doctor run so you can scope gap analysis.

Diagnostics

--json-errors: surface structured errors (missing rules file, invalid output, etc.).

Example workflows

Redact a directory and emit a report

```bash
pfg anonymize data/raw data/sanitized \
  --rules anonymize.toml --profile pii-safe \
  --entity-field account.id --salt rotate-2025-11 \
  --report reports/anonymize.json
```

Mirrors `data/raw` under `data/sanitized`, replaces sensitive fields as defined in `anonymize.toml`, and writes a JSON report with before/after samples and counters.

Sample output

```text
Anonymized 12,800 records across 4 file(s).
Wrote anonymization report to reports/anonymize.json
```

Before/after excerpt (data/raw/users.json → data/sanitized/users.json)

```jsonc
// input
{"email":"taylor@company.com","account":{"id":"acct-001"}}
// output
{"email":"sha256:3e6c...","account":{"id":"acct-001"},"name":"Perry Evans"}
```

Chain into doctor for coverage checks

```bash
pfg anonymize payloads.json payloads.clean.json \
  --rules anonymize.yaml \
  --doctor-target app/models.py --doctor-include app.models.User
```

After anonymization completes, `pfg doctor` runs to verify the models referenced by the payloads still have coverage.

Sample output

```text
Anonymized 1,000 records across 1 file(s).
pfg doctor ./app/models.py --include app.models.User
Model app.models.User coverage OK (20/20 fields)
```

Report excerpt

```json
{
  "records_processed": 1000,
  "strategies": {
    "faker.email": 1000,
    "hash.sha256": 1000
  },
  "required_rule_misses": 0
}
```
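A CI job can gate on the machine-readable report. This sketch reads only the keys visible in the report excerpt (`records_processed`, `required_rule_misses`). The full report schema isn't documented here, and `report_problems` is a hypothetical helper.

```python
import json
from pathlib import Path


def report_problems(path: Path, max_required_misses: int = 0) -> list[str]:
    """Return problems found in an anonymize report; an empty list means the gate passes.

    Reads only keys shown in the report excerpt; a KeyError means the schema differs.
    """
    report = json.loads(path.read_text(encoding="utf-8"))
    problems: list[str] = []
    misses = report["required_rule_misses"]
    if misses > max_required_misses:
        problems.append(f"{misses} required rule(s) did not match")
    if report["records_processed"] == 0:
        problems.append("no records were processed")
    return problems
```

In a CI step you might call `report_problems(Path("reports/anonymize.json"))` and fail the job when the returned list is non-empty.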

Operational notes

Input detection is format-aware: JSON arrays are written back as .json, newline-delimited streams stay JSONL, and binary files are skipped with a warning.
The anonymizer logs aggregate stats (files processed, records rewritten) and, with --report, persists structured telemetry suitable for auditing.
Rule parsing errors surface as EmitError instances with pointers to the offending rule path and line number when available.
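The format detection described in the first note can be approximated as follows. This is a sketch under stated assumptions, not pfg's detector:

- Bytes that are not valid UTF-8 are treated as binary.
- Text that parses as one JSON document is treated as JSON, and a lone object becomes a single record.
- Anything else is parsed line by line as JSONL.

`detect_and_parse` is a hypothetical name.

```python
import json
import warnings
from typing import Any


def detect_and_parse(raw: bytes, name: str = "<input>") -> tuple[str, list[Any]]:
    """Classify raw bytes as 'json', 'jsonl' or 'binary' and return parsed records.

    Heuristic sketch only; malformed JSON/JSONL raises json.JSONDecodeError.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: skip it with a warning instead of failing the run.
        warnings.warn(f"skipping binary file {name}")
        return "binary", []
    try:
        # A single JSON document (e.g. a top-level array) parses in one go.
        document = json.loads(text)
    except json.JSONDecodeError:
        # "Extra data" and similar errors: fall back to one record per non-blank line.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
        return "jsonl", records
    return "json", (document if isinstance(document, list) else [document])
```

Under this heuristic a single-line JSONL file is indistinguishable from a one-object JSON file. A real detector would need another signal, such as the file extension, to tell them apart.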
