did - PII Removal

did pseudonymizes personally identifiable information (PII) in text documents so they can be used safely with LLMs. It replaces names with placeholders or fake names and matches variants of the same person (e.g., John Doe = John D.).

This produces parametric documents: anonymized files with consistent entity placeholders (e.g., [PER1], [ORG1]). The parameters can be swapped post hoc for bias testing, multilingual prompts, or jurisdiction-specific fake names. This makes LLM analysis reproducible without retraining.
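As a rough illustration of what "swapping parameters" can look like, here is a minimal sketch that fills [PER1]-style placeholders from a mapping. The function name `fill_placeholders` and the mapping format are assumptions made for this example; they are not part of did's documented API.

```python
import re


def fill_placeholders(text: str, params: dict[str, str]) -> str:
    """Substitute [PER1]-style placeholders with values from params.

    Hypothetical helper: placeholders missing from params are left as-is.
    """
    def repl(match: re.Match) -> str:
        key = match.group(1)
        return params.get(key, match.group(0))

    # A placeholder is an uppercase tag followed by digits, in brackets.
    return re.sub(r"\[([A-Z]+\d+)\]", repl, text)


template = "[PER1] ([PER1]) sued [PER2]."
print(fill_placeholders(template, {"PER1": "Alex Lee", "PER2": "Pat Kim"}))
# Alex Lee (Alex Lee) sued Pat Kim.
```

The same template can be filled again with a different mapping (for example, names of another gender or locale) without touching the original document.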

Why did?

  • Privacy: Local processing, minimal leakage.
  • Context: Tracks entities across variants.
  • Bias control: Gender/ethnicity swaps for fairness.
  • Legal compliance: De-ID for sharing/analysis.

Features

  • NER-based PII detection.
  • Configurable replacement (fake names, placeholders).
  • Person clustering for consistency.
  • CLI for single files and batch runs.
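To make the person-clustering idea concrete, here is a small heuristic sketch that groups mentions such as "John Doe", "John D." and "JD" under one id. It is an illustration only: `is_variant` and `cluster_mentions` are hypothetical names, and did's actual matching logic may work differently.

```python
def is_variant(mention: str, full_name: str) -> bool:
    """Heuristic: is mention an abbreviated form of full_name?"""
    full = full_name.split()
    toks = mention.replace(".", "").split()
    # Pure initials such as "JD" for "John Doe".
    if len(toks) == 1 and len(full) > 1 and toks[0].isupper() and len(toks[0]) == len(full):
        return all(c == f[0].upper() for c, f in zip(toks[0], full))
    if len(toks) != len(full):
        return False
    # Each token must equal the full token or be its initial.
    return all(
        t.lower() == f.lower() or (len(t) == 1 and f.lower().startswith(t.lower()))
        for t, f in zip(toks, full)
    )


def cluster_mentions(mentions: list[str]) -> dict[str, int]:
    """Map each mention to a cluster id; longer mentions become canonical first."""
    canon: list[str] = []
    ids: dict[str, int] = {}
    for m in sorted(dict.fromkeys(mentions), key=len, reverse=True):
        for i, c in enumerate(canon):
            if is_variant(m, c):
                ids[m] = i + 1
                break
        else:
            canon.append(m)
            ids[m] = len(canon)
    return ids


ids = cluster_mentions(["John Doe", "JD", "Jane Smith", "John D."])
print(ids["John Doe"] == ids["JD"] == ids["John D."])  # True
```

Sorting by length first means the fullest form of a name is seen before its abbreviations, so the short forms attach to it instead of starting their own clusters.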

Install

uv pip install git+https://github.com/evidlabel/did.git
did -h

Usage

did input.txt --output output.txt --mode fake --swap-gender

Options:

  • --mode placeholder|fake: Replacement strategy.
  • --swap-gender, --swap-ethnicity: Swap gender or ethnicity to neutralize bias.
  • --cluster: Merge variants of the same person into one entity.
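For batch runs, one option is to drive the CLI from a short script. The sketch below assumes that `did` is on the PATH and accepts the `--output` and `--mode` flags shown in the usage line above; the helper names `build_command` and `run_batch` and the directory layout are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(src: Path, dst: Path, mode: str = "placeholder") -> list[str]:
    """Build the did invocation for one file (flags taken from the usage line)."""
    if mode not in {"placeholder", "fake"}:
        raise ValueError(f"unknown mode: {mode}")
    return ["did", str(src), "--output", str(dst), "--mode", mode]


def run_batch(in_dir: Path, out_dir: Path, mode: str = "placeholder") -> None:
    """Run did once per .txt file in in_dir, writing results to out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(in_dir.glob("*.txt")):
        # check=True stops the batch on the first failing file.
        subprocess.run(build_command(src, out_dir / src.name, mode), check=True)
```

Calling `run_batch(Path("raw"), Path("anon"), mode="fake")` would then process every text file in a `raw` directory.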

Example

Input: “John Doe (JD) sued Jane Smith.”

Output: “[PER1] ([PER1]) sued [PER2].” (or fakes: “Alex Lee sued Pat Kim”)
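The placeholder output above can be reproduced with a minimal sketch of the replacement step, assuming the mentions and their cluster labels are already known. In did they would come from NER and clustering; here they are hard-coded, and `pseudonymize` is a hypothetical name rather than the tool's API.

```python
import re


def pseudonymize(text: str, clusters: dict[str, str]) -> str:
    """Replace each known mention with its cluster's placeholder."""
    # Longest mentions first, so a full name wins over a shorter overlapping one.
    pattern = "|".join(re.escape(m) for m in sorted(clusters, key=len, reverse=True))
    return re.sub(
        rf"\b(?:{pattern})\b",
        lambda m: f"[{clusters[m.group(0)]}]",
        text,
    )


clusters = {"John Doe": "PER1", "JD": "PER1", "Jane Smith": "PER2"}
print(pseudonymize("John Doe (JD) sued Jane Smith.", clusters))
# [PER1] ([PER1]) sued [PER2].
```

The word-boundary anchors keep "JD" from matching inside longer tokens; mentions that end in punctuation, such as "John D.", would need a looser boundary rule.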

GitHub: https://github.com/evidlabel/did