did - PII Removal

did pseudonymizes personally identifiable information (PII) in text documents so they can be passed to LLMs more safely. It replaces names with placeholders or fake names and matches variants of the same person (e.g., John Doe and John D.).

The result is a parametric document: an anonymized file with consistent entity placeholders (e.g., [PER1], [ORG1]). The parameters can be swapped after the fact for bias testing, multilingual prompts, or jurisdiction-specific fake names, which enables reproducible LLM analysis without retraining.
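To make the idea of swapping parameters concrete, here is a minimal sketch of filling one anonymized template with two different parameter sets. It is not did's own code: the function fill, the regular expression, and both name tables are assumptions made up for illustration.

```python
import re

# Hypothetical parameter sets; every name below is invented for illustration.
PARAMS_EN = {"PER1": "Alex Lee", "PER2": "Pat Kim", "ORG1": "Acme Ltd"}
PARAMS_DA = {"PER1": "Anders Holm", "PER2": "Mette Lund", "ORG1": "Nordlys A/S"}

# Assumed placeholder shape: uppercase label plus index, e.g. [PER1] or [ORG1].
PLACEHOLDER = re.compile(r"\[([A-Z]+\d+)\]")


def fill(template: str, params: dict[str, str]) -> str:
    """Substitute each placeholder; leave placeholders without a value untouched."""
    return PLACEHOLDER.sub(lambda m: params.get(m.group(1), m.group(0)), template)


template = "[PER1] sued [PER2] on behalf of [ORG1]."
print(fill(template, PARAMS_EN))
print(fill(template, PARAMS_DA))
```

Because the template itself never changes, the same prompt can be rerun with different name sets and the LLM outputs compared directly.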

Why did?

Privacy: Local processing, minimal leakage.
Context: Tracks entities across variants.
Bias control: Gender/ethnicity swaps for fairness.
Legal compliance: De-ID for sharing/analysis.

Features

NER-based PII detection.
Configurable replacement (fake names, placeholders).
Person clustering for consistency.
CLI for batch/files.
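Person clustering can be pictured with a deliberately naive sketch. The heuristic below (tokens must align pairwise, and an initial matches any token starting with that letter) is an assumption chosen for illustration, not the matching rule did actually uses; same_person and cluster are hypothetical names.

```python
def _norm(token: str) -> str:
    """Lowercase a name token and drop trailing punctuation such as the dot in 'D.'."""
    return token.strip(".,").lower()


def _compatible(x: str, y: str) -> bool:
    """Tokens agree if they are equal or one is a single-letter initial of the other."""
    return x == y or (len(x) == 1 and y.startswith(x)) or (len(y) == 1 and x.startswith(y))


def same_person(a: str, b: str) -> bool:
    """Naive variant test: same token count, compatible tokens, one full-token match."""
    ta = [_norm(t) for t in a.split()]
    tb = [_norm(t) for t in b.split()]
    if not ta or len(ta) != len(tb):
        return False
    pairs = list(zip(ta, tb))
    # Require at least one fully spelled-out match so bare initials do not match everyone.
    has_full_match = any(x == y and len(x) > 1 for x, y in pairs)
    return has_full_match and all(_compatible(x, y) for x, y in pairs)


def cluster(names: list[str]) -> list[list[str]]:
    """Greedily attach each mention to the first cluster whose first member matches."""
    clusters: list[list[str]] = []
    for name in names:
        for group in clusters:
            if same_person(name, group[0]):
                group.append(name)
                break
        else:
            clusters.append([name])
    return clusters


mentions = ["John Doe", "Jane Smith", "John D.", "J. Smith"]
print(cluster(mentions))
```

A greedy pass like this is order-dependent and would wrongly merge, say, "J. Smith" with whichever Smith appears first, so a real implementation needs more context than string shape alone.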

Install

uv pip install git+https://github.com/evidlabel/did.git
did -h

Usage

did input.txt --output output.txt --mode fake --swap-gender

Options:

--mode placeholder|fake: Replace strategy.
--swap-gender|ethnicity: Swap gender or ethnicity in the replacements to control for bias.
--cluster: Merge person variants.
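For a folder of files, one option is a small wrapper that calls the CLI once per file. The sketch below uses only the flags shown above (--output and --mode) and assumes did is on the PATH. The helpers did_command and run_batch, the *.txt pattern, and the directory layout are hypothetical, and the dry-run default means nothing is executed unless requested.

```python
import subprocess
from pathlib import Path


def did_command(src: Path, out_dir: Path, mode: str = "placeholder") -> list[str]:
    """Build the argument list for one file, using only flags documented above."""
    if mode not in {"placeholder", "fake"}:
        raise ValueError(f"unknown mode: {mode}")
    return ["did", str(src), "--output", str(out_dir / src.name), "--mode", mode]


def run_batch(in_dir: Path, out_dir: Path, mode: str = "placeholder",
              dry_run: bool = True) -> list[list[str]]:
    """Collect one command per .txt file; execute them only when dry_run is False."""
    commands = [did_command(p, out_dir, mode) for p in sorted(in_dir.glob("*.txt"))]
    if not dry_run:
        out_dir.mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True stops the batch at the first file that fails.
            subprocess.run(cmd, check=True)
    return commands


print(did_command(Path("input.txt"), Path("out"), mode="fake"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.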

Example

Input: “John Doe (JD) sued Jane Smith.”

Output: “[PER1] ([PER1]) sued [PER2].” (or fakes: “Alex Lee sued Pat Kim”)
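The placeholder output above can be reproduced with a short sketch once the mentions are known. Here the clusters are written by hand, whereas did would derive them from NER and person clustering; pseudonymize and the cluster table are illustrative assumptions, not the tool's API.

```python
import re


def pseudonymize(text: str, clusters: dict[str, list[str]]) -> str:
    """Replace every listed mention with the placeholder of its cluster."""
    # Map each surface form to its placeholder, e.g. "JD" -> "[PER1]".
    lookup = {m: f"[{label}]" for label, mentions in clusters.items() for m in mentions}
    if not lookup:
        return text
    # Try longer mentions first so a full name wins over a shorter overlapping form.
    alternatives = sorted(lookup, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, alternatives)) + r")\b")
    return pattern.sub(lambda m: lookup[m.group(0)], text)


# Hand-written clusters standing in for detected entities.
clusters = {"PER1": ["John Doe", "JD"], "PER2": ["Jane Smith"]}
print(pseudonymize("John Doe (JD) sued Jane Smith.", clusters))
# prints: [PER1] ([PER1]) sued [PER2].
```

Both "John Doe" and "JD" map to the same placeholder, which is what keeps the entity consistent across the document.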

GitHub

https://github.com/evidlabel/did
