OpenAI’s Open‑Source Privacy Filter: Local PII Detection Without Server Upload (Apache 2.0)
OpenAI released an Apache‑2.0 licensed 1.5B‑parameter Privacy Filter model that runs entirely locally via Transformers.js and WebGPU, detecting eight categories of personal data without sending any text to a server, while offering fine‑tuning and adjustable precision‑recall trade‑offs.
Model Overview
OpenAI released an open‑source Privacy Filter model under Apache 2.0. The model has 1.5B parameters and uses a sparse Mixture‑of‑Experts architecture that activates only 50M parameters per inference, which makes it practical to run on ordinary hardware. It detects eight predefined categories of personally identifiable information (PII), including names, addresses, phone numbers, email addresses, account identifiers, and secret data, and it supports multilingual input. The model can run fully locally via Transformers.js and WebGPU, so inference stays on the user’s device and no text is transmitted to a server.
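As a rough check on the "ordinary hardware" claim, the sketch below estimates the weight footprint at a few precisions and the share of parameters active per inference. The parameter counts come from the paragraph above; the bits-per-weight table and the function names are illustrative assumptions, and real quantized files carry extra overhead (scales, layers kept at higher precision) that this ignores.

```python
# Back-of-the-envelope sizing only; ignores quantization metadata,
# activations, and runtime buffers.
TOTAL_PARAMS = 1.5e9   # total parameters, from the article
ACTIVE_PARAMS = 50e6   # parameters active per inference, from the article

# Assumed nominal bits per weight for common dtype labels (illustrative).
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "q8": 8, "q4": 4}


def weight_gigabytes(num_params: float, bits: int) -> float:
    """Approximate weight storage in GB (10^9 bytes)."""
    return num_params * bits / 8 / 1e9


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used for a single inference step."""
    return active / total


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_gigabytes(TOTAL_PARAMS, bits):.2f} GB")
    print(f"active share: {active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS):.1%}")
```

Sparse activation mainly reduces compute per inference; the full set of expert weights typically still has to be stored, so the storage estimate uses the total parameter count rather than the active one.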
Key technical specifications:
Context window: 128 k tokens, allowing processing of long documents without chunking.
Runtime controls to trade precision against recall.
Fine‑tuning capability to adapt the eight categories to domain‑specific rules.
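The article does not say what form the runtime precision/recall controls take. A generic way to make that trade-off with any scored detector is a confidence threshold applied after inference, sketched below. It assumes entity dictionaries shaped like the sample output in the Usage Examples section (`entity_group`, `score`, `word`); `filter_by_threshold` and the per-category thresholds are hypothetical helpers, not part of the model's API.

```python
from typing import Dict, List, Optional

Entity = Dict[str, object]


def filter_by_threshold(
    entities: List[Entity],
    default_threshold: float = 0.5,
    per_category: Optional[Dict[str, float]] = None,
) -> List[Entity]:
    """Keep entities whose score meets the threshold for their category.

    Raising a threshold tends to favour precision (fewer false positives);
    lowering it tends to favour recall (fewer misses).
    """
    per_category = per_category or {}
    kept = []
    for ent in entities:
        threshold = per_category.get(str(ent["entity_group"]), default_threshold)
        if float(ent["score"]) >= threshold:
            kept.append(ent)
    return kept


if __name__ == "__main__":
    # Made-up detections, for illustration only.
    detections = [
        {"entity_group": "private_person", "score": 0.98, "word": "Harry Potter"},
        {"entity_group": "private_person", "score": 0.62, "word": "Baker"},
        {"entity_group": "private_email", "score": 0.41, "word": "info"},
    ]
    strict = filter_by_threshold(detections, default_threshold=0.9)
    lenient = filter_by_threshold(detections, default_threshold=0.4)
    print(len(strict), len(lenient))
```

Per-category thresholds let a deployment be strict about one category (for example names, where false positives on common nouns are a stated limitation) while staying lenient about another.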
Usage Examples
Python (Transformers) – quick pipeline call:
# Use pipeline for quick inference
from transformers import pipeline
pipe = pipeline("token-classification", model="openai/privacy-filter")
# Direct model loading (with the token-classification head, so the model emits per-token labels)
from transformers import AutoModelForTokenClassification
model = AutoModelForTokenClassification.from_pretrained("openai/privacy-filter", dtype="auto")
Browser (Transformers.js) – WebGPU execution:
// Install the dependency first (shell command): npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';
// Initialize the pipeline on WebGPU with 4-bit quantized weights
const classifier = await pipeline('token-classification', 'openai/privacy-filter', {
  device: 'webgpu',
  dtype: 'q4',
});
// Example input
const input = 'My name is Harry Potter and my email is [email protected].';
// Merge sub-word tokens into whole entities
const output = await classifier(input, { aggregation_strategy: 'simple' });
console.dir(output, { depth: null });
Sample output for the above input:
[
  {
    entity_group: 'private_person',
    score: 0.9999957978725433,
    word: ' Harry Potter'
  },
  {
    entity_group: 'private_email',
    score: 0.9999990728166368,
    word: ' [email protected]'
  }
]
Adoption Metrics
In the first month after release, the model's Hugging Face repository recorded over 305,000 downloads, 36 fine‑tuned variants, and 38 related application Spaces. Users reported very high detection accuracy for email addresses, while Chinese‑language PII detection was noted as less reliable.
Official Limitations
Only the eight predefined PII types are supported; the category set cannot be changed at runtime, and adding new categories requires fine‑tuning.
Performance degrades on non‑English or non‑Latin scripts, with higher miss rates for low‑frequency names and identifier formats.
False positives may occur, such as labeling common nouns or public entities as private data, and some domain‑specific identifiers may be missed.
In high‑sensitivity scenarios, human review remains necessary; the model should be treated as an assistive tool rather than a fully automated compliance solution.
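One way to keep a human in the loop is to redact only high-confidence detections automatically and queue the rest for review. The sketch below assumes each entity carries `start` and `end` character offsets, as the Python Transformers token-classification pipeline returns when an aggregation strategy is set; whether a given runtime exposes them should be verified. The function name, the placeholder format, and the 0.95 cut-off are hypothetical choices.

```python
from typing import Dict, List, Tuple

Entity = Dict[str, object]


def redact_with_review(
    text: str,
    entities: List[Entity],
    auto_threshold: float = 0.95,
) -> Tuple[str, List[Entity]]:
    """Mask high-confidence spans; return the rest for human review.

    Assumes each entity has 'entity_group', 'score', 'start', 'end'
    (character offsets into `text`) and that spans do not overlap.
    """
    auto = [e for e in entities if float(e["score"]) >= auto_threshold]
    review = [e for e in entities if float(e["score"]) < auto_threshold]

    # Replace from the end of the string so earlier offsets stay valid.
    redacted = text
    for ent in sorted(auto, key=lambda e: int(e["start"]), reverse=True):
        start, end = int(ent["start"]), int(ent["end"])
        label = f"[{str(ent['entity_group']).upper()}]"
        redacted = redacted[:start] + label + redacted[end:]
    return redacted, review


if __name__ == "__main__":
    # Made-up text and detections, for illustration only.
    text = "Call Alice Smith at the Baker office."
    name_start = text.index("Alice Smith")
    baker_start = text.index("Baker")
    entities = [
        {"entity_group": "private_person", "score": 0.99,
         "start": name_start, "end": name_start + len("Alice Smith")},
        {"entity_group": "private_person", "score": 0.60,
         "start": baker_start, "end": baker_start + len("Baker")},
    ]
    redacted, needs_review = redact_with_review(text, entities)
    print(redacted)
    print(len(needs_review), "span(s) queued for review")
```

The low-confidence spans are left untouched in the text here; a stricter deployment might mask them provisionally and let the reviewer unmask false positives instead.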
Model repository: https://huggingface.co/openai/privacy-filter
Demo space (WebGPU): https://huggingface.co/spaces/webml-community/privacy-filter-webgpu
