OpenAI’s Open‑Source Privacy Filter: Local PII Detection Without Server Upload (Apache 2.0)

OpenAI released Privacy Filter, an Apache‑2.0 licensed 1.5B‑parameter model that can run entirely locally, for example in the browser via Transformers.js and WebGPU. It detects eight categories of personal data without sending any text to a server, and it supports fine‑tuning and adjustable precision‑recall trade‑offs.

AI Engineering

Model Overview

OpenAI released Privacy Filter, an open‑source model under the Apache 2.0 license. It has 1.5 B parameters in a sparse Mixture‑of‑Experts architecture, of which only 50 M are active during inference, so it can run on ordinary hardware. It detects eight predefined categories of personally identifiable information (PII), including names, addresses, phone numbers, email addresses, account identifiers, and secret data, and it supports multilingual input. The model can run fully locally, for example in the browser via Transformers.js and WebGPU, so inference stays on the user’s device and no text is transmitted to a server.
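As a rough back-of-the-envelope check on the "ordinary hardware" claim, the sketch below computes the share of parameters that is active and a weight-only memory estimate. The 1.5 B and 50 M figures come from the paragraph above; the bytes-per-parameter values are generic assumptions for common precisions (quantization overhead such as scale factors is ignored), and `weight_memory_gb` is an illustrative helper, not part of any library.

```python
# Figures taken from the article text.
TOTAL_PARAMS = 1.5e9    # all experts combined
ACTIVE_PARAMS = 50e6    # parameters used during inference

# Assumed storage cost per parameter for common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "q8": 1.0, "q4": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Estimate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Share of the network that does work during inference.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"active fraction: {active_fraction:.1%}")
for precision in BYTES_PER_PARAM:
    print(f"{precision}: ~{weight_memory_gb(TOTAL_PARAMS, precision):.2f} GB of weights")
```

Note that memory is driven by the total parameter count, since every expert has to be loaded, while compute per inference step scales with the much smaller active count.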

Key technical specifications:

Context window: 128 k tokens, allowing processing of long documents without chunking.

Runtime controls to trade precision against recall.

Fine‑tuning capability to adapt the eight categories to domain‑specific rules.
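The article does not say how the runtime precision/recall controls are exposed. One generic way to get a similar effect downstream is to filter detections by confidence score: a higher cutoff keeps fewer, more certain spans (favoring precision), while a lower cutoff keeps more (favoring recall). The sketch assumes detections shaped like the token-classification output shown later in this article (`entity_group`, `score`, `word`); `filter_detections`, the thresholds, and the sample values are hypothetical.

```python
from typing import Dict, List, Optional


def filter_detections(
    detections: List[dict],
    threshold: float = 0.5,
    per_label: Optional[Dict[str, float]] = None,
) -> List[dict]:
    """Keep detections whose score meets the cutoff for their category.

    `per_label` optionally overrides the global threshold for specific
    categories, e.g. a stricter cutoff for names than for emails.
    """
    per_label = per_label or {}
    kept = []
    for det in detections:
        cutoff = per_label.get(det["entity_group"], threshold)
        if det["score"] >= cutoff:
            kept.append(det)
    return kept


# Hypothetical detections, for illustration only.
sample = [
    {"entity_group": "private_person", "score": 0.62, "word": "Jordan"},
    {"entity_group": "private_email", "score": 0.99, "word": "jordan@example.com"},
    {"entity_group": "private_person", "score": 0.31, "word": "Apple"},
]

high_recall = filter_detections(sample, threshold=0.3)     # keeps everything
high_precision = filter_detections(sample, threshold=0.9)  # keeps only the email
```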

Usage Examples

Python (Transformers) – quick pipeline call:

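# Use pipeline for quick inference
from transformers import pipeline

pipe = pipeline("token-classification", model="openai/privacy-filter")

# Direct model loading; the token-classification class keeps the
# classification head, which a plain AutoModel would not load
from transformers import AutoModelForTokenClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai/privacy-filter")
model = AutoModelForTokenClassification.from_pretrained("openai/privacy-filter", dtype="auto")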
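Once detections are available, a common next step is to mask them in the original text. The sketch below assumes each detection carries `start` and `end` character offsets next to `entity_group`, which is what the Transformers token-classification pipeline typically returns when an aggregation strategy is set. The `redact` helper and the hard-coded detections are illustrative, so the example runs without downloading the model; it also assumes spans do not overlap.

```python
from typing import List


def redact(text: str, detections: List[dict]) -> str:
    """Replace each detected span with a bracketed category tag."""
    result = text
    # Work from the end of the string so earlier offsets stay valid.
    for det in sorted(detections, key=lambda d: d["start"], reverse=True):
        tag = f"[{det['entity_group'].upper()}]"
        result = result[: det["start"]] + tag + result[det["end"]:]
    return result


text = "My name is Harry Potter and my email is harry@example.com."

# Hand-written stand-ins for pipeline output (offsets are character indices).
detections = [
    {"entity_group": "private_person", "score": 0.99, "start": 11, "end": 23},
    {"entity_group": "private_email", "score": 0.99, "start": 40, "end": 57},
]

redacted = redact(text, detections)
print(redacted)
```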

Browser (Transformers.js) – WebGPU execution:

// Install the dependency first (run in a shell, not in this script):
//   npm i @huggingface/transformers
import { pipeline } from '@huggingface/transformers';

// Initialize pipeline on WebGPU
const classifier = await pipeline('token-classification', 'openai/privacy-filter', { device: 'webgpu', dtype: 'q4' });

// Example input
const input = 'My name is Harry Potter and my email is [email protected].';
const output = await classifier(input, { aggregation_strategy: 'simple' });
console.dir(output, { depth: null });

Sample output for the above input:

[
  {
    entity_group: 'private_person',
    score: 0.9999957978725433,
    word: ' Harry Potter'
  },
  {
    entity_group: 'private_email',
    score: 0.9999990728166368,
    word: ' [email protected]'
  }
]
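The `word` values in the sample output keep a leading space from tokenization. A small post-processing step, sketched here in Python under the assumption that results have been collected as a list of dicts with the same keys, strips that whitespace and groups the detected strings by category. `group_by_category` is a hypothetical helper, and the email value is a stand-in chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_category(detections: List[dict]) -> Dict[str, List[str]]:
    """Collect detected strings per category, trimming surrounding whitespace."""
    grouped = defaultdict(list)
    for det in detections:
        grouped[det["entity_group"]].append(det["word"].strip())
    return dict(grouped)


# Same shape as the sample output; the email address is illustrative.
detections = [
    {"entity_group": "private_person", "score": 0.9999957978725433, "word": " Harry Potter"},
    {"entity_group": "private_email", "score": 0.9999990728166368, "word": " harry@example.com"},
]

summary = group_by_category(detections)
print(summary)
```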

Adoption Metrics

In the first month after release, the repository on Hugging Face recorded over 305 000 downloads, 36 fine‑tuned variants, and 38 related application spaces. Users reported very high detection accuracy for email addresses; Chinese‑language PII detection was noted as less reliable.

Official Limitations

Only the eight predefined PII types are supported. The label set cannot be changed at runtime, so adding a new category requires fine‑tuning.

Performance degrades on non‑English text and non‑Latin scripts, and miss rates are higher for low‑frequency names and unusual identifier formats.

False positives may occur, such as labeling common nouns or public entities as private data, and some domain‑specific identifiers may be missed.

In high‑sensitivity scenarios, human review remains necessary; the model should be treated as an assistive tool rather than a fully automated compliance solution.
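One way to act on these limitations is to treat detections as suggestions and route uncertain ones to a person. The following sketch is an assumption about how such a workflow might look, not something the model ships with: it drops spans that match a caller-supplied allowlist of known public terms, auto-accepts high-confidence detections, and queues the rest for human review. The function name, threshold, and sample data are all illustrative.

```python
from typing import Iterable, List, Tuple


def triage(
    detections: List[dict],
    auto_threshold: float = 0.95,
    allowlist: Iterable[str] = (),
) -> Tuple[List[dict], List[dict]]:
    """Split detections into (auto-accepted, needs human review)."""
    allowed = {term.lower() for term in allowlist}
    accepted, review = [], []
    for det in detections:
        # Skip known false positives such as public organization names.
        if det["word"].strip().lower() in allowed:
            continue
        if det["score"] >= auto_threshold:
            accepted.append(det)
        else:
            review.append(det)
    return accepted, review


# Hypothetical detections, for illustration only.
sample = [
    {"entity_group": "private_person", "score": 0.99, "word": " Harry Potter"},
    {"entity_group": "private_person", "score": 0.70, "word": " Jordan"},
    {"entity_group": "private_person", "score": 0.98, "word": " OpenAI"},
]

accepted, review = triage(sample, auto_threshold=0.95, allowlist={"OpenAI"})
```

Keeping the review queue separate makes it easy to measure how often reviewers overturn the model, which in turn can inform where the threshold should sit.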

Model repository: https://huggingface.co/openai/privacy-filter

Demo space (WebGPU): https://huggingface.co/spaces/webml-community/privacy-filter-webgpu
