OpenAI has quietly dropped something worth paying attention to. Released on Hugging Face under the Apache 2.0 license, Privacy Filter is an open, bidirectional token classification model designed specifically to detect and redact personally identifiable information (PII) in text. It is small enough to run in a web browser or on a laptop, and fast enough for high-throughput data sanitization pipelines.
What does it do?
Privacy Filter is a named entity recognition (NER) model, but one tuned specifically for the privacy use case. It detects eight categories of sensitive spans: account_number, private_address, private_email, private_person, private_phone, private_url, private_date, and secret. The secret class covers credential formats, project-specific token patterns, and high-entropy strings; the model card explicitly calls out missed detection of "new credential formats" and "secrets split across surrounding syntax" as known failure modes, which indicates what the class is trained to target.
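For teams evaluating it, a redaction pass could look something like the following sketch, which assumes the standard Hugging Face token-classification pipeline. The model id `openai/privacy-filter` is a placeholder rather than a confirmed identifier, and note that the pipeline's default aggregation is not the model's own constrained decoder.

```python
# A minimal sketch of a PII-detection pass via the Hugging Face
# `transformers` token-classification pipeline. The model id below is a
# placeholder -- check the actual Hugging Face release for the real name.
from transformers import pipeline

redactor = pipeline(
    "token-classification",
    model="openai/privacy-filter",     # hypothetical model id
    aggregation_strategy="simple",     # merge B-/I-/E-/S- pieces into spans
)

text = "Contact Alice Smith at alice@example.com or +1-555-0100."
for span in redactor(text):
    print(span["entity_group"], repr(span["word"]), round(span["score"], 3))
# Detected spans can then be replaced with placeholders such as
# [private_person], [private_email], [private_phone] before storage.
```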
The intended use case is clear: development teams that need to redact datasets, scrub logs, or preprocess user-generated content before it enters a training pipeline or lands in a data warehouse. And because it runs on-premises on commodity hardware, it fits directly into the growing set of edge-deployable AI tools that organizations can adopt without routing sensitive data to a third-party API.

Architecture is the real story
Privacy Filter has 1.5 billion total parameters, but only about 50 million are active at inference time. That roughly 30x gap is explained entirely by the model's sparse feedforward design.
Architecturally, the model is "similar to gpt-oss, albeit at a smaller size." It is built from 8 pre-norm transformer blocks with a residual stream width (d_model) of 640. Attention uses Grouped Query Attention (GQA) with Rotary Position Embeddings (RoPE): 14 query heads over 2 KV heads, meaning 7 query heads share each KV head, which significantly reduces the memory footprint of the key-value cache compared to standard multi-head attention. RoPE is also what enables the model's 128,000-token context window. The feedforward layers use a sparse MoE with 128 experts and top-4 routing: for each token, 4 of the 128 experts are activated, and every other expert's parameters sit idle. This is exactly the mechanism that produces the 30x gap between total and active parameters.
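To make the sparsity concrete, here is a minimal PyTorch sketch of top-4-of-128 expert routing. Only d_model=640, the expert count, and the top-k value come from the description above; the router and the per-expert FFN width are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative top-4-of-128 expert routing. d_ff is a placeholder; the
# release does not state the per-expert FFN width.
d_model, n_experts, top_k, d_ff = 640, 128, 4, 1024

router = nn.Linear(d_model, n_experts)
experts = nn.ModuleList(
    nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
    for _ in range(n_experts)
)

def moe_forward(x: torch.Tensor) -> torch.Tensor:   # x: [tokens, d_model]
    weights, idx = router(x).topk(top_k, dim=-1)    # pick 4 experts per token
    weights = F.softmax(weights, dim=-1)
    out = torch.zeros_like(x)
    for slot in range(top_k):                       # only 4/128 experts run
        for e in idx[:, slot].unique().tolist():
            sel = idx[:, slot] == e
            out[sel] += weights[sel, slot].unsqueeze(-1) * experts[e](x[sel])
    return out

print(moe_forward(torch.randn(16, d_model)).shape)  # torch.Size([16, 640])
# Per token, 124 of the 128 expert FFNs stay idle, which is how ~1.5B total
# parameters collapse to ~50M active at inference (a ~30x gap).
```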
Three-stage training pipeline
What makes this model architecturally unusual is not just its size but how it was built. Privacy Filter was produced in three distinct stages.
First, it was autoregressively pre-trained as a standard next-token prediction language model, in the mold of GPT-style decoders. Second, that checkpoint was surgically transformed: the language model head was replaced with a token classification head over the privacy label set, and the attention mechanism was converted from causal (unidirectional) to bidirectional banded attention with a band size of 128, giving each token an effective context window of 257 tokens (the token itself plus 128 on each side). Third, the converted model was post-trained with a supervised classification loss, a distinct fine-tuning phase on labeled PII data, separate from the architectural conversion step.
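A hedged sketch of what the stage-two surgery amounts to, assuming a standard PyTorch decoder: the LM head becomes a token classification head (33 BIOES labels, described below), and the causal mask becomes a bidirectional banded mask of half-width 128. All names and shapes here are illustrative; the actual conversion code is not public.

```python
import torch
import torch.nn as nn

d_model, n_labels, band = 640, 33, 128   # 33 = 8 classes x 4 BIOES tags + O

# Stage 2a: drop the next-token LM head, attach a token-classification head.
classifier_head = nn.Linear(d_model, n_labels)   # replaces lm_head

# Stage 2b: causal mask -> bidirectional banded mask. Each token attends to
# itself plus `band` tokens on each side: an effective window of 257 tokens.
def banded_attention_mask(seq_len: int) -> torch.Tensor:
    pos = torch.arange(seq_len)
    return (pos[None, :] - pos[:, None]).abs() <= band   # [T, T], True = attend

mask = banded_attention_mask(512)
print(mask[300].sum().item())            # 257 = 128 left + self + 128 right

hidden = torch.randn(1, 512, d_model)    # stand-in for backbone output
print(classifier_head(hidden).shape)     # torch.Size([1, 512, 33])
```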
Autoregressive pre-training gives the model rich linguistic representations learned from far more data and compute than any task-specific budget could support. The architectural transformation enables bidirectional context, which is essential for NER: a token like "Alice" in "call Alice Smith" is unambiguous once the model can see "Smith" to its right, but with left-only context it can be missed. The supervised post-training then maps those representations onto the privacy detection task.
Compared with classic masked language model approaches such as BERT, this is a post-hoc transformation of an autoregressive model rather than a masked-LM setup from the start, a useful distinction in how the underlying representations are formed.
Constrained Viterbi decoding instead of argmax
Privacy Filter uses the BIOES tagging scheme: Beginning, Inside, Outside, End, and Single. Each of the eight privacy classes gets four boundary-tagged token labels (B-, I-, E-, S-) plus a shared background class O, for 33 total output classes per token. For a sequence of length T, the output logits have shape [T, 33].
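The label inventory is straightforward to reconstruct from that description:

```python
# Reconstructing the 33-label inventory: four boundary tags per class plus O.
CLASSES = [
    "account_number", "private_address", "private_email", "private_person",
    "private_phone", "private_url", "private_date", "secret",
]
LABELS = ["O"] + [f"{tag}-{cls}" for cls in CLASSES for tag in "BIES"]
print(len(LABELS))   # 33
# For a sequence of T tokens, the model emits logits of shape [T, 33].
```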
Instead of taking a per-token argmax over those 33 logits, which can produce incoherent label sequences such as a B- tag immediately followed by an S- tag, the model runs a constrained Viterbi decoder at inference time. The decoder uses linear-chain transition scores and enforces valid BIOES boundary transitions. It scores complete label paths using start, transition, and end terms, plus six transition bias parameters that specifically control background persistence, span entry, span persistence, span exit, and boundary-to-boundary handoff. This global path optimization improves span coherence and boundary stability by conditioning every token decision on sequence-level structure rather than local logits alone, which is especially valuable in noisy or mixed-format text.
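Here is a minimal sketch of constrained Viterbi decoding over BIOES labels. The transition-validity rules are the standard BIOES constraints; the released decoder's exact scoring terms and its six bias parameters are not public, so the single `span_bias` knob below is an illustrative stand-in.

```python
import numpy as np

def valid_transition(a: str, b: str) -> bool:
    """Standard BIOES constraints: B-/I- must continue the same entity
    with I- or E-; O, E-, and S- may be followed by O, B-, or S-."""
    if a[0] in "BI":
        return b[0] in "IE" and a[2:] == b[2:]
    return b[0] in "OBS"

def viterbi(logits: np.ndarray, labels: list[str], span_bias: float = 0.0):
    """logits: [T, L] per-token label scores. `span_bias` is an illustrative
    stand-in for the six tunable biases: positive values reward transitions
    into non-background labels (broader spans, higher recall), negative
    values penalize them (tighter spans, higher precision)."""
    T, L = logits.shape
    trans = np.full((L, L), -1e9)            # invalid transitions stay blocked
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            if valid_transition(a, b):
                trans[i, j] = span_bias if b != "O" else 0.0
    score = logits[0].copy()
    for i, l in enumerate(labels):           # start term: no I-/E- at position 0
        if l[0] in "IE":
            score[i] = -1e9
    back = np.zeros((T, L), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + trans + logits[t][None, :]
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    for i, l in enumerate(labels):           # end term: no B-/I- at the end
        if l[0] in "BI":
            score[i] = -1e9
    path = [int(score.argmax())]             # backtrace the best global path
    for t in range(T - 1, 0, -1):
        path.append(int(back[t][path[-1]]))
    return [labels[i] for i in reversed(path)]
```

Calling `viterbi(logits, LABELS, span_bias=0.5)` on the [T, 33] logits would push the decoder toward entering and staying inside spans (recall), while a negative value would tighten boundaries (precision).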
Those six transition bias parameters are also user-settable at runtime. This lets AI engineers bias the decoder toward broader, more contiguous masking to improve recall, or toward tighter span boundaries to improve precision, without retraining the model.
Key takeaways
- OpenAI has released Privacy Filter, an open-source PII redaction model under Apache 2.0, capable of detecting eight sensitive span classes including account_number, private_person, secret, and more, deployable locally without routing data to an external API.
- The model has 1.5 billion total parameters but only 50 million active at inference, thanks to its sparse MoE feedforward design with 128 experts and top-4 routing per token, making it lightweight enough to run in a browser or on a laptop.
- The backbone is architecturally similar to gpt-oss: 8 pre-norm transformer blocks, d_model=640, Grouped Query Attention with RoPE, and sparse MoE FFNs. It was pre-trained autoregressively, then converted into a bidirectional banded-attention encoder, and finally post-trained with a supervised classification loss.
- At inference, it runs constrained Viterbi decoding over the BIOES tagging scheme instead of per-token argmax, producing coherent span boundaries, with six tunable transition bias parameters that let engineers adjust the precision/recall trade-off at runtime without retraining.