Fastino Labs open-sources GLiGuard: a 300-million-parameter safety moderation model that matches or exceeds the accuracy of models 23-90 times its size


As LLM-powered applications move into production – and as AI agents take on more critical tasks like browsing the web, writing and executing code, and interacting with external services – safety moderation has quietly become one of the most operationally expensive parts of the stack.

Most developers who have deployed a production LLM system know the problem: you need to evaluate every user prompt before it reaches the model, and every model response before it reaches the user. This means your guardrail model runs on every request, at every turn of the conversation. Guardrail latency compounds, and so do costs. The current generation of open-source guardrail models – LlamaGuard4 (12B), WildGuard (7B), ShieldGemma (27B), and NemoGuard (8B) – are all decoder-only models with billions of parameters, designed for flexibility rather than speed.

Fastino Labs has released GLiGuard, an open-source safety moderation model with 300 million parameters designed to address exactly this problem. GLiGuard evaluates multiple safety dimensions in a single pass, and across nine safety benchmarks its accuracy matches or exceeds models 23 to 90 times its size while running up to 16 times faster.

https://pioneer.ai/blog/gliguard-16x-faster-safety-moderation-with-a-small-language-model

To understand what makes GLiGuard different, it helps to understand why current guardrail models are so slow. Most major guardrail models are built on decoder-only transformer architectures, and they generate their safety judgments autoregressively, one token at a time – in the same way that a large language model generates a response to a chat message.

This design made sense when safety requirements were fluid. Decoder models can interpret natural language task descriptions and adapt to new safety policies without retraining. But autoregressive generation is sequential by nature, which makes it slow and computationally expensive.

There is a compounding problem on top of that. Most guardrail models need to evaluate inputs across multiple safety dimensions: what category of harm is present, whether the user prompt is trying to bypass safety training, whether the model's response is itself unsafe, and so on. Since decoder models generate outputs sequentially, these evaluations are typically produced one at a time, and latency compounds with each additional criterion.
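To make the compounding concrete, here is a back-of-the-envelope sketch. The per-evaluation latency figure is hypothetical, purely for illustration; only the scaling behavior reflects the difference described here.

```python
# Illustrative arithmetic only: the per-evaluation latency figure is hypothetical.

def decoder_guardrail_latency_ms(per_eval_ms: float, num_criteria: int) -> float:
    """Sequential decoder-style moderation: one generation per criterion."""
    return per_eval_ms * num_criteria

def encoder_guardrail_latency_ms(per_pass_ms: float, num_criteria: int) -> float:
    """Encoder-style moderation: all criteria scored in one forward pass."""
    return per_pass_ms  # independent of num_criteria

print(decoder_guardrail_latency_ms(100.0, 4))  # 400.0 -- grows with criteria
print(encoder_guardrail_latency_ms(100.0, 4))  # 100.0 -- flat
```

The decoder's cost grows linearly with the number of safety dimensions; the encoder's stays flat.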

In other words, the structure that makes decoder models flexible is also the structure that makes them the wrong tool for what is essentially a classification problem.

What GLiGuard actually does

GLiGuard is a small encoder-based model that reframes safety moderation as a text classification problem rather than a text generation problem. Encoder models process the entire input at once and output classification scores over a fixed set of labels, while decoder models generate their output one token at a time, from left to right.

The key architectural insight is how GLiGuard handles multiple tasks simultaneously. Instead of generating tokens, GLiGuard encodes both the input text and the task definitions (labels) together. These are fed into the model, which scores every label simultaneously in a single forward pass and returns the highest-scoring label for each task. Since all tasks and their candidate labels are part of the same input, evaluating additional safety dimensions adds no latency; it simply means including more labels in the input.
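A minimal sketch of the idea, with simulated scores standing in for the model's forward pass. The function names, input format, and truncated label sets are illustrative assumptions, not GLiGuard's actual API:

```python
# Sketch: all tasks and their candidate labels are packed into one input, and a
# single "forward pass" (simulated here) scores every label at once.
TASKS = {
    "safety": ["safe", "unsafe"],
    "jailbreak_strategy": ["none", "prompt_injection", "roleplay_bypass"],  # 11 in GLiGuard
    "harm_category": ["none", "violence", "hate_speech"],                   # 14 in GLiGuard
    "refusal": ["compliance", "refusal"],
}

def build_input(text: str, tasks: dict) -> str:
    """Concatenate the input text with every task's labels (illustrative format)."""
    label_block = " ".join(f"[{t}: {', '.join(labels)}]" for t, labels in tasks.items())
    return f"{label_block} {text}"

def classify(scores: dict) -> dict:
    """Given per-label scores from one forward pass, pick the top label per task."""
    return {task: max(label_scores, key=label_scores.get)
            for task, label_scores in scores.items()}

# Simulated forward-pass output for a prompt containing an injection attempt:
scores = {
    "safety": {"safe": 0.1, "unsafe": 0.9},
    "jailbreak_strategy": {"none": 0.2, "prompt_injection": 0.7, "roleplay_bypass": 0.1},
    "harm_category": {"none": 0.8, "violence": 0.1, "hate_speech": 0.1},
    "refusal": {"compliance": 0.9, "refusal": 0.1},
}
print(classify(scores))
```

Note that adding a fifth task would only mean extending `TASKS` – the (real) model would still make one forward pass.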


GLiGuard runs four moderation tasks simultaneously in a single forward pass:

  1. Safety classification (safe/unsafe) – applies both to user prompts before generation and to model responses after generation.
  2. Jailbreak strategy detection across 11 strategies, including prompt injection, role-play bypass, instruction override, and social engineering. If any jailbreak strategy is detected, the prompt is automatically marked unsafe.
  3. Harm category detection across 14 categories – violence, sexual content, hate speech, PII exposure, misinformation, child safety, copyright infringement, and others. A single input can trigger multiple categories at once.
  4. Refusal detection (compliance/refusal), tracked separately to help measure over-refusal (when the model rejects safe requests) and false compliance (when the model appears to comply but does not). If a refusal is detected, the response is automatically marked safe.
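The two override rules described above (a detected jailbreak forces "unsafe", a detected refusal forces "safe") amount to simple post-processing over the per-task predictions. This is an illustrative reconstruction of the described behavior, not GLiGuard's actual code:

```python
def apply_overrides(preds: dict, stage: str) -> dict:
    """Post-process per-task predictions as the article describes:
    - prompt stage: any detected jailbreak strategy forces 'unsafe'
    - response stage: a detected refusal forces 'safe'
    """
    preds = dict(preds)  # don't mutate the caller's dict
    if stage == "prompt" and preds.get("jailbreak_strategy", "none") != "none":
        preds["safety"] = "unsafe"
    if stage == "response" and preds.get("refusal") == "refusal":
        preds["safety"] = "safe"
    return preds

print(apply_overrides({"safety": "safe", "jailbreak_strategy": "roleplay_bypass"}, "prompt"))
# {'safety': 'unsafe', 'jailbreak_strategy': 'roleplay_bypass'}
```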

Training data and fine-tuning

GLiGuard is trained on a combination of human-annotated and synthetically generated training data. For prompt safety, response safety, and refusal detection, the team used WildGuardTrain, a dataset of 87,000 human-annotated examples. For harm category and jailbreak strategy detection, labels for unsafe samples were generated using GPT-4.1.

During early training, the model struggled to differentiate between similar harm categories such as toxic speech and violence, so the team used Pioneer to create supplemental synthetic data with edge cases targeting these nuances.

On the architectural side, GLiGuard was trained by fully fine-tuning the GLiNER2-base-v1 checkpoint for 20 epochs using the AdamW optimizer. GLiNER2 is Fastino’s proprietary architecture for multi-task text classification – a natural starting point for a model designed to capture multiple label sets in a single pass.


Benchmark results: accuracy and speed

The research team evaluated GLiGuard across nine established safety benchmarks. These benchmarks cover both prompt and response classification, testing whether the model can identify harmful content, resist adversarial attacks, distinguish between different types of harm, and avoid over-flagging safe content. Results are reported as macro-average F1, a standard metric that balances precision and recall.
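As a quick refresher on the metric: F1 is the harmonic mean of precision and recall, and a macro average is simply the unweighted mean of per-class F1 scores.

```python
def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def macro_f1(per_class_f1: list) -> float:
    """Unweighted mean of per-class F1 scores."""
    return sum(per_class_f1) / len(per_class_f1)

print(round(f1(0.9, 0.8), 4))              # 0.8471
print(round(macro_f1([0.9, 0.8, 0.85]), 4))  # 0.85
```

Because F1 is a harmonic mean, a model cannot score well by maximizing only precision (flagging almost nothing) or only recall (flagging everything) – which is why it is the standard choice for moderation benchmarks.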

On accuracy:

  • GLiGuard scored an 87.7 average F1 on prompt classification, within 1.7 points of the best model (PolyGuard-Qwen at 89.4).
  • It achieved the second-highest average F1 on response classification (82.7), behind only Qwen3Guard-8B (84.1).
  • It outperforms LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23-90× smaller.

Regarding throughput and latency, measured on a single NVIDIA A100 GPU:

  • GLiGuard achieves up to 16.2× higher throughput (133 vs. 8.2 samples/s at batch size 4).
  • GLiGuard achieves 16.6× lower latency (26 ms vs. 426 ms at sequence length 64).

These are not marginal improvements. At 26 ms per request versus 426 ms, the difference is meaningful in any real-time user-facing application, and the compounding effect across a multi-turn conversation makes the gap even larger in practice.
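For instance, using the reported per-request latencies, the guardrail overhead for a conversation screened at every turn (once for the prompt, once for the response) grows like this:

```python
# Uses the reported 26 ms (GLiGuard) vs. 426 ms (decoder baseline) per request.

def guardrail_overhead_ms(per_request_ms: float, turns: int) -> float:
    """Each conversation turn is screened twice: prompt and response."""
    return per_request_ms * 2 * turns

for turns in (1, 10):
    print(turns, guardrail_overhead_ms(26, turns), guardrail_overhead_ms(426, turns))
# At 10 turns: 520 ms of total GLiGuard overhead vs. 8520 ms (8.5 s) for the baseline.
```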

Marktechpost visual explainer

01 – Overview
What is GLiGuard?

GLiGuard is an open-source 300M-parameter safety moderation model released by Fastino Labs on May 12, 2026. It is designed to act as a protective layer between users and LLM applications – screening every user prompt before it reaches the model and every model response before it reaches the user.

  • 300M parameters – runs on a single GPU
  • 16× faster throughput than SOTA decoder guardrails
  • 4 safety tasks evaluated in a single forward pass
  • Apache 2.0 licensed · weights on Hugging Face · managed inference on Pioneer · encoder architecture

02 – The problem
Why guardrails are slow

Most production guardrail models – LlamaGuard4, WildGuard, ShieldGemma, and NemoGuard – are built on decoder-only transformer architectures. They generate safety judgments autoregressively, one token at a time, in the same way a large language model generates a chat response.

Decoder guardrail models:
  • Generate judgments token by token
  • Sequential output – latency compounds for every task
  • 7B–27B parameters required
  • Expensive to run in real time
  • Separate passes for each safety dimension

GLiGuard (encoder):
  • Processes the entire input at once
  • All tasks evaluated in one forward pass
  • 300M parameters
  • Single-GPU deployment
  • More dimensions = no extra latency

03 – Architecture
One pass. Multiple tasks.

GLiGuard reframes safety moderation as a text classification problem, not a text generation problem. It encodes the input text and all task definitions (labels) together, then scores each label simultaneously in a single forward pass. Adding more safety dimensions does not increase latency – it simply means more labels in the input.

Base model: fully fine-tuned from the GLiNER2-base-v1 checkpoint for 20 epochs using the AdamW optimizer. Training data: 87,000 human-annotated examples from WildGuardTrain, plus synthetic edge-case data generated via GPT-4.1 and Pioneer to sharpen the harm categories.

04 – Capabilities
4 moderation tasks in a single pass

01 – Safety classification (safe/unsafe)
Applies to both user prompts before generation and model responses after generation.

02 – Jailbreak strategy detection (11 strategies)
Detects prompt injection, role-play bypass, instruction override, social engineering, and more. Any detected strategy automatically flags the prompt as unsafe.

03 – Harm category detection (14 categories)
Violence, sexual content, hate speech, PII exposure, misinformation, child safety, copyright infringement, and more. A single input can trigger multiple categories.

04 – Refusal detection (compliance/refusal)
Tracks over-refusal (rejecting safe requests) and false compliance. A detected refusal causes the response to be automatically marked safe.

05 – Benchmarks
Accuracy vs. much larger models
Evaluated across 9 safety benchmarks using macro-average F1. Speed measured on a single NVIDIA A100 GPU.

  • Prompt classification – avg. F1: 87.7
  • 26 ms latency at sequence length 64 (vs. 426 ms for ShieldGemma-27B)
  • 133 samples/s throughput at batch size 4

06 – Getting started
Deploy GLiGuard today

At 300 million parameters, GLiGuard runs on a single GPU and can be fine-tuned for domain-specific use cases without heavy infrastructure. Weights are available on Hugging Face under the Apache 2.0 license. Managed inference is available on Pioneer.

Model ID: fastino/gliguard-LLMGuardrails-300M

Tags: prompt safety · response safety · jailbreak detection · harm classification · refusal detection · single GPU

Key takeaways

  • GLiGuard is a 300M-parameter encoder-based safety moderation model that handles four tasks – safety classification, jailbreak detection, harm classification, and refusal detection – in a single forward pass.
  • Unlike decoder-only guardrail models that generate judgments autoregressively, GLiGuard reframes safety moderation as a text classification problem, eliminating the sequential latency bottleneck.
  • Benchmarked on a single NVIDIA A100 GPU, GLiGuard achieves up to 16.2× higher throughput and 16.6× lower latency (26 ms vs. 426 ms) compared to current SOTA models such as ShieldGemma-27B.
  • Across nine safety benchmarks, GLiGuard scored an average F1 of 87.7 on prompt classification and 82.7 on response classification – outperforming LlamaGuard4-12B, ShieldGemma-27B, and NemoGuard-8B despite being 23-90× smaller.
  • Model weights are available on Hugging Face under Apache 2.0 (fastino/gliguard-LLMGuardrails-300M), making it deployable on a single GPU without heavy infrastructure.

Check out the paper, the model weights on Hugging Face, the GitHub repo, and the technical details.
