Announcement
Jul 1, 2025
Clavata.ai Partners with NVIDIA to Strengthen AI Safety with Guardrails Integration

Enhancing AI Safety: Clavata’s Industry-Leading Content Safety Technology Now Available in NVIDIA NeMo Guardrails
As large language models (LLMs) become increasingly embedded in customer-facing applications, ensuring that these systems operate safely, consistently, and in alignment with platform guidelines has become a priority. To address this, Clavata has integrated its industry-leading content safety technology with NVIDIA NeMo Guardrails, an open-source toolkit designed to help developers implement policy-compliant, reliable conversational AI.
This integration allows developers to define precise, customizable content safety policies and enforce them in real time. The result is a more consistent and proactive approach to AI safety that tailors agent interactions to customer-specific standards while reducing the risk of over-censoring through false positives.
Overview of NVIDIA NeMo Guardrails
NVIDIA NeMo Guardrails is an open-source, enterprise-grade programmable framework for managing compliance and data privacy standards, including PII detection, hallucination filtering, RAG enforcement, dialogue safety, topic control, and jailbreak prevention in LLM applications. It integrates easily with existing LLM pipelines and supports frameworks like LangChain and LlamaIndex. By evaluating both user prompts and model outputs, developers can ensure that conversations remain within acceptable boundaries.
Guardrails are defined in Colang, a purpose-built scripting language for conversational flow control. Developers can create “rails” that define acceptable behavior, including fallback logic when unsafe content is detected.
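For illustration, a simple input rail in Colang (1.0 syntax) pairs a check action with fallback behavior. The sketch below follows the pattern of the toolkit’s documented self-check rail; the bot message is a placeholder:

define bot refuse to respond
  "I'm sorry, I can't respond to that."

define flow self check input
  # Run the check action before the LLM generates anything.
  $allowed = execute self_check_input
  if not $allowed
    bot refuse to respond
    stop

Registering the flow under the input rails in the application’s config.yml makes it run on every user message before the model is called.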
Clavata Integration: Early Risk Detection and Policy Enforcement
Clavata’s integration with NeMo Guardrails extends this framework by enabling concept-based text classification with customizable policy enforcement. Content is evaluated in real time against user-defined policies, written in a structured syntax on Clavata’s platform, and the result is used to steer the AI agent’s conversational flow.
Evaluations occur before model execution, allowing platforms to stop risky or inappropriate content early, saving on unnecessary API calls and preventing downstream violations. This proactive enforcement streamlines moderation workflows, reduces cost, and improves safety without sacrificing speed or scalability.

While NeMo Guardrails offers a library of built-in safeguards, Clavata lets teams easily customize guardrails to meet business-specific requirements, without relying on closed, inaccurate “black box” classifiers or LLM prompt engineering, approaches that often fall short on key benchmarks like accuracy, speed, and cost. Integration with Clavata gives teams the ability to:
Define custom guardrail policies tailored to evolving business needs
Test policies against real data, with the ability to update rules quickly in real time
Maintain separate input and output policies, enabling context-aware control across different stages of a conversation (sketched below)
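As a sketch of that last point, separate policies can back distinct input and output rails. The flow names, the text and policy arguments, and the exact signature of the Clavata action below are illustrative assumptions, not the confirmed integration API:

define flow clavata check input
  # Hypothetical: evaluate the user's message against an input policy.
  $allowed = execute clavata_check_action(text=$user_message, policy="input-policy")
  if not $allowed
    bot refuse to respond
    stop

define flow clavata check output
  # Hypothetical: evaluate the model's draft reply against a separate output policy.
  $allowed = execute clavata_check_action(text=$bot_message, policy="output-policy")
  if not $allowed
    bot refuse to respond
    stop

Because each stage references its own policy, the same deployment can screen user prompts strictly while applying different standards to the model’s own responses.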
This is especially powerful in conversational AI, where language evolves rapidly and context is updated in real time. Clavata’s policy-based approach gives teams the agility to adapt on the fly, unlike black-box systems from third-party APIs or even open-source models, where definitions of concepts like nudity or hate speech are fixed and opaque. With Clavata, customers can define and enforce their own standards, aligned with their unique use cases.
Example Use Cases
Scenario 1: AI image generation company (family-friendly policies)
AI image generation is a powerful tool, but it can easily be abused. Prompts must follow community standards and avoid producing violative or harmful images.
User prompt: “((masterpiece)), ((best quality)), (ultra-detailed), ((kawaii)), cute, (lovely), ((sexy)), (ero), ((extremely detailed)), 4K, (8K), best quality, (beautiful), illustration, full body, ocean, ((spread legs)), ((squatting)), ((upskirt)), underwear, beach, beautiful black hair, beautiful green eyes, blush,(more_detail 0.8), detail background,(more_detail 0.8), detail background, Brooke Shields”
Input Guardrail check to Clavata: Policy Match. Input prompt violates safety policy.
Guardrail outcome: False - the result is returned to the client as a failure so that a response can be given to the user.
AI Agent response: “Sorry, this prompt violates Community Standards”
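In Colang terms, Scenario 1 might map to an input rail along these lines; the policy name and bot message definition are assumptions for illustration:

define bot inform community standards violation
  "Sorry, this prompt violates Community Standards"

define flow clavata image prompt check
  # Hypothetical: a family-friendly policy defined on Clavata's platform.
  $allowed = execute clavata_check_action(policy="family-friendly")
  if not $allowed
    bot inform community standards violation
    stop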
Scenario 2: An adult AI chatbot company
Adult chatbots must follow not only the site’s policies but also local laws, all while staying in character and delivering believable dialogue.
User: “You look a little young, what’s a middle schooler doing here?”
Input Guardrail check to Clavata: Policy Match. Underage input prompt detected.
Guardrail outcome: False - Clavata returns the result to the chatbot, which pivots the conversation toward something allowable on the site.
AI Agent response: “I am unable to represent characters who are underage.”
Scenario 3: A customer service bot
Customer service bots must stay on track and deliver accurate information to customers.
User: “I was wondering if I could request a refund for my airline tickets?”
Input Guardrail check to Clavata: Policy Match. Asking to issue a refund violates the current policy.
Guardrail outcome: False - the bot should not respond affirmatively and should instead redirect the user to phone support.
AI Agent response: “Sorry, I cannot process refunds. Please contact 800-555-1234 for assistance.”
Integration Details
Clavata is invoked as part of the Colang-defined conversational flow. When a user submits a prompt, the flow determines which policy applies and sends the content to Clavata for evaluation. The result is then used to conditionally advance or block the interaction.
Developers can implement this logic using Clavata’s API and a single Colang action, clavata_check_action, as part of their Guardrails configuration. The integration works out of the box with minimal changes to existing conversational architectures. For details, please refer to the official documentation.
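As a rough sketch of that wiring (the action’s parameters and the config keys shown are assumptions; the official documentation is authoritative), the conditional advance-or-block logic could look like this:

# In config.yml, register the flow as an input rail:
#   rails:
#     input:
#       flows:
#         - clavata check input

define flow clavata check input
  # Send the content to Clavata for evaluation against the applicable policy.
  $allowed = execute clavata_check_action(policy="my-input-policy")
  if not $allowed
    # A policy match fails the check: block the interaction.
    bot refuse to respond
    stop
  # Otherwise the flow simply ends and the conversation advances.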
Charting a Safer Future for AI Together
Clavata is deeply committed to its partnership with NVIDIA and has previously benchmarked its safety product using NVIDIA’s Aegis AI Content Safety Dataset 2.0. To read more about that report, which highlights Clavata’s high-accuracy detection capabilities, please see this technical post on Clavata’s website. We’re excited to extend the boundaries of what secure, trustworthy AI can achieve, ensuring a safer digital landscape for everyone.
To learn more about Clavata, visit clavata.ai or contact hello@clavata.ai.
Get to know Clavata.
Let’s chat.
See how we tailor trust and safety to your unique use cases. Let us show you a demo, or email us your questions at hello@clavata.ai.