A guardrail framework for safe and secure interaction with Large Language Models
OpenGuard is a guardrail framework that enforces user-defined rules to ensure safe and secure interactions with Large Language Models (LLMs). Among its checks, OpenGuard detects factual hallucinations: instances where generated outputs deviate from factual truth.
The OpenGuard hallucination detection checker integrates Beam Search Sampling (BSS) with Semantic Consistency Analysis to systematically identify hallucinations. BSS generates multiple candidate responses, capturing the model’s confidence distribution across different plausible answers. These responses are then clustered based on semantic similarity, followed by Natural Language Inference (NLI) to evaluate entailment and contradiction relationships.
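The sketch below illustrates this pipeline under stated assumptions, not OpenGuard's actual API: responses already sampled by BSS are clustered with a greedy cosine-similarity rule over sentence embeddings, and an off-the-shelf NLI model scores entailment between the top answer and the remaining candidates. The model names (`all-MiniLM-L6-v2`, `roberta-large-mnli`), the 0.8 similarity threshold, and the helper functions are illustrative choices.

```python
# Illustrative sketch: cluster sampled responses by semantic similarity, then
# score entailment between the top answer and the other candidates with an
# off-the-shelf NLI model. Model names and thresholds are example choices.
import torch
from sentence_transformers import SentenceTransformer, util
from transformers import AutoTokenizer, AutoModelForSequenceClassification

embedder = SentenceTransformer("all-MiniLM-L6-v2")
nli_tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
nli_model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

def cluster_by_similarity(responses, threshold=0.8):
    """Greedy clustering: a response joins the first cluster whose
    representative is similar enough, otherwise it starts a new cluster."""
    embeddings = embedder.encode(responses, convert_to_tensor=True)
    clusters = []  # each cluster is a list of indices into `responses`
    for i in range(len(responses)):
        for cluster in clusters:
            rep = cluster[0]
            if util.cos_sim(embeddings[i], embeddings[rep]).item() >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

def entailment_probability(premise, hypothesis):
    """P(entailment); roberta-large-mnli orders labels as
    contradiction / neutral / entailment."""
    inputs = nli_tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    with torch.no_grad():
        probs = torch.softmax(nli_model(**inputs).logits, dim=-1)[0]
    return probs[2].item()

# Example candidates as BSS might produce them for a factual question.
candidates = [
    "The Eiffel Tower was completed in 1889.",
    "The Eiffel Tower opened in 1889 for the World's Fair.",
    "The Eiffel Tower was finished in 1925.",
]
clusters = cluster_by_similarity(candidates)
top_answer = candidates[clusters[0][0]]
for response in candidates[1:]:
    # A low entailment probability against the top answer signals inconsistency.
    print(response, "->", round(entailment_probability(top_answer, response), 3))
```

Candidates that fall outside the majority cluster, or that the NLI model fails to entail, are treated as evidence of inconsistency in the sampled distribution.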
To quantify hallucinations, we introduce a scoring mechanism that combines token probabilities with semantic similarity metrics, yielding a quantitative measure of factual consistency. In cases where BSS produces only a single response, we employ a Chain-of-Verification (CoVe) mechanism to enhance self-consistency checks.
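A minimal sketch of such a combined score follows, assuming a simple weighted form; the exact weighting and normalization used by OpenGuard are not specified here. `seq_logprobs` are length-normalized log-probabilities from beam search, `similarities` are cosine similarities of each candidate to the top-ranked answer, and `entailed` flags whether the NLI model entails each candidate; all three names and the parameter `alpha` are hypothetical.

```python
# Hypothetical combined score: higher output means more likely hallucinated.
import math

def hallucination_score(seq_logprobs, similarities, entailed, alpha=0.5):
    """Blend model confidence with semantic consistency into a [0, 1] score."""
    # Confidence term: probability mass of the top answer among all candidates.
    probs = [math.exp(lp) for lp in seq_logprobs]
    confidence = probs[0] / sum(probs)
    # Consistency term: similarity-weighted agreement of the remaining
    # candidates with the top answer, counting only NLI-entailed candidates.
    agreement = [sim for sim, ok in zip(similarities[1:], entailed[1:]) if ok]
    consistency = sum(agreement) / max(len(similarities) - 1, 1)
    support = alpha * confidence + (1 - alpha) * consistency
    return 1.0 - support

# Three beam candidates: the top answer, one paraphrase, one contradiction.
score = hallucination_score(
    seq_logprobs=[-0.4, -0.9, -2.3],
    similarities=[1.0, 0.92, 0.35],
    entailed=[True, True, False],
)
print(round(score, 3))
```

A low score indicates that the model is both confident in its top answer and semantically consistent across samples; a high score flags the output for review.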
OpenGuard provides a structured and reliable methodology to enhance the trustworthiness of LLM-generated content, making it an essential tool for responsible LLM deployment.
All available downloads and the current documentation can be found under DepAI/OpenGuard.