Azure AI Content Safety
How Does Azure AI Content Safety Work?
Azure AI Content Safety is designed to work with text, images, and AI-generated content. It can identify and moderate inappropriate material. The visual capabilities of Content Safety are driven by Microsoft's Florence foundation model, which has been trained on billions of text-image pairs.
Text analysis employs natural language processing methods to better understand subtlety and context. Azure AI Content Safety supports multiple languages and can recognize harmful content in both short and long formats. It is currently available in English, German, Spanish, French, Portuguese, Italian, and Chinese.
Azure AI Content Safety features include:
Safeguarding Text Content
- Moderate text scans text across four categories: violence, hate speech, sexual content, and self-harm. A severity level from 0 to 6 is returned for each category, which helps prioritize what needs immediate human attention, and how urgently. You can also create a blocklist to scan for terms specific to your situation.
- Prompt shields is a unified API that identifies and blocks jailbreak attacks on inputs to LLMs, covering both user prompts and documents. These attacks are prompts that attempt to bypass the model's built-in safety features. User prompts are tested to ensure the input to the LLM is safe, and documents are tested to ensure they don't contain unsafe instructions embedded in the text.
- Protected material detection checks AI-generated text for protected material such as recipes, copyrighted song lyrics, or other original content.
- Groundedness detection protects against inaccurate responses in AI-generated text from LLMs. Public LLMs use data available at the time they were trained, but new data can be introduced after the original training, or the model may be built on private data. A grounded response is one where the model's output is based on the source information; an ungrounded response is one where the output deviates from the source information. Groundedness detection includes a reasoning option in the API response, which adds a reasoning field that explains any detected ungroundedness. However, reasoning increases processing time and cost.
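To make the severity levels concrete, here is a minimal Python sketch of how the per-category scores returned by Moderate text might be turned into actions. The `triage` helper and its thresholds are illustrative choices, not part of the service:

```python
def triage(category_severities, review_at=2, block_at=4):
    """Map per-category severity scores (0-6) to an action.

    The thresholds are illustrative, not service defaults: scores at or
    above `block_at` are blocked outright, scores at or above `review_at`
    are routed to a human reviewer, and the rest are allowed.
    """
    actions = {}
    for category, severity in category_severities.items():
        if severity >= block_at:
            actions[category] = "block"
        elif severity >= review_at:
            actions[category] = "review"
        else:
            actions[category] = "allow"
    return actions

# Example: hypothetical scores for a single piece of text
scores = {"Hate": 0, "Violence": 5, "Sexual": 1, "SelfHarm": 2}
print(triage(scores))
```

In practice you would tune the two thresholds per category to match your own tolerance for false positives versus missed content.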
Safeguarding Image Content
- Moderate images scans for inappropriate content across four categories: violence, self-harm, sexual, and hate. A severity level is returned: safe, low, or high. You then set a threshold level of low, medium, or high. The combination of the severity and threshold level determines whether the image is allowed or blocked for each category.
- Moderate multimodal content scans both images and text, including text extracted from an image using Optical Character Recognition (OCR). Content is analyzed across four categories: violence, hate speech, sexual content, and self-harm.
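The severity/threshold combination for image moderation can be sketched as a small decision function. The numeric ordering of the levels and the `image_decision` helper are illustrative assumptions, not the service's implementation:

```python
# Illustrative ordering of the levels; the service itself reports
# image severity as safe, low, or high.
LEVEL = {"safe": 0, "low": 1, "medium": 2, "high": 3}

def image_decision(severity: str, threshold: str) -> str:
    """Block a category when its returned severity meets or exceeds
    the threshold configured for that category; otherwise allow it."""
    return "block" if LEVEL[severity] >= LEVEL[threshold] else "allow"

print(image_decision("low", "medium"))  # a low-severity hit under a medium threshold passes
print(image_decision("high", "low"))    # a high-severity hit under a low threshold is blocked
```

A lower threshold therefore makes moderation stricter: with a threshold of low, any non-safe severity is blocked.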
Custom Safety Solutions
- Custom categories enables you to create your own categories by providing positive and negative examples and training the model. Content can then be scanned according to your own category definitions.
- Safety system message helps you write effective prompts to guide an AI system's behavior.
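As an illustration, a safety system message might look like the following. The wording is a hypothetical example, not Microsoft's template:

```
You are a helpful assistant. Do not generate content that could cause
physical or emotional harm. If a request asks for hateful, violent,
sexual, or self-harm content, politely decline and explain that you
cannot help with that request.
```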
Limitations
Because Azure AI Content Safety relies on AI algorithms and machine learning to identify problematic language, it may not consistently catch inappropriate language, and it may occasionally block acceptable language.
It is essential to test and assess Azure AI Content Safety with real data before deployment. After deployment, keep monitoring the system to evaluate how accurately it performs.
Conclusion
In this article, we explored the features and limitations of Azure AI Content Safety.