AI Content Detector FAQs
Last updated 02/09/2024
sales@copyleaks.com copyleaks.com
From ensuring cyber compliance to preventing copyright infringement, knowing what content is
human-created and what was generated by AI is crucial. Featured here are key questions we frequently
get asked regarding our AI Content Detector.
How It Works
For example:
• Credible data at scale, coupled with machine learning and widespread adoption, allows us to continually
refine and improve our ability to understand complex text patterns, resulting in over 99% accuracy—far
higher than any other AI content detector—and improving daily.
• As an enterprise-based platform, we’re the only solution with seamless API and LMS integrations,
allowing you to bring the power of AI Content Detector directly to your native platform and at scale.
• By looking at the individual paragraph and sentence level, our detection report highlights the specific
elements of the text potentially written by AI and provides a confidence level.
Since 2015, we’ve collected, ingested, and analyzed trillions of crawled and user-sourced content pages
from thousands of universities and enterprises worldwide to train our models to understand how humans
write as opposed to AI.
Also, utilizing AI technology, our AI detector can accurately recognize the presence of other AI-generated
text and the signals it leaves behind, adding an additional layer of accuracy.
• Our detection and the algorithms that power it are designed to detect human-generated text
rather than AI-generated text. Detecting AI text directly tends to give lower accuracy and increases
the likelihood of false positives.
• To help accelerate our learning and refine the models used, we implemented a feedback loop where
users can rate the accuracy of the results, which allows us to continually use examples of false positives,
rare as they may be, to improve.
• We only introduce new model detection after thorough testing. Once our internal testing reaches
a high confidence threshold, we leverage beta testers, giving an additional layer of assurance.
What models can you detect, and what’s the accuracy of each?
As of February 1, 2024, we can detect the following models:
• Gemini
• Jasper 3
• T5
Using English text, the detection accuracy of each varies slightly from model to model, though each is
above 98.0%.
There’s a chance that you may encounter slightly different results, given the type of content being tested.
Accordingly, we suggest conducting several tests to determine the success rate for your specific
content type.
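One way to run the suggested tests is to score a small set of samples whose origin you already know and tally how often the detector's verdict matches. The sketch below is illustrative only: the `Sample` record and `success_rates` function are hypothetical names, and the verdicts are assumed to be recorded by hand from your own reports rather than fetched from any Copyleaks API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Sample:
    """One test document (hypothetical record, filled in by hand)."""
    is_ai: bool       # ground truth: you know how the text was produced
    flagged_ai: bool  # verdict you recorded from the detection report


def success_rates(samples: List[Sample]) -> Dict[str, Optional[float]]:
    """Summarize how often the recorded verdicts match the known origin."""
    if not samples:
        raise ValueError("need at least one sample")

    human = [s for s in samples if not s.is_ai]
    ai = [s for s in samples if s.is_ai]

    correct = sum(s.is_ai == s.flagged_ai for s in samples)
    false_positives = sum(s.flagged_ai for s in human)   # human text flagged as AI
    false_negatives = sum(not s.flagged_ai for s in ai)  # AI text passed as human

    return {
        "accuracy": correct / len(samples),
        "false_positive_rate": false_positives / len(human) if human else None,
        "false_negative_rate": false_negatives / len(ai) if ai else None,
    }
```

Running this separately for each content type (essays, marketing copy, creative writing, and so on) gives a per-type success rate rather than a single blended figure.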
At the moment, English has the highest accuracy at 99.1%. We continue to develop our models to increase
the accuracy across other supported languages, and there are plans to introduce accurate detection across
dozens of additional languages.
Microsoft Teams only offers a student-view integration with no separate teacher view. Since our integrations
are integral to educators and students, we currently do not offer Teams integration.
Understanding the Results
If you get an AI alert, it’s important not to jump to conclusions. AI content detectors can
offer a lot of insight and data to encourage essential conversations in classrooms and boardrooms alike to
determine the rules and regulations around AI. When your report states that AI content was found, take the
time to investigate further. Again, the data provided by AI detectors should be used to inform the situation
and offer the option for a learning opportunity and alignment on expectations.
Can you detect mixed text where human-created text has been amended with
AI-generated text?
Yes. Our detection report highlights the specific elements of the text written by a human and those written
by AI, even if the text has been interspersed.
How do the Similarity Score and AI content detection percentage differ? Are
these completely distinct metrics, or is there an overlap in calculating them?
The Similarity Score shows the percentage of text in a document that matches other online sources or
sources stored in our internal database. It factors in identical text, minor changes, and paraphrasing.
The AI detection percentage is different. It estimates the total content in a document that generative AI may
have created. The AI percentage does not influence the Similarity Score calculation, nor does the Similarity
Score change the AI percentage. They are independent metrics produced by separate analyses.
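To make the AI detection percentage concrete, the sketch below computes a word-weighted share of a document from segments that have already been labeled as human or AI. This is an assumption made for illustration, not the actual Copyleaks calculation; the function name `ai_percentage` and the `(text, is_ai)` segment format are hypothetical. Note that nothing about matching sources enters the calculation, which mirrors the independence from the Similarity Score described above.

```python
from typing import List, Tuple


def ai_percentage(segments: List[Tuple[str, bool]]) -> float:
    """Return the share of words (0-100) that sit in segments labeled as AI.

    segments: (text, is_ai) pairs, e.g. one pair per sentence or paragraph.
    Word-weighting is an assumption for this sketch.
    """
    total_words = sum(len(text.split()) for text, _ in segments)
    if total_words == 0:
        return 0.0
    ai_words = sum(len(text.split()) for text, is_ai in segments if is_ai)
    return 100.0 * ai_words / total_words
```

For example, a document with a three-word segment labeled AI and a one-word segment labeled human would come out at 75 percent under this weighting.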
Will the AI detection be a different workflow than how we currently work with
the Copyleaks report?
The workflow will remain the same. The only change will be within the report, where you will see a section
for AI content detection alerts. Additionally, you can choose how the AI content alert is shown if you are
working with the API.
Does adding AI content detection alter how I utilize the Similarity Report?
No, your workflow and use of the Similarity Report will not change with the addition of AI detection. It does
not impact how you interpret or act on the Similarity Report. The core functionality and value of the report
remain the same. The AI percentage notification provides supplemental information but does not modify
the Similarity Score or how you leverage the report.
Detection Capabilities & Limitations
I’ve heard that AI Detection is just vaporware/snake oil. Is that true?
Not at all. Generative AI is still in its infancy, as is the research around AI Detection. Researchers from
The University of Kansas developed their own tool to detect AI in academic writing with verifiable results.
Moreover, they established the GPABenchmark, which helps codify the measurement of AI detectors. Their
paper, Check Me if You Can, lays out some early research showing that AI Detection can be effective while
warning that not all are created equal.
Around the same time, another study from the Department of Computer Science, University of Maryland,
claimed that AI could not be reliably detected. When Copyleaks reviewed the paper, we tested their
examples of AI text that was supposed to be undetectable and predicted with high confidence that their
test examples were indeed AI. See graphic below.
However, we are always developing new features for future product updates.
Here are a couple of commonly used tools that can potentially be flagged as AI:
Language Translators: Tools that translate large bodies of text while maintaining the integrity of the initial
content. These tools often utilize large language models to generate their translations and, as a result, can
be potentially flagged as AI.
Grammar and Spelling Tools: While these tools, in general, do not get flagged as AI, some features of these
tools, such as GrammarlyGO and Wordtune, utilize AI for long-form auto-complete, paraphrasing, etc., and
therefore can be detected as AI.
Can you detect if the text has been put through a spinner? And what if the text
contains intentional typos?
Our latest model can detect paraphrased content with a high rate of confidence, including content
that has been put through a text spinner or contains intentional typos.
Nevertheless, there’s always room for improvement. Therefore, we continue to improve the model in
real-time, increasing accuracy for paraphrasing, text spinners, and intentional typos on a regular cadence.
• Generally speaking, the accuracy of our detection increases as the text length increases. Accordingly,
we suggest testing text containing an average of 350 words.
• The accuracy of creative writing, including poems and song lyrics, is typically lower than other types
of content. We continue to train our models to ensure high accuracy across all types of content.
• At the moment, English has the highest accuracy. With additional text ingestion and model training,
the accuracy across all supported languages will continue to improve.
• While false positives are exceedingly rare (0.2%), AI-generated text has a higher rate of registering
as human-created text. As we continue to train the models, the rate of false negatives will continue
to improve.
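The figures in the list above lend themselves to a quick back-of-the-envelope check. The sketch below takes the 0.2% false positive rate and the 350-word suggestion from the text; the function names are hypothetical, and treating 350 words as a simple threshold is an assumption made for illustration.

```python
def expected_false_positives(human_docs: int, false_positive_rate: float = 0.002) -> float:
    """Expected number of human-written documents wrongly flagged as AI.

    The default rate of 0.002 is the 0.2% figure quoted in the text.
    """
    if human_docs < 0:
        raise ValueError("human_docs must be non-negative")
    return human_docs * false_positive_rate


def meets_suggested_length(text: str, suggested_words: int = 350) -> bool:
    """Check a text against the suggested word count (whitespace-delimited words)."""
    return len(text.split()) >= suggested_words
```

At the quoted rate, a batch of 1,000 human-written documents would be expected to produce about 2 false alerts, which is one reason an individual alert is better treated as a prompt for further review than as proof.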
The ideal text requirements for each of our AI offerings are as follows:
Will Copyleaks be able to detect newer models that will come out?
Yes. Thanks to machine learning, we train the system to detect new models accurately once released.
• Continued accuracy improvements for detecting AI text that has gone through a text spinner or
otherwise been manipulated (e.g., with deliberate typos).
We’ll continue to monitor the landscape and closely listen to user feedback to ensure we stay one step
ahead of AI content generators and provide the most accurate results possible.