An interpretable framework combining linguistic features, machine learning, and SHAP explanations for transparent AI text detection.
As Large Language Models (LLMs) become widely used in education and online communication, distinguishing AI-generated text from human writing has emerged as a critical challenge. While many detection systems report high benchmark accuracy, their real-world reliability remains uncertain, particularly in high-stakes settings such as academic assessment.
This paper investigates whether modern detectors truly identify machine authorship or instead learn dataset-specific artefacts. We introduce an interpretable detection framework that combines linguistic feature engineering, classical machine learning, and explainable AI. Across two major benchmark corpora (PAN-CLEF 2025 and COLING 2025), models trained on 38 linguistic features achieve leaderboard-competitive performance, reaching an F1 score of 0.9734 without relying on large pretrained language models.
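To make the feature-engineering idea concrete, here is a minimal sketch of a few handcrafted linguistic features of the kind such a pipeline might compute. The function name and the specific features are illustrative only, not the paper's actual 38-feature set.

```python
import re
from statistics import mean

def linguistic_features(text: str) -> dict:
    """Compute a handful of illustrative stylometric features."""
    words = re.findall(r"[A-Za-z']+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return {
        # Lexical diversity: unique words over total words
        "type_token_ratio": len(set(w.lower() for w in words)) / max(len(words), 1),
        # Mean characters per word
        "avg_word_length": mean(len(w) for w in words) if words else 0.0,
        # Mean words per sentence
        "avg_sentence_length": len(words) / max(len(sentences), 1),
        # Punctuation density
        "comma_rate": text.count(",") / max(len(words), 1),
    }
```

Feature vectors of this kind can then be fed to any classical classifier (e.g. gradient boosting), keeping every input dimension human-interpretable.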
However, performance drops substantially under cross-domain and cross-generator evaluation. Using SHAP-based explanations, we show that the most influential features differ markedly between datasets, indicating that detectors often rely on corpus-specific stylistic cues rather than stable signals of machine authorship. Ensemble models partially improve robustness but do not eliminate this effect.
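To illustrate what a SHAP value is, here is a minimal exact Shapley-value computation by coalition enumeration. This brute-force form is feasible only for small feature counts; the SHAP library uses efficient approximations in practice, and all names below are illustrative, not the paper's implementation.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, relative to a baseline input."""
    n = len(x)
    phis = []
    for i in range(n):
        phi = 0.0
        others = [j for j in range(n) if j != i]
        for r in range(len(others) + 1):
            for S in combinations(others, r):
                # Classic Shapley coalition weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(S)) * factorial(n - len(S) - 1) / factorial(n)
                x_with = [x[j] if (j in S or j == i) else baseline[j] for j in range(n)]
                x_without = [x[j] if j in S else baseline[j] for j in range(n)]
                phi += weight * (f(x_with) - f(x_without))
        phis.append(phi)
    return phis
```

For a linear model the Shapley value of each feature reduces to its weighted deviation from the baseline, which makes the attribution easy to sanity-check.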
These findings show that benchmark accuracy is not reliable evidence of authorship detection: strong in-domain performance can coincide with substantial failure under domain and generator shift. We argue that explainability should be treated not only as a transparency aid, but as a validity diagnostic that reveals what evidence a detector is actually using. In educational and other high-stakes contexts, detectors should therefore be used only as probabilistic, explainable support for human judgment and never as an automated basis for punitive decisions. To support replication and practical use, we release an open-source Python package that returns both predictions and instance-level explanations for individual texts.
See how our explainable classifier analyzes real text samples, providing transparent insights into its predictions.
Install our Python package to analyze your own text with full SHAP explainability.
# Installation
pip install explain-ai-generated-text
# Usage
from xai import shap_explainer
text = "Your text to analyze goes here..."
result = shap_explainer(text)
# Get prediction
print(result['prediction']) # 'AI-generated' or 'Human-written'
# Explore feature explanations
for feature, data in result.items():
    if feature != 'prediction':
        print(f"{feature}: SHAP={data['SHAP_Value']:.3f}, Direction={data['Direction']}")
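Building on the loop above, a small helper can rank features by the magnitude of their SHAP values. The result layout (a `prediction` key plus per-feature dicts with `SHAP_Value` and `Direction`) is assumed from the snippet; `top_features` is an illustrative helper, not part of the package.

```python
def top_features(result: dict, k: int = 5) -> list:
    """Return the k features with the largest absolute SHAP value."""
    feats = {name: data for name, data in result.items() if name != "prediction"}
    ranked = sorted(feats.items(),
                    key=lambda item: abs(item[1]["SHAP_Value"]),
                    reverse=True)
    return [(name, data["SHAP_Value"], data["Direction"]) for name, data in ranked[:k]]
```

This is useful when a text triggers many features and you only want the handful that actually drove the prediction.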