Research Paper

Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy

An interpretable framework combining linguistic features, machine learning, and SHAP explanations for transparent AI text detection.

Shushanta Pudasaini, Technological University Dublin, D23129142@mytudublin.ie
Luis Miralles-Pechuán, Technological University Dublin, luis.miralles@TUDublin.ie
David Lillis, University College Dublin, david.lillis@ucd.ie
Marisa Llorens Salvador, Technological University Dublin, marisa.llorens@TUDublin.ie

As Large Language Models (LLMs) become widely used in education and online communication, distinguishing AI-generated text from human writing has emerged as a critical challenge. While many detection systems report high benchmark accuracy, their real-world reliability remains uncertain, particularly in high-stakes settings such as academic assessment.

This paper investigates whether modern detectors truly identify machine authorship or instead learn dataset-specific artefacts. We introduce an interpretable detection framework that combines linguistic feature engineering, classical machine learning, and explainable AI. Across two major benchmark corpora (PAN-CLEF 2025 and COLING 2025), models trained on 38 linguistic features achieve leaderboard-competitive performance, reaching an F1 score of 0.9734 without relying on large pretrained language models.
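The full 38-feature set is specified in the paper; as an illustration only, a few of the surface-level features that appear in the SHAP demo below (type_token_ratio, hapax_ratio, punctuation_count, sentence_length_var) can be computed with the Python standard library. The definitions here are simplified stand-ins, not the authors' exact implementations:

```python
import re
from collections import Counter
from statistics import pvariance

def surface_features(text: str) -> dict:
    """Toy versions of a few linguistic features; not the paper's exact definitions."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    counts = Counter(words)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    sent_lens = [len(re.findall(r"[A-Za-z']+", s)) for s in sentences]
    return {
        # distinct word types over total tokens
        "type_token_ratio": len(counts) / max(len(words), 1),
        # words occurring exactly once over total tokens
        "hapax_ratio": sum(1 for c in counts.values() if c == 1) / max(len(words), 1),
        "punctuation_count": sum(1 for ch in text if ch in ".,;:!?'\"()-"),
        # population variance of per-sentence word counts
        "sentence_length_var": pvariance(sent_lens) if len(sent_lens) > 1 else 0.0,
    }

feats = surface_features("I still remember the first time I tried to bake bread. What a disaster!")
```

Each text becomes a fixed-length numeric vector of such scores, which is what makes classical classifiers and per-feature SHAP attributions applicable.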

However, performance drops substantially under cross-domain and cross-generator evaluation. Using SHAP-based explanations, we show that the most influential features differ markedly between datasets, indicating that detectors often rely on corpus-specific stylistic cues rather than stable signals of machine authorship. Ensemble models partially improve robustness but do not eliminate this effect.
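This failure mode is easy to reproduce in miniature: fit a detector on feature vectors from one corpus, then score a second corpus whose feature distribution is shifted by corpus-specific style. The sketch below uses synthetic features and a simple mean-difference linear classifier; all numbers are invented for illustration and this is not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_corpus(n, shift=0.0):
    """Synthetic 'linguistic features': an authorship signal plus a corpus-wide style offset."""
    y = rng.integers(0, 2, n)                         # 0 = human, 1 = AI
    X = rng.normal(size=(n, 5)) + 1.5 * y[:, None] + shift
    return X, y

def f1(y_true, y_pred):
    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# "Train" a mean-difference classifier on corpus A.
X_a, y_a = make_corpus(1000)
w = X_a[y_a == 1].mean(0) - X_a[y_a == 0].mean(0)
b = w @ (X_a[y_a == 1].mean(0) + X_a[y_a == 0].mean(0)) / 2

def predict(X):
    return (X @ w > b).astype(int)

# Corpus B carries the same authorship signal but a shifted house style.
X_b, y_b = make_corpus(1000, shift=-1.0)

in_domain = f1(y_a, predict(X_a))      # high: boundary matches corpus A
cross_domain = f1(y_b, predict(X_b))   # degraded: same signal, shifted features
```

The classifier is unchanged between the two evaluations; only the feature distribution moves, yet the score collapses, which is exactly the in-domain versus shifted-domain gap the paper measures.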

These findings show that benchmark accuracy is not reliable evidence of authorship detection: strong in-domain performance can coincide with substantial failure under domain and generator shift. We argue that explainability should be treated not only as a transparency aid, but as a validity diagnostic that reveals what evidence a detector is actually using. In educational and other high-stakes contexts, detectors should therefore be used only as probabilistic, explainable support for human judgment and never as an automated basis for punitive decisions. To support replication and practical use, we release an open-source Python package that returns both predictions and instance-level explanations for individual texts.

SHAP Explanations in Action

See how our explainable classifier analyzes real text samples, providing transparent insights into its predictions.

🤖 AI-Generated Text (Detected as AI)

Input Text:
Artificial intelligence has revolutionized numerous industries, transforming the way we approach complex problems and develop innovative solutions. Machine learning algorithms, in particular, have demonstrated remarkable capabilities in pattern recognition, natural language processing, and predictive analytics. These technological advancements continue to reshape our understanding of what machines can accomplish, pushing the boundaries of automation and intelligent systems. The integration of AI into everyday applications has become increasingly seamless, offering enhanced efficiency and unprecedented opportunities for growth across various sectors.

AI Confidence: 87.3% | Features: 38 | AI/Human: 23/15

Top SHAP Contributors:
pos_diversity        → AI     +1.06
pronoun_ratio        → AI     +2.67
sentence_length_var  → AI     +0.71
sentiment_subj       → AI     +1.02
stopword_count       → AI     +1.41
hapax_ratio          ← Human  -1.55
punctuation_count    ← Human  -1.72
flesch_reading       ← Human  -0.63
type_token_ratio     ← Human  -0.51
predictability       → AI     +0.58
paragraph_count      ← Human  -0.58
specificity_score    ← Human  -0.64
emotion_variation    → AI     +0.33
👤 Human-Written Text (Detected as Human)

Input Text:
I still remember the first time I tried to bake bread from scratch – what a disaster! Flour everywhere, dough that wouldn't rise, and a final product that could've doubled as a doorstop. But you know what? I kept at it. There's something deeply satisfying about kneading dough with your own hands, watching it transform. My grandmother always said the secret was patience (and maybe a bit of cursing under your breath when things go wrong). Now, years later, the smell of fresh bread fills my kitchen every Sunday morning. It's not perfect, but it's mine.

Human Confidence: 91.2% | Features: 38 | AI/Human: 14/24

Top SHAP Contributors:
hapax_ratio          ← Human  -2.31
personal_voice       ← Human  -1.87
punctuation_count    ← Human  -1.65
modal_freq           ← Human  -1.42
type_token_ratio     ← Human  -1.18
paragraph_coherence  ← Human  -0.94
emotion_variation    ← Human  -0.78
sentence_complexity  → AI     +0.52
word_entropy         ← Human  -0.61
discourse_markers    ← Human  -0.55
hedge_uncertainty    ← Human  -0.48
repetition_rate      ← Human  -0.42
pos_diversity        → AI     +0.38

Try It Yourself

Install our Python package to analyze your own text with full SHAP explainability.

📦 explain-ai-generated-text v0.1.3
# Installation
pip install explain-ai-generated-text

# Usage
from xai import shap_explainer

text = "Your text to analyze goes here..."
result = shap_explainer(text)

# Get prediction
print(result['prediction'])  # 'AI-generated' or 'Human-written'

# Explore feature explanations
for feature, data in result.items():
    if feature != 'prediction':
        print(f"{feature}: SHAP={data['SHAP_Value']:.3f}, Direction={data['Direction']}")
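Assuming the result dictionary has the shape shown above (one entry per feature, each with a SHAP_Value and a Direction), the "Top SHAP Contributors" view from the demo can be rebuilt by ranking features by absolute SHAP value. The dictionary below is a hand-made stand-in for the package's output, not real model results:

```python
def top_contributors(result: dict, k: int = 5):
    """Rank features by |SHAP value|; positive pushes toward 'AI', negative toward 'Human'."""
    feats = {f: d for f, d in result.items() if f != "prediction"}
    ranked = sorted(feats.items(), key=lambda kv: abs(kv[1]["SHAP_Value"]), reverse=True)
    return [(f, d["SHAP_Value"], "AI" if d["SHAP_Value"] > 0 else "Human")
            for f, d in ranked[:k]]

# Hand-made stand-in for the package's output format:
demo = {
    "prediction": "AI-generated",
    "pronoun_ratio": {"SHAP_Value": 2.67, "Direction": "AI"},
    "hapax_ratio": {"SHAP_Value": -1.55, "Direction": "Human"},
    "pos_diversity": {"SHAP_Value": 1.06, "Direction": "AI"},
}

for name, value, direction in top_contributors(demo, k=3):
    print(f"{name:15s} {direction:5s} {value:+.2f}")
```

Sorting by absolute value is what surfaces the strongest evidence on either side, which is the intended reading of the contributor lists shown in the demo above.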