Artificial Writing and Automated Detection

Artificial Intelligence (AI) tools are increasingly used to produce written deliverables across a wide variety of domains. In some cases, the recipient of a deliverable wants to ensure that the content was written by a human rather than an AI tool, e.g., that assignments were completed by students or that product reviews were written by actual customers. This creates demand for AI detection tools that minimize two key statistics: the False Negative Rate (FNR), the proportion of AI-generated text that is falsely classified as human, and the False Positive Rate (FPR), the proportion of human-written text that is falsely classified as AI-generated. We evaluate four commercial and open-source AI-text detectors (Pangram, OriginalityAI, GPTZero, and RoBERTa) on these dimensions using a large corpus of human and AI-generated text that spans topics, lengths, and AI models. First, we find that detectors vary in their capacity to minimize FNR and FPR, with the commercial detectors outperforming the open-source one. Second, most commercial AI detectors perform remarkably well, with Pangram in particular achieving a near-zero FPR and FNR within our set of stimuli; these results are stable across AI models. Third, while Pangram’s performance largely holds up on very short passages (< 50 words) and is robust to “humanizer” tools (e.g., StealthGPT), the performance of other detectors becomes case-dependent. Finally, we consider the implementation of detectors as policy, noting that a policy designer faces a trade-off between maximizing the probability of detecting true AI-generated text and minimizing the risk of false accusations. Given this trade-off, we propose an evaluation metric that uses policy caps (a scale-free, detector-independent measure that corresponds to the designer’s tolerance for false positives or negatives) to compare detectors. Using this metric, we show that Pangram is the only detector that meets a stringent policy cap (FPR ≤ 0.005) without compromising its ability to accurately detect AI text.
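The two error rates and a cap on FPR can be made concrete with a small sketch. This is one plausible reading of a policy-cap comparison, not the authors' exact procedure. It assumes that each detector returns a score where higher means "more likely AI-generated" and that a text is flagged as AI when its score meets a threshold. Under those assumptions, the sketch picks the most permissive threshold whose FPR stays within the cap and reports the FNR at that threshold. The function names `error_rates` and `fnr_at_fpr_cap` and the toy labels and scores are hypothetical illustrations, not data from the paper.

```python
from typing import List, Sequence, Tuple


def error_rates(labels: Sequence[int], preds: Sequence[int]) -> Tuple[float, float]:
    """Return (FPR, FNR). Label/prediction 1 = AI-generated, 0 = human-written."""
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)  # human flagged as AI
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)  # AI passed as human
    n_human = sum(1 for y in labels if y == 0)
    n_ai = sum(1 for y in labels if y == 1)
    fpr = fp / n_human if n_human else 0.0
    fnr = fn / n_ai if n_ai else 0.0
    return fpr, fnr


def fnr_at_fpr_cap(
    labels: Sequence[int], scores: Sequence[float], cap: float = 0.005
) -> Tuple[float, float, float]:
    """Hypothetical policy-cap evaluation (an assumption, not the paper's code).

    Scans thresholds from lowest to highest. FPR can only fall as the threshold
    rises, so the first threshold with FPR <= cap flags the most text allowed
    under the cap and therefore has the lowest FNR among compliant thresholds.
    Returns (threshold, FPR, FNR).
    """
    # Candidate thresholds: every distinct score, plus +inf ("flag nothing"),
    # which always has FPR = 0 and so guarantees the loop returns.
    candidates: List[float] = sorted(set(scores)) + [float("inf")]
    for t in candidates:
        preds = [1 if s >= t else 0 for s in scores]
        fpr, fnr = error_rates(labels, preds)
        if fpr <= cap:
            return t, fpr, fnr
    raise RuntimeError("unreachable: the +inf threshold always satisfies the cap")


# Toy example: four human texts (label 0) and four AI texts (label 1).
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.9, 0.4, 0.8, 0.95, 0.99]
print(fnr_at_fpr_cap(labels, scores, cap=0.005))  # (0.95, 0.0, 0.5)
print(fnr_at_fpr_cap(labels, scores, cap=0.25))   # (0.4, 0.25, 0.0)
```

In this toy data, one human text scores 0.9. A stringent cap therefore forces the threshold above it and lets half the AI texts through, while a looser cap catches every AI text at the cost of one false accusation. This is the trade-off the policy cap is meant to make explicit. Fixing the FPR rather than a raw score threshold is what lets detectors with differently scaled outputs be compared on equal terms.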

Working paper available on SSRN.

Related People

Alex Imas
