Artificial Intelligence (AI) tools are increasingly used to produce written deliverables in a wide variety of domains. In some cases, the recipient wants to ensure that the content was written by a human rather than an AI tool, e.g., that assignments were completed by students or that product reviews were written by actual customers. This creates demand for AI detection tools that minimize two key statistics: the False Negative Rate (FNR), the proportion of AI-generated text that is falsely classified as human, and the False Positive Rate (FPR), the proportion of human-written text that is falsely classified as AI-generated. We evaluate four commercial and open-source AI-text detectors (Pangram, OriginalityAI, GPTZero, and RoBERTa) on these dimensions using a large corpus of human and AI-generated text that spans topics, lengths, and AI models. First, we find that detectors vary in their capacity to minimize FNR and FPR, with commercial detectors outperforming open-source ones. Second, most commercial AI detectors perform remarkably well, with Pangram in particular achieving near-zero FPR and FNR within our set of stimuli; these results are stable across AI models. Third, while Pangram’s performance largely holds up on very short passages (< 50 words) and is robust to “humanizer” tools (e.g., StealthGPT), the performance of other detectors becomes case-dependent. Finally, we consider the implementation of detectors as policy, noting that a policy designer faces a trade-off between maximizing the probability of detecting AI-generated text and minimizing the risk of false accusations. Given this trade-off, we propose an evaluation metric that uses policy caps, a scale-free, detector-independent measure corresponding to the designer’s tolerance for false positives or negatives, to compare detectors.
Using this metric, we show that Pangram is the only detector that meets a stringent policy cap (FPR ≤ 0.005) without compromising the ability to accurately detect AI text.
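
To make the policy-cap idea concrete, the following is a minimal sketch rather than the evaluation procedure used in this paper. It assumes each detector returns a score in which higher values mean "more likely AI-generated" and that a text is flagged when its score meets or exceeds a threshold. The function names `rates` and `fnr_at_fpr_cap` and the toy scores are hypothetical. The sketch picks the lowest threshold whose FPR stays within the cap and reports the FNR at that threshold.

```python
from typing import Sequence, Tuple


def rates(human_scores: Sequence[float], ai_scores: Sequence[float],
          threshold: float) -> Tuple[float, float]:
    """Return (FPR, FNR) when a text is flagged as AI if score >= threshold.

    Assumption: higher scores mean "more likely AI-generated".
    """
    # FPR: share of human-written texts wrongly flagged as AI.
    fpr = sum(s >= threshold for s in human_scores) / len(human_scores)
    # FNR: share of AI-generated texts wrongly passed as human.
    fnr = sum(s < threshold for s in ai_scores) / len(ai_scores)
    return fpr, fnr


def fnr_at_fpr_cap(human_scores: Sequence[float], ai_scores: Sequence[float],
                   cap: float = 0.005) -> Tuple[float, float, float]:
    """Return (threshold, FPR, FNR) for the lowest threshold with FPR <= cap.

    Raising the threshold can only lower FPR and raise FNR, so the lowest
    admissible threshold gives the smallest FNR that respects the cap.
    """
    # Only observed scores can change the rates; +inf flags nothing,
    # so it always satisfies the cap and the loop always returns.
    candidates = sorted(set(human_scores) | set(ai_scores)) + [float("inf")]
    for t in candidates:
        fpr, fnr = rates(human_scores, ai_scores, t)
        if fpr <= cap:
            return t, fpr, fnr
    raise ValueError("unreachable: the infinite threshold always meets the cap")


# Toy scores, invented purely for illustration (not data from the study).
human = [0.01, 0.02, 0.05, 0.10, 0.90]
ai = [0.60, 0.80, 0.95, 0.99]

strict = fnr_at_fpr_cap(human, ai, cap=0.005)  # stringent cap
loose = fnr_at_fpr_cap(human, ai, cap=0.25)    # permissive cap
```

On this toy data the stringent cap forces the threshold above the one high-scoring human text, so half of the AI texts are missed, while the permissive cap tolerates that false positive and misses none. Because the cap is stated in terms of FPR and not raw scores, the same cap can be applied to detectors whose scores are on different scales.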