What is The Fallacy of High Accuracy?
Mathematical Foundation
Laws & Principles
- Precision (Positive Predictive Value): TP / (TP + FP). When the model screams 'Fraud!', how often is it actually right? If Precision is high, the model's alarms are highly trustworthy.
- Recall (Sensitivity): TP / (TP + FN). Out of all the actual fraud occurring, how much of it did the model manage to catch? If Recall is high, the model casts a wide net and misses very few anomalies.
- The Tension (F1): You can artificially boost Recall to 100% by flagging every single user as 'Fraud,' but you will destroy your Precision. You can boost your Precision to 100% by only flagging the one hyper-obvious fraudster a year, but you destroy your Recall. The F1 Score takes the harmonic mean of the two, F1 = 2 × (Precision × Recall) / (Precision + Recall), which mathematically forces both metrics to be high simultaneously to achieve a good score.
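The trade-off above can be sketched with a few small helpers; this is a minimal illustration assuming you already have raw confusion-matrix counts (TP, FP, FN):

```python
def precision(tp, fp):
    # Of everything the model flagged positive, what fraction was truly positive?
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # Of all the true positives out there, what fraction did the model catch?
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(tp, fp, fn):
    # Harmonic mean of precision and recall: collapses if either one is low.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0

# 'Flag everyone as fraud': perfect recall, terrible precision, so F1 stays low.
print(precision(10, 990), recall(10, 0), f1(10, 990, 0))
```

Note how flagging all 1,000 users to catch 10 fraudsters yields Recall = 1.0 but Precision = 0.01, and the harmonic mean drags F1 down to roughly 0.02 rather than averaging up to 0.5.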
Step-by-Step Example Walkthrough
" Analyzing an AI Cancer screening model on a test set of 1,000 patients. 100 patients actually have cancer, 900 are healthy. The model labels 80 out of the 100 correctly (True Positive), but falsely flags 50 healthy people as having cancer (False Positive). "
- 1. Map the inputs: TP = 80, FP = 50, FN = 100 − 80 = 20 (cancers missed), TN = 900 − 50 = 850 (healthy patients correctly cleared).
- 2. Calculate Overall Accuracy: (80 + 850) / 1000 = 93%. The hospital thinks the model is incredible.
- 3. Calculate Precision: 80 / (80 + 50) = 61.5%. When the AI scares a patient with a cancer diagnosis, it is wrong nearly 40% of the time.
- 4. Calculate Recall: 80 / (80 + 20) = 80.0%. The AI correctly identifies 80% of patients who actually need help.
- 5. Calculate F1 Score: 2 × (0.615 × 0.80) / (0.615 + 0.80) ≈ 69.6%, far below the 93% accuracy that made the model look incredible.
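The five steps above can be verified numerically; this minimal sketch uses only the counts from the example:

```python
# Confusion-matrix counts from the screening example.
tp, fp, fn, tn = 80, 50, 20, 850
total = tp + fp + fn + tn                # 1,000 patients

accuracy = (tp + tn) / total             # 0.93: looks incredible on paper
precision = tp / (tp + fp)               # ~0.615: alarms are wrong ~38% of the time
recall = tp / (tp + fn)                  # 0.80: catches 80% of real cancers
f1 = 2 * precision * recall / (precision + recall)  # ~0.696

print(f"Accuracy={accuracy:.1%}  Precision={precision:.1%}  "
      f"Recall={recall:.1%}  F1={f1:.1%}")
```

The gap between the 93% accuracy and the roughly 70% F1 score is the fallacy in one line: with 900 of 1,000 patients healthy, a model earns most of its accuracy simply by saying "healthy".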