Confusion matrix

Article Id: WHEBN0000847558

Title: Confusion matrix  
Author: World Heritage Encyclopedia
Language: English
Subject: Matthews correlation coefficient, Sensitivity and specificity, Confusion matrix terms, Statistical classification, Diagnostic odds ratio
Collection: Machine Learning, Statistical Classification
Publisher: World Heritage Encyclopedia

Terminology and derivations from a confusion matrix
true positive (TP)
eqv. with hit
true negative (TN)
eqv. with correct rejection
false positive (FP)
eqv. with false alarm, Type I error
false negative (FN)
eqv. with miss, Type II error

sensitivity or true positive rate (TPR)
eqv. with hit rate, recall
\mathit{TPR} = \frac {\mathit{TP}} {P} = \frac {\mathit{TP}} {\mathit{TP}+\mathit{FN}}
specificity (SPC) or true negative rate (TNR)
\mathit{SPC} = \frac {\mathit{TN}} {N} = \frac {\mathit{TN}} {\mathit{FP} + \mathit{TN}}
precision or positive predictive value (PPV)
\mathit{PPV} = \frac {\mathit{TP}} {\mathit{TP} + \mathit{FP}}
negative predictive value (NPV)
\mathit{NPV} = \frac {\mathit{TN}} {\mathit{TN} + \mathit{FN}}
fall-out or false positive rate (FPR)
\mathit{FPR} = \frac {\mathit{FP}} {N} = \frac {\mathit{FP}} {\mathit{FP} + \mathit{TN}} = 1 - \mathit{SPC}
false discovery rate (FDR)
\mathit{FDR} = \frac {\mathit{FP}} {\mathit{FP} + \mathit{TP}} = 1 - \mathit{PPV}
miss rate or false negative rate (FNR)
\mathit{FNR} = \frac {\mathit{FN}} {P} = \frac {\mathit{FN}} {\mathit{FN} + \mathit{TP}}

accuracy (ACC)
\mathit{ACC} = \frac {\mathit{TP} + \mathit{TN}} {P + N}
F1 score
is the harmonic mean of precision and sensitivity
\mathit{F1} = \frac {2 \mathit{TP}} {2 \mathit{TP} + \mathit{FP} + \mathit{FN}}
Matthews correlation coefficient (MCC)
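\mathit{MCC} = \frac {\mathit{TP} \times \mathit{TN} - \mathit{FP} \times \mathit{FN}} {\sqrt{ (\mathit{TP} + \mathit{FP}) (\mathit{TP} + \mathit{FN}) (\mathit{TN} + \mathit{FP}) (\mathit{TN} + \mathit{FN}) }}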

Informedness = Sensitivity + Specificity - 1
Markedness = Precision + NPV - 1

Sources: Fawcett (2006) and Powers (2011).[1][2]
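
As a minimal sketch, the definitions above can be transcribed directly into Python. The function name `derived_metrics` and the counts passed to it are illustrative assumptions, not part of the cited sources, and the sketch assumes every denominator is non-zero.

```python
from math import sqrt


def derived_metrics(tp, fp, fn, tn):
    """Compute the measures listed above from the four confusion-matrix counts.

    Hypothetical helper: it does not guard against zero denominators.
    """
    p = tp + fn  # condition positives (P)
    n = fp + tn  # condition negatives (N)
    tpr = tp / p            # sensitivity, recall
    spc = tn / n            # specificity
    ppv = tp / (tp + fp)    # precision
    npv = tn / (tn + fn)
    mcc_denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": tpr,
        "SPC": spc,
        "PPV": ppv,
        "NPV": npv,
        "FPR": fp / n,
        "FDR": fp / (fp + tp),
        "FNR": fn / p,
        "ACC": (tp + tn) / (p + n),
        "F1": 2 * tp / (2 * tp + fp + fn),
        "MCC": (tp * tn - fp * fn) / mcc_denominator,
        "informedness": tpr + spc - 1,
        "markedness": ppv + npv - 1,
    }


# Illustrative counts only (assumed for this sketch).
metrics = derived_metrics(tp=5, fp=2, fn=3, tn=17)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The identities stated in the list, such as FPR = 1 - SPC and FDR = 1 - PPV, can be checked numerically against the returned dictionary.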

In the field of machine learning, a confusion matrix, also known as a contingency table or an error matrix,[3] is a specific table layout that allows visualization of the performance of an algorithm, typically a supervised learning one (in unsupervised learning it is usually called a matching matrix). Each column of the matrix represents the instances in a predicted class, while each row represents the instances in an actual class (or vice versa).[2] The name stems from the fact that it makes it easy to see whether the system is confusing two classes (i.e., commonly mislabeling one as another).

Contents

  • Example
  • Table of confusion
  • See also
  • References
  • External links

Example

If a classification system has been trained to distinguish between cats, dogs and rabbits, a confusion matrix summarizes the results of testing the algorithm for further inspection. Assuming a sample of 27 animals (8 cats, 6 dogs, and 13 rabbits), the resulting confusion matrix could look like the table below:

                      Predicted
                      Cat    Dog    Rabbit
Actual class  Cat      5      3      0
              Dog      2      3      1
              Rabbit   0      2     11

In this confusion matrix, of the 8 actual cats, the system predicted that 3 were dogs, and of the 6 actual dogs, it predicted that 1 was a rabbit and 2 were cats. The matrix shows that the system has trouble distinguishing between cats and dogs, but separates rabbits from the other animals fairly well: 11 of the 13 rabbits were classified correctly. All correct guesses lie on the diagonal of the table, so errors are easy to spot visually as the non-zero values off the diagonal.
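
The table above can be rebuilt from paired label lists, as in the following sketch. The label lists are hypothetical, chosen only so that the counts match the example, and `confusion_matrix` is an illustrative helper rather than a library function. Rows are actual classes and columns are predicted classes, as in the table.

```python
from collections import Counter

CLASSES = ["cat", "dog", "rabbit"]


def confusion_matrix(actual, predicted, classes):
    """Return a nested list: rows are actual classes, columns are predicted."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]


# Hypothetical labels constructed to reproduce the example table.
actual = ["cat"] * 8 + ["dog"] * 6 + ["rabbit"] * 13
predicted = (
    ["cat"] * 5 + ["dog"] * 3                      # the 8 actual cats
    + ["cat"] * 2 + ["dog"] * 3 + ["rabbit"] * 1   # the 6 actual dogs
    + ["dog"] * 2 + ["rabbit"] * 11                # the 13 actual rabbits
)

matrix = confusion_matrix(actual, predicted, CLASSES)
correct = sum(matrix[i][i] for i in range(len(CLASSES)))  # diagonal entries

for label, row in zip(CLASSES, matrix):
    print(f"{label:>6}: {row}")
print(f"correct: {correct} of {len(actual)}")
```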

Table of confusion

In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of false positives, false negatives, true positives, and true negatives. This allows more detailed analysis than the mere proportion of correct guesses (accuracy). Accuracy is not a reliable metric for the real performance of a classifier, because it yields misleading results if the data set is unbalanced (that is, when the number of samples in different classes varies greatly). For example, if there were 95 cats and only 5 dogs in the data set, the classifier could easily be biased into classifying all the samples as cats. The overall accuracy would be 95%, but in practice the classifier would have a 100% recognition rate for the cat class and a 0% recognition rate for the dog class.
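
The 95-cat, 5-dog scenario can be reproduced in a few lines. This is a sketch with a hypothetical classifier that always answers "cat"; the helper name `recognition_rate` is an assumption made for illustration.

```python
# 95 cats and 5 dogs, as in the example above.
actual = ["cat"] * 95 + ["dog"] * 5
# Hypothetical degenerate classifier: every sample is labeled "cat".
predicted = ["cat"] * len(actual)

accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def recognition_rate(label):
    """Fraction of the samples of class `label` that were predicted as `label`."""
    hits = sum(a == p == label for a, p in zip(actual, predicted))
    return hits / actual.count(label)


print(f"accuracy: {accuracy:.2f}")
print(f"cat recognition rate: {recognition_rate('cat'):.2f}")
print(f"dog recognition rate: {recognition_rate('dog'):.2f}")
```

The per-class rates expose the failure that the single accuracy figure hides.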

Assuming the confusion matrix above, its corresponding table of confusion, for the cat class, would be:

5 true positives (actual cats that were correctly classified as cats)
2 false positives (dogs that were incorrectly labeled as cats)
3 false negatives (cats that were incorrectly marked as dogs)
17 true negatives (all the remaining animals, correctly classified as non-cats)

The final table of confusion would contain the average values for all classes combined.
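
The per-class counts can be derived mechanically from the three-class matrix by treating one class as "positive" and all others as "negative". The following is a sketch under that one-vs-rest reading; `table_of_confusion` is a hypothetical helper name, and the matrix uses rows for actual classes and columns for predicted classes.

```python
def table_of_confusion(matrix, k):
    """Return (TP, FP, FN, TN) for class index k.

    Assumes rows are actual classes and columns are predicted classes.
    """
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fp = sum(row[k] for row in matrix) - tp  # predicted k, actually another class
    fn = sum(matrix[k]) - tp                 # actually k, predicted another class
    tn = total - tp - fp - fn                # everything not involving class k
    return tp, fp, fn, tn


# Cat, dog, rabbit matrix from the example section.
matrix = [
    [5, 3, 0],
    [2, 3, 1],
    [0, 2, 11],
]

per_class = {
    name: table_of_confusion(matrix, k)
    for k, name in enumerate(["cat", "dog", "rabbit"])
}
for name, counts in per_class.items():
    print(name, counts)
```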

Let us define an experiment from P positive instances and N negative instances for some condition. The four outcomes can be formulated in a 2×2 contingency table or confusion matrix, as follows:

                               True condition
Total population               Condition positive               Condition negative
Predicted condition positive   True positive                    False positive (Type I error)
Predicted condition negative   False negative (Type II error)   True negative

Measures derived from the table:

Prevalence = Σ Condition positive / Σ Total population
Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
Positive predictive value (PPV), Precision = Σ True positive / Σ Predicted condition positive
False discovery rate (FDR) = Σ False positive / Σ Predicted condition positive
False omission rate (FOR) = Σ False negative / Σ Predicted condition negative
Negative predictive value (NPV) = Σ True negative / Σ Predicted condition negative
True positive rate (TPR), Sensitivity, Recall = Σ True positive / Σ Condition positive
False positive rate (FPR), Fall-out = Σ False positive / Σ Condition negative
False negative rate (FNR), Miss rate = Σ False negative / Σ Condition positive
True negative rate (TNR), Specificity (SPC) = Σ True negative / Σ Condition negative
Positive likelihood ratio (LR+) = TPR / FPR
Negative likelihood ratio (LR−) = FNR / TNR
Diagnostic odds ratio (DOR) = LR+ / LR−
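
As a rough illustration of the ratio-based measures in the table, the sketch below computes prevalence, the false omission rate, both likelihood ratios, and the diagnostic odds ratio. The function name `ratio_measures` is hypothetical, the counts are the cat-class values from the example, and the code assumes no denominator is zero.

```python
def ratio_measures(tp, fp, fn, tn):
    """Prevalence, FOR, LR+, LR- and DOR from the four counts.

    Hypothetical helper: it does not guard against zero denominators.
    """
    total = tp + fp + fn + tn
    tpr = tp / (tp + fn)
    fnr = fn / (tp + fn)
    fpr = fp / (fp + tn)
    tnr = tn / (fp + tn)
    lr_pos = tpr / fpr
    lr_neg = fnr / tnr
    return {
        "prevalence": (tp + fn) / total,
        "FOR": fn / (fn + tn),
        "LR+": lr_pos,
        "LR-": lr_neg,
        "DOR": lr_pos / lr_neg,
    }


# Cat-class counts from the table of confusion above.
ratios = ratio_measures(tp=5, fp=2, fn=3, tn=17)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

Expanding the definitions shows that the DOR reduces to (TP × TN) / (FP × FN), which gives a quick consistency check on the result.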

References

  1. Fawcett, Tom (2006). "An Introduction to ROC Analysis". Pattern Recognition Letters 27 (8): 861–874.
  2. Powers, David M W (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies 2 (1): 37–63.
  3. Stehman, Stephen V. (1997). "Selecting and interpreting measures of thematic classification accuracy". Remote Sensing of Environment 62 (1): 77–89.

External links

  • Theory about the confusion matrix
  • GM-RKB Confusion Matrix concept page

This article was sourced from Creative Commons Attribution-ShareAlike License; additional terms may apply.