It estimates the accuracy for a nominal/categorical predicted-observed dataset.
Arguments
- data
(Optional) argument to call an existing data frame containing the data.
- obs
Vector with observed values (character | factor).
- pred
Vector with predicted values (character | factor).
- tidy
Logical value (TRUE/FALSE) that sets the type of return. TRUE returns a data.frame; FALSE (default) returns a list.
- na.rm
Logical argument to remove rows with missing values (NA). Default is na.rm = TRUE.
Value
An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).
Details
Accuracy is the simplest and most popular classification metric in the literature. It measures the degree to which the predictions of a model match the reality being modeled. The classification accuracy is calculated as the ratio of the number of correctly classified objects to the total number of cases.
It is bounded between 0 and 1; the closer to 1, the better. Values close to 0 indicate low accuracy of predictions. It can also be expressed as a percentage if multiplied by 100. It is estimated at a global level (not at the class level).
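The ratio described above can be sketched in base R. This is a minimal illustration and not the package's implementation; the vectors `obs` and `pred` are made-up example data, and the helper name `lv` is hypothetical.

```r
# Hypothetical example data (not from the package)
obs  <- c("Red", "Blue", "Green", "Red", "Blue",  "Green", "Red",  "Blue")
pred <- c("Red", "Blue", "Red",   "Red", "Green", "Green", "Blue", "Blue")

# Confusion matrix with a shared set of levels:
# rows are predicted classes, columns are observed classes
lv <- sort(unique(c(obs, pred)))
cm <- table(pred = factor(pred, levels = lv),
            obs  = factor(obs,  levels = lv))

# Accuracy = correctly classified cases (diagonal) / total number of cases
acc <- sum(diag(cm)) / sum(cm)
acc
#> [1] 0.625

# The same value expressed as a percentage
acc * 100
#> [1] 62.5
```

Because the metric is global, the same result is obtained without a confusion matrix via `mean(obs == pred)`.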
Accuracy has limitations for assessing classification quality under unbalanced classes, and it cannot distinguish among misclassification distributions. In those cases, it is advised to apply other metrics such as balanced accuracy (baccu), F-score (fscore), Matthews Correlation Coefficient (mcc), or Cohen's Kappa Coefficient (cohen_kappa).
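The limitation under unbalanced classes can be illustrated with a small base R sketch. The data are hypothetical, and balanced accuracy is computed here with its common definition (the mean of per-class recalls), which is assumed for illustration and may not match the package's function in every detail.

```r
# Hypothetical imbalanced data: 95 "Healthy" cases and 5 "Diseased" cases
obs  <- c(rep("Healthy", 95), rep("Diseased", 5))

# A trivial classifier that always predicts the majority class
pred <- rep("Healthy", 100)

# Global accuracy looks high even though no "Diseased" case is detected
acc <- mean(obs == pred)
acc
#> [1] 0.95

# Per-class recall: share of each observed class that was predicted correctly
recall <- tapply(obs == pred, obs, mean)

# Balanced accuracy (assumed definition: mean of per-class recalls)
bal_acc <- mean(recall)
bal_acc
#> [1] 0.5
```

The global accuracy of 0.95 hides the fact that the minority class is never predicted correctly, which the class-averaged metric exposes.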
Accuracy is directly related to the error_rate, since accuracy = 1 - error_rate.
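The complement relation can be checked numerically. The sketch below uses made-up vectors and computes the error rate directly as the share of mismatches, rather than calling the package's error_rate function.

```r
# Hypothetical example data (not from the package)
obs  <- c("True", "False", "True", "True",  "False")
pred <- c("True", "True",  "True", "False", "False")

acc <- mean(obs == pred)  # share of correct predictions (3 of 5)
err <- mean(obs != pred)  # share of incorrect predictions (2 of 5)

# The two quantities are complements
all.equal(acc, 1 - err)
#> [1] TRUE
```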
For the formula and more details, see the online documentation.
References
Sammut & Webb (2017). Accuracy. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. doi:10.1007/978-1-4899-7687-1_3
Examples
set.seed(123)

# Two-class
binomial_case <- data.frame(
  labels = sample(c("True", "False"), 100, replace = TRUE),
  predictions = sample(c("True", "False"), 100, replace = TRUE)
)

# Multi-class
multinomial_case <- data.frame(
  labels = sample(c("Red", "Blue", "Green"), 100, replace = TRUE),
  predictions = sample(c("Red", "Blue", "Green"), 100, replace = TRUE)
)

# Get accuracy estimate for two-class case
accuracy(data = binomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>   accuracy
#> 1     0.49

# Get accuracy estimate for multi-class case
accuracy(data = multinomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>   accuracy
#> 1     0.29