Estimates the accuracy for a nominal/categorical predicted-observed dataset.

Usage

accuracy(data = NULL, obs, pred, tidy = FALSE, na.rm = TRUE)

Arguments

data

(Optional) argument to call an existing data frame containing the data.

obs

Vector with observed values (character | factor).

pred

Vector with predicted values (character | factor).

tidy

Logical argument (TRUE/FALSE) that sets the return type. TRUE returns a data.frame; FALSE (default) returns a list.

na.rm

Logical argument (TRUE/FALSE) to remove rows with missing values (NA). Default is na.rm = TRUE.

Value

An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).

Details

Accuracy is the simplest and most popular classification metric in the literature. It measures the degree to which the predictions of a model match the reality being modeled. The classification accuracy is calculated as the ratio of the number of correctly classified objects to the total number of cases.
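The ratio described above can be written out as a short formula. This is a sketch under assumed notation: the symbols n (number of cases), obs_i and pred_i (the i-th observed and predicted labels), and the two-class counts TP, TN, FP, FN are introduced here for illustration and are not taken from the package documentation.

```latex
\mathrm{accuracy}
  = \frac{\text{number of correctly classified cases}}{\text{total number of cases}}
  = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left(\mathrm{pred}_i = \mathrm{obs}_i\right)

% Two-class special case, in terms of confusion-matrix counts:
\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
```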

It is bounded between 0 and 1. The closer to 1, the better. Values close to zero indicate low accuracy of predictions. It can also be expressed as a percentage if multiplied by 100. It is estimated at a global level (not at the class level).

Accuracy has limitations for assessing classification quality under imbalanced classes, and it cannot distinguish among misclassification distributions. In those cases, it is advisable to apply other metrics such as balanced accuracy (baccu), F-score (fscore), Matthews Correlation Coefficient (mcc), or Cohen's Kappa Coefficient (cohen_kappa).
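The imbalance limitation can be illustrated with a small base R sketch. The data are hypothetical, and the calculation is done by hand rather than with the package functions; balanced accuracy is computed here as the unweighted mean of per-class recall, which is one common definition and may differ in detail from the package's implementation.

```r
# Hypothetical imbalanced data: 95 "neg" cases and 5 "pos" cases
obs <- factor(c(rep("neg", 95), rep("pos", 5)), levels = c("neg", "pos"))

# A trivial classifier that always predicts the majority class
pred <- factor(rep("neg", 100), levels = c("neg", "pos"))

# Logical vector: TRUE where the prediction matches the observation
hit <- as.character(pred) == as.character(obs)

# Global accuracy: proportion of correctly classified cases
acc <- mean(hit)

# Recall per observed class, then its unweighted mean
# (assumed definition of balanced accuracy for this sketch)
recall_by_class <- tapply(hit, obs, mean)
bal_acc <- mean(recall_by_class)

acc      # 0.95
bal_acc  # 0.5
```

The global accuracy looks high even though the classifier never identifies a single "pos" case, whereas the class-averaged value exposes the problem.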

Accuracy is directly related to the error_rate, since accuracy = 1 – error_rate.
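As a quick check of this identity, the sketch below computes both quantities by hand in base R on a small hypothetical dataset. The variable names are illustrative, and the package functions are not called.

```r
# Hypothetical observed and predicted labels (5 cases)
obs  <- c("Red", "Blue", "Green", "Red", "Blue")
pred <- c("Red", "Green", "Green", "Red", "Red")

# Proportion of matches and proportion of mismatches
accuracy_manual   <- mean(pred == obs)  # 3 of 5 correct
error_rate_manual <- mean(pred != obs)  # 2 of 5 incorrect

# The two proportions are complementary
identity_holds <- isTRUE(all.equal(accuracy_manual, 1 - error_rate_manual))
identity_holds
```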

For the formula and more details, see the online documentation.

References

Sammut & Webb (2017). Accuracy. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. doi:10.1007/978-1-4899-7687-1_3

Examples

set.seed(123)
# Two-class
binomial_case <- data.frame(
  labels = sample(c("True", "False"), 100, replace = TRUE),
  predictions = sample(c("True", "False"), 100, replace = TRUE)
)
# Multi-class
multinomial_case <- data.frame(
  labels = sample(c("Red", "Blue", "Green"), 100, replace = TRUE),
  predictions = sample(c("Red", "Blue", "Green"), 100, replace = TRUE)
)

# Get accuracy estimate for two-class case
accuracy(data = binomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>   accuracy
#> 1     0.49

# Get accuracy estimate for multi-class case
accuracy(data = multinomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>   accuracy
#> 1     0.29