Matthews Correlation Coefficient | Phi Coefficient

It estimates the mcc for a nominal/categorical predicted-observed dataset.

phi_coef estimates the Phi coefficient (equivalent to the Matthews Correlation Coefficient mcc).

Usage

mcc(data = NULL, obs, pred, pos_level = 2, tidy = FALSE, na.rm = TRUE)

phi_coef(data = NULL, obs, pred, pos_level = 2, tidy = FALSE, na.rm = TRUE)

Arguments

data: (Optional) argument to call an existing data frame containing the data.
obs: Vector with observed values (character | factor).
pred: Vector with predicted values (character | factor).
pos_level: Integer, for binary cases, indicating the order (1|2) of the level corresponding to the positive. Generally, the positive level is the second (2) since following an alpha-numeric order, the most common pairs are (Negative | Positive), (0 | 1), (FALSE | TRUE). Default : 2.
tidy: Logical operator (TRUE/FALSE) to decide the type of return. TRUE returns a data.frame, FALSE returns a list; Default : FALSE.
na.rm: Logic argument to remove rows with missing values (NA). Default is na.rm = TRUE.

Value

an object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).

Details

The mcc it is also known as the phi-coefficient. It has gained popularity within the machine learning community to summarize into a single value the confusion matrix of a binary classification.

It is particularly useful when the number of observations belonging to each class is uneven or imbalanced. It is characterized for being symmetric (i.e. no class has more relevance than the other). It is bounded between -1 and 1. The closer to 1 the better the classification performance.

For the formula and more details, see online-documentation

References

Chicco, D., Jurman, G. (2020) The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genomics 21, 6 (2020). doi:10.1186/s12864-019-6413-7

Examples

# \donttest{
set.seed(123)
# Two-class
binomial_case <- data.frame(labels = sample(c("True","False"), 100, replace = TRUE), 
predictions = sample(c("True","False"), 100, replace = TRUE))

# Get mcc estimate for two-class case
mcc(data = binomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>            mcc
#> 1 -0.008916106
# }