Estimates the Fowlkes-Mallows Index (FMI) for a nominal/categorical predicted-observed dataset.

Usage

fmi(data = NULL, obs, pred, pos_level = 2, tidy = FALSE, na.rm = TRUE)

Arguments

data

(Optional) argument to call an existing data frame containing the data.

obs

Vector with observed values (character | factor).

pred

Vector with predicted values (character | factor).

pos_level

Integer, for binary cases, indicating the position (1|2) of the level corresponding to the positive class. Generally, the positive level is the second (2) because, in alphanumeric order, the most common pairs are (Negative | Positive), (0 | 1), (FALSE | TRUE). Default: 2.

tidy

Logical value (TRUE/FALSE) that sets the type of return. TRUE returns a data.frame, FALSE returns a list. Default: FALSE.

na.rm

Logical value (TRUE/FALSE) indicating whether rows with missing values (NA) are removed. Default: TRUE.
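The level order that pos_level refers to can be checked in base R before calling fmi. This is a minimal sketch, assuming the levels are sorted the way factor() sorts them by default; the vectors below are made up for illustration.

```r
# Illustrative labels only; not taken from the package.
labels_chr <- c("True", "False", "True", "False")
labels_num <- c(0, 1, 1, 0)

# factor() sorts levels in alphanumeric order by default,
# so the second level is the usual "positive" class.
lev_chr <- levels(factor(labels_chr))
lev_num <- levels(factor(labels_num))

lev_chr[2]  # level selected by pos_level = 2
lev_num[2]
```

If the positive class sorts first in your data (for example "Case" vs "Control"), pos_level = 1 would be the matching choice.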

Value

An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).

Details

The FMI has gained popularity within the machine learning community as a single-value summary of the confusion matrix of a binary classification. It is particularly useful when the number of observations belonging to each class is uneven or imbalanced. For binary classification it is the geometric mean of precision and recall, so its value depends on which level is treated as positive (see pos_level). It is bounded between 0 and 1. The closer to 1, the better the classification performance.

The fmi is only available for the evaluation of binary cases (two classes). For multiclass cases, fmi returns NA and displays a warning.

For the formula and more details, see the online documentation.
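As a rough illustration, the index can be computed by hand from the confusion-matrix counts. The sketch below assumes the common definition of the FMI as the geometric mean of precision and recall, sqrt(TP/(TP+FP) * TP/(TP+FN)). The helper fmi_by_hand is hypothetical and not part of the package, whose internal code may differ.

```r
# Hypothetical helper for illustration only; not the package's implementation.
fmi_by_hand <- function(obs, pred, pos_level = 2) {
  obs  <- as.character(obs)
  pred <- as.character(pred)
  # Levels in alphanumeric order, as pos_level assumes
  lev <- sort(unique(c(obs, pred)))
  stopifnot(length(lev) == 2)        # binary cases only
  positive <- lev[pos_level]
  tp <- sum(obs == positive & pred == positive)  # true positives
  fp <- sum(obs != positive & pred == positive)  # false positives
  fn <- sum(obs == positive & pred != positive)  # false negatives
  # Geometric mean of precision and recall
  sqrt((tp / (tp + fp)) * (tp / (tp + fn)))
}

# Made-up data: 3 true positives, 1 false positive, 1 false negative
obs  <- c("neg", "pos", "pos", "neg", "pos", "pos")
pred <- c("neg", "pos", "neg", "pos", "pos", "pos")

fmi_pos <- fmi_by_hand(obs, pred)                 # "pos" as positive: 0.75
fmi_neg <- fmi_by_hand(obs, pred, pos_level = 1)  # "neg" as positive: 0.5
```

The two results differ because the true negatives do not enter the formula, so the choice of positive level matters.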

References

Fowlkes, Edward B; Mallows, Colin L (1983). A method for comparing two hierarchical clusterings. Journal of the American Statistical Association. 78 (383): 553–569. doi:10.1080/01621459.1983.10478008

Examples

# \donttest{
set.seed(123)
# Two-class
binomial_case <- data.frame(
  labels = sample(c("True","False"), 100, replace = TRUE),
  predictions = sample(c("True","False"), 100, replace = TRUE))
# Get fmi estimate for two-class case
fmi(data = binomial_case, obs = labels, pred = predictions)
#> $fmi
#> [1] 0.5077583
#> 
# }