
agf() estimates the Adjusted F-score for a nominal (categorical) predicted-observed dataset.

Usage

agf(
  data = NULL,
  obs,
  pred,
  pos_level = 2,
  atom = FALSE,
  tidy = FALSE,
  na.rm = TRUE
)

Arguments

data

(Optional) data frame containing the observed and predicted values. Default: NULL.

obs

Vector with observed values (character | factor).

pred

Vector with predicted values (character | factor).

pos_level

Integer indicating, for binary cases, the position (1|2) of the level that corresponds to the positive class. Generally, the positive level is the second (2), since in alphanumeric order the most common pairs are (Negative | Positive), (0 | 1), and (FALSE | TRUE). Default: 2.

atom

Logical argument (TRUE/FALSE) indicating whether the estimate is computed for each class (atom = TRUE) or at a global level (atom = FALSE). Default: FALSE. When the dataset is binomial, atom does not apply.

tidy

Logical argument (TRUE/FALSE) controlling the type of return: TRUE returns a data.frame, FALSE returns a list. Default: FALSE.

na.rm

Logical argument to remove rows with missing values (NA). Default: na.rm = TRUE. A few illustrative calls using these arguments follow this list.
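
As a quick illustration of these arguments (a hedged sketch only: the vectors below are made up and results are not shown), agf() can also be called on bare vectors when data is left as NULL, as the Usage above suggests:

# Hypothetical two-class vectors; by alphabetical order, "True" is the second level (pos_level = 2)
obs  <- c("True", "False", "True", "True", "False")
pred <- c("True", "True", "False", "True", "False")
agf(obs = obs, pred = pred)                 # default: positive level = "True", list output
agf(obs = obs, pred = pred, pos_level = 1)  # treat the first level ("False") as positive

# Hypothetical three-class vectors; request one estimate per class with atom = TRUE
obs_m  <- c("Red", "Blue", "Green", "Red", "Blue", "Green")
pred_m <- c("Red", "Green", "Green", "Blue", "Blue", "Red")
agf(obs = obs_m, pred = pred_m, atom = TRUE, tidy = TRUE)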

Value

An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).
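
For instance (a sketch with hypothetical vectors, assuming vector input is accepted when data is omitted), the two return containers can be inspected with str():

obs  <- sample(c("0", "1"), 50, replace = TRUE)
pred <- sample(c("0", "1"), 50, replace = TRUE)
str(agf(obs = obs, pred = pred, tidy = FALSE))  # list holding the numeric estimate
str(agf(obs = obs, pred = pred, tidy = TRUE))   # single-column data.frame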

Details

The Adjusted F-score (or Adjusted F-measure) is an improvement over the F1-score, especially when the data classes are imbalanced. This metric more properly accounts for the different misclassification costs across classes: it weights sensitivity (recall) more heavily than precision, thereby penalizing false negatives more strongly. The index accounts for all elements of the original confusion matrix and gives more weight to patterns correctly classified in the minority (positive) class.

It is bounded between 0 and 1; the closer to 1, the better, while values towards zero indicate low performance. For the formula and more details, see the online-documentation.
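
As a reference point only (not the package's internal code), the sketch below computes the metric by hand for a hypothetical binary confusion matrix, assuming the formulation of Maratea et al. (2014) in which the Adjusted F-score is the geometric mean of F2 (computed on the original matrix, weighting recall) and F0.5 computed after inverting the class labels; see the online-documentation for the exact formula used by agf().

# Manual sketch, assuming AGF = sqrt(F2 * InvF0.5) as in Maratea et al. (2014)
fbeta <- function(tp, fp, fn, beta) {
  precision <- tp / (tp + fp)
  recall    <- tp / (tp + fn)
  (1 + beta^2) * precision * recall / (beta^2 * precision + recall)
}
# Hypothetical counts with an imbalanced, minority positive class
tp <- 15; fp <- 10; fn <- 5; tn <- 70
f2      <- fbeta(tp, fp, fn, beta = 2)    # emphasizes sensitivity (recall)
inv_f05 <- fbeta(tn, fn, fp, beta = 0.5)  # labels inverted: TN acts as TP
sqrt(f2 * inv_f05)                        # Adjusted F-score under this formulation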

References

Maratea, A., Petrosino, A., & Manzo, M. (2014). Adjusted F-measure and kernel scaling for imbalanced data learning. Information Sciences, 257, 331-341. doi:10.1016/j.ins.2013.04.016

Examples

# \donttest{
set.seed(123)
# Two-class
binomial_case <- data.frame(labels = sample(c("True", "False"), 100, replace = TRUE),
                            predictions = sample(c("True", "False"), 100, replace = TRUE))
# Multi-class
multinomial_case <- data.frame(labels = sample(c("Red", "Blue", "Green"), 100, replace = TRUE),
                               predictions = sample(c("Red", "Blue", "Green"), 100, replace = TRUE))

# Get the Adjusted F-score estimate for the two-class case
agf(data = binomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>        agf
#> 1 0.489398

# Get the Adjusted F-score estimate for each class of the multi-class case
agf(data = multinomial_case, obs = labels, pred = predictions, atom = TRUE, tidy = TRUE)

# Get the Adjusted F-score estimate for the multi-class case at a global level
agf(data = multinomial_case, obs = labels, pred = predictions, tidy = TRUE)
#> Warning: For multiclass cases, the agf should be estimated at a class level. Please, consider using `atom = TRUE`
#>         agf
#> 1 0.4231173
# }