precision estimates the precision (a.k.a. positive predictive value, PPV) for a nominal/categorical predicted-observed dataset.

ppv estimates the positive predictive value (equivalent to precision) for a nominal/categorical predicted-observed dataset.

FDR estimates the false discovery rate, the complement of precision (1 - PPV), for a nominal/categorical predicted-observed dataset.

## Usage

precision(
data = NULL,
obs,
pred,
tidy = FALSE,
atom = FALSE,
na.rm = TRUE,
pos_level = 2
)

ppv(
data = NULL,
obs,
pred,
tidy = FALSE,
atom = FALSE,
na.rm = TRUE,
pos_level = 2
)

FDR(
data = NULL,
obs,
pred,
atom = FALSE,
pos_level = 2,
tidy = FALSE,
na.rm = TRUE
)

## Arguments

data

(Optional) argument to call an existing data frame containing the data.

obs

Vector with observed values (character | factor).

pred

Vector with predicted values (character | factor).

tidy

Logical operator (TRUE/FALSE) to decide the type of return. TRUE returns a data.frame; FALSE returns a list. Default: FALSE.

atom

Logical operator (TRUE/FALSE) to decide whether the estimate is made for each class (atom = TRUE) or at a global level (atom = FALSE). Default: FALSE.

na.rm

Logical argument to remove rows with missing values (NA). Default: na.rm = TRUE.

pos_level

Integer, for binary cases, indicating the order (1|2) of the level corresponding to the positive class. Generally, the positive level is the second (2) since, following alphanumeric order, the most common pairs are (Negative | Positive), (0 | 1), and (FALSE | TRUE). Default: 2.
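
The alphanumeric ordering described above can be checked directly in base R (an illustrative snippet, not part of the package):

```r
# factor() orders levels alphanumerically by default,
# so the positive class typically lands in position 2
levels(factor(c("True", "False")))  # "False" "True"
levels(factor(c(1, 0)))             # "0" "1"
```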

## Value

An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).

## Details

Precision is a non-normalized coefficient representing the ratio of correctly predicted cases (true positives, TP, for binary cases) to the total observations predicted for a given class (total predicted positives, PP, for binary cases), estimated either per class or at an overall level.

For binomial cases, $$precision = \frac{TP}{PP} = \frac{TP}{TP + FP}$$

The precision metric is bounded between 0 and 1. The closer to 1 the better. Values towards zero indicate low precision of predictions. It can be estimated for each particular class or at a global level.
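
The ratio above can be illustrated with base R's table() on hypothetical counts (a sketch for intuition, not the package's internal implementation):

```r
# Hypothetical two-class vectors: 40 TP, 10 FP, 50 TN
obs  <- c(rep("Positive", 40), rep("Negative", 10), rep("Negative", 50))
pred <- c(rep("Positive", 50), rep("Negative", 50))
# Confusion table: rows = predicted, columns = observed
cm <- table(pred, obs)
TP <- cm["Positive", "Positive"]  # 40 correctly predicted positives
FP <- cm["Positive", "Negative"]  # 10 negatives predicted as positive
TP / (TP + FP)                    # precision = 40 / 50 = 0.8
```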

The false detection rate or false discovery rate (FDR) represents the proportion of false positives with respect to the total number of cases predicted as positive.

For binomial cases, $$FDR = 1 - precision = \frac{FP}{PP} = \frac{FP}{TP + FP}$$

The FDR metric is also bounded between 0 and 1, but here the closer to 0 the better. Values towards one indicate a high proportion of false positives among the predicted positives.
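
The complement relationship between FDR and precision can be verified numerically with hypothetical counts (illustrative only):

```r
# Hypothetical counts: 40 true positives, 10 false positives
TP <- 40; FP <- 10
PP <- TP + FP                  # total predicted positive
precision <- TP / PP           # 0.8
FDR <- FP / PP                 # 0.2
all.equal(FDR, 1 - precision)  # TRUE: FDR is the complement of precision
```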

For the formula and more details, see the online documentation.

## References

Ting K.M. (2017) Precision and Recall. In: Sammut C., Webb G.I. (eds) Encyclopedia of Machine Learning and Data Mining. Springer, Boston, MA. doi:10.1007/978-1-4899-7687-1_659

## Examples

# \donttest{
set.seed(123)
# Two-class
binomial_case <- data.frame(labels = sample(c("True","False"), 100,
replace = TRUE), predictions = sample(c("True","False"), 100, replace = TRUE))
# Multi-class
multinomial_case <- data.frame(labels = sample(c("Red","Blue", "Green"), 100,
replace = TRUE), predictions = sample(c("Red","Blue", "Green"), 100, replace = TRUE))

# Get precision estimate for two-class case
precision(data = binomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>   precision
#> 1 0.5652174

# Get FDR estimate for two-class case
FDR(data = binomial_case, obs = labels, pred = predictions, tidy = TRUE)
#>         FDR
#> 1 0.4347826

# Get precision estimate for each class for the multi-class case
precision(data = multinomial_case, obs = labels, pred = predictions, tidy = TRUE, atom = TRUE)
#>       precision
#> Blue  0.2903226
#> Green 0.1666667
#> Red   0.3846154

# Get precision estimate for the multi-class case at a global level
precision(data = multinomial_case, obs = labels, pred = predictions, tidy = TRUE, atom = FALSE)
# }