
It estimates the Distance Correlation coefficient (dcorr) for a continuous predicted-observed dataset.

Usage

dcorr(data = NULL, obs, pred, tidy = FALSE, na.rm = TRUE)

Arguments

data

(Optional) argument to call an existing data frame containing the data.

obs

Vector with observed values (numeric).

pred

Vector with predicted values (numeric).

tidy

Logical operator (TRUE/FALSE) that sets the type of return. TRUE returns a data.frame; FALSE (default) returns a list.

na.rm

Logical argument to remove rows with missing values (NA). Default is na.rm = TRUE.

Value

An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).
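As a minimal sketch of the two return types, the call below passes columns of a data frame through `data` and toggles `tidy`. The data frame `df` and its column names `O` and `P` are made up for illustration, and the call is guarded so that it only runs when the metrica package is installed.

```r
# Hypothetical predicted-observed data; `df`, `O`, and `P` are illustrative names.
set.seed(1)
df <- data.frame(P = rnorm(n = 100, mean = 0, sd = 10))
df$O <- df$P + rnorm(n = 100, mean = 0, sd = 3)

if (requireNamespace("metrica", quietly = TRUE)) {
  # tidy = FALSE (default): the numeric estimate is wrapped in a list
  res_list <- metrica::dcorr(data = df, obs = O, pred = P)

  # tidy = TRUE: the same estimate is returned in a data frame
  res_tidy <- metrica::dcorr(data = df, obs = O, pred = P, tidy = TRUE)

  str(res_list)
  str(res_tidy)
}
```

The data frame form is convenient when several metrics are collected into one table; the list form is lighter for a single call.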

Details

The dcorr function is a wrapper for the dcor function from the energy-package. See Rizzo & Szekely (2022). The distance correlation (dcorr) coefficient is a novel measure of dependence between random vectors introduced by Szekely et al. (2007).

The dcorr coefficient is symmetric, that is, \(\mathcal R(P,O) = \mathcal R(O,P)\), which is relevant for the predicted-observed (PO) case.

For all distributions with finite first moments, distance correlation \(\mathcal R\) generalizes the idea of correlation in two fundamental ways:

(1) \(\mathcal R(P,O)\) is defined for \(P\) and \(O\) in arbitrary dimension.

(2) \(\mathcal R(P,O)=0\) characterizes independence of \(P\) and \(O\).

Distance correlation satisfies \(0 \le \mathcal R \le 1\), and \(\mathcal R = 0\) only if \(P\) and \(O\) are independent. Distance covariance \(\mathcal V\) provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients \(\mathcal V\) and \(\mathcal R\) are given in Szekely et al. (2007).

The empirical distance correlation \(\mathcal{R}_n(\mathbf{P,O})\) is the square root of $$ \mathcal{R}^2_n(\mathbf{P,O})= \frac {\mathcal{V}^2_n(\mathbf{P,O})} {\sqrt{ \mathcal{V}^2_n (\mathbf{P}) \mathcal{V}^2_n(\mathbf{O})}}. $$

For the formula and more details, see the online documentation and the energy package.
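The empirical formula above can be sketched in base R to make the computation concrete. This is an illustrative sketch, not the code used by `dcorr` or by the energy package: the helpers `double_center` and `dcor_sketch` are hypothetical names, and the sketch assumes univariate numeric vectors, Euclidean distances, and a distance exponent of 1.

```r
# Double-center a pairwise Euclidean distance matrix:
# A_kl = a_kl - (row mean k) - (column mean l) + (grand mean)
double_center <- function(x) {
  d <- as.matrix(dist(x))
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# Empirical distance correlation R_n(P, O) (hypothetical helper).
dcor_sketch <- function(p, o) {
  A <- double_center(p)
  B <- double_center(o)
  v2_po <- mean(A * B)  # V_n^2(P, O): average of the products A_kl * B_kl
  v2_p  <- mean(A * A)  # V_n^2(P)
  v2_o  <- mean(B * B)  # V_n^2(O)
  denom <- sqrt(v2_p * v2_o)
  if (denom == 0) return(0)  # a constant vector has zero distance variance
  sqrt(max(v2_po, 0) / denom)  # guard against tiny negative rounding error
}

set.seed(1)
P <- rnorm(n = 100, mean = 0, sd = 10)
O <- P + rnorm(n = 100, mean = 0, sd = 3)

r_po <- dcor_sketch(P, O)
r_op <- dcor_sketch(O, P)  # symmetry: same value with the arguments swapped

# A purely nonlinear relationship: Pearson's r vanishes on a symmetric grid,
# while the distance correlation stays clearly above zero.
x <- seq(-1, 1, length.out = 101)
pearson_quad <- cor(x, x^2)
r_quad <- dcor_sketch(x, x^2)

# An exact linear relationship gives a distance correlation of 1
# (up to floating-point error).
r_lin <- dcor_sketch(x, 2 * x + 1)
```

The quadratic case illustrates property (2): the variables are fully dependent, yet only the distance correlation reflects it.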

References

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, Vol. 35(6): 2769-2794. doi:10.1214/009053607000000505.

Rizzo, M., and Szekely, G. (2022). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7-10. https://CRAN.R-project.org/package=energy.

Examples

# \donttest{
set.seed(1)
P <- rnorm(n = 100, mean = 0, sd = 10)
O <- P + rnorm(n = 100, mean = 0, sd = 3)
dcorr(obs = O, pred = P)
#> [1] 0.9290235
# }