Distance Correlation

It estimates the distance correlation coefficient (dcorr) for a continuous predicted-observed dataset.

Usage

dcorr(data = NULL, obs, pred, tidy = FALSE, na.rm = TRUE)

Arguments

data: (Optional) argument to pass an existing data frame containing the data.
obs: Vector with observed values (numeric).
pred: Vector with predicted values (numeric).
tidy: Logical argument (TRUE/FALSE) that decides the type of return: TRUE returns a data.frame; FALSE returns a list (default).
na.rm: Logical argument (TRUE/FALSE) to remove rows with missing values (NA). Default is na.rm = TRUE.
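As an illustration only, assume that na.rm = TRUE drops every row in which either obs or pred is missing (complete-case filtering; this is an assumption, not taken from the metrica source). The step then amounts to the following base-R filtering. The vectors are made-up values.

```r
# Made-up observed/predicted vectors with missing values
obs  <- c(2.1, NA, 3.4, 4.8, 5.0)
pred <- c(2.0, 2.9, NA, 4.5, 5.3)

# Keep only the rows where both obs and pred are present (assumed behaviour)
keep    <- complete.cases(obs, pred)
obs_cc  <- obs[keep]
pred_cc <- pred[keep]

obs_cc   # 2.1 4.8 5.0
pred_cc  # 2.0 4.5 5.3
```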

Value

An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).

Details

The dcorr function is a wrapper for the dcor function from the energy package (Rizzo & Szekely, 2022). The distance correlation (dcorr) coefficient is a measure of dependence between random vectors introduced by Szekely et al. (2007).

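The dcorr is symmetric, i.e., $\mathcal R(P,O) = \mathcal R(O,P)$, so its value does not depend on which vector is passed as obs and which as pred. This is relevant for the predicted-observed (PO) case, where some other measures (for example, the slope of a regression of O on P) change when the two roles are swapped.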

For all distributions with finite first moments, distance correlation $\mathcal R$ generalizes the idea of correlation in two fundamental ways:

(1) $\mathcal R(P,O)$ is defined for $P$ and $O$ in arbitrary dimension.

(2) $\mathcal R(P,O)=0$ characterizes independence of $P$ and $O$.

Distance correlation satisfies $0 \le \mathcal R \le 1$, and $\mathcal R = 0$ only if $P$ and $O$ are independent. Distance covariance $\mathcal V$ provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients $\mathcal V$ and $\mathcal R$ are given in Szekely et al. (2007).
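The properties above can be illustrated with a small base-R sketch. It uses a hypothetical helper, dcor_sketch() (not part of metrica or energy), that computes the sample distance correlation from double-centred distance matrices, plus a plain permutation test of independence chosen here for illustration; it is not necessarily the test implemented in the energy package. O is an exact but U-shaped function of P, so Pearson's r is essentially zero while the distance correlation is clearly positive.

```r
# Hypothetical helper (not the metrica/energy API): sample distance
# correlation from double-centred Euclidean distance matrices.
dcor_sketch <- function(x, y) {
  dc <- function(v) {
    d <- as.matrix(dist(v))                            # pairwise |v_k - v_l|
    d - outer(rowMeans(d), colMeans(d), "+") + mean(d) # double-centring
  }
  A <- dc(x)
  B <- dc(y)
  sqrt(max(mean(A * B), 0) / sqrt(mean(A * A) * mean(B * B)))
}

P <- seq(-1, 1, length.out = 41)  # values symmetric about zero
O <- P^2                          # fully determined by P, but not linearly

r_pearson <- cor(P, O)            # ~0 by symmetry
r_dist    <- dcor_sketch(P, O)    # clearly > 0

# Permutation test of H0: P and O independent.
# Shuffling O breaks any link between the two vectors.
set.seed(1)
n_perm  <- 199
perm    <- replicate(n_perm, dcor_sketch(P, sample(O)))
p_value <- (1 + sum(perm >= r_dist)) / (n_perm + 1)

c(pearson = r_pearson, dcorr = r_dist, p_value = p_value)
```

Because dist() also accepts a matrix (one row per observation), the same helper would accept multivariate P or O, in line with property (1).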

The empirical distance correlation $\mathcal{R}_n(\mathbf{P,O})$ is the square root of $$ \mathcal{R}^2_n(\mathbf{P,O})= \frac {\mathcal{V}^2_n(\mathbf{P,O})} {\sqrt{ \mathcal{V}^2_n (\mathbf{P}) \mathcal{V}^2_n(\mathbf{O})}}. $$
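Following Szekely et al. (2007), with $a_{kl} = |O_k - O_l|$ and $b_{kl} = |P_k - P_l|$, double-centring gives $A_{kl} = a_{kl} - \bar a_{k\cdot} - \bar a_{\cdot l} + \bar a_{\cdot\cdot}$ (and likewise $B_{kl}$), and $\mathcal{V}^2_n(\mathbf{P,O}) = \frac{1}{n^2}\sum_{k,l} A_{kl}B_{kl}$, with $\mathcal{V}^2_n(\mathbf{O}) = \frac{1}{n^2}\sum_{k,l} A_{kl}^2$ and $\mathcal{V}^2_n(\mathbf{P}) = \frac{1}{n^2}\sum_{k,l} B_{kl}^2$. The base-R sketch below writes these steps out. The helper names dcenter() and dcor_sketch() are hypothetical; metrica itself delegates the computation to energy::dcor().

```r
# Double-centre a pairwise distance matrix:
# a_kl - (row k mean) - (column l mean) + (grand mean)
dcenter <- function(d) {
  d - outer(rowMeans(d), colMeans(d), "+") + mean(d)
}

# Hypothetical sketch of the empirical distance correlation R_n(P, O)
dcor_sketch <- function(obs, pred) {
  stopifnot(length(obs) == length(pred))
  A <- dcenter(as.matrix(dist(obs)))   # built from |O_k - O_l|
  B <- dcenter(as.matrix(dist(pred)))  # built from |P_k - P_l|

  v2_po <- mean(A * B)  # V_n^2(P, O)
  v2_o  <- mean(A * A)  # V_n^2(O)
  v2_p  <- mean(B * B)  # V_n^2(P)

  # Convention when either sample is constant: R_n = 0
  if (v2_o * v2_p <= 0) return(0)

  # R_n^2 = V_n^2(P, O) / sqrt(V_n^2(P) V_n^2(O)); max() guards against a
  # tiny negative value caused by floating-point rounding
  sqrt(max(v2_po, 0) / sqrt(v2_o * v2_p))
}
```

On the data from the Examples section, dcor_sketch(obs = O, pred = P) is expected (not verified here) to agree with energy::dcor(O, P) up to rounding, since both follow the same definition. Swapping the arguments leaves the result unchanged because mean(A * B) is symmetric in A and B.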

For the full formula and more details, see the online documentation and the energy package.

References

Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, Vol. 35(6): 2769-2794. doi:10.1214/009053607000000505.

Rizzo, M., and Szekely, G. (2022). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7-10. https://CRAN.R-project.org/package=energy.

Examples

set.seed(1)
P <- rnorm(n = 100, mean = 0, sd = 10)     # simulated predictions
O <- P + rnorm(n = 100, mean = 0, sd = 3)  # simulated observations: predictions plus noise
dcorr(obs = O, pred = P)
#> [1] 0.9290235