
Estimates the distance correlation coefficient (dcorr) for a continuous predicted-observed dataset.


dcorr(data = NULL, obs, pred, tidy = FALSE, na.rm = TRUE)



data: (Optional) data frame containing the data.


obs: Vector with observed values (numeric).


pred: Vector with predicted values (numeric).


tidy: Logical argument (TRUE/FALSE) that sets the return type. TRUE returns a data.frame; FALSE returns a list (default).


na.rm: Logical argument to remove rows with missing values (NA). Default is na.rm = TRUE.


An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).


The dcorr function is a wrapper for the dcor function from the energy package; see Rizzo & Szekely (2022). The distance correlation (dcorr) coefficient is a measure of dependence between random vectors introduced by Szekely et al. (2007).

The dcorr is symmetric: swapping the predicted and observed values gives the same result, which is relevant for the predicted-observed (PO) case.

For all distributions with finite first moments, distance correlation \(\mathcal R\) generalizes the idea of correlation in two fundamental ways:

(1) \(\mathcal R(P,O)\) is defined for \(P\) and \(O\) in arbitrary dimension.

(2) \(\mathcal R(P,O)=0\) characterizes independence of \(P\) and \(O\).

Distance correlation satisfies \(0 \le \mathcal R \le 1\), and \(\mathcal R = 0\) only if \(P\) and \(O\) are independent. Distance covariance \(\mathcal V\) provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients \(\mathcal V\) and \(\mathcal R\) are given in Szekely et al. (2007).

The empirical distance correlation \(\mathcal{R}_n(\mathbf{P,O})\) is the square root of $$ \mathcal{R}^2_n(\mathbf{P,O})= \frac {\mathcal{V}^2_n(\mathbf{P,O})} {\sqrt{ \mathcal{V}^2_n (\mathbf{P}) \mathcal{V}^2_n(\mathbf{O})}}. $$
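The empirical formula above can be sketched in base R. This is a minimal illustration rather than the code used by dcorr or energy::dcor: the helper names double_center() and dcor_sketch() are hypothetical, and the sketch assumes univariate numeric inputs and the V-statistic form of Szekely et al. (2007), in which \(\mathcal{V}^2_n\) is the mean of the element-wise product of double-centered pairwise distance matrices.

```r
# Illustrative base-R sketch; double_center() and dcor_sketch() are
# hypothetical helper names, not functions from metrica or energy.

# Double-center the pairwise Euclidean distance matrix of a numeric vector:
# subtract row means and column means, then add back the grand mean.
double_center <- function(x) {
  d <- as.matrix(dist(x))
  centered <- sweep(d, 1, rowMeans(d))         # subtract row means
  centered <- sweep(centered, 2, colMeans(d))  # subtract column means
  centered + mean(d)                           # add the grand mean
}

# Empirical distance correlation R_n(P, O).
dcor_sketch <- function(p, o) {
  A <- double_center(p)
  B <- double_center(o)
  v2_po <- max(mean(A * B), 0)  # V_n^2(P, O); clamp tiny negative rounding error
  v2_p  <- mean(A * A)          # V_n^2(P)
  v2_o  <- mean(B * B)          # V_n^2(O)
  denom <- sqrt(v2_p * v2_o)
  if (denom == 0) return(0)     # R_n is defined as 0 when a sample is constant
  sqrt(v2_po / denom)
}

set.seed(1)
P <- rnorm(n = 100, mean = 0, sd = 10)
O <- P + rnorm(n = 100, mean = 0, sd = 3)

r_po <- dcor_sketch(P, O)
r_op <- dcor_sketch(O, P)  # symmetry: same value with the arguments swapped

# Nonlinear dependence: the Pearson correlation is essentially zero,
# while the distance correlation is clearly positive.
x <- seq(-1, 1, length.out = 101)
y <- x^2
pearson_xy <- cor(x, y)
dcor_xy <- dcor_sketch(x, y)
```

The last lines illustrate why \(\mathcal R = 0\) is a stronger statement than zero linear correlation: for y = x^2 on a symmetric grid the Pearson coefficient vanishes, yet the distance correlation does not.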

For more details, see the online documentation and the energy package.


Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, Vol. 35(6): 2769-2794. doi:10.1214/009053607000000505.

Rizzo, M., and Szekely, G. (2022). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7-10.


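# Simulate predicted values (P) and observed values (O = P plus random noise).
P <- rnorm(n = 100, mean = 0, sd = 10)
O <- P + rnorm(n = 100, mean = 0, sd = 3)
# No seed is set, so the estimate differs from run to run.
dcorr(obs = O, pred = P)
#> [1] 0.9290235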