It estimates the Distance Correlation coefficient (dcorr) for a continuous predicted-observed dataset.
Arguments
- data
(Optional) argument to call an existing data frame containing the data.
- obs
Vector with observed values (numeric).
- pred
Vector with predicted values (numeric).
- tidy
Logical operator (TRUE/FALSE) to decide the type of return. TRUE returns a data.frame; FALSE (default) returns a list.
- na.rm
Logical argument to remove rows with missing values (NA). Default is na.rm = TRUE.
Value
An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).
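A short usage sketch of the two return types, assuming the metrica package is installed. The simulated vectors `P` and `O`, the data frame `df`, and its column names `observed` and `predicted` are illustrative only and are not part of the package.

```r
library(metrica)

# Simulated predicted-observed data (illustrative values only)
set.seed(1)
P <- rnorm(n = 100, mean = 0, sd = 10)
O <- P + rnorm(n = 100, mean = 0, sd = 3)
df <- data.frame(observed = O, predicted = P)

# Default (tidy = FALSE): the estimate is returned inside a list
res_list <- dcorr(data = df, obs = observed, pred = predicted)

# tidy = TRUE: the estimate is returned inside a data frame
res_df <- dcorr(data = df, obs = observed, pred = predicted, tidy = TRUE)

print(res_list)
print(res_df)
```

The tidy = TRUE form is usually more convenient when the estimate is to be bound together with other metrics in a table.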
Details
The dcorr function is a wrapper for the dcor function from the energy-package. See Rizzo & Szekely (2022). The distance correlation (dcorr) coefficient is a novel measure of dependence between random vectors introduced by Szekely et al. (2007).
The dcorr is symmetric, that is, swapping the predicted and observed values gives the same result, which is relevant for the predicted-observed (PO) case.
For all distributions with finite first moments, distance correlation \(\mathcal R\) generalizes the idea of correlation in two fundamental ways:
(1) \(\mathcal R(P,O)\) is defined for \(P\) and \(O\) in arbitrary dimension.
(2) \(\mathcal R(P,O)=0\) characterizes independence of \(P\) and \(O\).
Distance correlation satisfies \(0 \le \mathcal R \le 1\), and \(\mathcal R = 0\) only if \(P\) and \(O\) are independent. Distance covariance \(\mathcal V\) provides a new approach to the problem of testing the joint independence of random vectors. The formal definitions of the population coefficients \(\mathcal V\) and \(\mathcal R\) are given in Szekely et al. (2007).
The empirical distance correlation \(\mathcal{R}_n(\mathbf{P,O})\) is the square root of $$ \mathcal{R}^2_n(\mathbf{P,O})= \frac {\mathcal{V}^2_n(\mathbf{P,O})} {\sqrt{ \mathcal{V}^2_n (\mathbf{P}) \mathcal{V}^2_n(\mathbf{O})}}. $$
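The empirical formula above can be sketched in base R as follows. This is a minimal illustration, assuming univariate numeric vectors of equal length and Euclidean distances; `dcor_sketch` is a hypothetical helper name, and the code is not the implementation used by the energy package.

```r
# Hypothetical helper: empirical distance correlation for two numeric vectors.
dcor_sketch <- function(p, o) {
  stopifnot(is.numeric(p), is.numeric(o), length(p) == length(o))

  # Double-center a pairwise distance matrix:
  # subtract row means and column means, then add back the grand mean.
  centre <- function(x) {
    d <- as.matrix(dist(x))
    d - rowMeans(d)[row(d)] - colMeans(d)[col(d)] + mean(d)
  }

  A <- centre(p)
  B <- centre(o)

  v2_po <- max(mean(A * B), 0)  # squared sample distance covariance V_n^2(P, O)
  v2_p  <- mean(A * A)          # squared sample distance variance V_n^2(P)
  v2_o  <- mean(B * B)          # squared sample distance variance V_n^2(O)

  denom <- sqrt(v2_p * v2_o)
  if (denom == 0) return(0)     # a constant vector carries no dependence

  sqrt(v2_po / denom)           # R_n is the square root of R_n^2
}

# Simulated predicted-observed data (illustrative values only)
set.seed(1)
P <- rnorm(100, mean = 0, sd = 10)
O <- P + rnorm(100, mean = 0, sd = 3)

r_po <- dcor_sketch(P, O)
r_op <- dcor_sketch(O, P)  # same value: the coefficient is symmetric
```

Because the numerator and denominator are built from the same centered matrices, swapping the two arguments leaves the result unchanged, and the value stays between 0 and 1.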
For the formula and more details, see the online documentation and the energy-package.
References
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, Vol. 35(6): 2769-2794. doi:10.1214/009053607000000505.
Rizzo, M., and Szekely, G. (2022). energy: E-Statistics: Multivariate Inference via the Energy of Data. R package version 1.7-10. https://CRAN.R-project.org/package=energy.