Estimates the Maximal Information Coefficient (MIC) for a continuous predicted-observed dataset.
Arguments
- data
(Optional) argument to call an existing data frame containing the data.
- obs
Vector with observed values (numeric).
- pred
Vector with predicted values (numeric).
- tidy
Logical argument (TRUE/FALSE) that sets the type of return. TRUE returns a data.frame, FALSE returns a list (default).
- na.rm
Logical argument to remove rows with missing values (NA). Default is na.rm = TRUE.
Value
An object of class numeric within a list (if tidy = FALSE) or within a data frame (if tidy = TRUE).
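The two return types can be illustrated with a short call. This is only a sketch: it assumes the function is exported as `MIC()` by the metrica package (the package name is not stated on this page), and the data are simulated purely for illustration.

```r
# Simulated data for illustration only; not taken from the package.
set.seed(1)
obs  <- rnorm(100, mean = 10, sd = 2)
pred <- obs + rnorm(100, sd = 1)

# Assumption: MIC() is provided by the metrica package.
if (requireNamespace("metrica", quietly = TRUE)) {
  # Default (tidy = FALSE): the numeric estimate is returned inside a list.
  mic_list <- metrica::MIC(obs = obs, pred = pred)

  # tidy = TRUE: the same estimate is returned inside a data frame.
  mic_df <- metrica::MIC(obs = obs, pred = pred, tidy = TRUE)
}
```

The tidy form is convenient when the estimate is to be bound together with other metrics in one table.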
Details
The MIC function is a wrapper for the mine_stat function of the minerva package, a collection of Maximal Information-based Nonparametric Exploration (MINE) statistics. See Reshef et al. (2011).
For the predicted-observed (PO) case, the MIC is defined as follows: $$\textrm{MIC}(D)=\max_{PO<B(n)} M(D)_{P,O} = \max_{PO<B(n)} \frac{I^{*}(D,P,O)}{\log(\min\{P,O\})},$$ where \(B(n)=n^{\alpha}\) bounds the search-grid size, and \(I^{*}(D,P,O)\) is the maximum mutual information over all P-by-O grids of the distribution induced by D on a grid with P and O bins (the probability mass of a grid cell is the fraction of points of D falling in that cell). See Albanese et al. (2013).
For the formula and more details, see the online documentation.
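To make the ratio in the MIC definition concrete, the sketch below computes the mutual information of the predicted and observed values on one fixed grid and divides it by the log of the smaller bin count. It is not the MINE algorithm: `grid_nmi()` is a hypothetical helper written in base R, it uses equal-width bins instead of searching over bin placements, it evaluates a single grid size instead of maximizing over all sizes below \(B(n)\), and it assumes the inputs contain no missing values.

```r
# Hypothetical helper (not part of any package): normalized mutual
# information of pred vs. obs on one fixed, equal-width grid.
grid_nmi <- function(obs, pred, n_pred_bins, n_obs_bins) {
  stopifnot(length(obs) == length(pred), n_pred_bins >= 2, n_obs_bins >= 2)

  # Assign each point to a bin on each axis (equal-width bins are an assumption).
  pred_bin <- cut(pred, breaks = n_pred_bins, labels = FALSE)
  obs_bin  <- cut(obs,  breaks = n_obs_bins,  labels = FALSE)

  # Probability mass of a cell = fraction of points falling in that cell.
  counts <- table(factor(pred_bin, levels = seq_len(n_pred_bins)),
                  factor(obs_bin,  levels = seq_len(n_obs_bins)))
  joint <- matrix(as.numeric(counts), nrow = n_pred_bins) / length(obs)

  # Mass expected in each cell if pred and obs were independent.
  expected <- outer(rowSums(joint), colSums(joint))

  # Mutual information, summed over non-empty cells only.
  filled <- joint > 0
  mutual_info <- sum(joint[filled] * log(joint[filled] / expected[filled]))

  # Normalize by log(min{P, O}) as in the MIC definition.
  mutual_info / log(min(n_pred_bins, n_obs_bins))
}

# Simulated data for illustration only.
set.seed(1)
obs  <- runif(200)
pred <- obs + rnorm(200, sd = 0.1)
grid_nmi(obs, pred, n_pred_bins = 4, n_obs_bins = 4)
```

Because the grid is fixed here, the value can only fall at or below the maximized entry \(M(D)_{P,O}\) for the same P and O, so it should be read as an illustration of the ratio and not as an MIC estimate.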
References
Reshef, D., Reshef, Y., Finucane, H., Grossman, S., McVean, G., Turnbaugh, P.,
Lander, E., Mitzenmacher, M., and Sabeti, P. (2011). Detecting novel associations
in large data sets.
Science 334(6062), 1518-1524. doi:10.1126/science.1205438
Albanese, D., Filosi, M., Visintainer, R., Riccadonna, S., Jurman, G., and Furlanello, C. (2013).
minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers.
Bioinformatics 29(3), 407-408. doi:10.1093/bioinformatics/bts707