A MATLAB implementation of Being Robust (in High Dimensions) Can Be Practical from ICML 2017.
This project requires installation of the following packages:
- Fast SVD and PCA: Used for faster computation of SVD for some large matrices.
- Novembre_etal_2008_misc: A repository containing data from Genes Mirror Geography in Europe. Used for our semi-synthetic experiments. Extract the following three files to
robust-filter/genomicData:POPRES_08_24_01.EuroThinFinal.LD_0.8.exLD.out0-PCA.eigsPOPRES_08_24_01.EuroThinFinal.LD_0.8.exLD.out0-PCA.evalPOPRESID_Color.txt
The following packages are optional, and are only used for comparison of our algorithm with alternative estimators:
- AgnosticMeanAndCovarianceCode: An implementation of the algorithms from the paper Agnostic Estimation of Mean and Covariance from FOCS 2016, authored by Kevin A. Lai, Anup B. Rao, and Santosh Vempala. Used to compare the statistical accuracy of our algorithms against theirs.
- CVX: A MATLAB library for convex optimization. Used for implementing methods from Robust Principal Component Analysis? and Robust PCA via Outlier Pursuit.
This repository contains several algorithms -- we identify files relevant to our algorithm, and those which are only used for comparison with alternatives.
The following are files which are used in our algorithms' implementations, and can all be found in the filterCode subdirectory.
filterGaussianMean.m: Our algorithm for estimating the mean of a GaussianfilterGaussianCov.m: Our algorithm for estimating the covariance of a GaussianfindMaxPoly.m: Finds the structured degree-two polynomial which is maximized by the dataflatten.mandsharpen.m: Convert between matrix and vector representations
filterGaussianCovTuned.m: A version of our covariance estimation algorithm which is tuned to select hyperparametersmahalanobis.m: Compute Mahalanobis rescaling of a matrix
Test files, for demonstrating performance of our estimators for mean and covariance. These can be found in the testCode subdirectory:
testGaussianMean.m: Tests our mean estimation algorithmtestGaussianCov.m: Tests our covariance estimation algorithmtestGenomicData.m: Tests our covariance estimation algorithm on semi-synthetic genome dataset
The following are files which are used for comparison with other competing algorithms, and can be found in the comparisonCode subdirectory:
Algorithms for estimating the mean of a Gaussian:
ransacGaussianMean.m: A RANSAC-based methodgeoMedianGaussianMean.m: Geometric medianpruneGaussianMean.m: Coordinate-wise median, followed by naive pruning of distant points
Algorithms for estimating the covariance of a Gaussian:
pruneGaussianCov.m: Naive pruning of distant pointsransacMVE.m: A RANSAC-based methodMVE.m: Approximating the MVE for a small dataset
ADPCP.m: Principal Component Pursuit by Alternating Directions, from Robust Principal Component Analysis?specThresh.mandshrinkage.m: Singular value thresholding and shrinkage operatorsnorm_nuc.m: Compute the nuclear norm of a matrix
Comparison files, for evaluating and comparing the performance of estimators for mean and covariance:
testGeoMedian.m: Demonstrates that the geometric median incurs an O(sqrt(d)) loss in accuracycompareMeanEstimators.m: Compares mean estimation algorithmscompareCovEstimators.m: Compares covariance estimation algorithmscompareGenomicData.m: Compares covariance estimation algorithms to semi-synthetic genome dataset
Figures in the paper can be approximately reproduced by running the following scripts, which are in the comparisonCode subdirectory:
- Figure 1:
compareMeanEstimators.m - Figure 2:
compareCovEstimators.m - Figures 3 and 4:
compareGenomicData.m
This repository is an implementation of our paper Being Robust (in High Dimensions) Can Be Practical from ICML 2017, authored by Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Ankur Moitra, and Alistair Stewart.
See also our original theory paper, Robust Estimators in High Dimensions without the Computational Intractability, which appeared in FOCS 2016.
If you use our code or paper, we ask that you please cite:
@inproceedings{DiakonikolasKKLMS17,
author = {Diakonikolas, Ilias and Kamath, Gautam and Kane, Daniel M. and Li, Jerry and Moitra, Ankur and Stewart, Alistair},
title = {Being Robust (in High Dimensions) Can Be Practical},
booktitle = {Proceedings of the 34th International Conference on Machine Learning},
series = {ICML '17},
year = {2017},
pages = {999--1008},
publisher = {JMLR, Inc.}
}