Supporting source code for the paper "Rescaling and other forms of unsupervised preprocessing may bias cross-validation" by Amit Moscovich and Saharon Rosset.
This code base produces the figures of the paper:
"On the cross-validation bias due to unsupervised pre-processing" by Amit Moscovich and Saharon Rosset.
https://arxiv.org/abs/1901.08974v4
By running produce_all_figures_from_scratch.py, you should be able to exactly reproduce the figures in the paper.
Using the default number of repetitions (as used in the paper), this simulation takes 1-2 years on a single core. Therefore it is highly recommended to:
- Do a test run with much smaller values of the constants RESCALED_LASSO_LOW_DIM_N_REPETITIONS, etc.
- Run this program on a strong multi-core machine. The code automatically parallelizes the simulations using Python's multiprocessing.Pool.
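As a rough illustration of how independent repetitions can be spread across cores with multiprocessing.Pool, here is a minimal sketch. It is not the repository's code: `run_one_repetition`, `run_all`, and `N_REPETITIONS` are hypothetical names, and the "simulation" is only a placeholder that averages Gaussian draws.

```python
import multiprocessing
import random


def run_one_repetition(seed):
    """Hypothetical stand-in for a single simulation repetition."""
    # Seeding a private generator per repetition keeps results reproducible
    # regardless of which worker process picks up the task.
    rng = random.Random(seed)
    samples = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return sum(samples) / len(samples)


def run_all(n_repetitions, n_processes=None):
    """Run all repetitions in a worker pool and return the results in order."""
    # With processes=None, Pool uses os.cpu_count() worker processes.
    with multiprocessing.Pool(processes=n_processes) as pool:
        return pool.map(run_one_repetition, range(n_repetitions))


if __name__ == "__main__":
    N_REPETITIONS = 8  # hypothetical small value, suitable for a quick test run
    results = run_all(N_REPETITIONS)
    print(len(results), "repetitions finished")
```

Keeping the repetition count in a single constant, as in this sketch, makes it easy to do a short trial run before committing to the full-length simulation.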
Prerequisites
Python 3 is required with SciPy, scikit-learn, mkl and mkl_random modules.
The easiest way to install these is to download the Anaconda Python distribution.
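Before starting a long run, it may help to confirm that the required modules can be found. The following is a small sketch using only the standard library. The helper `missing_modules` is hypothetical, and the import names listed (`scipy`, `sklearn`, `mkl`, `mkl_random`) are assumed to be the ones under which these packages are installed.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed import names for the prerequisites listed above.
    required = ["scipy", "sklearn", "mkl", "mkl_random"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```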
Since the figures use LaTeX rendering for the labels, you need:
- TeXLive. The latex binary must be in the command path.
- dvipdf and dvipng
(or you can just remove the TeX code from the labels used in plotting the figures)
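To see in advance whether the LaTeX tools are on the command path, a check along these lines could be used. This is a sketch only: the helper `missing_tools` is hypothetical and is not part of the repository.

```python
import shutil


def missing_tools(tools):
    """Return the command names that cannot be found on the command path."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # Tools named in the prerequisites above.
    missing = missing_tools(["latex", "dvipdf", "dvipng"])
    if missing:
        print("Not found on the command path:", ", ".join(missing))
    else:
        print("LaTeX toolchain found.")
```

If tools are missing and the plotting code turns on LaTeX rendering through matplotlib's `text.usetex` setting (an assumption about how the figures are configured), setting `matplotlib.rcParams["text.usetex"] = False` is another way to fall back to matplotlib's built-in text rendering.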