Automatic Discovery of the Statistical Types of Variables in a Dataset
This code implements the Bayesian method and reproduces the experiments presented in
I. Valera and Z. Ghahramani,
"Automatic Discovery of the Statistical Types of Variables in a Dataset",
34th International Conference on Machine Learning (ICML 2017). Sydney (Australia), 2017.
Please use the above reference to cite this work.
Calling from Matlab
function simLik(datasetC,Nits,KK,it)
%% Runs the proposed Bayesian method to infer the data types in a dataset.
% Inputs:
% datasetC: name of the dataset whose data types are to be inferred
% Nits: number of iterations of the Gibbs sampler
% KK: complexity of the low-rank representation (i.e., number of features)
% it: simulation number (index of the run)
% Outputs: none; the function saves a file with the results, i.e., the
% test log-likelihood and a vector with the inferred weights for the
% different data types in each dimension.
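The weights saved by simLik can be read as a score for each candidate data type in each dimension. The Python sketch below shows one way to turn such scores into a single predicted type per dimension. It assumes the weights have already been loaded as one row per dimension (the layout of the saved file is not documented here), and the function name `most_likely_types` and the type labels and their ordering are hypothetical placeholders, not the order used by the code.

```python
from typing import List, Sequence, Tuple


def most_likely_types(
    weights: Sequence[Sequence[float]], type_names: Sequence[str]
) -> List[Tuple[str, float]]:
    """For each dimension (row), return the type with the largest weight and
    that weight after rescaling the row to sum to one."""
    result = []
    for d, row in enumerate(weights):
        if len(row) != len(type_names):
            raise ValueError(
                f"dimension {d}: expected {len(type_names)} weights, got {len(row)}"
            )
        total = sum(row)
        if total <= 0:
            raise ValueError(f"dimension {d}: weights must have a positive sum")
        # max() returns the first index on ties, so earlier labels win ties.
        best = max(range(len(row)), key=lambda t: row[t])
        result.append((type_names[best], row[best] / total))
    return result


# Hypothetical labels and ordering, for illustration only.
TYPES = ["real-valued", "positive real-valued", "interval"]

if __name__ == "__main__":
    w = [[0.1, 0.7, 0.2], [0.5, 0.25, 0.25]]  # made-up weights for two dimensions
    for d, (name, p) in enumerate(most_likely_types(w, TYPES)):
        print(f"dimension {d}: {name} ({p:.2f})")
```

Normalizing each row keeps the reported value interpretable as a share of the total weight even if the saved weights are unnormalized.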
Alternatively, the function simComp(datasetC,Nits,KK,itt) runs the baseline from the paper above, which assumes all continuous variables to be Gaussian and all discrete variables to be categorical.
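To make the baseline's assumptions concrete, the Python sketch below computes an average test log-likelihood for a single column under each assumed likelihood: a Gaussian fitted by maximum likelihood for a continuous column, and a categorical distribution estimated from counts for a discrete column. This is a deliberately simplified stand-in, not simComp: it fits each column independently instead of using the low-rank representation, and the add-`alpha` (Laplace) smoothing, which keeps unseen test categories from producing log(0), is an assumption of this sketch. The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def gaussian_test_loglik(train: Sequence[float], test: Sequence[float]) -> float:
    """Average log-density of `test` under a Gaussian fitted to `train` (ML estimates)."""
    n = len(train)
    mu = sum(train) / n
    var = sum((x - mu) ** 2 for x in train) / n
    if var <= 0:
        raise ValueError("training column must not be constant")
    return sum(
        -0.5 * math.log(2 * math.pi * var) - (x - mu) ** 2 / (2 * var) for x in test
    ) / len(test)


def categorical_test_loglik(
    train: Sequence[Hashable],
    test: Sequence[Hashable],
    categories: Sequence[Hashable],
    alpha: float = 1.0,
) -> float:
    """Average log-probability of `test` under a categorical distribution
    estimated from `train` with add-`alpha` smoothing over `categories`."""
    counts = Counter(train)
    denom = len(train) + alpha * len(categories)
    return sum(math.log((counts[c] + alpha) / denom) for c in test) / len(test)


if __name__ == "__main__":
    # Toy columns with made-up values: one continuous, one discrete.
    print(gaussian_test_loglik([1.0, 2.0, 3.0], [2.0]))
    print(categorical_test_loglik(["a", "a", "b"], ["a", "c"], ["a", "b", "c"]))
```

Per-column averages like these can then be compared with the corresponding values obtained when the data type of each column is inferred rather than fixed.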
Requirements
- Matlab R2012b or higher
- GSL library
  On Ubuntu: sudo apt-get install libgsl0ldbl or sudo apt-get install libgsl0-dev
- GMP library
  On Ubuntu: sudo apt-get install libgmp3-dev