WIP: Estimator summary #804

mblondel · 2012-04-30T12:14:08Z

Here's a pull request to add a summary of the estimator capabilities to the doc. This is a work-in-progress and I need your help!

To do:

add more columns (time complexity, space complexity, ...)
add more estimators

amueller · 2012-04-30T12:17:00Z

Most awesome!

Still needs:

decomposition & manifold learning
clustering
preprocessing (?)
other

vene · 2012-04-30T12:23:02Z

This is a great idea, Mathieu! Will it include all of the estimators (i.e., LinearSVC as a separate entry)?
Also, do you think a link to the appropriate section of the docs would be useful? People might start from such a page, and I think the narrative docs would be more useful there than the class reference.

mblondel · 2012-04-30T12:33:16Z

Yep, we need to do it for all estimators (I forgot LinearSVC :)). This will take some time, but once it is done, we can just ask people who send PRs to update it, just like we do for whats_new.rst.

The links to the narrative doc are a good idea.

amueller · 2012-04-30T12:34:41Z

I think links to the narrative and to the references would be good. Not sure how to do that, though.


NelleV · 2012-04-30T12:53:10Z

Very nice! This is going to be a very useful addition to the docs.

vene · 2012-04-30T13:04:23Z

Should we stick to alphabetical ordering or group birds of a feather together (LinearSVC, NuSVC and SVC next to each other)?

mblondel · 2012-04-30T15:25:27Z

@vene: I was wondering about this too. Any opinion?

The clustering chapter (https://scikit-learn.org/dev/modules/clustering.html) has a nice summary table. I wonder whether we should keep it in the clustering section or move it here.

mblondel · 2012-04-30T15:35:36Z

@larsmans: fixed, thx. By multi-task, I mean the fact of supporting a 2d Y of shape [n_samples, n_tasks] (i.e. each task is encoded in one column of Y).
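To illustrate what a 2d Y of shape [n_samples, n_tasks] means in practice, here is a minimal sketch using plain NumPy least squares (numpy.linalg.lstsq, not any scikit-learn estimator) on random placeholder data. It fits all tasks at once and checks that the result matches fitting each column of Y on its own.

```python
import numpy as np

rng = np.random.RandomState(0)
n_samples, n_features, n_tasks = 50, 4, 3

X = rng.randn(n_samples, n_features)
# One column of Y per task: Y has shape [n_samples, n_tasks].
Y = rng.randn(n_samples, n_tasks)

# Solve all tasks at once: W_joint has shape [n_features, n_tasks].
W_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Solve each task (one column of Y) separately and stack the solutions.
W_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, k], rcond=None)[0] for k in range(n_tasks)]
)

print(W_joint.shape)                      # (4, 3)
print(np.allclose(W_joint, W_separate))   # True
```

For ordinary least squares, the 2d fit is simply the column-wise fits placed side by side; the table column would record whether an estimator accepts such a Y directly.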

larsmans · 2012-04-30T15:47:19Z

I had gathered that from your message on the ML, which I hadn't read yet, sorry. You might want to add an explanatory note; I'm used to this being called "multiple regression". (My first thought was even multiprocessing/n_jobs...)

mblondel · 2012-04-30T15:51:15Z

Sure, I will add a note. One application of multi-task in the case of classification is multi-label classification: Y would then be the indicator matrix.
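A small pure-Python sketch of the indicator-matrix encoding, for readers unfamiliar with it. The helper name `to_indicator_matrix` and the example labels are hypothetical; each row is a sample, each column a label, and a 1 means the sample carries that label.

```python
def to_indicator_matrix(label_sets):
    """Encode a list of label sets as a 0/1 matrix of shape [n_samples, n_labels].

    Returns the matrix (as a list of rows) and the sorted list of labels
    that defines the column order.
    """
    labels = sorted({label for labels_i in label_sets for label in labels_i})
    column = {label: j for j, label in enumerate(labels)}
    Y = []
    for labels_i in label_sets:
        row = [0] * len(labels)
        for label in labels_i:
            row[column[label]] = 1
        Y.append(row)
    return Y, labels


Y, labels = to_indicator_matrix([{"sports"}, {"politics", "economy"}, set()])
print(labels)  # ['economy', 'politics', 'sports']
print(Y)       # [[0, 0, 1], [1, 1, 0], [0, 0, 0]]
```

Column j of Y is then the binary target for labels[j], i.e. one task per column. Later scikit-learn releases provide sklearn.preprocessing.MultiLabelBinarizer for this encoding.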

mblondel · 2012-04-30T15:56:35Z

What would be nice is to use javascript to enable users to sort columns by values (e.g. sort all classifiers with "sparse" set to "yes").

GaelVaroquaux · 2012-05-05T17:14:56Z

I wonder whether these tables should instead be located at the beginning of the relevant sections (supervised_learning for regression and classification, and clustering for the other). I think users will stumble on them more easily there.

amueller · 2012-05-05T18:48:34Z

I'm not sure if they should be at the beginning of the sections. It also seems good to have them in one place, since then people can more easily browse it.
But then, we definitely need to make it very easy to find. We could even put it in the main menu bar as "overview".

amueller · 2012-05-05T18:50:31Z

As an afterthought, it would then be a bit redundant with the references...

Is the idea to give a feature list or rather to give details about the pros and cons of different estimators?

GaelVaroquaux · 2012-05-05T18:51:19Z

> I'm not sure if they should be at the beginning of the sections. It also seems good to have them in one place, since then people can more easily browse it.

I am a bit worried about the multiplication of entry points.

mblondel · 2012-05-06T01:56:15Z

I thought about adding the tables at the beginning of the chapters (like in the clustering chapter), but the categories are different. For example, "supervised learning" is split into chapters for "SGD", "naive bayes", etc., whereas I have an entire table for classifiers. I think it is much more useful to compare all classifiers at once. The same goes for regression, clustering (which would include GMMs) and so on. The reference section could go away if the list is complete enough.

Another solution is to reorganize the documentation around new axes:

classification
regression
clustering
decomposition
...

However, some chapters like "SGD" or "Trees" contain information for both classification and regression and may not be easy to split.

amueller · 2012-05-06T07:26:13Z

I agree that having all classifiers in one place would be good.
I have mixed feelings about replacing the references, though. Maybe we need to think this through a bit more...

GaelVaroquaux · 2012-05-06T08:01:06Z

> I thought about adding the tables at the beginning of the chapters (like in the clustering chapter), but the categories are different. For example, "supervised learning" is split into chapters for "SGD", "naive bayes", etc., whereas I have an entire table for classifiers. I think it is much more useful to compare all classifiers at once. The same goes for regression, clustering (which would include GMMs) and so on.

I agree. One option would be to have the tables at the beginning of the supervised learning and unsupervised learning chapters.

amueller · 2012-09-03T16:50:57Z

@GaelVaroquaux suggested in #1108 that we autogenerate the table. I think this might actually be feasible.
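As a rough idea of what autogeneration could look like on the output side, here is a minimal sketch that renders a mapping of estimator names to capability values as an RST list-table (the format the branch switched to). The function name, the "sparse" and "multi-task" column names, and the "?" placeholder for unknown values are all hypothetical; how the capabilities would be collected from the estimators themselves is not addressed here.

```python
def capabilities_to_list_table(capabilities, columns, title="Classifiers"):
    """Render {estimator_name: {column: value}} as an RST list-table string.

    Rows are sorted alphabetically by estimator name; missing values are
    shown as "?" (hypothetical placeholder for "not filled in yet").
    """
    lines = [f".. list-table:: {title}", "   :header-rows: 1", ""]
    rows = [["Estimator"] + list(columns)]
    for name in sorted(capabilities):
        rows.append([name] + [capabilities[name].get(col, "?") for col in columns])
    for row in rows:
        # "* -" starts a new table row; each following "-" is another cell in it.
        lines.append(f"   * - {row[0]}")
        lines.extend(f"     - {cell}" for cell in row[1:])
    return "\n".join(lines)


table = capabilities_to_list_table(
    {"SVC": {}, "LinearSVC": {}, "NuSVC": {}},
    columns=["sparse", "multi-task"],
)
print(table)
```

Sorting by name gives the alphabetical ordering discussed earlier; grouping related estimators instead would need an extra category key per entry.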

mblondel · 2013-08-14T16:26:46Z

Closing this one, no time to work on this.

cae3547 Estimator summary (work in progress).
07a06a1 uniform capitalization of Vs
cecd976 Merge pull request #12 from vene/m-estimator-summary
contribs
46d1a43 Use list-table + clustering.

amueller closed this May 5, 2012

amueller reopened this May 5, 2012

mblondel closed this Aug 14, 2013
