WIP: Estimator summary #804
Most awesome! Still needs:
|
This is a great idea, Mathieu! Will it include all of the estimators (i.e., LinearSVC as a separate entry)? |
Yep, we need to do it for all estimators (I forgot LinearSVC :)). This will take some time, but once it is done, we can just ask people who send PRs to update it, just as we do for whats_new.rst. The links to the narrative doc are a good idea. |
I think links to the narrative and to the references would be good. Not sure how to do that, though. |
Very nice! This is going to be a very useful addition to the docs. |
Should we stick to alphabetical ordering or birds-of-a-feather? (LinearSVC, NuSVC and SVC next to each other?) |
@vene: I was wondering about this too. Any opinion? The clustering chapter (https://scikit-learn.org/dev/modules/clustering.html) has a nice table summary. I wonder if we should keep it in the clustering section or move it here. |
@larsmans: fixed, thx. By multi-task, I mean the fact of supporting a 2d Y of shape |
I had gathered that from your message on the ML, which I hadn't read yet, sorry. You might want to add an explanatory note; I'm used to this being called "multiple regression". (My first thought was even multiprocessing/ |
Sure, I will add a note. One application of multi-task in the case of classification is multi-label classification: Y would then be the indicator matrix. |
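To make the indicator matrix concrete, here is a minimal sketch in plain Python. The label names and the helper `to_indicator_matrix` are made up for illustration and are not part of scikit-learn; the point is only that each row of Y is a sample, each column is a class, and an entry is 1 when the sample carries that label.

```python
# Hypothetical toy data: each sample carries a (possibly empty) set of labels.
samples_labels = [
    {"sports"},
    {"politics", "economy"},
    {"economy", "sports"},
    set(),
]


def to_indicator_matrix(label_sets):
    """Return (classes, Y) where Y[i][j] == 1 iff sample i has label classes[j].

    Illustrative helper only; the name and signature are assumptions.
    """
    # Sorted list of every label seen, so the column order is deterministic.
    classes = sorted(set().union(*label_sets))
    column = {label: j for j, label in enumerate(classes)}

    # One row per sample, one column per class, initialised to 0.
    Y = [[0] * len(classes) for _ in label_sets]
    for i, labels in enumerate(label_sets):
        for label in labels:
            Y[i][column[label]] = 1
    return classes, Y


classes, Y = to_indicator_matrix(samples_labels)
for row in Y:
    print(row)
```

Y here is a 2d target with one row per sample and one column per class, which is the kind of input the "multi-task" column is meant to describe.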
What would be nice is to use JavaScript to let users sort columns by value (e.g. sort all classifiers with "sparse" set to "yes"). |
I wonder if these tables should not be located at the beginning of the relevant section (supervised_learning for the regression and classification, and clustering for the other). I think that users will more easily stumble on them. |
I'm not sure if they should be at the beginning of the sections. It also seems good to have them in one place, since then people can more easily browse it. |
As an afterthought, it would then be a bit redundant with the references... Is the idea to give a feature list, or rather to give details about the pros and cons of different estimators? |
I am a bit worried about the multiplication of entry points. |
I thought about adding the tables at the beginning of the chapters (like in the clustering chapter) but the categories are different. For example, "supervised learning" is split into chapters for "SGD", "naive bayes" etc... Whereas I have an entire table for classifiers. I think it is much more useful to compare all classifiers at once. The same goes for regression, clustering (which would include GMMs) and so on. The reference section could go away if the list is complete enough. Another solution is to reorganize the documentation around new axes:
However, some chapters like "SGD" or "Trees" contain information for both classification and regression and may not be easy to split. |
I agree that having all classifiers in one place would be good. |
I agree. One option would be to have the tables at the beginning of the |
@GaelVaroquaux suggested in #1108 that we autogenerate the table. I think this might actually be feasible. |
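A rough sketch of what autogeneration could look like, assuming capabilities can be inferred from which methods a class defines. The two toy classes, the `CAPABILITIES` list, and both helper functions are hypothetical stand-ins, not scikit-learn code; how the real estimators would be discovered, and which columns the table needs, is left open.

```python
# Hypothetical stand-ins for real estimators; only the method names matter.
class ToyLinearClassifier:
    def fit(self, X, y):
        return self

    def predict(self, X):
        raise NotImplementedError

    def decision_function(self, X):
        raise NotImplementedError


class ToyProbabilisticClassifier:
    def fit(self, X, y):
        return self

    def predict(self, X):
        raise NotImplementedError

    def predict_proba(self, X):
        raise NotImplementedError


# Assumed set of columns; a real table would likely need more than methods.
CAPABILITIES = ["predict", "predict_proba", "decision_function"]


def capability_rows(estimators):
    """One row per class: its name, then yes/no for each capability."""
    rows = []
    for est in sorted(estimators, key=lambda cls: cls.__name__):
        flags = [
            "yes" if callable(getattr(est, name, None)) else "no"
            for name in CAPABILITIES
        ]
        rows.append([est.__name__] + flags)
    return rows


def rst_simple_table(header, rows):
    """Render the header and rows as a reStructuredText simple table."""
    widths = [max(len(cell) for cell in col) for col in zip(header, *rows)]
    rule = "  ".join("=" * w for w in widths)

    def fmt(cells):
        return "  ".join(c.ljust(w) for c, w in zip(cells, widths)).rstrip()

    return "\n".join([rule, fmt(header), rule] + [fmt(r) for r in rows] + [rule])


rows = capability_rows([ToyProbabilisticClassifier, ToyLinearClassifier])
table = rst_simple_table(["Estimator"] + CAPABILITIES, rows)
print(table)
```

Properties that cannot be read off the class (sparse input support, multi-task support) would presumably still need to be declared by hand or checked by fitting on small data.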
Closing this one, no time to work on this. |
Here's a pull request to add a summary of the estimator capabilities to the doc. This is a work-in-progress and I need your help!
To do: