[MRG+2] Add float32 support for Linear Discriminant Analysis #13273
Conversation
sklearn/discriminant_analysis.py
Outdated
@@ -485,9 +485,10 @@ def fit(self, X, y):
raise ValueError("unknown solver {} (valid solvers are 'svd', "
"'lsqr', and 'eigen').".format(self.solver))
if self.classes_.size == 2:  # treat binary case as a special case
self.coef_ = np.array(self.coef_[1, :] - self.coef_[0, :], ndmin=2)
my_type = np.float32 if (X.dtype == np.float32) else np.float64
This line is necessary when X.dtype is int32 or int64.
Should we keep the number of bits short, i.e. perform the cast from int32 to float32, or from int32 to float64?
ping @glemaitre
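To make the two options concrete, here is a small NumPy-only illustration (the array is made up; only the resulting dtypes and the precision note matter):

import numpy as np

X_int32 = np.arange(6, dtype=np.int32).reshape(3, 2)

# Option A: keep the number of bits short (int32 -> float32)
X_short = X_int32.astype(np.float32)
# Option B: promote to double precision (int32 -> float64)
X_double = X_int32.astype(np.float64)

print(X_short.dtype, X_double.dtype)  # float32 float64
# Note: float32 represents integers exactly only up to 2**24, so option A can
# lose precision for large int32 values, while option B is always exact.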
sklearn/discriminant_analysis.py
Outdated
@@ -485,9 +485,10 @@ def fit(self, X, y):
raise ValueError("unknown solver {} (valid solvers are 'svd', "
"'lsqr', and 'eigen').".format(self.solver))
if self.classes_.size == 2:  # treat binary case as a special case
self.coef_ = np.array(self.coef_[1, :] - self.coef_[0, :], ndmin=2)
my_type = np.float32 if (X.dtype in [np.float32, np.int32]) else np.float64
You should already have called check_X_y and/or as_float_array above. Then you can just use X.dtype to specify the dtype of the arrays you allocate.
Indeed, and the check_X_y call at line 430 probably needs to be changed to dtype=[np.float32, np.float64].
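A rough sketch of what the two comments above suggest (the helper name is illustrative, and the dtype list uses the order that casts integer input to float64, the convention the thread converges on below):

import numpy as np
from sklearn.utils import check_X_y

def _validate_for_lda(X, y):
    # float32/float64 inputs are kept unchanged; any other dtype (e.g. int32,
    # int64) is cast to the first entry of the list, i.e. np.float64.
    return check_X_y(X, y, ensure_min_samples=2,
                     dtype=[np.float64, np.float32])

X = np.arange(8, dtype=np.int32).reshape(4, 2)
y = np.array([0, 0, 1, 1])
X, y = _validate_for_lda(X, y)

# Downstream allocations can then simply reuse the validated dtype:
coef = np.zeros((1, X.shape[1]), dtype=X.dtype)
print(X.dtype, coef.dtype)  # float64 float64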
@@ -295,6 +296,27 @@ def test_lda_dimension_warning(n_classes, n_features):
"min(n_features, n_classes - 1).")
assert_warns_message(FutureWarning, future_msg, lda.fit, X, y)

@pytest.mark.parametrize("data_type, expected_type",[
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float32), (np.int64, np.float64)])
pep8 violations, please check.
No additional comments on top of those @agramfort and I made above.
sklearn/discriminant_analysis.py
Outdated
@@ -427,7 +427,8 @@ def fit(self, X, y):
Target values.
"""
# FIXME: Future warning to be removed in 0.23
X, y = check_X_y(X, y, ensure_min_samples=2, estimator=self)
X = as_float_array(X)
This line is necessary to get coefficients of type float32 output when X is of type int32. Removing this line gives coefficients of type float64.
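For context, a quick check of as_float_array's dtype mapping (assuming the current sklearn.utils behaviour, where small-width integer types are promoted to float32 and wider types to float64):

import numpy as np
from sklearn.utils import as_float_array

print(as_float_array(np.ones((2, 2), dtype=np.int32)).dtype)    # float32
print(as_float_array(np.ones((2, 2), dtype=np.int64)).dtype)    # float64
print(as_float_array(np.ones((2, 2), dtype=np.float32)).dtype)  # float32 (unchanged)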
This is strange because the next check_X_y will convert to float anyway. It seems unnecessary, doesn't it?
Oh I see why. So X will be converted to float64, and you expect it to be converted to float32 because X is of type int32. I would say that there is no real reason for that: int32 can default to float64. I am fine with that.
@massich what is the mechanism in the other estimators that you modified?
There was no mechanism set. I have a vague memory that we talked about it long ago, but I can't recall that we agreed on anything. To me it makes sense that if someone deliberately creates the data in 32 bits, they want to keep it in 32 bits. But if such a user exists, they will complain to us, or would have done so already. So I have no objection to defaulting to float64.
An int32 is a "very big" int, hence it makes sense to convert it to a float64. People who want to save memory with ints use int16. Indeed, an int covers a larger span of exactly representable integers than a float of the same memory cost.
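A quick NumPy illustration of that precision argument: float32 stops representing consecutive integers exactly beyond 2**24, which is well inside the int32 range, whereas float64 is exact for every int32 value.

import numpy as np

big = np.int32(2**24 + 1)
print(np.float32(big) == big)  # False: 16777217 rounds to 16777216.0 in float32
print(np.float64(big) == big)  # True: float64 represents all int32 values exactly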
CI is failing.
It looks good. Could you add an entry in the what's new? We changed the behavior of LDA.
sklearn/discriminant_analysis.py
Outdated
@@ -19,7 +19,7 @@
from .linear_model.base import LinearClassifierMixin
from .covariance import ledoit_wolf, empirical_covariance, shrunk_covariance
from .utils.multiclass import unique_labels
from .utils import check_array, check_X_y
from .utils import check_array, check_X_y, as_float_array
Suggested change:
- from .utils import check_array, check_X_y, as_float_array
+ from .utils import check_array, check_X_y
+ from .utils import as_float_array
@@ -296,6 +297,29 @@ def test_lda_dimension_warning(n_classes, n_features):
assert_warns_message(FutureWarning, future_msg, lda.fit, X, y)


@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float32),
Convention for int32 is casting to float64.
@thibsej Could you add an entry to the what's new?
@@ -296,6 +297,29 @@ def test_lda_dimension_warning(n_classes, n_features):
assert_warns_message(FutureWarning, future_msg, lda.fit, X, y)


@pytest.mark.parametrize("data_type, expected_type", [
(np.float32, np.float32), (np.float64, np.float64), (np.int32, np.float64),
(np.int64, np.float64)])
I find this one more readable:
@pytest.mark.parametrize("data_type, expected_type", [
    (np.float32, np.float32),
    (np.float64, np.float64),
    (np.int32, np.float64),
    (np.int64, np.float64)
])
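Pieced together from the snippets quoted in this thread, the test under discussion presumably looks roughly like this (the toy dataset and the choice of the svd solver are illustrative, not copied from the PR):

import numpy as np
import pytest
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


@pytest.mark.parametrize("data_type, expected_type", [
    (np.float32, np.float32),
    (np.float64, np.float64),
    (np.int32, np.float64),
    (np.int64, np.float64)
])
def test_lda_dtype_match(data_type, expected_type):
    # Tiny two-class problem; the exact values do not matter for the dtype check.
    X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
    y = np.array([1, 1, 1, 2, 2, 2])
    clf = LinearDiscriminantAnalysis(solver="svd")
    clf.fit(X.astype(data_type), y)
    assert clf.coef_.dtype == expected_type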
# Check value consistency between types
rtol = 1e-6
assert_allclose(clf_32.coef_, clf_64.coef_.astype(np.float32),
                rtol=rtol)
Do not use astype; assert_allclose should be able to handle it:
assert_allclose(clf_32.coef_, clf_64.coef_, rtol=rtol)
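A tiny sanity check (made-up arrays) that assert_allclose copes with mixed float32/float64 operands without any explicit cast:

import numpy as np
from numpy.testing import assert_allclose

a32 = np.array([1.0, 2.0, 3.0], dtype=np.float32)
a64 = a32.astype(np.float64) + 1e-8  # small float64 perturbation
assert_allclose(a32, a64, rtol=1e-6)  # passes, no astype needed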
You should search for this section:

:mod:`sklearn.discriminant_analysis`
....................................

- |Enhancement| :class:`discriminant_analysis.LinearDiscriminantAnalysis` now
  preserves ``float32`` and ``float64`` dtypes. :issue:`8769` and
  :issue:`11000` by :user:`Thibault Sejourne <thibsej>`
LGTM, once the nitpicks are addressed.
LGTM once the tests pass.
@pytest.mark.parametrize("data_type, expected_type", [
    (np.float32, np.float32),
    (np.float64, np.float64),
    (np.int32, np.float64),
Going from int32 to float64 is not what was done in the isotonic regression isofused branch this morning. We should be consistent here.
I am +1 for int32 -> float64. It was the semantic for check_array.
@ogrisel was also for this conversion. I would think that we should correct the isotonic regression then.
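For reference, the check_array semantic being referred to, as I understand it: when dtype is given as a list, an input whose dtype is already in the list is kept, and anything else is cast to the first entry.

import numpy as np
from sklearn.utils import check_array

X_int32 = np.ones((3, 2), dtype=np.int32)
X_f32 = np.ones((3, 2), dtype=np.float32)

print(check_array(X_int32, dtype=[np.float64, np.float32]).dtype)  # float64 (cast to first entry)
print(check_array(X_f32, dtype=[np.float64, np.float32]).dtype)    # float32 (kept as-is)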
+1. Merging. This is a net improvement. We'll nitpick later :D
We should almost have a common test.
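Such a common test does not exist at this point; a purely hypothetical sketch of what it could check (the helper name and the restriction to coef_ are invented for illustration, not scikit-learn API):

import numpy as np
from sklearn.base import clone


def check_dtype_preservation(estimator, attributes=("coef_",)):
    # Hypothetical common check: fitting on float32 (resp. float64) data
    # should produce fitted arrays of the same precision for the listed
    # attributes.
    rng = np.random.RandomState(0)
    X = rng.rand(20, 3)
    y = rng.randint(0, 2, size=20)
    for dtype in (np.float32, np.float64):
        est = clone(estimator).fit(X.astype(dtype), y)
        for name in attributes:
            assert getattr(est, name).dtype == dtype, name

A real version would probably need per-estimator attribute lists, since attributes derived from y (class priors, for instance) legitimately stay float64.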
Thanks a lot. @thibsej welcome to the family of sklearn contributors.
…On Wed, Feb 27, 2019, 10:11 Gael Varoquaux ***@***.***> wrote:
Merged #13273 <#13273> into master.
…learn#13273)
* [skip ci] Empty commit to trigger PR
* Add dtype testing
* Fix: dtype testing
* Fix test_estimators[OneVsRestClassifier-check_estimators_dtypes]
* TST refactor using parametrize + Add failing test for int32
* Fix for int32
* Fix code according to review + Fix PEP8 violation
* Fix dtype for int32 and complex
* Fix pep8 violation
* Update whatsnew + test COSMIT
…cikit-learn#13273)" This reverts commit 2c98ed6.
Reference Issues/PRs
This PR works on #11000 by preserving the dtype in LDA.
cross reference: #8769 (comment)
What does this implement/fix? Explain your changes.
Any other comments?
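For readers landing here later, a minimal usage sketch of the behaviour this PR adds (toy data; the coefficient dtype now follows the input dtype):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]],
             dtype=np.float32)
y = np.array([0, 0, 0, 1, 1, 1])

clf = LinearDiscriminantAnalysis().fit(X, y)
print(clf.coef_.dtype)  # float32: preserved instead of being upcast to float64

clf64 = LinearDiscriminantAnalysis().fit(X.astype(np.float64), y)
print(clf64.coef_.dtype)  # float64, as before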