[MRG+1] fused types in isotonic_regression #9106
Conversation
I implemented and tested dtype preservation for IsotonicRegression as well. I followed the convention that X should drive the dtype. Ping @Henley13 for comments :)
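To make the intent concrete, here is a minimal sketch (my own illustration, not code from the PR) of the behaviour being tested: with the "X drives the dtype" convention, float32 inputs should give float32 predictions instead of being silently upcast to float64.

import numpy as np
from sklearn.isotonic import IsotonicRegression

# Sketch of the expected behaviour, assuming the convention described above.
rng = np.random.RandomState(0)
X = np.arange(10, dtype=np.float32)
y = rng.randn(10).astype(np.float32)

ir = IsotonicRegression().fit(X, y)
print(ir.predict(X).dtype)  # expected: float32 with this PR, float64 before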
further ping @lesteve, @glemaitre, @raghavrv, @TomDLT just in case someone bites :P I am pretty sure I forgot things here, because I failed to participate in all of the discussions on 32-bit support over the week.
@@ -437,3 +438,15 @@ def test_isotonic_copy_before_fit():
    # https://github.com/scikit-learn/scikit-learn/issues/6628
    ir = IsotonicRegression()
    copy.copy(ir)


def test_isotonic_dtype():
should this kind of test be here or in common tests / estimator tags?
Probably there, but not possible without estimator tags, right? Something like a preserves_precision tag? Have you run it on all_estimators to see the status? We could post an issue with a checklist.
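For what it's worth, a rough sketch of the kind of survey script that question implies (my own sketch, not from the PR; note that all_estimators lives in sklearn.utils in recent releases and in sklearn.utils.testing in versions contemporary with this PR):

import numpy as np
from sklearn.utils import all_estimators  # sklearn.utils.testing in older versions

# Crude survey: fit every regressor on float32 data with default parameters
# and report whether the predictions stay float32.
rng = np.random.RandomState(0)
X = rng.rand(30, 3).astype(np.float32)
y = rng.rand(30).astype(np.float32)

for name, Est in all_estimators(type_filter='regressor'):
    try:
        pred = Est().fit(X, y).predict(X)
        status = 'preserves float32' if pred.dtype == np.float32 else str(pred.dtype)
    except Exception as exc:
        status = 'error: %s' % type(exc).__name__
    print(name, '->', status)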
Test failures were genuine: one of the tests calls predict without calling fit. Instead of changing the test I made the model support this behaviour (so it works the same way as on master).
sklearn/isotonic.py (outdated)
@@ -375,6 +378,9 @@ def transform(self, T):
            The transformed data
        """
        T = as_float_array(T)
        if hasattr(self, '_dtype'):
Is this required? Is this faster?
Basically a way to check if there is a saved X. I could replace it with

    if hasattr(self, '_necessary_X_'):
        T = T.astype(self._necessary_X_.dtype, copy=False)

I could check if it makes a difference in terms of speed but it's probably minimal (just attribute accesses, right?)
sklearn/isotonic.py (outdated)
        else:
-           sample_weight = np.array(sample_weight[order], dtype=np.float64)
+           sample_weight = np.array(sample_weight[order])
should we make sure that sample_weight has the same dtype as y?
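One way to do that, sketched here rather than taken from the final diff, is to validate the weights with y's dtype so they cannot upcast the computation:

import numpy as np
from sklearn.utils import check_array

# Sketch: force sample_weight onto the dtype already chosen for y.
y = np.array([1.0, 2.0, 3.0], dtype=np.float32)
sample_weight = np.array([1, 2, 1])  # e.g. integer weights from the user

sample_weight = check_array(sample_weight, dtype=y.dtype, ensure_2d=False)
print(sample_weight.dtype)  # float32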
@@ -271,6 +272,7 @@ def _build_y(self, X, y, sample_weight, trim_duplicates=True):
        check_consistent_length(X, y, sample_weight)
        X, y = [check_array(x, ensure_2d=False) for x in [X, y]]

        X = as_float_array(X)
This will trigger a copy, and is rather redundant with check_array. I would prefer:

    X = check_array(X, dtype=[np.float64, np.float32])
    y = check_array(y, dtype=X.dtype, ensure_2d=False)
@TomDLT this has different behavior than as_float_array if the input is int32 (and maybe others too?)
import numpy as np
from sklearn.utils import as_float_array, check_array

for dtype in (np.int32, np.int64, np.uint32, np.uint64, np.float32, np.float64):
    print('input dtype\t', dtype.__name__)

    x = np.arange(5).astype(dtype)
    x = as_float_array(x)
    print('as_float_array\t', x.dtype)

    x = np.arange(5).astype(dtype)
    x = check_array(x, dtype=[np.float64, np.float32], ensure_2d=False)
    print('check_array\t', x.dtype)
    print()
outputs
input dtype int32
as_float_array float32
check_array float64
input dtype int64
as_float_array float64
check_array float64
input dtype uint32
as_float_array float64
check_array float64
input dtype uint64
as_float_array float64
check_array float64
input dtype float32
as_float_array float32
check_array float32
input dtype float64
as_float_array float64
check_array float64
Has there been a decision in the other 32-bit PRs on what to do in this situation?
in logistic regression it seems like the check_array way is used:
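The pattern referenced there looks roughly like this (an illustration from memory of the codebase around that time, not an exact quote):

import numpy as np
from sklearn.utils import check_X_y

# float32 and float64 are both accepted and preserved; other dtypes are
# converted to float64.
X = np.arange(6, dtype=np.float32).reshape(3, 2)
y = np.array([0, 1, 0])
X, y = check_X_y(X, y, accept_sparse='csr',
                 dtype=[np.float64, np.float32], order='C')
print(X.dtype)  # float32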
> this has different behavior than as_float_array if the input is int32

Fair enough, just make sure that it does not copy more than necessary.
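A quick way to check that concern (a sketch, not part of the PR) is to verify that check_array hands back the same buffer when the input already has an accepted dtype:

import numpy as np
from sklearn.utils import check_array

X = np.arange(5, dtype=np.float32)
X_checked = check_array(X, dtype=[np.float64, np.float32],
                        ensure_2d=False, copy=False)
print(np.may_share_memory(X, X_checked))  # expected: True, i.e. no copy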
I removed the redundant call to check_array on X and am now just calling it on y. Still, I think the behavior on int inputs should be consistent across the codebase if possible.
@Henley13 have you already discussed this?
Failing tests on Travis seem to suggest that my assumptions here fail on some configs; in particular this:
input dtype int32
as_float_array float32
check_array float64
thanks @agramfort ! my memory might fail me, but let me know if I can help
@vene can you replicate the Travis failure? It seems I can't.
I was able to reproduce in a clean Python 3.5.6 installed with pyenv. The problem is caused by the call to scipy.interpolate.interp1d: with an old scipy, the interpolated output is upcast to float64 even when X, y and T are all float32. A minimal reproduction:
# file check.py
import numpy as np
import scipy
from scipy import interpolate

print(np.__version__)
print(scipy.__version__)

X = np.random.randn(5).astype(np.float32)
y = np.random.randn(5).astype(np.float32)
T = np.random.randn(5).astype(np.float32)

f_ = scipy.interpolate.interp1d(X, y, kind='linear', bounds_error=False)
out = f_(T)
print(out.dtype)

output:
I think there are two solutions.

EDIT: pushed a quick fix for solution (1).
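For reference, one way such a quick fix could look, assuming solution (1) means casting interp1d's output back to the training dtype (a sketch, not necessarily the code that was pushed):

import numpy as np
from scipy import interpolate

X = np.random.randn(5).astype(np.float32)
y = np.random.randn(5).astype(np.float32)
T = np.random.randn(5).astype(np.float32)

f_ = interpolate.interp1d(X, y, kind='linear', bounds_error=False)
out = f_(T).astype(X.dtype, copy=False)  # undo the float64 upcast of old scipy
print(out.dtype)  # float32 on any scipy version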
sorry, i'm rusty
This is good to go from my end. Should I be worried that Azure and AppVeyor failed? It seems unrelated...
The failure is when calling _make_unique, whose signature is:

    def _make_unique(np.ndarray[dtype=floating] X,
                     np.ndarray[dtype=floating] y,
                     np.ndarray[dtype=floating] sample_weights):

Could this be because the fused type specializations are tied, i.e. either all three arrays are float32 or all are float64?
yes it's very likely. Can you fix?
Hopefully! I'm trying to reproduce on Windows, but it might take a bit longer for technical reasons.
I was able to reproduce. Will find a fix tomorrow.
FWIW I can also reproduce on my laptop.
Thanks @albertcthomas. It was a legitimate bug; see the new failing test in the previous commit 6f2b6d1. Should be fixed now, let's see what CI says.
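Assuming the bug really was the tied fused-type specializations discussed above, the guard on the Python side could look roughly like this (a hypothetical helper for illustration, not the committed fix):

import numpy as np

def _to_common_floating(X, y, sample_weight):
    # Hypothetical helper: _make_unique's shared `floating` fused type only
    # generates an all-float32 and an all-float64 specialization, so all
    # three arrays must be cast to one common dtype before the call.
    dtype = X.dtype if X.dtype == np.float32 else np.float64
    return (np.ascontiguousarray(X, dtype=dtype),
            np.ascontiguousarray(y, dtype=dtype),
            np.ascontiguousarray(sample_weight, dtype=dtype))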
thanks @vene
thanks @agramfort !!
* ENH fused types in isotonic_regression
* make X drive computation dtype in IsotonicRegression
* preserve current behaviour if transform w/o fit
* thoroughly check sample weights; avoid storing dtype explicitly
* consistent testing and behavior
* misc
* update what's new
* fix for interp1d upcast on old scipy
* flake8 remove blank line (sorry, i'm rusty)
* add failing test
* FIX dtype bug in _make_unique
This reverts commit a29db54.
Reference Issue: #8769 -ish?
What does this implement/fix? Explain your changes.
Isotonic regression should preserve 32-bit dtypes when possible.
Any other comments?