Optimize the bf16 conversion in to_tensor #73050
Conversation
modified: python/paddle/tensor/creation.py
modified: test/dygraph_to_static/test_to_tensor.py
Your PR was submitted successfully. Thank you for contributing to the open-source project!
Codecov Report: All modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #73050 +/- ##
===========================================
Coverage ? 100.00%
===========================================
Files ? 1
Lines ? 10
Branches ? 0
===========================================
Hits ? 10
Misses ? 0
Partials ? 0
- This change does not modify the static-graph branch, so no dynamic-to-static unit test is needed; add the test to test/legacy_test/test_eager_tensor.py instead
- Just adding an ordinary function means the new code path is never exercised, and the way unittest.skipIf is used is also wrong
tensor = core.eager.Tensor(
value=data,
place=place,
persistable=False,
zero_copy=False,
name=None,
stop_gradient=stop_gradient,
)
# tensor = tensor.astype(dtype)
tensor = paddle.cast(tensor, dtype)
return tensor
Hi, could you help take a look? After this conversion, x.grad becomes None:
x = paddle.to_tensor(1e6, dtype=paddle.bfloat16, stop_gradient=False)
print("x:", x)
y = x * x
y.backward()
print("x.grad:", x.grad)
Result after the change:
x: Tensor(shape=[], dtype=bfloat16, place=Place(cpu), stop_gradient=False,
-9.9942e+05)
x.grad: None
Original result:
x: Tensor(shape=[], dtype=bfloat16, place=Place(cpu), stop_gradient=False,
-9.9942e+05)
x.grad: Tensor(shape=[], dtype=bfloat16, place=Place(cpu), stop_gradient=False,
-1.9988e+06)
Fixed:
modified: python/paddle/tensor/creation.py
modified: test/dygraph_to_static/test_to_tensor.py
python/paddle/tensor/creation.py
Outdated
tensor.stop_gradient = stop_gradient
return tensor
else:
    data = _handle_np_dtype(data, dtype)
You could move the _handle_np_dtype logic out here and remove the original, incorrect bf16 branch.
Do you mean deleting the _handle_np_dtype function and moving its code into the else: branch? The function is also called earlier.
> Do you mean deleting the _handle_np_dtype function and moving its code into the else: branch? The function is also called earlier.

The logic is simple enough that it can all be moved out, directly replacing the original, incorrect bf16 branch.
> The logic is simple enough that it can all be moved out, directly replacing the original, incorrect bf16 branch.
OK, got it; fixed.
python/paddle/tensor/creation.py
Outdated
name=None,
stop_gradient=stop_gradient,
)
tensor = tensor.detach().astype(dtype)
How about writing it like this:
tensor = core.eager.Tensor(
value=data,
place=place,
persistable=False,
zero_copy=False,
name=None,
stop_gradient=True,
)
tensor = tensor.astype('bfloat16')
tensor.stop_gradient = stop_gradient
return tensor
That doesn't seem to work; the previous version did something similar. My understanding is that only the data can be converted, not the gradient along with it. Without detach, grad ends up as None.
This approach does work. In the previous version, stop_gradient=True was not set, so the intermediate tensor and the cast tensor were linked in the backward graph: the gradient propagated back to the intermediate tensor, while the cast tensor's gradient was cleaned up.
Setting stop_gradient=True on the intermediate tensor avoids this problem.
OK, got it. I tested it and it indeed works now. Thanks!
modified: python/paddle/tensor/creation.py
modified: python/paddle/tensor/creation.py
python/paddle/tensor/creation.py
Outdated
# Windows default type is 'int32', while Linux/Mac is 'int64'. Unify them.
if data.dtype in ['int32']:
    data = data.astype("int64")

if dtype:
    data = _handle_np_dtype(data, dtype)
if (
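The platform-unification step in the diff above can be exercised directly with NumPy: on Windows, np.array of Python ints defaults to int32, while Linux and macOS default to int64, so the branch normalizes both to int64. A minimal sketch of that normalization:

```python
import numpy as np

def unify_default_int(data):
    # Windows' default integer dtype is int32; Linux/macOS use int64.
    # Normalize so downstream code sees one dtype regardless of platform.
    if data.dtype == np.int32:
        data = data.astype(np.int64)
    return data

print(unify_default_int(np.array([1, 2, 3], dtype=np.int32)).dtype)  # int64
```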
There's no need for so many checks here; keep the logic clear and readable. Either use convert_dtype(dtype) consistently for the comparison or use dtype consistently, instead of piling up redundant checks. The ndarray check is probably unnecessary as well.
OK, got it; revised as suggested below.
python/paddle/tensor/creation.py
Outdated
tensor.stop_gradient = stop_gradient
return tensor
else:
    if convert_dtype(dtype) != convert_dtype(data.dtype):
if dtype and convert_dtype(dtype) != convert_dtype(data.dtype):
if convert_dtype(dtype) == 'uint16':
...
else:
data = data.astype(convert_dtype(dtype))
Would something like this work? The branches here otherwise look numerous and cluttered.
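The suggested branch above can be sketched outside Paddle. Note that convert_dtype is Paddle's helper for canonicalizing dtype objects to strings; this sketch assumes `dtype` is already a plain string (or None), and 'uint16' is Paddle's storage dtype for bfloat16, which NumPy cannot cast to directly:

```python
import numpy as np

def normalize_np_dtype(data, dtype):
    # Sketch of the suggested control flow (an assumption of this sketch:
    # `dtype` is a canonical string, playing the role of convert_dtype(dtype)).
    if dtype and dtype != str(data.dtype):
        if dtype == 'uint16':
            # 'uint16' backs bfloat16 in Paddle; NumPy has no bfloat16,
            # so the real code defers this cast to the tensor side.
            return data
        data = data.astype(dtype)
    return data

print(normalize_np_dtype(np.array([1, 2], dtype=np.int32), 'int64').dtype)  # int64
```

Collapsing the dtype comparison into one guard like this keeps a single branch for the bfloat16 special case and one generic astype path for everything else.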
OK, got it; fixed. Thanks!
@@ -757,13 +742,35 @@ def _handle_np_dtype(
    if default_type in ['float16', 'float32']
    else 'complex128'
)
data = _handle_np_dtype(data, default_type)
if convert_dtype(default_type) != convert_dtype(data.dtype):
Write this uniformly as:
if convert_dtype(default_type) != convert_dtype(data.dtype):
    dtype = default_type
and hand it off to the logic below; the code is more concise that way.
OK, got it; fixed. Thanks!
Please keep an eye on code readability.
modified: python/paddle/tensor/creation.py
LGTM
* optimized bf16 convert in to_tensor (modified: python/paddle/tensor/creation.py, test/dygraph_to_static/test_to_tensor.py)
* modified for grad (modified: python/paddle/tensor/creation.py, test/dygraph_to_static/test_to_tensor.py)
* changed core.eager.Tensor para (modified: python/paddle/tensor/creation.py)
* deleted _handle_np_dtype (modified: python/paddle/tensor/creation.py)
* updated conditions (modified: python/paddle/tensor/creation.py)
PR Category
User Experience
PR Types
Improvements
Description
resolve #72484
Requirement:
Test code
The original approach averaged 40.4780 ms per call; after the change, the average is 0.2655 ms.
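Timings like the ones above can be collected with a small harness such as the following. The callables under test are stand-ins here (the real measurement would call the old and new paddle.to_tensor(..., dtype=paddle.bfloat16) paths, and the exact numbers depend on hardware):

```python
import timeit

def avg_ms(fn, number=100, repeat=5):
    # Average milliseconds per call; take the best of `repeat` runs
    # to damp scheduler and warm-up noise.
    best = min(timeit.repeat(fn, number=number, repeat=repeat))
    return best * 1000.0 / number

# Placeholder workload; substitute the conversion under test.
print(f"{avg_ms(lambda: None):.4f} ms per call")
```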