To distributed api #68123
Conversation
Your PR has been submitted successfully. Thank you for your contribution to this open source project!
Force-pushed from dca47ad to fed2f25
Force-pushed from 58d3036 to 59b4e21
Sorry to inform you that f6f2e20's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Sorry to inform you that d01b02e's CIs have passed for more than 7 days. To prevent PR conflicts, you need to re-run all CIs manually.
Force-pushed from 6550863 to bb9eae9
Force-pushed from bb9eae9 to 19766c6
    x2 = x[..., x.shape[-1] // 2 :]
    return paddle.concat([-x2, x1], axis=-1)  # shape is the same as x


def scale_dot_product_attention(
The code of this function is basically the same as the code of the function in PIRScaleDotProductPattern above, and rotate_half, mlp, etc. have the same problem. Complex networks should be a composition of basic networks, rather than a reimplementation.
done
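For illustration, the kind of reuse the reviewer is describing might look like the sketch below. apply_rotary_pos_emb and its arguments are assumptions made for this sketch, not the PR's actual code:

import paddle

def rotate_half(x):
    # Rotate the two halves of the last dimension: (x1, x2) -> (-x2, x1).
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return paddle.concat([-x2, x1], axis=-1)  # shape is the same as x

def apply_rotary_pos_emb(q, k, cos, sin):
    # Reuse the shared rotate_half helper instead of re-implementing
    # its slicing/concat logic inside every attention pattern.
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed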
# Llama
@register_pir_pattern
class PIRMLPPattern(PIRBasePattern):
There are many variations of MLP, such as having 2 or 3 linear layers, with bias or no bias, and the activation function may also be different, so the name of MLPPattern needs to be carefully considered.
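For illustration, two of the variants the name would need to cover; a hedged sketch, with the class names made up here rather than taken from the PR:

import paddle.nn as nn
import paddle.nn.functional as F

class TwoLinearMLP(nn.Layer):
    # Classic 2-linear MLP: linear -> activation -> linear, with bias.
    def __init__(self, hidden_size, intermediate_size):
        super().__init__()
        self.up = nn.Linear(hidden_size, intermediate_size)
        self.down = nn.Linear(intermediate_size, hidden_size)

    def forward(self, x):
        return self.down(F.gelu(self.up(x)))

class GatedMLP(nn.Layer):
    # Llama-style gated MLP: 3 linear layers, no bias, silu activation.
    def __init__(self, hidden_size, intermediate_size):
        super().__init__()
        self.gate = nn.Linear(hidden_size, intermediate_size, bias_attr=False)
        self.up = nn.Linear(hidden_size, intermediate_size, bias_attr=False)
        self.down = nn.Linear(intermediate_size, hidden_size, bias_attr=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))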
also, is it necessary to use the prefix "PIR"?
Given that PIR is the default for PaddlePaddle, remove all the "PIR" prefixes here.
batch_size = 4
seq_length = 1024
num_heads = 32
head_size = 64
hidden_size = num_heads * head_size
intermediate_size = 4096
Would it be better to move all these configurations, which every subclass requires, up to the parent class?
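A minimal sketch of that suggestion, assuming the pattern base class can simply hold the shared shape configuration as class attributes:

class PIRBasePattern:
    # Shared shape configuration, defined once for every pattern subclass.
    batch_size = 4
    seq_length = 1024
    num_heads = 32
    head_size = 64
    hidden_size = num_heads * head_size
    intermediate_size = 4096

class PIRMLPPattern(PIRBasePattern):
    # Subclasses inherit the shapes and only describe their own graph.
    pass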
Shall we make the shapes smaller to reduce memory usage without losing generality?
done
if value in matched_op_node_ids:
    result = {}
    need_to_append = False
    break
Should this break be removed? For example, with pattern=(matmul, add, matmul, add) and prog=(matmul, add, matmul, add, matmul, add, relu, ...), will it cause a matching interruption?
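For illustration only, a toy sequence matcher (not the PR's actual algorithm) showing the behavior the reviewer is asking about: on a conflict it resets the partial match and keeps scanning rather than aborting the whole search.

def match_sequences(pattern, prog_ops):
    # Toy matcher: find non-overlapping occurrences of `pattern` in
    # `prog_ops`. On a failed attempt it resets and keeps scanning
    # instead of break-ing out of the whole search.
    matched = set()   # program positions already consumed by a match
    results = []
    i = 0
    while i + len(pattern) <= len(prog_ops):
        window = range(i, i + len(pattern))
        if all(prog_ops[j] == pattern[j - i] and j not in matched
               for j in window):
            matched.update(window)
            results.append(list(window))
            i += len(pattern)  # jump past the match and keep going
        else:
            i += 1             # reset this attempt, try the next start
    return results

prog = ["matmul", "add", "matmul", "add", "matmul", "add", "relu"]
print(match_sequences(["matmul", "add", "matmul", "add"], prog))
# -> [[0, 1, 2, 3]]; the scan continues past the first match instead of
#    stopping, so later occurrences would also be found.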
matched_ids = set()
matched_op_node_ids = set()
It seems like there's no difference between these two sets; just keep one?
    return new_attn_out


def reshard_transpose_mlp_layer_input(layer, inputs):
When using sequence parallel but not requiring a transpose in the MLP, would it be better to call this reshard_mlp_layer_input?
done
    return tuple(new_inputs)


def transpose_reshard_mlp_layer_output(layer, inputs, outputs):
When using sequence parallel but not requiring a transpose in the MLP, would it be better to call this reshard_mlp_layer_output?
done
Force-pushed from e980854 to 5cff04a
Force-pushed from 5cff04a to 3defd64
Force-pushed from d16a33f to 4216105
# step_5: pattern recognition
DECODER_LAYER_NAME = 'decoder_layer'
register_used_patterns(DECODER_LAYER_NAME)
results = match_all_patterns(program)
Logging could be added here and at some other key locations for easier debugging.
I will add logging in the next PR. Thanks.
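A hedged sketch of what that logging might look like around the matching step shown above (the logger name is an assumption, and register_used_patterns / match_all_patterns are the helpers from the snippet):

import logging

logger = logging.getLogger("to_distributed")

DECODER_LAYER_NAME = 'decoder_layer'
register_used_patterns(DECODER_LAYER_NAME)
results = match_all_patterns(program)
logger.debug("matched %d pattern(s) for %s", len(results), DECODER_LAYER_NAME)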
Would it be better to have one more unit test for gpt3?
I will try to add a gpt3 test in a following PR. Thx.
LGTM
PR Category
User Experience
PR Types
New Features
Description
High-level distributed API
pcard-66975