This notebook is a collection of short coding puzzles based on the internals of the Transformer. The puzzles are written in Python and can be done in this notebook. After completing them you will have a much better intuitive sense of how a Transformer can compute certain logical operations.
Can we produce a Transformer that does basic elementary school addition?
i.e. given a string "19492+23919" can we produce the correct output?
Rules
Each exercise consists of a function with an argument seq and an output seq. Like a Transformer, we cannot change the length of the sequence. Operations need to act on the entire sequence in parallel. There is a global indices that tells us the position in the sequence. If we want to do something different at certain positions, we can use where, as in NumPy or PyTorch. To run the seq, we need to give it an initial input.
def even_vals(seq=tokens):
    "Keep even positions, set odd positions to -1"
    x = indices % 2
    # Note that all operations broadcast so you can use scalars.
    return where(x == 0, seq, -1)

seq = even_vals()

# Give the initial input tokens
seq.input([0,1,2,3,4])
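To make the semantics of where and scalar broadcasting concrete, here is a minimal plain-Python sketch of what even_vals computes on the input above. It does not use the notebook's library: the where helper and the name even_vals_plain are hypothetical stand-ins written for illustration only.

```python
def where(cond, if_true, if_false):
    """Elementwise select. Scalars are broadcast to the length of cond.

    Hypothetical stand-in for the notebook's `where`, for illustration only.
    """
    n = len(cond)

    def expand(value):
        # Repeat a scalar n times; copy a sequence as-is.
        return list(value) if isinstance(value, (list, tuple)) else [value] * n

    t, f = expand(if_true), expand(if_false)
    return [a if c else b for c, a, b in zip(cond, t, f)]


def even_vals_plain(tokens):
    """Keep even positions, set odd positions to -1 (plain-list version)."""
    indices = list(range(len(tokens)))          # position of each token
    is_even = [i % 2 == 0 for i in indices]     # computed for all positions at once
    return where(is_even, tokens, -1)           # -1 is broadcast as a scalar


result = even_vals_plain([0, 1, 2, 3, 4])
print(result)  # [0, -1, 2, -1, 4]
```

The output has the same length as the input, which mirrors the rule that a Transformer cannot change the sequence length.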
The main operation you can use is "attention". You do this by defining a selector which forms a matrix based on key and query.
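As a rough illustration of a selector, the sketch below builds a boolean matrix in plain Python by comparing every key position with every query position. The helper make_selector is hypothetical and is not the notebook's API. The layout (rows are queries, columns are keys) is also an assumption made for this example.

```python
def make_selector(keys, queries, predicate):
    """Return a boolean matrix with one row per query and one column per key.

    Hypothetical helper for illustration; the row/column layout is assumed.
    """
    return [[predicate(k, q) for k in keys] for q in queries]


indices = [0, 1, 2, 3]

# Select, for each query position, every key position strictly before it.
before = make_selector(indices, indices, lambda k, q: k < q)

for row in before:
    print("".join("1" if cell else "0" for cell in row))
# 0000
# 1000
# 1100
# 1110
```

Each row marks which positions a given query position is allowed to look at.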
Once you have a selector, you can apply "attention" to sum over the grey positions. For example, to compute a cumulative sum, we run the following function.