You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is a port of Andrej Karpathy’s llm.c project (the CPU version). I
toyed around with it the day he released the initial version for a
couple of hours, but only continued with it on a train / plane trip I
had last weekend (<2024-04-20 Sat>). In particular we started from
commit a22c22b. In particular this means the tokenizer is not there
at the moment. I might add it one of these days.
Performance is ~comparable to the C version.
Note: the port was done in a bit of a hurry, so who knows what bugs
lurk compared to the original! :)
Differences to the C version
We have a very shallow abstraction of the raw pointer buffer
interface from C in the form of a MView[T] type (which is just a
ptr UncheckedArray[T] in Nim lang + a few goodies, notably a {}
accessor to do pointer arithmetic for a more ‘natural’ access to
another buffer.
We use Nim’s CT features to automatically assign the correct buffer
views for the *Tensor fields based on fieldPairs, their order
in the object and the params/act_sizes input.
Generally less pointer handling.
We use destructors for the GPT2 and DataLoader objects so that
we don’t have to free manually (and copying these is disallowed)
Instead of relying on OpenMP’s collapse primitive to fuse
multiple nested loops, we use a custom Nim CT based loop fusion macro,
see ./fuse_loops.nim. The issue is that because Nim converts for
loops into while statements, it doesn’t play nice with nested
loops for OpenMP. :) So I wrote a macro that wraps around for loops
and performs the loop fusion manually (note that it only works for
for i in 0 ..< X style loops (i.e. lower index 0 and using ..<).
Notes about Nim
We have to compile with --exceptions:quirky, because otherwise the
inserted Nim error checks break the OpenMP compilation. We could
disable checks locally in the code, but for this here it’s fine.