CARVIEW |
Navigation Menu
-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
support binary buffers in comm messages #6110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
https://www.npmjs.org/package/utf8 ? not to reinvent the wheel ? |
Yes, if we decide binary messages are a good idea, I will get the encoding from a library rather than ship a copy from StackOverflow. |
return b''.join(buffers) | ||
|
||
|
||
def unserialize_binary_message(bmsg): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I googled both unserialize
and deserialize
, and it appears that there are a lot more people using the word deserialize
than unserialize
. On top of that, it seems that PHP is the biggest proponent for unserialize
, and it's probably not the worst general policy to choose the opposite of whatever PHP did.
Edit: I see that pyzmq uses serialize/unserialize, so it's probably more important to be consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the text 'unserialize' doesn't appear in pyzmq, what are you referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it was the IPython docs I was looking at: https://ipython.org/ipython-doc/dev/api/generated/IPython.kernel.zmq.session.html#IPython.kernel.zmq.session.Session.unserialize
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since it's internal to IPython, I don't have a problem changing it. It's an internal / private API. I am totally behind the correct = (1-PHP did it)
logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on deserialize.
I am still not clear on the usage case for this. Won't the existence of a binary channel encourage people to start creating widgets and such that only work on a particular kernel. Forcing JSON in this case makes sure that all kernels can easily implement the kernel side of a comm/widget and reuse the frontend. |
There's nothing kernel specific about binary messages. Our wire (zmq) format already supports arbitrary binary data attached to every message; there's nothing Python-specific about it. |
But the binary stuff exists mainly to support the parallel computing stuff right. Do we use it outside that context? While the binary support itself is not language specific, I think it will tend to be used in language specific ways. I would still like to understand the usage cases where having binary comm messages is important. I suspect the idea is that it would be used for things like a spreadsheet widget where you want to send lots of data back and forth. Even in that context I want to push back a bit. To have such a widget work for all kernels, we have to make a decision about a a binary array format that all languages agree upon and that is also implemented in JavaScript. I am not at all convinced that is where we want to go if that is the main usage case. But if the main usage case is sending binary - non base64 encoded images - through comms, I could imagine that might be useful in certain cases. |
This came out of conversations with @mdboom in supporting a live comm-based matplotlib object. Michael noticed a nontrivial CPU drain when base64-encoding the images to send over comm messages. |
@minrk: how will this affect people implementing a frontend in another language besides javascript? |
Only data_pub and apply use binary data at the moment. The main use case I can think of is user-defined widgets with comm messages, sending data like images without having to b64-encode them. There are plenty of binary data formats that one could handle in javascript, images being the simplest and most useful (e.g. matplotlib nbagg). It's inherently not language-specific, because it must be understood at the very least in both javascript and the kernel. |
I think the answer to my question is that the wire format of the websocket messages changes here from JSON text to this new wire format. So people would have to implement the unserialize function like you did in javascript. Is that correct? |
Yes, this changes the websocket wire format (when binary messages are present) to be more consistent with the zmq wire format. |
|
||
|
||
Kernel.prototype._unserialize_msg = function (e, callback) { | ||
// unserialze a message and pass the unpacked message object to callback |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo: "unserialize"
Ah, looking at the code, I see what you mean. There is no change needed if no messages are sent with binary buffers. It's only when receiving messages with binary buffers that you need to implement the new wire format. |
Thanks for a bit more background on this. I guess I am fine with us moving forward with this. |
|
||
|
||
|
||
Kernel.prototype._unserialize_binary_message = function(blob, callback) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deserialize
Overall the code looks good. Can you add some tests though? |
I have a few things I need to do on this, but various other things I have to get to first. I probably won't have time to come back to this for at least a few weeks. |
To add my +1 to this idea, the use case of sending array data to the clients without multiple encoding/decoding steps is a real one we're already encountering. This thread on the list (esp. subsequent messages) illustrate well cases where too many encode/decode steps will introduce significant latencies with very real impact on user experience, making certain kinds of analysis/visualization widgets impossible to implement. They would simply be too slow to be usable. Right now, projects that implement these kinds of widgets with high framerate image exploration tend to simply ship the entire dataset over to JS in one giant blob and hold it all in memory in JS. But that precludes the kind of kernel-side richer computation we can offer, since they are then forced to implement any analytics entirely client-side in JS. Being able to efficiently send binary array data to the clients will open lots of possibilities here. |
Need rebase. |
I just included the test and restricted it to slimer while we wait for phantom 2.0. |
Need rebase. |
requires text-encoding js polyfill, for now
in zmq.session
only runs on slimerjs for now
because signed ints for sizes is icky
text-encoding has been merged
rebased, and should be ready to go |
'widgets/js/init', | ||
], function(IPython, $, utils, comm, widgetmanager) { | ||
'./comm', | ||
'./serialize', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh ! Nice you can use relative path !
Ok, let's not block thin on wether to use Merging soon if no objections. |
Thanks, last quick read and merging. |
support binary buffers in comm messages
Thanks for this! |
The widget's underlying |
support binary buffers in comm messages
The binary message format used on websockets:
The sizes (nbufs and offsets) are serialized as 32b network-order unsigned integers (
struct.pack('!I')
).demo notebook
RFC @jasongrout