The Magic of Serializing Functions

Messaging is finicky.

There are many scenarios where it’s necessary to asynchronously send messages between two different layers. For example:

  • Between different threads or processes
  • Between different iframes or windows
  • Between a web page and a web-worker or service-worker
  • Between a client application and a server, via a web-socket

Basic Fire-and-Forget Serialized Messages

In most of these scenarios, the default mechanism is just to serialize a message down to a string, send it over to the other layer, and hope it arrived.

For example, in JavaScript, messaging between two different windows or frames looks something like this:
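Here is a minimal sketch of that default mechanism, built on the standard `postMessage` API. The helper names `sendMessage` and `readMessage`, the payload, and the example origins are placeholders I've made up for illustration:

```javascript
// Sender side: serialize the payload to a string and post it to the other window.
// targetOrigin restricts which origin is allowed to receive the message.
function sendMessage(targetWindow, targetOrigin, payload) {
  targetWindow.postMessage(JSON.stringify(payload), targetOrigin);
}

// Receiver side: ignore messages from unexpected origins, then deserialize.
// Returns the parsed payload, or null if the message should be dropped.
function readMessage(event, expectedOrigin) {
  if (event.origin !== expectedOrigin) {
    return null;
  }
  try {
    return JSON.parse(event.data);
  } catch (err) {
    return null; // not a JSON message we understand
  }
}

// In a browser, the two halves would be wired up roughly like this:
//
//   // In the parent page:
//   sendMessage(childFrame.contentWindow, 'https://child.example.com', { type: 'hello' });
//
//   // In the child page:
//   window.addEventListener('message', (event) => {
//     const message = readMessage(event, 'https://parent.example.com');
//     if (message) console.log('Got message:', message);
//   });
```

Note that nothing here tells the sender whether the message was ever received or handled.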

This is fine as a basic messaging primitive:

  • Serializing data to a string is thread-safe. We aren’t risking any window modifying the state of another.
  • Event-driven, asynchronous messaging allows for one window to send and receive messages without blocking the main thread.
  • We also have some basic security mechanisms that allow us to verify the sender and receiver of each message.

From a developer-experience perspective, this leaves a lot to be desired:

  • Messages are fire-and-forget. How do I know if my message arrived and was handled, or failed to get through or errored out?
  • Messages don’t have responses. I can just send another message back as a response, but it’s tricky to correlate that response to the original request. There is no “one request, one response” guarantee, like with an HTTP request.

The alternative: Request and Response

What I really want is a “request and response” style pattern:

With this pattern I can easily make a request to a different layer (e.g. a different window, or some server, or a different process) and then get a single response. That response serves two purposes:

  • It acknowledges that the original request was actually received and handled
  • It gives an opportunity for the receiver to send back some response data that is relevant to the original request

Here is one way to design a protocol like this, rather than just blindly sending messages and responses:

  • For every message the sender sends, generate a unique “request id”
  • Store that request id in the sender’s memory or some other state store
  • Include the request id in the request message from the sender
  • Include the same request id in the response message from the receiver
  • When the response arrives, the sender can then look up the request id, correlate it with the original request, and then call a success handler function for that request
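The steps above can be sketched in a few lines. This is only an illustration under my own assumptions, not any particular library's implementation: `createSender`, `createReceiver`, the message shape, and the `req_` id format are all hypothetical, and `sendRaw` stands in for whatever transport delivers a string to the other layer.

```javascript
// Sender side of the protocol. `sendRaw` delivers a string to the other layer
// (postMessage, a web-socket, a pipe, and so on).
function createSender(sendRaw) {
  let nextId = 0;
  const pending = new Map(); // request id -> success handler

  return {
    // Generate a unique id, remember the handler, and send the request.
    request(name, data, onSuccess) {
      const id = `req_${++nextId}`;
      pending.set(id, onSuccess);
      sendRaw(JSON.stringify({ type: 'request', id, name, data }));
    },
    // Call this with every raw message that arrives from the other layer.
    receive(raw) {
      const message = JSON.parse(raw);
      if (message.type !== 'response' || !pending.has(message.id)) {
        return; // not a response to anything we asked for
      }
      const onSuccess = pending.get(message.id);
      pending.delete(message.id); // one request, one response
      onSuccess(message.data);
    },
  };
}

// Receiver side: run the matching handler and echo the request id back.
// `handlers` is a Map of request name -> handler function.
function createReceiver(sendRaw, handlers) {
  return {
    receive(raw) {
      const message = JSON.parse(raw);
      const handler = handlers.get(message.name);
      if (message.type !== 'request' || !handler) {
        return;
      }
      const data = handler(message.data);
      sendRaw(JSON.stringify({ type: 'response', id: message.id, data }));
    },
  };
}

// Wire the two sides together in memory to show the round trip.
let sender;
const receiver = createReceiver((raw) => sender.receive(raw), new Map([
  ['getUser', ({ id }) => ({ id, name: 'Zippy' })],
]));
sender = createSender((raw) => receiver.receive(raw));
sender.request('getUser', { id: 1337 }, (user) => console.log('Got user:', user.name));
```

A real version would also want error responses and a timeout for requests that never get an answer.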

For example, here’s how this looks in post-robot, a JavaScript library designed to make this kind of messaging easier on the web for communication between iframes and windows:
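The sketch below uses post-robot's `on` and `send` methods. The `getUser` event name, the user data, and the wrapper functions `listenForGetUser` and `requestUser` are my own illustrative choices, and the library is passed in as a parameter so the snippet makes no assumption about how it is loaded:

```javascript
// In the receiving window: listen for 'getUser' requests.
// Whatever the handler returns is sent back to the caller as the response data.
function listenForGetUser(postRobot) {
  return postRobot.on('getUser', (event) => {
    return {
      id: event.data.id,
      name: 'Zippy the Pinhead',
    };
  });
}

// In the sending window: send one request and wait for its single response.
function requestUser(postRobot, targetWindow, id) {
  return postRobot.send(targetWindow, 'getUser', { id }).then((event) => {
    // Response handler: event.data is the value returned by the listener.
    console.log('Got user:', event.data.name);
    return event.data;
  });
}
```

In a page, that would be called as something like `requestUser(postRobot, someWindow, 1337)`, where `someWindow` is the frame or popup that registered the listener.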

You can imagine how a similar interface could be applied for literally any system where two layers are able to send serialized messages back and forth. The advantages are:

  • We can easily send and receive requests and responses without worrying about the sequence or timing of messages coming to-and-from each window.
  • We now have a 1:1 relationship between a request and response. For example, in the response handler above, we can be sure the user data we received correlates with the userID we passed in the original request.
  • We can handle errors (or time out the request after a delay) if the other window doesn’t — or can’t — respond.

This article is mainly about the messaging protocol itself; but security is also obviously crucial here. In each of these cases we want to verify:

  • Am I sending messages to the party I expect?
  • Am I receiving messages from the party I expect?

Different messaging protocols should care about different attributes to authorize messages. In the post-message case I mainly care about verifying the sending/receiving window/frame and the domain name to which I am sending and receiving messages. In other messaging protocols, like inter-process communication, I would care about verifying different attributes.

This is a huge step in the right direction, but it isn’t perfect yet. What could be improved?

  • The interface is still a little cumbersome; for example determining the window and/or domain I want to send messages to or receive messages from, and having to deal with global listeners and global event names.
  • Everything is still scoped in a very “global” way. Listeners are global to a window, and as such the event name for each listener must be totally unique. This makes it tricky to set up ad-hoc throw-away listeners for a specific use case.
  • I may also need to listen for messages from many different sources at once. For example, I may need to communicate with many different iframes, or many other processes, or many web-socket clients. To do this, I have to do a lot of work to make sure listeners are correctly set up for each potential sender, then juggle all incoming requests from all of those senders and make sure they are authorized to get the data or perform the action they are requesting.

So we’ve solved the “fire and forget” messaging problem, but we still have to put in some work to make this messaging interface as seamless as possible.

A solution: Serialized Functions

As it turns out, functions are the perfect pattern for “request and response” style interfaces:

  • I call a function and pass some arguments. This is like “making a request”
  • The function returns a value. This is like “sending a response”

So what if we actually just use functions as our developer-facing interface for messaging between different layers?

For example, what if instead of doing postRobot.on(), we just create a function? And instead of doing postRobot.send() we just call that function?

There are three things we have to consider here:

  • First, functions (at least in JavaScript) can be either synchronous or asynchronous. We’re going to have to mandate “asynchronous only” here, since any messaging behind the scenes can typically only be asynchronous. That’s easy enough in JavaScript: we can just mandate that these functions only ever return promises.
  • Second, it’s not enough just to stringify a function and send it over the wire to another window, since that will not help us call the original function back in the original window. We need a way for functions to actually send messages and receive responses. That would normally be tricky, but we’re in luck! We already did most of the work above to invent a messaging protocol to send requests and receive responses, so we can just re-use that!
  • Third, we don’t want to create any functions that can be arbitrarily called by anyone — especially layers we don’t trust. We want to share functions only with trusted layers, allowing only those specific layers to call those functions, and no others.

Essentially we want to do something like this:

  • The calling window calls a fake “serialized function” wrapper and passes in arguments. (This is a function that was previously passed over from the other window)
  • Calling that function sends a message to the receiver window. This message includes an identifier for the function, and the serialized arguments which should be passed into the function.
  • The receiver window looks up the original function based on the identifier, and deserializes the arguments.
  • The receiver window then calls the original function with those arguments, and that function then returns a value.
  • That return value is serialized, and sent in a response message back to the calling window.
  • The calling window deserializes the response, and resolves the promise with the deserialized return value.
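To make those steps concrete, here is a simplified in-memory sketch. It is not post-robot's actual implementation: `createEndpoint`, the `__fn__` marker, and the message shapes are all hypothetical, and it leaves out the window and domain checks a real version needs.

```javascript
// One "window" in the sketch. `postToOther` delivers a string to the other side.
function createEndpoint(postToOther) {
  let counter = 0;
  const exposed = new Map(); // function id -> original function
  const pending = new Map(); // call id -> { resolve, reject }

  // Replace each function in a value with a serializable reference.
  function serialize(value) {
    return JSON.stringify(value, (key, item) => {
      if (typeof item !== 'function') return item;
      const id = `fn_${++counter}`;
      exposed.set(id, item);
      return { __fn__: id };
    });
  }

  // Replace each reference with a wrapper that messages the original side.
  function deserialize(raw) {
    return JSON.parse(raw, (key, item) => {
      if (!item || typeof item.__fn__ !== 'string') return item;
      return (...args) => new Promise((resolve, reject) => {
        const callId = `call_${++counter}`;
        pending.set(callId, { resolve, reject });
        postToOther(serialize({ type: 'call', callId, fnId: item.__fn__, args }));
      });
    });
  }

  // Handle a raw message arriving from the other side.
  async function receive(raw) {
    const message = deserialize(raw);
    if (message.type === 'call') {
      try {
        const fn = exposed.get(message.fnId);
        if (!fn) throw new Error(`Unknown function: ${message.fnId}`);
        const result = await fn(...message.args);
        postToOther(serialize({ type: 'result', callId: message.callId, result }));
      } catch (err) {
        postToOther(serialize({ type: 'error', callId: message.callId, error: String(err) }));
      }
      return;
    }
    const call = pending.get(message.callId);
    if (!call) return; // not a reply to any call we made
    pending.delete(message.callId);
    if (message.type === 'result') call.resolve(message.result);
    else call.reject(new Error(message.error));
  }

  return { serialize, deserialize, receive };
}

// Wire two endpoints together in memory to stand in for two windows.
let parentSide, childSide;
parentSide = createEndpoint((raw) => childSide.receive(raw));
childSide = createEndpoint((raw) => parentSide.receive(raw));

const wire = parentSide.serialize({ name: 'Zippy', logout: () => 'logged out' });
const currentUser = childSide.deserialize(wire);
currentUser.logout().then((reply) => console.log(currentUser.name, reply));
```

The wrapper always returns a promise, even though the original `logout` here is synchronous, which is the “asynchronous only” rule in practice.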

Of course, for this to work, we still need some way to communicate between the windows initially, to “bootstrap” these functions. That is, we need to actually pass these serialized functions back and forth so they can be called in future. One way to do this bootstrapping is just by using regular message listeners:
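A sketch of the parent-window side, assuming post-robot's `on` method and its support for returning functions in a response. The wrapper `exposeUser`, the `getUser` event name, and the `logoutUser` callback are hypothetical names for illustration:

```javascript
// Parent window: respond to 'getUser' with an object that includes a function.
// post-robot serializes `logout`, so the other window receives a callable wrapper.
// `logoutUser` is a hypothetical app function that logs out a user by id.
function exposeUser(postRobot, logoutUser) {
  return postRobot.on('getUser', (event) => {
    const id = event.data.id;
    return {
      id,
      name: 'Zippy the Pinhead',
      // The user's id is captured in closure scope, so logout takes no arguments.
      logout: () => logoutUser(id),
    };
  });
}
```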

Once we’ve done that, and we’ve received a function, the rest is like magic! The currentUser.logout() function can be directly called in the child frame, even though the original logout function only exists in the parent window:
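A sketch of the child-frame side, assuming the parent has a `getUser` listener that returns an object with a `logout` function. The wrapper `logoutCurrentUser` is a hypothetical name, and `postRobot` is passed in so the snippet makes no assumption about how the library is loaded:

```javascript
// Child frame: fetch the user from the parent, then call the function it sent us.
async function logoutCurrentUser(postRobot, parentWindow, id) {
  const event = await postRobot.send(parentWindow, 'getUser', { id });
  const currentUser = event.data;

  // This looks like a local call, but it messages the parent window,
  // which runs the original logout function and sends back the result.
  await currentUser.logout();
  return currentUser;
}
```

In a child frame this might be called as `logoutCurrentUser(postRobot, window.parent, 1337)`.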

Notice that we didn’t need to manually create any kind of listener for logout, and we also didn’t need to manually send a logout message from one window to another. We just called the logout function (which was transparently serialized and deserialized behind the scenes) and it automatically did all of the required messaging for us to transmit the function call and arguments back to the original window!

This example is available in post-robot for the window/iframe post-messaging case — but as you can imagine, the same pattern could easily be applied to any platform where serialized messages can be passed between different layers, including between web-socket clients, processes, and so on.

(See here for a simplified example of how this approach is actually implemented in post-robot, if you’re interested)

The net result is: any serialized message can include any function the developer wants to send or receive. Behind the scenes, those functions automatically trigger additional messages to call and pass arguments and return values back and forth between windows, without writing any code to manually send and receive messages.

It’s natural to worry about passing around functions to potentially untrusted windows. But my view is this actually makes messaging more secure, when implemented in the right way:

  • When you pass a function to a window, that function can only be called by the exact window and domain that the function was originally passed to. All you need to ask is “Am I passing this function to the window and domain I expect to call it in future?”. If not: don’t send it.
  • Sending small self-contained functions can actually limit potential misuse, since it is easier to make functions as specific and tightly-scoped as you like. For instance, in the example above, the logout function can only be used to log out that one specific user. We didn’t need to accept an id parameter, since we already had the user’s id in closure scope when we created the logout function. So the function is much more specific than a generic, global event listener.

This model is also used in zoid to allow for iframes and popups which feel like React components. When you’re setting up a zoid component, you never have to set up an event listener, you just pass down functions as props, and they are natively available on the child frame or window:
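A rough sketch of what that looks like, assuming zoid's `create` and `render` methods and the `window.xprops` object it exposes in the child. The tag, URL, `onLogin` prop, and the wrapper functions `renderLogin` and `reportLogin` are hypothetical:

```javascript
// Parent page: define the component, then render it with a function as a prop.
function renderLogin(zoid, container, onLogin) {
  const LoginComponent = zoid.create({
    tag: 'login-component',                 // hypothetical tag name
    url: 'https://login.example.com/login', // hypothetical child page URL
  });
  // No event listener needed: onLogin is just passed down as a prop.
  return LoginComponent({ onLogin }).render(container);
}

// Child page (inside the iframe or popup): once the child has loaded the same
// component definition, the props are available as window.xprops.
// Calling a function prop messages the parent behind the scenes.
function reportLogin(xprops, email) {
  return xprops.onLogin(email);
}
```

On the parent page this might be `renderLogin(zoid, '#container', (email) => console.log('Logged in:', email))`, and in the child `reportLogin(window.xprops, email)`.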

React made the idea of “directly pass functions and callbacks into components” the gold standard. Zoid and post-robot extend that into the world of cross-domain components.

Passing serialized functions provides the best possible developer experience for messaging between different layers which are not running on the same thread or don’t have access to the same state.

Rather than inventing a new interface for sending messages, it allows for functions to do what they’re good at: accepting a call, and asynchronously returning a response.

Give it a try!

works for PayPal, as a lead engineer in Checkout. Opinions expressed herein belong to him and not his employer. daniel@bluesuncorp.co.uk