
Protocol Specification

In general, supposing your server, written in Javascript in the node.js environment, is assembling a response to an end user from services provided by other servers, is there any reasonable alternative to having them all talk JSON?

RPC calls usually use XDR, which is binary, or JSON, which is text. There are handy libraries that take care of all the boring socket handling for you in these cases.

Node.js tools are available for RPC in JSON, but is there any easy way to make RPC calls in binary formats such as XDR, ASN.1 PER, or Apache Avro from node.js? Or to do something similar?

I am not asking what is the best way, or what can be done, because anything can be done with enough hard work, but rather what ways have standards to avoid reinventing the wheel, and libraries to do as much of the work as possible.

asked Nov 15 at 3:33

avsc implements Avro's RPC specification; you can use it to expose "type-safe" services over any transport (e.g. HTTP, TCP, WebSockets). The quickstart guide goes through a simple example that should help you get started.

Your Avro is in JavaScript, and is, I assume, expected to run under node.js. Apache provides Avro implementations in C++, Java, Python, and C#. So it looks like an Avro service communicating in binary can communicate between any two environments. Avro can also serialize to memory, which should make it possible to hash data independently of operating system and compiler. Looks like a standardized solution. So this would seem to imply that Avro services adequately cover anyone who wants to talk to anyone in any environment.

This question turned out to be problematic and insoluble, because I want hashes defined independently of a particular computer and compiler. Which turns out to be hard. ASN.1 theoretically solves this problem, but in practice sucks horribly.

I wish I had a MsgPack that was not dynamic, but would guarantee predefined MsgFormats. Human generation of on the wire message formats case by case should also be considered. Let us see how cryptonote deals with this problem.

Looks like Avro gives me what I want: Predefined message formats.

To render them human readable, output them as YAML. Reading YAML is problematic, though, because YAML could represent anything, including something hostile. So any message from outside has to be presented in Avro.

Cryptonote: https://github.com/cryptonotefoundation/cryptonote

https://cryptonotestarter.org/create-wallet.html

What is a protocol? Protocols wind up being defined by implementations, which people attempt, not very successfully, to reverse engineer. By trial and error they get their client to work with the existing server, and their server to work with the existing client, and so an ill defined protocol becomes over time even worse defined.

To address this problem, we have ASN.1, ASN.1 PER, and ASN.1 DER.

ASN.1 is a language for describing data.

It also comes with compilers that generate C code to process the data described. Some people complain that DER is too complex for anyone to get right.

The library https://github.com/vlm/asn1c supports canonical PER, but does not entirely support Visual Studio.

On the other hand Apache Avro seems to be a more complete solution, addressing the problem of protocol negotiation, RPC calls, and map reduce calls. I don't think the ASN.1 crowd have even thought of map reduce.

And there is the oldie but goodie, XDR, which is just the simplest possible way of avoiding compiler and endian dependency. There is a nice XDR RPC library. ONC RPC is standardized by the Internet Engineering Task Force (IETF) as RFC 1831 (since obsoleted by RFC 5531). It is based on the External Data Representation standard (XDR), known as RFC 4506. But, on the other hand, everyone using node.js seems to be doing RPC in JSON.

Maybe this whole canonicalization question is just too hard, and needs ad hoc solutions. Surely crypto libraries have serialization formats? Let me see what cryptonote does. No point in reinventing the wheel.
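To make the XDR point concrete, here is a minimal sketch of XDR-style encoding using only Python's standard `struct` module. The record layout (`sequence`, `amount`, `memo`) and the function names are hypothetical, invented for illustration, and UTF-8 for the string is an assumption. The point is only that big-endian fixed-width fields, padded to four bytes, give the same bytes, and therefore the same hash, on any machine and compiler.

```python
import hashlib
import struct


def xdr_uint32(value: int) -> bytes:
    """Encode an unsigned 32-bit integer as 4 big-endian bytes."""
    return struct.pack(">I", value)


def xdr_hyper(value: int) -> bytes:
    """Encode a signed 64-bit integer as 8 big-endian bytes."""
    return struct.pack(">q", value)


def xdr_opaque(data: bytes) -> bytes:
    """Encode variable-length data: length, bytes, zero padding to a multiple of 4."""
    padding = (4 - len(data) % 4) % 4
    return xdr_uint32(len(data)) + data + b"\x00" * padding


def encode_payment(sequence: int, amount: int, memo: str) -> bytes:
    # Hypothetical record: uint32 sequence, hyper amount, string memo.
    return (
        xdr_uint32(sequence)
        + xdr_hyper(amount)
        + xdr_opaque(memo.encode("utf-8"))
    )


encoded = encode_payment(7, 1000, "hello")
# The digest depends only on the field values, not on host byte order.
digest = hashlib.sha256(encoded).hexdigest()
```

Because the field order and widths are fixed by the encoder rather than by a compiler's struct layout, two programs that agree on the record definition should produce identical bytes for identical values.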

A type that is becoming increasingly common is unordered name value mapping, where a lot of names have defaults, and people keep adding new names.

Now if you send the names with the message every single time, it is a huge overhead. Which is what you get with JSON based RPC calls.
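A rough illustration of that overhead, as a sketch only: the field names and the three-integer layout below are made up for the comparison. The same record is encoded once as compact JSON, which carries every name, and once in a layout both sides agreed on in advance, which carries only the values.

```python
import json
import struct

# Hypothetical record with three small integer fields.
reading = {
    "sensor_id": 42,
    "temperature_milli_c": 21500,
    "humidity_permille": 480,
}

# JSON repeats every field name in every message.
as_json = json.dumps(reading, separators=(",", ":")).encode("utf-8")

# Fixed layout agreed in advance: three unsigned 32-bit big-endian integers.
LAYOUT = struct.Struct(">III")
as_binary = LAYOUT.pack(
    reading["sensor_id"],
    reading["temperature_milli_c"],
    reading["humidity_permille"],
)

print(len(as_json), "bytes as JSON,", len(as_binary), "bytes in the fixed layout")
```

The binary form here is always 12 bytes, while the JSON form grows with the length of the names, so the gap widens as people keep adding descriptively named fields.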

So you want compiled message types, which wind up as a C++ object - and when someone wants more fields, he has a derived type of the C++ object, with the new fields having default values, allowing you to cast the object to have additional names and derived types as needed.
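The notes talk about C++ objects; the following is only an analogous sketch in Python, with hypothetical type and field names, of the same idea: a later version of a message type derives from the earlier one, and every added field carries a default, so data written for the old type can always be upgraded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerInfoV1:
    host: str
    port: int


@dataclass(frozen=True)
class PeerInfoV2(PeerInfoV1):
    # Every field added in a later version needs a default,
    # so that a V1 message can still be viewed as a V2 message.
    protocol_version: int = 1
    supports_compression: bool = False


def upgrade(old: PeerInfoV1) -> PeerInfoV2:
    """Build the derived type from the base type, filling new fields with defaults."""
    return PeerInfoV2(host=old.host, port=old.port)


newer = upgrade(PeerInfoV1(host="example.invalid", port=8080))
```

Code written against `PeerInfoV1` keeps working when handed a `PeerInfoV2`, since the derived type is still an instance of the base type.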

MsgPack is binary, and has modules for every language and environment you have ever heard of, and lots of languages you have never heard of, and library code to do anything you want, but it is all dynamically typed, which means it is not as efficient as PER.

It is basically YAML in binary, in that MsgPack messages correspond closely to YAML documents. Because it supports every language, you can use MsgPack to talk C++ to node.js, and YAML to talk to humans.

Dynamic typing means that a hostile evil program could send no end of messages in MsgPack that your program is not equipped to deal with. The great thing about PER is that anything that you are not expecting just gets magically rejected.

Both of them, PER and MsgPack, totally punt on the issue of protocol negotiation, whereas Apache Avro has support for protocol negotiation. It combines static and dynamic typing, but dynamic typing requires both sides to play nice. An evil program with surprise data structures will just have the connection shut down on it.

ASN.1 PER punts by requiring both sides to be in total agreement about the ASN.1 schema at compile time. MsgPack punts in the opposite way by being able to send any schema, including hostile schemas that are totally surprising, unexpected, and impossible for the recipient to deal with.

Maybe the way to go would be to agree on a dictionary mapping numbers to ASN.1 schemas at connection establishment. If there is agreement, send the schema number and a binary blob in a MsgPack message. If there is no agreement, send a dynamic message with a schema GUID, and the end user has to deal with it as best he can.

On receiving a message with a known schema GUID, precompiled code generates the corresponding C++ object from the MsgPack message, and likewise generates a MsgPack message from the C++ object. On reception, execute the generic object->process_on_receive(&connection) job on the received object, which typically will look up the table for the object to which this is a response. Everything in the table is a pointer to a common base type, but, on getting the object, executes a derived operation.

For an expected message, which is anticipated to be a response to a previous message, the response should contain an opaque callback which was sent with the previous message, which is an index into a table of local objects whose base type can run time check that the message is of the type that they expected. (All schemas corresponding to data that can be received, or loaded from files, derive from a base type that reports their schema, so any time you deal with a message, it is mapped into a C++ object that will only be dealt with by code that at compile time expects that object.)

Thus every transmissible object has YAML serialization, MsgPack serialization, ASN.1 PER serialization, YAML serialization inside MsgPack, and ASN.1 PER serialization inside MsgPack. If MsgPack or ASN.1 throws on receiving a message, terminate the connection. If it throws on sending a message, it is a bug.

But I don't see that Apache Avro can talk to node.js, in part because it wisely refuses to be fully dynamic.
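The schema-number scheme described above could be sketched roughly as follows. This is not MsgPack or ASN.1: the frame format (a 32-bit schema number followed by a fixed-layout body), the `Ping` and `Pong` types, the GUID strings, and every function name are hypothetical stand-ins, chosen only to show the negotiation, the dispatch by schema number, and the callback table check.

```python
import struct
from dataclasses import dataclass


class ProtocolError(Exception):
    """Any malformed or unexpected message; the caller should drop the connection."""


@dataclass(frozen=True)
class Ping:
    SCHEMA_GUID = "hypothetical-guid-ping"
    LAYOUT = struct.Struct(">I")
    nonce: int

    def encode(self) -> bytes:
        return self.LAYOUT.pack(self.nonce)

    @classmethod
    def decode(cls, body: bytes) -> "Ping":
        if len(body) != cls.LAYOUT.size:
            raise ProtocolError("bad Ping length")
        return cls(*cls.LAYOUT.unpack(body))


@dataclass(frozen=True)
class Pong:
    SCHEMA_GUID = "hypothetical-guid-pong"
    LAYOUT = struct.Struct(">II")
    callback: int  # opaque value echoed back from the request
    nonce: int

    def encode(self) -> bytes:
        return self.LAYOUT.pack(self.callback, self.nonce)

    @classmethod
    def decode(cls, body: bytes) -> "Pong":
        if len(body) != cls.LAYOUT.size:
            raise ProtocolError("bad Pong length")
        return cls(*cls.LAYOUT.unpack(body))


def negotiate(local_types, remote_guids):
    """Number the schemas both sides know; sorting makes both ends agree."""
    by_guid = {t.SCHEMA_GUID: t for t in local_types}
    shared = sorted(set(remote_guids) & set(by_guid))
    return {number: by_guid[guid] for number, guid in enumerate(shared)}


def frame(table, message) -> bytes:
    """Prefix the encoded body with the negotiated schema number."""
    for number, message_type in table.items():
        if message_type is type(message):
            return struct.pack(">I", number) + message.encode()
    raise ValueError("schema was not negotiated; sending it would be a bug")


def receive(table, pending, data: bytes):
    """Decode a frame, then check it against the table of expected responses."""
    if len(data) < 4:
        raise ProtocolError("truncated frame")
    (number,) = struct.unpack_from(">I", data, 0)
    message_type = table.get(number)
    if message_type is None:
        raise ProtocolError("unknown schema number")
    message = message_type.decode(data[4:])
    callback = getattr(message, "callback", None)
    if callback is not None and pending.pop(callback, None) is not type(message):
        raise ProtocolError("response nobody was waiting for")
    return message


table = negotiate([Ping, Pong], ["hypothetical-guid-pong", "hypothetical-guid-ping", "unknown"])
pending = {3: Pong}  # callback 3 expects a Pong
reply = receive(table, pending, frame(table, Pong(callback=3, nonce=5)))
```

A receive error maps to terminating the connection, while a failure in `frame` is raised as an ordinary `ValueError`, matching the rule that throwing on send is a bug rather than an attack.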

There is a review of more serializers than you can shake a stick at: https://github.com/thekvs/cpp-serializers

If you attempt to hand generate code for processing packets described by ASN.1, you will probably get it wrong and your head will explode. Hence ASN.1 is much cursed and condemned.

Don't do that.  Don't hand write code to generate or interpret ASN.1 data packets.  You are unlikely to succeed, and your code will have mystery bugs.

ASN.1 PER is an ASN.1 data description compiled to produce code that efficiently compresses data into packets conforming to that description, and efficiently decompresses them.

ASN.1 DER is that data description compiled to generate data packets that carry a description of what the data packet means, so that if two programs sign the same ASN.1 DER data, they agree not only on the data, but on the meaning of that data, and if one program means the same thing as the other program, the signatures will come out the same.

Use it.  ASN.1, used right, is what is needed to rigorously define a protocol so that a client written by one person will work with a server written by another.

There is much loud cursing about the fact that the data on the wire is humanly incomprehensible, and that the code that converts it into program data structures is humanly incomprehensible.  No one should be looking at machine generated code, because machine generated code is notoriously incomprehensible.  The question then is, does the compiler work, and is the compiler usable.

There is an internet tool for compiling ASN.1 specifications into C code.

If a program reads DER or BER data, the result is apt to be disastrous.  BER and DER can express an arbitrary data structure - and thus can crash the program receiving the data, probably causing it to execute transmitted data as code.

You can't depend on a DER or BER bit string being able to map back into any well-defined ASN.1 object that the program was designed to deal with.

Incoming data should be parsed as the expected and bounded size data structure, thus we need something that can generate parsing code from a description of the data at compile time.  We need compile time descriptions of the data, not run time descriptions, because the program that uses the incoming data will unavoidably rely on compile time description of the data.

PER, however, cannot receive unexpected data structures, because the expected data structure is specified at compile time, not run time. Malicious or faulty data will generate an error, not a crash.
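The property being relied on here can be illustrated without a real PER codec. The sketch below is not PER; it is a hand-written decoder for one hypothetical fixed message shape (a one-byte kind, a two-byte item count, then that many 32-bit items), with names and limits invented for the example. Anything that does not fit the shape known in advance raises an error and never reaches the rest of the program.

```python
import struct


class DecodeError(Exception):
    """The input does not match the one message shape this program expects."""


MAX_ITEMS = 16  # hypothetical bound fixed when the program is written
KINDS = ("request", "response", "error")
HEADER = struct.Struct(">BH")  # uint8 kind, uint16 item count


def decode_message(data: bytes):
    """Decode exactly one message, rejecting anything outside the fixed bounds."""
    if len(data) < HEADER.size:
        raise DecodeError("truncated header")
    kind_index, count = HEADER.unpack_from(data, 0)
    if kind_index >= len(KINDS):
        raise DecodeError("unknown kind")
    if count > MAX_ITEMS:
        raise DecodeError("too many items")
    if len(data) != HEADER.size + 4 * count:
        raise DecodeError("length does not match item count")
    items = struct.unpack_from(f">{count}I", data, HEADER.size)
    return KINDS[kind_index], list(items)


good = struct.pack(">BH2I", 1, 2, 10, 20)
decoded = decode_message(good)
```

The decoder never allocates or loops based on a value it has not first checked against a bound, which is the behaviour one wants from generated PER-style parsing code.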

Thus all data should be received as PER or by a format with the properties of PER. 

These documents are licensed under the Creative Commons Attribution-Share Alike 3.0 License