hRPC and why we moved away from gRPC

So, if you're reading this, you probably need some background information. gRPC is a popular RPC system based on HTTP2 and Protobuf. hRPC is the RPC system we at Harmony are porting to from gRPC; it's built on HTTP1 and also uses Protobuf. Harmony is a chat protocol that falls somewhere between Matrix and Telegram functionality-wise, with a ton of extra goodies (besides our in-progress E2EE draft, which is basically just a micro Matrix state-wise, implemented with Protobuf instead of JSON).

(If you're reading this on the KDE planet, there's juicy Qt stuff well after we explain what the heck all this networking stuff is, don't worry :) )

gRPC: The Good

gRPC has substantial language support, and is widely available in distros. It's also extremely optimised, using substantial custom HTTP2 behaviour for minimum network transfer.

gRPC: The Bad

gRPC has a very, very big flaw for publicly facing services: streams play awfully with reverse proxies like nginx, as they're essentially HTTP2 requests that aren't closed. This causes proxies to be like “hmmm this is a slow loris attack, time to yeet this stream.” For our homeserver at https://harmonyapp.io, this means we had to configure nginx to be OK with requests taking an entire hour, and even then every stream would terminate at exactly 60 minutes. To be fair to gRPC, there's a dedicated HTTP2 streams thing being worked on that would allow reverse proxies like nginx to play nice with it, but unfortunately that's not the case now.

Besides that, gRPC's client libraries, while widely available, range from mediocre to [ censored ] awful. gRPC is a Google product that isn't Go, which means that “error handling” is not a word in its dictionary. This has really bad implications for the C++/Qt client, Challah. Essentially, if anything goes marginally wrong, the client just straight up aborts. There is no way for us to gracefully recover from any errors that originate from the gRPC library. This is terrible for the user experience, as we can't even show a “something is going wrong” page. This is one of the big reasons we're moving away from gRPC: we cannot have our only desktop client be crashing on anything slightly less than perfect network conditions.

That wouldn't be a problem if making our own implementation of gRPC was easy. Unfortunately, it's not. Remember the part where I said it used low-level HTTP2 a lot? Yeah, that gets very complicated very fast.

Additionally, our web client, Tempest, cannot do said low-level HTTP2 stuff. This requires us to specify in the protocol documentation a place for servers to name a grpc-web proxy for web clients to use.

With all of these issues in mind (for our use case specifically; none of these would affect its usage with microservices, which seem to be the main reason people use gRPC), we knew that using gRPC wouldn't cut it if we wanted something as polished as we hoped. And thus, we started hRPC.

The Goals

We decided quickly that hRPC should:

- require minimal, if any, changes to our .proto files
- be dead-simple to implement
- be web-compatible (which basically means HTTP1/WebSockets)
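To give a feel for what "dead-simple" and "web-compatible" can look like in practice, here's a minimal sketch of a unary call as a plain HTTP1 POST with an opaque binary body, using only Go's standard library. The route layout (`/example.v1.EchoService/Echo`), the `application/octet-stream` content type, and every function name here are assumptions made for illustration, not the hRPC wire format. A real implementation would marshal and unmarshal Protobuf messages where this sketch passes raw bytes through.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

// echoRoute is a hypothetical route; the real hRPC path layout may differ.
const echoRoute = "/example.v1.EchoService/Echo"

// echoHandler stands in for a generated server method. A real handler would
// unmarshal the body into a Protobuf request and marshal a Protobuf response;
// this one simply echoes the bytes it received.
func echoHandler(w http.ResponseWriter, r *http.Request) {
	if r.Method != http.MethodPost {
		http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
		return
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	w.Header().Set("Content-Type", "application/octet-stream")
	w.Write(body)
}

// CallUnary sends one request and returns the raw response bytes.
// Failures come back as ordinary error values the caller can handle.
func CallUnary(client *http.Client, baseURL, route string, payload []byte) ([]byte, error) {
	resp, err := client.Post(baseURL+route, "application/octet-stream", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("unexpected status: %s", resp.Status)
	}
	return io.ReadAll(resp.Body)
}

// demoRoundTrip starts a throwaway local test server, performs one call
// against the given route, and shuts the server down again.
func demoRoundTrip(route string, payload []byte) ([]byte, error) {
	mux := http.NewServeMux()
	mux.HandleFunc(echoRoute, echoHandler)
	srv := httptest.NewServer(mux)
	defer srv.Close()
	return CallUnary(srv.Client(), srv.URL, route, payload)
}

func main() {
	out, err := demoRoundTrip(echoRoute, []byte("ping"))
	if err != nil {
		fmt.Fprintln(os.Stderr, "request failed:", err)
		os.Exit(1)
	}
	fmt.Printf("%s\n", out)
}
```

Nothing here needs more than a stock HTTP client, which is the whole point: anything that can do a POST request, browsers included, can make the call, and a failed request is just an error to handle.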

The Implementation

First things first, we needed to write a protoc plugin. Thankfully, that was simple. We decided to use a hybrid approach: simple-to-generate languages like Go would be done using Go's text/template package to write templates, which could either be packed into the binary or loaded from external files on disk to let third parties write their own templates. Complex-to-generate languages like Qt/C++ would be done using dedicated functions in protoc-gen-hrpc. This was actually so simple that we decided to write another plugin, protoc-gen-hdocs, which generates our online reference documentation from the .proto files.

Our JS client doesn't need to make use of this, as the protobuf implementation in JavaScript is transport agnostic. Blusk, the other lead developer of our project, simply wrote a function that takes request information plus the inputs and transforms them into the outputs. Likewise, our Rust client and SDK make use of their own code generation instead of the protoc plugin. That leaves our Go server/client and our C++ client as the ones generated by protoc-gen-hrpc.
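To make the template-based half of that concrete, here's a minimal sketch of generating a Go client interface with text/template, using a built-in template unless an external file is given. The `Service` and `Method` types, the template text, and the `Generate` function are all hypothetical stand-ins for illustration; this is not the actual protoc-gen-hrpc code, and a real plugin would fill these structures from the descriptors protoc hands it.

```go
package main

import (
	"fmt"
	"os"
	"strings"
	"text/template"
)

// Method and Service are hypothetical stand-ins for the information a protoc
// plugin would extract from a service definition in a .proto file.
type Method struct {
	Name, Input, Output string
}

type Service struct {
	Package string
	Name    string
	Methods []Method
}

// builtinTemplate is an illustrative template packed into the binary.
const builtinTemplate = `// Code generated by a sketch generator. DO NOT EDIT.
package {{.Package}}

type {{.Name}}Client interface {
{{- range .Methods}}
	{{.Name}}(in *{{.Input}}) (*{{.Output}}, error)
{{- end}}
}
`

// loadTemplate uses the external file at path when one is given, and falls
// back to the built-in template when path is empty.
func loadTemplate(path string) (*template.Template, error) {
	text := builtinTemplate
	if path != "" {
		data, err := os.ReadFile(path)
		if err != nil {
			return nil, err
		}
		text = string(data)
	}
	return template.New("client").Parse(text)
}

// Generate renders the chosen template for one service and returns the source.
func Generate(svc Service, templatePath string) (string, error) {
	tmpl, err := loadTemplate(templatePath)
	if err != nil {
		return "", err
	}
	var out strings.Builder
	if err := tmpl.Execute(&out, svc); err != nil {
		return "", err
	}
	return out.String(), nil
}

func main() {
	svc := Service{
		Package: "chat",
		Name:    "ChatService",
		Methods: []Method{
			{Name: "SendMessage", Input: "SendMessageRequest", Output: "SendMessageResponse"},
			{Name: "GetMessage", Input: "GetMessageRequest", Output: "GetMessageResponse"},
		},
	}
	src, err := Generate(svc, "") // empty path: use the built-in template
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(src)
}
```

Passing a file path instead of the empty string swaps in a third-party template without rebuilding the generator, which is the appeal of keeping the simple targets template-driven.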

The Flaws

Not everything is rosy with hRPC. Because we use HTTP1 and a straightforward approach to networking, our implementation is nowhere near as slim on the wire as gRPC, which spends a lot of time and effort shedding bytes. Additionally, we're forgoing the existing gRPC ecosystem, requiring anyone who wants to implement the Harmony protocol to write their own codegen that works with our .protos.

Challah: gRPC

Challah, our Qt/C++ client, uses QtConcurrent approximately a heck tonne to handle sending gRPC requests and receiving data from streams without blocking the main thread. In short, every request uses a thread from a thread pool to send, and depending on its nature, either uses a callback, or will result in data coming into the “main” events stream, in which case we don't take a callback. For streams, we use two tools:

Events are amazing, and allow us to use a largely mutex-free design. What happens is that our stream-reading thread will read the events stream, and translate gRPC reads into Qt events, which are posted to parents of our object hierarchy, which then re-post the events to their children as necessary. The flow for a message looks like “client thread reads event, posts event to communities model, which posts event to its child channels model, which posts event to its child messages model, which then updates data.” Some of this is working around gRPC, but is mostly a sane concurrency and state management solution on its own, which will mostly be intact with the port to hRPC.
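The shape of that flow can be sketched outside of Qt. The Go below is only an analogy under stated assumptions: a channel stands in for Qt's event queue, a goroutine stands in for the stream-reading thread, and the `Event` type and the three model types are hypothetical names, not Challah's actual classes. The property it tries to show is that only one goroutine ever touches model state, so no mutex is needed.

```go
package main

import "fmt"

// Event is a hypothetical stand-in for a Qt event carrying one new message.
type Event struct {
	CommunityID uint64
	ChannelID   uint64
	Text        string
}

// MessagesModel is the leaf of the hierarchy and the only place data changes.
type MessagesModel struct {
	Messages []string
}

func (m *MessagesModel) Handle(ev Event) {
	m.Messages = append(m.Messages, ev.Text)
}

// ChannelsModel re-posts each event to the matching child messages model.
type ChannelsModel struct {
	Channels map[uint64]*MessagesModel
}

func (c *ChannelsModel) Handle(ev Event) {
	if child, ok := c.Channels[ev.ChannelID]; ok {
		child.Handle(ev)
	}
}

// CommunitiesModel is the root; it re-posts to the matching channels model.
type CommunitiesModel struct {
	Communities map[uint64]*ChannelsModel
}

func (c *CommunitiesModel) Handle(ev Event) {
	if child, ok := c.Communities[ev.CommunityID]; ok {
		child.Handle(ev)
	}
}

// Run drains events on the calling goroutine. Because only this goroutine
// mutates the models, their state needs no locking.
func Run(root *CommunitiesModel, events <-chan Event) {
	for ev := range events {
		root.Handle(ev)
	}
}

func main() {
	msgs := &MessagesModel{}
	root := &CommunitiesModel{Communities: map[uint64]*ChannelsModel{
		1: {Channels: map[uint64]*MessagesModel{10: msgs}},
	}}

	events := make(chan Event)
	go func() { // stand-in for the stream-reading thread
		defer close(events)
		events <- Event{CommunityID: 1, ChannelID: 10, Text: "hello"}
		events <- Event{CommunityID: 1, ChannelID: 10, Text: "world"}
		events <- Event{CommunityID: 2, ChannelID: 99, Text: "no such community"}
	}()

	Run(root, events)
	fmt.Println(msgs.Messages) // prints "[hello world]"
}
```

The reader side only ever produces events and the model side only ever consumes them, so the two never share mutable state directly.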

Challah: hRPC

Now that I'm the one authoring the client RPC library, I get to make it as perfect as possible for Challah. That means it's written with Qt and uses its proper concurrency mechanisms. Fun stuff :). Besides shedding a runtime dependency, the port to Qt will also massively help with portability: gRPC C++ is a giant and clunky beast with a lot of vendored dependencies. Currently, our macOS build isn't working (though it compiles) due to SSL woes. Qt's networking stuff lacks those woes, and moving to it will mean that our macOS build will begin functioning. This also opens the door to an Android build, as the protobuf library will be substantially easier to pack for Android than gRPC. Codewise, this means that we can port from abusing QtConcurrent thread pools to just using Qt's native networking types. All in all, that's pretty good.

I'm not sure how to end this blog post, so I'll just drop some links:

Maybe I'll write some more about how the implementation of the codegen or the C++ client worked if I see that this post is somewhat popular or if someone requests it. Tschö.

Tags: #libre