Hello!

Today I want to talk about scaling SSR applications.

There are two sources of problems with SSR - React and Node.js themselves - even though both are undoubtedly advantages of this approach (shared codebase, great DX, easy for frontend developers to maintain, etc.)

Key problems

React disadvantages

Rendering a complex application to an HTML string on the server takes quite a long time - from 10ms up to 100ms in the worst cases - and it is a single heavy synchronous task (renderToString). The new streaming rendering API (renderToPipeableStream) doesn't solve this problem either, more on that later. There are benchmarks, such as https://github.com/BuilderIO/framework-benchmarks#ssr-throughput-reqsecond, in which many frameworks outperform React several times over.

Specifics of Node.js

Node.js is single-threaded and quite sensitive to high load (although I don't have much to compare it with - I have no comparable experience with other platforms). A single thread means that any synchronous task makes the application unresponsive for that time: the event loop will be busy, unable to accept new requests or respond to already received ones. One of the side effects of an overloaded application is slow responses to requests for metrics or health checks.
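A minimal sketch of this effect in plain Node.js (no framework assumed): a timer is scheduled to fire after 10ms, but a synchronous task - standing in for a heavy renderToString call - holds the thread, so the timer can only fire once the task is done.

```javascript
const start = Date.now();

setTimeout(() => {
  // Scheduled for 10ms, but actually fires only after the sync task finishes.
  console.log(`timer fired after ${Date.now() - start}ms`);
}, 10);

// Simulate a heavy synchronous task, e.g. rendering a complex page to HTML.
const busyUntil = Date.now() + 50;
while (Date.now() < busyUntil) {
  // burn CPU for ~50ms; the event loop is blocked the whole time
}
```

The same blocking applies to incoming HTTP requests, health checks, and metrics endpoints - they are all just callbacks waiting for the event loop.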

It also means that synchronous tasks are executed one by one. Let's say the application receives 20 requests at once, and page rendering takes 50ms - the first request takes 50ms to respond to, and the last one takes 1000ms, which is unacceptably slow (in a real application most of the response time is spent on third-party API requests; we deliberately ignore that here to keep the calculations simple).
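The queueing effect above can be written out as a back-of-the-envelope calculation (the numbers are the ones from the text):

```javascript
// 20 requests arrive at once, each render takes 50ms, renders run one by one.
const renderMs = 50;
const requests = 20;

// The i-th response has to wait for all previous renders plus its own.
const responseTimes = Array.from(
  { length: requests },
  (_, i) => (i + 1) * renderMs
);

console.log(responseTimes[0]);  // 50   - first request
console.log(responseTimes[19]); // 1000 - last request
```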

About latency

To maintain an acceptable response time, we need to scale the application horizontally - increasing the number of instances so that at the current RPS the response time at the theoretical 95th percentile stays as expected. If the average rendering time is 50ms (a bad scenario), and we want to respond in 300ms at most, there must be no more than 4-6 RPS of load per instance, which is very low, because we also have to allocate enough resources for each instance.
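A rough sketch of where the 4-6 RPS figure comes from: with a 50ms render and a 300ms budget, at most 6 renders fit into the budget, so only a handful of requests may ever be queued ahead of a new one.

```javascript
const renderMs = 50;  // average render time, the bad scenario from the text
const budgetMs = 300; // maximum acceptable response time

// How many renders fit into the budget, including the request's own render.
const maxQueueDepth = Math.floor(budgetMs / renderMs); // 6

// Real traffic is bursty, so the safe sustained load per instance has to stay
// below this ceiling - hence the 4-6 RPS estimate above.
console.log(maxQueueDepth);
```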

About resources

Regarding resources. It seems that a single-threaded Node.js process doesn't need more than 1 CPU (or 1000m - milliCPU - in Kubernetes terms), but there is also the Garbage Collector, which can run in separate threads - so it seems optimal to allocate 1100m per instance, then GC can work without affecting main thread performance.
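As a hypothetical pod spec fragment, this allocation could look like the following (the memory values are an assumption and need tuning per application):

```yaml
resources:
  requests:
    cpu: 1100m      # 1 CPU for the main thread + headroom for GC threads
    memory: 512Mi   # assumption, tune for your app
  limits:
    cpu: 1100m
    memory: 512Mi
```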

This is where another problem comes in - low utilization of available resources, since we have to keep the pods far from fully loaded, for the same reason - latency. This matters a lot in an era of resource scarcity and expensive hardware (the current reality probably all over the world).

But can we allocate less than one CPU? Unfortunately no, as this will directly degrade the application's response timings. If we allocate less than 1000m per pod in k8s, synchronous tasks will start to be affected by CPU throttling.
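A worked example of the throttling effect, with assumed numbers: under the default 100ms CFS period, a 400m limit grants the container only 40ms of CPU per period, so a 50ms synchronous render cannot finish within one period - it runs for 40ms, sits throttled for the remaining 60ms, and finishes in the next period.

```javascript
const periodMs = 100;      // default CFS scheduling period
const limitMilliCpu = 400; // hypothetical pod CPU limit below 1000m
const quotaMs = periodMs * (limitMilliCpu / 1000); // 40ms of CPU per period

function wallTimeMs(taskMs) {
  // Whole periods in which the task exhausts its quota and gets throttled.
  const throttledPeriods = Math.ceil(taskMs / quotaMs) - 1;
  const lastSliceMs = taskMs - throttledPeriods * quotaMs;
  return throttledPeriods * periodMs + lastSliceMs;
}

console.log(wallTimeMs(50)); // 110 - a 50ms render takes 110ms of wall time
```

So a task that needs 50ms of CPU more than doubles in wall-clock time, and every queued request behind it inherits that delay.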

A very good overview of CPU throttling can be found in this article - https://web.archive.org/web/20220317223152/https://amixr.io/blog/what-wed-do-to-save-from-the-well-known-k8s-incident/

Streaming rendering

What about streaming rendering?

If it's possible to redesign the application architecture to deliver the page in parts, or at least the head contents at the beginning - that's great, it can give a good boost to Time To First Byte and all paint metrics. But streams have their own overhead: renderToPipeableStream is at least 10-20% slower than renderToString, and a 100ms task split into 5ms tasks still loads our application just as much, and suffers from throttling as well.
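The trade-off can be sketched without React at all: splitting 100ms of synchronous work into 5ms slices (roughly what streaming rendering does) lets the event loop breathe between slices, but the total CPU consumed stays the same, so instance capacity and throttling behavior don't improve.

```javascript
function busy(ms) {
  const end = Date.now() + ms;
  while (Date.now() < end) {} // burn CPU synchronously
}

let cpuMs = 0;
function renderChunked(remainingMs, chunkMs, done) {
  if (remainingMs <= 0) return done(cpuMs);
  busy(chunkMs);
  cpuMs += chunkMs;
  // Yield to the event loop so other callbacks can run between chunks.
  setImmediate(() => renderChunked(remainingMs - chunkMs, chunkMs, done));
}

renderChunked(100, 5, (total) => {
  console.log(`total CPU burned: ${total}ms`); // still 100ms of CPU work
});
```

Other requests can now interleave between the chunks, which helps tail latency a bit - but the 100ms of CPU per page still has to come from somewhere, and under a sub-1000m limit each chunk still competes with the CFS quota.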