Java's reactive connection pooling performance caveat
R2dbc is a great driver but it can sacrifice some performance.. for the sake of performance
R2DBC problem introduction
You are using a framework like micronaut-data with r2dbc and pooling enabled. Everything is set as per documentation. But you might see that sometimes queries are becoming very slow, as if resources were exhausted. But upon checking on a metrics dashboard you realize that there is plenty of cores idling.
What is happening?
The R2DBC driver
R2DBC is a reactive specification for a database driver.
https://github.com/r2dbc/r2dbc-spi
It provides database interaction in reactive manner, it is: you no longer hold a thread to wait for your query to complete. The thread can run other tasks, thus context switching per core can be limited. Reactive programming aims to maximize resources usage. In theory, application should need no more threads as there are cores available. Any non cpu-using task (like awaiting for I/O) is not blocking, and when your threads are used only for computing, why would you need more of them than the cores in your system?
R2DBC handles database interaction, but at the low-level it uses netty. Netty is a real "reactivity" enabler here.
https://github.com/netty/netty
Netty handles the networking at socket level. This is a library that reads incoming data in reactive manner, using system-provided async methods (For example, linux uses epoll).
We will focus on r2dbc implementation for postgres database. Problem may arise in other implementations too.
https://github.com/pgjdbc/r2dbc-postgresql
If you were to follow the links and read through the project carefully, you could wonder where the problem is. The configuration provided until now works fine, but can be improved.
Starting a new connection is a heavy process. We want to use connection pooling to reuse created connections, thus improve application responsiveness.
Luckily, r2dbc driver has a reactive connection pool, r2dbc-pool.
https://github.com/r2dbc/r2dbc-pool
And once we introduce the pool to improve the performance of your service, we can notice random slowness of some queries....
R2dbc pool problem background
Let's examine the way connection-pooling in r2dbc work and how it can impact performance.
At first, netty is focused on efficiency. To enable reactiveness, it uses event loops to assign threads "listening" to open connections (channels). On incoming data, thread is assigned a task to process it and then to wait for more (it waits in not-blocking manner, so can easily pick up work from other sockets too). As has been said previously, having threads focused on computing, we do not need more than available cores. But the amount of connections open to a server can be in hundreds. How does it compare? Each thread (netty's EventLoop) is assigned to many connections (netty's channels). Because waiting thread is not blocked, it can easily handle data from other channels, whenever it arrives. This model allows to squeeze extra performance from available hardware by not replacing threads occupying cpu cores.
Going back to r2dbc-pooling. We now have X number of database connections. There may be more connections on a system than available cores. Let's say we have 10 cores and 20 initialized connections in a pool. Our netty's EventLoopGroup has 10 EventLoops. To handle all connections, each EventLoop will have 2 connections (channels) assigned.
Now comes the question: what will happen if our 2 connections (channels), assigned to the same thread (EventLoop), were handling SQL queries at the same time? What if one of those queries was a data-heavy one, constantly sending data to a socket?
It seems that because both queries are sharing common resource, (thread, thus core) one of those queries might impact the other one. And this might be suboptimal in a system that has plenty of cores available.
Problem reproduction
I have prepared an example project that reproduces the problem.
https://github.com/PiotrDuz/pidu-r2dbc-pool-performance-test
ConnectionFactoryOptions.Builder optionsBuilder = builder()
.option(DRIVER, "pool")
.option(PROTOCOL, "postgresql") // driver identifier, PROTOCOL is delegated as DRIVER by the pool.
.option(HOST, postgres.getHost())
.option(PORT, postgres.getFirstMappedPort())
.option(USER, postgres.getUsername())
.option(PASSWORD, postgres.getPassword())
.option(DATABASE, postgres.getDatabaseName())
.option(INITIAL_SIZE, 4)
.option(MAX_SIZE, 4);
I am setting a connection pool with maxConnection = minConnections = 4. Default EventLoopGroup has N number of eventLoops, where N = number of cores. By default, colocation is enabled.
The test pre-populates table with hundred thousands of dummy rows. We run one heavy query (select all) and many small queries (select specific ID).
Execute default test: runWithColocation
Output is:
reactor-tcp-epoll-1 :t: 162 :q: select * from items
reactor-tcp-epoll-1 :t: 166 :q: select * from items where id = 161
reactor-tcp-epoll-1 :t: 167 :q: select * from items where id = 181
reactor-tcp-epoll-1 :t: 168 :q: select * from items where id = 31
reactor-tcp-epoll-1 :t: 169 :q: select * from items where id = 151
Every query runs on the same thread! This significantly impacts the performance. The average execution time is 185ms. All queries are basically run on a single core, while other cores are idling. This is showing not only an inefficiency in colocation, but also some problems with r2dbc-pool (or underlying reactor-pool). We didn't even get to the problem of having to split X sockets over Y threads, where Y < X. All sockets are handled by one thread!
Possible Solution
We can disable colocation and manually set amount of workers. We only need as many workers as there are max connections.
builder.option( PostgresqlConnectionFactoryProvider.LOOP_RESOURCES,
LoopResources.create("pref", -1, 4, true, false))
If you are using Micronaut, you can override R2dbcConnectionFactoryBean's method.
Now, when we run runWithoutColocation() test, the output is:
pref-epoll-3 :t: 19 :q: select * from items where id = 481
pref-epoll-4 :t: 19 :q: select * from items where id = 491
pref-epoll-2 :t: 20 :q: select * from items where id = 471
pref-epoll-1 :t: 179 :q: select * from items
Because colocation is disabled, we can see that every worker of EventLoopGroup is used. The work is delegated to less utilized ones. Overally, the queries are not blocked and average execution time is now: 15ms
Further Reading
So we know that there is some inefficiency in reactor-pool implementation. It is actually being worked on:
https://github.com/r2dbc/r2dbc-pool/issues/190#issuecomment-1520166774
A new pool warmup will be introduced, which is going to evenly spread work over colocated LoopResources:
https://github.com/reactor/reactor-pool/pull/171
I will try to test the solution once it is merged. Will there still be a chance for some queries to land on the same EventLoop?
For now, I would recommend using LoopResources with disabled colocation.