
feat: optimize rpc calls #5394

Merged
sbackend123 merged 25 commits into master from feat/rpc-calls-optimisation
Apr 15, 2026

Conversation

@sbackend123
Contributor

@sbackend123 sbackend123 commented Mar 12, 2026

Checklist

  • I have read the coding guide.
  • My change requires a documentation update, and I have done it.
  • I have added tests to cover my changes.
  • I have filled out the description and linked the related issues.

Description

Removed redundant RPC call patterns in the transaction flow. Added a cache layer for the BlockNumber RPC call.
Fixed lint issues in the new cache package and cleaned up minor lint findings in adjacent test/helper code.

Open API Spec Version Changes (if applicable)

Motivation and Context (Optional)

Related Issue (Optional)

#5388

Screenshots (if appropriate):

Screenshot from 2026-03-16 13-29-22

The hit ratio is most likely low because of high-load errors (EOF), which are fixed by this PR.

@sbackend123 sbackend123 changed the title from "RPC calls optimisation" to "Feat: RPC calls optimisation" Mar 12, 2026
@sbackend123 sbackend123 force-pushed the feat/rpc-calls-optimisation branch from f8ca4e2 to e55ac11 March 13, 2026 07:54
@sbackend123 sbackend123 force-pushed the feat/rpc-calls-optimisation branch from e55ac11 to 04f6520 March 13, 2026 08:00
@sbackend123 sbackend123 changed the title from "Feat: RPC calls optimisation" to "feat: optimize rpc calls" Mar 13, 2026
@sbackend123 sbackend123 marked this pull request as ready for review March 16, 2026 12:31
Comment thread cmd/bee/cmd/deploy.go Outdated
Comment thread pkg/bmt/proof_test.go
t.Helper()

var expSegments [][]byte
expSegments := make([][]byte, 0, len(exp))
Member

These are fine but feel like they belong in a separate PR.

Contributor Author

Yes, but the linter was failing, so I decided to fix everything. Next time I will create a separate PR.

Comment thread pkg/node/chain.go Outdated
Comment thread pkg/transaction/wrapped/cache/metrics.go Outdated
Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
@gacevicljubisa
Member

Suggestion: Instead of a fixed TTL (blocktime*85/100, blocktime-2) + BlockNumber RPC, use a single HeaderByNumber(nil) call periodically to get both the block number and timestamp, then extrapolate between syncs with zero RPC calls.

How it works:

  • Every ~20 blocks, call HeaderByNumber(nil) (latest) — returns block number + timestamp in one RPC call. Store as anchor point.
  • Between syncs, estimate the current block from pure math: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime. No RPC calls at all.
  • Cache TTL: anchorTimestamp + (currentBlock - anchorBlock + 1) * blockTime - now.

Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
Comment thread pkg/transaction/wrapped/cache/cache_test.go Outdated
Comment thread pkg/transaction/wrapped/cache/keys.go Outdated
Comment thread pkg/transaction/wrapped/cache/metrics.go Outdated
Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
Comment thread pkg/transaction/wrapped/cache/cache_test.go Outdated
Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
Comment thread pkg/transaction/wrapped/wrapped.go Outdated
Comment thread cmd/bee/cmd/deploy.go Outdated
Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
@gacevicljubisa
Member

gacevicljubisa commented Mar 24, 2026

I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s — hardly enough to span multiple independent caller cycles.

If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here).

The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping.

@janos wdyt?

@janos
Member

janos commented Mar 24, 2026

> I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s — hardly enough to span multiple independent caller cycles.
>
> If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here).
>
> The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping.
>
> @janos wdyt?

Yes, given that block numbers change frequently, every 5 to 12s depending on the network (Gnosis and Sepolia), but not with exactly the same period every time. It is a very good suggestion to calculate the block number as you described, to reduce the frequency of the RPC call for the block number even further, and to estimate the block time via HeaderByNumber. In that case even singleflight is not needed, as internally block numbers will always be returned by their specific cache.

I would even go further and use the block time value that is calculated from HeaderByNumber instead of specifying it statically via options.

The consequence could be that it is required to get the block number at node startup, as both the block number and the block time are needed as known values. Maybe that is even good to do, and to exit the application in case there are problems with the RPC endpoint when getting the block number on start. Just thinking.

@sbackend123
Contributor Author

sbackend123 commented Mar 25, 2026

> Suggestion: Instead of a fixed TTL (blocktime*85/100, blocktime-2) + BlockNumber RPC, use a single HeaderByNumber(nil) call periodically to get both the block number and timestamp, then extrapolate between syncs with zero RPC calls.
>
> How it works:
>
> * Every ~20 blocks, call `HeaderByNumber(nil)` (latest) — returns block number + timestamp in one RPC call. Store as anchor point.
> * Between syncs, estimate the current block from pure math: `currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime`. No RPC calls at all.
> * Cache TTL: `anchorTimestamp + (currentBlock - anchorBlock + 1) * blockTime - now`.

To me this sounds a little over-engineered on one side, and a not-so-flexible implementation on the other (maybe I'm missing something?). I also thought about what happens if we would like to cache another type of value.

> I don't see how this cache meaningfully reduces RPC calls. The BlockNumber callers (postage listener, storage incentives, tx confirmation, API) run on independent staggered timers and rarely overlap within the same block window. With the TTL at 85% of block time, each caller almost always finds the cache expired by its next poll, resulting in a fresh RPC call anyway. The singleflight deduplication only helps when callers hit BlockNumber at the exact same instant, which is uncommon given the staggered schedules. On Gnosis mainnet the block time is only 5s, so the cache TTL would be ~4.25s — hardly enough to span multiple independent caller cycles.
> If we do want to reduce BlockNumber RPC calls, a more effective approach would be to use a single HeaderByNumber(nil) call periodically (e.g. every ~20 blocks, or more) to get both the block number and timestamp, then extrapolate between syncs: currentBlock = anchorBlock + (now - anchorTimestamp) / blockTime (mentioned here).
> The filterPendingTransactions refactor (eliminating redundant TransactionByHash calls in nextNonce) is a genuine improvement and worth keeping.
> @janos wdyt?

> Yes, given that block numbers change frequently, every 5 to 12s depending on the network (Gnosis and Sepolia), but not with exactly the same period every time. It is a very good suggestion to calculate the block number as you described, to reduce the frequency of the RPC call for the block number even further, and to estimate the block time via HeaderByNumber. In that case even singleflight is not needed, as internally block numbers will always be returned by their specific cache.
>
> I would even go further and use the block time value that is calculated from HeaderByNumber instead of specifying it statically via options.
>
> The consequence could be that it is required to get the block number at node startup, as both the block number and the block time are needed as known values. Maybe that is even good to do, and to exit the application in case there are problems with the RPC endpoint when getting the block number on start. Just thinking.

I have a couple of concerns:

  1. Drift over time: block production is not perfectly uniform, so we can accumulate error between syncs, especially if the resync interval is relatively large.

  2. Risk of overshooting: extrapolation may produce a block number that has not actually been produced yet.

  3. With this approach we only solve the problem with the block_number call. If in the future we need to reduce some other RPC calls, we might need to add something else.

I think there can be other corner cases that would be more difficult to catch, since debugging will become more complex.

Member

@janos janos left a comment


@sbackend123 thank you for addressing my comments. As far as they are concerned, all is fine.

Given that some different approaches have been brought to light, along with responses on how sensitive storage incentivization is to block number precision, I am still not approving it, as there is a possibility to reduce the frequency of the RPC call for the block number even more.

@gacevicljubisa
Member

  1. Drift over time
  • Block time variance on Gnosis is small (typically <1s around the 5s average). With a resync every ~20 blocks (~100s), the maximum accumulated error is ~1 block. Using floor() in the extrapolation ensures you never jump ahead.
  • You can also calculate the average block time dynamically on each resync.
  2. Risk of overshooting
  • floor((now - anchorTimestamp) / blockTime) naturally rounds down, so you'll be at most 1 block behind.
  3. Only solves block_number
  • True, but also, using only a caching layer, I am not sure how much we can achieve with other RPC calls.

Contributor

@akrem-chabchoub akrem-chabchoub left a comment


There are some git conflicts that need to be resolved.

Comment thread pkg/storer/reserve_test.go Outdated
chunks = make([]swarm.Chunk, 0, int(chunksPerPO)*2)
putter = storer.ReservePutter()
)
chunks = make([]swarm.Chunk, 0, chunksPerPO*2)
Member

It is a duplicate; there is already an allocation on line 543.

Contributor

that's also why the linter is complaining

Comment thread pkg/transaction/wrapped/cache/metrics.go Outdated
Comment thread pkg/transaction/wrapped/wrapped.go Outdated
Comment thread go.mod Outdated
Comment thread pkg/transaction/transaction.go Outdated
Comment thread pkg/transaction/wrapped/wrapped.go Outdated
"github.com/ethersphere/bee/v2/pkg/transaction/wrapped/cache"
)

const defaultBlockSyncInterval = 10
Contributor

If the value is always this one, I think it makes more sense to have the CLI flag default to 10 and get rid of this const. It just complicates the configuration and adjustment (and gets rid of the necessity of using a sentinel value).

Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
c.mu.RLock()
defer c.mu.RUnlock()

if c.valid {
Contributor

If this generalized cache object is already aware of the expiresAt time, I think it can already be "smart enough" to know something has expired. You can determine here whether valid is still valid, and in fact you can do away with the field on the type altogether and just check whether now() >= expiresAt, since any subsequent usage of this value has to also compare now to the expiresAt time (otherwise you're blindly using that value).

Also a note that generics increase the complexity here with no obvious place where they can be reused.

Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
}
}

func (c *ExpiringSingleFlightCache[T]) Peek() (T, time.Time, bool) {
Contributor

This method does not need to be exported except for unit tests. Can we not have it tested by just checking the callbacks?

Contributor Author

Yes, we can

Comment thread pkg/transaction/wrapped/cache/cache.go Outdated
}

func (c *ExpiringSingleFlightCache[T]) PeekOrLoad(ctx context.Context, now time.Time, canReuse ReuseEvaluator[T], loader Loader[T]) (T, error) {
if value, expiresAt, ok := c.Peek(); ok {
Contributor

I am not 100% sure that Peek is needed. Since the cache is expiring and already has a notion of expiration time, why do you need to use both Peek and pass now (provided by the caller) back to the caller, using data they have already supplied, just in order to check whether now() >= expirationTime?

Would it not be easier to inline the mutex locking and the now > expirationTime check?

Contributor Author

Maybe I'm going too much into the details, but PeekOrLoad is already not the most trivial function, and I would keep Peek at least to avoid expanding PeekOrLoad(). Especially since there is also Set, which works with the mutex as well, so we end up with three methods, each responsible for a specific task. Does that make sense to you?

Contributor

Yes, I agree it is not trivial and there's already a bit too much indirection for what it is supposed to do. I just find it odd that the caller passes now to PeekOrLoad... just to end up getting it back in the callback (plus shadowing) and doing the relevance check in the callback with:

			if now.Before(expiresAt) {
				return true, expiresAt
			}

So why pass now at all? You can have it cleaner by just passing c.value and c.expiresAt into the callback and telling the callback to do the check, and call it a day. In both cases Peek is not needed.

Contributor Author

Ok, simplified

Comment thread pkg/transaction/wrapped/wrapped.go Outdated
number: header.Number.Uint64(),
timestamp: time.Unix(int64(header.Time), 0).UTC(),
}
return anchor, b.nextExpectedBlockTime(anchor, 0), nil
Contributor

I'm not sure I understand this very well. Intuitively I'm interpreting this as: this fetched block value will expire on the next upcoming block tick, but the whole change is about "caching" (or trusting the guessing of block numbers) in chunks (i.e. the expiration date is more like a next-sync date).
Also, the passing of 0 here as a time value isn't clear. Why?

Contributor Author

After a brief discussion we decided to remove expiresAt, so nextExpectedBlockTime is not needed anymore and we can simplify the implementation.

minimumGasTipCap: int64(minimumGasTipCap),
blockTime: blockTime,
metrics: newMetrics(),
blockSyncInterval: blockSyncInterval,
Member

We should prevent blockSyncInterval from being 0. Maybe set it to 1 in that case?
In pkg/transaction/wrapped/wrapped.go#L135 a division by zero can happen.
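A minimal guard in the direction suggested (clampBlockSyncInterval is a hypothetical helper to illustrate the fix, not the PR's actual code):

```go
package main

import "fmt"

// clampBlockSyncInterval ensures the interval is at least 1 so that a
// downstream division or modulo by blockSyncInterval cannot panic.
func clampBlockSyncInterval(v uint64) uint64 {
	if v == 0 {
		return 1
	}
	return v
}

func main() {
	fmt.Println(clampBlockSyncInterval(0))  // prints 1: clamped
	fmt.Println(clampBlockSyncInterval(10)) // prints 10: unchanged
}
```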

Contributor Author

Fixed

Contributor

@acud acud left a comment


LGTM. As general feedback, I'd find it helpful to have a few more comments in the code, as it makes reviews easier to do. They don't have to be long, but the general direction of what the code does or should do (method-level documentation) is generally good and gives the reader a bit more info while going through the code.

Comment thread cmd/bee/cmd/cmd.go
optionNameRedistributionAddress = "redistribution-address"
optionNameStakingAddress = "staking-address"
optionNameBlockTime = "block-time"
optionNameBlockSyncInterval = "block-sync-interval"
Contributor

Please update the packaging folder with the new flag (in docker-compose,...)

# Conflicts:
#	packaging/docker/docker-compose.yml
#	packaging/docker/env
@sbackend123 sbackend123 merged commit 53aa35e into master Apr 15, 2026
16 checks passed
@sbackend123 sbackend123 deleted the feat/rpc-calls-optimisation branch April 15, 2026 10:34
