Skip to content

Fully switch to async KVStore persistence#919

Open
tnull wants to merge 18 commits into
lightningdevkit:mainfrom
tnull:2026-06-async-kvstore-persistence
Open

Fully switch to async KVStore persistence#919
tnull wants to merge 18 commits into
lightningdevkit:mainfrom
tnull:2026-06-async-kvstore-persistence

Conversation

@tnull
Copy link
Copy Markdown
Collaborator

@tnull tnull commented Jun 3, 2026

We planned to make the full switch to the async KVStore for a while. Here we do just that: first step-by-step moving the remaining dependencies over to use KVStore, then finally dropping KVStoreSync and all implementations.

Depends on lightningdevkit/rust-lightning#4658 (will drop the DROPME commit once that's merged).

@ldk-reviews-bot
Copy link
Copy Markdown

ldk-reviews-bot commented Jun 3, 2026

👋 Thanks for assigning @joostjager as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull tnull marked this pull request as draft June 3, 2026 11:00
@tnull tnull force-pushed the 2026-06-async-kvstore-persistence branch 2 times, most recently from 42c16e2 to 4371f6d Compare June 3, 2026 11:25
@tnull tnull mentioned this pull request Jun 3, 2026
7 tasks
@tnull tnull force-pushed the 2026-06-async-kvstore-persistence branch 2 times, most recently from 05ebc28 to 911c169 Compare June 3, 2026 12:15
@tnull tnull added this to the 0.8 milestone Jun 3, 2026
@tnull tnull force-pushed the 2026-06-async-kvstore-persistence branch 3 times, most recently from 39fbf19 to bad8d52 Compare June 3, 2026 12:21
tnull added 4 commits June 3, 2026 14:22
Use a temporary rust-lightning fork revision that exposes async
migratable KV-store support. This lets ldk-node migrate filesystem
stores without reimplementing LDK's migration logic locally.

Co-Authored-By: HAL 9000
In lightningdevkit/vss-server#101 we dropped the
Java version of `vss-server` and moved the Rust version to the repo
root. Here we account for the updated location in our integration tests.
Read and write BDK wallet state through async KVStore helpers while
keeping the current WalletPersister entry points bridged through the
node runtime. This reduces the wallet persistence surface that still
depends on KVStoreSync.

Co-Authored-By: HAL 9000
Static invoice persistence already runs from async handlers, so use
KVStore directly instead of routing those reads and writes through
the blocking KVStoreSync trait.

Co-Authored-By: HAL 9000
@tnull tnull force-pushed the 2026-06-async-kvstore-persistence branch 2 times, most recently from 1b4d24f to c62a503 Compare June 3, 2026 12:28
@tnull tnull marked this pull request as ready for review June 3, 2026 12:28
@tnull tnull requested a review from joostjager June 3, 2026 12:28
Copy link
Copy Markdown
Contributor

@joostjager joostjager left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall happy with this simplification. Big changeset though, even though not all that much is happening. Risk nonetheless.

Bigger open question for me is what the latest is on runtime abstraction and how that is going to work in combination with potential wasm support.

Also, are there sync public api calls that can now be made async such as export_pathfinding_scores?

},
)
.await
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like an unnecessary style change?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, right. I have to admit I like the changed version better (which is why I left it in), but now added a fixup to keep it minimally invasive.

Comment thread src/peer_store.rs
locked_peers.insert(peer_info.node_id, peer_info);
PeerStoreSerWrapper(&locked_peers).encode()
};
self.persist_peers(data).await
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it safe to have already dropped the lock? If two add_peer calls run interleaved, outdated state may be written?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, would you prefer we take a mutation guard as we do in DataStore in instances like this?

Comment thread src/event.rs
!pending_channel.is_outbound
&& self.peer_store.get_peer(&counterparty_node_id).is_none()
})
.and_then(|_| {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this full style rewrite necessary?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, a refactor was somewhat necessary anyways as we needed to move out the add_peer call, and it's way cleaner to do it this way as otherwise we'd need to add at least 2 more else clauses with None anyways.

Comment thread src/data_store.rs
L::Target: LdkLogger,
{
objects: Mutex<HashMap<SO::Id, SO>>,
mutation_lock: tokio::sync::Mutex<()>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me remember why we have a Runtime abstraction, but also reference tokio directly?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's somewhat orthogonal, right now Runtime is an abstraction for the Runtime that ensures we do the right thing (detect outer runtimes, always wrap block_on in block_in_place etc). This is simply using the mutex type.

Comment thread src/data_store.rs
let _guard = self.mutation_lock.lock().await;

self.persist(&object)?;
self.persist(&object).await?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now that this no longer holds self.objects, doesn't that, isn't this creating a window where readers get interleaved in an undesirable way?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why we introduced the mutation_lock for the write paths. Or maybe I'm misunderstanding the question?

Comment thread src/io/vss_store.rs Outdated
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No await here? Can that go undetected? Perhaps elsewhere too.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Fixed.

Comment thread src/io/test_utils.rs
signer_provider: &'a test_utils::TestKeysInterface,
}

impl<K: KVStore + Sync> TestMonitorUpdatePersister<'_, K> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commit message says dropping the trait, but this looks more substantial?

Comment thread src/io/test_utils.rs
pub(crate) fn create_persister<'a, K: KVStoreSync + Sync>(
store: &'a K, chanmon_cfg: &'a TestChanMonCfg, max_pending_updates: u64,
pub(crate) fn create_persister<'a, K: KVStore + Sync>(
store: &'a K, chanmon_cfg: &'a TestChanMonCfg, _max_pending_updates: u64,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't max pending updates exercised in a test?

Comment thread src/io/vss_store.rs
);

let runtime_handle = internal_runtime.handle();
let schema_version = tokio::task::block_in_place(|| {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't it a problem that setup can't fail fast here anymore?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We fail fast elsewhere via setup_schema_version.

Comment thread src/io/vss_store.rs
)?;

let store_key = self.build_obfuscated_key(&primary_namespace, &secondary_namespace, &key);
let schema_version = match self.schema_version().await {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This repetitive code doesn't look great. Maybe an explicit setup call is better?

tnull added 4 commits June 3, 2026 16:42
Persist peer store updates through async KVStore operations. The
synchronous node APIs keep bridging at their runtime boundary while
async event handling awaits peer persistence directly.

Co-Authored-By: HAL 9000
Persist DataStore mutations through async KVStore operations while
keeping the existing synchronous APIs bridged through the node
runtime. Async event handling now awaits payment store writes
directly.

Co-Authored-By: HAL 9000
Persist node metric updates through async KVStore writes and await
them from the chain, gossip, and scoring tasks. This removes the
remaining blocking metrics writer while keeping the helper name
stable.

Co-Authored-By: HAL 9000
@tnull tnull force-pushed the 2026-06-async-kvstore-persistence branch from c62a503 to 290b0a5 Compare June 3, 2026 15:09
tnull added 4 commits June 3, 2026 17:24
Avoid introducing a temporary macro when moving node metrics persistence onto async KV storage.

Co-Authored-By: HAL 9000
Persist the on-chain wallet through BDK's AsyncWalletPersister so
wallet state writes use the async KVStore path. Existing synchronous
wallet APIs keep bridging through the node runtime until their
callers are made async.

Co-Authored-By: HAL 9000
Open filesystem stores through the async LDK migration helper so
v1-to-v2 store migration no longer depends on the blocking KVStoreSync
migration path.

Co-Authored-By: HAL 9000
Move the existing in-memory test store into a shared module without
changing its behavior. This lets integration tests reuse it while
keeping the later async TestSyncStore change separate.

Co-Authored-By: HAL 9000
tnull added 6 commits June 3, 2026 17:36
Keep the shared test store move as a pure code move by restoring the original comments and spacing.

Co-Authored-By: HAL 9000
Exercise async KVStore operations in TestSyncStore and filesystem
migration tests while keeping the temporary sync comparison path until
the final KVStoreSync removal. Also route pathfinding score export
through async KVStore reads.

Co-Authored-By: HAL 9000
Drop the remaining synchronous KV store trait bounds and
implementations. After the preceding migrations, custom stores only
need to provide async KVStore persistence.

Co-Authored-By: HAL 9000
Compile the VSS persistence tests after the shared KV store helper moved to async persistence.

Co-Authored-By: HAL 9000
Resolve the VSS schema through async setup on the caller runtime so
VSS stores no longer create or retain a Tokio runtime.

NodeBuilder preserves fail-fast setup while the VSS builder stays
synchronous.

Co-Authored-By: HAL 9000
Construct PostgreSQL storage on the caller runtime now that the store
only exposes async KVStore operations. This removes the store-owned
runtime and its shutdown path.

Co-Authored-By: HAL 9000
@tnull tnull force-pushed the 2026-06-async-kvstore-persistence branch from 290b0a5 to 9ad8f2d Compare June 3, 2026 15:42
Copy link
Copy Markdown
Contributor

@benthecarman benthecarman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the vss ci job hung for 2 hours hopefully a transient issue

Comment thread src/peer_store.rs
self.persist_peers(&*locked_peers)
pub(crate) async fn add_peer(&self, peer_info: PeerInfo) -> Result<(), Error> {
let data = {
let mut locked_peers = self.peers.write().expect("lock");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we use an async lock for peers now?

Comment thread src/io/utils.rs
let data = locked_node_metrics.encode();
KVStoreSync::write(
let data = {
let mut locked_node_metrics = node_metrics.write().expect("lock");
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be async lock?

@benthecarman
Copy link
Copy Markdown
Contributor

codex seems to agree on the locks

  • [P2] Serialize peer-store persistence updates — /home/ben/projects/ldk-node/src/peer_store.rs:43-52
    When two peer-store mutations overlap, this now releases the peer map lock before the async write completes, so an
    older snapshot can persist after a newer one. For example, concurrent add_peer calls can leave memory with both
    peers but the stored peer list with only the first one, causing reconnect peers to be missing after restart. Please
    serialize the mutation plus persistence, as DataStore now does with a separate async mutation lock.

  • [P2] Preserve node-metrics update ordering — /home/ben/projects/ldk-node/src/io/utils.rs:346-351
    If two node-metrics updates run concurrently, the lock is released before the async KV write, so an earlier encoded
    snapshot can finish after a later one and overwrite it. This can drop one of the timestamp fields on disk and make
    the node believe a sync/broadcast task has not run after restart. The previous code explicitly held the write lock
    through persistence to avoid this, so this path needs equivalent serialization for the async write.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants