Fix CloudKit data loss on delete-then-reinsert#421
Conversation
332871e to
1f5150a
Compare
1f5150a to
bdd291d
Compare
|
I've had a few rounds of refinement to make this a cleaner and more correct fix. |
| .update { | ||
| $0._isDeleted = false | ||
| $0.userModificationTime = $currentTime() | ||
| $0._lastKnownServerRecordAllFields = #bind(nil) |
There was a problem hiding this comment.
One note here: clearing _lastKnownServerRecordAllFields may be too blunt an instrument to get the desired result of fresh timestamps for every field. It does guarantee that the next set of changes sent to the server will have fresh timestamps for all fields, but I believe also introduces a window of time (after reinsert but before successful submission of changes) where concurrent modifications from another device could be fetched and applied with no delta-filtering baseline, potentially overwriting locally-reinserted values.
A more direct solution may call for a separate field on SyncMetadata, something like _isReinserted: Bool. Or better yet: roll up _isDeleted and _isReinserted as cases in a new optional enum field PendingStatus. The status being reinserted could then be used to control outbound change inclusion behavior (include changes for all fields, each modified at the row's userModificationTime) and inbound filtering behavior (only take server field value if its modified time is greater than or equal to local row's userModificationTime). I'll wait for feedback before pursuing this in case there may be other options.
There was a problem hiding this comment.
Update: I went with this approach laid out here in latest revision 08adaa0. PR description updated.
|
This sounds like a pretty critical issue - easy to trigger and results in data loss |
|
Any updates on this? I'm pinned against this commit at the moment 😬 |
|
I haven't received feedback on this approach from Brandon or Stephen, but I know it's on their radar. This branch is overdue for a rebase which I'll do now. @nagra Since this change modifies For example, if you’ve already pinned to this and distributed widely via TestFlight / App Store, but the final version of this fix ends up not needing the schema change, then in order to adopt the official fix you would have to spin up a fork + custom patch to ensure that |
08adaa0 to
399663a
Compare
| try #sql( | ||
| """ | ||
| ALTER TABLE "\(raw: .sqliteDataCloudKitSchemaName)_metadata" | ||
| DROP COLUMN "_isDeleted" | ||
| """ | ||
| ) | ||
| .execute(db) |
There was a problem hiding this comment.
@nagra - after my rebase, I pushed this additional commit so that the old _isDeleted column is no longer dropped. This would make it marginally safer to pin to this commit and deploy, but I still wouldn't recommend it since the risk of schema divergence still remains, and it'd be a pain to clean that up in the future.
Fixes #418
Root cause
Reinsertion of a record after a delete does not produce a new pending
.saveRecordchange entry due to missing undelete/reinsert state detection inPrimaryKeyedTable.afterInserttrigger. No new.saveRecordmeans that only the.deleteRecordis propagated.Solution
To fulfill the expected behavior of reinsertion resulting in a
.saveRecordand assigning fresh timestamps to all fields, several changes are needed: one prerequisite change to test infrastructureMockSyncEngineState, and a four-part change to detect and handle reinsertion properly.Prerequisite change necessary for writing accurate tests: Update
MockSyncEngineStateto deduplicate changes across types like the realCKSyncEngine.StateFrom
CKSyncEngine.State.add(pendingRecordZoneChanges:)documentation:MockSyncEngineState.add(pendingRecordZoneChanges:)deduplicates fully identical changes (type+ID) by virtue of the underlyingOrderedSet, but does not deduplicate when the type (save vs delete) is different, as is described above withCKSyncEngine.State.MockSyncEngineState.add(pendingDatabaseChanges:)has a similar issue. The fix: update both to deduplicate based on ID alone which will allow for the cross-type deduplication described above, and add tests.Changes to detect and handle reinsertion:
1. Replace
SyncMetadata._isDeletedwith enum valueSyncMetadata._pendingStatusThe
_isDeletedfield does not provide enough information about the current pending/transition status of the record. Introduce new enum value field_pendingStatusto replace_isDeletedwith two cases:.deleted(soft-deleted) and.reinserted(row re-added while soft-deleted).2. Detect reinsertion of soft-deleted record, update
_pendingStatusand queue new.saveRecordchangeModify
PrimaryKeyedTable.afterInserttrigger to update theSyncMetadatarecord when reinsertion of a soft-deleted row is detected (insert comes through while_pendingStatusis.deleted). Updates include:_pendingStatusto.reinserted_lastKnownServerRecordAllFieldsto be based on local, reinserted state when preparing an outbound record (see#3below) and before processing inbound server record (see#4below).userModificationTime_lastKnownServerRecordAllFieldsHook up new
SyncMetadata.afterReinsertTriggerto listen for reinsertions. This trigger is responsible for queuing a.saveRecordchange viasyncEngine.$didUpdate(the same as what happens after a normal insert or update).3. In
nextRecordZoneChangeBatch(outbound record preparation), ensure record is built with reinserted row valuesWhen preparing an outbound record for a
.reinsertedrow, useSyncMetadata.lastKnownServerRecord(system fields only) as the base for building the record. UsinglastKnownServerRecordas the base (vs_lastKnownServerRecordAllFieldswhich would contain stale user field data) ensures the resulting all fields record is built up entirely from the current contents of the reinserted row + reinsertuserModificationTime.Additionally, as a cleanup step, upon saving the newly built record back into the metadata row, reset metadata
_pendingStatusto nil. This ensures this "rebuild record from reinserted row" behavior will occur only once per reinsertion.4. In
upsertFromServerRecord(inbound record processing), ensure localallFieldsstate reflects reinserted row values prior to server record applicationSince
upsertFromServerRecordcan occur beforenextRecordZoneChangeBatch, we also need to ensure that the local metadata 'allFields' state is rebuilt and accurate when we encounter any.reinsertedrows in this code path.So before reconciling local metadata and user row with server record data, check if we're dealing with a
.reinsertedrow, and if so, update the localallFieldsrecord to reflect all reinserted row values + reinsertuserModificationTime. And similarly to the outbound path, clean up by setting_pendingStatusto nil at the conclusion of the upsert.Testing
In addition to new unit tests, performed a manual test with 9ea5bdd applied to verify that the data loss issue no longer occurs. Result (compare with before):
Screen.Recording.2026-03-13.at.12.32.44.AM.mov