feat(tables): background import for large CSVs with live progress #4861
TheodoreSpeaks wants to merge 5 commits into
PR Summary
Medium Risk
Overview
DB: migration adds import columns on
UI:
Other: shared
Reviewed by Cursor Bugbot for commit 6993ae9.
@greptile review
Greptile Summary
This PR adds async background CSV/TSV import for large files, routing them direct-to-storage while a detached worker streams, infers schema, and bulk-inserts in committed batches, avoiding request/ALB timeouts. It also replaces per-row
Confidence Score: 4/5
Safe to merge for create and replace async imports; the append async path produces rows with wrong sort positions until the one-line fix is applied.
The async import worker initialises
apps/sim/lib/table/import-runner.ts - the
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Client
participant Storage
participant KickoffRoute as POST /import-async
participant Worker as runTableImport (detached)
participant DB
participant SSE as SSE stream
Client->>Storage: PUT file (direct-to-storage upload)
Storage-->>Client: fileKey
Client->>KickoffRoute: "POST { workspaceId, fileKey, fileName, mode }"
KickoffRoute->>DB: markTableImporting(tableId, importId)
KickoffRoute-->>Client: "200 { tableId, importId }"
KickoffRoute--)Worker: runDetached("table-import", ...)
Worker->>DB: getTableById(tableId)
alt replace mode
Worker->>DB: deleteAllTableRows(tableId)
end
Worker->>Storage: downloadFile(fileKey)
Storage-->>Worker: Buffer
loop for each batch (CSV_MAX_BATCH_SIZE rows)
Worker->>DB: "bulkInsertImportBatch(startPosition=inserted)"
Worker->>DB: updateImportProgress(rows)
Worker->>SSE: appendTableEvent(importing, progress)
end
alt success
Worker->>DB: markImportReady(tableId)
Worker->>SSE: appendTableEvent(ready)
else failure
Worker->>DB: markImportFailed(tableId, error)
Worker->>SSE: appendTableEvent(failed)
end
Client->>SSE: "EventSource /api/table/{tableId}/events/stream"
SSE-->>Client: import progress ticks → ready / failed
Reviews (3): Last reviewed commit: "Merge remote-tracking branch 'origin/sta..."
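The batch loop in the diagram above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `inBatches`, `runImportSketch`, and the `ImportEvent` shape are hypothetical names, and the `insertBatch` callback only stands in for `bulkInsertImportBatch`. It assumes the table starts empty (create or replace), so the running `inserted` count doubles as the start position.

```typescript
// Hypothetical event shape, loosely modelled on the importing/ready/failed states in the diagram.
type ImportEvent =
  | { status: 'importing'; rowsProcessed: number }
  | { status: 'ready'; rowsProcessed: number }
  | { status: 'failed'; rowsProcessed: number; error: string }

/** Group a row source into fixed-size batches without holding the whole source in memory. */
async function* inBatches<T>(
  rows: AsyncIterable<T> | Iterable<T>,
  size: number
): AsyncGenerator<T[]> {
  let batch: T[] = []
  for await (const row of rows) {
    batch.push(row)
    if (batch.length === size) {
      yield batch
      batch = []
    }
  }
  // Flush the final, possibly short, batch.
  if (batch.length > 0) yield batch
}

/**
 * Insert batch by batch and emit progress after each committed batch.
 * There is no rollback: if a batch fails, earlier batches stay committed
 * and the returned count reflects them.
 */
async function runImportSketch<T>(
  rows: AsyncIterable<T> | Iterable<T>,
  batchSize: number,
  insertBatch: (batch: T[], startPosition: number) => Promise<number>,
  emit: (event: ImportEvent) => void
): Promise<number> {
  let inserted = 0
  try {
    for await (const batch of inBatches(rows, batchSize)) {
      // Assumption: the table is empty at the start, so `inserted` is the next free position.
      inserted += await insertBatch(batch, inserted)
      emit({ status: 'importing', rowsProcessed: inserted })
    }
    emit({ status: 'ready', rowsProcessed: inserted })
  } catch (err) {
    const error = err instanceof Error ? err.message : String(err)
    emit({ status: 'failed', rowsProcessed: inserted, error })
  }
  return inserted
}
```

Exactly one terminal event (`ready` or `failed`) is emitted per run, which is what a progress UI would need to leave the "processing" stage.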
 * for `create`, mapping onto the existing schema for `append`/`replace`), then bulk-inserts
 * in committed batches — **no rollback**: committed batches persist even if a later batch
 * fails. Progress and the terminal state are surfaced via the table-events SSE stream.
 */
export async function runTableImport(payload: TableImportPayload): Promise<void> {
  const { importId, tableId, workspaceId, userId, fileKey, fileName, delimiter, mode } = payload
Uploaded CSV file is never deleted after import
downloadFile({ key: fileKey, context: 'workspace' }) fetches the file from workspace storage, but there is no deleteFile call in either the success or the catch path. Every async import permanently retains the source CSV/TSV in the workspace storage bucket. Over many imports this becomes a non-trivial storage accumulation with no cleanup mechanism.
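One possible shape for the cleanup, as a sketch only: the `ImportStorage` interface and `importThenCleanUp` are hypothetical, and the single-argument `downloadFile`/`deleteFile` signatures here are assumptions rather than the real storage API in this repo. The idea is to delete the source file in a `finally` block so both the success and failure paths are covered.

```typescript
// Hypothetical storage interface; the real downloadFile/deleteFile signatures may differ.
interface ImportStorage {
  downloadFile(key: string): Promise<Uint8Array>
  deleteFile(key: string): Promise<void>
}

/**
 * Download the uploaded file, run the import, and always attempt to delete the source.
 * Cleanup is best-effort: a failed delete must not mask the import's own outcome.
 */
async function importThenCleanUp(
  storage: ImportStorage,
  fileKey: string,
  processFile: (data: Uint8Array) => Promise<void>
): Promise<void> {
  try {
    const data = await storage.downloadFile(fileKey)
    await processFile(data)
  } finally {
    try {
      await storage.deleteFile(fileKey)
    } catch {
      // Swallow cleanup errors; a real implementation would probably log them.
    }
  }
}
```

Deleting on failure means a retry needs a fresh upload. If retries should reuse the file, the delete could move to the success path, with a separate expiry for failed imports.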
Greptile Summary
This PR adds async background CSV/TSV import for large files (≥ 8 MB): the client uploads directly to storage, two new kickoff routes create a placeholder table (or mark an existing one as
Confidence Score: 3/5
Two correctness issues need attention before shipping: a potential silent data loss in replace-mode async imports, and a liveness-check gap that could cause legitimate long-running imports to be terminated by the cron cleaner.
In replace-mode async imports,
apps/sim/lib/table/import-runner.ts (replace-mode row deletion order) and apps/sim/lib/table/service.ts (updateImportProgress missing updatedAt)
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Client
participant Storage
participant KickoffRoute as POST /import-async
participant BG as runTableImport (detached)
participant DB
Client->>Storage: upload file (direct-to-storage)
Storage-->>Client: fileKey
Client->>KickoffRoute: "POST {workspaceId, fileKey, fileName}"
KickoffRoute->>DB: createTable / markTableImporting
KickoffRoute->>BG: runDetached(runTableImport)
KickoffRoute-->>Client: "{tableId, importId}"
BG->>DB: deleteAllTableRows (replace mode only)
BG->>Storage: downloadFile(fileKey)
Storage-->>BG: buffer
BG->>DB: "appendTableEvent(importing, progress=0, total)"
loop each batch (1000 rows)
BG->>DB: bulkInsertImportBatch
BG->>DB: "appendTableEvent(importing, progress=N)"
end
BG->>DB: markImportReady / markImportFailed
BG->>DB: "appendTableEvent(ready | failed)"
DB-->>Client: SSE stream to ImportProgressMenu
Reviews (2): Last reviewed commit: "feat(tables): background import for larg..."
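The liveness gap mentioned in this summary (progress writes that do not refresh `updatedAt`) can be illustrated with a small sketch. `ImportHeartbeat`, `recordProgress`, and `isStale` are hypothetical names, and the threshold is an assumed parameter; the actual cron cleaner's logic is not shown in this conversation.

```typescript
// Hypothetical in-memory stand-in for the import columns on the table row.
interface ImportHeartbeat {
  rowsProcessed: number
  /** Epoch milliseconds of the last progress write. */
  updatedAt: number
}

/** Record progress and refresh the heartbeat in the same write. */
function recordProgress(rowsProcessed: number, now: number): ImportHeartbeat {
  return { rowsProcessed, updatedAt: now }
}

/** A cleaner should only reap imports whose heartbeat is older than the threshold. */
function isStale(state: ImportHeartbeat, now: number, staleAfterMs: number): boolean {
  return now - state.updatedAt > staleAfterMs
}
```

If progress writes left `updatedAt` untouched, `isStale` would measure time since kickoff rather than time since the last batch, so a healthy long-running import would eventually look dead.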
…/empty validation
# Conflicts:
#	apps/sim/app/api/table/[tableId]/import/route.ts
#	apps/sim/app/workspace/[workspaceId]/tables/[tableId]/hooks/use-table-event-stream.ts
#	apps/sim/app/workspace/[workspaceId]/tables/[tableId]/table.tsx
#	apps/sim/lib/table/events.ts
#	packages/db/migrations/meta/0222_snapshot.json
#	packages/db/migrations/meta/_journal.json
#	scripts/check-api-validation-contracts.ts
Addressed Bugbot review + synced with staging:
@greptile review
…elete-on-replace after download
Cursor Bugbot has reviewed your changes and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
Reviewed by Cursor Bugbot for commit db9cdc8.
if (rows.length === 0 || !schema || !headerToColumn) return
const coerced = coerceRowsForTable(rows, schema, headerToColumn)
inserted += await bulkInsertImportBatch(
  { tableId, workspaceId, userId, rows: coerced, startPosition: inserted },
Append import reuses row positions
High Severity
Background append imports pass startPosition as the number of rows inserted in this run (0, 1000, …), not the next free position after the existing table rows. New rows are therefore written at positions that already hold data, so ordering and grid display can be wrong. The synchronous append path, by contrast, uses nextAutoPosition.
Reviewed by Cursor Bugbot for commit db9cdc8.
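A minimal sketch of the offset fix described in this finding, under stated assumptions: `nextBatchStart` is a hypothetical helper, and `nextFreePosition` is assumed to be read once when the import starts (for example from whatever `nextAutoPosition` consults in the synchronous path).

```typescript
type ImportMode = 'create' | 'append' | 'replace'

/**
 * Position of the first row in the next batch.
 * For append, offset by the next free position captured at import start;
 * create and replace begin from an empty table, so the base is 0.
 */
function nextBatchStart(
  mode: ImportMode,
  nextFreePosition: number,
  insertedSoFar: number
): number {
  const base = mode === 'append' ? nextFreePosition : 0
  return base + insertedSoFar
}
```

For example, appending to a table whose next free position is 2500, the second batch of 1000 rows would start at `nextBatchStart('append', 2500, 1000)`, which evaluates to 3500 rather than 1000.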
…ng the whole file
Now streaming the import instead of buffering the whole file (Bugbot High):
@greptile review
Summary
- Replace per-row `user_table_rows` count triggers with statement-level triggers (transition tables) so bulk insert/delete no longer serialize per row (migration 0222).
- Fix `missing final boundary` on CSV upload by streaming multipart with busboy instead of `request.formData()`.
- `ProgressItem` emcn component + a header indicator, driven by SSE events; uploading → processing → done/failed stages defined programmatically.
- `materialize_file` `operation: 'table'` + fail-fast guard for unimplemented ops.

Type of Change
Testing
Tested manually: imported 10MB and ~1M-row CSVs from the list and in-table; verified upload→processing→done progress, refresh persistence, and failure handling.
`bun run lint` clean; `bun run check:api-validation:strict` passes.

Checklist