v0.6.102: support S3-compatible in object storage, GitLab code knowledgebase connector, SSO provider ID allowlist, singleton memory leak fix#4871
Conversation
…rigger faults (#4860) * fix(webhook): don't fault trigger run on user/workflow execution errors Webhook-triggered executions re-threw every error, so trigger.dev marked the run failed and fired #eng-errors alerts. The vast majority of these are user-caused workflow failures (missing required fields, invalid field references, bad URLs, provider 4xx, expired models, low credit) that are already recorded in the execution logs. Distinguish fault vs error in executeWebhookJobInternal: when the failure was finalized by core (the workflow ran and its failure is logged), complete the run with { success: false } instead of throwing. Errors that were not finalized came from the webhook pipeline itself and still re-throw to fault the run. Await waitForPostExecution first so the finalized flag is reliable. The error is still recorded on the run's OTel span via recordException (no ERROR status, so the run isn't faulted) and remains in the execution logs, so these stay investigable in Tempo/Loki without false alerts. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedule): don't fault trigger run on error-recovery failures The schedule task already treats workflow-execution failures as recorded errors rather than trigger faults, but the outermost catch's own recovery code (the infra-retry and releaseClaim calls) was unguarded. A secondary DB blip while releasing the claim re-threw and escaped run(), faulting the trigger.dev run and firing an alert — a double-fault during cleanup. Wrap the recovery path in a try/catch: log and record the exception on the span without re-throwing. The claim expires on its TTL and the next tick re-claims the schedule, so swallowing the cleanup failure is safe. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * test(webhook): assert waitForPostExecution runs on the non-finalized path Guards the race fix on the infra-error path so a future refactor can't silently drop the await. Addresses Greptile review feedback. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…le storage (#4865) * feat(storage): support S3-compatible endpoints (R2, MinIO, B2) for file storage Add S3_ENDPOINT and S3_FORCE_PATH_STYLE env vars, wired into the shared upload S3 client so Cloudflare R2, MinIO, Backblaze B2, and other S3-compatible stores work for self-hosted file storage. The endpoint is trusted operator config (no SSRF/HTTPS gate). Makes the multipart Location fallback endpoint-aware, extends the S3 client unit tests, and documents the new vars in Helm values, .env.example, and the English self-hosting docs (incl. browser-reachability + CORS guidance). * docs(storage): add RustFS as an S3-compatible provider example * fix(storage): address review feedback and fix env mock for CI - Add envBoolean to the shared env test mock (createEnvMock) so config.ts's forcePathStyle coercion resolves — fixes failing knowledge/utils.test.ts - Declare S3_FORCE_PATH_STYLE as z.string() (every other env var's pattern); it's coerced via envBoolean at the consumption site, avoiding a boolean type that never matches the string process.env value - Log path-style from S3_CONFIG.forcePathStyle (envBoolean) instead of a separate isTruthy call, so the startup log can't disagree with the client - Make buildObjectFallbackUrl honor forcePathStyle: virtual-hosted-style URL (bucket as subdomain) for R2, path-style only when forcePathStyle is set * docs(storage): add backlinks to S3-compatible providers (R2, MinIO, Ceph, B2, RustFS) and backends
…Marketplace guidelines (#4867)
* fix(auth): link SSO sign-in to existing same-email accounts SSO sign-ins failed with "account not linked" (then a cascading "Invalid callbackURL") when an account with the same email already existed. Better Auth's `@better-auth/sso` plugin hardcodes the provisioned user's `emailVerified: options?.trustEmailVerified ? <claim> : false`, so with the option unset every SSO login arrived unverified and tripped the account linking gate `(!isTrustedProvider && !userInfo.emailVerified)` whenever the provider was not in `accountLinking.trustedProviders`. - Set `trustEmailVerified: true` on the SSO plugin so the IdP's verified-email claim is honored (Okta, Entra ID, Google Workspace, Auth0 all assert it). - Trust the operator's configured provider for linking: merge `SSO_PROVIDER_ID` (when present in the app env) plus a new `SSO_TRUSTED_PROVIDER_IDS` list into `trustedProviders`. Empty/unset => no-op, so existing deployments are unchanged. - Invite callback URL: return a clean `/invite/<id>` (token already persists in sessionStorage) so an appended `?error=` cannot produce a malformed URL. - Document `SSO_TRUSTED_PROVIDER_IDS` in SSO docs, Helm values, and schema. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(auth): address review — guard trusted SSO providers, revert invite callback - Only compute additionalTrustedSsoProviders when SSO_ENABLED, so trustedProviders is exactly unchanged for non-SSO deployments. - Revert the invite getCallbackUrl change: keep the token in the callback URL (with sessionStorage/searchParams fallback) so the token survives when sessionStorage is unavailable. The account-linking fix removes the "account not linked" error that caused the malformed callback URL, so the callback cleanup is unnecessary. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(auth): guard trusted SSO providers with isSsoEnabled (isTruthy) env.SSO_ENABLED can be the string "false" (t3-env returns strings for booleans), which is truthy in JS. Use the canonical isSsoEnabled flag (isTruthy(env.SSO_ENABLED)) so SSO_ENABLED="false"/"0" correctly yields an empty trusted-provider list, matching how SSO is gated elsewhere. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
* feat(gitlab): sync repository files (code/docs) alongside wiki and issues * fix(gitlab): follow full keyset next-link for repo tree + skip disabled wiki gracefully in all/both * fix(gitlab): error on bad user branch (tree 404), warn on resolveRef fallback, normalize pathPrefix to directory boundary * fix(gitlab): preserve slashes in branch ref for file source URLs (GitFlow branches) * fix(gitlab): never abort sync on repo-tree 404 (empty repo); validate user branch exists at setup instead * fix(gitlab): validate ref via commits endpoint so tags and commit SHAs are accepted, not just branches * fix(gitlab): skip repo phase on tree 403 (missing read_repository) so wiki/issues still sync under all * fix(byok): add Fal icon and repair corrupted Ollama icon path The Ollama BYOK icon rendered blank because its SVG path had spaces stripped between arc-command flags (e.g. `a5.05 5.05 0 12.05-.636`), producing invalid tokens. Replaced with the canonical Ollama path. Also added a dedicated FalIcon (was falling back to the generic ImageIcon) and wired it into the BYOK provider list. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(icons): repair corrupted Fireworks icon arc command The leftmost spark of the Fireworks icon never rendered because its third subpath used a corrupted arc command (`a34.59 34.59 0 17.15 37.65`) with collapsed flags, yielding an invalid sweep-flag of 7 that aborts the path parse. Replaced with the canonical lobehub Fireworks source. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
…less execution (#4870) * fix(mothership): run client-routed workflow tools server-side in headless execution Headless Mothership (Mothership block, no browser) could not run workflows. The run_workflow/run_workflow_until_block/run_block/run_from_block tools are registered with route 'client', so the executor gate (isSimExecuted) skipped their registered server handlers and fell through to executeAppTool, throwing 'Tool not found'. Interactive runs delegate these to the browser before reaching the executor, so only the headless path broke. Allow a client-routed tool to use its registered server handler when one exists, which only affects the four run tools (the only client-routed tools, all of which have server handlers). * test(mothership): clear handler registry between executor tests Add clearHandlers() helper and reset the module-level handler registry in beforeEach so handlers registered in one test do not leak into the next.
…aks (#4869) * fix(dev): use globalThis for singleton state to prevent HMR memory leaks * fix(dev): apply globalThis guard to rate-limiter storage factory to prevent listener accumulation * fix(types): resolve McpConnectionManager globalThis undefined type error
|
The latest updates on your projects. Learn more about Vercel for GitHub.
|
PR SummaryMedium Risk Overview Object storage adds first-class S3-compatible backends via SSO links an SSO login to an existing same-email account when the IdP asserts The GitLab connector can sync repository files (branch/ref, path prefix, extension filters) alongside wiki/issues, with keyset pagination guarded by Webhook and schedule background runs distinguish finalized workflow/user errors (complete the run, Headless copilot/Mothership runs client-routed tools (e.g. Smaller changes: treat empty Reviewed by Cursor Bugbot for commit aed4402. Bugbot is set up for automated code reviews on this repo. Configure here. |
Greptile SummaryThis PR bundles several independent fixes and features: S3-compatible object-storage support, GitLab repository-file syncing, SSO account-linking improvements, and singleton memory-leak fixes across server singletons.
Confidence Score: 5/5Safe to merge — all changes are well-scoped fixes and additive features with good test coverage and no regressions in core paths. The S3-compatible endpoint, GitLab file sync, SSO linking, and singleton fixes are all contained and independently testable. The GitLab cursor SSRF guard, webhook fault-vs-error reclassification, and HMR singleton migration are each backed by targeted tests. No previously-flagged issues remain unresolved in the diff. No files require special attention beyond the previously-noted trustEmailVerified: true breadth in apps/sim/lib/auth/auth.ts, which was documented in an earlier review thread. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Webhook job starts] --> B[executeWebhookJobInternal]
B --> C[executeWorkflowCore]
C -->|success| D[Return result]
C -->|throws error| E[await waitForPostExecution]
E --> F{wasExecutionFinalizedByCore?}
F -->|YES - user/workflow error| G[recordException on span]
G --> H[Return failure result - job completes normally]
F -->|NO - infra/pipeline error| I[safeCompleteWithError]
I --> J[re-throw error - trigger.dev job faults]
Reviews (2): Last reviewed commit: "fix(storage): percent-encode object key ..." | Re-trigger Greptile |
|
Re: the flagged Intended. A configured SSO provider whose IdP asserts
The one genuinely valid residual — multi-tenant: an org registering SSO for a domain it doesn't actually own — is pre-existing (same surface as the common-ID trust list), gated by Enterprise + the domain-conflict check, and the principled fix is DNS-based domain verification + |
…sSameOrigin (#4873) * fix(gitlab): pin pagination cursor to configured host before following it The repository-tree keyset cursor stores GitLab's verbatim rel="next" URL and re-fetches it with an Authorization: Bearer header. Assert the cursor's origin matches the configured apiBase before following it, so a tampered or corrupted fileNextUrl cannot exfiltrate the access token to an attacker-controlled host. Fails closed on mismatch. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * improvement(validation): generalize isSameOrigin and reuse across connectors/tools Add an optional base argument to the shared isSameOrigin (defaulting to the app base URL) so callers can pin a URL to any trusted origin. The GitLab connector's cursor host-check and the tools self-origin check now consume the shared helper instead of their own URL-parsing. --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
) * fix(storage): percent-encode object key in multipart fallback URL buildObjectFallbackUrl built the object URL from a raw key. Keys with spaces or reserved characters (and the pre-existing AWS branch) would produce a structurally invalid location. Encode the key per path segment (preserving '/' separators) across all branches (AWS, custom path-style, virtual-hosted). * refactor(storage): clearer per-segment key encoding in fallback URL * test(storage): cover multipart fallback URL (AWS, R2 virtual-hosted, MinIO path-style, key encoding)
|
@greptile |
|
@cursor review |
There was a problem hiding this comment.
✅ Bugbot reviewed your changes and found no new issues!
Comment @cursor review or bugbot run to trigger another review on this PR
Reviewed by Cursor Bugbot for commit 5efb47e. Configure here.
…agnostics) (#4868) * fix(tables): retry transient DB/Redis failures in cell execution and surface error causes Workflow-group-cell runs intermittently failed on trivial DB reads/writes under heavy fan-out, stranding cells in `running`. Investigation showed the PlanetScale and ElastiCache backends were healthy at the time — the failures are transient connection-level faults that the cell (maxAttempts: 1) had no tolerance for, and the real cause was never logged (Drizzle wraps it as "Failed query: ..." and the driver cause lives in error.cause). Resilience: - Add retryTransient (lib/table/retry-transient.ts): retries only transient infra errors (reuses isRetryableInfrastructureError; adds an ioredis command-timeout match) with jittered backoff, then rethrows. Fail-fast for everything else. - Wrap the cell's getTableById/getRowById reads, the terminal write (cell-write updateRow — idempotent via the executionId guard), and the Redis cascade-lock acquire. Diagnostics: - Add describeError (lib/core/errors/retryable-infrastructure.ts): walks the .cause chain and always returns the underlying driver cause (code/errno/ syscall + causeChain), including for unclassified errors like AbortError. - Log `cause` + a `retryable` flag (and aborted/timedOut in the cell's main catch) across the cell + finalization error paths, mirroring the existing schedule-execution pattern. Logging-only; no behavior change. This lets the next recurrence reveal the real cause and whether the retry applies. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tables): address review feedback on cell retry resilience - retryTransient: re-check the abort signal after the backoff sleep so a cancellation during sleep stops the next attempt (don't run/return work for an already-cancelled task). - isRetryableRedisError: walk the .cause chain (mirroring the infra classifier) so wrapped Redis timeouts are recognized; drop "Connection is in subscriber mode" — that's a connection-state programming error, not a transient drop, and would just fail identically every retry. - cascade-lock: stop wrapping acquireLock in retryTransient. acquireLock is a non-idempotent SET NX, so retrying after a timed-out-but-applied first SET returns false (key already ours) and yields a false `contended` that skips the cascade. A transient Redis blip here just fails the run before pickup (no stranded cell); the dispatcher re-drives it. - Tests: cause-chain Redis match, subscriber-mode exclusion, abort-during-sleep. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(tables): drop out-of-scope abort/timeout fields from cell catch The main catch logged `aborted`/`timedOut` from `abortSignal`/`timeoutController`, but those are declared inside the outer try block (the inner try around executeWorkflow is try/finally, so this catch belongs to the outer try) and are not in scope in the catch — `next build`'s type-check failed with "Cannot find name 'abortSignal'". Local incremental `tsc --noEmit` had skipped the file and falsely passed; the Cursor/Greptile reviewers flagged this correctly. Removed the two fields. Abort/timeout is still surfaced via `cause: describeError(err)` (an aborted run shows `name: 'AbortError'` / the timeout message), so no diagnostic signal is lost. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(tables): drop in-process retry, keep cause diagnostics only In-process retry is the wrong layer for this path: the cell task is maxAttempts:1 by design, retrying on a possibly-degraded worker may not help, and it masks the very transient-failure signal we're trying to capture before we understand the root cause. Removed retryTransient entirely (file + all wrapping in cell-write, the cascade reads, and the lock acquire) and kept only the diagnostic logging. - Deleted lib/table/retry-transient.ts (+ test); cell-write and the cascade reads call getTableById/getRowById/updateRow directly again, fail-fast. - Kept describeError + `cause`/`retryable` fields across the cell + finalization catch blocks; the cell-path `retryable` flag now sources from isRetryableInfrastructureError (the canonical classifier) for consistency. Diagnostics-first: surface the real driver cause on the next recurrence, then decide the actual fix (e.g. task-level maxAttempts, or addressing the worker- side cause) from evidence rather than a speculative in-process retry. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * fix(schedules): log error cause on scheduled-execution failure paths The scheduled-job failure paths logged the raw error (.message/stack only) — its `.cause` (the real driver error behind a Drizzle "Failed query: ..." wrapper) was never recorded, and the classified-only `describeRetryableInfrastructureError` returns undefined for unrecognized errors. A real failed run (same incident window as the cell failures) failed in `applyScheduleUpdate` with exactly this unrecorded cause. Added `cause: describeError(error)` (always-on, walks the cause chain) to the applyScheduleUpdate catch, the early-failure catch, and the unhandled-error catch — passed as a second arg so the existing message+stack still emit. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> * refactor(errors): move describeError to @sim/utils/errors `describeError` is a general-purpose error/cause-chain helper — it didn't belong in `lib/core/errors/retryable-infrastructure.ts` (that module is specifically about classifying retryable infra errors, and the name read wrong for a generic diagnostic). Moved it to `@sim/utils/errors` alongside `toError`/ `getErrorMessage`/`getPostgresErrorCode`, with its own cycle-safe cause walk. - Added describeError + DescribedError + tests to packages/utils/src/errors.ts. - Reverted the describeError addition from retryable-infrastructure.ts (it keeps only isRetryableInfrastructureError / describeRetryableInfrastructureError, which are accurately named and still used by the schedule retry path). - Re-pointed all consumers (cell, logging-session, pause-persistence, schedule) to import describeError from @sim/utils/errors. The `retryable` classification flag still sources from isRetryableInfrastructureError where used. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Uh oh!
There was an error while loading. Please reload this page.