fix(providers/anthropic): forward FilePart.Filename as document title and warn on unsupported media types #38
Open
ethanndickson wants to merge 2 commits into
Anthropic's DocumentBlockParam exposes a Title field that the model uses when it refers back to an attached document. Forward FilePart.Filename into that field so users can ask the model about a document by name.

The title is sanitized first: Anthropic restricts titles to alphanumerics, whitespace, hyphens, parentheses, and square brackets, and returns 'The document file name can only contain alphanumeric characters, whitespace characters, hyphens, parentheses, and square brackets.' for any title containing other runes. Disallowed runes are replaced with spaces, runs of whitespace are collapsed, and the result is trimmed. Empty or fully disallowed input falls back to 'Document' so every attached document has a stable handle, matching the invariant the OpenAI provider already enforces with its part-N.pdf synthetic name.

The sanitizer is a Go port of the implementation in coder/mux (src/node/utils/messages/sanitizeAnthropicDocumentFilename.ts); prior art for sending filename as title also includes vercel/ai's @ai-sdk/anthropic, which sets document.title from part.filename when no provider-options title is supplied.
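The sanitization rules described above can be sketched in a few lines of Go. This is a minimal illustration, not the helper from this PR: the function name `sanitizeTitle` and the constant `fallbackTitle` are hypothetical, and treating "alphanumerics" as ASCII letters and digits only is an assumption that may be stricter than the real implementation.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// fallbackTitle is used when nothing usable survives sanitization.
const fallbackTitle = "Document"

// sanitizeTitle is a hypothetical sketch of the sanitizer described above.
// Assumption: "alphanumerics" means ASCII letters and digits only.
func sanitizeTitle(name string) string {
	mapped := strings.Map(func(r rune) rune {
		switch {
		case r >= 'a' && r <= 'z', r >= 'A' && r <= 'Z', r >= '0' && r <= '9':
			return r
		case unicode.IsSpace(r):
			return r
		case r == '-', r == '(', r == ')', r == '[', r == ']':
			return r
		default:
			// Disallowed runes become spaces so word boundaries survive.
			return ' '
		}
	}, name)

	// strings.Fields splits on runs of whitespace and drops leading and
	// trailing whitespace, so joining with one space collapses and trims.
	title := strings.Join(strings.Fields(mapped), " ")
	if title == "" {
		return fallbackTitle
	}
	return title
}

func main() {
	fmt.Printf("%q\n", sanitizeTitle("report_v2.final.pdf")) // "report v2 final pdf"
	fmt.Printf("%q\n", sanitizeTitle("___"))                 // "Document"
}
```

Replacing disallowed runes with spaces rather than deleting them keeps `report_v2` readable as two words instead of fusing it into `reportv2`.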
…ed media types

Mirror the PDF document-title handling on the text/* document branch so text attachments also reach Anthropic with a stable handle the model can refer back to. The filename runs through the same sanitizer; an empty or fully disallowed filename falls back to 'Document'.

Also add a default case to the file MediaType switch that emits a CallWarning when a FilePart's media type is not handled. Previously the Anthropic provider silently dropped any file with a media type other than image/*, application/pdf, or text/*, so unsupported attachments left no trace for the caller. The new behavior matches the openai, openaicompat, openrouter, and vercel providers, which already warn on unsupported FilePart media types.
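The shape of the media-type switch with its new default case can be sketched as follows. This is a self-contained illustration under stated assumptions: `filePart`, `callWarning`, and `classifyParts` are local stand-ins invented for the example, not the `fantasy` types or the provider's actual code, and the real branches build Anthropic content blocks rather than returning labels.

```go
package main

import (
	"fmt"
	"strings"
)

// filePart and callWarning are hypothetical stand-ins for the fantasy
// types named in this PR; only the fields used here are modeled.
type filePart struct {
	Filename  string
	MediaType string
}

type callWarning struct {
	Type    string
	Message string
}

// classifyParts mirrors the switch described above: three handled
// families, and a default case that records a warning instead of
// silently dropping the part.
func classifyParts(parts []filePart) (handled []string, warnings []callWarning) {
	for _, file := range parts {
		switch {
		case strings.HasPrefix(file.MediaType, "image/"):
			handled = append(handled, "image")
		case file.MediaType == "application/pdf":
			handled = append(handled, "pdf document")
		case strings.HasPrefix(file.MediaType, "text/"):
			handled = append(handled, "text document")
		default:
			warnings = append(warnings, callWarning{
				Type:    "other",
				Message: fmt.Sprintf("file part media type %s not supported", file.MediaType),
			})
		}
	}
	return handled, warnings
}

func main() {
	handled, warnings := classifyParts([]filePart{
		{Filename: "spec.pdf", MediaType: "application/pdf"},
		{Filename: "memo.mp3", MediaType: "audio/mpeg"},
	})
	fmt.Println(handled)
	for _, w := range warnings {
		fmt.Println(w.Message)
	}
}
```

The caller now receives one warning per unhandled part, so a dropped attachment is visible in the call result rather than disappearing.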
Summary
The Anthropic provider currently ignores `FilePart.Filename` for both PDFs and text documents, and it silently drops any other media type without surfacing a warning. Claude therefore has no handle it can use to refer back to an attachment, and unsupported attachments leave no trace for the caller.

This PR makes three small changes to `providers/anthropic`:

1. Forward `file.Filename` into `DocumentBlockParam.Title` on the `application/pdf` branch. The filename is sanitized first (see below).
2. Do the same on the `text/*` branch, which also produces a `DocumentBlockParam` (via `PlainTextSourceParam`).
3. Default-case `CallWarning`. Match the other providers in this repo by emitting a `fantasy.CallWarning` when a `FilePart` media type is not handled, instead of silently dropping it.

```go
default:
	warnings = append(warnings, fantasy.CallWarning{
		Type:    fantasy.CallWarningTypeOther,
		Message: fmt.Sprintf("file part media type %s not supported", file.MediaType),
	})
```

Why `title` and the sanitizer
Anthropic restricts document titles to alphanumerics, whitespace, hyphens, parentheses, and square brackets. A title that contains other runes (the `.` and `_` characters that occur in almost every real filename, for example) is rejected with:

> The document file name can only contain alphanumeric characters, whitespace characters, hyphens, parentheses, and square brackets.

The new `sanitizeAnthropicDocumentTitle` helper replaces disallowed runes with spaces, collapses consecutive whitespace, and trims the result. Empty or fully disallowed input falls back to `"Document"` so every attached document has a stable handle the model can refer back to.

Why the image branch is untouched
`ImageBlockParam` does not have a free-form filename or title slot, so there is no equivalent place to forward `FilePart.Filename`. The two branches touched here (`application/pdf`, `text/*`) are the full set of Anthropic content blocks that can carry a document title. The new default warning still covers everything else, including image MIME types that the existing `image/*` case does not handle (for example, `image/heic`).

Upstream
The same gaps exist in `charmbracelet/fantasy`. Tracked in charmbracelet#267.

Closes
CODAGT-540 (follow-up to #37)