Releases: pathwaycom/pathway
Releases · pathwaycom/pathway
v0.31.0
Added
pw.io.sqlite.writeconnector, which writes a Pathway table into a SQLite database file. Supports two modes:stream_of_changes(default) appends each event alongsidetime/diffmetadata columns, whilesnapshotmaintains the current state of the table viaINSERT ... ON CONFLICT DO UPDATEon insertions andDELETEon retractions, keyed on theprimary_keyparameter. Values are encoded using the same storage-class mapping thatpw.io.sqlite.readaccepts, sowrite/readround-trips every supported Pathway type losslessly.init_modecontrols whether the destination table is left as-is, auto-created, or replaced on start-up.pw.io.deltalake.readnow accepts Deltadecimal(p, s)columns. The Pathway type declared in the schema chooses the projection:floatconverts each value through f64 (lossy in general — both because f64 is binary and because its mantissa carries only ~15–17 significant decimal digits) and emits a one-time warning at startup naming each affected column;strformats the unscaled integer with the column's scale and passes the resulting decimal text through unchanged, lossless for the full Delta precision range (up to 38 digits).pw.io.deltalake.writeaccepts a Pathwaystrcolumn when writing into an existing Deltadecimal(p, s)column: each row's text is parsed as decimal and stored as the column's fixed-point value. Combined with the losslessdecimal → strread path, a Deltadecimalcolumn can round-trip through a Pathway pipeline with no precision loss. A string that can't be parsed as a decimal of the column's shape fails the write with an error message naming the offending value, the column's precision and scale, and the specific constraint it violated. Tables that don't contain adecimalcolumn (or that are being created fresh by Pathway) are unaffected.pw.io.deltalake.readnow accepts Deltadatecolumns (mapped ontoDateTimeNaive/DateTimeUtcat midnight on the calendar day, since Pathway has no nativeDatetype) andtimestamp_milliscolumns (mapped onto the same Pathway types with millisecond precision preserved).- The panel widget for table visualization now accepts
page_sizeandtable_heightparameters.
Changed
- BREAKING:
pw.io.iceberg.writeto a Glue catalog no longer acceptsDateTimeUtccolumns. Glue's metastore has no timezone-aware timestamp type, so previous versions silently dropped the timezone on read-back; writes now fail with an explicit error instead of corrupting the zone. To store UTC timestamps in Glue, convert toDateTimeNaivewith UTC-normalized values, or write through the REST catalog, which preserves the timezone. pw.io.sqlite.readnow parses every PathwayValuevariant. In addition toint,float,str,bytes,pw.Json, and theirOptionalforms, the reader now acceptsbool,pw.DateTimeNaive,pw.DateTimeUtc,pw.Duration,pw.Pointer,pw.PyObjectWrapper, homogeneoustuple/list, andnp.ndarray. Composite types are stored asTEXTusing the same JSON encoding thatpw.io.jsonlines.writeemits. Booleans additionally accept PostgreSQL-style textual literals (true/false,yes/no,on/off,t/f,y/n; case-insensitive, whitespace-trimmed), andfloatcolumns tolerate values stored withINTEGERstorage class.pw.io.mssql.readandpw.io.mssql.writenow retry transient SQL Server errors automatically.
Fixed
pw.io.http.rest_connectorno longer raisesTypeError: Cannot instantiate typing.Anywhen a request column has the inferred default schema type (Any). The cast step now skips columns typed asAnyinstead of attempting to call the type as a constructor.pw.io.deltalake.readnow accepts Delta tables whose integer columns use any of the standard Parquet integer widths (INT_8,INT_16,INT_32, unsigned variants), and whose floating-point columns useFLOAT(32-bit) orFLOAT16. Previously the row-level reader only matchedINT_64andDOUBLE, so tables produced by Spark / DuckDB / pandas with explicit narrower casts read back as zero rows with per-row conversion errors.pw.io.deltalake.writepartition columns of typepw.Pointer,pw.Duration, andpw.Jsonnow round-trip correctly throughpw.io.deltalake.read. Previously the values were correctly placed in the partition path on write, but the reader had no decoder for those types and produced a conversion error for every row.
v0.30.1
Added
pw.io.rabbitmq.readandpw.io.rabbitmq.writeconnectors for reading from and writing to RabbitMQ Streams. Supports JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row);start_fromparameter ("beginning","end", or"timestamp"); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.pw.io.mssql.readconnector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).pw.io.mssql.writeconnector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.pw.io.milvus.writeconnector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.pathway spawnnow supports the--addressesand--process-idflags for multi-machine deployments. Pass a comma-separated list ofhost:portaddresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.pw.xpacks.llm.parsers.AudioParser, audio transcription parser based on OpenAI Whisper API. Accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.pw.io.leann.writeconnector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.pw.iteratenow supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
v0.30.0
Added
pw.io.mongodb.readconnector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.pw.io.postgres.readconnector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).pw.io.postgres.writeandpw.io.postgres.readnow support serialization/deserialization ofnp.ndarray(int/floatelements), homogeneoustupleandlist(via PostgresARRAY; multidimensional rectangular arrays supported).pw.io.airbyte.readnow accepts adependency_overridesparameter, allowing users to pin specific versions of transitive dependencies (e.g.airbyte-cdk) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.
Changed
- BREAKING:
pw.io.mongodb.writeandpw.io.mongodb.readnow serialize and deserializenp.ndarraycolumns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before ([1, 2, 3]); only multi-dimensional arrays are affected. - BREAKING: The dependencies for
pw.io.pyfilesystem.readare no longer included in the default package installation. To install them, please usepip install pathway[pyfilesystem]. - Asynchronous callback for
pw.io.python.writeis now available aspw.io.OnChangeCallbackAsync. pw.runandpw.run_allnow have theevent_loopparameter to support reusing async state across multiple graph runs.
Fixed
pathway web-dashboardnow waits for the metrics database to be created instead of terminating instantly.
v0.29.1
Added
pw.io.kafka.readandpw.io.kafka.writeconnectors now support OAUTHBEARER authentication.pw.io.mongodb.writeconnector now supports anoutput_table_typeparameter with two modes:stream_of_changes(default) andsnapshot. Insnapshotmode, the connector maintains the current state of the Pathway table in MongoDB using the_idfield as the primary key, whilestream_of_changespreserves the existing behavior by writing all events withtimeanddiffflags to reflect transactional minibatches and the nature of each change.- Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via
worker_scaling_enabledandworkload_tracking_window_msinpw.persistence.Config. Please refer to the tutorial for more details. pw.io.postgres.writenow properly supports TLS configuration viasslmodeandsslrootcertconnection string parameters.
Changed
pw.xpacks.connectors.readnow retries initial connection requests.
v0.29.0
Added
- Pathway Web Dashboard providing user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
pw.io.kafka.readnow includes message headers in the parsed metadata. The headers are available at the top level of the metadata in theheadersarray. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.pw.xpacks.llm.llms.BedrockChat- Native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.pw.xpacks.llm.embedders.BedrockEmbedder- Native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.
Changed
- Most Python dependencies are now imported only if the related capabilities are used by a program.
- BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The
Nonevalue is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string"None".
v0.28.0
Added
pw.io.kafka.readandpw.io.redpanda.readnow allow each schema field to be specified as coming from either the message key or the message value.- Connector groups now support the specification of an idle duration. When this is set, if a source does not provide any data for the specified period of time, it will be excluded from the group until it produces data again.
- It is now possible to assign priorities to sources within a connector group. When a priority is set, it ensures that at any moment, the source is not lagging behind any other source with a higher priority in terms of the tracked column.
- Connector groups can now be used in the multiprocess runs.
Changed
- BREAKING: The
__str__anddumpsmethods inpw.Jsonno longer enforce the result to be an ASCII string. This way, the behavior ofpw.debug.compute_and_printis now consistent with other output connectors. - The window functions now internally use deterministic UDFs, where possible.
v0.27.1
[0.27.1] - 2025-12-08
Added
pw.Table.filter_out_results_of_forgettingmethod, allowing to revert the effects of forgetting at a later stage.
Changed
- The MCP server
toolmethod now allows to pass an optionaldescription, default value being kept as the handler's docstring. pw.io.kafka.readandpw.io.redpanda.readnow create akeycolumn storing the contents of the message keys.
v0.27.0
Added
- JetStream extension is now supported in both NATS read and write connectors.
- The Iceberg connectors now support Glue as a catalog backend.
- New
Table.add_update_timestamp_utcfunction for tracking update time of rows in the table
Changed
- BREAKING The API for the Iceberg connectors has changed. The
catalogparameter is now required in bothpw.io.iceberg.readandpw.io.iceberg.write. This parameter can be either of typepw.io.iceberg.RestCatalogorpw.io.iceberg.GlueCatalog, and it must contain the connection parameters. - BREAKING
paddlepaddleis no longer a dependency of the Pathway package. The reason is that choosing a specific version for the hardware it will be run on is advantageous from the performance point of view. To installpaddlepaddlefollow instructions on https://www.paddlepaddle.org.cn/en/install/quick. pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerernow supports document reranking. This enables two-stage retrieval where initial vector similarity search is followed by reranking to improve document relevance ordering.
Fixed
- Endpoints created by
pw.io.http.rest_connectornow accept requests both with and without a trailing slash. For example,/endpoint/and/endpointare now treated equivalently. - Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
- Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
- Fixed unique name autogeneration for the Python connectors.
v0.26.4
Added
- New external integration with Qdrant.
pw.io.mysql.writemethod for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.
Changed
pw.io.deltalake.readnow accepts thestart_from_timestamp_msparameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.- Asynchronous UDFs for connecting to API based llm and embedding models now have by default retry strategy set to
pw.udfs.ExponentialRetryStrategy() pw.io.postgres.writemethod now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with theoutput_table_typeparameter.pw.io.postgres.write_snapshotmethod has been deprecated.
v0.26.3
Added
- New parser
pathway.xpacks.llm.parsers.PaddleOCRParsersupporting parsing of PDF, PPTX and images.