Releases: pathwaycom/pathway

v0.31.0

25 May 15:37

Added

  • pw.io.sqlite.write connector, which writes a Pathway table into a SQLite database file. Supports two modes: stream_of_changes (default) appends each event alongside time/diff metadata columns, while snapshot maintains the current state of the table via INSERT ... ON CONFLICT DO UPDATE on insertions and DELETE on retractions, keyed on the primary_key parameter. Values are encoded using the same storage-class mapping that pw.io.sqlite.read accepts, so write / read round-trips every supported Pathway type losslessly. init_mode controls whether the destination table is left as-is, auto-created, or replaced on start-up.
  • pw.io.deltalake.read now accepts Delta decimal(p, s) columns. The Pathway type declared in the schema chooses the projection: float converts each value through f64 (lossy in general — both because f64 is binary and because its mantissa carries only ~15–17 significant decimal digits) and emits a one-time warning at startup naming each affected column; str formats the unscaled integer with the column's scale and passes the resulting decimal text through unchanged, lossless for the full Delta precision range (up to 38 digits).
  • pw.io.deltalake.write accepts a Pathway str column when writing into an existing Delta decimal(p, s) column: each row's text is parsed as decimal and stored as the column's fixed-point value. Combined with the lossless decimal → str read path, a Delta decimal column can round-trip through a Pathway pipeline with no precision loss. A string that can't be parsed as a decimal of the column's shape fails the write with an error message naming the offending value, the column's precision and scale, and the specific constraint it violated. Tables that don't contain a decimal column (or that are being created fresh by Pathway) are unaffected.
  • pw.io.deltalake.read now accepts Delta date columns (mapped onto DateTimeNaive / DateTimeUtc at midnight on the calendar day, since Pathway has no native Date type) and timestamp_millis columns (mapped onto the same Pathway types with millisecond precision preserved).
  • The panel widget for table visualization now accepts page_size and table_height parameters.
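
The snapshot mode of pw.io.sqlite.write described above can be pictured with the standard library's sqlite3 module. This is only a sketch of the SQL pattern named in the note (upsert on insertion, DELETE on retraction), not the connector's implementation; the table name, column names, and helper functions are hypothetical.

```python
import sqlite3

# Plain sqlite3, not the Pathway connector. "prices", "item" and "price"
# are hypothetical names chosen for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prices (item TEXT PRIMARY KEY, price REAL)")


def apply_insertion(conn, item, price):
    # Upsert: insert a new row, or overwrite the row that has the same key.
    conn.execute(
        "INSERT INTO prices (item, price) VALUES (?, ?) "
        "ON CONFLICT (item) DO UPDATE SET price = excluded.price",
        (item, price),
    )


def apply_retraction(conn, item):
    # A retraction removes the row with the given key from the snapshot.
    conn.execute("DELETE FROM prices WHERE item = ?", (item,))


apply_insertion(conn, "apple", 1.0)
apply_insertion(conn, "pear", 2.0)
apply_insertion(conn, "apple", 1.5)  # same key: the row is updated in place
apply_retraction(conn, "pear")       # the row disappears from the snapshot

snapshot = conn.execute("SELECT item, price FROM prices ORDER BY item").fetchall()
```

After these four events the table holds a single row for "apple" with the latest price, whereas a stream_of_changes table would hold one row per event.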

Changed

  • BREAKING: pw.io.iceberg.write to a Glue catalog no longer accepts DateTimeUtc columns. Glue's metastore has no timezone-aware timestamp type, so previous versions silently dropped the timezone on read-back; writes now fail with an explicit error instead of corrupting the zone. To store UTC timestamps in Glue, convert to DateTimeNaive with UTC-normalized values, or write through the REST catalog, which preserves the timezone.
  • pw.io.sqlite.read now parses every Pathway Value variant. In addition to int, float, str, bytes, pw.Json, and their Optional forms, the reader now accepts bool, pw.DateTimeNaive, pw.DateTimeUtc, pw.Duration, pw.Pointer, pw.PyObjectWrapper, homogeneous tuple / list, and np.ndarray. Composite types are stored as TEXT using the same JSON encoding that pw.io.jsonlines.write emits. Booleans additionally accept PostgreSQL-style textual literals (true/false, yes/no, on/off, t/f, y/n; case-insensitive, whitespace-trimmed), and float columns tolerate values stored with INTEGER storage class.
  • pw.io.mssql.read and pw.io.mssql.write now retry transient SQL Server errors automatically.
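
The textual boolean literals listed for pw.io.sqlite.read can be summarized in a few lines. This is a minimal sketch of the rule as stated in the note (case-insensitive, whitespace-trimmed); the function name is hypothetical and the reader's actual parsing code may differ.

```python
# Literal sets taken from the release note; the function itself is illustrative.
_TRUE_LITERALS = {"true", "yes", "on", "t", "y"}
_FALSE_LITERALS = {"false", "no", "off", "f", "n"}


def parse_bool_literal(text: str) -> bool:
    # Trim surrounding whitespace and ignore case before matching.
    token = text.strip().lower()
    if token in _TRUE_LITERALS:
        return True
    if token in _FALSE_LITERALS:
        return False
    raise ValueError(f"not a boolean literal: {text!r}")
```

For example, parse_bool_literal("  YES ") gives True and parse_bool_literal("f") gives False, while an unrecognized string raises ValueError.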

Fixed

  • pw.io.http.rest_connector no longer raises TypeError: Cannot instantiate typing.Any when a request column has the inferred default schema type (Any). The cast step now skips columns typed as Any instead of attempting to call the type as a constructor.
  • pw.io.deltalake.read now accepts Delta tables whose integer columns use any of the standard Parquet integer widths (INT_8, INT_16, INT_32, unsigned variants), and whose floating-point columns use FLOAT (32-bit) or FLOAT16. Previously the row-level reader only matched INT_64 and DOUBLE, so tables produced by Spark / DuckDB / pandas with explicit narrower casts read back as zero rows with per-row conversion errors.
  • pw.io.deltalake.write partition columns of type pw.Pointer, pw.Duration, and pw.Json now round-trip correctly through pw.io.deltalake.read. Previously the values were correctly placed in the partition path on write, but the reader had no decoder for those types and produced a conversion error for every row.
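
The rest_connector fix above comes down to not calling typing.Any as if it were a constructor. A minimal sketch of such a cast step follows; the function name and the dictionary-based schema are hypothetical and do not reflect Pathway's internals.

```python
from typing import Any


def cast_row(row: dict, schema: dict) -> dict:
    # Convert each value with its declared type, leaving Any-typed
    # (or undeclared) columns untouched instead of calling Any(value).
    result = {}
    for name, value in row.items():
        target = schema.get(name, Any)
        result[name] = value if target is Any else target(value)
    return result


casted = cast_row({"count": "3", "payload": [1, 2]}, {"count": int, "payload": Any})
```

Here "count" is converted to the integer 3 and "payload" passes through unchanged.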

v0.30.1

23 Apr 08:05

Added

  • pw.io.rabbitmq.read and pw.io.rabbitmq.write connectors for reading from and writing to RabbitMQ Streams. Supports JSON, plaintext, and raw formats; streaming and static modes; persistence with offset recovery; dynamic topics (writing to different streams per row); start_from parameter ("beginning", "end", or "timestamp"); TLS configuration; and message metadata including AMQP 1.0 properties and application properties. Header values are JSON-encoded for round-trip compatibility. Requires a Pathway Scale or Enterprise license.
  • pw.io.mssql.read connector, which reads data from a Microsoft SQL Server table. The connector first delivers a full snapshot of the table and then, if the streaming mode is used, tracks incremental changes via SQL Server Change Data Capture (CDC).
  • pw.io.mssql.write connector, which writes a Pathway table to a Microsoft SQL Server table. Row additions and updates are applied as MERGE (upsert) statements keyed on the configured primary key columns, and row deletions are applied as DELETE statements.
  • pw.io.milvus.write connector, which writes a Pathway table to a Milvus collection. Row additions are sent as upserts and row deletions are sent as deletes keyed on the configured primary key column. Requires a Pathway Scale license.
  • pathway spawn now supports the --addresses and --process-id flags for multi-machine deployments. Pass a comma-separated list of host:port addresses for all processes and the index of the local process; Pathway will connect the cluster over TCP without requiring all processes to run on the same machine.
  • pw.xpacks.llm.parsers.AudioParser, an audio transcription parser based on the OpenAI Whisper API. Accepts raw audio bytes and returns transcribed text, following the same interface as other Pathway document parsers.
  • pw.io.leann.write connector for writing Pathway tables to LEANN vector indices. LEANN uses graph-based selective recomputation to achieve 97% storage reduction compared to traditional vector databases.
  • pw.iterate now supports operator persistence. On restart, the iterate operator loads its previous input from an operator snapshot and reconverges inside the loop, allowing incremental processing of new data without replaying the full input stream.
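
For the multi-machine pathway spawn flags above, a small helper can build the flag portion of the command for each machine. This is a sketch under assumptions: the helper name and host names are hypothetical, the process index is assumed to be zero-based, and the program to launch (plus any other options) still has to be appended to the returned list.

```python
def spawn_flags(addresses, process_id):
    # Every process receives the same address list and its own index.
    if not 0 <= process_id < len(addresses):
        raise ValueError("process_id must be an index into addresses")
    return [
        "pathway",
        "spawn",
        "--addresses",
        ",".join(addresses),
        "--process-id",
        str(process_id),
    ]


# Hypothetical hosts; one command prefix per machine.
hosts = ["10.0.0.1:9000", "10.0.0.2:9000"]
commands = [spawn_flags(hosts, index) for index in range(len(hosts))]
```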

v0.30.0

24 Mar 19:09

Added

  • pw.io.mongodb.read connector, which reads data from a MongoDB collection. The connector first delivers a full snapshot of the collection and then, if the streaming mode is used, subscribes to the change stream to receive incremental updates in real time.
  • pw.io.postgres.read connector, which reads data from a PostgreSQL table directly by parsing the Write-Ahead Log (WAL).
  • pw.io.postgres.write and pw.io.postgres.read now support serialization/deserialization of np.ndarray (int/float elements), homogeneous tuple and list (via Postgres ARRAY; multidimensional rectangular arrays supported).
  • pw.io.airbyte.read now accepts a dependency_overrides parameter, allowing users to pin specific versions of transitive dependencies (e.g. airbyte-cdk) installed into the connector's virtual environment. This unblocks connectors broken by upstream dependency changes without waiting for upstream fixes.

Changed

  • BREAKING: pw.io.mongodb.write and pw.io.mongodb.read now serialize and deserialize np.ndarray columns as nested BSON arrays that preserve the array's shape. Previously, all ndarrays were flattened to a single BSON array regardless of dimensionality, making it impossible to reconstruct the original shape on read-back. For 1-D arrays the representation is identical to before ([1, 2, 3]); only multi-dimensional arrays are affected.
  • BREAKING: The dependencies for pw.io.pyfilesystem.read are no longer included in the default package installation. To install them, please use pip install pathway[pyfilesystem].
  • Asynchronous callback for pw.io.python.write is now available as pw.io.OnChangeCallbackAsync.
  • pw.run and pw.run_all now have the event_loop parameter to support reusing async state across multiple graph runs.
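
The reason for the MongoDB ndarray change above can be shown with plain nested lists standing in for arrays: once flattened, two arrays of different shapes become indistinguishable, while the nested form keeps the shape. The helper names are hypothetical and this is not the connector's serialization code.

```python
def flatten(nested):
    # Collapse nested lists into one flat list (the old representation).
    flat = []
    for item in nested:
        if isinstance(item, list):
            flat.extend(flatten(item))
        else:
            flat.append(item)
    return flat


def shape(nested):
    # Recover the shape of a rectangular nested list (the new representation).
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0] if nested else None
    return tuple(dims)


a = [[1, 2, 3], [4, 5, 6]]    # shape (2, 3)
b = [[1, 2], [3, 4], [5, 6]]  # shape (3, 2)
same_when_flat = flatten(a) == flatten(b)
```

Both inputs flatten to the same six-element list, so the shape cannot be reconstructed from the flat form; for a 1-D list such as [1, 2, 3] the two representations coincide.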

Fixed

  • pathway web-dashboard now waits for the metrics database to be created instead of terminating instantly.

v0.29.1

16 Feb 13:48

Added

  • pw.io.kafka.read and pw.io.kafka.write connectors now support OAUTHBEARER authentication.
  • pw.io.mongodb.write connector now supports an output_table_type parameter with two modes: stream_of_changes (default) and snapshot. In snapshot mode, the connector maintains the current state of the Pathway table in MongoDB using the _id field as the primary key, while stream_of_changes preserves the existing behavior by writing all events with time and diff flags to reflect transactional minibatches and the nature of each change.
  • Workers can now automatically scale up or down based on pipeline load, using a configurable monitoring window. This feature requires persistence to be enabled and can be configured via worker_scaling_enabled and workload_tracking_window_ms in pw.persistence.Config. Please refer to the tutorial for more details.
  • pw.io.postgres.write now properly supports TLS configuration via sslmode and sslrootcert connection string parameters.
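
The relationship between the stream_of_changes and snapshot output types mentioned above can be sketched by replaying change events into a dictionary. This assumes events shaped as (key, value, time, diff) with diff equal to +1 for an insertion and -1 for a retraction; the function name and event layout are illustrative, not the connector's storage format.

```python
def fold_changes(events):
    # Replay events in time order; within one minibatch, apply retractions
    # before insertions so that an update (retract old, insert new) works.
    state = {}
    for key, value, _time, diff in sorted(events, key=lambda e: (e[2], e[3])):
        if diff > 0:
            state[key] = value
        elif state.get(key) == value:
            del state[key]
    return state


events = [
    ("a", 1, 0, +1),
    ("b", 2, 0, +1),
    ("a", 1, 2, -1),  # update of "a": retract the old value...
    ("a", 5, 2, +1),  # ...and insert the new one in the same minibatch
    ("b", 2, 4, -1),  # deletion of "b"
]
snapshot = fold_changes(events)
```

The stream keeps all five events, while the resulting snapshot contains only the current row for "a".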

Changed

  • pw.xpacks.connectors.read now retries initial connection requests.

v0.29.0

22 Jan 07:18

Added

  • Pathway Web Dashboard, providing a user-friendly interface for monitoring Pathway pipelines in real time with interactive graph plotting and latency/memory metrics.
  • pw.io.kafka.read now includes message headers in the parsed metadata. The headers are available at the top level of the metadata in the headers array. Each element of the array is a pair consisting of a string header name and a base64-encoded header value. If the header is null, the corresponding value is also null.
  • pw.xpacks.llm.llms.BedrockChat - Native AWS Bedrock chat integration using the Converse API. Supports Claude, Llama, Titan, Mistral, and other Bedrock models.
  • pw.xpacks.llm.embedders.BedrockEmbedder - Native AWS Bedrock embedding integration supporting Amazon Titan and Cohere embedding models.
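
The Kafka header metadata described above (pairs of a header name and a base64-encoded value, with null preserved) can be decoded with the standard library. The sample metadata object and the helper name are made up for illustration; only the headers layout is taken from the note.

```python
import base64


def decode_headers(headers):
    # Each element is a (name, base64 value) pair; a null value stays None.
    decoded = []
    for name, value in headers:
        raw = None if value is None else base64.b64decode(value)
        decoded.append((name, raw))
    return decoded


# Hypothetical metadata, built here so the example is self-contained.
metadata = {
    "headers": [
        ["trace-id", base64.b64encode(b"abc123").decode("ascii")],
        ["empty", None],
    ]
}
headers = decode_headers(metadata["headers"])
```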

Changed

  • Most Python dependencies are now imported only if the related capabilities are used by a program.
  • BREAKING: Output connectors no longer wrap string header values in double quotes when sending them to Kafka or NATS. The string values are forwarded as-is. The None value is handled differently: in Kafka, it is serialized as a header without a value, while in NATS it becomes the string "None".

v0.28.0

08 Jan 08:06

Added

  • pw.io.kafka.read and pw.io.redpanda.read now allow each schema field to be specified as coming from either the message key or the message value.
  • Connector groups now support the specification of an idle duration. When this is set, if a source does not provide any data for the specified period of time, it will be excluded from the group until it produces data again.
  • It is now possible to assign priorities to sources within a connector group. When a priority is set, it ensures that at any moment, the source is not lagging behind any other source with a higher priority in terms of the tracked column.
  • Connector groups can now be used in multiprocess runs.

Changed

  • BREAKING: The __str__ and dumps methods in pw.Json no longer force the result to be an ASCII-only string. As a result, the behavior of pw.debug.compute_and_print is now consistent with other output connectors.
  • The window functions now internally use deterministic UDFs, where possible.
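
The effect of the pw.Json change above is analogous to the ensure_ascii switch of the standard library's json.dumps. The snippet below uses only the standard library as an analogy and does not show Pathway's own serializer.

```python
import json

payload = {"city": "Kraków"}

# ASCII-only output escapes non-ASCII characters as \uXXXX sequences.
ascii_only = json.dumps(payload, ensure_ascii=True)

# Without the restriction, the characters are emitted as-is.
unicode_kept = json.dumps(payload, ensure_ascii=False)
```

Both strings decode to the same object; only the textual form differs, which matters when output is compared byte for byte or inspected by eye.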

v0.27.1

08 Dec 13:01

[0.27.1] - 2025-12-08

Added

  • pw.Table.filter_out_results_of_forgetting method, which makes it possible to revert the effects of forgetting at a later stage.

Changed

  • The MCP server tool method now accepts an optional description; by default, the handler's docstring is used.
  • pw.io.kafka.read and pw.io.redpanda.read now create a key column storing the contents of the message keys.

v0.27.0

13 Nov 08:44

Added

  • JetStream extension is now supported in both NATS read and write connectors.
  • The Iceberg connectors now support Glue as a catalog backend.
  • New Table.add_update_timestamp_utc function for tracking the update time of rows in the table.

Changed

  • BREAKING: The API for the Iceberg connectors has changed. The catalog parameter is now required in both pw.io.iceberg.read and pw.io.iceberg.write. This parameter can be either of type pw.io.iceberg.RestCatalog or pw.io.iceberg.GlueCatalog, and it must contain the connection parameters.
  • BREAKING: paddlepaddle is no longer a dependency of the Pathway package, because choosing a version that matches the hardware it will run on gives better performance. To install paddlepaddle, follow the instructions at https://www.paddlepaddle.org.cn/en/install/quick.
  • pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer now supports document reranking. This enables two-stage retrieval where initial vector similarity search is followed by reranking to improve document relevance ordering.

Fixed

  • Endpoints created by pw.io.http.rest_connector now accept requests both with and without a trailing slash. For example, /endpoint/ and /endpoint are now treated equivalently.
  • Schemas that inherit from other schemas now automatically preserve all properties from their parent schemas.
  • Fixed an issue where the persistence configuration failed when provided with a relative filesystem path.
  • Fixed unique name autogeneration for the Python connectors.
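
The trailing-slash fix for pw.io.http.rest_connector above amounts to treating both spellings of a route as the same key. A minimal sketch of one way to do that follows; the function name is hypothetical and the connector may handle this differently.

```python
def normalize_route(path: str) -> str:
    # Drop trailing slashes, but keep the root route as "/".
    stripped = path.rstrip("/")
    return stripped or "/"


same_route = normalize_route("/endpoint/") == normalize_route("/endpoint")
```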

v0.26.4

16 Oct 07:20

Added

  • New external integration with Qdrant.
  • pw.io.mysql.write method for writing to MySQL. It supports two output table types: stream of changes and a realtime-updated data snapshot.

Changed

  • pw.io.deltalake.read now accepts the start_from_timestamp_ms parameter for non-append-only tables. In this case, the connector will replay the history of changes in the table version by version starting from the state of the table at the given timestamp. The differences between versions will be applied atomically.
  • Asynchronous UDFs for connecting to API-based LLM and embedding models now use pw.udfs.ExponentialRetryStrategy() as the default retry strategy.
  • pw.io.postgres.write method now supports two output table types: stream of changes and realtime-updated data snapshot. The output table type can be chosen with the output_table_type parameter.
  • pw.io.postgres.write_snapshot method has been deprecated.
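
The idea behind an exponential retry strategy, as now used by default for the API-based LLM and embedding UDFs above, can be sketched generically. The function and its parameter names are hypothetical and are not the signature of pw.udfs.ExponentialRetryStrategy; the sleep function is injectable so the delays can be observed without waiting.

```python
import time


def retry_with_backoff(func, max_retries=3, initial_delay=0.1, factor=2.0, sleep=time.sleep):
    # Call func; on failure wait, multiply the delay by factor, and try again.
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(delay)
            delay *= factor


# A call that fails twice with a transient error and then succeeds.
attempts = {"count": 0}
waited = []


def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


result = retry_with_backoff(flaky_call, sleep=waited.append)
```

In this run the call succeeds on the third attempt, after two waits whose lengths double from the first to the second.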

v0.26.3

03 Oct 09:26

Added

  • New parser pathway.xpacks.llm.parsers.PaddleOCRParser supporting parsing of PDF, PPTX and images.