Skip to content

feat: add IO stream infrastructure#39

Merged
leaves12138 merged 1 commit into
apache:mainfrom
lszskye:p4-1
Jun 4, 2026
Merged

feat: add IO stream infrastructure#39
leaves12138 merged 1 commit into
apache:mainfrom
lszskye:p4-1

Conversation

@lszskye
Copy link
Copy Markdown
Contributor

@lszskye lszskye commented Jun 2, 2026

Changes

DataInputStream

  • Abstract input stream for reading primitive types (int8, int16, int32, int64, float, double) and byte arrays

DataOutputStream

  • Abstract output stream for writing primitive types and byte arrays

ByteArrayInputStream

  • DataInputStream backed by an in-memory byte array (Bytes), supporting sequential reads with position tracking

BufferedInputStream

  • Buffered wrapper around DataInputStream with configurable buffer size, supporting Seek(), Skip(), and efficient sequential reads

OffsetInputStream

  • DataInputStream that reads from a sub-range of an underlying stream, defined by offset and length, with boundary enforcement

MemorySegmentOutputStream

  • DataOutputStream backed by MemorySegment with auto-growing capacity, producing MemorySlice output via ToSlice()

Tests

  • BufferedInputStreamTest
  • ByteArrayInputStreamTest
  • DataInputOutputStreamTest
  • MemorySegmentOutputStreamTest
  • OffsetInputStreamTest

Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found correctness blockers in the new stream implementations: failed seeks can leave stream positions corrupted, and OffsetInputStream boundary checks truncate 64-bit offsets. Please fix these before merging.

Status ByteArrayInputStream::Seek(int64_t offset, SeekOrigin origin) {
switch (origin) {
case SeekOrigin::FS_SEEK_SET: {
position_ = offset;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please compute the target position in a local variable and assign position_ only after validation succeeds. Currently a failed seek (for example, seeking past EOF) returns an error but leaves position_ out of range, so subsequent GetPos()/Read() calls observe corrupted stream state. The same invariant should hold for FS_SEEK_CUR and FS_SEEK_END.

Status OffsetInputStream::Seek(int64_t offset, SeekOrigin origin) {
switch (origin) {
case SeekOrigin::FS_SEEK_SET: {
inner_position_ = offset;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same issue here: inner_position_ is mutated before both boundary validation and the wrapped seek succeed. A failed seek should not change the logical position. Please calculate a local target position first, validate it, seek the wrapped stream, and only then commit inner_position_.

return length_;
}

Status OffsetInputStream::AssertBoundary(int32_t inner_pos) const {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This boundary check should not take int32_t. The stream API and length_ use 64-bit offsets, so passing values like inner_position_ + size or offset + size through int32_t can truncate offsets above 2 GiB and either reject valid reads or allow out-of-range reads. Please use a 64-bit type consistently here.

Copy link
Copy Markdown
Contributor

@leaves12138 leaves12138 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-reviewed PR #39. The remaining concerns about failed seek position rollback and int64_t file system offsets are planned for a later unified file-system refactor, so they should not block this PR. I did not find other blockers.

@leaves12138 leaves12138 merged commit a175073 into apache:main Jun 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants