20 changes: 11 additions & 9 deletions src/dstack/_internal/core/models/repos/remote.py
Collaborator

It looks like the PR also partially fixes and partially changes the behavior described in #1679 (from one unexpected behavior to another):

  • Locally committed binary files are now delivered as expected.

  • Locally committed large text files now cause dstack apply to fail (previously, such files would be delivered with no contents).

    $ dstack apply
    GitCommandError: Cmd('git') failed due to: exit code(128)
      cmdline: git diff --binary 623de5df3d3d13007661c7ee1a13a8d5197e734a
      stderr: 'fatal: unable to generate diff for test-file
    '
    $ ls -lh test-file 
    -rw-r--r--. 1 root root 1.1G Jun  1 14:08 test-file

Collaborator Author

@un-def un-def Jun 2, 2026

It seems that large ASCII files (sometimes) are detected as binary files.

Note that both produce Binary files ... differ without --binary. With --binary, however, the "truly" binary file is handled correctly despite being larger (2.0 GB vs 1.4 GB), while for the base64-encoded file Git fails with fatal: unable to generate diff. Weird.

dd if=/dev/urandom bs=1M count=1000 | base64 > blob.asc
dd if=/dev/urandom bs=1M count=2000 > blob.bin

ls -l blob.*
-rw-rw-r-- 1 def def 1.4G Jun  2 09:33 blob.asc
-rw-rw-r-- 1 def def 2.0G Jun  2 09:10 blob.bin

file blob.asc
blob.asc: ASCII text

git --no-pager diff --no-index -- /dev/null blob.asc
diff --git a/blob.asc b/blob.asc
new file mode 100644
index 0000000..f2d7c9e
Binary files /dev/null and b/blob.asc differ

git --no-pager diff --no-index --binary -- /dev/null blob.asc
diff --git a/blob.asc b/blob.asc
new file mode 100644
index 0000000..f2d7c9e
fatal: unable to generate diff for /dev/null

file blob.bin
blob.bin: data

git --no-pager diff --no-index -- /dev/null blob.bin
diff --git a/blob.bin b/blob.bin
new file mode 100644
index 0000000..f71bad7
Binary files /dev/null and b/blob.bin differ

git --no-pager diff --no-index --binary -- /dev/null blob.bin
diff --git a/blob.bin b/blob.bin
new file mode 100644
index 0000000000000000000000000000000000000000..f71bad7e725b891b8c0260707b8c34481f3da331
GIT binary patch
<omitted>

literal 0
HcmV?d00001

Another unexpected behavior still present:

  • Uncommitted (untracked) large text files are delivered empty (git diff fails, but its exit status is ignored because of ignore_status=True):

    dd if=/dev/random bs=1M count=1000 | base64 > blob.asc
    1000+0 records in
    1000+0 records out
    1048576000 bytes (1.0 GB, 1000 MiB) copied, 4.00187 s, 262 MB/s
    
    git --no-pager diff --no-index --binary -- /dev/null blob.asc
    diff --git a/blob.asc b/blob.asc
    new file mode 100644
    index 0000000..ce045ae
    fatal: unable to generate diff for /dev/null
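The silent failure above could be surfaced by checking the exit status instead of ignoring it. The following is a minimal sketch using `subprocess` rather than GitPython; `is_diff_failure` and `untracked_file_patch` are hypothetical names, not part of dstack. It relies on the fact that `git diff --no-index` implies `--exit-code`, so an exit status of 1 only means that differences were found.

```python
import subprocess
from pathlib import Path


def is_diff_failure(returncode: int) -> bool:
    """Classify the exit status of `git diff --no-index`.

    0 means the inputs are identical and 1 means differences were found;
    any other status (for example 128) is treated here as a real failure.
    """
    return returncode not in (0, 1)


def untracked_file_patch(repo_dir: Path, filename: str) -> bytes:
    """Return a binary-safe patch that adds `filename`, or raise if git fails.

    Hypothetical helper for illustration; dstack itself goes through GitPython.
    """
    proc = subprocess.run(
        ["git", "--no-pager", "diff", "--no-index", "--binary", "--", "/dev/null", filename],
        cwd=repo_dir,
        capture_output=True,
    )
    if is_diff_failure(proc.returncode):
        stderr = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(
            f"git diff failed for {filename!r} (exit code {proc.returncode}): {stderr}"
        )
    # Keep the patch as raw bytes; no decoding is needed to store or hash it.
    return proc.stdout
```

With a check like this, the `fatal: unable to generate diff` case would raise instead of producing an empty file on the remote side.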

@@ -58,7 +58,7 @@ class RemoteRepoInfo(
 class RemoteRunRepoData(RemoteRepoInfo):
     repo_branch: Optional[str] = None
     repo_hash: Optional[str] = None
-    repo_diff: Annotated[Optional[str], Field(exclude=True)] = None
+    repo_diff: Annotated[Optional[bytes], Field(exclude=True)] = None
Collaborator

(nit) Technically a breaking change for SDK users, since RemoteRunRepoData is stored in the public but undocumented run_repo_data attribute of the public RemoteRepo class

Collaborator Author

Good point, but I'd consider all undocumented fields as private
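For SDK code that does read the undocumented `repo_diff` value, one way to tolerate both the old `str` and the new `bytes` type is to normalize it at the point of use. This is only a sketch; `as_diff_bytes` is a hypothetical helper and not part of dstack.

```python
from typing import Optional, Union


def as_diff_bytes(diff: Optional[Union[str, bytes]]) -> Optional[bytes]:
    """Normalize a diff that may be str (older versions) or bytes (after this change)."""
    if diff is None:
        return None
    if isinstance(diff, str):
        # Mirrors the previous write path, which called .encode() (UTF-8 by default).
        return diff.encode()
    return diff
```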

     repo_config_name: Optional[str] = None
     repo_config_email: Optional[str] = None

@@ -183,13 +183,15 @@ def __init__(
     def has_code_to_write(self) -> bool:
         # repo_diff is:
         # * None for RemoteRepo.from_url()
-        # * an empty string for RemoteRepo.from_dir() if there are no changes ("clean" state)
-        # * a non-empty string for RemoteRepo.from_dir() if there are changes ("dirty" state)
+        # * empty bytes for RemoteRepo.from_dir() if there are no changes ("clean" state)
+        # and untracked files
+        # * non-empty bytes for RemoteRepo.from_dir() if there are changes ("dirty" state)
+        # and/or untracked files
         return bool(self.run_repo_data.repo_diff)

     def write_code_file(self, fp: BinaryIO) -> str:
         if self.run_repo_data.repo_diff is not None:
-            fp.write(self.run_repo_data.repo_diff.encode())
+            fp.write(self.run_repo_data.repo_diff)
         return get_sha256(fp)

     def get_repo_info(self) -> RemoteRepoInfo:
@@ -238,7 +240,7 @@ def __init__(self, warning_time: float, delay: float = 5):
         self.delay = delay
         self.warned = False
         self.start_time = time.monotonic()
-        self.buffer = io.StringIO()
+        self.buffer = io.BytesIO()

     def timeout(self):
         now = time.monotonic()
@@ -256,9 +258,9 @@ def timeout(self):
             )

     def write(self, v: bytes):
-        self.buffer.write(v.decode())
+        self.buffer.write(v)

-    def get(self) -> str:
+    def get(self) -> bytes:
         if self.warned:
             print()
         return self.buffer.getvalue()
@@ -366,10 +368,10 @@ def _interactive_git_proc(
             continue


-def _repo_diff_verbose(repo: git.Repo, repo_hash: str, warning_time: float = 5) -> str:
+def _repo_diff_verbose(repo: git.Repo, repo_hash: str, warning_time: float = 5) -> bytes:
     collector = _DiffCollector(warning_time)
     try:
-        _interactive_git_proc(repo.git.diff(repo_hash, as_process=True), collector)
+        _interactive_git_proc(repo.git.diff(repo_hash, binary=True, as_process=True), collector)
         for filename in repo.untracked_files:
             _interactive_git_proc(
                 repo.git.diff("/dev/null", filename, no_index=True, binary=True, as_process=True),
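One plausible reason a bytes buffer is more robust than decoding each chunk is that process output arrives in arbitrary slices, so a multi-byte UTF-8 character can be split across two reads. The sketch below illustrates that with a hypothetical `BytesCollector` class; it is a simplified stand-in, not the `_DiffCollector` from the diff, and it does not claim to be the motivation stated by the PR author.

```python
import io


class BytesCollector:
    """Hypothetical collector that stores chunks as-is and never decodes them."""

    def __init__(self) -> None:
        self.buffer = io.BytesIO()

    def write(self, chunk: bytes) -> None:
        self.buffer.write(chunk)

    def get(self) -> bytes:
        return self.buffer.getvalue()


def decodes_cleanly(chunk: bytes) -> bool:
    """Return True if the chunk is valid UTF-8 on its own."""
    try:
        chunk.decode()
    except UnicodeDecodeError:
        return False
    return True


# "é" occupies two bytes in UTF-8; a read from a pipe may split it in the middle.
encoded = "é\n".encode()
chunks = [encoded[:1], encoded[1:]]

collector = BytesCollector()
for chunk in chunks:
    collector.write(chunk)

# The reassembled bytes are intact even though neither chunk decodes by itself.
patch = collector.get()
```

Keeping the data as bytes end to end also means the value can be written to a binary file object and hashed without an encode/decode round trip.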