Handle repo patch with non-UTF8 sequences #3918
```diff
@@ -58,7 +58,7 @@ class RemoteRepoInfo(
 class RemoteRunRepoData(RemoteRepoInfo):
     repo_branch: Optional[str] = None
     repo_hash: Optional[str] = None
-    repo_diff: Annotated[Optional[str], Field(exclude=True)] = None
+    repo_diff: Annotated[Optional[bytes], Field(exclude=True)] = None
```
Collaborator: (nit) Technically a breaking change for SDK users, since …

Collaborator (Author): Good point, but I'd consider all undocumented fields private.
```diff
     repo_config_name: Optional[str] = None
     repo_config_email: Optional[str] = None
```
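Why `bytes` rather than `str`: a patch can legitimately contain byte sequences that are not valid UTF-8 (for example, a source file saved as Latin-1), and decoding such output with the default UTF-8 codec raises `UnicodeDecodeError`. The minimal sketch below illustrates this; the sample patches and the `decodes_as_utf8` helper are hypothetical and not part of dstack.

```python
# Hypothetical patch fragment: "café" saved as Latin-1, so "é" is the single byte 0xE9,
# which is not a valid UTF-8 sequence on its own.
latin1_patch = b"+print('caf\xe9')\n"
utf8_patch = "+print('café')\n".encode("utf-8")


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if `data` can be decoded with the default (UTF-8) codec."""
    try:
        data.decode()
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(decodes_as_utf8(utf8_patch))    # True
    print(decodes_as_utf8(latin1_patch))  # False: a str-typed field cannot carry this patch as-is
```

Keeping the patch as raw bytes end to end avoids having to pick an encoding at all.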
```diff
@@ -183,13 +183,15 @@ def __init__(
     def has_code_to_write(self) -> bool:
         # repo_diff is:
         # * None for RemoteRepo.from_url()
-        # * an empty string for RemoteRepo.from_dir() if there are no changes ("clean" state)
-        # * a non-empty string for RemoteRepo.from_dir() if there are changes ("dirty" state)
+        # * empty bytes for RemoteRepo.from_dir() if there are no changes ("clean" state)
+        # and untracked files
+        # * non-empty bytes for RemoteRepo.from_dir() if there are changes ("dirty" state)
+        # and/or untracked files
         return bool(self.run_repo_data.repo_diff)

     def write_code_file(self, fp: BinaryIO) -> str:
         if self.run_repo_data.repo_diff is not None:
-            fp.write(self.run_repo_data.repo_diff.encode())
+            fp.write(self.run_repo_data.repo_diff)
         return get_sha256(fp)

     def get_repo_info(self) -> RemoteRepoInfo:
```
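The contract documented in `has_code_to_write()` and the bytes-only `write_code_file()` can be exercised in isolation. The sketch below uses free functions instead of `RemoteRepo` methods, and `get_sha256` here is a hypothetical stand-in (the diff does not show dstack's implementation); it assumes the helper hashes everything written to `fp`.

```python
import hashlib
import io
from typing import BinaryIO, Optional


def has_code_to_write(repo_diff: Optional[bytes]) -> bool:
    # None (from_url) and b"" (clean from_dir) are both falsy; only a non-empty patch counts.
    return bool(repo_diff)


def get_sha256(fp: BinaryIO) -> str:
    # Hypothetical stand-in for dstack's helper: rewind and hash the whole stream.
    fp.seek(0)
    digest = hashlib.sha256()
    for chunk in iter(lambda: fp.read(64 * 1024), b""):
        digest.update(chunk)
    return digest.hexdigest()


def write_code_file(repo_diff: Optional[bytes], fp: BinaryIO) -> str:
    if repo_diff is not None:
        fp.write(repo_diff)  # raw bytes: no .encode(), so non-UTF-8 content is written unchanged
    return get_sha256(fp)


if __name__ == "__main__":
    patch = b"+print('caf\xe9')\n"  # not valid UTF-8
    print([has_code_to_write(d) for d in (None, b"", patch)])  # [False, False, True]
    print(write_code_file(patch, io.BytesIO()) == hashlib.sha256(patch).hexdigest())  # True
```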
```diff
@@ -238,7 +240,7 @@ def __init__(self, warning_time: float, delay: float = 5):
         self.delay = delay
         self.warned = False
         self.start_time = time.monotonic()
-        self.buffer = io.StringIO()
+        self.buffer = io.BytesIO()

     def timeout(self):
         now = time.monotonic()
```
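Switching the buffer from `io.StringIO` to `io.BytesIO` is what lets the collector accept raw chunks: a text buffer rejects `bytes` outright, while a binary buffer stores them unchanged. A minimal illustration using only the standard library (`accepts_bytes` is a hypothetical helper):

```python
import io


def accepts_bytes(buffer: io.IOBase, chunk: bytes) -> bool:
    """Return True if `buffer.write(chunk)` succeeds."""
    try:
        buffer.write(chunk)
    except TypeError:  # io.StringIO.write() only accepts str
        return False
    return True


if __name__ == "__main__":
    chunk = b"+caf\xe9\n"
    print(accepts_bytes(io.StringIO(), chunk))  # False
    binary = io.BytesIO()
    print(accepts_bytes(binary, chunk), binary.getvalue() == chunk)  # True True
```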
```diff
@@ -256,9 +258,9 @@ def timeout(self):
             )

     def write(self, v: bytes):
-        self.buffer.write(v.decode())
+        self.buffer.write(v)

-    def get(self) -> str:
+    def get(self) -> bytes:
         if self.warned:
             print()
         return self.buffer.getvalue()
```
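Decoding inside `write()` was fragile even for valid UTF-8: if the process output arrives in arbitrary-sized chunks (an assumption; the diff does not show how `_interactive_git_proc` reads), a multi-byte character can straddle two chunks, and decoding each chunk on its own fails. `ByteCollector` below is a hypothetical, simplified stand-in for `_DiffCollector` without the timeout logic.

```python
import io
from typing import List


class ByteCollector:
    """Simplified stand-in for _DiffCollector: buffers raw bytes, decodes nothing."""

    def __init__(self) -> None:
        self.buffer = io.BytesIO()

    def write(self, v: bytes) -> None:
        self.buffer.write(v)

    def get(self) -> bytes:
        return self.buffer.getvalue()


def decode_per_chunk(chunks: List[bytes]) -> str:
    # Mirrors the old behaviour: every chunk is decoded independently.
    return "".join(chunk.decode() for chunk in chunks)


if __name__ == "__main__":
    data = "+café\n".encode()      # "é" is the two bytes 0xC3 0xA9
    chunks = [data[:5], data[5:]]  # split between 0xC3 and 0xA9
    collector = ByteCollector()
    for chunk in chunks:
        collector.write(chunk)
    print(collector.get() == data)  # True
    try:
        decode_per_chunk(chunks)
    except UnicodeDecodeError as exc:
        print("per-chunk decode failed:", exc.reason)
```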
```diff
@@ -366,10 +368,10 @@ def _interactive_git_proc(
             continue


-def _repo_diff_verbose(repo: git.Repo, repo_hash: str, warning_time: float = 5) -> str:
+def _repo_diff_verbose(repo: git.Repo, repo_hash: str, warning_time: float = 5) -> bytes:
     collector = _DiffCollector(warning_time)
     try:
-        _interactive_git_proc(repo.git.diff(repo_hash, as_process=True), collector)
+        _interactive_git_proc(repo.git.diff(repo_hash, binary=True, as_process=True), collector)
         for filename in repo.untracked_files:
             _interactive_git_proc(
                 repo.git.diff("/dev/null", filename, no_index=True, binary=True, as_process=True),
```
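For comparison, the same idea without GitPython: run the diff command, keep stdout as raw bytes, and decide explicitly which exit statuses are acceptable. `git diff --no-index` (used above for untracked files) exits with status 1 when the two paths differ, so 1 cannot simply be treated as failure, while Git's `fatal:` errors exit with 128. `run_diff` is a hypothetical helper, not part of dstack or GitPython; it uses plain `subprocess`.

```python
import subprocess
from typing import Collection, Sequence


def run_diff(cmd: Sequence[str], ok_statuses: Collection[int] = (0, 1)) -> bytes:
    """Run a diff command and return its stdout as raw bytes (never decoded).

    Exit status 1 is accepted by default because `git diff --no-index` uses it
    to signal "paths differ"; any other status (e.g. 128 for a Git `fatal:` error)
    raises instead of silently yielding an empty or partial patch.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    if proc.returncode not in ok_statuses:
        message = proc.stderr.decode(errors="replace").strip()
        raise RuntimeError(f"{cmd[0]} exited with status {proc.returncode}: {message}")
    return proc.stdout
```

Usage would look like `run_diff(["git", "diff", "--binary", "--no-index", "/dev/null", path])` for an untracked `path`; only stderr is decoded, and with `errors="replace"`, so the patch itself is never touched.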
It looks like the PR also partially fixes, and partially changes (from one unexpected behavior to another), the behavior described in #1679.
Locally committed binary files are now delivered as expected.
Locally committed large text files now cause `dstack apply` to fail (previously, such files were delivered with no contents).
It seems that large ASCII files are (sometimes) detected as binary files.
Note that both produce `Binary files ... differ` without `--binary`. With `--binary`, however, a "truly" binary file is handled correctly despite being larger (2GB vs 1.4GB), while for a base64-encoded file Git fails with `fatal: unable to generate diff`. Weird.

Another unexpected behavior is still present: uncommitted (untracked) large text files are empty (`git diff` fails, but its exit status is ignored due to `ignore_status=True`).
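One plausible explanation for large text files being reported as binary is Git's size cutoff: files above `core.bigFileThreshold` (512 MiB by default) are treated as binary by `git diff`, and below that size Git's content check looks for a NUL byte in roughly the first 8000 bytes. The function below is a rough, hypothetical approximation of those two rules for experimentation only; it ignores `.gitattributes` and is not Git's actual implementation, so check the `git-config` documentation for the Git version in use.

```python
# Assumed defaults; both approximate Git's behaviour and are not taken from its code.
BIG_FILE_THRESHOLD = 512 * 1024 * 1024  # core.bigFileThreshold default: 512 MiB
FIRST_FEW_BYTES = 8000                  # how much of the content is scanned for NUL


def looks_binary_to_git(head: bytes, size: int, threshold: int = BIG_FILE_THRESHOLD) -> bool:
    """Guess whether `git diff` would print "Binary files ... differ" for a file.

    `head` is the beginning of the file's content and `size` its total size in bytes.
    """
    if size > threshold:
        return True  # "big" files are treated as binary regardless of content
    return b"\0" in head[:FIRST_FEW_BYTES]
```

Under these assumptions, both the 1.4GB and the 2GB files mentioned above exceed the default threshold, which would be consistent with both producing `Binary files ... differ` without `--binary`.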