Add .tar.gz / .tgz archive support for extension, preset, and workflow installation#2395
Open
Conversation
…installation Agent-Logs-Url: https://github.com/github/spec-kit/sessions/9fb9a8ea-0967-4baf-b95c-7101e423ff58 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Copilot changed the title from [WIP] Add support for .tar.gz and .tgz archives to Add .tar.gz / .tgz archive support for extension, preset, and workflow installation on Apr 28, 2026
…mlink rejection test Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0469bac5-296a-46b6-b84e-eb33b0dc0fce Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Contributor
Author
Applied both changes in commit e0495eb:
Comment on lines +111 to +116:

```python
def _detect_archive_format(url: str, content_type: str = "") -> str:
    """Detect archive format from URL path extension or Content-Type header.

    Args:
        url: URL or file path to inspect.
        content_type: Optional ``Content-Type`` header value from the HTTP response.
```
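Based on that docstring and the format list in the PR description, a minimal sketch of such a detector might look like the following. The public name and the Content-Type set come from the PR description; the implementation details below are assumptions, not the PR's actual code.

```python
from urllib.parse import urlparse

# Content types treated as gzipped tarballs (per the PR description).
_GZIP_CONTENT_TYPES = {
    "application/gzip",
    "application/x-gzip",
    "application/x-tar+gzip",
}

def detect_archive_format(url: str, content_type: str = "") -> str:
    """Return "zip" or "tar.gz", or "" when the format cannot be determined."""
    path = urlparse(url).path.lower()
    if path.endswith(".zip"):
        return "zip"
    if path.endswith((".tar.gz", ".tgz")):
        return "tar.gz"
    # Fall back to the Content-Type header; strip parameters like "; charset=...".
    base_type = content_type.split(";", 1)[0].strip().lower()
    if base_type in _GZIP_CONTENT_TYPES:
        return "tar.gz"
    return ""
```

Returning an empty string for unknown formats matches the PR's stated behavior of rejecting unknowns with a clear error rather than defaulting to ZIP.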
Comment on lines +4926 to +4931:

```python
def _extract_workflow_yml(archive_path: Path, archive_fmt: str) -> bytes:
    """Extract ``workflow.yml`` from a ZIP or ``.tar.gz`` archive.

    Searches the archive root and a single nested top-level subdirectory
    (e.g., ``repo-name-1.0/workflow.yml``).
```
Collaborator
@copilot apply changes based on the comments in this thread
…add archive CLI tests Agent-Logs-Url: https://github.com/github/spec-kit/sessions/845e41d1-75e3-49fb-a580-a7fb805dd716 Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Contributor
Author
Applied both changes in commit 0a02369:
Contributor
Copilot's findings
Comments suppressed due to low confidence (1)
src/specify_cli/init.py:3669
`archive_path.write_bytes(archive_data)` can raise `OSError`/`IOError` (disk full, permission issues), but this block only handles `URLError`. That means `specify extension add --from ...` may crash with a traceback instead of emitting a controlled CLI error. Add `except OSError` (or broader I/O handling) around the write/install steps and surface a clear message before exiting.
```python
suffix = ".tar.gz" if archive_fmt == "tar.gz" else ".zip"
safe_name = Path(extension).name or "extension"
archive_path = download_dir / f"{safe_name}-url-download{suffix}"
archive_path.write_bytes(archive_data)
```
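One way to apply the suggested hardening. The function name and error text are illustrative, not the PR's; the pattern is simply wrapping the write in `except OSError` and exiting cleanly.

```python
import sys
from pathlib import Path

def write_archive_or_exit(archive_path: Path, archive_data: bytes) -> None:
    """Write the downloaded archive, converting I/O failures into a clean CLI error."""
    try:
        archive_path.write_bytes(archive_data)
    except OSError as exc:  # IOError has been an alias of OSError since Python 3.3
        print(f"Error: could not write {archive_path}: {exc}", file=sys.stderr)
        raise SystemExit(1)
```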
- Files reviewed: 6/6 changed files
- Comments generated: 2
```python
if f is not None:
    with f:
        manifest_data = yaml.safe_load(f.read()) or {}
except KeyError:
```
Comment on lines +2653 to +2654:

```python
archive_path.write_bytes(archive_data)
manifest = manager.install_from_zip(archive_path, speckit_version, priority)
```
- Fix tar.gz `extractfile()` None fallback in extension_update: nested-directory search now runs whenever `manifest_data` is still None, not only on KeyError
- Add OSError handling around `write_bytes` in preset --from URL path
- Add OSError handling around `write_bytes` in extension --from URL path
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Comment on lines +204 to +208:

```python
# Only allow regular files and directories.
if not (member.isreg() or member.isdir()):
    raise error_class(
        f"Non-regular file in archive: {member.name}"
    )
```
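A self-contained sketch of member vetting in the spirit of that snippet. Only the non-regular-file check is quoted from the diff; the traversal rule and the helper name are assumptions based on the PR description's list of rejected member kinds.

```python
import tarfile
from pathlib import PurePosixPath

def vet_member(member: tarfile.TarInfo, error_class=ValueError) -> None:
    """Reject tar members that are unsafe to extract."""
    # Only allow regular files and directories (no symlinks, hard links,
    # devices, or FIFOs).
    if not (member.isreg() or member.isdir()):
        raise error_class(f"Non-regular file in archive: {member.name}")
    # Reject absolute paths and ".." components that would escape dest_dir.
    if member.name.startswith("/") or ".." in PurePosixPath(member.name).parts:
        raise error_class(f"Unsafe path in archive: {member.name}")
```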
Comment on lines +1891 to +1893:

```python
with __import__("unittest.mock", fromlist=["patch"]).patch.object(
    __import__("pathlib", fromlist=["Path"]).Path, "cwd", return_value=project_dir
):
```
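The inline `__import__` construction above can be written with ordinary imports. An equivalent, more readable form (the `project_dir` value is assumed to come from the surrounding test fixture):

```python
from pathlib import Path
from unittest.mock import patch

def run_with_cwd(project_dir: Path) -> Path:
    # Patch Path.cwd for the duration of the block, exactly as the
    # dunder-import version does, then return the (patched) cwd.
    with patch.object(Path, "cwd", return_value=project_dir):
        return Path.cwd()
```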
Collaborator
@copilot apply changes based on the comments in this thread
…ock imports in workflow tests Agent-Logs-Url: https://github.com/github/spec-kit/sessions/c1fcc1ff-8766-4d97-90a5-368447980acf Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Contributor
Author
Applied both changes in commit 05798a9:
Contributor
Copilot's findings
Comments suppressed due to low confidence (2)
src/specify_cli/extensions.py:231
- In the Python 3.12+ branch, `safe_extract_tarball()` validates members into `safe_members` (and intentionally skips PAX/GNU metadata entries), but then calls `tf.extractall(dest_dir, filter="data")` without restricting extraction to `safe_members`. This makes extraction behavior inconsistent across Python versions and can reintroduce metadata entries (or other unexpected members) into the extraction step. Pass `members=safe_members` in the 3.12+ branch as well (and keep the `filter="data"` hardening), or extract only the validated members explicitly.
```python
# Extract — use the "data" filter on Python 3.12+ for extra hardening.
# On older versions pass only the pre-validated members so that no
# unvetted entry (added concurrently or via a race) slips through.
if sys.version_info >= (3, 12):
    tf.extractall(dest_dir, filter="data")  # type: ignore[call-arg]
else:
    tf.extractall(dest_dir, members=safe_members)  # noqa: S202 — validated above
```
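One way to implement the reviewer's suggestion so that both branches extract only the vetted members (a sketch under that assumption, not the PR's final code):

```python
import sys
import tarfile
from pathlib import Path

def extract_validated(tf: tarfile.TarFile, dest_dir: Path, safe_members) -> None:
    # Restrict extraction to the pre-validated members on every version,
    # and keep the "data" filter hardening where it is available (3.12+).
    if sys.version_info >= (3, 12):
        tf.extractall(dest_dir, members=safe_members, filter="data")
    else:
        tf.extractall(dest_dir, members=safe_members)  # noqa: S202
```

`TarFile.extractall()` accepts `members` and `filter` together, so the 3.12+ branch keeps both protections without diverging from older versions.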
src/specify_cli/init.py:3663
`extension add --from` detects the archive format from the original URL and optional `Content-Type`, but does not validate the final URL after redirects. An HTTPS URL can redirect to non-HTTPS, and format detection can be wrong if the redirect changes the effective filename/extension. Consider validating `response.geturl()` against the HTTPS/localhost policy and running `detect_archive_format()` against that final URL.

```python
with urllib.request.urlopen(from_url, timeout=60) as response:
    if not archive_fmt:
        content_type = response.headers.get("Content-Type", "")
        archive_fmt = detect_archive_format(from_url, content_type)
    archive_data = response.read()
```
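A small policy check implementing that suggestion might look like this. The exact allowed hosts are an assumption; the idea is to run it on `response.geturl()` before trusting the body, and then re-run `detect_archive_format()` on the final URL.

```python
from urllib.parse import urlparse

def check_final_url(final_url: str) -> None:
    """Reject a post-redirect URL that escapes an HTTPS/localhost policy."""
    parsed = urlparse(final_url)
    if parsed.scheme == "https":
        return
    if parsed.scheme == "http" and parsed.hostname in ("localhost", "127.0.0.1"):
        return
    raise ValueError(f"Redirect target violates download policy: {final_url}")
```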
- Files reviewed: 6/6 changed files
- Comments generated: 4
Comment on lines +176 to +181:

```python
_TAR_METADATA_TYPES = (
    tarfile.XHDTYPE,          # PAX extended header
    tarfile.XGLTYPE,          # PAX global extended header
    tarfile.SOLARIS_XHDTYPE,  # Solaris PAX extended header
    *tarfile.GNU_TYPES,       # GNU longname / longlink / sparse
)
```
Comment on lines +2635 to +2640 (the excerpt interleaves removed and added lines from the diff view):

```python
archive_fmt = _det_fmt(from_url)
try:
    with urllib.request.urlopen(from_url, timeout=60) as response:
        zip_path.write_bytes(response.read())
        if not archive_fmt:
            content_type = response.headers.get("Content-Type", "")
            archive_fmt = _det_fmt(from_url, content_type)
```
Comment on lines +2155 to +2164 (the excerpt interleaves removed and added lines from the diff view):

```python
# Detect archive format from URL; resolve via Content-Type when needed.
archive_fmt = detect_archive_format(download_url)

# Download the archive
try:
    with self._open_url(download_url, timeout=60) as response:
        zip_data = response.read()
        zip_path.write_bytes(zip_data)
        return zip_path
        if not archive_fmt:
            content_type = response.headers.get("Content-Type", "")
            archive_fmt = detect_archive_format(download_url, content_type)
        archive_data = response.read()
```
Comment on lines +2316 to +2324 (the excerpt interleaves removed and added lines from the diff view):

```python
# Detect archive format from URL; resolve via Content-Type when needed.
archive_fmt = detect_archive_format(download_url)

try:
    with self._open_url(download_url, timeout=60) as response:
        zip_data = response.read()
        zip_path.write_bytes(zip_data)
        return zip_path
        if not archive_fmt:
            content_type = response.headers.get("Content-Type", "")
            archive_fmt = detect_archive_format(download_url, content_type)
        archive_data = response.read()
```
Collaborator
@copilot apply changes based on the comments in this thread
The extension, preset, and workflow download pipelines previously accepted only ZIP archives, which blocked npm registries and CI artifact stores that serve tarballs natively.
Core utilities (`extensions.py`)

- `detect_archive_format(url, content_type="")` — infers the format from the URL path extension (`.zip`, `.tar.gz`, `.tgz`), with a `Content-Type` header fallback (`application/gzip`, `application/x-gzip`, `application/x-tar+gzip`)
- `safe_extract_tarball(archive_path, dest_dir, error_class)` — safe extraction that:
  - rejects `..` traversal, symlinks, hard links, devices, and FIFOs
  - silently skips PAX metadata entries (`XHDTYPE`, `XGLTYPE`, `SOLARIS_XHDTYPE`) and GNU metadata entries (`GNU_TYPES`: longname/longlink/sparse) — they carry no extractable payload and are emitted by many common archiving tools
  - passes the vetted `safe_members` list to `extractall()`
  - uses `tarfile.data_filter` on Python 3.12+ for additional OS-level protection
  - catches `tarfile.TarError`/`OSError` and re-raises them as the caller-supplied `error_class` for consistent error handling

Both helpers are public (no underscore prefix) and imported directly by `presets.py` and `__init__.py`.

Extensions & presets

- `install_from_zip()` on both managers now detects the archive format from the file extension and dispatches to ZIP or tarball extraction accordingly — existing callers are unaffected
- `download_extension()` / `download_pack()` detect the format from the download URL (or a `Content-Type` fallback) and persist the archive with the correct extension (`.zip` or `.tar.gz`); unknown formats are rejected with a clear error rather than silently defaulting to ZIP

`__init__.py` call sites

- `extension add --from`, `preset add --from`, `extension update`: filenames derived via `Path(extension).name` to prevent path traversal
- `workflow add` (URL): extracts `workflow.yml` from the archive when the URL points to one; temp-file paths are initialized before the write to avoid `UnboundLocalError` on disk-full
- `workflow add` (local): accepts `.tar.gz`/`.tgz`/`.zip` archive files (case-insensitive detection)
- `workflow add` (catalog)

A shared `_extract_workflow_yml(archive_path, fmt)` helper handles root-level and single-nested-directory layouts for both formats, with `tarfile.extractfile()` handles properly closed via context managers.

Tests
30 new tests across `test_extensions.py`, `test_presets.py`, and `test_workflows.py`, covering:

- `test_extensions.py`: format detection (URL + Content-Type), flat and nested tarball install, missing manifest errors, path traversal rejection, symlink rejection
- `test_presets.py`: flat and nested tarball install, missing manifest errors, path traversal rejection, symlink rejection
- `test_workflows.py` (`TestWorkflowAddArchive`, 9 CLI-level tests): local ZIP (flat/nested), local `.tar.gz` (flat/nested), `.tgz` alias, missing `workflow.yml` error cases, URL-based archive download for both ZIP and tar.gz formats
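Rejection tests like those listed above need malicious fixtures, and `tarfile` can synthesize them without touching the filesystem. A sketch of two such fixture builders (the helper names are illustrative, not from the PR's test suites):

```python
import tarfile
from pathlib import Path

def make_symlink_tarball(path: Path) -> None:
    # Sole member is a symlink pointing outside any extraction root.
    info = tarfile.TarInfo("evil-link")
    info.type = tarfile.SYMTYPE
    info.linkname = "/etc/passwd"
    with tarfile.open(path, "w:gz") as tf:
        tf.addfile(info)

def make_traversal_tarball(path: Path) -> None:
    # Sole member is an empty regular file whose name climbs out via "..".
    info = tarfile.TarInfo("../outside.txt")
    with tarfile.open(path, "w:gz") as tf:
        tf.addfile(info)
```

A test would feed such an archive to `safe_extract_tarball()` and assert that the caller-supplied `error_class` is raised.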