Add .tar.gz / .tgz archive support for extension, preset, and workflow installation#2395

Open
Copilot wants to merge 15 commits into main from copilot/add-tar-gz-support
Conversation

Contributor

Copilot AI commented Apr 28, 2026

The extension, preset, and workflow download pipelines only accepted ZIP archives, blocking use of npm registries and CI artifact stores that serve tarballs natively.

Core utilities (extensions.py)

  • detect_archive_format(url, content_type="") — infers format from URL path extension (.zip, .tar.gz, .tgz) with Content-Type header fallback (application/gzip, application/x-gzip, application/x-tar+gzip)
  • safe_extract_tarball(archive_path, dest_dir, error_class) — safe extraction with:
    • Pre-extraction validation of all members: rejects absolute paths, .. traversal, symlinks, hard links, devices, and FIFOs
    • PAX headers (XHDTYPE, XGLTYPE, SOLARIS_XHDTYPE) and GNU metadata entries (GNU_TYPES: longname/longlink/sparse) are silently skipped — they carry no extractable payload and are emitted by many common archiving tools
    • Python 3.11: passes pre-validated safe_members list to extractall()
    • Python 3.12+: uses tarfile.data_filter for additional OS-level protection
    • tarfile.TarError/OSError are caught and re-raised as the caller-supplied error_class for consistent error handling

Both helpers are public (no underscore prefix) and imported directly by presets.py and __init__.py.
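
Going by the detection rules described above, the format inference can be sketched like this (illustrative only; the real helper in extensions.py may differ in detail):

```python
from urllib.parse import urlparse

# Content-Type values listed in the PR description as gzip indicators.
_GZIP_CONTENT_TYPES = {
    "application/gzip",
    "application/x-gzip",
    "application/x-tar+gzip",
}

def detect_archive_format(url: str, content_type: str = "") -> str:
    """Return "zip" or "tar.gz", or "" when the format is unknown."""
    # Inspect only the URL path so query strings don't confuse the check.
    path = urlparse(url).path.lower()
    if path.endswith(".zip"):
        return "zip"
    if path.endswith((".tar.gz", ".tgz")):
        return "tar.gz"
    # Fall back to the Content-Type header for extension-less URLs.
    base = content_type.split(";", 1)[0].strip().lower()
    if base in _GZIP_CONTENT_TYPES:
        return "tar.gz"
    return ""
```

Returning an empty string for unknown formats matches the call sites shown later in the review (`if not archive_fmt:` re-runs detection with the response's Content-Type).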

Extensions & presets

  • install_from_zip() on both managers now detects archive format from the file extension and dispatches to ZIP or tarball extraction accordingly — existing callers are unaffected
  • download_extension() / download_pack() detect format from the download URL (or Content-Type fallback) and persist the archive with the correct extension (.zip or .tar.gz); unknown formats are rejected with a clear error rather than silently defaulting to ZIP

__init__.py call sites

  • extension add --from: detects format from URL/Content-Type before saving; unknown formats rejected
  • preset add --from: same
  • extension update: inline manifest peek handles both ZIP and tar.gz; cache filename sanitized via Path(extension).name to prevent path traversal
  • workflow add (URL): extracts workflow.yml from the archive when the URL points to one; temp-file paths initialized before the write to avoid UnboundLocalError on disk-full
  • workflow add (local): accepts local .tar.gz/.tgz/.zip archive files (case-insensitive detection)
  • workflow add (catalog): same archive detection for catalog-sourced URLs

A shared _extract_workflow_yml(archive_path, fmt) helper handles both root-level and single-nested-directory layouts for both formats; tarfile.extractfile() handles are properly closed via context managers.

Tests

30 new tests across test_extensions.py, test_presets.py, and test_workflows.py covering:

  • test_extensions.py: format detection (URL + Content-Type), flat and nested tarball install, missing manifest errors, path traversal rejection, symlink rejection
  • test_presets.py: flat and nested tarball install, missing manifest errors, path traversal rejection, symlink rejection
  • test_workflows.py (TestWorkflowAddArchive, 9 CLI-level tests): local ZIP (flat/nested), local .tar.gz (flat/nested), .tgz alias, missing workflow.yml error cases, URL-based archive download for both ZIP and tar.gz formats
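
Negative tests like the path-traversal and symlink rejections above typically need a fixture that builds a malicious tarball on the fly; a minimal sketch (hypothetical helper, not from the PR):

```python
import io
import tarfile
import tempfile
from pathlib import Path

def make_tarball_with_member(member_name: str, data: bytes = b"x") -> Path:
    """Build a .tar.gz containing one arbitrarily named member."""
    path = Path(tempfile.mkdtemp()) / "fixture.tar.gz"
    with tarfile.open(path, "w:gz") as tf:
        info = tarfile.TarInfo(member_name)
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return path
```

A test can then assert that feeding `make_tarball_with_member("../../escape.txt")` to the install path raises the expected validation error.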

Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:04
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:06
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:09
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:12
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:14
Copilot AI changed the title from "[WIP] Add support for .tar.gz and .tgz archives" to "Add .tar.gz / .tgz archive support for extension, preset, and workflow installation" Apr 28, 2026
Copilot AI requested a review from mnriem April 28, 2026 18:16
Comment thread src/specify_cli/__init__.py Fixed
@mnriem mnriem marked this pull request as ready for review May 5, 2026 22:58
@mnriem mnriem requested review from Copilot and removed request for Copilot May 5, 2026 22:58
…mlink rejection test

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/0469bac5-296a-46b6-b84e-eb33b0dc0fce

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Contributor Author

Copilot AI commented May 6, 2026

@copilot apply changes based on the comments in this thread

Applied both changes in commit e0495eb:

  • arc_tmp_path unbound local (__init__.py:5082, 5130, 5234): In all three workflow add code paths, the archive path variable is now initialized to None and assigned before the write() call, then the finally guard uses if arc_tmp_path is not None: ...unlink() to ensure safe cleanup even when the write fails.
  • Missing preset symlink test (test_presets.py): Added test_install_from_tar_gz_rejects_symlinks to TestPresetManager, mirroring the extension-side test — verifies that install_from_zip() raises PresetValidationError with "Symlinks" when the tarball contains a symlink entry.
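
The first fix above follows a standard init-before-write pattern, which can be sketched like this (function name and temp-file handling are illustrative):

```python
import os
import tempfile
from pathlib import Path

def stage_archive(archive_data: bytes) -> bytes:
    # Initialize the path variable before any write so the finally block
    # can guard on None instead of hitting an UnboundLocalError when
    # write_bytes() fails (e.g. on a full disk).
    arc_tmp_path = None
    try:
        fd, tmp_name = tempfile.mkstemp(suffix=".tar.gz")
        os.close(fd)
        arc_tmp_path = Path(tmp_name)
        arc_tmp_path.write_bytes(archive_data)  # may raise OSError
        return arc_tmp_path.read_bytes()
    finally:
        if arc_tmp_path is not None:
            arc_tmp_path.unlink(missing_ok=True)
```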

Contributor

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 2

Comment thread src/specify_cli/extensions.py Outdated
Comment on lines +111 to +116
def _detect_archive_format(url: str, content_type: str = "") -> str:
    """Detect archive format from URL path extension or Content-Type header.

    Args:
        url: URL or file path to inspect.
        content_type: Optional ``Content-Type`` header value from the HTTP response.
Comment on lines +4926 to +4931
def _extract_workflow_yml(archive_path: Path, archive_fmt: str) -> bytes:
    """Extract ``workflow.yml`` from a ZIP or ``.tar.gz`` archive.

    Searches the archive root and a single nested top-level subdirectory
    (e.g., ``repo-name-1.0/workflow.yml``).

@mnriem
Collaborator

mnriem commented May 6, 2026

@copilot apply changes based on the comments in this thread

…add archive CLI tests

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/845e41d1-75e3-49fb-a580-a7fb805dd716

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Contributor Author

Copilot AI commented May 6, 2026

@copilot apply changes based on the comments in this thread

Applied both changes in commit 0a02369:

  • Public API (extensions.py): Renamed _detect_archive_format to detect_archive_format and _safe_extract_tarball to safe_extract_tarball (dropped the underscore prefix). All callers in extensions.py, presets.py, __init__.py, and tests/test_extensions.py have been updated accordingly.
  • CLI-level tests (test_workflows.py): Added TestWorkflowAddArchive with 9 tests covering local ZIP (flat/nested), local .tar.gz (flat/nested), .tgz alias, missing workflow.yml error cases, and URL-based download for both ZIP and tar.gz formats.

Contributor

Copilot AI left a comment


Copilot's findings

Comments suppressed due to low confidence (1)

src/specify_cli/__init__.py:3669

  • archive_path.write_bytes(archive_data) can raise OSError/IOError (disk full, permission issues), but this block only handles URLError. That means specify extension add --from ... may crash with a traceback instead of emitting a controlled CLI error. Add except OSError (or broader I/O handling) around the write/install steps and surface a clear message before exiting.
                    suffix = ".tar.gz" if archive_fmt == "tar.gz" else ".zip"
                    safe_name = Path(extension).name or "extension"
                    archive_path = download_dir / f"{safe_name}-url-download{suffix}"
                    archive_path.write_bytes(archive_data)
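
A hardening along the lines the reviewer suggests could look like this (function name and error message are illustrative, not taken from the PR):

```python
from pathlib import Path

def write_archive_or_exit(archive_path: Path, archive_data: bytes) -> None:
    try:
        archive_path.write_bytes(archive_data)
    except OSError as exc:
        # Disk-full / permission errors become a clean CLI failure
        # instead of an unhandled traceback.
        raise SystemExit(f"Error: failed to write {archive_path.name}: {exc}")
```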

  • Files reviewed: 6/6 changed files
  • Comments generated: 2

        if f is not None:
            with f:
                manifest_data = yaml.safe_load(f.read()) or {}
    except KeyError:
Comment thread src/specify_cli/__init__.py Outdated
Comment on lines +2653 to +2654
archive_path.write_bytes(archive_data)
manifest = manager.install_from_zip(archive_path, speckit_version, priority)
- Fix tar.gz extractfile() None fallback in extension_update: nested-directory
  search now runs whenever manifest_data is still None, not only on KeyError
- Add OSError handling around write_bytes in preset --from URL path
- Add OSError handling around write_bytes in extension --from URL path
Comment thread src/specify_cli/__init__.py Fixed
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 7, 2026 16:39
Contributor

Copilot AI left a comment


Copilot's findings

  • Files reviewed: 6/6 changed files
  • Comments generated: 2

Comment thread src/specify_cli/extensions.py Outdated
Comment on lines +204 to +208
# Only allow regular files and directories.
if not (member.isreg() or member.isdir()):
    raise error_class(
        f"Non-regular file in archive: {member.name}"
    )
Comment thread tests/test_workflows.py Outdated
Comment on lines +1891 to +1893
with __import__("unittest.mock", fromlist=["patch"]).patch.object(
    __import__("pathlib", fromlist=["Path"]).Path, "cwd", return_value=project_dir
):
@mnriem
Collaborator

mnriem commented May 7, 2026

@copilot apply changes based on the comments in this thread

…ock imports in workflow tests

Agent-Logs-Url: https://github.com/github/spec-kit/sessions/c1fcc1ff-8766-4d97-90a5-368447980acf

Co-authored-by: mnriem <15701806+mnriem@users.noreply.github.com>
Contributor Author

Copilot AI commented May 7, 2026

@copilot apply changes based on the comments in this thread

Applied both changes in commit 05798a9:

  • safe_extract_tarball() metadata members: PAX headers (XHDTYPE, XGLTYPE, SOLARIS_XHDTYPE) and GNU metadata entries (GNU_TYPES: longname/longlink/sparse) are now skipped instead of rejected. The _TAR_METADATA_TYPES tuple is defined once before the loop for efficiency. Symlinks, hard links, devices, and FIFOs are still rejected; only regular files and directories are extracted.
  • Test imports: Replaced the __import__(...) inline pattern in all 9 TestWorkflowAddArchive tests with from unittest.mock import patch / patch.object(Path, "cwd", ...) — consistent with the rest of the file.

Contributor

Copilot AI left a comment


Copilot's findings

Comments suppressed due to low confidence (2)

src/specify_cli/extensions.py:231

  • In the Python 3.12+ branch, safe_extract_tarball() validates members into safe_members (and intentionally skips PAX/GNU metadata entries), but then calls tf.extractall(dest_dir, filter="data") without restricting extraction to safe_members. This makes extraction behavior inconsistent across Python versions and can reintroduce metadata entries (or other unexpected members) into the extraction step. Pass members=safe_members in the 3.12+ branch as well (and keep the filter="data" hardening), or extract only the validated members explicitly.
            # Extract — use the "data" filter on Python 3.12+ for extra hardening.
            # On older versions pass only the pre-validated members so that no
            # unvetted entry (added concurrently or via a race) slips through.
            if sys.version_info >= (3, 12):
                tf.extractall(dest_dir, filter="data")  # type: ignore[call-arg]
            else:
                tf.extractall(dest_dir, members=safe_members)  # noqa: S202 — validated above
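
A fix along the lines the reviewer suggests (a sketch, not the merged code) keeps both branches restricted to the pre-validated members and layers the "data" filter on top where available:

```python
import sys
import tarfile
from pathlib import Path

def extract_validated(tf: tarfile.TarFile, dest_dir: Path, safe_members) -> None:
    # Always restrict extraction to the pre-validated members so behavior
    # is consistent across Python versions; on 3.12+ add the "data"
    # filter for OS-level hardening on top.
    if sys.version_info >= (3, 12):
        tf.extractall(dest_dir, members=safe_members, filter="data")
    else:
        tf.extractall(dest_dir, members=safe_members)
```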

src/specify_cli/__init__.py:3663

  • extension add --from detects archive format from the original URL and optional Content-Type, but does not validate the final URL after redirects. An HTTPS URL can redirect to non-HTTPS, and format detection can be wrong if the redirect changes the effective filename/extension. Consider validating response.geturl() against the HTTPS/localhost policy and running detect_archive_format() against that final URL.
                    with urllib.request.urlopen(from_url, timeout=60) as response:
                        if not archive_fmt:
                            content_type = response.headers.get("Content-Type", "")
                            archive_fmt = detect_archive_format(from_url, content_type)
                        archive_data = response.read()

  • Files reviewed: 6/6 changed files
  • Comments generated: 4

Comment on lines +176 to +181
_TAR_METADATA_TYPES = (
    tarfile.XHDTYPE,          # PAX extended header
    tarfile.XGLTYPE,          # PAX global extended header
    tarfile.SOLARIS_XHDTYPE,  # Solaris PAX extended header
    *tarfile.GNU_TYPES,       # GNU longname / longlink / sparse
)
Comment on lines +2635 to +2640
archive_fmt = _det_fmt(from_url)
try:
    with urllib.request.urlopen(from_url, timeout=60) as response:
        if not archive_fmt:
            content_type = response.headers.get("Content-Type", "")
            archive_fmt = _det_fmt(from_url, content_type)
Comment on lines +2155 to +2164
# Detect archive format from URL; resolve via Content-Type when needed.
archive_fmt = detect_archive_format(download_url)

# Download the archive
try:
    with self._open_url(download_url, timeout=60) as response:
        if not archive_fmt:
            content_type = response.headers.get("Content-Type", "")
            archive_fmt = detect_archive_format(download_url, content_type)
        archive_data = response.read()
Comment on lines +2316 to +2324
# Detect archive format from URL; resolve via Content-Type when needed.
archive_fmt = detect_archive_format(download_url)

try:
    with self._open_url(download_url, timeout=60) as response:
        if not archive_fmt:
            content_type = response.headers.get("Content-Type", "")
            archive_fmt = detect_archive_format(download_url, content_type)
        archive_data = response.read()
@mnriem
Collaborator

mnriem commented May 7, 2026

@copilot apply changes based on the comments in this thread


Development

Successfully merging this pull request may close these issues.

[Enhancement] Support .tar.gz / .tgz archives for extension, preset, and workflow installation

3 participants