Commit 38d48d9

hanouticelina, cursoragent, and Wauplin authored
[CLI] Migrate models, datasets, spaces, papers to out singleton (#4026)
* Migrate models, datasets, spaces, papers to out singleton
* handle special case for papers ls
* rebase
* test: update cli output table tests
  Co-authored-by: célina <hanouticelina@users.noreply.github.com>
* Apply suggestions from code review
  Co-authored-by: célina <hanouticelina@gmail.com>
* fix `papers_ls` JSON output and preserve full `submitted_by` object
* fix `_format_table_value_human` treating 0 as empty cell
* address review comments
* Apply suggestions from code review
  Co-authored-by: Lucain <lucainp@gmail.com>
* missing one

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: célina <hanouticelina@users.noreply.github.com>
Co-authored-by: Lucain <lucainp@gmail.com>
1 parent 4e2337d commit 38d48d9

8 files changed

Lines changed: 126 additions & 140 deletions

docs/source/en/package_reference/cli.md

Lines changed: 15 additions & 17 deletions
@@ -950,7 +950,7 @@ $ hf datasets [OPTIONS] COMMAND [ARGS]...
 
 ### `hf datasets info`
 
-Get info about a dataset on the Hub. Output is in JSON format.
+Get info about a dataset on the Hub.
 
 **Usage**:
 
@@ -966,6 +966,7 @@ $ hf datasets info [OPTIONS] DATASET_ID
 
 * `--revision TEXT`: Git revision id which can be a branch name, a tag, or a commit hash.
 * `--expand TEXT`: Comma-separated properties to return. When used, only the listed properties (and id) are returned. Example: '--expand=downloads,likes,tags'. Valid: author, cardData, citation, createdAt, description, disabled, downloads, downloadsAllTime, gated, lastModified, likes, paperswithcode_id, private, resourceGroup, sha, siblings, tags, trendingScore, usedStorage.
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -996,8 +997,7 @@ $ hf datasets list [OPTIONS]
 * `--sort [created_at|downloads|last_modified|likes|trending_score]`: Sort results.
 * `--limit INTEGER`: Limit the number of results. [default: 10]
 * `--expand TEXT`: Comma-separated properties to return. When used, only the listed properties (and id) are returned. Example: '--expand=downloads,likes,tags'. Valid: author, cardData, citation, createdAt, description, disabled, downloads, downloadsAllTime, gated, lastModified, likes, paperswithcode_id, private, resourceGroup, sha, siblings, tags, trendingScore, usedStorage.
-* `--format [table|json]`: Output format (table or json). [default: table]
-* `-q, --quiet`: Print only IDs (one per line).
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -1029,8 +1029,7 @@ $ hf datasets parquet [OPTIONS] DATASET_ID
 
 * `--subset TEXT`: Filter parquet entries by subset/config.
 * `--split TEXT`: Filter parquet entries by split.
-* `--format [table|json]`: Output format (table or json). [default: table]
-* `-q, --quiet`: Print only IDs (one per line).
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -1061,7 +1060,7 @@ $ hf datasets sql [OPTIONS] SQL
 
 **Options**:
 
-* `--format [table|json]`: Output format (table or json). [default: table]
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -2593,7 +2592,7 @@ $ hf models [OPTIONS] COMMAND [ARGS]...
 
 ### `hf models info`
 
-Get info about a model on the Hub. Output is in JSON format.
+Get info about a model on the Hub.
 
 **Usage**:
 
@@ -2609,6 +2608,7 @@ $ hf models info [OPTIONS] MODEL_ID
 
 * `--revision TEXT`: Git revision id which can be a branch name, a tag, or a commit hash.
 * `--expand TEXT`: Comma-separated properties to return. When used, only the listed properties (and id) are returned. Example: '--expand=downloads,likes,tags'. Valid: author, baseModels, cardData, childrenModelCount, config, createdAt, disabled, downloads, downloadsAllTime, evalResults, gated, gguf, inference, inferenceProviderMapping, lastModified, library_name, likes, mask_token, model-index, pipeline_tag, private, resourceGroup, safetensors, sha, siblings, spaces, tags, transformersInfo, trendingScore, usedStorage, widgetData.
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -2640,8 +2640,7 @@ $ hf models list [OPTIONS]
 * `--sort [created_at|downloads|last_modified|likes|trending_score]`: Sort results.
 * `--limit INTEGER`: Limit the number of results. [default: 10]
 * `--expand TEXT`: Comma-separated properties to return. When used, only the listed properties (and id) are returned. Example: '--expand=downloads,likes,tags'. Valid: author, baseModels, cardData, childrenModelCount, config, createdAt, disabled, downloads, downloadsAllTime, evalResults, gated, gguf, inference, inferenceProviderMapping, lastModified, library_name, likes, mask_token, model-index, pipeline_tag, private, resourceGroup, safetensors, sha, siblings, spaces, tags, transformersInfo, trendingScore, usedStorage, widgetData.
-* `--format [table|json]`: Output format (table or json). [default: table]
-* `-q, --quiet`: Print only IDs (one per line).
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -2678,7 +2677,7 @@ $ hf papers [OPTIONS] COMMAND [ARGS]...
 
 ### `hf papers info`
 
-Get info about a paper on the Hub. Output is in JSON format.
+Get info about a paper on the Hub.
 
 **Usage**:
 
@@ -2692,6 +2691,7 @@ $ hf papers info [OPTIONS] PAPER_ID
 
 **Options**:
 
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -2721,8 +2721,7 @@ $ hf papers list [OPTIONS]
 * `--submitter TEXT`: Filter by username of the submitter.
 * `--sort [publishedAt|trending]`: Sort results.
 * `--limit INTEGER`: Limit the number of results. [default: 50]
-* `--format [table|json]`: Output format (table or json). [default: table]
-* `-q, --quiet`: Print only IDs (one per line).
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -2783,8 +2782,7 @@ $ hf papers search [OPTIONS] QUERY
 **Options**:
 
 * `--limit INTEGER`: Limit the number of results. [default: 20]
-* `--format [table|json]`: Output format (table or json). [default: table]
-* `-q, --quiet`: Print only IDs (one per line).
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -3434,7 +3432,7 @@ Learn more
 
 ### `hf spaces info`
 
-Get info about a space on the Hub. Output is in JSON format.
+Get info about a space on the Hub.
 
 **Usage**:
 
@@ -3450,6 +3448,7 @@ $ hf spaces info [OPTIONS] SPACE_ID
 
 * `--revision TEXT`: Git revision id which can be a branch name, a tag, or a commit hash.
 * `--expand TEXT`: Comma-separated properties to return. When used, only the listed properties (and id) are returned. Example: '--expand=likes,tags'. Valid: author, cardData, createdAt, datasets, disabled, lastModified, likes, models, private, resourceGroup, runtime, sdk, sha, siblings, subdomain, tags, trendingScore, usedStorage.
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.
 
@@ -3480,8 +3479,7 @@ $ hf spaces list [OPTIONS]
 * `--sort [created_at|last_modified|likes|trending_score]`: Sort results.
 * `--limit INTEGER`: Limit the number of results. [default: 10]
 * `--expand TEXT`: Comma-separated properties to return. When used, only the listed properties (and id) are returned. Example: '--expand=likes,tags'. Valid: author, cardData, createdAt, datasets, disabled, lastModified, likes, models, private, resourceGroup, runtime, sdk, sha, siblings, subdomain, tags, trendingScore, usedStorage.
-* `--format [table|json]`: Output format (table or json). [default: table]
-* `-q, --quiet`: Print only IDs (one per line).
+* `--format [agent|auto|human|json|quiet]`: Output format. [default: auto]
 * `--token TEXT`: A User Access Token generated from https://huggingface.co/settings/tokens.
 * `--help`: Show this message and exit.

src/huggingface_hub/cli/_output.py

Lines changed: 44 additions & 13 deletions
@@ -13,6 +13,7 @@
 # limitations under the License.
 """Output framework for the `hf` CLI."""
 
+import dataclasses
 import datetime
 import json
 import re
@@ -72,25 +73,33 @@ def text(self, msg: str | None = None, *, human: str | None = None, agent: str |
 
     def table(
         self,
-        headers: list[str],
-        rows: Sequence[list[Any]],
+        items: Sequence[dict[str, Any]],
+        *,
+        headers: list[str] | None = None,
+        id_key: str | None = None,
         alignments: dict[str, str] | None = None,
     ) -> None:
         """Print tabular data to stdout.
 
         Args:
-            headers: Column names.
-            rows: List of rows, each a list of raw values.
+            items: List of dicts. Headers are auto-detected from keys if not provided.
+            headers: Explicit column names. If None, derived from dict keys (all-None columns filtered).
+            id_key: Key to print in quiet mode. If None, uses the first header.
             alignments: Optional mapping of header name to "left" or "right". Defaults to "left".
         """
-        if not rows:
+        if not items:
             match self.mode:
                 case OutputFormatWithAuto.agent | OutputFormatWithAuto.human:
                     print("No results found.")
                 case OutputFormatWithAuto.json:
                     print("[]")
             return
 
+        if headers is None:
+            all_columns = list(items[0].keys())
+            headers = [col for col in all_columns if any(item.get(col) is not None for item in items)]
+        rows = [[item.get(h) for h in headers] for item in items]
+
         match self.mode:
             case OutputFormatWithAuto.human:  # padded table, truncated cells, SCREAMING_SNAKE headers
                 formatted_rows: list[list[str | int]] = [[_format_table_cell_human(v) for v in row] for row in rows]
@@ -102,14 +111,19 @@ def table(
                 for row in rows:
                     print("\t".join(_format_table_cell_agent(v) for v in row))
             case OutputFormatWithAuto.json:  # compact JSON array
-                items = [dict(zip(headers, row)) for row in rows]
-                print(json.dumps(items, default=str))
-            case OutputFormatWithAuto.quiet:  # first column only, one per line
-                for row in rows:
-                    print(row[0])
+                print(json.dumps(list(items), default=str))
+            case OutputFormatWithAuto.quiet:  # id_key column (or first column), one per line
+                quiet_key = id_key or headers[0]
+                for item in items:
+                    print(item.get(quiet_key, ""))
+
+    def dict(self, data: Any) -> None:
+        """Print structured data as JSON in all modes (indented for human, compact otherwise).
 
-    def dict(self, data: dict[str, Any]) -> None:
-        """Print structured data as JSON in all modes (indented for human, compact otherwise)."""
+        Accepts a dict or a dataclass.
+        """
+        if dataclasses.is_dataclass(data) and not isinstance(data, type):
+            data = _dataclass_to_dict(data)
         indent = 2 if self.mode == OutputFormatWithAuto.human else None
         print(json.dumps(data, indent=indent, default=str))

@@ -156,6 +170,23 @@ def hint(self, message: str) -> None:
 
 # HELPERS
 
+
+def _serialize_value(v: object) -> object:
+    """Recursively serialize a value to be JSON-compatible."""
+    if isinstance(v, datetime.datetime):
+        return v.isoformat()
+    elif isinstance(v, dict):
+        return {key: _serialize_value(val) for key, val in v.items() if val is not None}
+    elif isinstance(v, list):
+        return [_serialize_value(item) for item in v]
+    return v
+
+
+def _dataclass_to_dict(info: Any) -> dict[str, Any]:
+    """Convert a dataclass to a json-serializable dict."""
+    return {k: _serialize_value(v) for k, v in dataclasses.asdict(info).items() if v is not None}
+
+
 _ANSI_RE = re.compile(r"\033\[[0-9;]*m")
 _MAX_CELL_LENGTH = 35

@@ -172,7 +203,7 @@ def _to_header(name: str) -> str:
 
 def _format_table_value_human(value: Any) -> str:
     """Convert a value to string for terminal display."""
-    if not value:
+    if value is None:
         return ""
     if isinstance(value, bool):
         return "✔" if value else ""
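The one-line change from `if not value` to `if value is None` is what stops `0` from rendering as an empty cell. A minimal sketch of the difference, assuming a plain `str(value)` fallback (the rest of the real function is not shown in this hunk, so that fallback and the names `format_before` and `format_after` are assumptions):

```python
from typing import Any


def format_before(value: Any) -> str:
    """Old check: every falsy value (None, 0, "", [], False) becomes an empty cell."""
    if not value:
        return ""
    return str(value)  # assumed fallback, for illustration only


def format_after(value: Any) -> str:
    """New check: only None becomes an empty cell; booleans keep their own branch."""
    if value is None:
        return ""
    if isinstance(value, bool):
        return "✔" if value else ""
    return str(value)  # assumed fallback, for illustration only


for value in (0, None, False, True, 42):
    print(repr(value), "->", repr(format_before(value)), "vs", repr(format_after(value)))
```

With the new check, a download or like count of `0` is displayed as `0`, while `False` still renders as an empty cell through the boolean branch.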

src/huggingface_hub/cli/datasets.py

Lines changed: 11 additions & 16 deletions
@@ -25,7 +25,6 @@
 """
 
 import enum
-import json
 from typing import Annotated, get_args
 
 import typer
@@ -37,19 +36,17 @@
 from ._cli_utils import (
     AuthorOpt,
     FilterOpt,
-    FormatOpt,
+    FormatWithAutoOpt,
     LimitOpt,
-    OutputFormat,
-    QuietOpt,
     RevisionOpt,
     SearchOpt,
     TokenOpt,
     api_object_to_dict,
     get_hf_api,
     make_expand_properties_parser,
-    print_list_output,
     typer_factory,
 )
+from ._output import OutputFormatWithAuto, out
 
 
 _EXPAND_PROPERTIES = sorted(get_args(ExpandDatasetProperty_T))
@@ -87,8 +84,7 @@ def datasets_ls(
     ] = None,
     limit: LimitOpt = 10,
     expand: ExpandOpt = None,
-    format: FormatOpt = OutputFormat.table,
-    quiet: QuietOpt = False,
+    format: FormatWithAutoOpt = OutputFormatWithAuto.auto,
     token: TokenOpt = None,
 ) -> None:
     """List datasets on the Hub."""
@@ -105,7 +101,7 @@ def datasets_ls(
             expand=expand,  # type: ignore
         )
     ]
-    print_list_output(results, format=format, quiet=quiet)
+    out.table(results)
 
 
 @datasets_cli.command(
@@ -119,17 +115,18 @@ def datasets_info(
     dataset_id: Annotated[str, typer.Argument(help="The dataset ID (e.g. `username/repo-name`).")],
     revision: RevisionOpt = None,
     expand: ExpandOpt = None,
+    format: FormatWithAutoOpt = OutputFormatWithAuto.auto,
     token: TokenOpt = None,
 ) -> None:
-    """Get info about a dataset on the Hub. Output is in JSON format."""
+    """Get info about a dataset on the Hub."""
     api = get_hf_api(token=token)
     try:
         info = api.dataset_info(repo_id=dataset_id, revision=revision, expand=expand)  # type: ignore
     except RepositoryNotFoundError as e:
         raise CLIError(f"Dataset '{dataset_id}' not found.") from e
     except RevisionNotFoundError as e:
         raise CLIError(f"Revision '{revision}' not found on '{dataset_id}'.") from e
-    print(json.dumps(api_object_to_dict(info), indent=2))
+    out.dict(info)
 
 
 @datasets_cli.command(
@@ -145,8 +142,7 @@ def datasets_parquet(
     dataset_id: Annotated[str, typer.Argument(help="The dataset ID (e.g. `username/repo-name`).")],
     subset: Annotated[str | None, typer.Option("--subset", help="Filter parquet entries by subset/config.")] = None,
     split: Annotated[str | None, typer.Option(help="Filter parquet entries by split.")] = None,
-    format: FormatOpt = OutputFormat.table,
-    quiet: QuietOpt = False,
+    format: FormatWithAutoOpt = OutputFormatWithAuto.auto,
     token: TokenOpt = None,
 ) -> None:
     """List parquet file URLs available for a dataset."""
@@ -156,7 +152,7 @@ def datasets_parquet(
     results = [
         {"subset": entry.config, "split": entry.split, "url": entry.url, "size": entry.size} for entry in filtered
     ]
-    print_list_output(results, format=format, quiet=quiet, id_key="url")
+    out.table(results, headers=["subset", "split", "url", "size"], id_key="url")
 
 
 @datasets_cli.command(
@@ -168,13 +164,12 @@ def datasets_parquet(
 )
 def datasets_sql(
     sql: Annotated[str, typer.Argument(help="Raw SQL query to execute.")],
-    format: FormatOpt = OutputFormat.table,
+    format: FormatWithAutoOpt = OutputFormatWithAuto.auto,
     token: TokenOpt = None,
 ) -> None:
     """Execute a raw SQL query with DuckDB against dataset parquet URLs."""
     try:
         result = execute_raw_sql_query(sql_query=sql, token=token)
     except ImportError as e:
         raise CLIError(str(e)) from e
-
-    print_list_output(result, format=format, quiet=False)
+    out.table(result)
