
docs: add HF ecosystem context to push-to-hub dev notes #474

Merged
nabinchha merged 3 commits into NVIDIA-NeMo:nmulepati/docs/dev-notes-push-to-huggingface-hub from davanstrien:devnotes-hub-suggestions
Mar 30, 2026

Conversation

@davanstrien
Contributor

Summary

  • Add "What You Get on the Hub" section covering Dataset Viewer, streaming, and Dataset Viewer API
  • Link to Hub search filtered by DataDesigner library tag
  • Note that private=True datasets can be flipped to public later

cc @nabinchha

Add section on what datasets get on the Hub (Dataset Viewer, streaming,
Viewer API), link to Hub search for DataDesigner datasets, and note that
private datasets can be flipped to public.
@davanstrien davanstrien requested a review from a team as a code owner March 30, 2026 14:28
@github-actions
Contributor

github-actions bot commented Mar 30, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@greptile-apps
Contributor

greptile-apps bot commented Mar 30, 2026

Greptile Summary

This PR enriches the "Push Datasets to Hugging Face Hub" dev-note with Hugging Face ecosystem context contributed by a Hugging Face ML Librarian, and adds them as a co-author. The changes are purely documentation.

  • Adds a new "What You Get on the Hub" section covering Dataset Viewer, parquet streaming (with a self-contained, importable code snippet), and the Dataset Viewer API.
  • Updates the Hub search link to use the clean ?library=datadesigner query parameter (the doubled library:datadesigner value from the previous draft is resolved).
  • Extends the private=True gotcha note to mention that visibility can be flipped to public later from the dataset settings page.
  • Registers davanstrien as a new author in .authors.yml.
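As a rough illustration of the Dataset Viewer API mentioned above, the sketch below builds paginated request URLs for the public `/rows` endpoint at `datasets-server.huggingface.co`. The repository name `example-user/example-dataset`, the `default` config name, and the helper names `rows_url` and `page_urls` are assumptions made for illustration; they do not come from the dev note. The sketch only constructs URLs and makes no network request.

```python
from urllib.parse import urlencode

VIEWER_API = "https://datasets-server.huggingface.co"
MAX_ROWS_PER_REQUEST = 100  # the /rows endpoint returns at most 100 rows per call


def rows_url(dataset, config="default", split="train", offset=0, length=100):
    """Build a Dataset Viewer API URL for one page of rows (hypothetical helper)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if not 0 < length <= MAX_ROWS_PER_REQUEST:
        raise ValueError(f"length must be between 1 and {MAX_ROWS_PER_REQUEST}")
    query = urlencode(
        {
            "dataset": dataset,
            "config": config,  # assumed config name; check the dataset's actual configs
            "split": split,
            "offset": offset,
            "length": length,
        }
    )
    return f"{VIEWER_API}/rows?{query}"


def page_urls(dataset, total_rows, page_size=MAX_ROWS_PER_REQUEST, **kwargs):
    """Return one URL per page needed to cover `total_rows` rows."""
    return [
        rows_url(dataset, offset=start, length=min(page_size, total_rows - start), **kwargs)
        for start in range(0, total_rows, page_size)
    ]


if __name__ == "__main__":
    # Placeholder repository name, used only to show the URL shape.
    for url in page_urls("example-user/example-dataset", total_rows=250):
        print(url)
```

Each URL can then be fetched with any HTTP client; private datasets would additionally need an authorization header.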

Confidence Score: 5/5

Safe to merge — documentation-only changes with no code impact.

All prior review concerns (missing import, doubled URL prefix) have been addressed in this revision. No logic, security, or correctness issues exist in a docs-only PR of this nature.

No files require special attention.

Important Files Changed

Filename Overview
docs/devnotes/.authors.yml Adds Daniel van Strien (Hugging Face) as a new author entry — correct YAML structure, valid GitHub avatar URL.
docs/devnotes/posts/push-datasets-to-hugging-face-hub.md Adds "What You Get on the Hub" section with Dataset Viewer, streaming snippet (with import), and Dataset Viewer API; updates Hub search URL to the clean ?library=datadesigner form; expands the private=True note. Both previously flagged issues (missing import, doubled URL prefix) are resolved in this revision.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DataDesigner push_to_hub] --> B[Hugging Face Hub]
    B --> C[Dataset Viewer\nbrowsable in browser]
    B --> D[Parquet files\non HF storage]
    D --> E[Streaming\nload_dataset with streaming=True]
    D --> F[Dataset Viewer API\nrow pagination / search / stats]
    B --> G[Hub Search\n?library=datadesigner]

Reviews (3): Last reviewed commit: "fix: remove doubled library: prefix in H..."

 Tags default to `["synthetic", "datadesigner"]` plus whatever you pass in.
-Size category (`n<1K`, `1K<n<10K`, etc.) is auto-computed.
+Size category (`n<1K`, `1K<n<10K`, etc.) is auto-computed. These tags make your
+dataset discoverable in [Hub search](https://huggingface.co/datasets?library=library:datadesigner&sort=trending)
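
The tagging behaviour described in the hunk above can be sketched as follows. This is not DataDesigner's actual implementation: the helper names `merge_tags` and `size_category` are hypothetical, the bucket labels follow the Hub's `size_categories` convention, and treating each lower bound as inclusive is an assumption.

```python
DEFAULT_TAGS = ["synthetic", "datadesigner"]

# (exclusive upper bound, label) pairs following the Hub's size_categories convention.
_SIZE_BUCKETS = [
    (10**3, "n<1K"),
    (10**4, "1K<n<10K"),
    (10**5, "10K<n<100K"),
    (10**6, "100K<n<1M"),
    (10**7, "1M<n<10M"),
    (10**8, "10M<n<100M"),
    (10**9, "100M<n<1B"),
    (10**10, "1B<n<10B"),
    (10**11, "10B<n<100B"),
    (10**12, "100B<n<1T"),
]


def size_category(num_rows):
    """Map a row count to a size bucket (assumption: lower bounds are inclusive)."""
    for upper, label in _SIZE_BUCKETS:
        if num_rows < upper:
            return label
    return "n>1T"


def merge_tags(extra_tags=None):
    """Return the default tags followed by user tags, without duplicates."""
    merged = list(DEFAULT_TAGS)
    for tag in extra_tags or []:
        if tag not in merged:
            merged.append(tag)
    return merged


if __name__ == "__main__":
    print(merge_tags(["finance", "synthetic"]))
    print(size_category(5_000))
```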
Contributor

P2 Potentially doubled library: prefix in Hub search URL

The library query parameter appears to embed the full internal tag name library:datadesigner as its value, which may be redundant:

https://huggingface.co/datasets?library=library:datadesigner&sort=trending

HF Hub typically strips the library: prefix in the URL query parameter — the standard pattern used elsewhere is just the library name as the value, e.g. ?library=datadesigner. Worth verifying the link resolves to the intended filtered view, since an incorrect URL would return an empty result set to readers.

Path: docs/devnotes/posts/push-datasets-to-hugging-face-hub.md
Line: 246


Contributor

Both https://huggingface.co/datasets?library=library:datadesigner&sort=trending and https://huggingface.co/datasets?library=datadesigner&sort=trending resolve to the same page. We should perhaps keep the latter.
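
The two URL forms discussed in this thread differ only in the `library:` prefix inside the query value. A minimal sketch of normalizing a link to the shorter form, using only the standard library, might look like this; the helper name `normalize_library_filter` is hypothetical, and the sketch rewrites the string without checking how the Hub resolves either form.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

PREFIX = "library:"


def normalize_library_filter(url):
    """Strip a redundant 'library:' prefix from the `library` query parameter."""
    parts = urlsplit(url)
    params = parse_qs(parts.query, keep_blank_values=True)
    if "library" in params:
        params["library"] = [
            value[len(PREFIX):] if value.startswith(PREFIX) else value
            for value in params["library"]
        ]
    # doseq=True writes each list element back as its own key=value pair.
    return urlunsplit(parts._replace(query=urlencode(params, doseq=True)))


if __name__ == "__main__":
    print(
        normalize_library_filter(
            "https://huggingface.co/datasets?library=library:datadesigner&sort=trending"
        )
    )
```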

Comment thread docs/devnotes/posts/push-datasets-to-hugging-face-hub.md
Contributor

@nabinchha nabinchha left a comment

Thanks @davanstrien! Added one small nit, but lgtm!

You'll need to comment with "I have read the DCO document and I hereby sign the DCO." before you can merge though!

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
@davanstrien
Contributor Author

I have read the DCO document and I hereby sign the DCO.

@nabinchha nabinchha merged commit 9a352b8 into NVIDIA-NeMo:nmulepati/docs/dev-notes-push-to-huggingface-hub Mar 30, 2026
3 checks passed