Commit 46afdba

Merge pull request #24 from browser-use/sync/harness-660827d
sync: harness 660827d
2 parents 29e83d8 + acecdd9 commit 46afdba

12 files changed

Lines changed: 581 additions & 36 deletions

File tree

UPSTREAM.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,7 @@ Each upstream has its own append-only table. Add a row every time you pull.
8787
| 2026-04-28 | `fefca43` | `04f7716` | bcode | 7 upstream commits. Windows fixes (PRs #232, #240) + skill rename (PR #242). Files: `src/browser_harness/_ipc.py` (BH_TMP_DIR override for sock/port/pid/log/screenshot dir; drop DETACHED_PROCESS to suppress empty Windows console window), `src/browser_harness/admin.py` (route `ensure_daemon` warm probe through `ipc.connect` so Windows TCP loopback works; new `_open_inspect=False` flag on `ensure_daemon` used by `run_setup` to prevent chrome://inspect tab flooding; drop unused `_paths()` helper), `src/browser_harness/helpers.py` (`capture_screenshot` and click-debug overlay route through `ipc._TMP` instead of `tempfile.gettempdir()` so BH_TMP_DIR covers them too), `SKILL.md` (`name: browser-harness` → `name: browser`), `install.md` (`name: browser-harness-install` → `name: browser-install`). All in protected `src/browser_harness/*.py` zone — taken verbatim. SKILL/install frontmatter rename only affects how end-users invoke the skill (`/browser` vs `/browser-harness`); our `browser-execute.txt` references SKILL.md by file path, so no integration code changes. Divergences touched: none. PR #240 e2e tested separately on Linux against headless Chrome before sync. |
8888
| 2026-04-28 | `04f7716` | `2125cea` | bcode | 1 upstream commit (PR #243). `src/browser_harness/_ipc.py`: `_TMP.mkdir(parents=True, exist_ok=True)` at module load so a caller-supplied `BH_TMP_DIR` pointing at a non-existent directory no longer fails the first sock/port/pid/log/screenshot write. Prerequisite for browsercode's per-session scratch-dir use case. Protected zone — taken verbatim. Divergences touched: none. |
8989
| 2026-04-29 | `2125cea` | `997ee45` | bcode | 6 upstream commits (PRs #241, #244, #245). `src/browser_harness/_ipc.py`: when `BH_TMP_DIR` is set, drop the `bu-<NAME>` filename prefix (caller-isolated dir means no shared-tmpdir disambiguation needed); without `BH_TMP_DIR` the original `bu-<NAME>` scheme is unchanged. `src/browser_harness/admin.py`: `_daemon_endpoint_names` short-circuits to the local NAME when `BH_TMP_DIR` is set (no glob); plus catch `SystemError` from `os.kill` on Windows during `restart_daemon`. `src/browser_harness/daemon.py`: discover DevToolsActivePort in Comet and Arc profiles on macOS. `tests/unit/test_admin.py`: 2 new tests for the `BH_TMP_DIR` discovery path. All in protected `src/browser_harness/*.py` + tests — taken verbatim. Smoke test + 12 admin unit tests pass. The `_ipc` filename change pairs with our recent per-session BH_TMP_DIR work (browsercode PR #22) — caller isolation now extends to filenames as well as the dir. Divergences touched: none. |
90+
| 2026-04-30 | `997ee45` | `660827d` | bcode | 11 upstream commits (PRs #246, #247, #251, #254, #256, #260). `src/browser_harness/daemon.py`: resolve WS via `/json/version` to avoid stale `DevToolsActivePort` path (PR #260) + report `cdp_disconnected` on stale CDP probe in `connection_status` (PR #254) + cleanup remote browser when daemon startup fails (PR #251). `src/browser_harness/admin.py`: companion changes for the daemon fixes. `tests/unit/test_admin.py`: 7 new tests. New domain skills: `agent-workspace/domain-skills/xiaohongshu/scraping.md` (PR #246), and a top-level `domain-skills/shopify-admin/` tree (PR #247: README, embedded-apps, knowledge-base, polaris-inputs). Note: PR #247 added skills at the top-level `domain-skills/` path, not under `agent-workspace/domain-skills/` as the post-#229 layout would suggest — vendored verbatim to match upstream layout. Doc updates: README operator framing (PR #255), install.md heredoc → `-c` flag (PR #256), profile-sync.md same. All files outside divergences — taken verbatim. Smoke test + 19 admin unit tests pass. Divergences touched: none. |
9091

9192
---
9293

packages/bcode-browser/harness/README.md

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@
22

33
# Browser Harness ♞
44

5-
The simplest, thinnest, **self-healing** harness that gives LLM **complete freedom** to complete any browser task. Built directly on CDP.
5+
Connect an LLM directly to your real browser with a thin, editable CDP harness. For browser tasks where you need **complete freedom**.
66

7-
The agent writes what's missing, mid-task, inside `agent-workspace/`. No framework, no recipes, no rails. One websocket to Chrome, nothing between.
7+
One websocket to Chrome, nothing between. The agent writes what's missing during execution. The harness improves itself every run.
88

99
```
1010
● agent: wants to upload a file
Lines changed: 84 additions & 0 deletions
@@ -0,0 +1,84 @@
1+
# Xiaohongshu — Search and Sort
2+
3+
URL patterns:
4+
- Home / discovery: `https://www.xiaohongshu.com/explore`
5+
- Search results: `https://www.xiaohongshu.com/search_result?keyword=...`
6+
7+
## Search flow
8+
9+
- Prefer direct navigation to the desktop search results page over automating the home-page search box.
10+
- Reliable primary path: `https://www.xiaohongshu.com/search_result?keyword=<url-encoded keyword>&source=web_explore_feed`
11+
- This route loads the normal desktop results page and avoids home-page input flakiness.
12+
- The search results page can also appear with variants such as `type=51` or other `source` values after in-app navigation; do not treat those as suspicious if the rendered results are correct.
13+
- The top search box on `explore` can work, and searching from the home page has transitioned to `search_result` without a login wall in some sessions.
14+
- The page exposes duplicate search inputs in the DOM with the same placeholder `搜索小红书`.
15+
- The home-page search input can behave like a tightly controlled app field: direct DOM value assignment may be cleared immediately, and harness `type_text()` may fail to populate it even when the input is focused.
16+
- Treat the home-page input as best-effort only. Use it when a human-like interactive flow matters, but for automation default to constructing the `search_result` URL directly.
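
A minimal sketch of the direct-navigation path described above, using only the Python standard library. The function name `build_search_url` is hypothetical; the `keyword` and `source=web_explore_feed` parameters come from the route listed in this section.

```python
from urllib.parse import urlencode

SEARCH_BASE = "https://www.xiaohongshu.com/search_result"


def build_search_url(keyword: str, source: str = "web_explore_feed") -> str:
    """Return the desktop search-results URL for a keyword (hypothetical helper)."""
    # urlencode percent-encodes the keyword as UTF-8, so Chinese text is safe.
    query = urlencode({"keyword": keyword, "source": source})
    return f"{SEARCH_BASE}?{query}"


print(build_search_url("咖啡"))
```

The resulting string can be handed to whatever navigation helper the harness exposes, which avoids the home-page input entirely.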
17+
18+
## Sort behavior
19+
20+
- On the current desktop results layout, `最新` is **not** a top-level tab beside `综合`.
21+
- Open the `筛选` control in the upper-right of the results header to access sort options.
22+
- Inside `筛选`, `排序依据` contains:
23+
- `综合`
24+
- `最新`
25+
- `最多点赞`
26+
- `最多评论`
27+
- `最多收藏`
28+
- The `排序依据` row can render duplicate DOM nodes for the same pill text, including non-interactive clones.
29+
- Raw global text search for `最新` can hit the wrong node first. Scope to the `排序依据` section and then choose the visible interactive `.tags` node.
30+
- Prefer semantic filtering such as `aria-hidden != "true"` or section-scoped visible `.tags` selection over style-specific checks.
31+
- When `最新` is active, the `筛选` trigger changes to `已筛选`.
32+
- The rendered feed and the `已筛选` / active-pill UI are more reliable than `window.__INITIAL_STATE__.search.searchContext.sort` for confirming latest sort.
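
The section-scoped targeting above can be sketched as a Python helper that builds the JS expression to evaluate in the page. This is an illustration under assumptions: `latest_sort_click_js` is a hypothetical name, and matching the section by the leading text of a `.filters` block is one guess at how to express "the `排序依据` section". How the string gets evaluated (for example via a harness JS helper) is left to the caller.

```python
import json


def latest_sort_click_js(section_label: str = "排序依据", pill_text: str = "最新") -> str:
    """Build a JS expression that clicks the visible sort pill inside its section."""
    # json.dumps yields safely quoted JS string literals.
    label_js = json.dumps(section_label, ensure_ascii=False)
    text_js = json.dumps(pill_text, ensure_ascii=False)
    return f"""
(() => {{
  const section = Array.from(document.querySelectorAll('.filters'))
    .find(el => el.textContent.trim().startsWith({label_js}));
  if (!section) return {{clicked: false, reason: 'section not found'}};
  // Skip non-interactive clones by ignoring aria-hidden nodes.
  const pill = Array.from(section.querySelectorAll('.tags'))
    .find(el => el.textContent.trim() === {text_js}
      && el.getAttribute('aria-hidden') !== 'true');
  if (!pill) return {{clicked: false, reason: 'pill not found'}};
  pill.click();
  return {{clicked: true}};
}})()
"""


print(latest_sort_click_js())
```

Returning a small status object rather than clicking blindly makes it easier to fall back to a coordinate click when the DOM layout changes.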
33+
34+
## Stable cues
35+
36+
- Search channel tabs near the top: `全部`, `图文`, `视频`, `用户`
37+
- Sort panel labels: `筛选`, `排序依据`, `最新`
38+
- Filter sections also visible in the panel: `笔记类型`, `发布时间`, `搜索范围`, `位置距离`
39+
40+
## Interaction notes
41+
42+
- DOM `.click()` opened the `筛选` panel reliably.
43+
- DOM `.click()` on the visible `最新` pill inside the open `排序依据` section reliably activated latest sort.
44+
- The reliable DOM pattern was:
45+
- find the `排序依据` section / `.filters` block
46+
- search within that block for `.tags`
47+
- choose the one whose text is `最新` and which is the visible interactive node
48+
- call `.click()` on that visible node
49+
- Example selector strategy:
50+
- find `.filters` whose first label is `排序依据`
51+
- inside it, pick `.tags` where `textContent.trim() === "最新"` and `el.getAttribute("aria-hidden") !== "true"`
52+
- `getClientRects().length > 0` alone may be insufficient to distinguish the working node from a duplicate.
53+
- A broad `document.querySelectorAll("*")` text match for `最新` is not reliable on this page because it may click the hidden duplicate instead of the visible control.
54+
- Coordinate click on the visible `最新` pill also worked and remains a valid fallback if DOM targeting gets confused by future UI changes.
55+
- After selecting `最新`, the grid briefly showed skeleton placeholders before the refreshed results appeared.
56+
- The search page stores the currently rendered note cards in `window.__INITIAL_STATE__.search.feeds._value` as an array of feed entries. For ordinary note cards, the useful fields were:
57+
- `id`
58+
- `xsecToken`
59+
- `noteCard.displayTitle`
60+
- `noteCard.user.nickname`
61+
- The feed array can contain non-note inserts such as hot-query modules. Filter for entries with `noteCard` before treating an item as a note result.
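
A sketch of the `noteCard` filter described above, assuming the feed array has already been read out of `window.__INITIAL_STATE__.search.feeds._value` and converted to a list of Python dicts. The function name `extract_notes` and the sample data are hypothetical; only the field names come from this section.

```python
def extract_notes(feeds):
    """Keep only ordinary note cards and pull out the useful fields."""
    notes = []
    for entry in feeds:
        card = entry.get("noteCard")
        if not card:
            # Non-note inserts (e.g. hot-query modules) carry no noteCard.
            continue
        notes.append({
            "id": entry.get("id"),
            "xsec_token": entry.get("xsecToken"),
            "title": card.get("displayTitle"),
            "author": (card.get("user") or {}).get("nickname"),
        })
    return notes


# Made-up sample shaped like the fields listed above.
sample_feeds = [
    {"id": "n1", "xsecToken": "tok1",
     "noteCard": {"displayTitle": "First note", "user": {"nickname": "alice"}}},
    {"id": "hot-1", "hotQuery": {"title": "trending"}},
]
print(extract_notes(sample_feeds))
```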
62+
63+
## Post opening
64+
65+
- Do **not** assume a raw results link like `https://www.xiaohongshu.com/explore/<id>` is directly openable.
66+
- Opening that raw `/explore/<id>` URL in a fresh tab can redirect to the web `404` / app-only gate even when the same post is openable from search results.
67+
- To open a post from search results, click the visible card image / card in-page first.
68+
- That click navigation can land on a tokenized URL like `https://www.xiaohongshu.com/explore/<id>?xsec_token=...&xsec_source=pc_search`, which is a more reliable note URL than the raw `/explore/<id>` form.
69+
- Once the tokenized URL is obtained from the click flow, it can be revisited in-session for extraction.
70+
- If the search results state is already loaded, you can reconstruct the tokenized note URL directly from a feed item without re-clicking:
71+
- `https://www.xiaohongshu.com/explore/<id>?xsec_token=<xsecToken>&xsec_source=pc_search`
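
The reconstruction above can be sketched as a small helper. `note_url` is a hypothetical name; the URL shape and the `pc_search` source value come from this section. Percent-encoding the token is an assumption made for safety, since tokens may contain characters such as `=`.

```python
from urllib.parse import urlencode


def note_url(note_id: str, xsec_token: str, xsec_source: str = "pc_search") -> str:
    """Rebuild the tokenized note URL from a feed item's id and xsecToken."""
    query = urlencode({"xsec_token": xsec_token, "xsec_source": xsec_source})
    return f"https://www.xiaohongshu.com/explore/{note_id}?{query}"


print(note_url("abc123", "tok="))
```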
72+
73+
## Post extraction
74+
75+
- On tokenized post pages opened via `pc_search`, `document.body.innerText` can be a useful first-pass extraction source because it often includes the rendered note text, hashtags, timestamp, engagement counts, and visible comments.
76+
- Verify that the note content actually rendered before trusting `document.body.innerText`, because the page can also include substantial navigation, footer, and comment noise.
77+
- Prefer `document.body.innerText` as a fallback or initial probe before writing fragile per-element selectors for post content.
78+
79+
## Gotchas
80+
81+
- Do not assume `Enter` alone finished the workflow until you verify the URL changed to `search_result` or the result grid appeared.
82+
- Do not assume the visible `综合` tab controls all sorting; on this layout, time ordering is hidden inside `筛选`.
83+
- Do not assume the first DOM node whose text is `最新` is the clickable one; this panel duplicates pills and the hidden clone can absorb naive text-based targeting without changing state.
84+
- Do not assume a successfully opened post can be reproduced by stripping query params; preserve the `xsec_token` when reopening results-derived post URLs.
Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
1+
# shopify-admin
2+
3+
Browser-harness patterns for `admin.shopify.com` and embedded Shopify apps.
4+
5+
## Files in this folder
6+
7+
- `embedded-apps.md` — every Shopify app runs in an iframe; how to target it
8+
- `polaris-inputs.md` — Polaris React inputs reject synthetic value setters; use CDP type_text
9+
- `knowledge-base.md` — automating the Shopify Knowledge Base App for FAQ entries
10+
11+
## When to use these
12+
13+
You're driving Shopify admin and need to add / edit / configure something. The Shopify admin UI is large and many surfaces are embedded apps — first check whether what you need is in an embedded app (most apps under `admin.shopify.com/store/<store>/apps/<app-slug>/...` are).
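
One way to run the embedded-app check is to parse the current admin URL. This is a sketch: `parse_embedded_app` is a hypothetical helper and the regular expression only encodes the `admin.shopify.com/store/<store>/apps/<app-slug>/...` shape quoted above.

```python
import re

EMBEDDED_APP_RE = re.compile(
    r"^https://admin\.shopify\.com/store/([^/]+)/apps/([^/?#]+)"
)


def parse_embedded_app(url: str):
    """Return (store, app_slug) if the URL points at an embedded app, else None."""
    match = EMBEDDED_APP_RE.match(url)
    return (match.group(1), match.group(2)) if match else None


print(parse_embedded_app(
    "https://admin.shopify.com/store/my-store/apps/shopify-knowledge-base/app"
))
```

A `None` result suggests the surface is native admin UI rather than an app iframe.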
14+
15+
## When to skip
16+
17+
- If the operation is read-only product / inventory data → use the **Storefront API** (HTTP) instead, much faster
18+
- If the store has a custom admin app with API token provisioned → use the **Admin API** (GraphQL or REST) instead, no UI scraping
19+
- If you're editing theme code → use the **Shopify CLI** (`shopify theme push`) — don't touch the theme editor UI
20+
21+
The browser is the right tool only when:
22+
- The setting / app exposes no API
23+
- The change is one-time or rare enough not to justify scripting
24+
- You're discovering / exploring the admin (e.g., finding selectors for a future automation)
25+
26+
## Authentication
27+
28+
Mike (or the human owner) must be logged into `admin.shopify.com` in the Chrome session that browser-harness attaches to. The harness does NOT log in — it inherits the human's session.
29+
30+
If you hit an `accounts.shopify.com` redirect, stop and ask the human to log in. Don't type credentials.
31+
32+
## Polaris is in transition (Jan 2026 onward)
33+
34+
Shopify is migrating its design system from React-based Polaris to Web-Components-based Polaris. Most legacy admin surfaces are still React. Newer surfaces (Catalog Mapping, parts of Settings) may be web components.
35+
36+
Screenshot first. If you see `<s-text-field>` or `<s-button>` web component tags → use the web component pattern. If you see `[class*="Polaris-"]` React class names → use the CDP keystrokes pattern in `polaris-inputs.md`.
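
As a complement to the screenshot, the same decision can be approximated from the page markup. This is a rough sketch, not the harness's method: `polaris_flavor` is a hypothetical helper, and it assumes you have already fetched the surface's HTML as a string by some means.

```python
import re


def polaris_flavor(html: str) -> str:
    """Guess which Polaris generation a surface uses from its markup."""
    # Web-component Polaris uses custom elements such as <s-text-field> and <s-button>.
    if re.search(r"<s-[a-z]", html):
        return "web-components"
    # React Polaris leaves class names containing "Polaris-".
    if re.search(r'class="[^"]*Polaris-', html):
        return "react"
    return "unknown"


print(polaris_flavor('<s-text-field label="Title"></s-text-field>'))
print(polaris_flavor('<div class="Polaris-TextField"><input></div>'))
```

Treat an `unknown` result as a cue to fall back to the screenshot rather than guessing.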
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
1+
# Shopify embedded apps run in iframes
2+
3+
Every Shopify app surfaced in the admin (first-party like Knowledge Base, third-party like Okendo) renders inside a sandboxed iframe. Your top-level `document` queries find the Shopify chrome (sidebar, header, search bar) but **none of the app's UI**.
4+
5+
## How to target the iframe
6+
7+
```python
from helpers import iframe_target, js, type_text

# 1. Find the iframe by URL substring
tid = iframe_target("qa-pairs-app")  # Knowledge Base App

# 2. Run JS inside the iframe by passing target_id
result = js("""
(() => {
  const button = Array.from(document.querySelectorAll('button')).find(b => b.textContent.trim() === 'Add FAQ');
  if (button) { button.click(); return {clicked: true}; }
  return {clicked: false};
})()
""", target_id=tid)
```
22+
23+
## Finding the URL substring
24+
25+
The iframe's URL contains the app slug. Run:
26+
27+
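```python
from helpers import cdp  # assumed to be exported next to iframe_target and js

# Print the URL of every Shopify-hosted iframe target
for t in cdp("Target.getTargets")["targetInfos"]:
    if t["type"] == "iframe" and "shopify" in t.get("url", "").lower():
        print(t["url"])
```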
33+
34+
Then pick a substring unique to your target app.
35+
36+
## Known Shopify app iframe slugs
37+
38+
| App | iframe URL substring |
|---|---|
| Shopify Knowledge Base (qa-pairs-app) | `qa-pairs-app` |
| Shopify Online Store editor | `online-store-web.shopifyapps.com` |
| Shopify Hydrogen Storefront | `hydrogen-storefronts` (or similar; verify) |
43+
44+
Add to this table when you discover new ones.
45+
46+
## Why iframes
47+
48+
Shopify uses App Bridge to embed third-party apps with isolation. Your top-level page CAN'T directly access app DOM for security reasons — you need iframe targeting (which the harness does via CDP `Target.attachToTarget`).
49+
50+
## Coordinate clicks vs JS clicks
51+
52+
Coordinate clicks (`click(x, y)`) pass through iframes at the compositor level — they work. But JS clicks scoped to the iframe target are more reliable for routine button taps because:
53+
54+
- Element text content is stable across UI redesigns
55+
- DPR scaling on retina is automatic
56+
- React event handlers are guaranteed to fire (vs. CDP mouse events which sometimes hit a transparent layer above the button)
57+
58+
## Gotcha — multiple iframes from same app
59+
60+
The Online Store editor renders the storefront preview AND the editor toolbar in two separate iframes. Pick the right one by URL substring; don't assume the first match is correct.
61+
62+
```python
from helpers import cdp, iframe_target  # cdp export is assumed

# WRONG: picks first match
tid = iframe_target("online-store-web")

# RIGHT: disambiguate with a second URL substring
tid = None  # stays None if no editor iframe is found
for t in cdp("Target.getTargets")["targetInfos"]:
    url = t.get("url", "")
    if "online-store-web" in url and "editor" in url:
        tid = t["targetId"]
        break
```
Lines changed: 109 additions & 0 deletions
@@ -0,0 +1,109 @@
1+
# Shopify Knowledge Base App — automating FAQ entries
2+
3+
The Knowledge Base App (Shopify Winter '26 Edition) lets merchants control how AI agents (ChatGPT, Perplexity, Claude, Copilot, Gemini) answer questions about their brand. Each entry is a Question / Answer pair. The app currently has no public API and is English-only as of Winter '26 — browser automation is the canonical path.
4+
5+
## URL pattern
6+
7+
```
https://admin.shopify.com/store/<store-handle>/apps/shopify-knowledge-base/app
```
10+
11+
Sub-routes:
12+
- `/app` — overview (FAQ list, top unanswered questions, query log)
13+
- `/app/new` — Add FAQ form
14+
- `/app/pairs/<id>` — entry detail / edit
15+
16+
## Iframe slug
17+
18+
The app runs in an iframe whose URL contains `qa-pairs-app`:

```python
tid = iframe_target("qa-pairs-app")
```
23+
24+
## Adding a single FAQ
25+
26+
See `polaris-inputs.md` for the full canonical pattern. Quick version:
27+
28+
```python
def add_faq(question, answer):
    tid = iframe_target("qa-pairs-app")
    # focus question input via JS, type via CDP, focus answer, type, click Save
    # poll URL for /pairs/<id> success signal
    # Stub only: the full steps live in polaris-inputs.md.
    raise NotImplementedError("see polaris-inputs.md for the full pattern")
```
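
The "poll URL for `/pairs/<id>`" step can be sketched independently of the harness. This is an assumption-laden illustration: `wait_for_pair_id` is a hypothetical helper, and `get_url` stands in for whatever callable returns the iframe's current URL. It returns an `(ok, info)` pair, where `info` is the entry id on success or an error message on timeout.

```python
import re
import time

PAIRS_RE = re.compile(r"/pairs/([^/?#]+)")


def wait_for_pair_id(get_url, timeout=10.0, interval=0.25):
    """Poll get_url() until it contains /pairs/<id> or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        match = PAIRS_RE.search(get_url())
        if match:
            return True, match.group(1)
        if time.monotonic() >= deadline:
            return False, "timed out waiting for /pairs/<id>"
        time.sleep(interval)


# Demo with a fake URL source that "navigates" on the second poll.
fake_urls = iter(["https://example.invalid/app/new",
                  "https://example.invalid/app/pairs/123"])
print(wait_for_pair_id(lambda: next(fake_urls), timeout=1.0, interval=0.0))
```

Polling with a deadline avoids a fixed sleep and gives the batching loop a clear failure signal when the save did not go through.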
34+
35+
## Batching multiple FAQs
36+
37+
After saving an entry, the success page shows a "FAQ created. Add another FAQ" link. Click it via JS to skip navigating back to the overview:

```python
def click_add_another():
    tid = iframe_target("qa-pairs-app")
    js("""
    (() => {
      const link = Array.from(document.querySelectorAll('a, button'))
        .find(x => x.textContent.trim() === 'Add another FAQ');
      if (link) link.click();
    })()
    """, target_id=tid)
```
50+
51+
Loop:
52+
53+
```python
import time

ENTRIES = [(q1, a1), (q2, a2), ...]  # placeholders: supply real (question, answer) pairs
for q, a in ENTRIES:
    click_add_another()
    time.sleep(1.5)  # wait for form to render
    ok, info = add_faq(q, a)
    print(f"{q[:40]} -> {ok} ({info})")
    if not ok:
        break
```
62+
63+
## Brand voice — what to put in answers
64+
65+
This is application-specific (depends on the merchant). For JING the rule was Aesop founder-letter tone — sentence case, no exclamation points, "JING" not "we", specific over generic.
66+
67+
The Shopify guidance "Provide a brief answer in 1 or 2 sentences" is a soft hint. The textarea accepts longer text and AI agents prefer specific multi-sentence answers. Aim for 2-4 short sentences with concrete details.
68+
69+
## What to put in the Knowledge Base
70+
71+
Categories that materially shape AI agent answers about your brand:
72+
73+
1. **Brand voice / DNA** — "What is your brand?" / "What's your tone?"
74+
2. **Specs** — exact materials, dimensions, weights, sizes (NOT marketing prose)
75+
3. **Comparisons** — "How does X compare to <competitor>?" with concrete differences
76+
4. **Policies** — returns, shipping, care, warranty, contact (in brand voice)
77+
5. **Origin** — founder, where made, why brand exists
78+
6. **Limitations** — what you DON'T do (V1 scope, US-only, etc.) — agents that hallucinate availability hurt conversion
79+
80+
Skip: anything marketing-speak. The Knowledge Base is for **truth, in voice**, not pitch copy.
81+
82+
## Top unanswered questions
83+
84+
The overview shows up to 7 "Top unanswered questions" Shopify auto-detected from query logs. **Answer these first** — they're real shopper queries hitting your store right now. Once answered, the section empties.
85+
86+
## Query log
87+
88+
`/admin/apps/shopify-knowledge-base/app/queries` (or "Query log" in app sidebar) shows what shoppers actually asked AI agents about your brand. Read weekly. New patterns become new FAQ entries.
89+
90+
## Verifying entries surface in AI
91+
92+
After adding an entry, allow 24 hours for AI provider indexing, then test:
93+
94+
- ChatGPT: "Tell me about <your brand>'s return policy" → check if your exact wording surfaces
95+
- Perplexity: same
96+
- Claude: "Compare <your brand> vs <competitor>" → see if your comparison framing appears
97+
98+
If the answer doesn't surface, the entry might be too long, too vague, or contradicted by another source (your homepage, an outdated blog post). Tighten the answer.
99+
100+
## Limits
101+
102+
As of Winter '26 Edition:
103+
- English-only
104+
- No bulk import / CSV upload
105+
- No API for read or write
106+
- Each entry maximum ~500 words (soft cap; UI shows guidance "1 or 2 sentences")
107+
- No version history visible to the merchant
108+
109+
Watch Shopify changelogs for API exposure — likely in Spring '26 or Summer '26 Edition. When it ships, switch to API-driven population.
