Merged
1 change: 1 addition & 0 deletions UPSTREAM.md
@@ -93,6 +93,7 @@ Each upstream has its own append-only table. Add a row every time you pull.
 | 2026-04-30 | `997ee45` | `660827d` | bcode | 11 upstream commits (PRs #246, #247, #251, #254, #256, #260). `src/browser_harness/daemon.py`: resolve WS via `/json/version` to avoid stale `DevToolsActivePort` path (PR #260) + report `cdp_disconnected` on stale CDP probe in `connection_status` (PR #254) + cleanup remote browser when daemon startup fails (PR #251). `src/browser_harness/admin.py`: companion changes for the daemon fixes. `tests/unit/test_admin.py`: 7 new tests. New domain skills: `agent-workspace/domain-skills/xiaohongshu/scraping.md` (PR #246), and a top-level `domain-skills/shopify-admin/` tree (PR #247: README, embedded-apps, knowledge-base, polaris-inputs). Note: PR #247 added skills at the top-level `domain-skills/` path, not under `agent-workspace/domain-skills/` as the post-#229 layout would suggest — vendored verbatim to match upstream layout. Doc updates: README operator framing (PR #255), install.md heredoc → `-c` flag (PR #256), profile-sync.md same. All files outside divergences — taken verbatim. Smoke test + 19 admin unit tests pass. Divergences touched: none. |
 | 2026-05-01 | `660827d` | `013097a` | bcode | 8 upstream commits (PRs #261, #265, #266). `src/browser_harness/daemon.py` (PR #265): split `DevToolsActivePort` into port + ws-path lines and fall back to `ws://127.0.0.1:<port><ws_path>` when `/json/version` returns 404 (Chrome 147+ disables `/json/*` HTTP discovery on the default user-data-dir). `src/browser_harness/run.py` (PR #266): when no daemon is alive, no local Chrome is listening on 9222/9223 (probed via `/json/version`, not bare TCP), and `BROWSER_USE_API_KEY` is set, auto-bootstrap a cloud daemon. `tests/unit/test_run.py`: 2 new tests for the cloud bootstrap path. PR #261 moved `domain-skills/shopify-admin/` → `agent-workspace/domain-skills/shopify-admin/` upstream — both paths are excluded from the vendored tree per §3, so this rename is a no-op for browsercode (`script/check-harness-diff.sh` filters both via `IGNORED_PATHS_REGEX`). All in protected `src/browser_harness/*.py` + tests — taken verbatim. Smoke test + 23 unit tests pass. Divergences touched: none. |
 | 2026-05-03 | `013097a` | `59a166f` | bcode | 62 upstream commits. **Helper additions** (PRs #258, #279): `helpers.py` adds `fill_input` (raises on missing element, optional timeout for SPA rendering, dispatches select-all without char event so Cmd/Ctrl+A fires on macOS), `wait_for_element` (prefers `checkVisibility`, falls back to computed style), `wait_for_network_idle`. `tests/unit/test_helpers.py`: +253 lines covering the new helpers. `daemon.py`: discover Dia browser profile on macOS. **Windows IPC hardening** (PR #276): `_ipc.py` adds ping handshake, token auth, atomic port file. **Domain-skills opt-in** (PR #274): `helpers.py` gates auto-injected domain skills behind `BH_DOMAIN_SKILLS=1` (default off). Aligns upstream default with browsercode's exclusion policy — no behavior change for us, but the `BH_DOMAIN_SKILLS` env name is now the canonical knob if we ever decide to ship a curated set. **Cloud bootstrap opt-in** (PR #277): `run.py` makes cloud auto-bootstrap opt-in via `BU_AUTOSPAWN` instead of triggering on any `BROWSER_USE_API_KEY` presence. Plus admin tweaks (`tests/unit/test_admin.py` +10 lines), doc canonicalization (`README.md`, `SKILL.md`, `install.md`, `interaction-skills/profile-sync.md` PR #280), and new top-level scaffolding: `AGENTS.md` (repo orientation for coding agents), `.github/ISSUE_TEMPLATE/{bug-report,feature-request,config}.yml`, `.github/VOUCHED.td`, `docs/allow-remote-debugging.png`. All non-excluded paths taken verbatim. **Excluded paths** (per §3): 14 new domain-skills directories added upstream (aa, alaska, articulate-rise, bigbang-hr, bilibili, BOSS-zhipin, claude-ai, ctrip, flipkart, ly-com, manus, perplexity, wehotel, plus amazon under top-level `domain-skills/`) — skipped. **Divergence update**: `.gitignore` now also includes upstream's new `.idea/` and `.claude/` entries while preserving our `.venv/`. Smoke test (imports + `--version`) clean. Divergences touched: `.gitignore` (extended, same intent). |
+| 2026-05-06 | `59a166f` | `32d8d515e` | bcode | 52 upstream commits. **PID-reuse safety in `restart_daemon`** (PR #294): `admin.py` gains `_process_start_time` (Linux `/proc/<pid>/stat` field 22, macOS `ps -o lstart=`, Windows `GetProcessTimes` via ctypes) + new IPC `identify()` helper. `_ipc.py` hardens `ping` against non-dict/non-positive-pid responses. **`BH_RUNTIME_DIR` / `BH_TMP_DIR` split** (PR #318): `_ipc.py` introduces `BH_RUNTIME_DIR` for the AF_UNIX-sensitive sock/port/pid (104-byte `sun_path` budget) while `BH_TMP_DIR` keeps the long-path-tolerant log/screenshot files. Backward compatible — `BH_RUNTIME_DIR` falls back to `BH_TMP_DIR` then `/tmp`, so our `BH_TMP_DIR=ctx.bhTmpDir` setup in `browser-execute.ts` continues to work unchanged. (Future browsercode improvement: pass `BH_RUNTIME_DIR` separately so a deeper persistent `bhTmpDir` no longer has to fit the AF_UNIX budget. Tracked for ROADMAP follow-up — out of scope for this sync.) **AF_UNIX umask fix** (PR #309): `_ipc.py` sets `umask 0077` around `bind()` to remove the chmod TOCTOU window. **`current_tab` via daemon meta** (PR #305): `helpers.py` resolves the attached `target_id` server-side via the daemon's session meta instead of `Target.getTargetInfo`, fixing the missing-target case after a page nav. **CDP discovery fallback** (PR #292): `daemon.py` falls back to `ws://127.0.0.1:<port><ws_path>` when `/json/version` returns 404 (Chrome 147+ disables `/json/*` on default user-data-dirs); IPv6 hosts bracketed in the WS URL. **Tab-switch CDP parity** (PR #296): `daemon.py` enables Page/DOM/Runtime/Network on `set_session` to match initial-attach behavior; `helpers.py` filters `wait_for_network_idle` events by `session_id` so a previously-attached background tab doesn't poison idle on the current tab. **Run-time CDP precedence** (PR #300): `run.py` adds `_explicit_cdp_configured()` gate so `BU_CDP_URL` / `BU_CDP_WS` block the cloud auto-bootstrap (was silently overriding user's explicit endpoint and billing for a cloud browser). **Browser discovery additions**: Chrome Canary profile (PR #263, macOS + Windows in `daemon.py`), Brave on Windows (PR #284, `daemon.py`). **README banner** (PR #285): SVG ink-bleed reveal replaces the static R2 PNG. **VOUCHED.td** (PRs #308, #310): two bot/fabricated-profile exclusions. **Excluded paths** (per §3): 8 new domain-skills additions upstream (agentlist, browser-use-cloud, freewheel-mrm, tasksquad-ai, vercel, x — across PRs #281, #282, #283, #288, #301, #302) plus shopify-admin reorg/cleanup — skipped. All in-scope files (`src/browser_harness/*.py`, `tests/unit/*.py`, `README.md`, `.github/VOUCHED.td`) taken verbatim. Two new test files: `tests/unit/test_daemon.py`, `tests/unit/test_ipc.py`. Smoke test: imports ok, `browser-harness --version` → `0.1.0`, `pytest tests/unit/` → 76 passed. Divergences touched: none. |
 
 ---
 
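The 2026-05-01 and 2026-05-06 rows describe building the browser WebSocket URL straight from `DevToolsActivePort` (port on the first line, ws path on the second) when `/json/version` returns 404, with IPv6 hosts bracketed. A minimal sketch of that construction, assuming exactly that two-line layout; `ws_url_from_active_port` is a hypothetical name for illustration, not the `daemon.py` function.

```python
import ipaddress


def ws_url_from_active_port(contents, host="127.0.0.1"):
    """Build a browser-level DevTools WebSocket URL from DevToolsActivePort text.

    Hypothetical helper. Assumes line 1 holds the port and line 2 the
    browser ws path (e.g. "/devtools/browser/<id>").
    """
    lines = [ln.strip() for ln in contents.splitlines() if ln.strip()]
    if len(lines) < 2:
        raise ValueError("expected a port line and a ws-path line")
    port = int(lines[0])
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    # Defensive assumption: tolerate a path written without its leading slash.
    ws_path = lines[1] if lines[1].startswith("/") else "/" + lines[1]
    # IPv6 literals must be bracketed inside a URL authority (RFC 3986).
    try:
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"
    except ValueError:
        pass  # a hostname such as "localhost", or already bracketed: use as-is
    return f"ws://{host}:{port}{ws_path}"
```

Parsing the file directly avoids the HTTP discovery endpoint entirely, which is the point of the fallback on Chrome builds that disable `/json/*`.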
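The PID-reuse fix in the 2026-05-06 row rests on process start times; on Linux that is field 22 (`starttime`, in clock ticks since boot) of `/proc/<pid>/stat`. Presumably a start time recorded at spawn is compared against the current one before signaling, so a recycled PID shows up as a mismatch. A minimal Linux-only sketch of reading the field, with hypothetical function names; the one subtlety is field 2 (`comm`), which is parenthesized and may itself contain spaces or `)`.

```python
def start_time_from_stat(stat_line):
    """Return field 22 (starttime, clock ticks since boot) of a /proc/<pid>/stat line."""
    # comm (field 2) is parenthesized and may contain spaces or ')',
    # so split on the LAST ')' instead of splitting the whole line on spaces.
    _, sep, rest = stat_line.rpartition(")")
    if not sep:
        raise ValueError("malformed stat line: no ')' after comm")
    fields = rest.split()  # fields[0] is field 3 (state)
    return int(fields[22 - 3])  # 1-based field N sits at fields[N - 3]


def process_start_time(pid):
    """Linux-only sketch: start time of pid, or None if it cannot be read."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            return start_time_from_stat(f.read())
    except (FileNotFoundError, ProcessLookupError, PermissionError):
        return None
```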
2 changes: 2 additions & 0 deletions packages/bcode-browser/harness/.github/VOUCHED.td
@@ -11,3 +11,5 @@
 molesza
 rohitdutt108
 shaunandrewjackson1977
+-nandanadileep # Bot
+-web-dev0521 # Fabricated profile, bot PRs
2 changes: 1 addition & 1 deletion packages/bcode-browser/harness/README.md
@@ -1,4 +1,4 @@
-<img src="https://r2.browser-use.com/github/ajsdlasnnalsgasld.png" alt="Browser Harness" width="100%" />
+<img src="https://raw.githubusercontent.com/browser-use/media/main/browser-harness/banner-ink.svg" alt="Browser Harness" width="100%" />
 
 # Browser Harness ♞
 
83 changes: 69 additions & 14 deletions packages/bcode-browser/harness/src/browser_harness/_ipc.py
@@ -3,13 +3,22 @@
 from pathlib import Path
 
 IS_WINDOWS = sys.platform == "win32"
-# BH_TMP_DIR set → caller-isolated dir, bare filenames (avoids AF_UNIX sun_path
-# overrun: 104 macOS / 108 Linux). Unset → shared tmpdir, "bu-<NAME>" prefix
-# disambiguates daemons. POSIX default is /tmp (gettempdir() returns long
-# /var/folders/... on macOS); Windows uses TCP so any tempdir is fine.
+# Two caller-supplied dirs:
+# BH_RUNTIME_DIR — sock/port/pid. AF_UNIX sun_path is 104 bytes on macOS, so
+# the runtime dir must be short. Caller is responsible for keeping it
+# within budget. Falls back to BH_TMP_DIR (legacy single-dir callers),
+# then to /tmp on POSIX (gettempdir() returns long /var/folders/... on
+# macOS — unsafe for AF_UNIX) or tempfile.gettempdir() on Windows (TCP).
+# BH_TMP_DIR — screenshots, debug overlays, daemon log. No path-length
+# sensitivity; caller can use a deep persistent path.
+# When the caller supplies a per-instance dir for either purpose, files use
+# bare "bu" stems; otherwise "bu-<NAME>" disambiguates co-tenants.
 BH_TMP_DIR = os.environ.get("BH_TMP_DIR")
+BH_RUNTIME_DIR = os.environ.get("BH_RUNTIME_DIR") or BH_TMP_DIR
 _TMP = Path(BH_TMP_DIR or (tempfile.gettempdir() if IS_WINDOWS else "/tmp"))
+_RUNTIME = Path(BH_RUNTIME_DIR or (tempfile.gettempdir() if IS_WINDOWS else "/tmp"))
 _TMP.mkdir(parents=True, exist_ok=True)
+_RUNTIME.mkdir(parents=True, exist_ok=True)
 _NAME_RE = re.compile(r"\A[A-Za-z0-9_-]{1,64}\Z")
 
 # Set by serve() on Windows. Daemon's handle() requires every request to carry
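The new comment makes the caller responsible for keeping `BH_RUNTIME_DIR` within the AF_UNIX budget. A caller-side check could look like this minimal sketch; `runtime_dir_fits` and `SUN_PATH_BUDGET` are hypothetical names. It assumes macOS's 104-byte `sun_path` (Linux allows 108, so 104 is the portable minimum) and reserves one byte for the terminating NUL, which is the conservative reading.

```python
import os

# Assumption: 104 bytes is macOS's sizeof(sun_path); Linux allows 108, so
# 104 is the portable minimum. One byte is reserved for the trailing NUL.
SUN_PATH_BUDGET = 104


def runtime_dir_fits(runtime_dir, stem="bu"):
    """True if <runtime_dir>/<stem>.sock fits within SUN_PATH_BUDGET."""
    sock = os.path.join(runtime_dir, f"{stem}.sock")
    # Measure encoded bytes, not characters: non-ASCII path parts take more room.
    return len(os.fsencode(sock)) + 1 <= SUN_PATH_BUDGET
```

When the runtime dir is shared rather than per-instance, the stem is `bu-<NAME>` with a name of up to 64 characters, so that longer stem is the one to pass.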
@@ -25,15 +34,20 @@ def _check(name): # path-traversal guard for BU_NAME
     return name
 
 
-def _stem(name): # "bu" when BH_TMP_DIR isolates us, else "bu-<NAME>"
+def _runtime_stem(name): # "bu" when BH_RUNTIME_DIR isolates us, else "bu-<NAME>"
     _check(name)
+    return "bu" if BH_RUNTIME_DIR else f"bu-{name}"
+
+
+def _tmp_stem(name): # "bu" when BH_TMP_DIR isolates us, else "bu-<NAME>"
+    _check(name)
     return "bu" if BH_TMP_DIR else f"bu-{name}"
 
 
-def log_path(name): return _TMP / f"{_stem(name)}.log"
-def pid_path(name): return _TMP / f"{_stem(name)}.pid"
-def port_path(name): return _TMP / f"{_stem(name)}.port" # Windows-only: holds {"port","token"} JSON
-def _sock_path(name): return _TMP / f"{_stem(name)}.sock"
+def log_path(name): return _TMP / f"{_tmp_stem(name)}.log"
+def pid_path(name): return _RUNTIME / f"{_runtime_stem(name)}.pid"
+def port_path(name): return _RUNTIME / f"{_runtime_stem(name)}.port" # Windows-only: holds {"port","token"} JSON
+def _sock_path(name): return _RUNTIME / f"{_runtime_stem(name)}.sock"
 
 
 def _read_port_file(name):
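To make the split concrete, here is a minimal sketch of which directory and stem each file ends up with. `ipc_paths` is a hypothetical standalone function; unlike `_ipc.py`, it takes an explicit env mapping instead of reading `os.environ` at import time, skips the `_check()` name validation, and hard-codes the POSIX `/tmp` default.

```python
from pathlib import PurePosixPath


def ipc_paths(name, env, default_dir="/tmp"):
    """Return (log, pid, sock) paths for BU_NAME `name` under the given env mapping."""
    tmp_dir = env.get("BH_TMP_DIR")
    # Legacy single-dir callers: BH_RUNTIME_DIR falls back to BH_TMP_DIR.
    runtime_dir = env.get("BH_RUNTIME_DIR") or tmp_dir
    tmp = PurePosixPath(tmp_dir or default_dir)
    runtime = PurePosixPath(runtime_dir or default_dir)
    # A caller-supplied dir is treated as per-instance, so the bare "bu" stem is safe there.
    tmp_stem = "bu" if tmp_dir else f"bu-{name}"
    runtime_stem = "bu" if runtime_dir else f"bu-{name}"
    return (
        tmp / f"{tmp_stem}.log",
        runtime / f"{runtime_stem}.pid",
        runtime / f"{runtime_stem}.sock",
    )
```

With only `BH_TMP_DIR` set, every file lands in that one directory under the bare `bu` stem (the legacy behavior). Setting both moves only the socket and pid into the short runtime dir. Setting only `BH_RUNTIME_DIR` leaves the log in shared `/tmp` under `bu-<NAME>`.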
@@ -48,7 +62,7 @@ def _read_port_file(name):
 def sock_addr(name): # display-only, used in log lines
     if not IS_WINDOWS: return str(_sock_path(name))
     port, _ = _read_port_file(name)
-    return f"127.0.0.1:{port}" if port else f"tcp:{_stem(name)}"
+    return f"127.0.0.1:{port}" if port else f"tcp:{_runtime_stem(name)}"
 
 
 def spawn_kwargs(): # subprocess.Popen flags so the daemon detaches from this terminal
@@ -97,22 +111,63 @@ def ping(name, timeout=1.0):
     except (FileNotFoundError, ConnectionRefusedError, TimeoutError, socket.timeout, OSError):
         return False
     try:
-        return request(c, token, {"meta": "ping"}).get("pong") is True
-    except (OSError, ValueError):
+        resp = request(c, token, {"meta": "ping"})
+        # request() returns parsed JSON, which may be any valid value (a list,
+        # scalar, etc. from a stale or hostile endpoint). Anything that isn't
+        # a {pong: true} dict counts as "not our daemon" — never .get() blindly.
+        return isinstance(resp, dict) and resp.get("pong") is True
+    except (OSError, ValueError, AttributeError):
         return False
     finally:
         try: c.close()
         except OSError: pass
 
 
+def identify(name, timeout=1.0):
+    """Return the live daemon's PID, or None if unreachable.
+
+    Used by restart_daemon() to signal a process whose identity has been
+    verified end-to-end (live IPC + self-reported PID), instead of trusting
+    a pid file whose number may have been reused by an unrelated process."""
+    try:
+        c, token = connect(name, timeout=timeout)
+    except (FileNotFoundError, ConnectionRefusedError, TimeoutError, socket.timeout, OSError):
+        return None
+    try:
+        resp = request(c, token, {"meta": "ping"})
+        # request() returns parsed JSON, which may be any valid value (a list,
+        # scalar, etc. from a stale or hostile endpoint). Anything that isn't
+        # a {pong: true} dict gets None — never .get() on a non-dict.
+        if not isinstance(resp, dict) or resp.get("pong") is not True:
+            return None
+        pid = resp.get("pid")
+        # `type(pid) is int` (not isinstance) intentionally rejects bool: in
+        # Python, isinstance(True, int) is True, so a hostile/buggy daemon
+        # could reply with {"pid": True} and we'd treat that as PID 1 (init).
+        # Also reject 0/negatives — os.kill(0, sig) signals every process in
+        # the calling process group, os.kill(-1, sig) signals every process
+        # the caller can. Upper bound is 2**31 because C pid_t is typically
+        # signed 32-bit and a value outside that range makes os.kill() raise
+        # OverflowError, which would propagate out of restart_daemon() before
+        # its cleanup. Linux pid_max is also bounded at 2**22 in practice.
+        return pid if type(pid) is int and 0 < pid < (1 << 31) else None
+    except (OSError, ValueError, AttributeError):
+        return None
+    finally:
+        try: c.close()
+        except OSError: pass
+
+
 async def serve(name, handler):
     """Run the server until cancelled. handler(reader, writer) sees the same interface either way."""
     global _server_token
     if not IS_WINDOWS:
         path = str(_sock_path(name))
         if os.path.exists(path): os.unlink(path)
-        server = await asyncio.start_unix_server(handler, path=path)
-        os.chmod(path, 0o600)
+        # umask 0o077 makes bind() create the socket as 0600 — no TOCTOU window before chmod.
+        old_umask = os.umask(0o077)
+        try: server = await asyncio.start_unix_server(handler, path=path)
+        finally: os.umask(old_umask)
         _server_token = None
         async with server: await asyncio.Event().wait()
         return
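The checks `identify()` applies to the ping reply can be lifted into a pure function to see exactly which replies it accepts. `pid_from_ping_response` is a hypothetical name for this sketch; the conditions are the ones in the hunk, and the tests exercise the hostile cases its comments describe (bool PIDs, 0, negatives, out-of-range values, non-dict replies).

```python
def pid_from_ping_response(resp):
    """Return a signal-safe PID from a parsed ping reply, else None."""
    # Non-dict JSON (list, scalar) or a missing/non-True pong: not our daemon.
    if not isinstance(resp, dict) or resp.get("pong") is not True:
        return None
    pid = resp.get("pid")
    # type() rather than isinstance(): bool is a subclass of int, so
    # isinstance(True, int) would let {"pid": True} through as PID 1.
    if type(pid) is int and 0 < pid < (1 << 31):
        return pid
    return None
```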
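The `serve()` change narrows the process umask only around the bind, so the socket file never exists with group or other permissions. The same pattern with a plain blocking `socket` (instead of `asyncio.start_unix_server`) is easier to exercise in isolation; `bind_private_unix_socket` is a hypothetical helper, and the sketch is POSIX-only since it relies on `AF_UNIX` and `os.umask`.

```python
import os
import socket
import stat
import tempfile


def bind_private_unix_socket(path):
    """Bind an AF_UNIX stream socket whose file is created without group/other bits."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    old_umask = os.umask(0o077)  # mask off group/other before bind() creates the file
    try:
        sock.bind(path)
    except OSError:
        sock.close()
        raise
    finally:
        os.umask(old_umask)  # restore even if bind() failed
    return sock


if __name__ == "__main__":
    d = tempfile.mkdtemp()
    p = os.path.join(d, "bu.sock")
    s = bind_private_unix_socket(p)
    print(oct(stat.S_IMODE(os.stat(p).st_mode)))
    s.close()
    os.unlink(p)
    os.rmdir(d)
```

One design caveat: the umask is process-wide, so any other thread that creates files during that window also gets the narrowed mask. Restoring it in `finally` keeps the window as small as the bind itself.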