feat: shadow-mode SQL pre-validation telemetry #643

Merged
anandgupta42 merged 5 commits into main from fix/pre-execution-validation-and-error-resilience
Apr 5, 2026

@anandgupta42 anandgupta42 commented Apr 4, 2026

What does this PR do?

Ships telemetry-only instrumentation to measure whether pre-execution SQL validation is worth implementing as a blocking check.

The original scope (blocking validation, apply_patch retry, file-not-found cache, prompt rules, warehouse guidance, validate auto-pull) was deferred after analysis showed the telemetry sample (8 machines, 1 day, 2-3 outlier users) didn't justify 297 LOC of production defenses.

This PR adds:

  • sql_pre_validation telemetry event (outcome, reason, schema_columns, duration_ms, error_message)
  • preValidateSql() runs fire-and-forget before every sql_execute — validates the query against the cached schema via altimate_core.validate, emits telemetry, and never blocks execution
  • Zero user-facing latency impact (async, detached)
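As a rough illustration of the event listed above, the following TypeScript sketch models `sql_pre_validation` as a typed record and classifies a validator result into `passed` or `blocked`. The field names come from the PR description; the exact types, the optionality of `error_message`, the `reason` values, and the `buildPreValidationEvent` helper are assumptions made for this example, not the shipped code.

```typescript
// Outcomes named in the PR description.
type PreValidationOutcome = "skipped" | "passed" | "blocked" | "error";

// Assumed shape of the event; only the field names are taken from the PR.
interface SqlPreValidationEvent {
  type: "sql_pre_validation";
  outcome: PreValidationOutcome;
  reason: string; // illustrative values such as "ok" or "validation_errors"
  schema_columns: number;
  duration_ms: number;
  error_message?: string;
}

// Hypothetical helper: turns a list of validator errors into an event.
function buildPreValidationEvent(
  errors: string[],
  schemaColumns: number,
  startedAtMs: number,
  finishedAtMs: number,
): SqlPreValidationEvent {
  const blocked = errors.length > 0;
  const event: SqlPreValidationEvent = {
    type: "sql_pre_validation",
    outcome: blocked ? "blocked" : "passed",
    reason: blocked ? "validation_errors" : "ok",
    schema_columns: schemaColumns,
    duration_ms: finishedAtMs - startedAtMs,
  };
  if (blocked) {
    // In shadow mode "blocked" only labels the event; nothing is actually blocked.
    event.error_message = errors.join("; ");
  }
  return event;
}
```

In this sketch `blocked` means "would have been blocked if validation were enforcing", which matches the shadow-mode intent described above.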

After 2 weeks of shadow telemetry we can answer:

  • What % of sql_execute calls would have been blocked?
  • What's the validation latency distribution?
  • How often is the cache stale / empty / missing?
  • Would blocking have false-positive risk?

Then decide whether to flip to blocking mode.
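The first two questions above could be answered with a small aggregation over exported events. The sketch below is one possible way to do it, assuming the events have already been pulled into memory as plain objects; the `ShadowEvent` type, the `summarize` function, and the nearest-rank percentile choice are all assumptions for illustration.

```typescript
// Minimal projection of the telemetry event needed for the analysis (assumed shape).
interface ShadowEvent {
  outcome: "skipped" | "passed" | "blocked" | "error";
  duration_ms: number;
}

// Nearest-rank percentile over an ascending-sorted array.
function percentile(sortedAsc: number[], p: number): number {
  if (sortedAsc.length === 0) return NaN;
  const rank = Math.ceil((p / 100) * sortedAsc.length);
  const index = Math.min(sortedAsc.length, Math.max(1, rank)) - 1;
  return sortedAsc[index] ?? NaN;
}

function summarize(events: ShadowEvent[]) {
  const total = events.length;
  const pct = (outcome: ShadowEvent["outcome"]): number =>
    total === 0 ? 0 : (events.filter((e) => e.outcome === outcome).length / total) * 100;
  const durations = events.map((e) => e.duration_ms).sort((a, b) => a - b);
  return {
    total,
    blockedPct: pct("blocked"), // share of calls that would have been blocked
    skippedPct: pct("skipped"), // rough proxy for stale, empty, or missing cache
    errorPct: pct("error"),
    p50Ms: percentile(durations, 50),
    p95Ms: percentile(durations, 95),
  };
}
```

False-positive risk is probably not derivable from this event alone; it would presumably require joining `blocked` events against whether the same query later succeeded in the warehouse.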

Type of change

  • Bug fix
  • New feature (shadow telemetry)
  • Test coverage
  • Documentation
  • Refactoring
  • Infrastructure

Issue for this PR

Follow-up to telemetry analysis of 2026-03-30 (Azure AppInsights, altimate-code-os).

How did you verify your code works?

  • Typecheck passes
  • Marker guard passes
  • Pre-validation is fire-and-forget with .catch(() => {}) — cannot fail the sql_execute call
  • trackPreValidation() emits telemetry with one of 4 outcomes (skipped, passed, blocked, error)
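The fire-and-forget property claimed above can be sketched as follows. This is a minimal illustration of the `.catch(() => {})` pattern, not the code in `sql-execute.ts`; `fireAndForget`, `preValidateSqlSketch`, and `sqlExecuteSketch` are hypothetical names, and the "boom" trigger only simulates a validator crash.

```typescript
// Starts a background task and swallows any failure, synchronous or asynchronous.
function fireAndForget(task: () => Promise<unknown>): void {
  try {
    void task().catch(() => {});
  } catch {
    // A synchronous throw while starting the task is swallowed as well.
  }
}

// Hypothetical stand-in for the real pre-validation step.
const emittedOutcomes: string[] = [];
async function preValidateSqlSketch(sql: string): Promise<void> {
  if (sql.includes("boom")) {
    throw new Error("validator crashed"); // becomes a rejected promise
  }
  emittedOutcomes.push("passed");
}

// Hypothetical stand-in for sql_execute: it never awaits the validation.
function sqlExecuteSketch(sql: string): string {
  fireAndForget(() => preValidateSqlSketch(sql));
  return `executed: ${sql}`;
}
```

Because the validation promise is neither awaited nor returned, a rejection cannot propagate into the caller, and the caller does not wait for validation to finish.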

Checklist

  • Typecheck passes
  • Marker guard passes
  • No user-facing behavior change (shadow mode)
  • Telemetry event typed and documented

Summary by CodeRabbit

  • New Features
    • Added non-blocking SQL pre-execution validation that checks query structure against cached schema to enhance reliability.
    • Implemented comprehensive telemetry event tracking for SQL validation activities, capturing outcomes, performance metrics, and diagnostic details.

anandgupta42 and others added 2 commits March 29, 2026 07:56
- altimate-core NAPI binding: set `NODE_PATH` to global npm root so
  `require('@altimateai/altimate-core')` resolves after `npm install -g`
- upstream branding: replace "opencode" with "altimate-code" in user-facing
  `describe` strings (uninstall, tui, pr commands, config, server API docs)
- driver resolvability: set `NODE_PATH` in driver check loop and install
  `duckdb` alongside the main package so at least one peer dep is present
- hardcoded CI paths: restrict grep to JS/JSON files only — compiled Bun
  binaries embed build-machine paths in debug info which is unavoidable
- NAPI module exports: already had correct `NODE_PATH` in extended test;
  root cause was the base test (fix 1) which is now resolved

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…rules, and error UX

- Wire datafusion validation into sql_execute — catches column/table errors
  locally before hitting the warehouse (uses schema cache with 24h TTL)
- Add sql_pre_validation telemetry event to measure catch rate and latency
- Add apply_patch retry-with-re-read on verification failure — re-reads
  the file and retries once before giving up, with actionable error messages
- Add file-not-found cache in read tool — prevents retry loops on missing
  paths (capped at 500 entries)
- Add agent behavior rules to system prompt: act first/ask later, enforce
  read-before-edit, limit retries to 2 per input
- Add actionable connection error guidance in warehouse_test — maps common
  auth failures (wrong password, missing key, SSO timeout) to fix instructions
- Auto-pull schema cache in altimate_core_validate when no schema provided

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

@claude claude bot left a comment


Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.


coderabbitai bot commented Apr 4, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

The PR adds non-blocking SQL pre-validation that runs before each sql_execute call against cached schema information, and introduces a corresponding telemetry event to track validation outcomes (skipped, passed, blocked, or error).

Changes

| Cohort / File(s) | Summary |
|---|---|
| Telemetry Event Definition<br/>`packages/opencode/src/altimate/telemetry/index.ts` | Added new sql_pre_validation event variant to Telemetry.Event union with outcome classification, timing, schema metadata, and optional error details. |
| SQL Pre-Validation Logic<br/>`packages/opencode/src/altimate/tools/sql-execute.ts` | Implemented pre-execution SQL validation using cached schema; includes warehouse resolution with fallback, TTL-based cache freshness checks, schema context building with truncation tracking, validation invocation, and telemetry emission for all outcomes. |

Sequence Diagram

sequenceDiagram
    participant SQLExec as SQL Execute Tool
    participant WH as Warehouse Registry
    participant Cache as Schema Cache
    participant Validator as altimate_core.validate
    participant Telemetry as Telemetry Tracker

    SQLExec->>WH: Resolve target warehouse<br/>(with fallback)
    WH-->>SQLExec: Warehouse instance

    SQLExec->>Cache: Check cache availability<br/>and freshness (TTL)
    Cache-->>SQLExec: Cached schema columns

    SQLExec->>SQLExec: Build schema context<br/>(apply scan limit)

    SQLExec->>Validator: Invoke validation<br/>with cached schema
    Validator-->>SQLExec: Validation errors

    alt Validation Errors Present
        SQLExec->>SQLExec: Categorize as 'blocked'
    else No Errors
        SQLExec->>SQLExec: Categorize as 'passed'
    end

    SQLExec->>Telemetry: Emit pre-validation event<br/>(outcome + metadata)
    Telemetry-->>SQLExec: Tracked
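
The flow in the diagram can be approximated in a few lines of TypeScript. This is a sketch under stated assumptions: the `validate` callback is a hypothetical stand-in for `altimate_core.validate`, the `reason` strings are invented for illustration, treating a stale or empty cache as `skipped` is an assumption, and the 24h TTL is taken from an earlier commit message rather than confirmed for the merged code.

```typescript
type Outcome = "skipped" | "passed" | "blocked" | "error";

interface CachedColumn {
  table: string;
  column: string;
}

interface CachedSchema {
  fetchedAtMs: number;
  columns: CachedColumn[];
}

// Assumed TTL; an earlier commit message mentions a 24h schema cache TTL.
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;

function preValidateSketch(
  sql: string,
  cache: CachedSchema | undefined,
  nowMs: number,
  // Hypothetical validator: returns a list of error strings, empty when valid.
  validate: (sql: string, columns: CachedColumn[]) => string[],
): { outcome: Outcome; reason: string } {
  if (!cache) return { outcome: "skipped", reason: "no_cache" };
  if (cache.columns.length === 0) return { outcome: "skipped", reason: "empty_cache" };
  if (nowMs - cache.fetchedAtMs > CACHE_TTL_MS) {
    return { outcome: "skipped", reason: "stale_cache" };
  }
  try {
    const errors = validate(sql, cache.columns);
    return errors.length > 0
      ? { outcome: "blocked", reason: "validation_errors" }
      : { outcome: "passed", reason: "ok" };
  } catch {
    // A crashing validator is reported as "error" instead of propagating.
    return { outcome: "error", reason: "validator_threw" };
  }
}
```

Each branch maps to one of the four outcomes, so every call produces exactly one telemetry classification regardless of cache state.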

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • mdesmet

Poem

🐰 A shadow runs before the SQL starts to play,
Checking schemas through the cache's hidden way,
Validation whispers what the query cannot see,
Telemetry hops along in perfect harmony! ✨

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The PR description is incomplete. It lacks the required 'PINEAPPLE' marker at the top (required for AI-generated contributions), and the Test Plan section is missing. | Add 'PINEAPPLE' at the very top of the description as required, and include a Test Plan section explaining how the changes were validated. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title accurately describes the main change: adding shadow-mode SQL pre-validation telemetry. It is concise, specific, and clearly conveys the primary objective. |


…idation-and-error-resilience

# Conflicts:
#	test/sanity/phases/verify-install.sh
Reduces PR scope to telemetry-only based on deep analysis: the broader
fixes (prompt rules, warehouse_test guidance, apply_patch retry, read
file-not-found cache, altimate_core_validate auto-pull) were speculative
against an 8-machine / 1-day telemetry sample.

This PR now ships only what's needed to measure whether pre-execution
SQL validation is worth it:

- Keep: sql_pre_validation telemetry event + preValidateSql function
- Change: pre-validation runs fire-and-forget (shadow mode) — emits
  telemetry with outcome=skipped|passed|blocked|error but never blocks
  sql_execute. Zero user-facing latency impact.
- Revert: read.ts, apply_patch.ts, warehouse-test.ts, altimate-core-validate.ts,
  anthropic.txt system prompt changes — to be re-evaluated as separate
  PRs once real telemetry data validates need.

After 2 weeks of shadow telemetry, we can decide whether the blocking
behavior is worth the latency and false-positive risk.
@anandgupta42 anandgupta42 changed the title from "fix: pre-execution SQL validation, retry resilience, and error UX" to "feat: shadow-mode SQL pre-validation telemetry" on Apr 5, 2026

@cubic-dev-ai cubic-dev-ai bot left a comment


3 issues found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/opencode/src/altimate/tools/sql-execute.ts">

<violation number="1" location="packages/opencode/src/altimate/tools/sql-execute.ts:120">
P2: When `warehouse` is omitted, this validates against the cache’s first warehouse instead of the warehouse `sql.execute` will actually use.</violation>

<violation number="2" location="packages/opencode/src/altimate/tools/sql-execute.ts:140">
P2: Limiting validation to 10k columns can create false structural errors on large warehouses and bias the telemetry sample.</violation>

<violation number="3" location="packages/opencode/src/altimate/tools/sql-execute.ts:229">
P1: Mask the validator error before emitting telemetry; this currently uploads raw schema identifiers into App Insights.</violation>
</file>


- P1: mask validator error via `Telemetry.maskString()` before emitting
  `sql_pre_validation` telemetry. Raw schema identifiers (table/column
  names, paths) no longer leak to App Insights.
- P2: resolve fallback warehouse via `Registry.list().warehouses[0]`
  (same path `sql.execute` uses) instead of the cache's first warehouse.
  Keeps shadow validation aligned with actual execution.
- P2: raise column-scan cap from 10k to 500k and add `schema_truncated`
  boolean to the event. Avoids false structural errors on large
  warehouses and lets analysis flag biased samples.
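
The column-scan cap and error masking described in this commit message could look roughly like the sketch below. The 500k cap and the `schema_truncated` field name come from the message; `buildSchemaContext` and `maskQuoted` are hypothetical helpers, and `maskQuoted` in particular is a simplified stand-in that is not the real `Telemetry.maskString()`, whose behavior is not shown in this PR.

```typescript
interface ColumnRow {
  table: string;
  column: string;
}

// Cap quoted in the commit message (raised from 10k to 500k).
const COLUMN_SCAN_CAP = 500_000;

// Groups cached columns by table, stopping at the cap and recording truncation.
function buildSchemaContext(rows: ColumnRow[], cap: number = COLUMN_SCAN_CAP) {
  const tables: Record<string, string[]> = {};
  let scanned = 0;
  let truncated = false;
  for (const row of rows) {
    if (scanned >= cap) {
      truncated = true; // more rows exist than the cap allows
      break;
    }
    (tables[row.table] ??= []).push(row.column);
    scanned++;
  }
  return { tables, schema_columns: scanned, schema_truncated: truncated };
}

// Simplified, assumed masking: replaces any quoted run with "?" so that
// identifiers and literals inside quotes do not reach telemetry.
function maskQuoted(message: string): string {
  return message.replace(/(["'`])(?:(?!\1).)*\1/g, "?");
}
```

Emitting `schema_truncated` alongside the outcome lets later analysis exclude or separately bucket events where the validator saw only part of the schema.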
@anandgupta42 anandgupta42 merged commit 1a9c6fe into main Apr 5, 2026
15 of 16 checks passed