Skip to content

Commit c960d48

Browse files
authored
fix(sandbox): canonicalize HTTP request-targets before L7 policy evaluation (#878)
* fix(sandbox): canonicalize HTTP request-targets before L7 policy evaluation The L7 REST proxy evaluated OPA path rules against the raw request-target and forwarded the raw header block byte-for-byte to the upstream. The upstream normalized `..`, `%2e%2e`, and `//` before dispatching, so a compromised agent inside the sandbox could bypass any path-based allow rule by prefixing the blocked path with an allowed one (e.g. `GET /public/../secret` matches `/public/**` at the proxy but the upstream serves `/secret`). The inference-routing detection had the same normalization gap via `normalize_inference_path`, which only stripped scheme+authority and preserved dot-segments verbatim. - Add `crates/openshell-sandbox/src/l7/path.rs` with `canonicalize_request_target`. Percent-decodes (with uppercase hex re-emission), resolves `.` / `..` segments per RFC 3986 5.2.4, collapses doubled slashes, strips trailing `;params`, rejects fragments / raw non-ASCII / control bytes / encoded slashes (unless explicitly enabled per-endpoint), and returns a `rewritten` flag so callers know when to rewrite the outbound request line. - Wire the canonicalizer into `rest.rs::parse_http_request`. The canonical path is the `target` on `L7Request` that OPA sees, and when the canonical form differs from the raw input the request line is rebuilt in `raw_header` so the upstream dispatches on the exact bytes the policy evaluated. - Canonicalize the forward (plain HTTP proxy) path at `proxy.rs`'s second L7 evaluation site. A non-canonical request-target is rejected with 400 Bad Request and an OCSF `NetworkActivity` Fail event rather than silently passed through. - Replace the old `normalize_inference_path` body with a call to the canonicalizer. On canonicalization error the raw path is returned (so inference detection simply misses and the normal forward path handles and rejects). - Document the invariant in `sandbox-policy.rego`: `input.request.path` is pre-canonicalized so rules must not attempt defensive matching against `..` or `%2e%2e`. - Architecture doc (`architecture/sandbox.md`) lists the new module. - 24 new unit tests cover dot segments, percent-encoded dot segments, double-slash collapse, encoded-slash reject/opt-in, null/control byte rejection, legitimate percent-encoded bytes round-tripping, mixed-case percent normalization, fragment rejection, absolute-form stripping, empty / length-bounded / non-ASCII handling, and regression payloads for the specific bypasses above. Tracks OS-99. * fix(sandbox): close forward-proxy parser-differential and wire allow_encoded_slash Addresses two review findings on the L7 path canonicalization change. 1. `handle_forward_proxy` previously evaluated OPA on the canonical path but wrote the raw path to the upstream via `rewrite_forward_request`, leaving the parser-differential open for plain-HTTP forward-proxy traffic even though it was closed for the L7 tunnel path. `path` is now reassigned to the canonical form inside the L7 block before the outbound write, so the bytes dispatched to the upstream match what OPA evaluated. Covered by a new regression test against `rewrite_forward_request`. 2. `allow_encoded_slash` is now wired through the YAML policy schema, the proto `NetworkEndpoint` message, the Rego endpoint config, and the `L7EndpointConfig` → `RestProvider` canonicalization options. Operators can opt in per-endpoint for upstreams like GitLab that embed `%2F` in namespaced resource paths; the default remains strict. The `RestProvider` gains a constructor `RestProvider::with_options` so `relay_rest` constructs a provider with per-endpoint options while `relay_passthrough_with_credentials` retains strict defaults. Integration tests: - `parse_http_request_canonicalizes_target_and_rewrites_raw_header` drives a non-canonical request through the parser and asserts both `L7Request.target` (what OPA evaluates) and `raw_header` (what the upstream receives) are canonical. - `parse_http_request_canonicalization_preserves_query_string`, `parse_http_request_leaves_canonical_input_byte_for_byte`, `parse_http_request_rejects_traversal_above_root`, `parse_http_request_preserves_http_10_version_on_rewrite`. - `parse_http_request_accepts_encoded_slash_when_endpoint_opts_in` / `_rejects_encoded_slash_by_default` verify the `allow_encoded_slash` gate. - `test_rewrite_forward_request_uses_canonical_path_on_the_wire` asserts the fix to the forward-proxy flow. - `parse_l7_config_allow_encoded_slash_{defaults_false,opt_in}` verify the YAML/Rego wiring.
1 parent 9ac725f commit c960d48

10 files changed

Lines changed: 1018 additions & 40 deletions

File tree

architecture/sandbox.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,7 @@ All paths are relative to `crates/openshell-sandbox/src/`.
3232
| `l7/tls.rs` | Ephemeral CA generation (`SandboxCa`), per-hostname leaf cert cache (`CertCache`), TLS termination/connection helpers, `looks_like_tls()` auto-detection |
3333
| `l7/relay.rs` | Protocol-aware bidirectional relay with per-request OPA evaluation, credential-injection-only passthrough relay |
3434
| `l7/rest.rs` | HTTP/1.1 request/response parsing, body framing (Content-Length, chunked), deny response generation |
35+
| `l7/path.rs` | Request-target canonicalization: percent-decoding, dot-segment resolution, `;params` stripping, encoded-slash policy (opt-in per endpoint via `allow_encoded_slash: true` for upstreams like GitLab that embed `%2F` in paths). Single source of truth for the path both OPA evaluates and the upstream receives. |
3536
| `l7/provider.rs` | `L7Provider` trait and `L7Request`/`BodyLength` types |
3637
| `secrets.rs` | `SecretResolver` credential placeholder system — placeholder generation, multi-location rewriting (headers, query params, path segments, Basic auth), fail-closed scanning, secret validation, percent-encoding |
3738

crates/openshell-policy/src/lib.rs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -109,6 +109,12 @@ struct NetworkEndpointDef {
109109
allowed_ips: Vec<String>,
110110
#[serde(default, skip_serializing_if = "Vec::is_empty")]
111111
deny_rules: Vec<L7DenyRuleDef>,
112+
/// When true, percent-encoded `/` (`%2F`) is preserved in path segments
113+
/// rather than rejected by the L7 path canonicalizer. Required for
114+
/// upstreams like GitLab that embed `%2F` in namespaced resource paths.
115+
/// Defaults to false (strict).
116+
#[serde(default, skip_serializing_if = "std::ops::Not::not")]
117+
allow_encoded_slash: bool,
112118
}
113119

114120
fn is_zero(v: &u16) -> bool {
@@ -261,6 +267,7 @@ fn to_proto(raw: PolicyFile) -> SandboxPolicy {
261267
.collect(),
262268
})
263269
.collect(),
270+
allow_encoded_slash: e.allow_encoded_slash,
264271
}
265272
})
266273
.collect(),
@@ -400,6 +407,7 @@ fn from_proto(policy: &SandboxPolicy) -> PolicyFile {
400407
.collect(),
401408
})
402409
.collect(),
410+
allow_encoded_slash: e.allow_encoded_slash,
403411
}
404412
})
405413
.collect(),

crates/openshell-sandbox/data/sandbox-policy.rego

Lines changed: 9 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -328,6 +328,13 @@ method_matches(actual, expected) if {
328328
}
329329

330330
# Path matching: "**" matches everything; otherwise glob.match with "/" delimiter.
331+
#
332+
# INVARIANT: `input.request.path` is canonicalized by the sandbox before
333+
# policy evaluation — percent-decoded, dot-segments resolved, doubled
334+
# slashes collapsed, `;params` stripped, `%2F` rejected (unless an
335+
# endpoint opts in). Patterns here must therefore match canonical paths;
336+
# do not attempt defensive matching against `..` or `%2e%2e` — those
337+
# inputs are rejected at the L7 parser boundary before this rule runs.
331338
path_matches(_, "**") if true
332339

333340
path_matches(actual, pattern) if {
@@ -394,8 +401,8 @@ command_matches(actual, expected) if {
394401

395402
# --- Matched endpoint config (for L7 and allowed_ips extraction) ---
396403
# Returns the raw endpoint object for the matched policy + host:port.
397-
# Used by Rust to extract L7 config (protocol, tls, enforcement) and/or
398-
# allowed_ips for SSRF allowlist validation.
404+
# Used by Rust to extract L7 config (protocol, tls, enforcement,
405+
# allow_encoded_slash) and/or allowed_ips for SSRF allowlist validation.
399406

400407
# Per-policy helper: returns matching endpoint configs for a single policy.
401408
_policy_endpoint_configs(policy) := [ep |

crates/openshell-sandbox/src/l7/mod.rs

Lines changed: 41 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,7 @@
99
//! evaluated against OPA policy, and either forwarded or denied.
1010
1111
pub mod inference;
12+
pub mod path;
1213
pub mod provider;
1314
pub mod relay;
1415
pub mod rest;
@@ -59,6 +60,10 @@ pub struct L7EndpointConfig {
5960
pub protocol: L7Protocol,
6061
pub tls: TlsMode,
6162
pub enforcement: EnforcementMode,
63+
/// When true, percent-encoded `/` (`%2F`) is preserved in path segments
64+
/// rather than rejected at the parser. Needed by upstreams like GitLab
65+
/// that embed `%2F` in namespaced project paths. Defaults to false.
66+
pub allow_encoded_slash: bool,
6267
}
6368

6469
/// Result of an L7 policy decision for a single request.
@@ -122,10 +127,13 @@ pub fn parse_l7_config(val: &regorus::Value) -> Option<L7EndpointConfig> {
122127
_ => EnforcementMode::Audit,
123128
};
124129

130+
let allow_encoded_slash = get_object_bool(val, "allow_encoded_slash").unwrap_or(false);
131+
125132
Some(L7EndpointConfig {
126133
protocol,
127134
tls,
128135
enforcement,
136+
allow_encoded_slash,
129137
})
130138
}
131139

@@ -141,6 +149,19 @@ pub fn parse_tls_mode(val: &regorus::Value) -> TlsMode {
141149
}
142150
}
143151

152+
/// Extract a bool value from a regorus object. Returns `None` when the key
153+
/// is absent or not a boolean.
154+
fn get_object_bool(val: &regorus::Value, key: &str) -> Option<bool> {
155+
let key_val = regorus::Value::String(key.into());
156+
match val {
157+
regorus::Value::Object(map) => match map.get(&key_val) {
158+
Some(regorus::Value::Bool(b)) => Some(*b),
159+
_ => None,
160+
},
161+
_ => None,
162+
}
163+
}
164+
144165
/// Extract a string value from a regorus object.
145166
fn get_object_str(val: &regorus::Value, key: &str) -> Option<String> {
146167
let key_val = regorus::Value::String(key.into());
@@ -746,6 +767,26 @@ mod tests {
746767
assert!(parse_l7_config(&val).is_none());
747768
}
748769

770+
#[test]
771+
fn parse_l7_config_allow_encoded_slash_defaults_false() {
772+
let val = regorus::Value::from_json_str(
773+
r#"{"protocol": "rest", "host": "api.example.com", "port": 443}"#,
774+
)
775+
.unwrap();
776+
let config = parse_l7_config(&val).unwrap();
777+
assert!(!config.allow_encoded_slash);
778+
}
779+
780+
#[test]
781+
fn parse_l7_config_allow_encoded_slash_opt_in() {
782+
let val = regorus::Value::from_json_str(
783+
r#"{"protocol": "rest", "host": "gitlab.example.com", "port": 443, "allow_encoded_slash": true}"#,
784+
)
785+
.unwrap();
786+
let config = parse_l7_config(&val).unwrap();
787+
assert!(config.allow_encoded_slash);
788+
}
789+
749790
#[test]
750791
fn validate_rules_and_access_mutual_exclusion() {
751792
let data = serde_json::json!({

0 commit comments

Comments
 (0)