Add cDAC no-fallback test leg to runtime-diagnostics pipeline#126752
Add cDAC no-fallback test leg to runtime-diagnostics pipeline#126752max-charlamb wants to merge 12 commits intomainfrom
Conversation
|
Tagging subscribers to this area: @steveisok, @tommcdon, @dotnet/dotnet-diag |
There was a problem hiding this comment.
Pull request overview
Adds a new “cDAC no-fallback” diagnostics test leg and a corresponding runtime switch so diagnostics tests can run with cDAC without delegating unimplemented APIs to the legacy DAC.
Changes:
Entrypoints.cs: honorsCDAC_NO_FALLBACK=1by passingnullas the legacy DAC implementation toSOSDacImpl.runtime-diagnostics.yml: adds acDAC_no_fallbackjob that runs diagnostics tests withuseCdac: trueandCDAC_NO_FALLBACK=1.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
src/native/managed/cdac/mscordaccore_universal/Entrypoints.cs |
Adds an env-var controlled path to disable legacy DAC fallback in CreateSosInterface. |
eng/pipelines/runtime-diagnostics.yml |
Introduces a third diagnostics test leg to run cDAC tests with fallback disabled. |
| int IXCLRDataProcess.Request(uint reqCode, uint inBufferSize, byte* inBuffer, uint outBufferSize, byte* outBuffer) | ||
| => _legacyProcess is not null ? _legacyProcess.Request(reqCode, inBufferSize, inBuffer, outBufferSize, outBuffer) : HResults.E_NOTIMPL; | ||
| { | ||
| int hr = HResults.S_OK; | ||
| try | ||
| { | ||
| switch (reqCode) | ||
| { | ||
| case (uint)CLRDataGeneralRequest.CLRDATA_REQUEST_REVISION: | ||
| if (inBufferSize != 0 || inBuffer is not null || outBufferSize != sizeof(uint) || outBuffer is null) | ||
| return HResults.E_INVALIDARG; | ||
| *(uint*)outBuffer = 9; | ||
| break; | ||
|
|
||
| default: | ||
| if (_legacyProcess is not null) | ||
| return _legacyProcess.Request(reqCode, inBufferSize, inBuffer, outBufferSize, outBuffer); | ||
| return HResults.E_INVALIDARG; | ||
| } |
There was a problem hiding this comment.
This PR also changes IXCLRDataProcess.Request behavior (adds local handling for CLRDATA_REQUEST_REVISION and changes the no-legacy path from E_NOTIMPL to E_INVALIDARG for other reqCodes), but the PR description only mentions Entrypoints.cs and runtime-diagnostics.yml. Please update the PR description (or add a note) to reflect this additional functional change so reviewers understand why it’s needed for the new test leg.
e9ae9e6 to
d2f2469
Compare
This comment has been minimized.
This comment has been minimized.
d2f2469 to
f6db3bc
Compare
| object wrapped = cw.GetOrCreateObjectForComInstance(legacyImplPtr, CreateObjectFlags.None); | ||
| bool noFallback = Environment.GetEnvironmentVariable("CDAC_NO_FALLBACK") == "1"; | ||
| if (noFallback) | ||
| { | ||
| prevent_release = wrapped; | ||
| } | ||
| else | ||
| { | ||
| *obj = IntPtr.Zero; | ||
| return HResults.COR_E_INVALIDCAST; // E_NOINTERFACE | ||
| legacyObj = wrapped; | ||
| if (legacyObj is not Legacy.IDacDbiInterface) | ||
| { | ||
| *obj = IntPtr.Zero; | ||
| return HResults.COR_E_INVALIDCAST; // E_NOINTERFACE | ||
| } | ||
| } |
There was a problem hiding this comment.
In the no-fallback path, this code no longer validates that the provided COM object actually implements Legacy.IDacDbiInterface. Previously an incompatible legacyImplPtr would be rejected with COR_E_INVALIDCAST; now it will silently proceed, which can hide configuration/interop bugs and still leaves native holding a raw legacy pointer. Consider keeping the interface check regardless of CDAC_NO_FALLBACK (and only disabling delegation by passing null into DacDbiImpl).
| object? legacyImpl = null; | ||
| object? prevent_release = null; | ||
| if (legacyImplPtr != IntPtr.Zero) |
There was a problem hiding this comment.
The local variable name prevent_release uses snake_case, which is inconsistent with the rest of this C# file’s naming (camelCase locals like legacyImplPtr/legacyObj/etc.). Renaming to preventRelease would improve readability and consistency.
| // Prevents the legacy DAC RCW from being released when fallback is disabled. | ||
| // The native CDAC class holds a raw pointer (m_legacyImpl) to the legacy DAC | ||
| // that must remain valid for the lifetime of this object. | ||
| private readonly object? _prevent_release; |
There was a problem hiding this comment.
The new field name _prevent_release uses snake_case, which is inconsistent with the surrounding private fields (_target, _stringMethodTable, _legacyImpl*, etc.). Consider renaming to _preventRelease (and updating the corresponding ctor parameter name) to match established style in this file.
| private readonly object? _prevent_release; | |
| private readonly object? _preventRelease; |
| private readonly Target _target; | ||
| private readonly IDacDbiInterface? _legacy; | ||
| private readonly object? _prevent_release; | ||
|
|
||
| // IStringHolder is a native C++ abstract class (not COM) with a single virtual method: |
There was a problem hiding this comment.
The new field name _prevent_release uses snake_case, which is inconsistent with the rest of the private fields in this type (_target, _legacy). Consider renaming to _preventRelease (and aligning the ctor parameter name) to match the existing naming convention.
| jobParameters: | ||
| name: cDAC_no_fallback | ||
| useCdac: true | ||
| isOfficialBuild: ${{ variables.isOfficialBuild }} | ||
| liveRuntimeDir: $(Build.SourcesDirectory)/artifacts/runtime | ||
| timeoutInMinutes: 360 |
There was a problem hiding this comment.
The new cDAC_no_fallback leg doesn’t specify classFilter: SOS, unlike the existing cDAC and DAC jobs. If the intent is to measure SOS test coverage specifically (per PR description), this job will likely run a broader diagnostics test set than intended and increase runtime/cost. Consider adding the same class filter for consistency.
f6db3bc to
c4cc40d
Compare
When CDAC_NO_FALLBACK=1 is set alongside DOTNET_ENABLE_CDAC=1, the managed cDAC entrypoints (CreateSosInterface, CreateDacDbiInterface) skip wiring up the legacy DAC for fallback but hold an RCW reference to prevent use-after-free of the native CDAC::m_legacyImpl raw pointer. - Add _prevent_release field to SOSDacImpl and DacDbiImpl - Add no-fallback pipeline leg to runtime-diagnostics.yml Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
c4cc40d to
4dc6bf8
Compare
The cDAC does not implement memory enumeration for dump creation. When CDAC_NO_FALLBACK is set, the legacy impl is stored in prevent_release but not used for delegation. This causes GenGetAuxMemory to return E_NOTIMPL during dump creation, resulting in dumps missing CLR auxiliary memory (e.g. ThreadStore). Extract ICLRDataEnumMemoryRegions from prevent_release so dump creation always delegates to the legacy DAC regardless of fallback mode. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
|
||
| // Prevents the legacy DAC RCW from being released when fallback is disabled. | ||
| // The native CDAC class holds a raw pointer (m_legacyImpl) to the legacy DAC | ||
| // that must remain valid for the lifetime of this object. |
There was a problem hiding this comment.
_prevent_release is assigned but never read, which will trigger CA1823 (unused private field). Since this repo sets TreatWarningsAsErrors=true, that warning will fail the build. Either reference the field (e.g., use it for the ICLRDataEnumMemoryRegions cast / call GC.KeepAlive(_prevent_release) once) or suppress CA1823 around this field with a justification comment.
| // that must remain valid for the lifetime of this object. | |
| // that must remain valid for the lifetime of this object. | |
| [System.Diagnostics.CodeAnalysis.SuppressMessage("Performance", "CA1823:Avoid unused private fields", Justification = "This field intentionally keeps the legacy DAC RCW alive for the lifetime of this instance.")] |
| public sealed unsafe partial class DacDbiImpl : IDacDbiInterface | ||
| { | ||
| private readonly Target _target; | ||
| private readonly IDacDbiInterface? _legacy; |
There was a problem hiding this comment.
_prevent_release is assigned but never read, which will trigger CA1823 (unused private field) and fail the build with TreatWarningsAsErrors=true. If this field is only meant to root the RCW, add a deliberate read (e.g., GC.KeepAlive(_prevent_release) once) or suppress CA1823 around the field with a justification.
| private readonly IDacDbiInterface? _legacy; | |
| private readonly IDacDbiInterface? _legacy; | |
| [System.Diagnostics.CodeAnalysis.SuppressMessage("Performance", "CA1823:Avoid unused private fields", Justification = "Intentionally retains a reference to root the RCW for the lifetime of this instance.")] |
The Address field in AuxiliarySymbolInfo was declared as T_POINTER but is read using ReadCodePointerField in the managed cDAC. Fix the declaration to use TYPE(CodePointer) to match the reader, resolving Debug.Assert type mismatch crashes during SOS tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
🤖 Copilot Code Review — PR #126752Note This review was generated by GitHub Copilot using Claude Opus 4.6, with three parallel explore agents for data descriptor type verification, COM lifecycle analysis, and CI pipeline review. Holistic AssessmentMotivation: The PR adds a no-fallback testing mode for the cDAC and fixes four data descriptor type annotations that cause Approach: The approach is sound. Storing the RCW in a field to prevent GC collection is the correct pattern when native code holds a raw pointer to a COM object. The Summary: Detailed Findings✅ Data Descriptor Type Fixes — All four verified correctAll changes in
Each type now matches both the C++ declaration and the managed reader. These fixes resolve ✅ RCW Lifetime Management — Correct patternThe Without this field in no-fallback mode, the GC could collect the RCW (since no ✅ ICLRDataEnumMemoryRegions Carve-Out — Necessary and correctThe cDAC does not implement memory enumeration for dump creation ( ✅ CLRDataCreateInstanceImpl — No treatment neededVerified that ✅ CI Pipeline — Well-structuredThe new
|
Replace the all-or-nothing CDAC_NO_FALLBACK mechanism with a per-method allowlist-based approach using LegacyFallbackHelper.CanFallback(). Previously, CDAC_NO_FALLBACK=1 nulled out all legacy DAC references at construction time, making every delegation-only API return E_NOTIMPL. This made it impossible to allow fallback for specific APIs while testing the rest standalone. Now: - Legacy refs are always kept alive (remove prevent_release pattern) - Each delegation-only call site gates on LegacyFallbackHelper.CanFallback() - In normal mode: CanFallback() returns true immediately (single bool check) - In no-fallback mode: only allowlisted methods can delegate - Initial allowlist: EnumMemoryRegion/EnumMemoryRegionsWrapper (dump creation) This enables granular testing where specific APIs can be selectively allowed to fall back while verifying all other APIs work standalone. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The allowlist had two bugs: 'EnumMemoryRegion' (singular) and 'EnumMemoryRegionsWrapper' (nonexistent) would never match the actual CallerMemberName 'EnumMemoryRegions'. Use nameof() to prevent this class of error. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove redundant null/type checks from CreateDacDbiInterface. The DacDbiImpl constructor handles interface mismatches gracefully via 'as' cast, and LegacyFallbackHelper gates all call sites. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The ICustomQueryInterface.GetInterface method in ClrDataModule delegates IMetaDataImport QIs to the legacy module pointer. Gate this with CanFallback() so no-fallback mode blocks it, allowing PR #127028's managed MetadataReader wrapper to provide metadata instead. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
When CDAC_NO_FALLBACK=1, blocked legacy delegation attempts are now counted per-method in a ConcurrentDictionary. On process exit, a summary is written to ~/cdac_blocked_fallbacks.log showing which APIs were called, how many times, and were denied fallback. This makes it easy to identify which unimplemented APIs are actually exercised during test runs without separate instrumentation. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- Add ICustomQueryInterface.GetInterface to the allowlist (needed until managed MetadataReader wrapper lands in PR #127028) - Replace file-based logging with Console.Error.WriteLine so blocked fallback calls appear directly in test output (captured by ProcessRunner) - Simplify tracking to ConcurrentDictionary<string, bool> with TryAdd - Remove Flush() CanFallback gate (cache management, not data retrieval) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove deduplication — log every call for full visibility. Also log allowed fallback calls so the test output shows the complete picture of legacy DAC usage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add per-method allowlist entries for IXCLRDataModule.GetMethodDefinitionByToken, ISOSDacInterface11.IsTrackedType, and ISOSDacInterface.TraverseLoaderHeap. Add file-level allowlist for DacDbiImpl.cs since the entire DBI interface (122 delegation-only methods) is deferred and not implemented in the cDAC. SOS test results with CDAC_NO_FALLBACK=1: 24 passed, 0 blocked fallbacks. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| // Methods that are allowed to fall back even in no-fallback mode. | ||
| // Use the method name as it appears via [CallerMemberName]. | ||
| private static readonly HashSet<string> s_allowlist = new(StringComparer.Ordinal) | ||
| { | ||
| // Dump creation — the cDAC does not implement memory enumeration. | ||
| nameof(ICLRDataEnumMemoryRegions.EnumMemoryRegions), | ||
|
|
||
| // IMetaDataImport QI — needed until managed MetadataReader wrapper lands (PR #127028). | ||
| nameof(ICustomQueryInterface.GetInterface), |
There was a problem hiding this comment.
The allowlist relies on [CallerMemberName] matching entries like nameof(ICLRDataEnumMemoryRegions.EnumMemoryRegions) ("EnumMemoryRegions"). For explicit interface implementations, CallerMemberName may include the interface-qualified member name (e.g., "ICLRDataEnumMemoryRegions.EnumMemoryRegions"), which would cause allowlisted fallbacks to be incorrectly blocked (breaking dump creation / IMetaDataImport QI in no-fallback mode). Consider normalizing name before lookup (e.g., strip everything up to the last '.') or passing an explicit constant name from call sites.
| if (s_allowlist.Contains(name) || s_fileAllowlist.Contains(Path.GetFileName(file))) | ||
| { | ||
| try | ||
| { | ||
| Console.Error.WriteLine($"[cDAC] Allowed fallback: {name} at {Path.GetFileName(file)}:{line}"); | ||
| } | ||
| catch | ||
| { | ||
| // Best-effort logging — don't crash the debugger process. | ||
| } | ||
|
|
||
| return true; | ||
| } | ||
|
|
||
| try | ||
| { | ||
| Console.Error.WriteLine($"[cDAC] Blocked fallback: {name} at {Path.GetFileName(file)}:{line}"); | ||
| } |
There was a problem hiding this comment.
CanFallback logs to Console.Error on every call in no-fallback mode, including allowed fallbacks. Given how frequently these shims can be hit, this can significantly slow down diagnostics runs and produce very large logs. Consider logging only blocked fallbacks by default (or gating allowed logging behind a separate env var / sampling), and/or avoid calling CanFallback unless a legacy implementation is actually present and delegation would occur.
| { | ||
| ppv = default; | ||
| if (_legacyModulePointer == 0) | ||
| if (!LegacyFallbackHelper.CanFallback() || _legacyModulePointer == 0) |
There was a problem hiding this comment.
LegacyFallbackHelper.CanFallback() is evaluated before checking _legacyModulePointer == 0, which means no-fallback mode will still log/compute fallback decisions even when fallback is impossible (no legacy COM instance). Reorder the condition to check _legacyModulePointer == 0 first so CanFallback() is only invoked when delegation is actually possible.
| if (!LegacyFallbackHelper.CanFallback() || _legacyModulePointer == 0) | |
| if (_legacyModulePointer == 0 || !LegacyFallbackHelper.CanFallback()) |
| ComWrappers cw = new StrategyBasedComWrappers(); | ||
| Target? target = GCHandle.FromIntPtr(handle).Target as Target; |
There was a problem hiding this comment.
CreateDacDbiInterface no longer validates obj or handle before dereferencing/using them. In particular, GCHandle.FromIntPtr(handle).Target can throw if handle is IntPtr.Zero or invalid, and writing *obj = ... will AV if obj is null. Consider restoring the previous argument checks and ensuring *obj is set to null/zero on failure to avoid propagating garbage pointers to native callers.
| ComWrappers cw = new StrategyBasedComWrappers(); | |
| Target? target = GCHandle.FromIntPtr(handle).Target as Target; | |
| if (obj is null) | |
| return -1; | |
| *obj = 0; | |
| if (handle == IntPtr.Zero) | |
| return -1; | |
| ComWrappers cw = new StrategyBasedComWrappers(); | |
| Target? target; | |
| try | |
| { | |
| target = GCHandle.FromIntPtr(handle).Target as Target; | |
| } | |
| catch (InvalidOperationException) | |
| { | |
| return -1; | |
| } |
Note
This PR description was generated with the assistance of GitHub Copilot.
Summary
Replace the binary
CDAC_NO_FALLBACKon/off switch with a granular, per-method allowlist (LegacyFallbackHelper) that controls which delegation-only APIs may fall back to the legacy DAC. This enables selective no-fallback testing — blocking fallback for most APIs while allowing specific APIs that are known to not yet be implemented in the cDAC.All fallback attempts (both allowed and blocked) are logged to stderr with method name, file, and line number for capture by the diagnostics test infrastructure.
Changes
LegacyFallbackHelper.cs — Granular fallback control
New static helper that every delegation-only call site invokes via
CanFallback(). Uses[CallerMemberName],[CallerFilePath], and[CallerLineNumber]to identify the call site.CDAC_NO_FALLBACKunset): Always returnstrue(singleboolcheck,[AggressiveInlining])CDAC_NO_FALLBACK=1): Checks method name against aHashSet<string>allowlist and file name against a file-level allowlistPer-method allowlist:
EnumMemoryRegionsGetInterfaceGetMethodDefinitionByTokenIsTrackedTypeTraverseLoaderHeapFile-level allowlist:
DacDbiImpl.csEntrypoints.cs — Simplified creation
Both
CreateSosInterfaceandCreateDacDbiInterfacenow follow the same pattern: the legacy implementation is always passed through, andLegacyFallbackHelper.CanFallback()at each call site decides whether to delegate. Removedprevent_release,noFallbackenv var check, and null-legacy-ref logic.13 Legacy wrapper files — Instrumented delegation sites
All 296 delegation-only methods across all legacy wrapper files now call
LegacyFallbackHelper.CanFallback():SOSDacImpl.cs(12 methods)SOSDacImpl.IXCLRDataProcess.cs(38 methods,Flush()intentionally excluded — cache management)ClrDataModule.cs(29 methods + IMetaDataImport QI)DacDbiImpl.cs(122 methods)ClrDataTask.cs,ClrDataExceptionState.cs,ClrDataFrame.cs,ClrDataValue.cs,ClrDataTypeInstance.cs,ClrDataMethodInstance.cs,ClrDataStackWalk.cs,ClrDataProcess.csStderr logging
Every fallback attempt is logged to stderr in the format:
The diagnostics test infrastructure (
ProcessRunner) captures stderr and routes it to xunit test output withSTDERROR:prefix, making fallback usage visible in test results.Test Results
With
CDAC_NO_FALLBACK=1and the current allowlist, running the full SOS test suite against a private runtime build:Motivation
The existing cDAC test leg always has the legacy DAC as a fallback, so unimplemented APIs are silently handled. The granular no-fallback mode makes gaps visible per-method, helping track progress toward full cDAC coverage while keeping tests green for known-deferred APIs.