Conversation
Warning: Rate limit exceeded
Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 13 minutes and 10 seconds.
⌛ How to resolve this issue?
After the wait time has elapsed, a review can be triggered using the … We recommend that you space out your commits to avoid hitting the rate limit.
🚦 How do rate limits work?
CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: …
📒 Files selected for processing (2)
📝 Walkthrough
Walkthrough: Adds a new Kubernetes Pipeline resource …
Changes
Sequence Diagram
```mermaid
sequenceDiagram
    participant Handler as API Handler
    participant Calc as Capacity Calculator
    participant K8sAPI as Kubernetes API
    participant Scheduler as Scheduler Service
    participant Report as Capacity Report
    Handler->>Calc: NewCapacityCalculator(client, config)
    Handler->>Calc: CalculateCapacity(ctx)
    Calc->>K8sAPI: List HostDetailsKnowledge CRDs
    K8sAPI-->>Calc: Host→AZ mappings
    Calc->>K8sAPI: List Hypervisor CRDs
    K8sAPI-->>Calc: Hypervisor capacity data
    Note over Calc: deduplicate & sort AZs
    loop per AZ
        Calc->>Scheduler: Query eligible hosts (pipeline=config.ReportCapacityTotalPipeline, AZ)
        Scheduler-->>Calc: Host list
        Note over Calc: map hosts → Hypervisors, read EffectiveCapacity/Capacity
        Note over Calc: compute floor(EffectiveMemory / smallestFlavorMemory)
    end
    Calc->>Report: Build per-AZ capacity report
    Report-->>Handler: ServiceCapacityReport
```
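The per-AZ arithmetic in the diagram — floor of effective memory over the smallest flavor's memory — can be sketched as a small helper. The function name and plain `uint64` signature here are illustrative, not the PR's actual API:

```go
package main

import "fmt"

// instanceCapacity estimates how many instances of the smallest flavor fit
// into the effective memory reported for an AZ's eligible hosts.
// Unsigned integer division already yields the floor.
func instanceCapacity(effectiveMemoryMB, smallestFlavorMemoryMB uint64) uint64 {
	if smallestFlavorMemoryMB == 0 {
		return 0 // guard against division by zero for misconfigured flavors
	}
	return effectiveMemoryMB / smallestFlavorMemoryMB
}

func main() {
	// e.g. 256 GiB of effective memory, smallest flavor needs 16 GiB
	fmt.Println(instanceCapacity(262144, 16384)) // 16
}
```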
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~50 minutes
Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 1 | ❌ 2
❌ Failed checks (2 warnings)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.
✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
…nt64 casts, combine return types
…t-report-capacity
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
internal/scheduling/reservations/commitments/config.go (1)
69-97:⚠️ Potential issue | 🟠 MajorMissing default application for
ReportCapacityTotalPipeline.The
ApplyDefaults()method does not set a default for the newReportCapacityTotalPipelinefield. WhileDefaultConfig()defines it as"kvm-report-capacity", configurations that don't specify this field will use an empty string, which would cause scheduler calls to fail.🐛 Proposed fix
if c.ChangeAPIWatchReservationsPollInterval == 0 { c.ChangeAPIWatchReservationsPollInterval = defaults.ChangeAPIWatchReservationsPollInterval } + if c.ReportCapacityTotalPipeline == "" { + c.ReportCapacityTotalPipeline = defaults.ReportCapacityTotalPipeline + } // Note: EnableChangeCommitmentsAPI, EnableReportUsageAPI, EnableReportCapacityAPI // are booleans where false is a valid value, so we don't apply defaults for them }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/config.go` around lines 69 - 97, ApplyDefaults on Config is missing setting a default for the new ReportCapacityTotalPipeline field; add a check like the other string fields so that if c.ReportCapacityTotalPipeline == "" it gets assigned DefaultConfig().ReportCapacityTotalPipeline. Update the ApplyDefaults method (referencing Config.ApplyDefaults and DefaultConfig) to set ReportCapacityTotalPipeline alongside PipelineDefault and SchedulerURL to ensure scheduler calls receive the default "kvm-report-capacity" when unspecified.
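The defaults pattern this comment asks for can be sketched in isolation. The `Config` struct below is a simplified stand-in for the actual `commitments.Config` (only two fields, and the scheduler URL default is made up for illustration); the point is that `ApplyDefaults` must fill every empty string field, including the new one:

```go
package main

import "fmt"

// Config is a simplified stand-in for the reviewed commitments.Config.
type Config struct {
	SchedulerURL                string
	ReportCapacityTotalPipeline string
}

// DefaultConfig mirrors the review: the pipeline default is "kvm-report-capacity".
// The scheduler URL value here is a placeholder, not the project's real default.
func DefaultConfig() Config {
	return Config{
		SchedulerURL:                "http://example-scheduler",
		ReportCapacityTotalPipeline: "kvm-report-capacity",
	}
}

// ApplyDefaults fills only unset (empty-string) fields. This is why the
// missing check flagged above matters: without it, an unspecified
// ReportCapacityTotalPipeline stays "" and scheduler calls fail.
func (c *Config) ApplyDefaults() {
	defaults := DefaultConfig()
	if c.SchedulerURL == "" {
		c.SchedulerURL = defaults.SchedulerURL
	}
	if c.ReportCapacityTotalPipeline == "" {
		c.ReportCapacityTotalPipeline = defaults.ReportCapacityTotalPipeline
	}
}

func main() {
	c := Config{SchedulerURL: "http://custom"}
	c.ApplyDefaults()
	fmt.Println(c.SchedulerURL, c.ReportCapacityTotalPipeline) // http://custom kvm-report-capacity
}
```

Explicitly set values survive; only empty fields pick up the defaults.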
🧹 Nitpick comments (5)
helm/bundles/cortex-nova/templates/pipelines_kvm.yaml (1)
577-594: Consider adding `filter_capabilities` for flavor-accurate capacity.
Other scheduling pipelines (e.g., `kvm-general-purpose-load-balancing`, `kvm-hana-bin-packing`) include `filter_capabilities` to ensure hosts meet compute capabilities requested by flavor extra specs. Without this filter, hosts that cannot actually run certain flavors (due to architecture or capability mismatches) may be incorrectly included in capacity calculations.
♻️ Consider adding filter_capabilities
```diff
       - name: filter_status_conditions
         description: |
           Excludes hosts that are not ready or are disabled.
+      - name: filter_capabilities
+        description: |
+          Filters hosts that do not meet the compute capabilities
+          requested by the nova flavor extra specs.
     weighers: []
 {{- end }}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@helm/bundles/cortex-nova/templates/pipelines_kvm.yaml` around lines 577 - 594, The pipeline is missing the filter_capabilities filter which enforces flavor extra-spec compute capabilities; add a filter entry named filter_capabilities into the filters list (e.g., immediately after filter_has_requested_traits or before filter_has_enough_capacity) so capacity calculations honor architecture/capability constraints; ensure the new entry uses the same description style and any needed params to consult flavor extra_specs (match how other pipelines like kvm-general-purpose-load-balancing declare filter_capabilities) so hosts without required capabilities are excluded.

internal/scheduling/reservations/commitments/api_report_capacity_test.go (1)
325-326: Minor: Variable names shadow the `hv1` package import.
The variables `hv1Obj` and `hv2Obj` may cause confusion since `hv1` is also an imported package alias. Consider more descriptive names.
♻️ Suggested rename
```diff
-	hv1Obj := createTestHypervisor("host-1", "256Gi", "128Gi")
-	hv2Obj := createTestHypervisor("host-2", "128Gi", "0")
+	hvHost1 := createTestHypervisor("host-1", "256Gi", "128Gi")
+	hvHost2 := createTestHypervisor("host-2", "128Gi", "0")
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_report_capacity_test.go` around lines 325 - 326, The test uses variable names hv1Obj and hv2Obj which shadow/confuse the imported package alias hv1; rename these local variables to more descriptive, non-conflicting identifiers (e.g., host1HV, host2HV or hypervisorHost1, hypervisorHost2) where they are created by createTestHypervisor and used thereafter in api_report_capacity_test.go so references to hv1 (the package) remain unambiguous.

internal/scheduling/reservations/commitments/capacity.go (3)
133-147: Silent failure may hide scheduler issues.
When the scheduler call fails for an AZ, the code returns capacity=0 without logging the error. This could mask configuration or connectivity issues in production. Consider logging the error before continuing.
♻️ Proposed fix to log scheduler errors
```diff
 for _, az := range azs {
 	capacity, err := c.calculateInstanceCapacity(ctx, groupName, groupData, az, hvByName)
 	if err != nil {
-		// On failure, report az with capacity=0 rather than aborting entirely.
+		// On failure, log and report az with capacity=0 rather than aborting entirely.
+		// Using context logger if available, or consider adding logger to CapacityCalculator.
 		result[liquid.AvailabilityZone(az)] = &liquid.AZResourceCapacityReport{
 			Capacity: 0,
 			Usage:    None[uint64](),
```
Note: You may want to add a logger to the `CapacityCalculator` struct to properly log this error.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/capacity.go` around lines 133 - 147, The loop over azs silences errors from calculateInstanceCapacity by setting Capacity=0 without logging; modify CapacityCalculator to include a logger (or use an existing logger field), and in the az iteration where err != nil log the error with context (include groupName, az, and err) before assigning the AZResourceCapacityReport and continuing; update CapacityCalculator's constructor/usage sites as needed to supply the logger so calculateInstanceCapacity failures are observable in logs.
219-248: Consider filtering by scheduling domain for host details.
The `getHostAZMap` function only filters by extractor name (`sap_host_details_extractor`) but doesn't verify that the scheduling domain matches Nova. This could potentially pick up host details from other scheduling domains if they exist.
♻️ Add scheduling domain check
```diff
 for _, knowledge := range knowledgeList.Items {
+	if knowledge.Spec.SchedulingDomain != v1alpha1.SchedulingDomainNova {
+		continue
+	}
 	if knowledge.Spec.Extractor.Name != "sap_host_details_extractor" {
 		continue
 	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/capacity.go` around lines 219 - 248, getHostAZMap currently only filters Knowledge CRDs by knowledge.Spec.Extractor.Name == "sap_host_details_extractor" and may pull host details from other scheduling domains; update the loop in getHostAZMap to also check knowledge.Spec.SchedulingDomain (e.g., equals "nova") before unboxing features so only Nova scheduling-domain entries are used, keeping the existing hostAZEntry parsing and hostAZMap population logic intact.
172-185: Consider using a more stable instance UUID for capacity checks.
The `InstanceUUID` includes `time.Now().UnixNano()`, which could cause issues if capacity is calculated rapidly (e.g., duplicate UUIDs within the same nanosecond). Since this is only used for capacity queries (not actual reservations), consider using a UUID generator or a deterministic identifier.
♻️ Alternative using UUID
```diff
+	"github.com/google/uuid"
 ...
 resp, err := c.schedulerClient.ScheduleReservation(ctx, reservations.ScheduleReservationRequest{
-	InstanceUUID: fmt.Sprintf("capacity-%s-%s-%d", groupName, az, time.Now().UnixNano()),
+	InstanceUUID: fmt.Sprintf("capacity-%s", uuid.New().String()),
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/capacity.go` around lines 172 - 185, The InstanceUUID for ScheduleReservationRequest in c.schedulerClient.ScheduleReservation currently uses time.Now().UnixNano(), which is unstable for rapid capacity checks; replace that timestamp with a proper UUID or deterministic identifier (e.g., generate a uuid with your project’s UUID lib or a stable hash of groupName+az+pipeline) when building the InstanceUUID field so each capacity query is unique and collision-free; update the construction inside the ScheduleReservationRequest (refer to InstanceUUID and ScheduleReservationRequest in capacity.go) to use the UUID generator and ensure imports and error handling for the generator are added accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/scheduling/reservations/commitments/config.go`:
- Around line 69-97: ApplyDefaults on Config is missing setting a default for
the new ReportCapacityTotalPipeline field; add a check like the other string
fields so that if c.ReportCapacityTotalPipeline == "" it gets assigned
DefaultConfig().ReportCapacityTotalPipeline. Update the ApplyDefaults method
(referencing Config.ApplyDefaults and DefaultConfig) to set
ReportCapacityTotalPipeline alongside PipelineDefault and SchedulerURL to ensure
scheduler calls receive the default "kvm-report-capacity" when unspecified.
---
Nitpick comments:
In `@helm/bundles/cortex-nova/templates/pipelines_kvm.yaml`:
- Around line 577-594: The pipeline is missing the filter_capabilities filter
which enforces flavor extra-spec compute capabilities; add a filter entry named
filter_capabilities into the filters list (e.g., immediately after
filter_has_requested_traits or before filter_has_enough_capacity) so capacity
calculations honor architecture/capability constraints; ensure the new entry
uses the same description style and any needed params to consult flavor
extra_specs (match how other pipelines like kvm-general-purpose-load-balancing
declare filter_capabilities) so hosts without required capabilities are
excluded.
In `@internal/scheduling/reservations/commitments/api_report_capacity_test.go`:
- Around line 325-326: The test uses variable names hv1Obj and hv2Obj which
shadow/confuse the imported package alias hv1; rename these local variables to
more descriptive, non-conflicting identifiers (e.g., host1HV, host2HV or
hypervisorHost1, hypervisorHost2) where they are created by createTestHypervisor
and used thereafter in api_report_capacity_test.go so references to hv1 (the
package) remain unambiguous.
In `@internal/scheduling/reservations/commitments/capacity.go`:
- Around line 133-147: The loop over azs silences errors from
calculateInstanceCapacity by setting Capacity=0 without logging; modify
CapacityCalculator to include a logger (or use an existing logger field), and in
the az iteration where err != nil log the error with context (include groupName,
az, and err) before assigning the AZResourceCapacityReport and continuing;
update CapacityCalculator's constructor/usage sites as needed to supply the
logger so calculateInstanceCapacity failures are observable in logs.
- Around line 219-248: getHostAZMap currently only filters Knowledge CRDs by
knowledge.Spec.Extractor.Name == "sap_host_details_extractor" and may pull host
details from other scheduling domains; update the loop in getHostAZMap to also
check knowledge.Spec.SchedulingDomain (e.g., equals "nova") before unboxing
features so only Nova scheduling-domain entries are used, keeping the existing
hostAZEntry parsing and hostAZMap population logic intact.
- Around line 172-185: The InstanceUUID for ScheduleReservationRequest in
c.schedulerClient.ScheduleReservation currently uses time.Now().UnixNano(),
which is unstable for rapid capacity checks; replace that timestamp with a
proper UUID or deterministic identifier (e.g., generate a uuid with your
project’s UUID lib or a stable hash of groupName+az+pipeline) when building the
InstanceUUID field so each capacity query is unique and collision-free; update
the construction inside the ScheduleReservationRequest (refer to InstanceUUID
and ScheduleReservationRequest in capacity.go) to use the UUID generator and
ensure imports and error handling for the generator are added accordingly.
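The collision concern behind the UUID suggestion can be illustrated without any third-party dependency. The PR itself reaches for `github.com/google/uuid`; this stdlib-only sketch (with a hypothetical `newCapacityCheckID` helper) shows the same idea — 128 random bits make same-nanosecond collisions, the weakness of `UnixNano`-based IDs, a non-issue:

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newCapacityCheckID builds a random, collision-resistant identifier for
// throwaway capacity-check requests. 16 random bytes give 128 bits of
// entropy, so two rapid calls virtually never produce the same ID,
// unlike a time.Now().UnixNano()-based name.
func newCapacityCheckID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	// %x on a byte array prints each byte as two hex digits.
	return fmt.Sprintf("capacity-%x", b), nil
}

func main() {
	id1, _ := newCapacityCheckID()
	id2, _ := newCapacityCheckID()
	fmt.Println(id1 != id2) // true
}
```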
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 6b45a729-7995-4218-9095-06ea2ee4dcc0
📒 Files selected for processing (7)
helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
helm/bundles/cortex-nova/values.yaml
internal/scheduling/nova/plugins/filters/filter_has_enough_capacity.go
internal/scheduling/reservations/commitments/api_report_capacity.go
internal/scheduling/reservations/commitments/api_report_capacity_test.go
internal/scheduling/reservations/commitments/capacity.go
internal/scheduling/reservations/commitments/config.go
```go
func (c *CapacityCalculator) getHostAZMap(ctx context.Context) (map[string]string, error) {
	var knowledgeList v1alpha1.KnowledgeList
	if err := c.client.List(ctx, &knowledgeList); err != nil {
		return nil, fmt.Errorf("failed to list Knowledge CRDs: %w", err)
	}

	type hostAZEntry struct {
		ComputeHost      string `json:"ComputeHost"`
		AvailabilityZone string `json:"AvailabilityZone"`
	}

	hostAZMap := make(map[string]string)
	for _, knowledge := range knowledgeList.Items {
		if knowledge.Spec.SchedulingDomain != v1alpha1.SchedulingDomainNova {
			continue
		}
		if knowledge.Spec.Extractor.Name != "sap_host_details_extractor" {
			continue
		}
		features, err := v1alpha1.UnboxFeatureList[hostAZEntry](knowledge.Status.Raw)
		if err != nil {
			continue
		}
		for _, feature := range features {
			if feature.ComputeHost != "" && feature.AvailabilityZone != "" {
				hostAZMap[feature.ComputeHost] = feature.AvailabilityZone
			}
		}
	}

	return hostAZMap, nil
}

func (c *CapacityCalculator) getAvailabilityZones(ctx context.Context) ([]string, error) {
	hostAZMap, err := c.getHostAZMap(ctx)
	if err != nil {
		return nil, err
	}

	azSet := make(map[string]struct{})
	for _, az := range hostAZMap {
		azSet[az] = struct{}{}
	}

	azs := make([]string, 0, len(azSet))
	for az := range azSet {
		azs = append(azs, az)
	}
	sort.Strings(azs)

	return azs, nil
}
```
Some comments on that:
- Why do we first create a host-to-AZ map that we then reduce to an AZ set in another function? Wouldn't it make much more sense to have that in one function? Or do we need that map somewhere else?
- Not very obvious by the name (will change that), but the Host Details Knowledge was basically my approach to have a VMware hypervisor CRD. All the hosts inside this knowledge refer only to VMware hosts! Since this feature is intended to be used for KVM, I would suggest that we use the Hypervisor CRD for this. The availability zone there is available via a label (`topology.kubernetes.io/zone`).
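Both points above can be addressed in one pass: derive the deduplicated, sorted AZ set directly from Hypervisor labels without a separate host-to-AZ map. This sketch uses a plain `map[string]map[string]string` as a stand-in for iterating `hv1.HypervisorList` items and reading their `ObjectMeta.Labels`:

```go
package main

import (
	"fmt"
	"sort"
)

// zoneLabel is the standard Kubernetes topology label the reviewer mentions.
const zoneLabel = "topology.kubernetes.io/zone"

// availabilityZonesFromLabels collapses per-host label metadata into a
// sorted, deduplicated AZ list in a single pass. The input is a stand-in
// for ranging over hv1.HypervisorList and reading each item's labels.
func availabilityZonesFromLabels(hostLabels map[string]map[string]string) []string {
	azSet := make(map[string]struct{})
	for _, labels := range hostLabels {
		if az, ok := labels[zoneLabel]; ok && az != "" {
			azSet[az] = struct{}{}
		}
	}
	azs := make([]string, 0, len(azSet))
	for az := range azSet {
		azs = append(azs, az)
	}
	sort.Strings(azs) // deterministic ordering for stable reports
	return azs
}

func main() {
	hosts := map[string]map[string]string{
		"host-1": {zoneLabel: "az-b"},
		"host-2": {zoneLabel: "az-a"},
		"host-3": {zoneLabel: "az-a"}, // duplicate zone is deduplicated
		"host-4": {},                  // unlabeled host is skipped
	}
	fmt.Println(availabilityZonesFromLabels(hosts)) // [az-a az-b]
}
```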
Does this look better to you now?
```go
// All flavor groups are included, not just those with fixed RAM/core ratio.
// The request provides the list of all AZs from Limes that must be included in the report.
func (c *CapacityCalculator) CalculateCapacity(ctx context.Context, req liquid.ServiceCapacityRequest) (liquid.ServiceCapacityReport, error) {
	// AZs are derived from HostDetails Knowledge CRDs.
```
See comment about the host details knowledge
```go
resp, err := c.schedulerClient.ScheduleReservation(ctx, reservations.ScheduleReservationRequest{
	InstanceUUID:     uuid.New().String(),
	ProjectID:        "cortex-capacity-check",
	FlavorName:       smallestFlavor.Name,
	MemoryMB:         smallestFlavor.MemoryMB,
	VCPUs:            smallestFlavor.VCPUs,
	FlavorExtraSpecs: map[string]string{"hw_version": groupName},
	AvailabilityZone: az,
	Pipeline:         c.totalPipeline,
})
if err != nil {
	return 0, fmt.Errorf("scheduler call failed: %w", err)
}
```
Let's have another iteration about this in the tech sync. It feels very wrong to call our external scheduler API for something like that. We should be able to use the pipeline directly here.
Decision controller + decision crd? ;)
```yaml
apiVersion: cortex.cloud/v1alpha1
kind: Decision
metadata:
  name: reservation-candidates-<uuid>
spec:
  constraints:
    affinityRules:
      # Reservation on same host as reference vm?
      # (for failover)
      - avoid: {vm: {uuid: "<vmuuid>"}}
  flavor:
    name: "hana-small"
    extraSpecs: {} # ...
  resources:
    memory: "1Gi"
    cpu: "128"
# After reconciliation by decision controller
status:
  candidates: ["host-1", "host-2", "host-3"]
  conditions:
    - type: "Ready"
      status: "True"
```
Can you add a description of your changes to the PR?
Actionable comments posted: 3
♻️ Duplicate comments (1)
internal/scheduling/reservations/commitments/capacity.go (1)
55-59: ⚠️ Potential issue | 🟠 Major
Use Hypervisor CRDs as AZ source-of-truth, not HostDetails Knowledge.
Line 55 and Lines 221-273 still couple AZ discovery to `sap_host_details_extractor`. For this KVM path, that can drift or go empty independently of hypervisors and produce incomplete `PerAZ` output. You already fetch `hv1.HypervisorList` on Line 62; derive AZs from Hypervisor labels in the same pass and drop the HostDetails dependency here.
```shell
#!/bin/bash
# Verify current AZ derivation path and whether Hypervisor AZ labels are used in this file.
rg -n "getAvailabilityZones|getHostAZMap|sap_host_details_extractor|topology.kubernetes.io/zone|HypervisorList" internal/scheduling/reservations/commitments/capacity.go
```
Also applies to: 221-273
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/capacity.go` around lines 55 - 59, Replace the HostDetails-based AZ discovery with hypervisor-label-derived AZs: remove the call to getAvailabilityZones and any usage of sap_host_details_extractor in capacity.go and instead iterate the existing hv1.HypervisorList (the same pass that fetches hypervisors) to build the AZ set from labels like topology.kubernetes.io/zone (or the agreed Hypervisor label key), then use that AZ map for PerAZ output and anywhere currently relying on getHostAZMap; ensure all code paths that previously used HostDetails (including the block around lines 221-273) consume this new hypervisor-derived AZ map.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/capacity.go`:
- Around line 133-143: The loop over azs currently swallows AZ-level errors from
c.calculateInstanceCapacity and writes zero-capacity reports for each failure;
change this so that you still log and write a zero-capacity report for
individual AZ failures, but if every AZ call fails (i.e., all iterations
returned an error) the parent function returns an error instead of returning a
successful report of all zeros. Implement this by tracking success/failure
counts inside the loop that calls c.calculateInstanceCapacity (referenced
symbol: c.calculateInstanceCapacity), preserving the existing
result[liquid.AvailabilityZone(az)] = &liquid.AZResourceCapacityReport{...}
behavior for single AZ errors, and after the loop check if failures == len(azs)
and return a propagated error (with context) rather than the result map; keep
partial-zero behavior only when at least one AZ succeeded.
- Line 225: The error message in the fmt.Errorf call currently uses "Knowledge"
with an uppercase K; change the string to all-lowercase (e.g., "failed to list
knowledge CRDs: %w") to satisfy lint rules — update the fmt.Errorf invocation in
capacity.go (the return statement that reads fmt.Errorf("failed to list
Knowledge CRDs: %w", err)) so the word "knowledge" is lowercase.
- Around line 66-69: The capacity calculation silently skips scheduler hosts
that don't match Hypervisor CRD names in the hvByName map; update the lookup
block where hvByName[host] is used to log a warning when a host is not found,
including the host identifier, availability zone (az) and flavor group context
(e.g., flavorGroup variable) so operators can detect naming/configuration
mismatches; use the package logger (or existing logger variable) to emit a
structured message with host, az, flavorGroup and any relevant hypervisor list
context before continuing.
---
Duplicate comments:
In `@internal/scheduling/reservations/commitments/capacity.go`:
- Around line 55-59: Replace the HostDetails-based AZ discovery with
hypervisor-label-derived AZs: remove the call to getAvailabilityZones and any
usage of sap_host_details_extractor in capacity.go and instead iterate the
existing hv1.HypervisorList (the same pass that fetches hypervisors) to build
the AZ set from labels like topology.kubernetes.io/zone (or the agreed
Hypervisor label key), then use that AZ map for PerAZ output and anywhere
currently relying on getHostAZMap; ensure all code paths that previously used
HostDetails (including the block around lines 221-273) consume this new
hypervisor-derived AZ map.
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 91643779-45a4-40ec-b2aa-eafec87980ac
📒 Files selected for processing (4)
helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
internal/scheduling/reservations/commitments/api_report_capacity_test.go
internal/scheduling/reservations/commitments/capacity.go
internal/scheduling/reservations/commitments/config.go
🚧 Files skipped from review as they are similar to previous changes (3)
- helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
- internal/scheduling/reservations/commitments/config.go
- internal/scheduling/reservations/commitments/api_report_capacity_test.go
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.
Fixes Applied Successfully
Fixed 1 file(s) based on 3 unresolved review comments. Files modified: …
Commit: … The changes have been pushed to the …
Time taken: …
Fixed 1 file(s) based on 3 unresolved review comments. Co-authored-by: CodeRabbit <noreply@coderabbit.ai>
Test Coverage Report
Test Coverage 📊: 68.9%
No description provided.