Commit b82a0ea

PCI: hv: Only reuse existing IRTE allocation for Multi-MSI
Jeffrey's 4 recent patches added Multi-MSI support to the pci-hyperv driver. Unluckily, one of the patches, i.e., b4b7777, causes a regression to a fio test for the Azure VM SKU Standard L64s v2 (64 AMD vCPUs, 8 NVMe drives):

when fio runs against all the 8 NVMe drives, it runs fine with a low io-depth (e.g., 2 or 4); when fio runs with a high io-depth (e.g., 256), somehow queue-29 of each NVMe drive suddenly no longer receives any interrupts, and the NVMe core code has to abort the queue after a timeout of 30 seconds, and then queue-29 starts to receive interrupts again for several seconds, and later queue-29 no longer receives interrupts again, and this pattern repeats:

    [ 223.891249] nvme nvme2: I/O 320 QID 29 timeout, aborting
    [ 223.896231] nvme nvme0: I/O 320 QID 29 timeout, aborting
    [ 223.898340] nvme nvme4: I/O 832 QID 29 timeout, aborting
    [ 259.471309] nvme nvme2: I/O 320 QID 29 timeout, aborting
    [ 259.476493] nvme nvme0: I/O 321 QID 29 timeout, aborting
    [ 259.482967] nvme nvme0: I/O 322 QID 29 timeout, aborting

Some other symptoms are: the throughput of the NVMe drives drops due to commit b4b7777. When the fio test is running, the kernel prints some soft lock-up messages from time to time.

Commit b4b7777 itself looks good, and at the moment it's unclear where the issue is. While the issue is being investigated, restore the old behavior in hv_compose_msi_msg(), i.e., don't reuse the existing IRTE allocation for single-MSI and MSI-X. This is a stopgap for the above NVMe issue.

*** Note ***
As of 11:30 8/9/2022 PDT, the patch has not been accepted into the upstream.

Fixes: b4b7777 ("PCI: hv: Reuse existing IRTE allocation in compose_msi_msg()")
Link: https://lwn.net/ml/linux-kernel/20220804025104.15673-1-decui@microsoft.com/
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Cc: Carl Vanderlip <quic_carlv@quicinc.com>
1 parent e5df656 commit b82a0ea

1 file changed

Lines changed: 19 additions & 4 deletions

drivers/pci/controller/pci-hyperv.c

@@ -1682,6 +1682,7 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	struct compose_comp_ctxt comp;
 	struct tran_int_desc *int_desc;
 	struct msi_desc *msi_desc;
+	bool multi_msi;
 	u8 vector, vector_count;
 	struct {
 		struct pci_packet pci_pkt;
@@ -1695,16 +1696,23 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	u32 size;
 	int ret;
 
-	/* Reuse the previous allocation */
-	if (data->chip_data) {
+	msi_desc = irq_data_get_msi_desc(data);
+	multi_msi = !msi_desc->msi_attrib.is_msix &&
+		    msi_desc->nvec_used > 1;
+	/*
+	 * Reuse the previous allocation for Multi-MSI. This is required for
+	 * Multi-MSI and is optional for single-MSI and MSI-X. Note: for now,
+	 * don't reuse the previous allocation for MSI-X because this causes
+	 * unreliable interrupt delivery for some NVMe devices.
+	 */
+	if (data->chip_data && multi_msi) {
 		int_desc = data->chip_data;
 		msg->address_hi = int_desc->address >> 32;
 		msg->address_lo = int_desc->address & 0xffffffff;
 		msg->data = int_desc->data;
 		return;
 	}
 
-	msi_desc = irq_data_get_msi_desc(data);
 	pdev = msi_desc_to_pci_dev(msi_desc);
 	dest = irq_data_get_effective_affinity_mask(data);
 	pbus = pdev->bus;
@@ -1714,11 +1722,18 @@ static void hv_compose_msi_msg(struct irq_data *data, struct msi_msg *msg)
 	if (!hpdev)
 		goto return_null_message;
 
+	/* Free any previous message that might have already been composed. */
+	if (data->chip_data && !multi_msi) {
+		int_desc = data->chip_data;
+		data->chip_data = NULL;
+		hv_int_desc_free(hpdev, int_desc);
+	}
+
 	int_desc = kzalloc(sizeof(*int_desc), GFP_ATOMIC);
 	if (!int_desc)
 		goto drop_reference;
 
-	if (!msi_desc->msi_attrib.is_msix && msi_desc->nvec_used > 1) {
+	if (multi_msi) {
 		/*
 		 * If this is not the first MSI of Multi MSI, we already have
 		 * a mapping. Can exit early.
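
The decision this patch introduces at the top of hv_compose_msi_msg() can be modelled in plain user-space C. The following is only a minimal sketch, not driver code: `struct model_msi_desc`, `enum compose_action`, `is_multi_msi()` and `choose_action()` are hypothetical names invented for illustration, and the model ignores the hpdev lookup and the "not the first MSI of Multi MSI" early exit that the real function also performs.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two msi_desc fields the patch inspects. */
struct model_msi_desc {
	bool is_msix;           /* models msi_desc->msi_attrib.is_msix */
	unsigned int nvec_used; /* models msi_desc->nvec_used */
};

/* Hypothetical labels for the three paths through the patched function. */
enum compose_action {
	REUSE_EXISTING,     /* return the cached address/data from chip_data */
	FREE_THEN_ALLOCATE, /* free the old int_desc, then compose a new one */
	ALLOCATE_NEW,       /* no cached int_desc; compose a new one */
};

/* Same predicate the patch stores in the new multi_msi local. */
static bool is_multi_msi(const struct model_msi_desc *desc)
{
	return !desc->is_msix && desc->nvec_used > 1;
}

/*
 * has_chip_data models "data->chip_data != NULL". Only Multi-MSI reuses
 * the previous allocation; single-MSI and MSI-X drop it and start over.
 */
static enum compose_action choose_action(const struct model_msi_desc *desc,
					 bool has_chip_data)
{
	if (!has_chip_data)
		return ALLOCATE_NEW;
	return is_multi_msi(desc) ? REUSE_EXISTING : FREE_THEN_ALLOCATE;
}
```

Under this model, an MSI-X vector such as an NVMe queue interrupt that already has a composed message takes the FREE_THEN_ALLOCATE path, which corresponds to the pre-b4b7777 behavior the commit message says it restores.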
