Commit 0107042

drakenclimber authored and johnstultz-work committed
sysrq: Reset the watchdog timers while displaying high-resolution timers
On systems with a large number of CPUs, running sysrq-<q> can cause watchdog timeouts. There are two slow sections of code in the sysrq-<q> path in timer_list.c.

1. print_active_timers() - This function is called by print_cpu() and contains a slow goto loop. On a machine with hundreds of CPUs, this loop took approximately 100ms for the first CPU in a NUMA node. (Subsequent CPUs in the same node ran much quicker.) The total time to print all of the CPUs is ultimately long enough to trigger the soft lockup watchdog.

2. print_tickdevice() - This function outputs a large amount of textual information. This function also took approximately 100ms per CPU.

Since sysrq-<q> is not a performance critical path, there should be no harm in touching the nmi watchdog in both slow sections above. Touching it in just one location was insufficient on systems with hundreds of CPUs as occasional timeouts were still observed during testing.

This issue was observed on an Oracle T7 machine with 128 CPUs, but I anticipate it may affect other systems with similarly large numbers of CPUs.

Signed-off-by: Tom Hromatka <tom.hromatka@oracle.com>
Reviewed-by: Rob Gardner <rob.gardner@oracle.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
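The reasoning in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions: the `sim_*` names are hypothetical and are not kernel APIs, the clock is simulated, and the figures used in the usage note (200 ms of work per CPU, a 20 s timeout) are illustrative rather than measurements from this commit. It shows why a long loop that never resets the watchdog can trip it, while resetting it once per iteration keeps every interval short.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a watchdog; not a kernel interface. */
struct sim_watchdog {
	unsigned long now_ms;        /* simulated clock */
	unsigned long last_touch_ms; /* last time the watchdog was reset */
	unsigned long threshold_ms;  /* assumed timeout */
	bool fired;                  /* set once the timeout is exceeded */
};

/* Stand-in for touch_nmi_watchdog(): restart the timeout window. */
static void sim_touch(struct sim_watchdog *wd)
{
	wd->last_touch_ms = wd->now_ms;
}

/* Advance the simulated clock and check whether the watchdog expires. */
static void sim_work(struct sim_watchdog *wd, unsigned long cost_ms)
{
	wd->now_ms += cost_ms;
	if (wd->now_ms - wd->last_touch_ms > wd->threshold_ms)
		wd->fired = true;
}

/*
 * Simulate dumping timer state for ncpus CPUs, each costing per_cpu_ms.
 * Returns true if the simulated watchdog fired during the dump.
 */
static bool sim_dump_all_cpus(int ncpus, unsigned long per_cpu_ms,
			      unsigned long threshold_ms, bool touch_each_cpu)
{
	struct sim_watchdog wd = { 0, 0, threshold_ms, false };

	assert(ncpus > 0);
	for (int cpu = 0; cpu < ncpus; cpu++) {
		if (touch_each_cpu)
			sim_touch(&wd);
		sim_work(&wd, per_cpu_ms);
	}
	return wd.fired;
}
```

With the assumed figures, `sim_dump_all_cpus(128, 200, 20000, false)` accumulates 25.6 s without a reset and reports a timeout, whereas the same call with `touch_each_cpu` set to `true` never lets more than 200 ms elapse between resets.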
1 parent 1b8955b commit 0107042

1 file changed

Lines changed: 6 additions & 0 deletions

kernel/time/timer_list.c

@@ -16,6 +16,7 @@
 #include <linux/sched.h>
 #include <linux/seq_file.h>
 #include <linux/kallsyms.h>
+#include <linux/nmi.h>
 
 #include <linux/uaccess.h>
 
@@ -86,6 +87,9 @@ print_active_timers(struct seq_file *m, struct hrtimer_clock_base *base,
 
 next_one:
 	i = 0;
+
+	touch_nmi_watchdog();
+
 	raw_spin_lock_irqsave(&base->cpu_base->lock, flags);
 
 	curr = timerqueue_getnext(&base->active);
@@ -197,6 +201,8 @@ print_tickdevice(struct seq_file *m, struct tick_device *td, int cpu)
 {
 	struct clock_event_device *dev = td->evtdev;
 
+	touch_nmi_watchdog();
+
 	SEQ_printf(m, "Tick Device: mode: %d\n", td->mode);
 	if (cpu < 0)
 		SEQ_printf(m, "Broadcast device\n");
