Commit 03cbc73

Wanpeng Li authored and Ingo Molnar committed
sched/cputime: Resync steal time when guest & host lose sync
Commit:

  5743021 ("sched/cputime: Count actually elapsed irq & softirq time")

... fixed a bug but also triggered a regression:

On an i5 laptop, 4 pCPUs, 4vCPUs for one full dynticks guest, there are four CPU hog processes(for loop) running in the guest, I hot-unplug the pCPUs on host one by one until there is only one left, then observe CPU utilization via 'top' in the guest, it shows:

  100% st for cpu0(housekeeping)
   75% st for other CPUs (nohz full mode)

However, w/o this commit it shows the correct 75% for all four CPUs.

When a guest is interrupted for a longer amount of time, missed clock ticks are not redelivered later. Because of that, we should not limit the amount of steal time accounted to the amount of time that the calling functions think have passed.

However, the interval returned by account_other_time() is NOT rounded down to the nearest jiffy, while the base interval in get_vtime_delta() it is subtracted from is, so the max cputime limit is required to avoid underflow.

This patch fixes the regression by limiting the account_other_time() from get_vtime_delta() to avoid underflow, and lets the other three call sites (in account_other_time() and steal_account_process_time()) account however much steal time the host told us elapsed.

Suggested-by: Rik van Riel <riel@redhat.com>
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krcmar <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Link: http://lkml.kernel.org/r/1471399546-4069-1-git-send-email-wanpeng.li@hotmail.com
[ Improved the changelog. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
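The regression described in the changelog can be illustrated with a toy model. The sketch below is not kernel code: the struct, the function names, and the 30-jiffies-stolen / 10-jiffies-running pattern are all hypothetical, and time is counted in whole jiffies. It assumes that no ticks are delivered while the vCPU is descheduled and that each delivered tick first accounts pending steal time up to a caller-supplied limit.

```c
#include <stdint.h>

/* Hypothetical model of a guest CPU; all times are in jiffies. */
struct tick_model {
	uint64_t steal_clock;     /* total steal time reported by the host */
	uint64_t steal_accounted; /* part of steal_clock already accounted */
	uint64_t steal_total;     /* what the guest reports as "st" */
	uint64_t user_total;      /* what the guest reports as "us" */
};

/*
 * One delivered tick: account pending steal time, capped at maxtime.
 * If that covers the whole one-jiffy tick, nothing is left for user time.
 */
static void model_tick(struct tick_model *m, uint64_t maxtime)
{
	uint64_t pending = m->steal_clock - m->steal_accounted;
	uint64_t steal = pending < maxtime ? pending : maxtime;

	m->steal_accounted += steal;
	m->steal_total += steal;
	if (steal >= 1)
		return;
	m->user_total += 1;
}

/*
 * Repeat `rounds` cycles: the vCPU is descheduled for `stolen` jiffies
 * (those ticks are lost), then runs and receives `run` ticks.
 * Returns the steal percentage the guest would report.
 */
static unsigned model_steal_percent(uint64_t maxtime, unsigned rounds,
				    uint64_t stolen, uint64_t run)
{
	struct tick_model m = {0};

	for (unsigned r = 0; r < rounds; r++) {
		m.steal_clock += stolen;
		for (uint64_t t = 0; t < run; t++)
			model_tick(&m, maxtime);
	}
	return (unsigned)(100 * m.steal_total /
			  (m.steal_total + m.user_total));
}
```

With the limit set to one jiffy, the backlog of steal time never drains, so every delivered tick is consumed by steal and the model reports 100% st. With the limit set to UINT64_MAX, the first tick after each gap absorbs the whole backlog and the remaining ticks go to user time, which brings the reported figure close to the true 75%.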
1 parent 173be9a commit 03cbc73

1 file changed: +15 -3 lines changed

kernel/sched/cputime.c

Lines changed: 15 additions & 3 deletions
@@ -263,6 +263,11 @@ void account_idle_time(cputime_t cputime)
 		cpustat[CPUTIME_IDLE] += (__force u64) cputime;
 }

+/*
+ * When a guest is interrupted for a longer amount of time, missed clock
+ * ticks are not redelivered later. Due to that, this function may on
+ * occasion account more time than the calling functions think elapsed.
+ */
 static __always_inline cputime_t steal_account_process_time(cputime_t maxtime)
 {
 #ifdef CONFIG_PARAVIRT
@@ -371,7 +376,7 @@ static void irqtime_account_process_tick(struct task_struct *p, int user_tick,
 	 * idle, or potentially user or system time. Due to rounding,
 	 * other time can exceed ticks occasionally.
 	 */
-	other = account_other_time(cputime);
+	other = account_other_time(ULONG_MAX);
 	if (other >= cputime)
 		return;
 	cputime -= other;
@@ -486,7 +491,7 @@ void account_process_tick(struct task_struct *p, int user_tick)
 	}

 	cputime = cputime_one_jiffy;
-	steal = steal_account_process_time(cputime);
+	steal = steal_account_process_time(ULONG_MAX);

 	if (steal >= cputime)
 		return;
@@ -516,7 +521,7 @@ void account_idle_ticks(unsigned long ticks)
 	}

 	cputime = jiffies_to_cputime(ticks);
-	steal = steal_account_process_time(cputime);
+	steal = steal_account_process_time(ULONG_MAX);

 	if (steal >= cputime)
 		return;
@@ -699,6 +704,13 @@ static cputime_t get_vtime_delta(struct task_struct *tsk)
 	unsigned long now = READ_ONCE(jiffies);
 	cputime_t delta, other;

+	/*
+	 * Unlike tick based timing, vtime based timing never has lost
+	 * ticks, and no need for steal time accounting to make up for
+	 * lost ticks. Vtime accounts a rounded version of actual
+	 * elapsed time. Limit account_other_time to prevent rounding
+	 * errors from causing elapsed vtime to go negative.
+	 */
 	delta = jiffies_to_cputime(now - tsk->vtime_snap);
 	other = account_other_time(delta);
 	WARN_ON_ONCE(tsk->vtime_snap_whence == VTIME_INACTIVE);
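The reason the limit stays in place for get_vtime_delta() can be shown with a small standalone sketch. It is an illustration under stated assumptions, not the kernel implementation: take_other_time() is a hypothetical stand-in for account_other_time(), times are plain nanosecond counters, and the sketch assumes the caller subtracts the returned amount from the jiffy-rounded delta using unsigned arithmetic.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for account_other_time(): consume up to max_ns
 * of the pending "other" time (steal, irq, softirq) and return the
 * amount consumed. The remainder stays pending for a later call.
 */
static uint64_t take_other_time(uint64_t *pending_ns, uint64_t max_ns)
{
	uint64_t taken = *pending_ns < max_ns ? *pending_ns : max_ns;

	*pending_ns -= taken;
	return taken;
}

/*
 * Sketch of the vtime delta computation. delta is a whole number of
 * jiffies, while the pending other time is not rounded, so without the
 * limit "other" could exceed delta and the unsigned subtraction below
 * would wrap around to a huge value.
 */
static uint64_t vtime_delta_sketch(uint64_t elapsed_jiffies,
				   uint64_t ns_per_jiffy,
				   uint64_t *pending_other_ns)
{
	uint64_t delta = elapsed_jiffies * ns_per_jiffy;
	uint64_t other = take_other_time(pending_other_ns, delta);

	return delta - other; /* cannot wrap: other <= delta */
}
```

For example, with a 4 ms jiffy, one elapsed jiffy, and 4.5 ms of pending other time, the sketch returns 0 and leaves 0.5 ms pending, whereas an unlimited subtraction would wrap.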
