
Commit 99c19e6

amluto authored and KAGA-KOKO committed
x86/vdso: Rearrange do_hres() to improve code generation
vgetcyc() is full of barriers, so fetching values out of the vvar page before vgetcyc() for use after vgetcyc() results in poor code generation. Put vgetcyc() first to avoid this problem.

Also, pull the tv_sec division into the loop and put all the ts writes together. The old code wrote ts->tv_sec on each iteration before the syscall fallback check and then added in the offset afterwards, which forced the compiler to pointlessly copy base->sec to ts->tv_sec on each iteration. The new version seems to generate sensible code.

Saves several cycles. With this patch applied, the result is faster than before the clock_gettime() rewrite.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/3c05644d010b72216aa286a6d20b5078d5fae5cd.1538762487.git.luto@kernel.org
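To make the ordering concrete, here is a minimal, self-contained userspace sketch of the same read pattern. This is not the kernel code: the snapshot struct, read_counter() and sample() below are made-up stand-ins for the vvar data, vgetcyc() and do_hres(), and the seqcount handling is simplified. The point it illustrates is the one the message describes: do the fenced counter read first, load the snapshot fields afterwards, and only sum the seconds and do the ns-to-seconds division once the retry loop has settled.

#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <x86intrin.h>			/* __rdtsc(), _mm_lfence() */

#define NSEC_PER_SEC	1000000000ull

/* Made-up stand-in for the vvar time snapshot updated by the kernel. */
struct snapshot {
	atomic_uint seq;		/* even: stable, odd: writer in progress */
	uint64_t cycle_last;
	uint64_t sec;
	uint64_t nsec;
	uint32_t mult;
	uint32_t shift;
};

/* Fenced counter read, standing in for the barrier-heavy vgetcyc(). */
static uint64_t read_counter(void)
{
	_mm_lfence();
	uint64_t c = __rdtsc();
	_mm_lfence();
	return c;
}

/*
 * Reader in the order the patch uses: fenced counter read first, snapshot
 * fields afterwards, seconds sum and division only after the loop.
 * (A real seqcount reader also spins while seq is odd; that is omitted
 * here because this demo has no concurrent writer.)
 */
static void sample(struct snapshot *s, uint64_t *sec, uint64_t *nsec)
{
	uint64_t cycles, last, ns, secs;
	unsigned int seq;

	do {
		seq = atomic_load_explicit(&s->seq, memory_order_acquire);
		cycles = read_counter();
		ns = s->nsec;
		last = s->cycle_last;
		if (cycles > last)
			ns += (cycles - last) * s->mult;
		ns >>= s->shift;
		secs = s->sec;
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&s->seq, memory_order_relaxed) != seq);

	*sec = secs + ns / NSEC_PER_SEC;
	*nsec = ns % NSEC_PER_SEC;
}

int main(void)
{
	struct snapshot s = {
		.cycle_last = __rdtsc(),
		.sec = 1234567890, .nsec = 0,	/* arbitrary demo values */
		.mult = 1, .shift = 0,		/* pretend 1 cycle == 1 ns */
	};
	uint64_t sec, nsec;

	sample(&s, &sec, &nsec);
	printf("%llu.%09llu\n", (unsigned long long)sec, (unsigned long long)nsec);
	return 0;
}

Because nothing loaded before the fenced counter read has to be kept live across it, the compiler is free to schedule the snapshot loads after the barriers, which is the code-generation effect the commit message is after.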
1 parent bcc4a62 commit 99c19e6

File tree

1 file changed (+8, -4 lines)


arch/x86/entry/vdso/vclock_gettime.c

8 additions, 4 deletions

@@ -142,23 +142,27 @@ notrace static inline u64 vgetcyc(int mode)
 notrace static int do_hres(clockid_t clk, struct timespec *ts)
 {
 	struct vgtod_ts *base = &gtod->basetime[clk];
-	u64 cycles, last, ns;
+	u64 cycles, last, sec, ns;
 	unsigned int seq;
 
 	do {
 		seq = gtod_read_begin(gtod);
-		ts->tv_sec = base->sec;
+		cycles = vgetcyc(gtod->vclock_mode);
 		ns = base->nsec;
 		last = gtod->cycle_last;
-		cycles = vgetcyc(gtod->vclock_mode);
 		if (unlikely((s64)cycles < 0))
 			return vdso_fallback_gettime(clk, ts);
 		if (cycles > last)
 			ns += (cycles - last) * gtod->mult;
 		ns >>= gtod->shift;
+		sec = base->sec;
 	} while (unlikely(gtod_read_retry(gtod, seq)));
 
-	ts->tv_sec += __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
+	/*
+	 * Do this outside the loop: a race inside the loop could result
+	 * in __iter_div_u64_rem() being extremely slow.
+	 */
+	ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
 	ts->tv_nsec = ns;
 
 	return 0;
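For reference, do_hres() as it reads with this commit applied, reconstructed from the hunk above (only the closing brace, which falls outside the hunk's context, is added here):

notrace static int do_hres(clockid_t clk, struct timespec *ts)
{
	struct vgtod_ts *base = &gtod->basetime[clk];
	u64 cycles, last, sec, ns;
	unsigned int seq;

	do {
		seq = gtod_read_begin(gtod);
		cycles = vgetcyc(gtod->vclock_mode);
		ns = base->nsec;
		last = gtod->cycle_last;
		if (unlikely((s64)cycles < 0))
			return vdso_fallback_gettime(clk, ts);
		if (cycles > last)
			ns += (cycles - last) * gtod->mult;
		ns >>= gtod->shift;
		sec = base->sec;
	} while (unlikely(gtod_read_retry(gtod, seq)));

	/*
	 * Do this outside the loop: a race inside the loop could result
	 * in __iter_div_u64_rem() being extremely slow.
	 */
	ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns);
	ts->tv_nsec = ns;

	return 0;
}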
