Commit cdf072a

Merge tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
 "Major changes:

   - Changed location of tracing repo from personal git repo to:
     git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git

   - Added Masami Hiramatsu as co-maintainer

   - Updated MAINTAINERS file to separate out FTRACE as it is more than
     just TRACING.

  Minor changes:

   - Added Mark Rutland as FTRACE reviewer

   - Updated user_events to make it on its way to remove the BROKEN tag.
     The changes should now be acceptable but will run it through a
     cycle and hopefully we can remove the BROKEN tag next release.

   - Added filtering to eprobes

   - Added a delta time to the benchmark trace event

   - Have the histogram and filter callbacks called via a switch
     statement instead of indirect functions. This speeds it up to
     avoid retpolines.

   - Add a way to wake up ring buffer waiters waiting for the ring
     buffer to fill up to its watermark.

   - New ioctl() on the trace_pipe_raw file to wake up ring buffer
     waiters.

   - Wake up waiters when the ring buffer is disabled. A reader may
     block when the ring buffer is disabled, but if it was blocked when
     the ring buffer is disabled it should then wake up.

  Fixes:

   - Allow splice to read partially read ring buffer pages. This fixes
     splice never moving forward.

   - Fix inverted compare that made the "shortest" ring buffer wait
     queue actually the longest.

   - Fix a race in the ring buffer between resetting a page when a
     writer goes to another page, and the reader.

   - Fix ftrace accounting bug when function hooks are added at boot up
     before the weak functions are set to "disabled".

   - Fix bug that freed a user allocated snapshot buffer when enabling
     a tracer.

   - Fix possible recursive locks in osnoise tracer

   - Fix recursive locking direct functions

   - Other minor clean ups and fixes"

* tag 'trace-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (44 commits)
  ftrace: Create separate entry in MAINTAINERS for function hooks
  tracing: Update MAINTAINERS to reflect new tracing git repo
  tracing: Do not free snapshot if tracer is on cmdline
  ftrace: Still disable enabled records marked as disabled
  tracing/user_events: Move pages/locks into groups to prepare for namespaces
  tracing: Add Masami Hiramatsu as co-maintainer
  tracing: Remove unused variable 'dups'
  MAINTAINERS: add myself as a tracing reviewer
  ring-buffer: Fix race between reset page and reading page
  tracing/user_events: Update ABI documentation to align to bits vs bytes
  tracing/user_events: Use bits vs bytes for enabled status page data
  tracing/user_events: Use refcount instead of atomic for ref tracking
  tracing/user_events: Ensure user provided strings are safely formatted
  tracing/user_events: Use WRITE instead of READ for io vector import
  tracing/user_events: Use NULL for strstr checks
  tracing: Fix spelling mistake "preapre" -> "prepare"
  tracing: Wake up waiters when tracing is disabled
  tracing: Add ioctl() to force ring buffer waiters to wake up
  tracing: Wake up ring buffer waiters on closing of the file
  ring-buffer: Add ring_buffer_wake_waiters()
  ...
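
Reviewer note on the "switch statement instead of indirect functions" item: the idea can be illustrated with a minimal sketch. The names below (pred_kind, struct pred, pred_match) are hypothetical and are not taken from the kernel sources; the sketch only shows the general technique of storing an enum tag and dispatching through a switch, so the compiler can emit direct compares and branches rather than an indirect call, which a retpoline-enabled build would otherwise route through a thunk.

```c
#include <stdint.h>

/* Hypothetical predicate kinds; the kernel's real filter ops differ. */
enum pred_kind { PRED_EQ, PRED_LT, PRED_GT };

/* Instead of storing a function pointer per predicate, store a tag. */
struct pred {
	enum pred_kind kind;
	int64_t val;
};

/* Dispatch via switch: no indirect call is needed to evaluate a predicate. */
static int pred_match(const struct pred *p, int64_t field)
{
	switch (p->kind) {
	case PRED_EQ:
		return field == p->val;
	case PRED_LT:
		return field < p->val;
	case PRED_GT:
		return field > p->val;
	}
	return 0; /* unknown kind: treat as no match */
}
```

The trade-off is that every new predicate kind needs a new case in the switch, whereas a function-pointer table can be extended without touching the dispatcher.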
2 parents dc55342 + 4f881a6 commit cdf072a

34 files changed

+1299
-486
lines changed

Documentation/trace/user_events.rst

Lines changed: 58 additions & 28 deletions
@@ -20,14 +20,14 @@ dynamic_events is the same as the ioctl with the u: prefix applied.
 
 Typically programs will register a set of events that they wish to expose to
 tools that can read trace_events (such as ftrace and perf). The registration
-process gives back two ints to the program for each event. The first int is the
-status index. This index describes which byte in the
+process gives back two ints to the program for each event. The first int is
+the status bit. This describes which bit in little-endian format in the
 /sys/kernel/debug/tracing/user_events_status file represents this event. The
-second int is the write index. This index describes the data when a write() or
+second int is the write index which describes the data when a write() or
 writev() is called on the /sys/kernel/debug/tracing/user_events_data file.
 
-The structures referenced in this document are contained with the
-/include/uap/linux/user_events.h file in the source tree.
+The structures referenced in this document are contained within the
+/include/uapi/linux/user_events.h file in the source tree.
 
 **NOTE:** *Both user_events_status and user_events_data are under the tracefs
 filesystem and may be mounted at different paths than above.*
@@ -38,18 +38,18 @@ Registering within a user process is done via ioctl() out to the
 /sys/kernel/debug/tracing/user_events_data file. The command to issue is
 DIAG_IOCSREG.
 
-This command takes a struct user_reg as an argument::
+This command takes a packed struct user_reg as an argument::
 
   struct user_reg {
         u32 size;
         u64 name_args;
-        u32 status_index;
+        u32 status_bit;
         u32 write_index;
   };
 
 The struct user_reg requires two inputs, the first is the size of the structure
 to ensure forward and backward compatibility. The second is the command string
-to issue for registering. Upon success two outputs are set, the status index
+to issue for registering. Upon success two outputs are set, the status bit
 and the write index.
 
 User based events show up under tracefs like any other event under the
@@ -111,15 +111,56 @@ in realtime. This allows user programs to only incur the cost of the write() or
 writev() calls when something is actively attached to the event.
 
 User programs call mmap() on /sys/kernel/debug/tracing/user_events_status to
-check the status for each event that is registered. The byte to check in the
-file is given back after the register ioctl() via user_reg.status_index.
+check the status for each event that is registered. The bit to check in the
+file is given back after the register ioctl() via user_reg.status_bit. The bit
+is always in little-endian format. Programs can check if the bit is set either
+using a byte-wise index with a mask or a long-wise index with a little-endian
+mask.
+
 Currently the size of user_events_status is a single page, however, custom
 kernel configurations can change this size to allow more user based events. In
 all cases the size of the file is a multiple of a page size.
 
-For example, if the register ioctl() gives back a status_index of 3 you would
-check byte 3 of the returned mmap data to see if anything is attached to that
-event.
+For example, if the register ioctl() gives back a status_bit of 3 you would
+check byte 0 (3 / 8) of the returned mmap data and then AND the result with 8
+(1 << (3 % 8)) to see if anything is attached to that event.
+
+A byte-wise index check is performed as follows::
+
+	int index, mask;
+	char *status_page;
+
+	index = status_bit / 8;
+	mask = 1 << (status_bit % 8);
+
+	...
+
+	if (status_page[index] & mask) {
+		/* Enabled */
+	}
+
+A long-wise index check is performed as follows::
+
+	#include <asm/bitsperlong.h>
+	#include <endian.h>
+
+	#if __BITS_PER_LONG == 64
+	#define endian_swap(x) htole64(x)
+	#else
+	#define endian_swap(x) htole32(x)
+	#endif
+
+	long index, mask, *status_page;
+
+	index = status_bit / __BITS_PER_LONG;
+	mask = 1L << (status_bit % __BITS_PER_LONG);
+	mask = endian_swap(mask);
+
+	...
+
+	if (status_page[index] & mask) {
+		/* Enabled */
+	}
 
 Administrators can easily check the status of all registered events by reading
 the user_events_status file directly via a terminal. The output is as follows::
@@ -137,29 +178,18 @@ For example, on a system that has a single event the output looks like this::
 
   Active: 1
   Busy: 0
-  Max: 4096
+  Max: 32768
 
 If a user enables the user event via ftrace, the output would change to this::
 
   1:test # Used by ftrace
 
   Active: 1
   Busy: 1
-  Max: 4096
-
-**NOTE:** *A status index of 0 will never be returned. This allows user
-programs to have an index that can be used on error cases.*
-
-Status Bits
-^^^^^^^^^^^
-The byte being checked will be non-zero if anything is attached. Programs can
-check specific bits in the byte to see what mechanism has been attached.
-
-The following values are defined to aid in checking what has been attached:
-
-**EVENT_STATUS_FTRACE** - Bit set if ftrace has been attached (Bit 0).
+  Max: 32768
 
-**EVENT_STATUS_PERF** - Bit set if perf has been attached (Bit 1).
+**NOTE:** *A status bit of 0 will never be returned. This allows user programs
+to have a bit that can be used on error cases.*
 
 Writing Data
 ------------
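
Reviewer note: the byte-wise check documented in this hunk can be wrapped in a small helper. The following is a minimal sketch, assuming status_page points at the mmap()ed user_events_status data and status_bit is the value returned by the register ioctl(); the helper name user_event_enabled is hypothetical and not part of the ABI.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of the kernel ABI): returns non-zero when
 * the bit for this event is set in the status page. Reading one byte at a
 * time gives the same result on little- and big-endian hosts.
 */
static int user_event_enabled(const unsigned char *status_page,
			      uint32_t status_bit)
{
	size_t index = status_bit / 8;                  /* which byte */
	unsigned char mask = (unsigned char)(1u << (status_bit % 8));

	return (status_page[index] & mask) != 0;
}
```

With a status_bit of 3, as in the documentation's example, this reads byte 0 and tests it against the mask 8. Because the documentation states that bit 0 is never returned, a caller could keep 0 as its "not registered" sentinel.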

MAINTAINERS

Lines changed: 18 additions & 8 deletions
@@ -8433,6 +8433,19 @@ L: platform-driver-x86@vger.kernel.org
 S: Maintained
 F: drivers/platform/x86/fujitsu-tablet.c
 
+FUNCTION HOOKS (FTRACE)
+M: Steven Rostedt <rostedt@goodmis.org>
+M: Masami Hiramatsu <mhiramat@kernel.org>
+R: Mark Rutland <mark.rutland@arm.com>
+S: Maintained
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
+F: Documentation/trace/ftrace*
+F: kernel/trace/ftrace*
+F: kernel/trace/fgraph.c
+F: arch/*/*/*/*ftrace*
+F: arch/*/*/*ftrace*
+F: include/*/ftrace.h
+
 FUNGIBLE ETHERNET DRIVERS
 M: Dimitris Michailidis <dmichail@fungible.com>
 L: netdev@vger.kernel.org
@@ -11422,7 +11435,7 @@ M: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
 M: "David S. Miller" <davem@davemloft.net>
 M: Masami Hiramatsu <mhiramat@kernel.org>
 S: Maintained
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
 F: Documentation/trace/kprobes.rst
 F: include/asm-generic/kprobes.h
 F: include/linux/kprobes.h
@@ -20771,14 +20784,11 @@ F: drivers/hwmon/pmbus/tps546d24.c
 
 TRACING
 M: Steven Rostedt <rostedt@goodmis.org>
-M: Ingo Molnar <mingo@redhat.com>
+M: Masami Hiramatsu <mhiramat@kernel.org>
 S: Maintained
-T: git git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace.git
-F: Documentation/trace/ftrace.rst
-F: arch/*/*/*/*ftrace*
-F: arch/*/*/*ftrace*
+T: git git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace.git
+F: Documentation/trace/*
 F: fs/tracefs/
-F: include/*/ftrace.h
 F: include/linux/trace*.h
 F: include/trace/
 F: kernel/trace/
@@ -20787,7 +20797,7 @@ F: tools/testing/selftests/ftrace/
 
 TRACING MMIO ACCESSES (MMIOTRACE)
 M: Steven Rostedt <rostedt@goodmis.org>
-M: Ingo Molnar <mingo@kernel.org>
+M: Masami Hiramatsu <mhiramat@kernel.org>
 R: Karol Herbst <karolherbst@gmail.com>
 R: Pekka Paalanen <ppaalanen@gmail.com>
 L: linux-kernel@vger.kernel.org

arch/x86/include/asm/ftrace.h

Lines changed: 0 additions & 1 deletion
@@ -23,7 +23,6 @@
 #define HAVE_FUNCTION_GRAPH_RET_ADDR_PTR
 
 #ifndef __ASSEMBLY__
-extern atomic_t modifying_ftrace_code;
 extern void __fentry__(void);
 
 static inline unsigned long ftrace_call_adjust(unsigned long addr)

arch/x86/include/asm/kprobes.h

Lines changed: 0 additions & 2 deletions
@@ -50,8 +50,6 @@ extern const int kretprobe_blacklist_size;
 
 void arch_remove_kprobe(struct kprobe *p);
 
-extern void arch_kprobe_override_function(struct pt_regs *regs);
-
 /* Architecture specific copy of original instruction*/
 struct arch_specific_insn {
 	/* copy of the original instruction */

arch/x86/kernel/kprobes/core.c

Lines changed: 0 additions & 2 deletions
@@ -59,8 +59,6 @@
 DEFINE_PER_CPU(struct kprobe *, current_kprobe) = NULL;
 DEFINE_PER_CPU(struct kprobe_ctlblk, kprobe_ctlblk);
 
-#define stack_addr(regs) ((unsigned long *)regs->sp)
-
 #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\
 	(((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \
 	  (b4##UL << 0x4)|(b5##UL << 0x5)|(b6##UL << 0x6)|(b7##UL << 0x7) | \

include/linux/ftrace.h

Lines changed: 0 additions & 41 deletions
@@ -1122,47 +1122,6 @@ static inline void unpause_graph_tracing(void) { }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
 
 #ifdef CONFIG_TRACING
-
-/* flags for current->trace */
-enum {
-	TSK_TRACE_FL_TRACE_BIT = 0,
-	TSK_TRACE_FL_GRAPH_BIT = 1,
-};
-enum {
-	TSK_TRACE_FL_TRACE = 1 << TSK_TRACE_FL_TRACE_BIT,
-	TSK_TRACE_FL_GRAPH = 1 << TSK_TRACE_FL_GRAPH_BIT,
-};
-
-static inline void set_tsk_trace_trace(struct task_struct *tsk)
-{
-	set_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace);
-}
-
-static inline void clear_tsk_trace_trace(struct task_struct *tsk)
-{
-	clear_bit(TSK_TRACE_FL_TRACE_BIT, &tsk->trace);
-}
-
-static inline int test_tsk_trace_trace(struct task_struct *tsk)
-{
-	return tsk->trace & TSK_TRACE_FL_TRACE;
-}
-
-static inline void set_tsk_trace_graph(struct task_struct *tsk)
-{
-	set_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace);
-}
-
-static inline void clear_tsk_trace_graph(struct task_struct *tsk)
-{
-	clear_bit(TSK_TRACE_FL_GRAPH_BIT, &tsk->trace);
-}
-
-static inline int test_tsk_trace_graph(struct task_struct *tsk)
-{
-	return tsk->trace & TSK_TRACE_FL_GRAPH;
-}
-
 enum ftrace_dump_mode;
 
 extern enum ftrace_dump_mode ftrace_dump_on_oops;

include/linux/ring_buffer.h

Lines changed: 1 addition & 1 deletion
@@ -101,7 +101,7 @@ __ring_buffer_alloc(unsigned long size, unsigned flags, struct lock_class_key *k
 int ring_buffer_wait(struct trace_buffer *buffer, int cpu, int full);
 __poll_t ring_buffer_poll_wait(struct trace_buffer *buffer, int cpu,
 			  struct file *filp, poll_table *poll_table);
-
+void ring_buffer_wake_waiters(struct trace_buffer *buffer, int cpu);
 
 #define RING_BUFFER_ALL_CPUS -1
 

include/linux/sched.h

Lines changed: 0 additions & 3 deletions
@@ -1390,9 +1390,6 @@ struct task_struct {
 #endif
 
 #ifdef CONFIG_TRACING
-	/* State flags for use by tracers: */
-	unsigned long trace;
-
 	/* Bitmask and counter of trace recursion: */
 	unsigned long trace_recursion;
 #endif /* CONFIG_TRACING */

include/linux/trace_events.h

Lines changed: 1 addition & 0 deletions
@@ -92,6 +92,7 @@ struct trace_iterator {
 	unsigned int temp_size;
 	char *fmt; /* modified format holder */
 	unsigned int fmt_size;
+	long wait_index;
 
 	/* trace_seq for __print_flags() and __print_symbolic() etc. */
 	struct trace_seq tmp_seq;

include/linux/user_events.h

Lines changed: 3 additions & 12 deletions
@@ -20,15 +20,6 @@
 #define USER_EVENTS_SYSTEM "user_events"
 #define USER_EVENTS_PREFIX "u:"
 
-/* Bits 0-6 are for known probe types, Bit 7 is for unknown probes */
-#define EVENT_BIT_FTRACE 0
-#define EVENT_BIT_PERF 1
-#define EVENT_BIT_OTHER 7
-
-#define EVENT_STATUS_FTRACE (1 << EVENT_BIT_FTRACE)
-#define EVENT_STATUS_PERF (1 << EVENT_BIT_PERF)
-#define EVENT_STATUS_OTHER (1 << EVENT_BIT_OTHER)
-
 /* Create dynamic location entry within a 32-bit value */
 #define DYN_LOC(offset, size) ((size) << 16 | (offset))
 
@@ -45,12 +36,12 @@ struct user_reg {
 	/* Input: Pointer to string with event name, description and flags */
 	__u64 name_args;
 
-	/* Output: Byte index of the event within the status page */
-	__u32 status_index;
+	/* Output: Bitwise index of the event within the status page */
+	__u32 status_bit;
 
 	/* Output: Index of the event to use when writing data */
 	__u32 write_index;
-};
+} __attribute__((__packed__));
 
 #define DIAG_IOC_MAGIC '*'
 
