Skip to content

stm32/profiler.c: Add gprof-compatible profiling for STM32. #5926

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jimmo
Copy link
Member

@jimmo jimmo commented Apr 15, 2020

Possibly useful idea from experiments during a long weekend in lockdown... tidyed up into PR form.

This allows you to instrument code for profiling with GCC's -pg flag, and analyse the results using gprof. This gives you periodic (1kHz) instruction sampling, and call count and call graph data.

This implements the very minimum to collect call arc data and the sampling histogram and save it to a file in flash. Tested on F4.

Obviously this is very easy to do on Unix, but the results are so noisy it can be hard to really understand. And optimising for x86 can be very different to optimising for ARM, so getting the "truth" from the real device is useful.

The main limitation is to avoid using too much RAM, the instruction sampling is a bit coarse, so occasionally will report the wrong function. (This only applies to caller and time, not to callee count which is accurate).

Credit and many thanks to @ErichStyger for this excellent post that explains much of the background required to implement this https://mcuoneclipse.com/2015/08/23/tutorial-using-gnu-profiling-gprof-with-arm-cortex-m/

import profiler
profiler.active(True)
# your code here
profiler.write()

Then

pyboard --device /dev/ttyACM1 -f cp :gmon.out .
gprof build-BOARD/firmware.elf gmon.out

gprof output (of a program that does a lot of string creation and hits the O(n) QSTR pool lookup)

Flat profile:

Each sample counts as 0.001 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 68.27      1.24     1.24    10412     0.12     0.12  qstr_find_strn
  7.80      1.39     0.14                             _mcount_internal
  3.83      1.46     0.07                             __gnu_mcount_nc
  2.95      1.51     0.05    54681     0.00     0.00  mp_map_lookup
  2.14      1.55     0.04     5206     0.01     0.02  mp_execute_bytecode
  1.81      1.58     0.03                             profiler_write
  1.28      1.60     0.02                             HAL_InitTick
  0.92      1.62     0.02    18228     0.00     0.00  gc_alloc
  0.54      1.63     0.01                             memset
  0.47      1.64     0.01                             mp_obj_str_format_helper
  0.45      1.65     0.01    18221     0.00     0.00  qstr_compute_hash
  0.41      1.65     0.01    26036     0.00     0.00  mp_load_global
  0.38      1.66     0.01    13020     0.00     0.00  module_attr
  0.34      1.67     0.01    23434     0.00     0.00  mp_call_function_n_kw
  0.33      1.67     0.01    54677     0.00     0.00  mp_obj_get_type
  0.33      1.68     0.01        3     2.00     2.00  gc_collect_end
  0.30      1.69     0.01     7810     0.00     0.00  mp_binary_op
  0.27      1.69     0.01     5206     0.00     0.00  mp_setup_code_state
  0.27      1.70     0.01     2603     0.00     0.00  mp_obj_str_binary_op
  0.27      1.70     0.01     2603     0.00     0.00  uni_print
  0.24      1.70     0.00    44286     0.00     0.00  find_qstr
  0.24      1.71     0.00    10412     0.00     0.00  mp_obj_new_str_from_vstr
  0.23      1.71     0.00    10413     0.00     0.00  mp_stack_check
  0.21      1.72     0.00    15622     0.00     0.00  mp_load_method
  0.21      1.72     0.00     5208     0.00     0.00  nlr_push
  0.21      1.72     0.00     5206     0.00     0.00  gc_realloc
  0.16      1.73     0.00                             mp_int_format_size
  0.16      1.73     0.00                             qstr_len
  0.16      1.73     0.00    15622     0.00     0.00  mp_call_method_n_kw
  0.16      1.74     0.00     7813     0.00     0.00  mp_obj_dict_store
  0.16      1.74     0.00                             mp_unary_op
  0.15      1.74     0.00    36477     0.00     0.00  qstr_hash
  0.14      1.74     0.00     7809     0.00     0.00  vstr_init_print
  0.14      1.75     0.00     5206     0.00     0.00  m_realloc
  0.14      1.75     0.00                             vstr_add_strn
  0.14      1.75     0.00    20830     0.00     0.00  mp_load_name
  0.14      1.75     0.00                             gc_dump_alloc_table
  0.13      1.76     0.00                             mp_obj_int_formatted
  0.13      1.76     0.00                             qstr_from_str
  0.12      1.76     0.00     5208     0.00     0.00  nlr_push_tail
  0.12      1.76     0.00                             gc_info
  0.12      1.77     0.00                             mp_map_clear
  0.12      1.77     0.00                             memmove
  0.11      1.77     0.00     7809     0.00     0.00  arg_looks_integer
  0.11      1.77     0.00     2654     0.00     0.00  gc_mark_subtree
  0.11      1.77     0.00                             mp_identity
  0.11      1.78     0.00                             slice_indices
  0.10      1.78     0.00     5208     0.00     0.00  nlr_pop
  0.10      1.78     0.00     2603     0.00     0.00  mp_obj_str_get_data
  0.10      1.78     0.00                             qstr_init
  0.09      1.78     0.00                             memcpy
  0.08      1.79     0.00     2603     0.00     0.00  mp_obj_str_format
  0.08      1.79     0.00     7812     0.00     0.00  fun_builtin_0_call
  0.07      1.79     0.00                             nlr_jump
  0.06      1.79     0.00    18221     0.00     0.00  vstr_ensure_extra
  0.06      1.79     0.00                             memcmp
  0.06      1.79     0.00                             mp_import_name
  0.06      1.79     0.00                             mp_obj_dict_len
  0.06      1.79     0.00                             vstr_null_terminated_str
  0.06      1.79     0.00     2603     0.00     0.00  mp_obj_int_print
  0.05      1.80     0.00    15624     0.00     0.00  mp_load_method_maybe
  0.05      1.80     0.00     5207     0.00     0.00  mp_obj_print_helper
  0.05      1.80     0.00     5206     0.00     0.00  fun_bc_call
  0.05      1.80     0.00     2603     0.00     0.00  fun_builtin_var_call
  0.05      1.80     0.00     2603     0.00     0.00  mp_obj_str_make_new
  0.05      1.80     0.00                             float_binary_op
  0.05      1.80     0.00                             module_print
  0.05      1.80     0.00                             mp_obj_dict_delete
  0.05      1.80     0.00     7812     0.00     0.00  mp_store_name
  0.05      1.80     0.00                             strlen
  0.04      1.81     0.00                             qstr_str
  0.04      1.81     0.00    18228     0.00     0.00  m_malloc
  0.04      1.81     0.00                             mp_obj_len_maybe
  0.04      1.81     0.00                             time_ticks_ms
  0.04      1.81     0.00                             mp_call_function_2
  0.03      1.81     0.00        3     0.21     0.21  fun_builtin_1_call
  0.03      1.81     0.00                             mp_obj_instance_attr
  0.03      1.81     0.00                             uart_wait_flag_set
  0.03      1.81     0.00    10412     0.00     0.00  vstr_init
  0.03      1.81     0.00     7809     0.00     0.00  qstr_data
  0.03      1.81     0.00     5207     0.00     0.00  fun_builtin_2_call
  0.03      1.81     0.00    10412     0.00     0.00  vstr_add_byte
  0.03      1.81     0.00                             mp_obj_int_binary_op_extra_cases
  0.03      1.81     0.00                             mp_obj_new_str_via_qstr
  0.03      1.81     0.00                             mp_raise_ValueError
  0.03      1.81     0.00                             mp_stack_set_limit
  0.03      1.81     0.00                             snprintf
  0.03      1.82     0.00                             systick_wait_at_least
  0.02      1.82     0.00     2603     0.00     0.00  vstr_init_len
  0.02      1.82     0.00                             fun_builtin_3_call
  0.02      1.82     0.00    10412     0.00     0.00  vstr_add_len
  0.02      1.82     0.00     2603     0.00     0.00  type_call
  0.02      1.82     0.00     2603     0.00     0.00  vstr_clear
  0.02      1.82     0.00                             mp_hal_ticks_ms
  0.02      1.82     0.00                             mp_make_raise_obj
  0.02      1.82     0.00                             mp_stack_usage
  0.02      1.82     0.00                             pyb_spi_make_new
  0.02      1.82     0.00                             qstr_from_strn
  0.02      1.82     0.00                             vstr_ins_blank_bytes.constprop.0
  0.01      1.82     0.00     2604     0.00     0.00  mp_obj_is_true
  0.01      1.82     0.00                             gc_nbytes
  0.01      1.82     0.00                             mp_obj_str_get_str
  0.01      1.82     0.00                             time_ticks_us
  0.01      1.82     0.00        4     0.05     0.98  m_malloc0
  0.01      1.82     0.00                             m_realloc_maybe
  0.01      1.82     0.00                             mp_delete_name
  0.01      1.82     0.00                             mp_hal_ticks_us
  0.01      1.82     0.00                             str_join
  0.00      1.82     0.00    18228     0.00     0.00  mp_arg_check_num_sig
  0.00      1.82     0.00     7813     0.00     0.00  mp_ensure_not_fixed
  0.00      1.82     0.00     5206     0.00     0.00  arg_looks_numeric
  0.00      1.82     0.00     2606     0.00     0.00  gc_free
  0.00      1.82     0.00     2606     0.00     0.00  m_free
  0.00      1.82     0.00     2605     0.00     0.00  mp_convert_member_lookup
  0.00      1.82     0.00     2603     0.00     0.00  mp_map_init_fixed_table
  0.00      1.82     0.00        3     0.00     0.00  gc_collect_root
  0.00      1.82     0.00        3     0.00     0.00  gc_collect_start
  0.00      1.82     0.00        3     0.00     0.00  mp_map_rehash
  0.00      1.82     0.00        2     0.00     0.00  mp_call_function_1
  0.00      1.82     0.00        2     0.00     0.00  mp_call_function_1_protected
  0.00      1.82     0.00        2     0.00     0.00  mp_make_function_from_raw_code
  0.00      1.82     0.00        2     0.00     0.00  mp_obj_new_fun_bc
  0.00      1.82     0.00        2     0.00     0.00  mp_sched_lock
  0.00      1.82     0.00        2     0.00     0.00  mp_sched_unlock
  0.00      1.82     0.00        2     0.00     0.00  mp_stream_close
  0.00      1.82     0.00        2     0.00     0.00  mp_stream_rw
  0.00      1.82     0.00        1     0.00     0.00  bool_print
  0.00      1.82     0.00        1     0.00     0.00  mp_builtin___repl_print__
  0.00      1.82     0.00        1     0.00     0.00  mp_map_init
  0.00      1.82     0.00        1     0.00     0.00  mp_obj_new_dict
...

This allows you to profile code from py/*.c with GCC's `-pg` flag, and analyse the results using gprof.
@tve
Copy link
Contributor

tve commented Apr 15, 2020

Very cool! How much here is actually ARM-specific? Could profile.c be placed in a port-independent place so it can be reused?

@jimmo
Copy link
Member Author

jimmo commented Apr 15, 2020

Technically, just the __gnu_mcount_nc function and the way the systick handler gets the PC. As you can see in the header of profiler.c, this originally comes from Cygwin, so definitely not ARM or even embedded-specific (but it's pretty much a complete rewrite). And both parts (call graph and instruction sampling) work completely independently.

tannewt added a commit to tannewt/circuitpython that referenced this pull request Jan 26, 2022
Everything else should raise NotImplementedError.

First step in micropython#5926
tannewt added a commit to tannewt/circuitpython that referenced this pull request Jan 28, 2022
Services and characteristics still won't work.

Progress on micropython#5926
tannewt added a commit to tannewt/circuitpython that referenced this pull request Feb 10, 2022
This allows you to connect to GATT services on the other device.
It also adds connection initiation (GAP central).

More progress on micropython#5926
tannewt added a commit to tannewt/circuitpython that referenced this pull request Feb 10, 2022
This allows you to connect to GATT services on the other device.
It also adds connection initiation (GAP central).

More progress on micropython#5926
@projectgus
Copy link
Contributor

This is an automated heads-up that we've just merged a Pull Request
that removes the STATIC macro from MicroPython's C API.

See #13763

A search suggests this PR might apply the STATIC macro to some C code. If it
does, then next time you rebase the PR (or merge from master) then you should
please replace all the STATIC keywords with static.

Although this is an automated message, feel free to @-reply to me directly if
you have any questions about this.

tannewt added a commit to tannewt/circuitpython that referenced this pull request May 3, 2024
Enable BLE where we can. Switch 4MB, non-USB board default partitioning
over to a single 2MB firmware bank. Old boards override this setting to
keep the same behavior.

This also adds alpha support for the ESP32-C2 (aka ESP8684).

Fixes micropython#5926 and micropython#7170
tannewt added a commit to tannewt/circuitpython that referenced this pull request May 17, 2024
Enable BLE where we can. Switch 4MB, non-USB board default partitioning
over to a single 2MB firmware bank. Old boards override this setting to
keep the same behavior.

This also adds alpha support for the ESP32-C2 (aka ESP8684).

Fixes micropython#5926 and fixes micropython#7170
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants