[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: How can I find problematic uses of error_report() with vrc?
|
From: |
Paolo Bonzini |
|
Subject: |
Re: How can I find problematic uses of error_report() with vrc? |
|
Date: |
Mon, 8 May 2023 12:41:52 +0200 |
|
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0 |
On 5/8/23 10:32, Markus Armbruster wrote:
1. Find functions using Error
Doesn't have to be perfect. I have a simple Coccinelle script
(appended) that spits out some 4400 functions. I run it like
$ spatch --sp-file find-error-fns.cocci --macro-file
scripts/cocci-macro-file.h `git-grep -Fl 'Error **' \*.[ch]`
Nice. For the following tests I just did a "grep 'Error \*\*' include"
and massaged it into a list, so I have substantially less information,
but it was enough to find _something_.
2. Find call chains from these functions to error_report()
I'm hoping vrc can do that for me. How?
The good news is that's relatively easy. The bad news is that there are
so many possible call chains that you'll often find _other_ refactorings
to do first (typically very coccinelle-friendly stuff, as you will see).
Anyhow, let's start with vrc being already installed ("pip install ." is
enough from a vrc checkout; right now it's tested with libclang 15) and
a QEMU build directory where qemu-system-x86_64 has been built. Because
QEMU is huge and most opportunities are in common code, using a single
binary is easiest.
The attached "errorpp" file is my own small list of ~200 functions using
"Error **". It adds to them a "label" called ErrorPP. Labels can be
added manually or with __attribute__(annotate("label")).
Let's build vrc and run it from the QEMU build directory
$ meson setup +build
$ ninja -C+build
[17/17] Linking target vrc/cython_graph.cpython-311-x86_64-linux-gnu.so
$ meson devenv -C+build
$ cd ~/work/upstream/qemu/+build
$ python -m vrc
Loading compile_commands.json
(vrc) load */*.c.o
(vrc) source errorpp
I've seen occasional segfaults here on larger machines and I haven't yet
confirmed why they happen. This BTW was the main reason why I haven't
published vrc more widely, but now you're forcing my hand. Give me an
hour or two and I will push the tentative fix to github. ;)
Most of this email is a story of false positives. If you want to see a
little good stuff, search for "[here]". Because finding all paths is
not doable, so let's start by finding all functions that may end up
calling error_report. Here's our first query:
(vrc) paths [ErrorPP,"error_report":all_callers]
Pretty legible IMNSHO, but to be clear:
- Inside brackets, bare words are "labels" and function names are
quoted. Outside brackets, function names can but need not be quoted.
Seems weird but it's pretty handy actually.
- The colon starts a modifier, which is like a postfix operator applied
to what comes before it. "all_callers" means to walk recursively up
from error_report.
- The comma is an AND, so the printed functions are marked ErrorPP and
able to reach all_callers.
The output is pretty long, but "socket_listen" jumped to me as something
that should not have this issue.
(vrc) paths --limit 1 socket_listen [!ErrorPP]* error_report
error_report <- replay_write_error <- ... <- aio_poll <- bdrv_poll_co <-
bdrv_debug_event <- qcow2_co_save_vmstate <- ... error_vreport <-
error_exit <- qemu_mutex_unlock_impl <- monitor_cur <- socket_get_fd <-
socket_listen
Here we start seeing the problem: there are many functions with an error
path that ultimately leads to error_report in very roundabout ways.
error_exit is okay-ish, as it's more like an assertion failure, so let's
prune it; I expect other calls to error_report to come from
record-replay, so I'm cutting away that path as well using a regular
expression.
(vrc) omit error_exit replay_write_error /^replay_/
Note that the arguments to "omit" can use the full syntax that was used
above in "paths". You can see why it's handy to omit quotes outside
brackets even if it makes the syntax a bit less orthogonal.
(vrc) paths --limit 1 socket_listen [!ErrorPP]* error_report
<no output>
Cool, those were all false positives.
Let's run the original query again and try another one
(vrc) paths [ErrorPP,"error_report":all_callers]
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- icount_get_raw_locked <- icount_get_raw <-
qemu_clock_get_ns <- qemu_clock_get_ms <- mux_chr_write <-
ChardevClass::chr_write <- qemu_chr_write_buffer <- qemu_chr_write <-
qemu_chr_fe_write <- monitor_flush_locked <- monitor_puts <-
monitor_vprintf <- error_vprintf <- vreport <- error_report_once_cond <-
kvm_log_clear <- MemoryListener::log_clear <-
memory_region_clear_dirty_bitmap <-
cpu_physical_memory_test_and_clear_dirty <-
cpu_physical_memory_clear_dirty_range <- qemu_ram_resize
Ok, once we're in "vreport" any nested calls are kinda irrelevant.
Other functions that we can filter out are "error_printf" (called from
warn_report_err) and error_propagate (which can call error_report when
called with &error_fatal and friends)
(vrc) omit vreport error_printf error_propagate
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- kvm_log_clear_one_slot <- kvm_physical_log_clear <-
kvm_log_clear <- MemoryListener::log_clear <-
memory_region_clear_dirty_bitmap <-
cpu_physical_memory_test_and_clear_dirty <-
cpu_physical_memory_clear_dirty_range <- qemu_ram_resize
Hrm, this should probably be a warn_report. Probably applies to most
cases were "error_report" isn't followed by a return/goto/exit. First
example of a refactoring _other_ than the one you were looking for.
Let's hide this one and, to keep the email short, the next one:
(vrc) omit kvm_log_clear_one_slot kvm_flush_coalesced_mmio_buffer
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- type_initialize <- object_class_get_parent <-
object_property_iter_next <- object_property_del_all <- object_finalize
<- object_unref <- memory_region_unref <- flatview_simplify <-
generate_memory_topology <- flatviews_init <- flatviews_reset <-
memory_region_transaction_commit <- memory_region_set_size <-
qemu_ram_resize
Another issue: error_report used before exit() or abort() is _also_ a
kind of assertion failure, similar to error_exit(). A similar issue
occurs in object_initialize, object_dynamic_cast_assert,
object_class_dynamic_class_assert.
(vrc) omit type_initialize object_initialize object_dynamic_cast_assert
object_class_dynamic_class_assert
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- icount_get_raw_locked <- icount_get_raw <-
qemu_clock_get_ns <- timerlist_deadline_ns <- timerlistgroup_deadline_ns
<- aio_compute_timeout <- aio_poll <- bdrv_poll_co <- bdrv_flush <-
qio_channel_block_close <- QIOChannelClass::io_close <-
qio_channel_close <- qio_net_listener_disconnect <-
qio_net_listener_finalize <- TypeInfo::instance_finalize <-
TypeImpl::instance_finalize <- object_deinit <- object_finalize <-
object_unref <- qdev_init_clocklist <- qdev_init_clock_in <-
systick_instance_init <- TypeInfo::instance_init <-
TypeImpl::instance_init <- object_init_with_type <-
object_initialize_with_type <- object_new_with_type <- object_new <-
container_get <- memory_region_do_init <- memory_region_init <-
memory_region_init_io <- subpage_init <- register_subpage <-
flatview_add_to_dispatch <- generate_memory_topology <- flatviews_init
<- flatviews_reset <- memory_region_transaction_commit <-
memory_region_set_size <- qemu_ram_resize
Ouch, this is getting... a bit specific. Let's just cut out function
pointers for simplicity:
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP,function_pointer]*
error_report
<no output>
Okay, so this one was a false positive as well.
So you can see the good and the bad here. The tool is powerful and
finds what you asked. The problem is that there's _a lot_ of hay in
which you have to find the needle. For coroutines it works bettr
because we have already cleaned it up, you can get there but it takes
some sweat.
[here]
Let's try a more precise (but also more restrictive) query that only has
a single function that does not take Error** but calls error_report:
(vrc) paths [ErrorPP] [ErrorPP]* [!ErrorPP] error_report
error_report <- qemu_open_old <- qmp_chardev_open_file_source
error_report <- runstate_set <- qemu_system_wakeup_request
error_report <- machine_consume_memdev <- machine_run_board_init
error_report <- numa_complete_configuration <- machine_run_board_init
error_report <- egl_rendernode_init <- egl_init
error_report <- parse_numa_node <- set_numa_options
I checked parse_numa_node and numa_complete_configuration, and they're
genuine issues.
Let's add a couple labels by hand to see if it finds your example:
(vrc) label ErrorPP qmp_migrate rdma_start_outgoing_migration
qemu_rdma_source_init
(vrc) paths qmp_migrate [ErrorPP]* [!ErrorPP] error_report
error_report <- migrate_fd_connect <- rdma_start_outgoing_migration <-
qmp_migrate
error_report <- qemu_rdma_cleanup <- rdma_start_outgoing_migration <-
qmp_migrate
error_report <- qemu_rdma_resolve_host <- qemu_rdma_source_init <-
rdma_start_outgoing_migration <- qmp_migrate
error_report <- qemu_rdma_alloc_pd_cq <- qemu_rdma_source_init <-
rdma_start_outgoing_migration <- qmp_migrate
error_report <- qemu_rdma_cleanup <- qemu_rdma_source_init <-
rdma_start_outgoing_migration <- qmp_migrate
error_report <- qemu_rdma_reg_control <- qemu_rdma_source_init <-
rdma_start_outgoing_migration <- qmp_migrate
error_report <- qemu_rdma_connect <- rdma_start_outgoing_migration <-
qmp_migrate
Mission accomplished. :)
Paolo
Here's my find-error-fns.cocci:
@r@
identifier fn, errp;
position p;
@@
fn@p(..., Error **errp, ...)
{
...
}
@script:python@
fn << r.fn;
p << r.p;
@@
print(f'{p[0].file}:{p[0].line}:{p[0].column}:{fn}')
errorpp
Description: Text document