qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: How can I find problematic uses of error_report() with vrc?


From: Paolo Bonzini
Subject: Re: How can I find problematic uses of error_report() with vrc?
Date: Mon, 8 May 2023 12:41:52 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.10.0

On 5/8/23 10:32, Markus Armbruster wrote:
1. Find functions using Error

    Doesn't have to be perfect.  I have a simple Coccinelle script
    (appended) that spits out some 4400 functions.  I run it like

     $ spatch --sp-file find-error-fns.cocci --macro-file 
scripts/cocci-macro-file.h `git-grep -Fl 'Error **' \*.[ch]`

Nice. For the following tests I just did a "grep 'Error \*\*' include" and massaged it into a list, so I have substantially less information, but it was enough to find _something_.

2. Find call chains from these functions to error_report()

    I'm hoping vrc can do that for me.  How?

The good news is that's relatively easy. The bad news is that there are so many possible call chains that you'll often find _other_ refactorings to do first (typically very coccinelle-friendly stuff, as you will see).

Anyhow, let's start with vrc being already installed ("pip install ." is enough from a vrc checkout; right now it's tested with libclang 15) and a QEMU build directory where qemu-system-x86_64 has been built. Because QEMU is huge and most opportunities are in common code, using a single binary is easiest.

The attached "errorpp" file is my own small list of ~200 functions using "Error **". It adds to them a "label" called ErrorPP. Labels can be added manually or with __attribute__(annotate("label")).

Let's build vrc and run it from the QEMU build directory

$ meson setup +build
$ ninja -C+build
[17/17] Linking target vrc/cython_graph.cpython-311-x86_64-linux-gnu.so
$ meson devenv -C+build
$ cd ~/work/upstream/qemu/+build
$ python -m vrc
Loading compile_commands.json
(vrc) load */*.c.o
(vrc) source errorpp

I've seen occasional segfaults here on larger machines and I haven't yet confirmed why they happen. This BTW was the main reason why I haven't published vrc more widely, but now you're forcing my hand. Give me an hour or two and I will push the tentative fix to github. ;)

Most of this email is a story of false positives. If you want to see a little good stuff, search for "[here]". Because finding all paths is not doable, so let's start by finding all functions that may end up calling error_report. Here's our first query:

(vrc) paths [ErrorPP,"error_report":all_callers]

Pretty legible IMNSHO, but to be clear:

- Inside brackets, bare words are "labels" and function names are quoted. Outside brackets, function names can but need not be quoted. Seems weird but it's pretty handy actually.

- The colon starts a modifier, which is like a postfix operator applied to what comes before it. "all_callers" means to walk recursively up from error_report.

- The comma is an AND, so the printed functions are marked ErrorPP and able to reach all_callers.

The output is pretty long, but "socket_listen" jumped to me as something that should not have this issue.

(vrc) paths --limit 1 socket_listen [!ErrorPP]* error_report
error_report <- replay_write_error <- ... <- aio_poll <- bdrv_poll_co <- bdrv_debug_event <- qcow2_co_save_vmstate <- ... error_vreport <- error_exit <- qemu_mutex_unlock_impl <- monitor_cur <- socket_get_fd <- socket_listen

Here we start seeing the problem: there are many functions with an error path that ultimately leads to error_report in very roundabout ways.

error_exit is okay-ish, as it's more like an assertion failure, so let's prune it; I expect other calls to error_report to come from record-replay, so I'm cutting away that path as well using a regular expression.

(vrc) omit error_exit replay_write_error /^replay_/

Note that the arguments to "omit" can use the full syntax that was used above in "paths". You can see why it's handy to omit quotes outside brackets even if it makes the syntax a bit less orthogonal.

(vrc) paths --limit 1 socket_listen [!ErrorPP]* error_report
<no output>

Cool, those were all false positives.

Let's run the original query again and try another one

(vrc) paths [ErrorPP,"error_report":all_callers]
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- icount_get_raw_locked <- icount_get_raw <- qemu_clock_get_ns <- qemu_clock_get_ms <- mux_chr_write <- ChardevClass::chr_write <- qemu_chr_write_buffer <- qemu_chr_write <- qemu_chr_fe_write <- monitor_flush_locked <- monitor_puts <- monitor_vprintf <- error_vprintf <- vreport <- error_report_once_cond <- kvm_log_clear <- MemoryListener::log_clear <- memory_region_clear_dirty_bitmap <- cpu_physical_memory_test_and_clear_dirty <- cpu_physical_memory_clear_dirty_range <- qemu_ram_resize

Ok, once we're in "vreport" any nested calls are kinda irrelevant. Other functions that we can filter out are "error_printf" (called from warn_report_err) and error_propagate (which can call error_report when called with &error_fatal and friends)

(vrc) omit vreport error_printf error_propagate
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- kvm_log_clear_one_slot <- kvm_physical_log_clear <- kvm_log_clear <- MemoryListener::log_clear <- memory_region_clear_dirty_bitmap <- cpu_physical_memory_test_and_clear_dirty <- cpu_physical_memory_clear_dirty_range <- qemu_ram_resize

Hrm, this should probably be a warn_report. Probably applies to most cases were "error_report" isn't followed by a return/goto/exit. First example of a refactoring _other_ than the one you were looking for. Let's hide this one and, to keep the email short, the next one:

(vrc) omit kvm_log_clear_one_slot kvm_flush_coalesced_mmio_buffer
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- type_initialize <- object_class_get_parent <- object_property_iter_next <- object_property_del_all <- object_finalize <- object_unref <- memory_region_unref <- flatview_simplify <- generate_memory_topology <- flatviews_init <- flatviews_reset <- memory_region_transaction_commit <- memory_region_set_size <- qemu_ram_resize

Another issue: error_report used before exit() or abort() is _also_ a kind of assertion failure, similar to error_exit(). A similar issue occurs in object_initialize, object_dynamic_cast_assert, object_class_dynamic_class_assert.

(vrc) omit type_initialize object_initialize object_dynamic_cast_assert object_class_dynamic_class_assert
(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP]* error_report
error_report <- icount_get_raw_locked <- icount_get_raw <- qemu_clock_get_ns <- timerlist_deadline_ns <- timerlistgroup_deadline_ns <- aio_compute_timeout <- aio_poll <- bdrv_poll_co <- bdrv_flush <- qio_channel_block_close <- QIOChannelClass::io_close <- qio_channel_close <- qio_net_listener_disconnect <- qio_net_listener_finalize <- TypeInfo::instance_finalize <- TypeImpl::instance_finalize <- object_deinit <- object_finalize <- object_unref <- qdev_init_clocklist <- qdev_init_clock_in <- systick_instance_init <- TypeInfo::instance_init <- TypeImpl::instance_init <- object_init_with_type <- object_initialize_with_type <- object_new_with_type <- object_new <- container_get <- memory_region_do_init <- memory_region_init <- memory_region_init_io <- subpage_init <- register_subpage <- flatview_add_to_dispatch <- generate_memory_topology <- flatviews_init <- flatviews_reset <- memory_region_transaction_commit <- memory_region_set_size <- qemu_ram_resize

Ouch, this is getting... a bit specific. Let's just cut out function pointers for simplicity:

(vrc) paths --limit 1 qemu_ram_resize [!ErrorPP,function_pointer]* error_report
<no output>

Okay, so this one was a false positive as well.

So you can see the good and the bad here. The tool is powerful and finds what you asked. The problem is that there's _a lot_ of hay in which you have to find the needle. For coroutines it works bettr because we have already cleaned it up, you can get there but it takes some sweat.

[here]

Let's try a more precise (but also more restrictive) query that only has a single function that does not take Error** but calls error_report:

(vrc) paths [ErrorPP] [ErrorPP]* [!ErrorPP] error_report
error_report <- qemu_open_old <- qmp_chardev_open_file_source
error_report <- runstate_set <- qemu_system_wakeup_request
error_report <- machine_consume_memdev <- machine_run_board_init
error_report <- numa_complete_configuration <- machine_run_board_init
error_report <- egl_rendernode_init <- egl_init
error_report <- parse_numa_node <- set_numa_options

I checked parse_numa_node and numa_complete_configuration, and they're genuine issues.

Let's add a couple labels by hand to see if it finds your example:

(vrc) label ErrorPP qmp_migrate rdma_start_outgoing_migration qemu_rdma_source_init
(vrc) paths qmp_migrate [ErrorPP]* [!ErrorPP] error_report
error_report <- migrate_fd_connect <- rdma_start_outgoing_migration <- qmp_migrate error_report <- qemu_rdma_cleanup <- rdma_start_outgoing_migration <- qmp_migrate error_report <- qemu_rdma_resolve_host <- qemu_rdma_source_init <- rdma_start_outgoing_migration <- qmp_migrate error_report <- qemu_rdma_alloc_pd_cq <- qemu_rdma_source_init <- rdma_start_outgoing_migration <- qmp_migrate error_report <- qemu_rdma_cleanup <- qemu_rdma_source_init <- rdma_start_outgoing_migration <- qmp_migrate error_report <- qemu_rdma_reg_control <- qemu_rdma_source_init <- rdma_start_outgoing_migration <- qmp_migrate error_report <- qemu_rdma_connect <- rdma_start_outgoing_migration <- qmp_migrate

Mission accomplished. :)

Paolo

Here's my find-error-fns.cocci:

@r@
identifier fn, errp;
position p;
@@
  fn@p(..., Error **errp, ...)
  {
      ...
  }
@script:python@
fn << r.fn;
p << r.p;
@@
print(f'{p[0].file}:{p[0].line}:{p[0].column}:{fn}')

Attachment: errorpp
Description: Text document


reply via email to

[Prev in Thread] Current Thread [Next in Thread]