qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migrati


From: Gonglei (Arei)
Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
Date: Mon, 8 Aug 2016 13:05:10 +0000

Hi Dave,


> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:address@hidden
> Sent: Monday, August 08, 2016 8:13 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; address@hidden; address@hidden;
> address@hidden; Huangweidong (C); address@hidden;
> address@hidden
> Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> migration?
> 
> * Gonglei (Arei) (address@hidden) wrote:
> > Hi all,
> >
> > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> vcpus are started
> > before we invoke send_control_event(),which queue a message in the virtio
> ring and trigger a irq.
> >
> > In actually, we can't attach our initial intention under a special 
> > situation:
> >
> > 1. the process of migration destination is in a coroutine
> >  Commit 82a4da79fd6
> > 2. The corountine will yiled the cpu (calling 
> > yield_until_fd_readable(s->fd))
> >  if the Qemu encounter a EAGIN while reading QEMUFile.
> >  commit 595ab64169b
> >
> > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> >                                  size_t size)
> > {
> >     QEMUFileSocket *s = opaque;
> >     ssize_t len;
> >
> >     for (;;) {
> >         len = qemu_recv(s->fd, buf, size, 0);
> >         if (len != -1) {
> >             break;
> >         }
> >         if (socket_error() == EAGAIN) {
> >             yield_until_fd_readable(s->fd);
> >         } else if (socket_error() != EINTR) {
> >             break;
> >         }
> >     }
> >
> >     if (len == -1) {
> >         len = -socket_error();
> >     }
> >     return len;
> > }
> >
> > 3. The main thread can get the cpu and timer will run, asumes that we get
> EAGIN
> >   after invoking fetch_active_ports_list().
> 
> I don't understand the details of this interrupt injection, or why the timer
> will run if we're still not finished migration; it doesn't sound right that a
> QEMU_CLOCK_VIRTUAL
> timer should trigger while we're still receiving the VM and the guest is 
> paused.
> 

Because the migration coroutine will yiled cpu if it get a EAGAIN error from 
unix socket.
You can see the above step 2)

For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in 
channel_get_buffer()

static ssize_t channel_get_buffer(void *opaque,
                                  uint8_t *buf,
                                  int64_t pos,
                                  size_t size)
{
    QIOChannel *ioc = QIO_CHANNEL(opaque);
    ssize_t ret;

    do {
        ret = qio_channel_read(ioc, (char *)buf, size, NULL);
        if (ret < 0) {
            if (ret == QIO_CHANNEL_ERR_BLOCK) {
                qio_channel_yield(ioc, G_IO_IN);   // yiled cpu
            } else {
                /* XXX handle Error * object */
                return -EIO;
            }
        }
    } while (ret == QIO_CHANNEL_ERR_BLOCK);

    return ret;
}

> However, would it be fixed by using a vm_change_state_handler like
> ui/spice-core.c does?
> 

I'll check this method, thanks!

Regards,
-Gonglei

> Dave
> 
> 
> > 4. reproduce the problem by fault injection.
> >
> > The debug logs:
> >
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": 
> false,
> "capability": "xbzrle"}]}
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": 
> false,
> "capability": "rdma-pin-all"}]}
> > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> process_incoming_migration_co()    // Enter corountine
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load 
> > will
> init virtqueue
> > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> begin to inject debug **********    // fault injection begin, migration
> coroutine yiled cpu
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> current_time: 9750771061592    //vm_clock timer boom
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> virtio_serial_post_load_timer_cb   //calling the timer callback
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 0 // send a message to the guest, raise the
> irq line.
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 64056}, "event":
> "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 0
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750770788478, current_time: 9750771061592
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> //Enter migration coroutine again, and the restore the lapic register, the
> previous lapic's info will be covered :(
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> //enter vm_start here
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 
> > that
> means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> that means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 
> > that
> means DRIVER OK
> > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> old: 1
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750772061592, current_time: 9750794269805
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750795284148, current_time: 9750795312685
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750796329656, current_time: 9750796369458
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750797396208, current_time: 9750797426752
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750798445707, current_time: 9750798477520
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750799496162, current_time: 9750799532183
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750800556817, current_time: 9750800588741
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750801606596, current_time: 9750801637961
> >
> >
> > The below is my debugging diff (Not based on the newest master branch)
> >
> > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > index 6a45af9..a00b02f 100644
> > --- a/hw/char/virtio-serial-bus.c
> > +++ b/hw/char/virtio-serial-bus.c
> > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> >      if (!s->post_load) {
> >          return;
> >      }
> > +    QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> >      for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> >          port = s->post_load->connected[i].port;
> >          host_connected = s->post_load->connected[i].host_connected;
> > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> >               */
> >              send_control_event(s, port->id,
> VIRTIO_CONSOLE_PORT_OPEN,
> >                                 port->host_connected);
> > +            QEMU_LOG("gonglei: port->host_connected: %u\n",
> port->host_connected);
> >          }
> >          vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> >          if (vsc->set_guest_connected) {
> > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> version_id,
> >          }
> >      }
> >      timer_mod(s->post_load->timer, 1);
> > +    QEMU_LOG("gonglei: ************** begin to inject debug
> **********\n");
> > +    yield_until_fd_readable(qemu_get_fd(f));
> >      return 0;
> >  }
> >
> > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> *opaque, int version_id)
> >      if (version_id > 3) {
> >          return -EINVAL;
> >      }
> > -
> > +    QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> >      /* The virtio device */
> >      return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> >  }
> > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > index 5b47056..2d988cb 100644
> > --- a/hw/i386/kvm/apic.c
> > +++ b/hw/i386/kvm/apic.c
> > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> kvm_lapic_state *kapic)
> >  {
> >      APICCommonState *s = APIC_COMMON(dev);
> >      int i;
> > -
> > +    QEMU_LOG("gonglei: kvm_put_apic_state\n");
> >      memset(kapic, 0, sizeof(*kapic));
> >      kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> >      kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > index d2a6c4c..07699ec 100644
> > --- a/hw/i386/kvm/ioapic.c
> > +++ b/hw/i386/kvm/ioapic.c
> > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> >
> >      chip.chip_id = KVM_IRQCHIP_IOAPIC;
> >      kioapic = &chip.chip.ioapic;
> > -
> > +    QEMU_LOG("gonglei: kvm_ioapic_put\n");
> >      kioapic->id = s->id;
> >      kioapic->ioregsel = s->ioregsel;
> >      kioapic->base_address = s->busdev.mmio[0].addr;
> > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > index 042e960..110be20 100644
> > --- a/hw/intc/apic_common.c
> > +++ b/hw/intc/apic_common.c
> > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> version_id)
> >      if (info->post_load) {
> >          info->post_load(s);
> >      }
> > +    QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> >      return 0;
> >  }
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 92c1427..d074c0a 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> *opaque)
> >      const char *arpGuestCMD =
> "{\"execute\":\"guest-write-flag-arp\"}\n";
> >      const char *portName = "charchannel1";
> >      size_t uRet;
> > -
> > +    QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> >      ret = qemu_loadvm_state(f);
> >      qemu_fclose(f);
> >      free_xbzrle_decoded_buf();
> > diff --git a/qemu-timer.c b/qemu-timer.c
> > index 5741f0d..dd36bc9 100644
> > --- a/qemu-timer.c
> > +++ b/qemu-timer.c
> > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> enabled)
> >      QEMUTimerList *tl;
> >      bool old = clock->enabled;
> >      clock->enabled = enabled;
> > +    QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> old);
> >      if (enabled && !old) {
> >          qemu_clock_notify(type);
> >      } else if (!enabled && old) {
> > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> >      bool progress = false;
> >      QEMUTimerCB *cb;
> >      void *opaque;
> > +    static int count = 10;
> >
> >      qemu_event_reset(&timer_list->timers_done_ev);
> >      if (!timer_list->clock->enabled) {
> > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> >              qemu_mutex_unlock(&timer_list->active_timers_lock);
> >              break;
> >          }
> > -
> > +   if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > +       QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > +             ts->expire_time, current_time);
> > +   }
> >          /* remove timer from the list before calling the callback */
> >          timer_list->active_timers = ts->next;
> >          ts->next = NULL;
> > diff --git a/vl.c b/vl.c
> > index 06f34fe..824b246 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -978,7 +978,7 @@ void vm_start(void)
> >  {
> >      RunState requested;
> >      int64_t start_time, end_time;
> > -
> > +    QEMU_LOG("gonglei: enter vm_start\n");
> >      qemu_vmstop_requested(&requested);
> >      if (runstate_is_running() && requested == RUN_STATE_MAX) {
> >          return;
> >
> >
> > Can we wait looply the migration process finished when EAGAIN?
> >
> > Regards,
> > -Gonglei
> >
> >
> > > -----Original Message-----
> > > From: Gonglei (Arei)
> > > Sent: Friday, August 05, 2016 5:14 PM
> > > To: Paolo Bonzini; 'address@hidden'; 'address@hidden'
> > > Cc: address@hidden; address@hidden; Huangweidong (C)
> > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > >
> > > Hi Paolo , Jan, Amit
> > >
> > > Recently we encountered a problem that the virtio-serial can't work after
> > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > >
> > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > because
> > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). 
> > > The new
> > > interrupt
> > > can't be injected to VM because the irq line bit had been set to 1, but 
> > > the
> > > frontend
> > > driver never handle it or never know it.
> > >
> > >  Bug 867366 - virtio-serial misses irq delivery on migration
> > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > >
> > > But my qemu is the newest qemu, Both commit
> 80dcfb8532"virtio-serial-bus:
> > > post_load
> > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > propagate
> > > guest_connected to the port on post_load" are applied.
> > >
> > > I noticed that Paolo posted another problem maybe have a pertential
> problem
> > > about
> > > apic in Comment 23. But this patch
> > >  https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > haven't merged into qemu master.
> > >
> > > Would you give me some clues please? Thanks!
> > >
> > > Regards,
> > > -Gonglei
> > >
> >
> >
> --
> Dr. David Alan Gilbert / address@hidden / Manchester, UK



reply via email to

[Prev in Thread] Current Thread [Next in Thread]