Re: [PATCH 00/33] migration: capture error reports into Error object
From: Daniel P. Berrangé
Subject: Re: [PATCH 00/33] migration: capture error reports into Error object
Date: Tue, 16 Feb 2021 09:30:52 +0000
User-agent: Mutt/2.0.5 (2021-01-21)

On Mon, Feb 15, 2021 at 07:01:28PM +0000, Dr. David Alan Gilbert wrote:
> * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > On Mon, Feb 15, 2021 at 06:38:05PM +0000, Dr. David Alan Gilbert wrote:
> > > One thing to check, and I *think* you're OK, but we have one place where
> > > we actually check the error number:
> > >
> > > migration.c, from line 3414:
> > >   static MigThrError migration_detect_error(MigrationState *s)
> > >   ...
> > >       /* Try to detect any file errors */
> > >       ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> > >       if (!ret) {
> > >           /* Everything is fine */
> > >           assert(!local_error);
> > >           return MIG_THR_ERR_NONE;
> > >       }
> > >
> > >       if (local_error) {
> > >           migrate_set_error(s, local_error);
> > >           error_free(local_error);
> > >       }
> > >
> > >       if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> > >           /*
> > >            * For postcopy, we allow the network to be down for a
> > >            * while. After that, it can be continued by a
> > >            * recovery phase.
> > >            */
> > >           return postcopy_pause(s);
> > >       } else {
> > >
> > > This is to go into postcopy pause if the network connection broke (but
> > > not if, for example, a device moaned about being in an invalid state).
> > >
> > > If I read this correctly, file errors are still being preserved - is
> > > that correct?
> >
> > Yes, in places where QemuFile is reporting an actual I/O error I've
> > tried to preserve that. Only removed setting of fake I/O errors. So
> > if anything, we ought to get more accurate at detecting the recoverable
> > scenarios once we fully clean up errors.
>
> OK, good.

One scenario worth checking, though: in a few places we called
error_report_err() but didn't immediately return an error code back to
the caller, and instead carried on making other calls. It is thus
possible that we reported an error about bad data, and then later hit
the EIO check for QemuFile.

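To make that concern concrete, here is a minimal standalone sketch of the
two patterns. It is not QEMU code: SketchFile, load_report_only(),
load_propagate() and classify() are hypothetical names invented for
illustration, and only the shape of the "-EIO during postcopy means pause"
check is taken from the snippet quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream with a sticky I/O error. */
typedef struct {
    int last_error;   /* 0, or a negative errno from a real I/O failure */
} SketchFile;

typedef enum {
    SKETCH_OK,
    SKETCH_PAUSE,     /* treated as recoverable */
    SKETCH_FAIL       /* treated as fatal */
} SketchOutcome;

/* Remember only the first I/O error seen on the stream. */
static void sketch_file_set_error(SketchFile *f, int err)
{
    assert(f != NULL);
    if (f->last_error == 0) {
        f->last_error = err;
    }
}

/*
 * Report-and-carry-on pattern: bad data is only counted as "reported"
 * (standing in for error_report_err()), and execution continues until a
 * later call hits the broken link and records -EIO on the stream.
 */
static void load_report_only(SketchFile *f, bool data_ok, bool link_ok,
                             int *reports)
{
    if (!data_ok) {
        (*reports)++;
    }
    if (!link_ok) {
        sketch_file_set_error(f, -EIO);
    }
}

/* Propagating pattern: the first failure is returned to the caller. */
static int load_propagate(SketchFile *f, bool data_ok, bool link_ok)
{
    if (!data_ok) {
        return -EINVAL;
    }
    if (!link_ok) {
        sketch_file_set_error(f, -EIO);
        return -EIO;
    }
    return 0;
}

/* Same shape as the quoted check: only -EIO during postcopy pauses. */
static SketchOutcome classify(int ret, bool postcopy_active)
{
    if (ret == 0) {
        return SKETCH_OK;
    }
    if (postcopy_active && ret == -EIO) {
        return SKETCH_PAUSE;
    }
    return SKETCH_FAIL;
}

/* Drive each pattern with postcopy assumed active. */
static SketchOutcome run_report_only(bool data_ok, bool link_ok)
{
    SketchFile f = { 0 };
    int reports = 0;

    load_report_only(&f, data_ok, link_ok, &reports);
    return classify(f.last_error, true);
}

static SketchOutcome run_propagate(bool data_ok, bool link_ok)
{
    SketchFile f = { 0 };

    return classify(load_propagate(&f, data_ok, link_ok), true);
}
```

With bad data followed by a broken link, run_report_only(false, false)
ends up as SKETCH_PAUSE because only the later -EIO is visible to the
check, whereas run_propagate(false, false) ends up as SKETCH_FAIL because
the bad-data error reaches the caller first.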
Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|