[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 00/33] migration: capture error reports into Error object
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [PATCH 00/33] migration: capture error reports into Error object |
Date: |
Tue, 16 Feb 2021 19:32:04 +0000 |
User-agent: |
Mutt/2.0.5 (2021-01-21) |
* Daniel P. Berrangé (berrange@redhat.com) wrote:
> On Mon, Feb 15, 2021 at 07:01:28PM +0000, Dr. David Alan Gilbert wrote:
> > * Daniel P. Berrangé (berrange@redhat.com) wrote:
> > > On Mon, Feb 15, 2021 at 06:38:05PM +0000, Dr. David Alan Gilbert wrote:
> > > > One thing to check, and I *think* you're OK, but we have one place where
> > > > we actually check the error number:
> > > >
> > > > migration.c:
> > > > 3414 static MigThrError migration_detect_error(MigrationState *s)
> > > > ...
> > > > 3426 /* Try to detect any file errors */
> > > > 3427 ret = qemu_file_get_error_obj(s->to_dst_file, &local_error);
> > > > 3428 if (!ret) {
> > > > 3429 /* Everything is fine */
> > > > 3430 assert(!local_error);
> > > > 3431 return MIG_THR_ERR_NONE;
> > > > 3432 }
> > > > 3433
> > > > 3434 if (local_error) {
> > > > 3435 migrate_set_error(s, local_error);
> > > > 3436 error_free(local_error);
> > > > 3437 }
> > > > 3438
> > > > 3439 if (state == MIGRATION_STATUS_POSTCOPY_ACTIVE && ret == -EIO) {
> > > > 3440 /*
> > > > 3441 * For postcopy, we allow the network to be down for a
> > > > 3442 * while. After that, it can be continued by a
> > > > 3443 * recovery phase.
> > > > 3444 */
> > > > 3445 return postcopy_pause(s);
> > > > 3446 } else {
> > > >
> > > > This is to go into postcopy pause if the network connection broke (but
> > > > not if for example a device moaned about being in an invalid state)
> > > >
> > > > If I read this correctly, file errors are still being preserved - is
> > > > that correct?
> > >
> > > Yes, in places where QemuFile is reporting an actual I/O error I've
> > > tried to preserve that. Only removed setting of fake I/O errors. So
> > > if anything, we ought to get more accurate at detecting the recoverable
> > > scenarios once we fully cleanup errors.
> >
> > OK, good.
>
> One scenario to possibly check though is that in a few places we used
> error_report_err() but didn't immediately return an error code back to
> the caller, instead carrying on doing other calls. It is possible that
> we thus reported an error about bad data, and then later hit the EIO
> check for QemuFile.
That's generally OK; it gets pretty painful to do the qemu file checks
after every read.
Dave
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
- Re: [PATCH 00/33] migration: capture error reports into Error object, (continued)
- Re: [PATCH 00/33] migration: capture error reports into Error object, Dr. David Alan Gilbert, 2021/02/04
- Re: [PATCH 00/33] migration: capture error reports into Error object, Daniel P . Berrangé, 2021/02/04
- Re: [PATCH 00/33] migration: capture error reports into Error object, Dr. David Alan Gilbert, 2021/02/15
- Re: [PATCH 00/33] migration: capture error reports into Error object, Daniel P . Berrangé, 2021/02/15
- Re: [PATCH 00/33] migration: capture error reports into Error object, Dr. David Alan Gilbert, 2021/02/15
- Re: [PATCH 00/33] migration: capture error reports into Error object, Daniel P . Berrangé, 2021/02/16
- Re: [PATCH 00/33] migration: capture error reports into Error object,
Dr. David Alan Gilbert <=