qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually wor


From: Dmitry Fomichev
Subject: RE: [PATCH v4 2/2] file-posix: add sg_get_max_segments that actually works with sg
Date: Sun, 6 Sep 2020 19:50:12 +0000


> -----Original Message-----
> From: Qemu-block <qemu-block-
> bounces+dmitry.fomichev=wdc.com@nongnu.org> On Behalf Of Maxim
> Levitsky
> Sent: Sunday, September 6, 2020 1:06 PM
> To: Tom Yan <tom.ty89@gmail.com>; eblake@redhat.com;
> pbonzini@redhat.com; fam@euphon.net; anielhb@linux.vnet.ibm.com;
> kwolf@redhat.com; mreitz@redhat.com
> Cc: qemu-devel@nongnu.org; qemu-block@nongnu.org
> Subject: Re: [PATCH v4 2/2] file-posix: add sg_get_max_segments that
> actually works with sg
> 
> On Sun, 2020-09-06 at 23:26 +0800, Tom Yan wrote:
> > I don't disagree with your proposal, but the thing is, do we even need
> > another field/limit for case 1? For example, do we *really* need to
> > clamp sizes[2] (NBD_INFO_BLOCK_SIZE) in nbd/server.c and
> > max_io_sectors (SCSIBlockLimits) in hw/scsi/scsi-disk.c to to any kind
> > of "dynamic" limit?
> 
> It depends on if we have other block drivers that have IO segment/max
> transfer sizes,
> that we need to enforce, other than the SG_IO case.
> 
> Knowing that the commit that added these limits to file-posix, which I was
> fixing is relatively
> new, I guess there are other cases for these limits.
> 
> I'll check this tomorrow.
> 
> >
> > Either way I can add another field (`max_pwrite`, maybe?) to
> > BlockLimits, as an infrastructure/hint, but what should be switched to
> > it, and what value should each block driver set it to, is probably
> > beyond what I can take.
> 
> Implementation wise, I think I can do this, but I'll wait few days for the
> feedback on my proposal.
> 
> Thanks for finding this bug!

There was also 
https://lore.kernel.org/qemu-devel/20200811225122.17342-2-dmitry.fomichev@wdc.com/
posted three weeks ago, but, seemingly, it was ignored... sorry if this
bug is already being fixed in a different way, but I think the fix presented
in that patch is worth considering.

Very best,
Dmitry

> 
> Best regards,
>       Maxim Levitsky
> 
> >
> > IMHO my patches should be merged first (they are really just fixing a
> > regression and some bugs). If anyone finds it necessary and is capable
> > and willing to fix the insufficiency of BlockLimits, they can do it
> > afterwards with another patch series as an improvement. I can add a
> > patch to remove the clamping in nbd/server.c as a perhaps-temporary
> > measure to address the original issue, if desired.
> >
> > On Sun, 6 Sep 2020 at 20:53, Maxim Levitsky <mlevitsk@redhat.com>
> wrote:
> > > On Sun, 2020-09-06 at 19:04 +0800, Tom Yan wrote:
> > > > Maybe you want to add some condition for this:
> > > > https://github.com/qemu/qemu/blob/v5.1.0/nbd/server.c#L659
> > > > Or not clamp it at all.
> > > >
> > > > On Sun, 6 Sep 2020 at 18:58, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > > In commit 867eccfed84f96b54f4a432c510a02c2ce03b430, Levitsky
> appears
> > > > > to have assumed that the only "SCSI Passthrough" is `-device
> > > > > scsi-generic`, while the fact is there's also `-device scsi-block`
> > > > > (passthrough without the sg driver). Unlike `-device scsi-hd`, getting
> > > > > max_sectors is necessary to it (more precisely, hw_max_sectors
> might
> > > > > what matters, but BLKSECTGET reports max_sectors, so).
> > > > >
> > > > > I'm unsure about how qemu-nbd works, but the commit clearly
> wasn't the
> > > > > right approach to fix the original issue it addresses. (It should for
> > > > > example ignore the max_transfer if it will never matter in to it, or
> > > > > overrides it in certain cases; when I glimpsed over
> > > > > https://bugzilla.redhat.com/show_bug.cgi?id=1647104, I don't see
> how
> > > > > it could be file-posix problem when it is reporting the right thing,
> > > > > regardless of whether "removing" the code helps.)
> > > > >
> > > > > I don't think we want to "mark" `-device scsi-block` as sg either. It
> > > > > will probably bring even more unexpected problems, because they
> are
> > > > > literally different sets of things behind the scene / in the kernel.
> > >
> > > Yes, I did overlook the fact that scsi-block is kind of hybrid passthough
> device,
> > > doing SG_IO for some things and regular IO for others.
> > >
> > > I don't think that my approach was wrong way to fix the problem, but as
> you found
> > > out, abusing 'bs->sg' hack (which I would be very happy to remove
> completely)
> > > wasn't a good idea.
> > > I actualy was aware of scsi-block and that it does SG_IO but it
> > > was forgotten some where on the way.
> > >
> > > So in summary what the problem is and what I think is the right solution:
> > >
> > >
> > > Each qemu block driver exposes block limits and assumes that they are
> the same
> > > for two IO interfaces the block driver can expose:
> > >
> > > 1. Regular qemu blk_pread/pwrite alike functions
> > > 2. blk_ioctl (tiny wrapper against SG_IO), supported by posix-raw on
> > >    host block devices/sg char devices, and by iscsi
> > >
> > > The problem is that these two interfaces can have different block limits.
> > >
> > > I don't know about iscsi, but for files, doing regular IO is always 
> > > unlimited,
> > > since it passes through the kernel block layer and segemented there on
> > > demand which is faster that doing it in userspace, while SG_IO is passed
> as is
> > > to the underlying SCSI device and lacks this segmentation.
> > >
> > > Regardless of how NBD uses these limits, I think that these limits should
> be correctly
> > > exposed by the block drivers, and therefore I propose is that each qemu
> block driver
> > > would expose a pair of block limits.
> > > One for the regular block IO, and other for SG_IO.
> > >
> > > Then block driver clients (like scsi devices that you mention, nbd, etc)
> > > can choose which limit to use, depending on which IO api they use.
> > > The scsi-generic, and scsi-block can use the SG_IO limits,
> > > while the rest  can use the normal (unlimited for file I/O) limits, 
> > > including
> the NBD.
> > >
> > > Best regards,
> > >         Maxim Levitsky
> > >
> > > > > On Fri, 4 Sep 2020 at 10:09, Tom Yan <tom.ty89@gmail.com> wrote:
> > > > > > sg devices have different major/minor than their corresponding
> > > > > > block devices. Using sysfs to get max segments never really worked
> > > > > > for them.
> > > > > >
> > > > > > Fortunately the sg driver provides an ioctl to get sg_tablesize,
> > > > > > which is apparently equivalent to max segments.
> > > > > >
> > > > > > Signed-off-by: Tom Yan <tom.ty89@gmail.com>
> > > > > > ---
> > > > > >  block/file-posix.c | 17 ++++++++++++++++-
> > > > > >  1 file changed, 16 insertions(+), 1 deletion(-)
> > > > > >
> > > > > > diff --git a/block/file-posix.c b/block/file-posix.c
> > > > > > index 411a23cf99..9e37594145 100644
> > > > > > --- a/block/file-posix.c
> > > > > > +++ b/block/file-posix.c
> > > > > > @@ -1178,6 +1178,21 @@ static int
> sg_get_max_transfer_length(int fd)
> > > > > >  #endif
> > > > > >  }
> > > > > >
> > > > > > +static int sg_get_max_segments(int fd)
> > > > > > +{
> > > > > > +#ifdef SG_GET_SG_TABLESIZE
> > > > > > +    long max_segments = 0;
> > > > > > +
> > > > > > +    if (ioctl(fd, SG_GET_SG_TABLESIZE, &max_segments) == 0) {
> > > > > > +        return max_segments;
> > > > > > +    } else {
> > > > > > +        return -errno;
> > > > > > +    }
> > > > > > +#else
> > > > > > +    return -ENOSYS;
> > > > > > +#endif
> > > > > > +}
> > > > > > +
> > > > > >  static int get_max_transfer_length(int fd)
> > > > > >  {
> > > > > >  #if defined(BLKSECTGET) && defined(BLKSSZGET)
> > > > > > @@ -1268,7 +1283,7 @@ static void
> hdev_refresh_limits(BlockDriverState *bs, Error **errp)
> > > > > >          bs->bl.max_transfer = pow2floor(ret);
> > > > > >      }
> > > > > >
> > > > > > -    ret = get_max_segments(s->fd);
> > > > > > +    ret = bs->sg ? sg_get_max_segments(s->fd) :
> get_max_segments(s->fd);
> > > > > >      if (ret > 0) {
> > > > > >          bs->bl.max_transfer = MIN_NON_ZERO(bs->bl.max_transfer,
> > > > > >                                             ret * 
> > > > > > qemu_real_host_page_size);
> > > > > > --
> > > > > > 2.28.0
> > > > > >
> 
> 


reply via email to

[Prev in Thread] Current Thread [Next in Thread]