lmi
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [lmi] Reading an empty file into a string


From: Vadim Zeitlin
Subject: Re: [lmi] Reading an empty file into a string
Date: Sun, 21 Sep 2008 18:26:37 +0200

On Sun, 21 Sep 2008 14:27:22 +0000 Greg Chicares <address@hidden> wrote:

GC> We've encountered an 'istream_to_string.hpp' problem with msvc,
GC> in this code:
GC> 
GC>     if(is.rdbuf()->in_avail())
GC>         {
GC>         oss << is.rdbuf();
GC>         }
GC> 
GC> because msvc's in_avail() always returns zero, as is apparently
GC> permitted:
GC> 
GC> http://groups.google.com/group/comp.lang.c++.moderated/msg/f3788f581b712d1e
GC> 
GC> It still seems desirable for this function template to produce an
GC> empty string from an empty file (more precisely, from an fstream
GC> corresponding to an empty file), without throwing an exception.

 For completeness, let me reproduce the comment from the current
istream_to_string.hpp:

/// Don't call basic_ostream::operator<<() if the input stream buffer
/// has no character to provide. Otherwise [27.6.2.5.3/8] it would set
/// 'failbit', and a spurious exception would be thrown when an empty
/// file is read.

And add that there unfortunately doesn't seem to be any way to distinguish
between not reading anything and a real error during reading. I'd expect
badbit to be set in the latter case but the Standard doesn't explicitly
support this expectation.

GC> Is there a tidy and reliable way to do that?

 The most tidy would be to just avoid in_avail() check and replace "!oss"
one with "oss.bad()" but, as I wrote above, this is not guaranteed to work
by the standard and so while it does seem like it should work, it's
probably not a good idea to rely on it, after all this is what led us to
the current problem with in_avail() in the first place.

 Otherwise replacing the in_avail() check with "is.rdbuf()->sgetc() !=
Traits::eof()" should work too: sgetc() will result in an attempt to read
from the file and if it fails for any reason other than the file being
empty, it should set the failbit or badbit. IOW I propose the following
patch:

--- istream_to_string.hpp 2008-08-02 23:30:32 +0000
+++ istream_to_string.hpp 2008-09-21 16:25:33 +0000
@@ -57,7 +57,7 @@
         }
 #else // !0
     std::basic_ostringstream<Char_t,Traits,Allocator> oss;
-    if(is.rdbuf()->in_avail())
+    if(is.rdbuf()->sgetc() != Traits::eof())
         {
         oss << is.rdbuf();
         }

 
GC> I added a unit test on 20080921T1309Z to make this easier to
GC> investigate, and also to compare performance of two methods:
GC>   Method 0: construct string from istreambuf_iterator pair
GC>   Method 1: extract from rdbuf into stringstream; use its str()
GC> of which the latter is used in production. Sample timings for a
GC> 30627-byte file:
GC> 
GC> MinGW gcc-3.4.4:
GC>   Method 0: 5.776e-004 s =     577611 ns, mean of 100 iterations
GC>   Method 1: 3.307e-004 s =     330730 ns, mean of 100 iterations
GC> 
GC> como-4.3.3:
GC>   Method 0: 2.030e-03  s =    2029539 ns, mean of 100 iterations
GC>   Method 1: 3.672e-03  s =    3671666 ns, mean of 100 iterations

 FWIW here is the result under Linux with g++ 4.1.2:

  Method 0: 2.758e-03 s =    2757620 ns, mean of 100 iterations
  Method 1: 3.529e-05 s =      35289 ns, mean of 284 iterations

I'm not quite sure if everything is correct here (why 284 iterations for
the second method?) but if it is, then it would seem that using
istreambuf_iterators, nice as it may look on the pages of a C++ book, is to
be avoided by all costs if performance matters ever so slightly.

 For comparison, I also benchmarked C STDIO performance, here is the output
of the test after applying the patch which follows at the end of this
message (please let me know if you think it would be useful to commit it):

  Method 0 (istreambuf_iterator): 2.792e-03 s =    2791670 ns, mean of 100 
iterations
  Method 1 (operator<<(rdbuf)):   3.666e-05 s =      36663 ns, mean of 273 
iterations
  Method 2 (stdio):               2.300e-05 s =      23002 ns, mean of 435 
iterations

So STDIO is still fastest, as expected, but the method 1 is at least not
ridiculously slow unlike the method 0.

 Regards,
VZ

P.S. Here is the patch adding more correctness testing to this test and
     a version using fread():

--- istream_to_string_test.cpp  2008-09-21 13:09:29 +0000
+++ istream_to_string_test.cpp  2008-09-21 16:16:58 +0000
@@ -32,10 +32,19 @@
 #include "test_tools.hpp"
 #include "timer.hpp"
 
+#include <boost/shared_ptr.hpp>
+
 #include <cstdio>         // std::remove()
 #include <fstream>
 #include <iterator>
 
+namespace
+{
+
+const char * const TEST_FILE_NAME = "sample.cns";
+
+}
+
 template<typename Char_t, typename Traits, typename Allocator>
 void istream_to_string_0
     (std::istream&                               is
@@ -70,41 +79,94 @@
         }
 }
 
+void file_to_string_0(char const *name, std::string& s)
+{
+    std::ifstream ifs(name, ios_in_binary());
+    istream_to_string_0(ifs, s);
+}
+
+void file_to_string_1(char const *name, std::string& s)
+{
+    std::ifstream ifs(name, ios_in_binary());
+    istream_to_string_1(ifs, s);
+}
+
+void file_to_string_2(char const * name, std::string& s)
+{
+    FILE * const fp = fopen(name, "rb");
+    if(!fp)
+        throw std::runtime_error("Unable to open input file.");
+
+    boost::shared_ptr<FILE> ensureClose(fp, fclose);
+
+    static size_t const BUF_SIZE = 4096;
+
+    char buf[BUF_SIZE];
+    while (!feof(fp))
+        {
+        size_t const numRead = fread(buf, 1, BUF_SIZE, fp);
+        if(ferror(fp))
+            throw std::runtime_error("Unable to read stream into string.");
+
+        if(numRead > 0)
+            {
+            s.append(buf, numRead);
+            }
+        }
+}
+
 void mete_0()
 {
-    std::ifstream ifs("sample.cns", ios_in_binary());
     std::string s;
-    istream_to_string_0(ifs, s);
+    file_to_string_0(TEST_FILE_NAME, s);
 }
 
 void mete_1()
 {
-    std::ifstream ifs("sample.cns", ios_in_binary());
-    std::string s;
-    istream_to_string_1(ifs, s);
+    std::string s;
+    file_to_string_1(TEST_FILE_NAME, s);
+}
+
+void mete_2()
+{
+    std::string s;
+    file_to_string_2(TEST_FILE_NAME, s);
 }
 
 int test_main(int, char*[])
 {
     // Test an empty file.
 
-    char const* p = "/tmp/eraseme.empty";
+    char const* TEST_EMPTY_FILE_NAME = "/tmp/eraseme.empty";
     {
-    std::ofstream ofs(p, ios_out_trunc_binary());
+    std::ofstream ofs(TEST_EMPTY_FILE_NAME, ios_out_trunc_binary());
     }
-    std::ifstream ifs(p, ios_in_binary());
+
     std::string s;
-    istream_to_string_0(ifs, s);
-    BOOST_TEST(s.empty());
-    istream_to_string_1(ifs, s);
-    BOOST_TEST(s.empty());
-    std::remove(p);
+    file_to_string_0(TEST_EMPTY_FILE_NAME, s);
+    BOOST_TEST(s.empty());
+    file_to_string_1(TEST_EMPTY_FILE_NAME, s);
+    BOOST_TEST(s.empty());
+    file_to_string_2(TEST_EMPTY_FILE_NAME, s);
+    BOOST_TEST(s.empty());
+
+    std::remove(TEST_EMPTY_FILE_NAME);
+
+    // Test a non-empty file.
+
+    std::string s0, s1, s2;
+    file_to_string_0(TEST_FILE_NAME, s0);
+    file_to_string_1(TEST_FILE_NAME, s1);
+    BOOST_TEST_EQUAL(s0, s1);
+    file_to_string_2(TEST_FILE_NAME, s2);
+    BOOST_TEST_EQUAL(s0, s2);
 
     // Test speed.
 
     std::cout << "  Speed tests...\n"
-        << "  Method 0: " << TimeAnAliquot(mete_0) << '\n'
-        << "  Method 1: " << TimeAnAliquot(mete_1) << '\n'
+        << "  Method 0 (istreambuf_iterator): " << TimeAnAliquot(mete_0) << 
'\n'
+        << "  Method 1 (operator<<(rdbuf)):   " << TimeAnAliquot(mete_1) << 
'\n'
+        << "  Method 2 (stdio):               " << TimeAnAliquot(mete_2) << 
'\n'
         ;
 
     return 0;


reply via email to

[Prev in Thread] Current Thread [Next in Thread]