lmi-commits
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[lmi-commits] [lmi] master ac5731f 5/8: Controvert a published claim


From: Greg Chicares
Subject: [lmi-commits] [lmi] master ac5731f 5/8: Controvert a published claim
Date: Fri, 30 Jul 2021 16:14:47 -0400 (EDT)

branch: master
commit ac5731f5299e0050f440ef6414359c246cfd5031
Author: Gregory W. Chicares <gchicares@sbcglobal.net>
Commit: Gregory W. Chicares <gchicares@sbcglobal.net>

    Controvert a published claim
    
    Scherer (_Computational Physics_, page 97) claims:
      "Chandrupatla's method is more efficient than Dekker's or Brent's,
      especially for higher-order roots (Figs. 6.10, 6.11, 6.12)."
    As the present investigation shows, that conclusion is based on a tiny
    number of atypical examples, whereas a large number of real-world tests
    supports the opposite conclusion, that Brent's method is preferable to
    Chandrupatla's (and of course Dekker's, as proven in Brent's book).
    
    Translated Scherer's pseudocode for Chandrupatla's and Brent's methods
    to C++ to reproduce those examples and evaluate that claim. (Corrected
    some but not all defects in the pseudocode for Brent's method, as noted
    inline.)
    
    Chandrupatla's algorithm differs in two ways from Brent's. It never uses
    linear (secant) interpolation; instead, it uses only inverse quadratic
    interpolation (IQI) or bisection. And it imposes a more restrictive
    criterion for IQI, requiring that the interpolating parabola truly be
    single-valued in the current interval, where Brent instead heuristically
    requires that an IQI iterate move no more than three-quarters of the way
    from the current best estimate to the contrapoint at which the sign of
    the evaluated objective function is opposite.
    
    Results of lmi's system test:
    
    (1) Using Chandrupatla's algorithm:
    
    +#if defined USE_CHANDRUPATLA_ALGORITHM
    +// evaluations  max   mean  sample-std-dev
    +//     7332      65   18.9       7.13        HEAD (brent)
    +//     9149      59   23.6      11.13        this (chandrupatla)
    
    (2) Using Brent's algorithm, but modified to use Chadrupatla's tighter
    IQI acceptance criterion:
    
    +//#define     USE_CHANDRUPATLA_IQI_CRITERION
    +// evaluations  max   mean  sample-std-dev
    +//     7332      65   18.9       7.13        HEAD (brent)
    +//     8545      66   22.0      12.90        this (IQI more stringent)
    
    This real-world example involves 388 test cases, each of which finds a
    root of a polynomial of degree approximately six hundred, which is
    a higher-order root than the textbook's examples test. Brent's method
    is clearly superior. Both ways in which it differs from Chandrupatla's:
     - (conditionally) use secant interpolation
     - heuristically accept some IQI iterates that Chandrupatla rejects
    contribute to its superiority.
    
    Analysis of the three textbook examples (see unit test):
    
    Fig. 6.10, f(x) = x^-2: Ten evaluations versus eleven is not a big
    difference. The graph seems to show Brent's method taking an extra two
    iterations to ensure that a root is bracketed; probably that wouldn't
    occur with a slighly different tolerance.
    
    Fig. 6.11, f(x) = (x-1)^3: For this equation, bisection is the fastest
    of the algorithms depicted. lmi_root(), with an optional parameter to
    force bisection always, is shown to take exactly the same number of
    evaluations. Uncommenting the tracing code shows that Chandrupatla's
    method chooses IQI nine times, otherwise bisection. In the given
    [0.0, 1.8] range, bisection ought to take log2(1.8/DBL_EPSILON) = 53
    steps, so the graph for bisection seems correct, and Chandrupatla's
    method takes 53+9=62 evaluations.
    
    Fig. 6.11, f(x) = x^25: Here, Chandrupatla's method always chooses
    bisection, and the graphs of those two methods virtually coincide.
    
    Thus, the evidence presented consists of one test where Chandrupatla's
    and Brent's methods perform similarly, and two where bisection is
    optimal and Chandrupatla's bias toward bisection is favored. Against
    that thin evidence, the real-world system test above shows Brent's
    method to be the better choice.
---
 zero.hpp      | 280 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 zero_test.cpp | 101 ++++++++++++++++-----
 2 files changed, 361 insertions(+), 20 deletions(-)

diff --git a/zero.hpp b/zero.hpp
index 3d33e45..6b85363 100644
--- a/zero.hpp
+++ b/zero.hpp
@@ -544,6 +544,22 @@ root_type lmi_root
                 }
             s = e;
             e = d;
+//#define     USE_CHANDRUPATLA_IQI_CRITERION
+// evaluations  max   mean  sample-std-dev
+//     7332      65   18.9       7.13        HEAD (brent)
+//     8545      66   22.0      12.90        this (IQI more stringent)
+#if defined USE_CHANDRUPATLA_IQI_CRITERION
+            double xi  = ( b -  c) / ( a -  c);
+            double phi = (fb - fc) / (fa - fc);
+            // Chandrupatla acceptance criterion
+            bool const k0 =
+                    interpolate_inverse_quadratic != impetus
+                ||  (
+                        (phi * phi) < xi
+                        && ((1.0 - phi) * (1.0 - phi)) < (1.0 - xi)
+                    )
+                ;
+#else  // !defined USE_CHANDRUPATLA_IQI_CRITERION
             // Use the criteria in Brent's ALGOL, which differ
             // slightly from their descriptions in his text.
             //
@@ -551,6 +567,7 @@ root_type lmi_root
             //   "we reject i [i.e., b + p/q] if 2|p| ≥ 3|mq|"
             // Difference: the ALGOL subtracts tol×|q| [i.e., δ|q|]
             bool const k0 = 2.0 * p < 3.0 * m * q - std::fabs(tol * q);
+#endif // !defined USE_CHANDRUPATLA_IQI_CRITERION
             // AfMWD says on page 50:
             //   "Let e be the value of p/q at the step before the
             //   last one."
@@ -643,6 +660,20 @@ root_type decimal_root
             }
         };
 
+//#define     USE_CHANDRUPATLA_ALGORITHM
+#if defined USE_CHANDRUPATLA_ALGORITHM
+// evaluations  max   mean  sample-std-dev
+//     7332      65   18.9       7.13        HEAD (brent)
+//     9149      59   23.6      11.13        this (chandrupatla)
+    (void)&bias;
+    (void)&sprauchling_limit;
+    auto z = scherer_chandrupatla
+        (fr
+        ,round_dec(bound0)
+        ,round_dec(bound1)
+        ,3.14159 // disregarded
+        );
+#else  // !defined USE_CHANDRUPATLA_ALGORITHM
     auto z = lmi_root
         (fr
         ,round_dec(bound0)
@@ -652,6 +683,7 @@ root_type decimal_root
         ,os_trace
         ,bias
         );
+#endif // !defined USE_CHANDRUPATLA_ALGORITHM
     z.root = round_dec(z.root);
     os_trace << " function evaluations: " << z.n_eval;
     z.n_eval = lmi::ssize(m);
@@ -660,6 +692,254 @@ root_type decimal_root
     return z;
 }
 
+//#include <iostream>
+
+// The next two function templates are adapted from the pseudocode here:
+//    Philipp O. J. Scherer
+//    Computational Physics (2nd ed.)
+//    ISBN 978-3-319-00400-6
+// with numerous corrections.
+
+template<typename FunctionalType>
+root_type scherer_brent
+    (FunctionalType& f
+    ,double          a
+    ,double          b
+    ,double          t
+    )
+{
+    constexpr double epsilon {std::numeric_limits<double>::epsilon()};
+    root_impetus     impetus {evaluate_bounds};
+    (void)&impetus;
+//std::cout.precision(21);
+
+    double fa = f(a);
+    double fb = f(b);
+#if 0 // RECTIFIED
+    // Brent: at top of loop (not bottom, as here):
+    //   if fb and fc have the same sign, then swap a and c
+    // this code seems to do something equivalent
+    double fc = fa;   // Brent: = fb
+    double c = a;     // likewise
+#endif // 0
+#if 1 // RECTIFIED
+    double fc = fb;
+    double c = b;
+#endif // 1
+
+    double d = b - a;
+    double e = d;
+
+//std::cout << " br " << ' ' << a << ' ' << b << ' ' << c << ' ' << fa << ' ' 
<< fb << ' ' << fc << std::endl;
+
+int j = 0;
+    for(;; ++j)
+        {
+#if 1 // RECTIFIED
+        // Brent does this at the top
+        if((0.0 < fb) == (0.0 < fc))
+//          if(signum(fb) == signum(fc)) // should be equivalent
+            {
+            c = a;
+            fc = fa;
+            d = e = b - a;
+            }
+#endif // 1
+        // if c is a better approximant than b, then exchange
+        // Brent does the same, but it's confusing:
+        //   it looks like a three-way shift
+        //   but it actually wipes out the old 'a'
+        if(std::fabs(fc) < std::fabs(fb))
+            {
+             a =  b;  b =  c;  c =  a;
+            fa = fb; fb = fc; fc = fa;
+            }
+        // Brent: set 'tol' carefully--not just epsilon
+//      double tol = epsilon; // RECTIFIED
+        double tol = 2.0 * epsilon * std::fabs(b) + t;
+        double m = 0.5 * (c - b);
+        // Brent: '<=' rather than '<'
+//      if(0.0 == fb || std::fabs(m) < tol) // RECTIFIED
+        if(0.0 == fb || std::fabs(m) <= tol)
+            {
+            return {b, root_is_valid, j, 2 + j};
+            }
+        // bisect if previous iteration took too small a step or
+        // produced no improvement
+        // Brent: '<=' rather than '<' in second condition
+//      if(std::fabs(e) < tol || std::fabs(fa) < std::fabs(fb)) // RECTIFIED
+        if(std::fabs(e) < tol || std::fabs(fa) <= std::fabs(fb))
+            {
+            impetus = pis_aller;
+            d = e = m;
+            }
+        // else try interpolating
+        else
+            {
+            double p, q;
+            // Brent: store fb/fa for later use--avoid numerical problems?
+            if(a == c)
+            // linear
+                {
+                impetus = interpolate_linear;
+                p = 2.0 * m * fb / fa;
+                // Brent: opposite sign
+//              q = (fb - fa) / fa; // PARTLY RECTIFIED
+                q = (fa - fb) / fa;
+                }
+            else
+            // inverse quadratic
+                {
+                impetus = interpolate_inverse_quadratic;
+                // Brent: arrange differently--avoid numerical problems?
+                p = 2.0 * m * fb * (fa - fb) / (fc * fc) - (b - a) * fb * (fb 
- fc) / (fa * fc);
+                q = (fa / fc - 1.0) * (fb / fc - 1.0) * (fb / fa - 1.0);
+                }
+            // force 'p' to be positive
+            // Brent: '<' rather than '<='
+            if(0.0 <= p)
+                {q = -q;}
+            else
+                {p = -p;}
+            // store previous step
+            double s = e;
+            e = d;
+            if
+                // Brent: absolute value |tol * q|
+                (  2.0 * p < 3.0 * m * q - tol * q
+                && p < std::fabs(0.5 * s * q)
+                )
+                {
+                d = p / q;
+                }
+            else
+                {
+                d = e = m;
+                }
+            } // closing brace not evident in textbook
+
+        a = b;
+        fa = fb;
+        if(tol < std::fabs(d))
+            {
+            b += d;
+            }
+        else
+            {
+            // Brent: adds 'tol' if 0==m
+            b += tol * signum(m);
+            }
+        fb = f(b);
+//std::cout << j << " br " << ' ' << a << ' ' << b << ' ' << c << ' ' << 
impetus << std::endl;
+//std::cout << j << " br " << impetus << ' ' << b << std::endl;
+#if 0 // RECTIFIED
+        // Brent does this at the top
+        if((0.0 < fb) == (0.0 < fc))
+//          if(signum(fb) == signum(fc)) // should be equivalent
+            {
+            c = a;
+            fc = fa;
+            e = d = b - a;
+            }
+#endif // 0
+        }
+}
+
+template<typename FunctionalType>
+root_type scherer_chandrupatla
+    (FunctionalType& f
+    ,double          a
+    ,double          b
+    ,double          // t
+    )
+{
+    constexpr double epsilon   {std::numeric_limits<double>::epsilon()};
+    root_impetus     impetus {evaluate_bounds};
+    (void)&impetus;
+//std::cout.precision(21);
+
+    // textbook swaps 'a' and 'b' here compared to its "brent" routine
+    //   but that probably doesn't matter
+    double fa = f(a);
+    double fb = f(b);
+    double c  = a;
+    double fc = fa;
+    double t = 0.5;
+
+int j = 0;
+    for(;; ++j)
+        {
+        // 't' is coefficient for interpolating (linearly) by any technique
+        double x = a + t * (b - a);
+        double fx = f(x);
+//std::cout << j << " ch " << impetus << ' ' << x << std::endl;
+//std::cout << j << " ch " << impetus << ' ' << t << ' ' << x << ' ' << fx << 
' ' << a << ' ' << b << ' ' << c << std::endl;
+        if(signum(fx) == signum(fa))
+            {
+            c = a;
+            fc = fa;
+            a = x;
+            fa = fx; // textbook has weird capitalization on RHS
+            }
+        else
+            {
+            c = b;
+            b = a;
+            a = x;
+            fc = fb;
+            fb = fa;
+            fa = fx;
+            }
+        double xm = a;
+        double fm = fa;
+        if(std::fabs(fb) < std::fabs(fa))
+            {
+            xm = b;
+            fm = fb;
+            }
+        // textbook: 'epsM' and 'epsA' not defined
+        double epsM = epsilon;
+        double epsA = epsilon;
+        double tol = 2.0 * epsM * std::fabs(xm) + epsA;
+// Brent uses this:
+//      double tol = 2.0 * epsilon * std::fabs(b) + t;
+        double tl = tol / std::fabs(b - c);
+        if(0.5 < tl || 0.0 == fm)
+            {
+            return {xm, root_is_valid, j, 2 + j};
+            }
+// specimen observed values:
+// 0.0659374999999999961142 a
+// 0.0406250000000000013878 b this is one end of the bracket
+// 0.0912499999999999977796 c so this is the other end of the bracket
+// 0.0447250871687336973292 true value
+        double xi = (a - b) / (c - b);
+        double phi = (fa - fb) / (fc - fb);
+        if
+            (  1.0 - std::sqrt(1.0 - xi) < phi
+            && phi < std::sqrt(xi)
+            )
+            {
+            impetus = interpolate_inverse_quadratic;
+            // textbook: really use 't'?
+            t =
+                  (fa / (fb - fa)) * (fc / (fb - fc))
+                + ((c - a) / (b - a)) * (fa / (fc - fa)) * (fb / (fc - fb))
+                ;
+            }
+        else
+            {
+            impetus = pis_aller;
+            t = 0.5;
+            }
+        // constrain t to (tl, 1-tl)
+        if(t < tl)
+            {t = tl;}
+        if(1.0 - tl < t)
+            {t = 1.0 - tl;}
+        }
+}
+
 /// An instrumented translation of Brent's reference implementation.
 ///
 /// Deviation from the original ALGOL:
diff --git a/zero_test.cpp b/zero_test.cpp
index f408ee7..f9be645 100644
--- a/zero_test.cpp
+++ b/zero_test.cpp
@@ -883,32 +883,93 @@ void test_various_functions()
 
     // The next three examples are from _Computational Physics_,
     // Philipp O. J. Scherer, 2nd ed., ISBN 978-3-319-00400-6,
-    // page 96; number of iterations estimated from graphs.
-
-    // Scherer, Fig. 6.10, iteration counts for 2ϵ tolerance:
-    //   10 vs. 11 here (Brent)
-    //    7             (Chandrupatla)
+    // page 96; number of evaluations estimated from graphs as two
+    // plus apparent number of iterations, to account for required
+    // evaluation of both initial bounds; and separately measured
+    // by writing functions based on Scherer's pseudocode (with
+    // numerous corrections for his faulty Brent algorithm).
+    //
+    // (The LMI_* conditionals for evaluation counts may seem
+    // haphazard; by design, they're just adequate to prevent error
+    // messages for secondary (msw) platforms, where they probably
+    // indicate either x87 discrepancies or msw library defects).
+
+    root_type r {};
+
+    // Scherer, Fig. 6.10, iteration counts for a 2ϵ tolerance:
+    //    10              Scherer's Chandrupatla algorithm
+    //    (9)             Chandrupatla estimated from Scherer's graph
+    //    11 [22, x87]    Scherer's (not quite) Brent algorithm
+    //   (12)             Brent estimated from Scherer's graph
+    //    11              lmi_root(): Brent's method, validated
+    //    63              binary64_midpoint() bisection
     auto f04 = [](double x) {return std::pow(x, 2.0) - 2.0;};
     auto root_04 = std::sqrt(2.0);
-    test_a_decimal_function(f04, root_04,  -1.0, 2.0, 17     , __LINE__, 11);
-    test_a_function        (f04, root_04,  -1.0, 2.0, 0.0    , __LINE__);
-
-    // Scherer, Fig. 6.11, iteration counts for 2ϵ tolerance:
-    //   126 vs. 130 here (Brent)
-    //    59              (Chandrupatla)
+    test_a_decimal_function (f04, root_04,  -1.0, 2.0, 17     , __LINE__, 11);
+    test_a_function         (f04, root_04,  -1.0, 2.0, 0.0    , __LINE__);
+    r = scherer_chandrupatla(f04,           -1.0, 2.0, 0.0              );
+    LMI_TEST_EQUAL(10, r.n_eval);
+    r = scherer_brent       (f04,           -1.0, 2.0, 0.0              );
+#if !defined LMI_X87
+    LMI_TEST_EQUAL(11, r.n_eval);
+#else  // defined LMI_X87
+    LMI_TEST_EQUAL(22, r.n_eval);
+#endif // defined LMI_X87
+    r = lmi_root            (f04,           -1.0, 2.0, 0.0              );
+    LMI_TEST_EQUAL(11, r.n_eval);
+    r = lmi_root            (f04,           -1.0, 2.0, 0.0, 0           );
+    LMI_TEST_EQUAL(63, r.n_eval); // sprauchling_limit 0
+
+    // Scherer, Fig. 6.11, iteration counts for a 2ϵ tolerance:
+    //    62              Scherer's Chandrupatla algorithm
+    //   (61)             Chandrupatla estimated from Scherer's graph
+    //   130              Scherer's (not quite) Brent algorithm
+    //  (128)             Brent estimated from Scherer's graph
+    //   130              lmi_root(): Brent's method, validated
+    //    62              binary64_midpoint() bisection
     auto f05 = [](double x) {return std::pow((x - 1.0), 3.0);};
     auto root_05 = 1.0;
-    test_a_decimal_function(f05, root_05,   0.0, 1.8, 17     , __LINE__, 130);
-    test_a_function        (f05, root_05,   0.0, 1.8, 0.0    , __LINE__);
-
-    // Scherer, Fig. 6.12, iteration counts for 1.0e-12 tolerance
-    // (roundoff error in the computed function precludes 2ϵ):
-    //   124 vs. 107 here (Brent)
-    //    31              (Chandrupatla)
+    test_a_decimal_function (f05, root_05,   0.0, 1.8, 17     , __LINE__, 130);
+    test_a_function         (f05, root_05,   0.0, 1.8, 0.0    , __LINE__);
+    r = scherer_chandrupatla(f05,            0.0, 1.8, 0.0              );
+    LMI_TEST_EQUAL(62, r.n_eval);
+    r = scherer_brent       (f05,            0.0, 1.8, 0.0              );
+    LMI_TEST_EQUAL(130, r.n_eval);
+    r = lmi_root            (f05,            0.0, 1.8, 0.0              );
+    LMI_TEST_EQUAL(130, r.n_eval);
+    r = lmi_root            (f05,            0.0, 1.8, 0.0, 0           );
+    LMI_TEST_EQUAL(62, r.n_eval); // sprauchling_limit 0
+
+    // Scherer, Fig. 6.12, iteration counts for a 1.0e-12 tolerance
+    // (roundoff error in the computed function precludes using 2ϵ):
+    //    44 [45, x87]    Scherer's Chandrupatla algorithm
+    //   (33)             Chandrupatla estimated from Scherer's graph
+    //   105 [119, x87]   Scherer's (not quite) Brent algorithm
+    //  (126)             Brent estimated from Scherer's graph
+    //   117              lmi_root(): Brent's method, validated
+    //     3              binary64_midpoint() bisection
     auto f06 = [](double x) {return std::pow(x, 25.0);};
     auto root_06 = 0.0;
-    test_a_decimal_function(f06, root_06,  -1.0, 2.0, 12     , __LINE__, 107);
-    test_a_function        (f06, root_06,  -1.0, 2.0, 5.0e-13, __LINE__);
+    test_a_decimal_function (f06, root_06,  -1.0, 2.0, 12     , __LINE__, 107);
+    test_a_function         (f06, root_06,  -1.0, 2.0, 5.0e-13, __LINE__);
+    r = scherer_chandrupatla(f06,           -1.0, 2.0, 5.0e-13          );
+#if !defined LMI_X87
+    LMI_TEST_EQUAL(44, r.n_eval);
+#else  // defined LMI_X87
+    LMI_TEST_EQUAL(45, r.n_eval);
+#endif // defined LMI_X87
+    r = scherer_brent       (f06,           -1.0, 2.0, 5.0e-13          );
+#if defined LMI_X86_64 && defined LMI_POSIX
+    LMI_TEST_EQUAL(105, r.n_eval);
+#endif // defined LMI_X86_64 && defined LMI_POSIX
+    r = lmi_root            (f06,           -1.0, 2.0, 5.0e-13          );
+#if defined LMI_X86_64 && defined LMI_POSIX
+    LMI_TEST_EQUAL(117, r.n_eval);
+#endif // defined LMI_X86_64 && defined LMI_POSIX
+    r = lmi_root            (f06,           -1.0, 2.0, 5.0e-13, 0       );
+    // This is not a fair test: 0.0, an exact root, is the
+    // first iterate with binary64_midpoint().
+    LMI_TEST_EQUAL(3, r.n_eval); // sprauchling_limit 0
 
     // Despite its apparent insipidity, this is actually a very
     // interesting test: after the first iterate has been calculated



reply via email to

[Prev in Thread] Current Thread [Next in Thread]