texmacs-dev

Re: [Texmacs-dev] Performance questions, proposals, patches


From: Josef Weidendorfer
Subject: Re: [Texmacs-dev] Performance questions, proposals, patches
Date: Thu, 14 Oct 2004 18:53:09 +0200
User-agent: KMail/1.7.1

Hi Joris,

On Thursday 14 October 2004 13:08, Joris van der Hoeven wrote:
> Hi Josef,
>
> On Wed, 13 Oct 2004, Josef Weidendorfer wrote:
> > I wonder why with GCC >=2.96 on Linux/FreeBSD, the default compilation
> > flags for texmacs include "-fno-default-inline -fno-inline"?
> > At least here, if compiling with inlining, the code gains at least 25%
> > speedup without any negative effects.
>
> Certain versions of GCC 3.* are buggy and cause segmentation faults
> in combination with inlining. Maybe we can put inlining in again for
> the most recent version, if you did not notice any suspicious behaviour.
> Any patch for configure.in is welcome.

AFAIK, Suse has been using gcc 3.3.3 for some time now (on 9.0 and on the
latest, 9.1), and as I said, I have no problems with inlining switched on.

===================================================
--- configure.in.orig   2004-10-14 17:54:27.208601456 +0200
+++ configure.in        2004-10-14 17:56:04.009885424 +0200
@@ -781,6 +781,8 @@
 optimize_default="yes"

 case "$GXX_VERSION" in
+    3.3.3)
+    ;;
     2.96 | 3.0 | 3.0.* | 3.1 | 3.1.* | 3.2 | 3.2.* | 3.3 | 3.3.*)
        case "${host}" in
          i*86-*-linux-gnu* | i*86-*-freebsd*)
==================================================
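The patch relies on the fact that a POSIX shell `case` statement runs only the first branch whose pattern matches, so the empty `3.3.3)` branch has to come before the broader `3.3.*` pattern. The C++ sketch below emulates that first-match rule with POSIX `fnmatch()`. The function name `inlining_disabled` is hypothetical, and it ignores the `${host}` check that configure.in also performs.

```cpp
#include <cassert>
#include <fnmatch.h>
#include <string>
#include <vector>

// One branch of a shell "case": alternative glob patterns plus its action
// (here reduced to: does this branch switch inlining off?).
struct CaseBranch {
  std::vector<std::string> patterns;
  bool disables_inlining;
};

// Hypothetical helper: returns the action of the FIRST branch with a
// matching pattern, as sh's "case" does. No match: inlining stays on.
bool inlining_disabled(const std::string& gxx_version) {
  static const std::vector<CaseBranch> branches = {
    {{"3.3.3"}, false},  // the new, empty branch added by the patch
    {{"2.96", "3.0", "3.0.*", "3.1", "3.1.*", "3.2", "3.2.*", "3.3", "3.3.*"},
     true},              // versions that get -fno-default-inline -fno-inline
  };
  for (const CaseBranch& b : branches)
    for (const std::string& p : b.patterns)
      if (fnmatch(p.c_str(), gxx_version.c_str(), 0) == 0)
        return b.disables_inlining;
  return false;
}
```

Swapping the two branches would make `3.3.*` catch 3.3.3 first, and the exemption would never be reached.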

> may still be optimized a bit further. I also noticed another possible
> optimization for arrays (and strings): instead of allocating an array
> of a size which depends on the size of the array (which is used for
> the << operator), it might be better to systematically allocate an array
> of the same size and only use over-allocation when the << operator
> is explicitly used. This might reduce the memory requirements of TeXmacs
> quite a lot.

Possible. I only looked at the problems that showed up at the top of my
profile in different use cases (loading, scrolling, ...).
Quite some time is spent in the Scheme library (especially in garbage
collection), but that is difficult to improve.
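Joris's allocation proposal can be illustrated with a small growable array: construction allocates exactly the requested number of elements, and only appending with `<<` over-allocates (by doubling here). The class `tight_array` and the doubling factor are assumptions for illustration; this is not TeXmacs's actual `array<T>` code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical array type: exact allocation by default, over-allocation
// only once the append operator << is used.
template <typename T>
class tight_array {
  T* buf_ = nullptr;
  std::size_t n_ = 0;    // elements in use
  std::size_t cap_ = 0;  // elements allocated

  void reallocate(std::size_t new_cap) {
    T* nb = new T[new_cap];
    std::copy(buf_, buf_ + n_, nb);
    delete[] buf_;
    buf_ = nb;
    cap_ = new_cap;
  }

public:
  // Exactly n elements: no slack for arrays that are never appended to.
  explicit tight_array(std::size_t n = 0) : n_(n), cap_(n) {
    if (n > 0) buf_ = new T[n]();
  }
  ~tight_array() { delete[] buf_; }
  tight_array(const tight_array&) = delete;
  tight_array& operator=(const tight_array&) = delete;

  std::size_t size() const { return n_; }
  std::size_t capacity() const { return cap_; }
  T& operator[](std::size_t i) { return buf_[i]; }

  // Only appending over-allocates (doubling), so k appends cost
  // amortized O(k) element copies.
  tight_array& operator<<(const T& x) {
    if (n_ == cap_) reallocate(cap_ == 0 ? 4 : 2 * cap_);
    buf_[n_++] = x;
    return *this;
  }
};
```

An array built once with 10 elements keeps a capacity of 10; only after the first `<<` does its capacity jump to 20.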

> You probably may use some of the testing routines in analyze.hpp
> for this kind of purpose too.

Ah, I only looked in string.hpp. Yes, with search_forwards(), it's shorter:

===============================================
--- /home/weidendo/SW/CVS-SOFT/texmacs/src/src/Plugins/Ghostscript/ghostscript.cpp  2003-10-24 12:43:48.000000000 +0200
+++ ./Plugins/Ghostscript/ghostscript.cpp       2004-10-14 18:31:28.021986504 +0200
@@ -43,10 +43,16 @@
 static string
 encapsulate_postscript (string s) {
   int i, n=N(s);
-  string r;
-  for (i=0; i<n; ) {
-    if ((i<(n-8)) && (s(i,i+8)=="showpage")) {i+=8; continue;}
-    r << s[i++];
+  int last_begin = 0;
+  string r, showpage("showpage");
+  while(1) {
+    i = search_forwards(showpage, last_begin, s);
+    if (i<0) {
+      r << s(last_begin, n);
+      break;
+    }
+    r << s(last_begin, i);
+    last_begin = i+8;
   }
   return r;
 }
===============================
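The same chunk-copying idea can be written in standard C++, with `std::string::find` standing in for TeXmacs's `search_forwards()`. Instead of appending one character at a time, it copies each run of text between matches in a single call. Like the patched version, and unlike the old `i<(n-8)` test, it also removes a `showpage` that ends exactly at the end of the string. The name `strip_showpage` is hypothetical.

```cpp
#include <cassert>
#include <string>

// Remove every occurrence of "showpage" from s, copying the text between
// matches in whole chunks rather than character by character.
std::string strip_showpage(const std::string& s) {
  static const std::string showpage = "showpage";
  std::string r;
  r.reserve(s.size());  // the result is never longer than the input
  std::string::size_type last_begin = 0;
  while (true) {
    std::string::size_type i = s.find(showpage, last_begin);
    if (i == std::string::npos) {
      r.append(s, last_begin, std::string::npos);  // copy the tail
      break;
    }
    r.append(s, last_begin, i - last_begin);  // text before the match
    last_begin = i + showpage.size();         // skip the match itself
  }
  return r;
}
```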

> >  least_upper_bound (rectangles l) {
> No problem, I can do that. Is it really here that we get the stack
> overflow? In that case, we might have to check why we do get such long
> lists of rectangles.

Originally, the stack overflow happened in requires_update() for me, but after
I made that function iterative, it happened in least_upper_bound() instead.
I'm not sure why the list gets so long, but that should be easy to find out.
Perhaps it's better to compact the list when it grows longer than a
given threshold.
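Both ideas can be sketched on a simplified singly linked list. The types `rect` and `rect_node` and the function `compact` are hypothetical stand-ins for TeXmacs's `rectangle`/`rectangles`. The bounding box is computed in a loop, so stack depth no longer grows with list length, and a list longer than a threshold is collapsed into its single bounding rectangle. That compaction is only one possible strategy; it over-approximates the area to repaint.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical rectangle [x1,x2) x [y1,y2).
struct rect { int x1, y1, x2, y2; };

// Hypothetical singly linked list node.
struct rect_node {
  rect item;
  rect_node* next;
};

// Iterative bounding box of a NON-EMPTY list. A recursive version needs one
// stack frame per element; this loop uses constant stack space.
rect least_upper_bound(const rect_node* l) {
  rect r = l->item;
  for (l = l->next; l != nullptr; l = l->next) {
    r.x1 = std::min(r.x1, l->item.x1);
    r.y1 = std::min(r.y1, l->item.y1);
    r.x2 = std::max(r.x2, l->item.x2);
    r.y2 = std::max(r.y2, l->item.y2);
  }
  return r;
}

// If the list has more than `threshold` elements, replace it by one node
// holding the bounding rectangle. Returns the (possibly new) list head.
rect_node* compact(rect_node* l, std::size_t threshold) {
  std::size_t n = 0;
  for (const rect_node* p = l; p != nullptr; p = p->next) ++n;
  if (n <= threshold) return l;
  rect bound = least_upper_bound(l);
  while (l != nullptr) {  // free the old nodes iteratively as well
    rect_node* next = l->next;
    delete l;
    l = next;
  }
  return new rect_node{bound, nullptr};
}
```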

> A better solution might be to use something like
>
>       if ((nr_painted%10 == 9) && dev->check_event (INPUT_EVENT)) return;
>
> I need to check, though, whether nr_painted cannot be increased during
> such interruptions of the painting process.

Yes, that may be better.
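The suggested condition amortizes the cost of polling for input: the event check only runs on every tenth painted item. In this sketch, `paint_items` is a hypothetical painting loop, and the callback `input_pending` stands in for `dev->check_event (INPUT_EVENT)`. Here the counter is a local that restarts at zero on every call; how `nr_painted` should behave across interruptions is exactly the detail Joris says he still has to check.

```cpp
#include <cassert>
#include <functional>

// Paint up to n_items items, polling for pending input only when
// nr_painted % 10 == 9. Returns how many items were painted before an
// interruption (or n_items if none occurred).
int paint_items(int n_items,
                const std::function<void(int)>& paint_one,
                const std::function<bool()>& input_pending) {
  int nr_painted = 0;
  while (nr_painted < n_items) {
    if ((nr_painted % 10 == 9) && input_pending()) return nr_painted;
    paint_one(nr_painted);
    ++nr_painted;
  }
  return nr_painted;
}
```

With 100 items and no pending input, the event check runs 10 times instead of 100.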

> Thanks a lot for all the work! I will apply your patches soon.

Actually, I'm interested in improving my profiling tool and its visualization,
and TeXmacs is my current victim ;-)

The tool is built on the Valgrind instrumentation framework; it does
on-the-fly cache simulation and builds up the call graph of unmodified
binaries. Here, I only use the number of x86 instructions executed in
each function to get a sorted list of hot spots.
Of course, optimizations should afterwards be checked for a real run-time
improvement. But in contrast to OProfile alone, I get exact call
counts and call arcs, and in contrast to GProf, it works with shared
libraries. So I can actually see, e.g., how often a loop body in
XCheckMaskEvent is executed by looking at the annotated assembly in the
visualization.

Cheers,
Josef

>
> Best wishes, Joris



