Re: [Qemu-devel] [PATCH RFC] migration: set cpu throttle value by workload


From: Dr. David Alan Gilbert
Subject: Re: [Qemu-devel] [PATCH RFC] migration: set cpu throttle value by workload
Date: Fri, 24 Feb 2017 13:01:02 +0000
User-agent: Mutt/1.7.1 (2016-10-04)

* Chao Fan (address@hidden) wrote:
> On Fri, Jan 27, 2017 at 12:07:27PM +0000, Dr. David Alan Gilbert wrote:
> >* Chao Fan (address@hidden) wrote:
> >> Hi all,
> >> 
> >> This is a test for this RFC patch.
> >> 
> >> Start vm as following:
> >> cmdline="./x86_64-softmmu/qemu-system-x86_64 -m 2560 \
> >> -drive if=none,file=/nfs/img/fedora.qcow2,format=qcow2,id=foo \
> >> -netdev tap,id=hn0,queues=1 \
> >> -device virtio-net-pci,id=net-pci0,netdev=hn0 \
> >> -device virtio-blk,drive=foo \
> >> -enable-kvm -M pc -cpu host \
> >> -vnc :3 \
> >> -monitor stdio"
> >> 
> >> A benchmark program named himeno[*] (modified from the original
> >> source) runs continuously in the guest. The code is in the attached
> >> file; build it with MIDDLE defined. It is heavy on both cpu
> >> calculation and memory. Then migrate the guest. The source host and
> >> target host are on the same switch.
> >> 
> >> "before" means the upstream version, "after" means applying this patch.
> >> "idpr" means "inst_dirty_pages_rate", a new variable in this RFC PATCH.
> >> "count" is "dirty sync count" in "info migrate".
> >> "time" is "total time" in "info migrate".
> >> "ct pct" is "cpu throttle percentage" in "info migrate".
> >> 
> >> -------------------------------------------- 
> >> |     |    before    |        after        | 
> >> |-----|--------------|---------------------| 
> >> |count|time(s)|ct pct|time(s)| idpr |ct pct| 
> >> |-----|-------|------|-------|------|------| 
> >> |  1  |    3  |   0  |    4  |   x  |   0  | 
> >> |  2  |   53  |   0  |   53  | 14237|   0  | 
> >> |  3  |   97  |   0  |   95  |  3142|   0  | 
> >> |  4  |  109  |   0  |  105  | 11085|   0  | 
> >> |  5  |  117  |   0  |  113  | 12894|   0  | 
> >> |  6  |  125  |  20  |  121  | 13549|  67  | 
> >> |  7  |  133  |  20  |  130  | 13550|  67  | 
> >> |  8  |  141  |  20  |  136  | 13587|  67  | 
> >> |  9  |  149  |  30  |  144  | 13553|  99  | 
> >> | 10  |  156  |  30  |  152  |  1474|  99  |  
> >> | 11  |  164  |  30  |  152  |  1706|  99  |  
> >> | 12  |  172  |  40  |  153  |   0  |  99  |  
> >> | 13  |  180  |  40  |  153  |   0  |   x  |  
> >> | 14  |  188  |  40  |---------------------|
> >> | 15  |  195  |  50  |      completed      |  
> >> | 16  |  203  |  50  |                     |  
> >> | 17  |  211  |  50  |                     |  
> >> | 18  |  219  |  60  |                     |  
> >> | 19  |  227  |  60  |                     |  
> >> | 20  |  235  |  60  |                     |  
> >> | 21  |  242  |  70  |                     |  
> >> | 22  |  250  |  70  |                     |  
> >> | 23  |  258  |  70  |                     |  
> >> | 24  |  266  |  80  |                     |  
> >> | 25  |  274  |  80  |                     |  
> >> | 26  |  281  |  80  |                     |  
> >> | 27  |  289  |  90  |                     |  
> >> | 28  |  297  |  90  |                     |  
> >> | 29  |  305  |  90  |                     |  
> >> | 30  |  315  |  99  |                     |  
> >> | 31  |  320  |  99  |                     |  
> >> | 32  |  320  |  99  |                     |  
> >> | 33  |  321  |  99  |                     |  
> >> | 34  |  321  |  99  |                     |  
> >> |--------------------|                     |
> >> |    completed       |                     |
> >> --------------------------------------------
> >> 
> >> And the "info migrate" when completed:
> >> 
> >> before:
> >> capabilities: xbzrle: off rdma-pin-all: off auto-converge: on
> >> zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
> >> Migration status: completed
> >> total time: 321091 milliseconds
> >> downtime: 573 milliseconds
> >> setup: 40 milliseconds
> >> transferred ram: 10509346 kbytes
> >> throughput: 268.13 mbps
> >> remaining ram: 0 kbytes
> >> total ram: 2638664 kbytes
> >> duplicate: 362439 pages
> >> skipped: 0 pages
> >> normal: 2621414 pages
> >> normal bytes: 10485656 kbytes
> >> dirty sync count: 34
> >> 
> >> after:
> >> capabilities: xbzrle: off rdma-pin-all: off auto-converge: on
> >> zero-blocks: off compress: off events: off postcopy-ram: off x-colo: off 
> >> Migration status: completed
> >> total time: 152652 milliseconds
> >> downtime: 290 milliseconds
> >> setup: 47 milliseconds
> >> transferred ram: 4997452 kbytes
> >> throughput: 268.20 mbps
> >> remaining ram: 0 kbytes
> >> total ram: 2638664 kbytes
> >> duplicate: 359598 pages
> >> skipped: 0 pages
> >> normal: 1246136 pages
> >> normal bytes: 4984544 kbytes
> >> dirty sync count: 13
> >> 
> >> It's clear that the total time is much better (321s vs 153s).
> >> The guest began cpu throttling at the 6th dirty sync. But by that
> >> time, this guest was generating dirty pages too quickly, so the
> >> default cpu throttle percentages (initial 20, increment 10) are too
> >> small for this condition. I just use (inst_dirty_pages_rate / 200) to
> >> calculate the cpu throttle value. This is just an ad-hoc algorithm,
> >> not supported by any theory.
> >> 
> >> Of course, on the other hand, the higher the cpu throttle percentage,
> >> the more slowly the guest runs. But in these results, after applying
> >> this patch, the guest spent 23s with a cpu throttle percentage of 67
> >> (total time from 121s to 144s) and 9s with a cpu throttle percentage
> >> of 99 (total time from 144s to completion). In the upstream version,
> >> the guest spent 73s with cpu throttle percentages of 70, 80 and 90
> >> (total time from 242s to 315s) and 6s with a cpu throttle percentage
> >> of 99 (total time from 315s to completion). So I think the impact on
> >> guest performance with my patch is smaller than in the upstream
> >> version.
> >> 
> >> Any comments will be welcome.
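(For reference, a minimal C sketch of the mapping this RFC applies; the
names follow the patch below, and the clamp at 99 is an assumption inferred
from the table above, where "ct pct" never exceeds 99:

    #include <stdint.h>

    /* Sketch only: map the measured dirty-page rate to a throttle
     * percentage the way this RFC does (rate / 200), clamped at an
     * assumed maximum of 99. */
    static int throttle_pct_from_rate(int64_t inst_dirty_pages_rate)
    {
        int64_t pct = inst_dirty_pages_rate / 200;
        return pct > 99 ? 99 : (int)pct;
    }

For example, throttle_pct_from_rate(13549) == 67, which matches the "idpr"
and "ct pct" columns in the table.)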
> Hi Dave,
> Thanks for the review, and sorry for the late reply; I was on holiday.
> >
> >Hi Chao Fan,
> >  I think with this benchmark those results do show it's better;
> >having 23s of high guest performance loss is better than 73s.
> >
> >The difficulty is as you say the ' / 200' is an adhoc algorithm,
> 
> Yes, in other conditions, ' / 200' may not be suitable.
> 
> >so for other benchmarks who knows what value we should use - higher
> >or smaller?  Your test is only on a very small VM (1 CPU, 2.5GB RAM);
> >what happens on a big VM (say 32 CPU, 256GB RAM).
> >
> >I think there are two parts to this:
> >   a) Getting a better measure of how fast the guest changes memory
> >   b) Modifying the auto-converge parameters
> >
> >  (a) would be good to do in QEMU
> >  (b) We can leave to some higher level management system outside
> >QEMU, as long as we provide (a) in the 'info migrate' status
> >for that tool to use - it means we don't have to fix that '/ 200'
> >in qemu.
> 
> Do you mean we should just add an auto-converge parameter showing
> how fast the guest changes memory, and then let users set the cpu
> throttle value, instead of QEMU changing it automatically?

Yes, because if QEMU sets it then we have to make decisions like that '/ 200'
that will only work for some workloads and users.  Generally we leave
decisions like that to the higher levels.
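(As an illustration only: once QEMU reports such a rate, a management tool
could poll it and drive the existing knobs itself, e.g. on the HMP monitor.
The parameter names exist today; the values here are invented for the
example:

    (qemu) migrate_set_parameter cpu-throttle-initial 40
    (qemu) migrate_set_parameter cpu-throttle-increment 20
)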

> 
> >
> >I'm surprised that your code for (a) goes direct to dirty_memory[]
> >rather than using the migration_bitmap that we synchronise from;
> >that only gets updated at the end of each pass and that's what we
> >calculate the rate from - is your mechanism better than that?
> 
> Because cpu throttle makes migration faster by decreasing the rate at
> which dirty pages are generated, I think the cpu throttle value should
> be calculated according to how many *new dirty pages* are generated
> between two syncs. So dirty_memory is more helpful. If I read from
> migration_bitmap, some dirty pages will have been migrated and some
> newly generated, and some pages may have been migrated and then dirtied
> again. migration_bitmap cannot show exactly how many new dirty pages
> were generated.

Yes, true, it's a little better.

Dave

> Thanks,
> Chao Fan
> 
> >
> >Dave
> >
> >
> >> [*]http://accc.riken.jp/en/supercom/himenobmt/
> >> 
> >> Thanks,
> >> 
> >> Chao Fan
> >> 
> >> On Thu, Dec 29, 2016 at 05:16:19PM +0800, Chao Fan wrote:
> >> >This RFC PATCH is my demo of the new feature; here is my POC mail:
> >> >https://lists.gnu.org/archive/html/qemu-devel/2016-12/msg00646.html
> >> >
> >> >When migration_bitmap_sync is executed, get the time and read the
> >> >bitmap to calculate how many dirty pages were generated between two
> >> >syncs. Use inst_dirty_pages / (time_now - time_prev) / ram_size to
> >> >get inst_dirty_pages_rate. Then map from inst_dirty_pages_rate to a
> >> >cpu throttle value. I have no idea how best to map it, so I just do
> >> >it in a simple way. The mapping is just a guess and should be
> >> >improved.
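(Aside: unit-wise, the patch's expression reduces to

    inst_dirty_pages_rate = dirtied_bytes_per_second * 1024 * 1024 / ram_size

i.e. the fraction of guest RAM dirtied per second, scaled by 2^20. So in the
test above, idpr = 13549 on this 2.5 GB guest means about 13549 / 2^20 = 1.3%
of RAM, roughly 33 MB, dirtied per second; dividing by 200 yields the
throttle value 67 seen in the table.)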
> >> >
> >> >This is just a demo; there are other possible methods:
> >> >1. In another file, calculate inst_dirty_pages_rate every second,
> >> >   every two seconds, or at another fixed interval, then set the cpu
> >> >   throttle value according to inst_dirty_pages_rate.
> >> >2. When inst_dirty_pages_rate reaches a threshold, begin cpu
> >> >   throttling and set the throttle value.
> >> >
> >> >Any comments will be welcome.
> >> >
> >> >Signed-off-by: Chao Fan <address@hidden>
> >> >---
> >> > include/qemu/bitmap.h | 17 +++++++++++++++++
> >> > migration/ram.c       | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
> >> > 2 files changed, 66 insertions(+)
> >> >
> >> >diff --git a/include/qemu/bitmap.h b/include/qemu/bitmap.h
> >> >index 63ea2d0..dc99f9b 100644
> >> >--- a/include/qemu/bitmap.h
> >> >+++ b/include/qemu/bitmap.h
> >> >@@ -235,4 +235,21 @@ static inline unsigned long *bitmap_zero_extend(unsigned long *old,
> >> >     return new;
> >> > }
> >> > 
> >> >+static inline unsigned long bitmap_weight(const unsigned long *src, long nbits)
> >> >+{
> >> >+    unsigned long i, count = 0, nlong = nbits / BITS_PER_LONG;
> >> >+
> >> >+    if (small_nbits(nbits)) {
> >> >+        return hweight_long(*src & BITMAP_LAST_WORD_MASK(nbits));
> >> >+    }
> >> >+    for (i = 0; i < nlong; i++) {
> >> >+        count += hweight_long(src[i]);
> >> >+    }
> >> >+    if (nbits % BITS_PER_LONG) {
> >> >+        count += hweight_long(src[i] & BITMAP_LAST_WORD_MASK(nbits));
> >> >+    }
> >> >+
> >> >+    return count;
> >> >+}
> >> >+
> >> > #endif /* BITMAP_H */
> >> >diff --git a/migration/ram.c b/migration/ram.c
> >> >index a1c8089..f96e3e3 100644
> >> >--- a/migration/ram.c
> >> >+++ b/migration/ram.c
> >> >@@ -44,6 +44,7 @@
> >> > #include "exec/ram_addr.h"
> >> > #include "qemu/rcu_queue.h"
> >> > #include "migration/colo.h"
> >> >+#include "hw/boards.h"
> >> > 
> >> > #ifdef DEBUG_MIGRATION_RAM
> >> > #define DPRINTF(fmt, ...) \
> >> >@@ -599,6 +600,9 @@ static int64_t num_dirty_pages_period;
> >> > static uint64_t xbzrle_cache_miss_prev;
> >> > static uint64_t iterations_prev;
> >> > 
> >> >+static int64_t dirty_pages_time_prev;
> >> >+static int64_t dirty_pages_time_now;
> >> >+
> >> > static void migration_bitmap_sync_init(void)
> >> > {
> >> >     start_time = 0;
> >> >@@ -606,6 +610,49 @@ static void migration_bitmap_sync_init(void)
> >> >     num_dirty_pages_period = 0;
> >> >     xbzrle_cache_miss_prev = 0;
> >> >     iterations_prev = 0;
> >> >+
> >> >+    dirty_pages_time_prev = 0;
> >> >+    dirty_pages_time_now = 0;
> >> >+}
> >> >+
> >> >+static void migration_inst_rate(void)
> >> >+{
> >> >+    RAMBlock *block;
> >> >+    MigrationState *s = migrate_get_current();
> >> >+    int64_t inst_dirty_pages_rate, inst_dirty_pages = 0;
> >> >+    int64_t i;
> >> >+    unsigned long *num;
> >> >+    unsigned long len = 0;
> >> >+
> >> >+    dirty_pages_time_now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> >> >+    if (dirty_pages_time_prev != 0) {
> >> >+        rcu_read_lock();
> >> >+        DirtyMemoryBlocks *blocks = atomic_rcu_read(
> >> >+                         &ram_list.dirty_memory[DIRTY_MEMORY_MIGRATION]);
> >> >+        QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> >> >+            if (len == 0) {
> >> >+                len = block->offset;
> >> >+            }
> >> >+            len += block->used_length;
> >> >+        }
> >> >+        ram_addr_t idx = (len >> TARGET_PAGE_BITS) / DIRTY_MEMORY_BLOCK_SIZE;
> >> >+        if (((len >> TARGET_PAGE_BITS) % DIRTY_MEMORY_BLOCK_SIZE) != 0) {
> >> >+            idx++;
> >> >+        }
> >> >+        for (i = 0; i < idx; i++) {
> >> >+            num = blocks->blocks[i];
> >> >+            inst_dirty_pages += bitmap_weight(num, DIRTY_MEMORY_BLOCK_SIZE);
> >> >+        }
> >> >+        rcu_read_unlock();
> >> >+
> >> >+        inst_dirty_pages_rate = inst_dirty_pages * TARGET_PAGE_SIZE *
> >> >+                            1024 * 1024 * 1000 /
> >> >+                            (dirty_pages_time_now - dirty_pages_time_prev) /
> >> >+                            current_machine->ram_size;
> >> >+        s->parameters.cpu_throttle_initial = inst_dirty_pages_rate / 200;
> >> >+        s->parameters.cpu_throttle_increment = inst_dirty_pages_rate / 200;
> >> >+    }
> >> >+    dirty_pages_time_prev = dirty_pages_time_now;
> >> > }
> >> > 
> >> > static void migration_bitmap_sync(void)
> >> >@@ -629,6 +676,8 @@ static void migration_bitmap_sync(void)
> >> >     trace_migration_bitmap_sync_start();
> >> >     memory_global_dirty_log_sync();
> >> > 
> >> >+    migration_inst_rate();
> >> >+
> >> >     qemu_mutex_lock(&migration_bitmap_mutex);
> >> >     rcu_read_lock();
> >> >     QLIST_FOREACH_RCU(block, &ram_list.blocks, next) {
> >> >-- 
> >> >2.9.3
> >> >
> >> 
> >> 
> >
> >> /********************************************************************
> >> 
> >>  This benchmark test program is measuring a cpu performance
> >>  of floating point operation by a Poisson equation solver.
> >> 
> >>  If you have any questions, please ask me via email.
> >>  written by Ryutaro HIMENO, November 26, 2001.
> >>  Version 3.0
> >>  ----------------------------------------------
> >>  Ryutaro Himeno, Dr. of Eng.
> >>  Head of Computer Information Division,
> >>  RIKEN (The Institute of Physical and Chemical Research)
> >>  Email : address@hidden
> >>  ---------------------------------------------------------------
> >>  You can adjust the size of this benchmark code to fit your target
> >>  computer. In that case, please choose one of the following sets of
> >>  (mimax,mjmax,mkmax):
> >>  small : 33,33,65
> >>  small : 65,65,129
> >>  medium: 129,129,257
> >>  large : 257,257,513
> >>  ext.large: 513,513,1025
> >>  This program measures computer performance in MFLOPS
> >>  by using a kernel which appears in a linear solver of the pressure
> >>  Poisson eq. in an incompressible Navier-Stokes solver.
> >>  A point-Jacobi method is employed in this solver, as this method can
> >>  be easily vectorized and parallelized.
> >>  ------------------
> >>  Finite-difference method, curvilinear coordinate system
> >>  Vectorizable and parallelizable on each grid point
> >>  No. of grid points : imax x jmax x kmax including boundaries
> >>  ------------------
> >>  A,B,C:coefficient matrix, wrk1: source term of Poisson equation
> >>  wrk2 : working area, OMEGA : relaxation parameter
> >>  BND:control variable for boundaries and objects ( = 0 or 1)
> >>  P: pressure
> >> ********************************************************************/
> >> 
> >> #include <stdio.h>
> >> 
> >> #ifdef XSMALL
> >> #define MIMAX            16
> >> #define MJMAX            16
> >> #define MKMAX            16
> >> #endif
> >> 
> >> #ifdef SSSMALL
> >> #define MIMAX            17
> >> #define MJMAX            17
> >> #define MKMAX            33
> >> #endif
> >> 
> >> #ifdef SSMALL
> >> #define MIMAX            33
> >> #define MJMAX            33
> >> #define MKMAX            65
> >> #endif
> >> 
> >> #ifdef SMALL
> >> #define MIMAX            65
> >> #define MJMAX            65
> >> #define MKMAX            129
> >> #endif
> >> 
> >> #ifdef MIDDLE
> >> #define MIMAX            129
> >> #define MJMAX            129
> >> #define MKMAX            257
> >> #endif
> >> 
> >> #ifdef LARGE
> >> #define MIMAX            257
> >> #define MJMAX            257
> >> #define MKMAX            513
> >> #endif
> >> 
> >> #ifdef ELARGE
> >> #define MIMAX            513
> >> #define MJMAX            513
> >> #define MKMAX            1025
> >> #endif
> >> 
> >> double second();
> >> float jacobi();
> >> void initmt();
> >> double fflop(int,int,int);
> >> double mflops(int,double,double);
> >> 
> >> static float  p[MIMAX][MJMAX][MKMAX];
> >> static float  a[4][MIMAX][MJMAX][MKMAX],
> >>               b[3][MIMAX][MJMAX][MKMAX],
> >>               c[3][MIMAX][MJMAX][MKMAX];
> >> static float  bnd[MIMAX][MJMAX][MKMAX];
> >> static float  wrk1[MIMAX][MJMAX][MKMAX],
> >>               wrk2[MIMAX][MJMAX][MKMAX];
> >> 
> >> static int imax, jmax, kmax;
> >> static float omega;
> >> 
> >> int
> >> main()
> >> {
> >>   int    i,j,k,nn;
> >>   float  gosa;
> >>   double cpu,cpu0,cpu1,flop,target;
> >> 
> >>   target= 3.0;
> >>   omega= 0.8;
> >>   imax = MIMAX-1;
> >>   jmax = MJMAX-1;
> >>   kmax = MKMAX-1;
> >> 
> >>   /*
> >>    *    Initializing matrixes
> >>    */
> >>   initmt();
> >>   printf("mimax = %d mjmax = %d mkmax = %d\n",MIMAX, MJMAX, MKMAX);
> >>   printf("imax = %d jmax = %d kmax =%d\n",imax,jmax,kmax);
> >> 
> >>   nn= 3;
> >>   printf(" Start rehearsal measurement process.\n");
> >>   printf(" Measure the performance in %d times.\n\n",nn);
> >> 
> >>   cpu0= second();
> >>   gosa= jacobi(nn);
> >>   cpu1= second();
> >>   cpu= cpu1 - cpu0;
> >> 
> >>   flop= fflop(imax,jmax,kmax);
> >>   
> >>   printf(" MFLOPS: %f time(s): %f %e\n\n",
> >>          mflops(nn,cpu,flop),cpu,gosa);
> >> 
> >>   nn= (int)(target/(cpu/3.0));
> >> 
> >>   printf(" Now, start the actual measurement process.\n");
> >>   printf(" The loop will be excuted in %d times\n",nn);
> >>   printf(" This will take about one minute.\n");
> >>   printf(" Wait for a while\n\n");
> >> 
> >>   /*
> >>    *    Start measuring
> >>    */
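> >>   /* Modified from the original benchmark: iterate forever so the
> >>    * guest keeps re-dirtying the large static arrays while the
> >>    * migration runs. */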
> >> while (1)
> >> {
> >>   cpu0 = second();
> >>   gosa = jacobi(nn);
> >>   cpu1 = second();
> >> 
> >>   cpu= cpu1 - cpu0;
> >>   
> >>   //printf(" Loop executed for %d times\n",nn);
> >>   //printf(" Gosa : %e \n",gosa);
> >>   printf(" MFLOPS measured : %f\tcpu : %f\n",mflops(nn,cpu,flop),cpu);
> >>   fflush(stdout);
> >>   //printf(" Score based on Pentium III 600MHz : %f\n",
> >>   //       mflops(nn,cpu,flop)/82,84);
> >> }  
> >>   return (0);
> >> }
> >> 
> >> void
> >> initmt()
> >> {
> >>    int i,j,k;
> >> 
> >>   for(i=0 ; i<MIMAX ; i++)
> >>     for(j=0 ; j<MJMAX ; j++)
> >>       for(k=0 ; k<MKMAX ; k++){
> >>         a[0][i][j][k]=0.0;
> >>         a[1][i][j][k]=0.0;
> >>         a[2][i][j][k]=0.0;
> >>         a[3][i][j][k]=0.0;
> >>         b[0][i][j][k]=0.0;
> >>         b[1][i][j][k]=0.0;
> >>         b[2][i][j][k]=0.0;
> >>         c[0][i][j][k]=0.0;
> >>         c[1][i][j][k]=0.0;
> >>         c[2][i][j][k]=0.0;
> >>         p[i][j][k]=0.0;
> >>         wrk1[i][j][k]=0.0;
> >>         bnd[i][j][k]=0.0;
> >>       }
> >> 
> >>   for(i=0 ; i<imax ; i++)
> >>     for(j=0 ; j<jmax ; j++)
> >>       for(k=0 ; k<kmax ; k++){
> >>         a[0][i][j][k]=1.0;
> >>         a[1][i][j][k]=1.0;
> >>         a[2][i][j][k]=1.0;
> >>         a[3][i][j][k]=1.0/6.0;
> >>         b[0][i][j][k]=0.0;
> >>         b[1][i][j][k]=0.0;
> >>         b[2][i][j][k]=0.0;
> >>         c[0][i][j][k]=1.0;
> >>         c[1][i][j][k]=1.0;
> >>         c[2][i][j][k]=1.0;
> >>         p[i][j][k]=(float)(i*i)/(float)((imax-1)*(imax-1));
> >>         wrk1[i][j][k]=0.0;
> >>         bnd[i][j][k]=1.0;
> >>       }
> >> }
> >> 
> >> float
> >> jacobi(int nn)
> >> {
> >>   int i,j,k,n;
> >>   float gosa, s0, ss;
> >> 
> >>   for(n=0 ; n<nn ; ++n){
> >>     gosa = 0.0;
> >> 
> >>     for(i=1 ; i<imax-1 ; i++)
> >>       for(j=1 ; j<jmax-1 ; j++)
> >>         for(k=1 ; k<kmax-1 ; k++){
> >>           s0 = a[0][i][j][k] * p[i+1][j  ][k  ]
> >>              + a[1][i][j][k] * p[i  ][j+1][k  ]
> >>              + a[2][i][j][k] * p[i  ][j  ][k+1]
> >>              + b[0][i][j][k] * ( p[i+1][j+1][k  ] - p[i+1][j-1][k  ]
> >>                               - p[i-1][j+1][k  ] + p[i-1][j-1][k  ] )
> >>              + b[1][i][j][k] * ( p[i  ][j+1][k+1] - p[i  ][j-1][k+1]
> >>                                - p[i  ][j+1][k-1] + p[i  ][j-1][k-1] )
> >>              + b[2][i][j][k] * ( p[i+1][j  ][k+1] - p[i-1][j  ][k+1]
> >>                                - p[i+1][j  ][k-1] + p[i-1][j  ][k-1] )
> >>              + c[0][i][j][k] * p[i-1][j  ][k  ]
> >>              + c[1][i][j][k] * p[i  ][j-1][k  ]
> >>              + c[2][i][j][k] * p[i  ][j  ][k-1]
> >>              + wrk1[i][j][k];
> >> 
> >>           ss = ( s0 * a[3][i][j][k] - p[i][j][k] ) * bnd[i][j][k];
> >> 
> >>           gosa+= ss*ss;
> >>           /* gosa= (gosa > ss*ss) ? a : b; */
> >> 
> >>           wrk2[i][j][k] = p[i][j][k] + omega * ss;
> >>         }
> >> 
> >>     for(i=1 ; i<imax-1 ; ++i)
> >>       for(j=1 ; j<jmax-1 ; ++j)
> >>         for(k=1 ; k<kmax-1 ; ++k)
> >>           p[i][j][k] = wrk2[i][j][k];
> >>     
> >>   } /* end n loop */
> >> 
> >>   return(gosa);
> >> }
> >> 
> >> double
> >> fflop(int mx,int my, int mz)
> >> {
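> >>   /* 34 floating-point operations per interior grid point in jacobi() */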
> >>   return((double)(mz-2)*(double)(my-2)*(double)(mx-2)*34.0);
> >> }
> >> 
> >> double
> >> mflops(int nn,double cpu,double flop)
> >> {
> >>   return(flop/cpu*1.e-6*(double)nn);
> >> }
> >> 
> >> double
> >> second()
> >> {
> >> #include <sys/time.h>
> >> 
> >>   struct timeval tm;
> >>   double t ;
> >> 
> >>   static int base_sec = 0,base_usec = 0;
> >> 
> >>   gettimeofday(&tm, NULL);
> >>   
> >>   if(base_sec == 0 && base_usec == 0)
> >>     {
> >>       base_sec = tm.tv_sec;
> >>       base_usec = tm.tv_usec;
> >>       t = 0.0;
> >>   } else {
> >>     t = (double) (tm.tv_sec-base_sec) + 
> >>       ((double) (tm.tv_usec-base_usec))/1.0e6 ;
> >>   }
> >> 
> >>   return t ;
> >> }
> >
> >--
> >Dr. David Alan Gilbert / address@hidden / Manchester, UK
> >
> >
> 
> 
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK


