help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Odd behavior of time-domain convolutions


From: Fredrik Lingvall
Subject: Re: Odd behavior of time-domain convolutions
Date: Wed, 05 Sep 2007 19:46:15 +0000
User-agent: Thunderbird 2.0.0.6 (X11/20070807)

Quentin Spencer wrote:
> John W. Eaton wrote:
>> On  5-Sep-2007, Quentin Spencer wrote:
>>
>> | John W. Eaton wrote:
>> | > On  5-Sep-2007, Quentin Spencer wrote:
>> | >
>> | > | I've used the wrapper | > | approach before where the only
>> difference was the interface code, and | > | even then the mex code
>> was a little faster, but not by the large margins | > | you are
>> observing.
>> | >
>> | > I'm surprised by that becuase I think more work has to be done to
>> set
>> | > up and recover from calling a function with the MEX interface. 
>> Also,
>> | > in some cases copies of the data passed to and from a MEX file are
>> | > required when no copy is needed with the .oct file interface.
>> | | It may be that my problem was what David pointed out--using the
>> indexing | functions that bring in a lot of extra overhead.
>>
>> You say that the only difference is the inteface code.  Did the
>> interface code perform indexing on Array/Matrix values?
>>   
>
> Yes. I was taking double values and recasting one at a time into an
> array of integers for the low-level code. I was using the X(i,j)
> indexing to get the double values. I take it that extracting the array
> pointers and doing the low-level indexing directly would be faster.
>
> Quentin
>
I don't use x(i,j) indexing(I use x[i+j*M]). If the overhead was due to
data copy then one would expect the same
behavior for the fft-based code as well but that code runs at
approximately at the same speed both for Octave
and Matlab (which can be seen in the figs). Can gcc generate different
machine code for C and C++ code (with
the same flags)? I have attached the files (they need to be linked with
the pthread lib).

Usage:

C = conv_p(A,B,n_cpus);


A and B must have the same number of columns (B can be a vector). n_cpus
is the number of
threads to use (typically the same as the number of CPUs/Cores)

/Fredrik



/***
*
* Copyright (C) 2003-2007 Fredrik Lingvall
*
* This file is part of the DREAM Toolbox.
*
* The DREAM Toolbox is free software; you can redistribute it and/or modify 
* it under the terms of the GNU General Public License as published by the
* Free Software Foundation; either version 2, or (at your option) any
* later version.
*
* The DREAM Toolbox is distributed in the hope that it will be useful, but 
* WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* for more details.
*
* You should have received a copy of the GNU General Public License
* along with the DREAM Toolbox; see the file COPYING.  If not, write to the 
* Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
* 02110-1301, USA.
*
***/

#include <string.h>
#include <stdlib.h>
#include <math.h>
#include <pthread.h>
#include <signal.h>
#include "mex.h"

#include "dream_error.h"

#define TRUE 1
#define FALSE 0
#define printf mexPrintf

#define EQU 0
#define SUM 1
#define NEG 2

/***
 * Name and date (of revisions):
 * 
 * Fredrik Lingvall 2003-10-22
 * Fredrik Lingvall 2004-03-23 : Fixed so that n_cpus can be 1 or more (was a 
bug).
 * Fredrik Lingvall 2004-04-08 : Fixed a typo in comments (saft -> conv).
 * Fredrik Lingvall 2004-04-08 : Added BLAS dsdot support (not really working 
though). 
 * Fredrik Lingvall 2004-04-11 : Fixed segfault. hanged 
 *                                 for (i=0; i < (nx+ny); i++) to
 *                                 for (i=0; i < (nx+ny)-1; i++) when clearing 
data. 
 *
 * Fredrik Lingvall 2004-04-11 : Added #include <stdlib.h>.
 * Fredrik Lingvall 2006-01-24 : Removed BLAS related code
 * Fredrik Lingvall 2006-01-25 : Added check of n_cpus arg.
 * Fredrik Lingvall 2006-01-30 : Added code to not use threads at all when 
n_cpus == 1.
 * Fredrik Lingvall 2006-08-23 : Fixed bug in CTRL-C message when N=1.
 * Fredrik Lingvall 2006-11-03 : Added '=', '+=', and '-=' options.
 * Fredrik Lingvall 2006-11-09 : Added GPL info.
 * Fredrik Lingvall 2007-08-29 : Changed index type from 'int' to 
'dream_idx_type' (for 64-bit support).
 *
 ***/

// ToDo: Add check if input pars have zero length

#include "dream.h"

/***
 *
 *  Parallel (threaded) convolution.
 *
 ***/

//
// Globals
//
volatile int running;
volatile int in_place;
int mode = EQU;

//
// typedef:s
//

typedef struct
{
  dream_idx_type line_start;
  dream_idx_type line_stop;
  double *A;
  dream_idx_type A_M;
  dream_idx_type A_N;
  double *B;
  dream_idx_type B_M;
  dream_idx_type B_N;
  double *Y;
} DATA;

typedef void (*sighandler_t)(int);

//
// Function prototypes.
//
void* smp_process(void *arg);
void sighandler(int signum);
void sig_abrt_handler(int signum);
void sig_keyint_handler(int signum);
void conv( double *xr, dream_idx_type nx, double *yr, dream_idx_type ny, double 
*zr);

/***
 *
 * Thread function. 
 *
 ***/

void* smp_process(void *arg)
{
  DATA D = *(DATA *)arg;
  dream_idx_type    line_start=D.line_start, line_stop=D.line_stop, n;
  double *A = D.A, *B = D.B, *Y = D.Y;
  dream_idx_type A_M = D.A_M, B_M = D.B_M, B_N = D.B_N;

  // Do the convolution.

  for (n=line_start; n<line_stop; n++) {

    if (B_N > 1) // B is a matrix.
      conv( &A[0+n*A_M], A_M, &B[0+n*B_M], B_M, &Y[0+n*(A_M+B_M-1)]);
    else // B is a vector.
      conv( &A[0+n*A_M], A_M, B, B_M, &Y[0+n*(A_M+B_M-1)]);
    
    if (running==FALSE) {
      printf("conv_p: thread for column %d -> %d bailing 
out!\n",line_start+1,line_stop);
      break;
    }
 
  }
  return(NULL);
}


/***
 *
 * Convolution of two vectors.
 *
 ***/

// The -funroll-loops -fprefetch-loop-arrays gcc-3.x flags makes conv
// run two times slower!

void conv( double *xr, dream_idx_type nx, double *yr, dream_idx_type ny, double 
*zr)
{
  dream_idx_type i,j;
  
  // Don't clear if in-place '+=' and '-=' modes. 
  if (in_place == FALSE || mode == EQU) {
    for (i=0; i < (nx+ny)-1; i++) {
      zr[i]=0.0;
    }
  }

  if (in_place == FALSE) { // Normal mode.
    
    for(i=0; i<nx; i++) {
      for(j=0; j<ny; j++) {
        zr[i+j] += xr[i] * yr[j];
      }
    }
  }
  else { // in-place 

    switch (mode) {
      
    case EQU:
    case SUM:
      // in-place '=' and '+=' operation.
      for(i=0; i<nx; i++) {
        for(j=0; j<ny; j++) {
          zr[i+j] += xr[i] * yr[j];
        }
      }
      break;
      
    case NEG:
      // in-place '-=' operation.
      for(i=0; i<nx; i++) {
        for(j=0; j<ny; j++) {
          zr[i+j] -= xr[i] * yr[j];
        }
      }
      break;
      
    default:
      // in-place '=' operation.
      for(i=0; i<nx; i++) {
        for(j=0; j<ny; j++) {
          zr[i+j] += xr[i] * yr[j];
        }
      }
      break;
    } // switch
  } // in-place
}

/***
 *
 * Signal handlers.
 *
 ***/

void sighandler(int signum) {
  //printf("Caught signal SIGTERM.\n");
  running = FALSE;
}

void sig_abrt_handler(int signum) {
  //printf("Caught signal SIGABRT.\n");
}

void sig_keyint_handler(int signum) {
  //printf("Caught signal SIGINT.\n");
}

/***
 * 
 * conv_p.c - Matlab (MEX) gateway function for CONV.
 *
 ***/

void  mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
  double *A,*B, *Y; 
  sighandler_t   old_handler, old_handler_abrt, old_handler_keyint;
  pthread_t *threads;
  dream_idx_type line_start, line_stop, A_M, A_N, B_M, B_N, n;
  int    thread_n, N, err;    
  char   *the_str = NULL;
  int    buflen, is_set = FALSE;
  void   *retval;
  DATA   *D;

  in_place = FALSE;

  // Check for proper inputs arguments.

  switch (nrhs) {
    
  case 0:
  case 1:
  case 2:
    mexErrMsgTxt("conv_p requires 3 to 5 input arguments!");
    break;
    
  case 3:
    if (nlhs > 1) {
      mexErrMsgTxt("Too many output arguments for conv_p!");
    }    
    break;

  case 4:
    if (nlhs > 0) {
      mexErrMsgTxt("No output arguments required for conv_p in in-place 
operating mode!");
    }   
    if (mxIsChar(prhs[3])) {
      mexErrMsgTxt("Arg 4 must be a matrix (not a string)");
    }
    break;
  
  case 5:
    if (mxIsChar(prhs[4])) { // 5th arg is a mode string.
      //the_str = (char*) mxGetChars(prhs[4]); 
      buflen = mxGetM(prhs[4])*mxGetN(prhs[4])+1;
      the_str = (char*) malloc(buflen * sizeof(char));
      mxGetString(prhs[4], the_str, buflen); // Obsolete in Matlab 7.x ?
      
      // Valid strings are:
      //  '='  : In-place replace mode.
      //  '+=' : In-place add mode.
      //  '-=' : In-place sub mode.
      
      is_set = FALSE;
      
      if (strcmp(the_str,"=") == 0) {
        mode = EQU; 
        is_set = TRUE;
      }
      
      if (strcmp(the_str,"+=") == 0) {
        mode = SUM; 
        is_set = TRUE;
      }
      
      if (strcmp(the_str,"-=") == 0) {
        mode = NEG; 
        is_set = TRUE;
      }
      
      if (is_set == FALSE)
        mexErrMsgTxt("Non-valid string in arg 5!");
      
    }
    free(the_str);
    break;
    
  default:
    mexErrMsgTxt("conv_p requires 3 to 5 input arguments!");
    break;
  }
  

  
  A_M = mxGetM(prhs[0]);
  A_N = mxGetN(prhs[0]);
  A = mxGetPr(prhs[0]);

  B_M = mxGetM(prhs[1]);
  B_N = mxGetN(prhs[1]);
  B = mxGetPr(prhs[1]);

  // Check that arg 2. 
  if ( B_M != 1 && B_N !=1 && B_N != A_N)
    mexErrMsgTxt("Argument 2 must be a vector or a matrix with the same number 
of rows as arg 1!");

  if (  B_M == 1 || B_N == 1 ) { // B is a vector.
    B_M = B_M*B_N;
    B_N = 1;
  }
    
  //
  // Number of threads.
  //

  // Check that arg 3 is a scalar.
  if ( mxGetM(prhs[2]) * mxGetN(prhs[2]) !=1)
    mexErrMsgTxt("Argument 3 must be a scalar!");
  
  N = (int) mxGetScalar(prhs[2]);
  if (N < 1)
    mexErrMsgTxt("Argument 3 must be larger or equal to 1 (min number of 
CPUs)!");

  // Check n_cpus argument.
  if (N > A_N) {
    // Add a -v verbose flag to display stuff like this!
    //mexPrintf("Warning: n_cpus is larger then number of columns in first 
arg.\n");
    //mexPrintf("         Setting n_cpus = # cols in 1st arg!\n");
    N = A_N;
  }

  //
  // Create/get output matrix.
  //

  if (nrhs == 3) { // Normal mode.
    in_place = FALSE;
    plhs[0] = mxCreateDoubleMatrix(A_M+B_M-1, A_N, mxREAL);
    Y = mxGetPr(plhs[0]);
  } else { // in-place mode.
    in_place = TRUE;

    if ( mxGetM(prhs[3]) != A_M+B_M-1)
      mexErrMsgTxt("Wrong number of rows in argument 4!");

    if ( mxGetN(prhs[3]) != A_N)
      mexErrMsgTxt("Wrong number of columns in argument 4!");

    Y = mxGetPr(prhs[3]);
  }


  //
  // Register signal handlers.
  //

  if ((old_handler = signal(SIGTERM, &sighandler)) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }

  if ((old_handler_abrt=signal(SIGABRT, &sighandler)) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }
  
  if ((old_handler_keyint=signal(SIGINT, &sighandler)) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }
  
  //
  // Call the CONV subroutine.
  //

  running = TRUE;
  
  if (N>1) { // Use threads

    // Allocate local data.
    D = (DATA*) malloc(N*sizeof(DATA));
    if (!D)
      mexErrMsgTxt("Failed to allocate memory for thread data!");
    
    // Allocate mem for the threads.
    threads = (pthread_t*) malloc(N*sizeof(pthread_t));
    
    if (!threads)
      mexErrMsgTxt("Failed to allocate memory for threads!");
    
    for (thread_n = 0; thread_n < N; thread_n++) {
      
      line_start = thread_n * A_N/N;
      line_stop =  (thread_n+1) * A_N/N;
      
      // Init local data.
      D[thread_n].line_start = line_start; // Local start index;
      D[thread_n].line_stop = line_stop; // Local stop index;
      D[thread_n].A = A;
      D[thread_n].A_M = A_M;
      D[thread_n].A_N = A_N;
      D[thread_n].B = B;
      D[thread_n].B_M = B_M;
      D[thread_n].B_N = B_N;
      D[thread_n].Y = Y;
      
      // Starts the threads.
      err = pthread_create(&threads[thread_n], NULL, smp_process, &D[thread_n]);
      if (err != 0)
        mexErrMsgTxt("Error when creating a new thread!\n");
    }
    
    // Wait for all threads to finnish.
    for (thread_n = 0; thread_n < N; thread_n++) {
      err = pthread_join(threads[thread_n], &retval);
      if (err != 0)
        mexErrMsgTxt("Error when joining a thread!\n");
    }
    
    
    // Free memory.
    if (D) {
      free((void*) D);
    }

    if (threads) {
      free((void*) threads);
    }
    

  } else{ // Do not use threads
      
    if (B_N > 1) { // B is a matrix.
      for (n=0; n<A_N; n++) {
        conv( &A[0+n*A_M], A_M, &B[0+n*B_M], B_M, &Y[0+n*(A_M+B_M-1)]);
        if (running==FALSE) {
          printf("conv_p: bailing out!\n");
          break;
        }
      }
    } else {// B is a vector.
      for (n=0; n<A_N; n++) {
        
        conv( &A[0+n*A_M], A_M, B, B_M, &Y[0+n*(A_M+B_M-1)]);
        
        if (running==FALSE) {
          printf("conv_p: bailing out!\n");
          break;
        }
      }
    }
  }

  //
  // Restore old signal handlers.
  //

  if (signal(SIGTERM, old_handler) == SIG_ERR) {
    printf("Couldn't register old signal handler.\n");
  }
   
  if (signal(SIGABRT,  old_handler_abrt) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }
  
  if (signal(SIGINT, old_handler_keyint) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }

  if (!running) {
    // This may kill Matlab 7 release 14 (bug in Matlab).
    mexErrMsgTxt("CTRL-C pressed!\n"); // Bail out.
  }

  return;
}
/***
*
* Copyright (C) 2006-2007 Fredrik Lingvall
*
* This file is part of the DREAM Toolbox.
*
* The DREAM Toolbox is free software; you can redistribute it and/or modify 
* it under the terms of the GNU General Public License as published by the
* Free Software Foundation; either version 2, or (at your option) any
* later version.
*
* The DREAM Toolbox is distributed in the hope that it will be useful, but 
* WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License
* for more details.
*
* You should have received a copy of the GNU General Public License
* along with the DREAM Toolbox; see the file COPYING.  If not, write to the 
* Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
* 02110-1301, USA.
*
***/

#include <string.h>
#include <stdlib.h>
#include <math.h>
#include <pthread.h>
#include <signal.h>


//#include "dream_error.h"

//
// Octave headers.
//

#include <octave/oct.h>

#include <octave/config.h>

#include <iostream>
using namespace std;

#include <octave/defun-dld.h>
#include <octave/error.h>
#include <octave/oct-obj.h>
#include <octave/pager.h>
#include <octave/symtab.h>
#include <octave/variables.h>

#define TRUE 1
#define FALSE 0

#define EQU 0
#define SUM 1
#define NEG 2

/***
 * Name and date (of revisions):
 * 
 * Fredrik Lingvall 2006-08-23 : File created.
 * Fredrik Lingvall 2006-11-03 : Added '=', '+=', and '-=' options.
 * Fredrik Lingvall 2006-11-09 : Added GPL info.
 * Fredrik Lingvall 2007-08-08 : Fixed a @copyright bug.
 * Fredrik Lingvall 2007-08-29 : Changed index type from 'int' to 
'dream_idx_type' (for 64-bit support).
 * Fredrik Lingvall 2007-08-30 : Moved oct_retval.append(..); and made changes 
to avoid that Ymat not goes out of scope.
 *
 ***/

#include "dream.h"

/***
 *
 *  Parallel (threaded) convolution.
 *
 ***/

//
// Globals
//
volatile int running;
volatile int in_place;
int mode = EQU;

//
// typedef:s
//

typedef struct
{
  dream_idx_type line_start;
  dream_idx_type line_stop;
  double *A;
  dream_idx_type A_M;
  dream_idx_type A_N;
  double *B;
  dream_idx_type B_M;
  dream_idx_type B_N;
  double *Y;
} DATA;

typedef void (*sighandler_t)(int);

//
// Function prototypes.
//
void* smp_process(void *arg);
void sighandler(int signum);
void sig_abrt_handler(int signum);
void sig_keyint_handler(int signum);
void conv( double *xr, dream_idx_type nx, double *yr, dream_idx_type ny, double 
*zr);

/***
 *
 * Thread function. 
 *
 ***/

void* smp_process(void *arg)
{
  DATA D = *(DATA *)arg;
  dream_idx_type    line_start=D.line_start, line_stop=D.line_stop, n;
  double *A = D.A, *B = D.B, *Y = D.Y;
  dream_idx_type A_M = D.A_M, B_M = D.B_M, B_N = D.B_N;

  // Do the convolution.

  for (n=line_start; n<line_stop; n++) {

    if (B_N > 1) // B is a matrix.
      conv( &A[0+n*A_M], A_M, &B[0+n*B_M], B_M, &Y[0+n*(A_M+B_M-1)]);
    else // B is a vector.
      conv( &A[0+n*A_M], A_M, B, B_M, &Y[0+n*(A_M+B_M-1)]);
    
    if (running==FALSE) {
      printf("conv_p: thread for column %d -> %d bailing 
out!\n",line_start+1,line_stop);
      break;
      //return(NULL);
    }
 
  }
  return(NULL);
}


/***
 *
 * Convolution of two vectors.
 *
 ***/

void conv( double *xr, dream_idx_type nx, double *yr, dream_idx_type ny, double 
*zr)
{
  dream_idx_type i,j;

  // Don't clear if in-place '+=' and '-=' modes. 
  if (in_place == FALSE || mode == EQU) {
    for (i=0; i < (nx+ny)-1; i++) {
      zr[i]=0.0;
    }
  }

  if (in_place == FALSE) { // Normal mode.
    
    for(i=0; i<nx; i++) {
      for(j=0; j<ny; j++) {
        zr[i+j] += xr[i] * yr[j];
      }
    }
  }
  else { // in-place 

    switch (mode) {
      
    case EQU:
    case SUM:
      // in-place '=' and '+=' operation.
      for(i=0; i<nx; i++) {
        for(j=0; j<ny; j++) {
          zr[i+j] += xr[i] * yr[j];
        }
      }
      break;
      
    case NEG:
      // in-place '-=' operation.
      for(i=0; i<nx; i++) {
        for(j=0; j<ny; j++) {
          zr[i+j] -= xr[i] * yr[j];
        }
      }
      break;
      
    default:
      // in-place '=' operation.
      for(i=0; i<nx; i++) {
        for(j=0; j<ny; j++) {
          zr[i+j] += xr[i] * yr[j];
        }
      }
      break;
    } // switch
  } // in-place
}


/***
 *
 * Signal handlers.
 *
 ***/

void sighandler(int signum) {
  //printf("Caught signal SIGTERM.\n");
  running = FALSE;
}

void sig_abrt_handler(int signum) {
  //printf("Caught signal SIGABRT.\n");
}

void sig_keyint_handler(int signum) {
  //printf("Caught signal SIGINT.\n");
}

/***
 * 
 * conv_p.c -  Octave (oct) gateway function for CONV_P.
 *
 ***/

DEFUN_DLD (conv_p, args, nlhs,
           "-*- texinfo -*-\n\
@deftypefn {Loadable Function} {}  [Y] = conv_p(A, B, n_cpus).\n\
\n\
CONV_P Computes one dimensional convolutions of the columns in the matrix A and 
the matrix (or vector) B.\n\
\n\
Input parameters:\n\
\n\
@table @samp\n\
@item A\n\
An MxN matrix.\n\
@item B\n\
A KxN matrix or a K-length vector. If B is a vector each column in A is 
convolved with the vector B.\n\
@item n_cpus\n\
Number of threads to use. n_cpus must be greater or equal to 1.\n\
@end table\n\
\n\
Output parameter:\n\
\n\
@table @samp\n\
@item Y\n\
The (M+K-1)xN output matrix.\n\
@end table\n\
\n\
@code{conv_p} is an oct-function that is a part of the DREAM Toolbox available 
at\n\
 @url{http://www.signal.uu.se/Toolbox/dream/}.\n\
\n\
@copyright{2007-08-08 Fredrik Lingvall}.\n\
@seealso {fftconv_p, conv}\n\
@end deftypefn")
{
  double *A,*B, *Y; 
  sighandler_t   old_handler, old_handler_abrt, old_handler_keyint;
  pthread_t *threads;
  dream_idx_type line_start, line_stop, A_M, A_N, B_M, B_N, n;
  int    thread_n, N, err;    
  char   *the_str = NULL;
  int    buflen, is_set = FALSE;
  void   *retval;
  DATA   *D;
  octave_value_list oct_retval; 

  in_place = FALSE;

  int nrhs = args.length ();

  // Check for proper inputs arguments.

  switch (nrhs) {
    
  case 0:
  case 1:
  case 2:
    error("conv_p requires 3 to 5 input arguments!");
    return oct_retval;
    break;
    
  case 3:
    if (nlhs > 1) {
      error("Too many output arguments for conv_p!");
      return oct_retval;
    }    
    break;

  case 4:
    if (nlhs > 0) {
      error("No output arguments required for conv_p in in-place operating 
mode!");
      return oct_retval;
    }    
    break;

  case 5:
    if ( args(4).is_string() ) { // 5:th arg is a fftw wisdom string.
      std::string strin = args(4).string_value(); 
      buflen = strin.length();
      the_str = (char*) malloc(buflen * sizeof(char));
      for ( n=0; n<buflen; n++ ) {
        the_str[n] = strin[n];
      }

      // Valid strings are:
      //  '='  : In-place replace mode.
      //  '+=' : In-place add mode.
      //  '-=' : In-place sub mode.
      
      is_set = FALSE;
      
      if (strcmp(the_str,"=") == 0) {
        mode = EQU; 
        is_set = TRUE;
      }
      
      if (strcmp(the_str,"+=") == 0) {
        mode = SUM; 
        is_set = TRUE;
      }
      
      if (strcmp(the_str,"-=") == 0) {
        mode = NEG; 
        is_set = TRUE;
      }
      
      if (is_set == FALSE) {
        error("Non-valid string in arg 5!");
        return oct_retval;
      }
    }
    free(the_str);
    break;

  default:
    error("conv_p requires 3 to 5 input arguments!");
    return oct_retval;
    break;
  }



  const Matrix tmp = args(0).matrix_value();
  A_M = tmp.rows();
  A_N = tmp.cols();
  A = (double*) tmp.fortran_vec();

  const Matrix tmp2 = args(1).matrix_value();
  B_M = tmp2.rows();
  B_N = tmp2.cols();
  B = (double*) tmp2.fortran_vec();

  // Check that arg 2. 
  if ( B_M != 1 && B_N !=1 && B_N != A_N) {
    error("Argument 2 must be a vector or a matrix with the same number of rows 
as arg 1!");
    return oct_retval;
  }

  if (  B_M == 1 || B_N == 1 ) { // B is a vector.
    B_M = B_M*B_N;
    B_N = 1;
  }
    
  //printf("1: %d %d %d %d\n",A_M,A_N,B_M,B_N);

  //
  // Number of threads.
  //


  // Check that arg 3 is a scalar.
  if (  args(2).matrix_value().rows()  * args(2).matrix_value().cols() !=1) {
    error("Argument 3 must be a scalar!");
    return oct_retval;
  }

  N = (int) args(2).matrix_value().fortran_vec()[0];
  if (N < 1) {
    error("Argument 3 must be larger or equal to 1 (min number of CPUs)!");
    return oct_retval;
  }

  // Check n_cpus argument.
  if (N > A_N) {
    // Add a -v verbose flag to display stuff like this!
    //mexPrintf("Warning: n_cpus is larger then number of columns in first 
arg.\n");
    //mexPrintf("         Setting n_cpus = # cols in 1st arg!\n");
    N = A_N;
  }

  //
  // Register signal handlers.
  //

  if ((old_handler = signal(SIGTERM, &sighandler)) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }

  if ((old_handler_abrt=signal(SIGABRT, &sighandler)) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }
  
  if ((old_handler_keyint=signal(SIGINT, &sighandler)) == SIG_ERR) {
    printf("Couldn't register signal handler.\n");
  }


  if (nrhs == 3) { // Test for in-place/normal mode.

    //
    //
    // Normal (non in-place) mode.
    //
    //

    in_place = FALSE;

    //
    // Create/get output matrix.
    //
    
    Matrix Ymat(A_M+B_M-1, A_N);
    Y = Ymat.fortran_vec();

    //
    // Call the CONV subroutine.
    //
    
    running = TRUE;
  
    if (N>1) { // Use threads
      
      // Allocate local data.
      D = (DATA*) malloc(N*sizeof(DATA));
      if (!D) {
        error("Failed to allocate memory for thread data!");
        return oct_retval;
      }
      
      // Allocate mem for the threads.
      threads = (pthread_t*) malloc(N*sizeof(pthread_t));
      
      if (!threads) {
        error("Failed to allocate memory for threads!");
        return oct_retval;
      }
      
      for (thread_n = 0; thread_n < N; thread_n++) {
        
        line_start = thread_n * A_N/N;
        line_stop =  (thread_n+1) * A_N/N;
        
        // Init local data.
        D[thread_n].line_start = line_start; // Local start index;
        D[thread_n].line_stop = line_stop; // Local stop index;
        D[thread_n].A = A;
        D[thread_n].A_M = A_M;
        D[thread_n].A_N = A_N;
        D[thread_n].B = B;
        D[thread_n].B_M = B_M;
        D[thread_n].B_N = B_N;
        D[thread_n].Y = Y;
        
        // Starts the threads.
        err = pthread_create(&threads[thread_n], NULL, smp_process, 
&D[thread_n]);
        if (err != 0)
          error("Error when creating a new thread!\n");
      }
      
      // Wait for all threads to finnish.
      for (thread_n = 0; thread_n < N; thread_n++) {
        err = pthread_join(threads[thread_n], &retval);
        if (err != 0) {
          error("Error when joining a thread!\n");
          return oct_retval;
        }
      }
      
      
      // Free memory.
      if (D) {
        free((void*) D);
      }
      
      if (threads) {
        free((void*) threads);
      }
      
      
    } else{ // Do not use threads
      
      if (B_N > 1) { // B is a matrix.
        for (n=0; n<A_N; n++) {
          conv( &A[0+n*A_M], A_M, &B[0+n*B_M], B_M, &Y[0+n*(A_M+B_M-1)]);
          if (running==FALSE) {
            printf("conv_p: bailing out!\n");
            break;
          }
        }
      } else {// B is a vector.
        for (n=0; n<A_N; n++) {
          
          //printf("3: %p %p %p %f\n",A,B,Y,Y[0]);
          
          conv( &A[0+n*A_M], A_M, B, B_M, &Y[0+n*(A_M+B_M-1)]);
          
          //printf("4: %p %p %p %f\n",A,B,Y,Y[0]);
          
          if (running==FALSE) {
            printf("conv_p: bailing out!\n");
            break;
          }
        }
      }
    }
    
    //
    // Restore old signal handlers.
    //
    
    if (signal(SIGTERM, old_handler) == SIG_ERR) {
      printf("Couldn't register old signal handler.\n");
    }
    
    if (signal(SIGABRT,  old_handler_abrt) == SIG_ERR) {
      printf("Couldn't register signal handler.\n");
    }
    
    if (signal(SIGINT, old_handler_keyint) == SIG_ERR) {
      printf("Couldn't register signal handler.\n");
    }
    
    if (!running) {
      error("CTRL-C pressed!\n"); // Bail out.
      return oct_retval;
    }
    
    oct_retval.append(Ymat);
    return oct_retval;
    
  } else { 
    
    //
    //
    // In-place mode.
    //
    //

    in_place = TRUE;

    if (  args(3).matrix_value().rows() != A_M+B_M-1) {
      error("Wrong number of rows in argument 4!");
      return oct_retval;
    }

    if (  args(3).matrix_value().cols() != A_N) {
      error("Wrong number of columns in argument 4!");
      return oct_retval;
    }
    
    const Matrix Ytmp = args(3).matrix_value();

    Y = (double*) Ytmp.fortran_vec();

    //
    // Call the CONV subroutine.
    //

    running = TRUE;
  
    if (N>1) { // Use threads

      // Allocate local data.
      D = (DATA*) malloc(N*sizeof(DATA));
      if (!D) {
        error("Failed to allocate memory for thread data!");
        return oct_retval;
      }
    
      // Allocate mem for the threads.
      threads = (pthread_t*) malloc(N*sizeof(pthread_t));
    
      if (!threads) {
        error("Failed to allocate memory for threads!");
        return oct_retval;
      }

      for (thread_n = 0; thread_n < N; thread_n++) {
      
        line_start = thread_n * A_N/N;
        line_stop =  (thread_n+1) * A_N/N;
      
        // Init local data.
        D[thread_n].line_start = line_start; // Local start index;
        D[thread_n].line_stop = line_stop; // Local stop index;
        D[thread_n].A = A;
        D[thread_n].A_M = A_M;
        D[thread_n].A_N = A_N;
        D[thread_n].B = B;
        D[thread_n].B_M = B_M;
        D[thread_n].B_N = B_N;
        D[thread_n].Y = Y;
      
        // Starts the threads.
        err = pthread_create(&threads[thread_n], NULL, smp_process, 
&D[thread_n]);
        if (err != 0)
          error("Error when creating a new thread!\n");
      }
    
      // Wait for all threads to finnish.
      for (thread_n = 0; thread_n < N; thread_n++) {
        err = pthread_join(threads[thread_n], &retval);
        if (err != 0) {
          error("Error when joining a thread!\n");
          return oct_retval;
        }
      }
    
    
      // Free memory.
      if (D) {
        free((void*) D);
      }

      if (threads) {
        free((void*) threads);
      }
    

    } else{ // Do not use threads
      
      if (B_N > 1) { // B is a matrix.
        for (n=0; n<A_N; n++) {
          conv( &A[0+n*A_M], A_M, &B[0+n*B_M], B_M, &Y[0+n*(A_M+B_M-1)]);
          if (running==FALSE) {
            printf("conv_p: bailing out!\n");
            break;
          }
        }
      } else {// B is a vector.
        for (n=0; n<A_N; n++) {
        
          //printf("3: %p %p %p %f\n",A,B,Y,Y[0]);

          conv( &A[0+n*A_M], A_M, B, B_M, &Y[0+n*(A_M+B_M-1)]);

          //printf("4: %p %p %p %f\n",A,B,Y,Y[0]);

          if (running==FALSE) {
            printf("conv_p: bailing out!\n");
            break;
          }
        }
      }
    }

    //
    // Restore old signal handlers.
    //

    if (signal(SIGTERM, old_handler) == SIG_ERR) {
      printf("Couldn't register old signal handler.\n");
    }
   
    if (signal(SIGABRT,  old_handler_abrt) == SIG_ERR) {
      printf("Couldn't register signal handler.\n");
    }
  
    if (signal(SIGINT, old_handler_keyint) == SIG_ERR) {
      printf("Couldn't register signal handler.\n");
    }

    if (!running) {
      error("CTRL-C pressed!\n"); // Bail out.
      return oct_retval;
    }
  
    return oct_retval;
  }
  
}

reply via email to

[Prev in Thread] Current Thread [Next in Thread]