[gnuastro-commits] master bb9b3fd: Library (data.h): arrays first loaded


From: Mohammad Akhlaghi
Subject: [gnuastro-commits] master bb9b3fd: Library (data.h): arrays first loaded in RAM, then as mmap'd files
Date: Thu, 22 Oct 2020 16:01:17 -0400 (EDT)

branch: master
commit bb9b3fd438ad06716d8e17956736f408f2422859
Author: Mohammad Akhlaghi <mohammad@akhlaghi.org>
Commit: Mohammad Akhlaghi <mohammad@akhlaghi.org>

    Library (data.h): arrays first loaded in RAM, then as mmap'd files
    
    Until now, an internal array was only allocated in the RAM when its size
    was smaller (in bytes) than the value given to the '--minmapsize'
    option. But this was annoying/buggy when the system has enough RAM to keep
    large files.
    
    With this commit, all Gnuastro programs will first attempt to write the
    array in RAM. Only when that fails (there is no more RAM left) will they
    use a memory-mapped file (which can dramatically slow down the program).
    This has been done by setting the default value of '--minmapsize' to an
    extremely large value, and adding a check after attempting to allocate
    the array in RAM. If the allocation fails, a memory-mapped file will be
    created.
    
    In the process, I also noticed that the arithmetic multi-operand operators
    for quantile and sigma-clipping weren't actually freeing one of their
    internal arrays, causing some wasted program memory. This has been
    corrected.
    
    This issue was found thanks to the Euclid OU-MER team and in particular
    on data provided by Martin Kuemmel.
---
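A minimal sketch of the RAM-first strategy described in the commit message
above, for illustration only. It is not the code of this commit: the names
'buffer_t', 'buffer_alloc' and 'buffer_free' are hypothetical, the
temporary file is placed in '/tmp' as an assumption, and plain POSIX
'mkstemp', 'ftruncate' and 'mmap' calls stand in for Gnuastro's own
'gal_pointer_allocate_mmap'. Note that on Linux with overcommitting
enabled, 'calloc' may succeed even when physical RAM is short, which is
why the patch below additionally consults '/proc/meminfo'.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical container: 'path' is an empty string when the buffer
   lives in RAM, otherwise it names the backing file. */
typedef struct
{
  void  *ptr;
  size_t size;
  char   path[64];
} buffer_t;

/* Try RAM first (unless 'size' exceeds 'minmapsize'); if that is not
   possible, fall back to a zero-filled memory-mapped temporary file.
   Returns 0 on success and -1 on failure. */
static int
buffer_alloc(buffer_t *b, size_t size, size_t minmapsize)
{
  int fd;

  assert(b != NULL && size > 0);
  b->ptr = NULL;
  b->size = size;
  b->path[0] = '\0';

  /* Normal case: allocate (and clear) the space in RAM. */
  if (size <= minmapsize)
    {
      b->ptr = calloc(1, size);
      if (b->ptr) return 0;
    }

  /* Fallback: create a uniquely named file of exactly 'size' bytes. */
  strcpy(b->path, "/tmp/sketch_mmap_XXXXXX");   /* assumed location */
  fd = mkstemp(b->path);
  if (fd == -1) { b->path[0] = '\0'; return -1; }
  if (ftruncate(fd, (off_t)size) == -1)
    { close(fd); unlink(b->path); b->path[0] = '\0'; return -1; }

  /* Map the file; the descriptor is not needed after mapping. */
  b->ptr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
  close(fd);
  if (b->ptr == MAP_FAILED)
    { unlink(b->path); b->path[0] = '\0'; b->ptr = NULL; return -1; }
  return 0;
}

/* Release the buffer and delete the backing file, if there is one. */
static void
buffer_free(buffer_t *b)
{
  if (b->path[0]) { munmap(b->ptr, b->size); unlink(b->path); }
  else free(b->ptr);
  b->ptr = NULL;
  b->path[0] = '\0';
}
```

In this sketch, passing '(size_t)-1' as 'minmapsize' gives the RAM-first
behavior, while passing '0' forces every buffer into a memory-mapped file.
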
 NEWS                         |  14 ++++
 THANKS                       |   1 +
 bin/gnuastro.conf            |  11 ++-
 doc/announce-acknowledge.txt |   1 +
 doc/gnuastro.texi            | 158 ++++++++++++++++++++++++++++++++------
 lib/arithmetic.c             |   2 +
 lib/data.c                   | 175 +++++++++++++++++++++++++++++++++++++++++--
 lib/pointer.c                |   6 +-
 8 files changed, 336 insertions(+), 32 deletions(-)

diff --git a/NEWS b/NEWS
index bab7e55..08d2012 100644
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,9 @@ See the end of the file for license conditions.
   Book:
    - Tutorial on "Detecting large extended targets" improved with better
      NoiseChisel configuration, and more clear description.
+   - New sub-section on "Memory management" in the "Common program
+     behavior" chapter. It fully describes how to optimally deal with large
+     datasets that may exceed your system's RAM.
 
   New program:
    - Query ('astquery') is a new program to allow easy submission of
@@ -79,6 +82,17 @@ See the end of the file for license conditions.
 
 ** Changed features
 
+  All programs:
+   - Memory management: Until now, an internal array was only allocated in
+     the RAM when its size was smaller (in bytes) than the value given to
+     the '--minmapsize' option. But this was annoying/buggy when the system
+     has enough RAM to keep large files. From this version, all Gnuastro
+     programs will first attempt to write the array in RAM. Only when that
+     fails (there is no more RAM left) will they use a memory-mapped file
+     (which can dramatically slow down the program). Please see the newly
+     added "Memory management" section of the book for a complete
+     explanation of Gnuastro's new memory management strategy.
+
   Table:
    - Column arithmetic operators 'degree-to-ra' and 'degree-to-dec' will
      return the sexagesimal format of '_h_m_s' and '_d_m_s'
diff --git a/THANKS b/THANKS
index 69d11f6..d3dacc6 100644
--- a/THANKS
+++ b/THANKS
@@ -55,6 +55,7 @@ support in Gnuastro. The list is ordered alphabetically (by family name).
     Mohammad-Reza Khellat                moha.khe@gmail.com
     Johan Knapen                         jhk@iac.es
     Geoffry Krouchi                      geoffrey.krouchi@etu.univ-lyon1.fr
+    Martin Kuemmel                       mkuemmel@usm.lmu.de
     Floriane Leclercq                    floriane.leclercq@univ-lyon1.fr
     Alan Lefor                           alefor@astr.tohoku.ac.jp
     Sebastián Luna Valero                sluna@iaa.es
diff --git a/bin/gnuastro.conf b/bin/gnuastro.conf
index 7c88e3d..40d5f8f 100644
--- a/bin/gnuastro.conf
+++ b/bin/gnuastro.conf
@@ -39,4 +39,13 @@
 
 # Operating mode
  quietmmap        0
- minmapsize       2000000000
+
+ # The default 'minmapsize' is set to the maximum possible value for signed
+ # 64-bit integers (half the full logical size of a 64-bit system, which is
+ # larger than 9 x 10^18 bytes). Effectively, this means that forced
+ # memory-mapping will be disabled. Therefore memory-mapping will only
+ # happen when memory cannot be allocated with the RAM (for any reason,
+ # mainly not having enough space). On 32-bit systems, this will
+ # automatically be read by the C library as the largest possible memory in
+ # those systems (~4.3 x 10^9 bytes).
+ minmapsize       9223372036854775807
diff --git a/doc/announce-acknowledge.txt b/doc/announce-acknowledge.txt
index 2b707cb..16c9ed7 100644
--- a/doc/announce-acknowledge.txt
+++ b/doc/announce-acknowledge.txt
@@ -1,5 +1,6 @@
 Alphabetically ordered list to acknowledge in the next release.
 
+Martin Kuemmel
 Sebastian Luna Valero
 Samane Raji
 Joanna Sakowska
diff --git a/doc/gnuastro.texi b/doc/gnuastro.texi
index 9ba81b3..51712f8 100644
--- a/doc/gnuastro.texi
+++ b/doc/gnuastro.texi
@@ -322,6 +322,7 @@ Common program behavior
 * Installed scripts::           Installed Bash scripts, not compiled programs.
 * Multi-threaded operations::   How threads are managed in Gnuastro.
 * Numeric data types::          Different types and how to specify them.
+* Memory management::           How memory is allocated (in RAM or HDD/SSD).
 * Tables::                      Recognized table formats.
 * Tessellation::                Tile the dataset into non-overlapping bins.
 * Automatic output::            About automatic output names.
@@ -6177,6 +6178,7 @@ When the output is a FITS file, all the programs also store some very useful inf
 * Installed scripts::           Installed Bash scripts, not compiled programs.
 * Multi-threaded operations::   How threads are managed in Gnuastro.
 * Numeric data types::          Different types and how to specify them.
+* Memory management::           How memory is allocated (in RAM or HDD/SSD).
 * Tables::                      Recognized table formats.
 * Tessellation::                Tile the dataset into non-overlapping bins.
 * Automatic output::            About automatic output names.
@@ -6535,29 +6537,16 @@ Also, if they are irrelevant for a program, these options will not display in th
 @table @option
 
 @item --minmapsize=INT
-The minimum size (in bytes) to store the contents of each main processing array of a program as a file (on the non-volatile HDD/SSD), not in RAM.
-This can be very useful when you have limited RAM, but need to process large datasets which can be very memory intensive.
-In such scenarios, without this option, the program will crash.
-
-A random filename is assigned to the array.
-This file will keep the contents of the array as long as it is necessary and the program will delete it as soon as its not necessary any more.
-
-If the @file{.gnuastro_mmap} directory exists and is writable, then the random file will be placed in there.
-Otherwise, the randomly named file will be directly written in the current directory with the @file{.gnuastro_mmap_} prefix.
-
-By default, the name of the created file, and its size (in bytes) is printed by the program when it is created and later, when its deleted/freed.
-These messages are useful to the user who has enough RAM, but has forgot to increase the value to @code{--minmapsize} (this is often the case).
-To suppress/disable such messages, use the @code{--quietmmap} option.
-
-When this option has a value of @code{0} (zero, strongly discouraged, see box below), all arrays that use this feature in a program will actually be placed in a file (not in RAM).
-When this option is larger than all the input datasets, all arrays will be definitely allocated in RAM and the program will run MUCH faster.
+The minimum size (in bytes) to memory-map a processing/internal array as a file (on the non-volatile HDD/SSD), and not use the system's RAM.
+Before using this option, please read @ref{Memory management}.
+By default processing arrays will only be memory-mapped to a file when the RAM is full.
+With this option, you can force the memory-mapping, even when there is enough RAM.
+To ensure this default behavior, the pre-defined value to this option is an extremely large value (larger than any existing RAM).
 
 Please note that using a non-volatile file (in the HDD/SDD) instead of RAM can significantly increase the program's running time, especially on HDDs (where read/write is slower).
-So it is best to give this option large values by default.
+Also, note that the number of memory-mapped files that your kernel can support is limited.
+So when this option is necessary, it is best to give it values larger than 1 megabyte (@option{--minmapsize=1000000}).
 You can then decrease it for a specific program's invocation on a large input after you see memory issues arise (for example an error, or the program not aborting and fully consuming your memory).
-
-The random file will be deleted once it is no longer needed by the program.
-The @file{.gnuastro} directory will also be deleted if it has no other contents (you may also have configuration files in this directory, see @ref{Configuration files}).
 If you see randomly named files remaining in this directory when the program finishes normally, please send us a bug report so we address the problem, see @ref{Report a bug}.
 
 @cartouche
@@ -7538,7 +7527,7 @@ This allows you to be much more productive in easily checking various ideas/assu
 
 
 
-@node Numeric data types, Tables, Multi-threaded operations, Common program behavior
+@node Numeric data types, Memory management, Multi-threaded operations, Common program behavior
 @section Numeric data types
 
 @cindex Bit
@@ -7652,7 +7641,132 @@ If you are writing your own program, you can use the @code{gal_data_copy_to_new_
 
 
 
-@node Tables, Tessellation, Numeric data types, Common program behavior
+@node Memory management, Tables, Numeric data types, Common program behavior
+@section Memory management
+
+@cindex Memory management
+@cindex Non-volatile memory
+@cindex Memory, non-volatile
+In this section we'll review how Gnuastro manages your input data in the available memory of your system.
+But before that, let's have a short basic introduction to the types of memory most relevant to this discussion.
+
+Input datasets (that are later fed into programs for analysis) are commonly first stored in @emph{non-volatile memory}.
+This is a type of memory that doesn't need any constant power to keep the data, like HDDs or SSDs and is primarily aimed for long-term storage.
+So data in this type of storage is preserved when you turn off your computer.
+But non-volatile memory is much slower in reading or writing than the speeds that CPUs can process the data.
+Thus relying on this type of memory alone would create a bad bottleneck in the input/output (I/O) phase of any processing.
+
+@cindex RAM
+@cindex Volatile memory
+@cindex Memory, volatile
+The first step to decrease this bottleneck is to have a faster storage space, but with more limited volume.
+For this type of storage, computers have a Random Access Memory (or RAM).
+RAM is classified as a @emph{volatile memory} because it needs a constant power source to keep the information.
+In other words, the moment power is cut-off, all the stored information in it is gone (hence the 'volatile' name).
+By assuming that sufficient power is always available, volatile memory is much faster than non-volatile memory.
+
+Hence, the general/simplistic way that programs deal with memory is the following (this is general to all programs, not just Gnuastro's):
+1) Load/copy the input data from the non-volatile memory into RAM.
+2) Use the copy in RAM as input for all the internal processing as well as the intermediate data that is necessary during the processing.
+3) Finally, when the analysis is complete, write the final output data back into non-volatile memory, and delete all the used space in the RAM (the initial copy and all the intermediate data).
+The RAM is most important for the data of the intermediate steps (that you never see as a user of a program!).
+
+When the datasets are small (compared to the available space in your system's RAM when they are run) the steps above are roughly accurate in Gnuastro's programs and libraries.
+The only exception is that deleting the intermediate data is not only done at the end of the program.
+As soon as an intermediate dataset is no longer necessary for the next internal steps, it is deleted.
+This allows Gnuastro programs to minimize their usage of your system's RAM.
+
+The situation gets complicated when the datasets are large (compared to your available RAM when they are run).
+For example if a dataset is half the size of your system's available RAM, and the program's analysis needs three or more intermediately processed versions of it in a certain phase of its analysis, there won't be enough RAM to keep those higher-level versions.
+In such cases, programs that don't do any memory management will crash.
+But fortunately Gnuastro's programs do have a memory management plan for such situations.
+
+@cindex Memory-mapped file
+When the necessary amount of space for an intermediate dataset cannot be allocated in the RAM, instead of crashing, Gnuastro's programs will not use the RAM for that dataset.
+They will use the ``memory-mapped file'' concept in modern operating systems to create a randomly-named file in your non-volatile memory and use that instead.
+That file will have the exact size (in bytes) of that intermediate dataset.
+Any time the program needs that intermediate dataset, the operating system will directly go to that file, and bypass your RAM.
+As soon as that file is no longer necessary for the analysis, it will be deleted.
+But as mentioned above, non-volatile memory has much slower I/O speed than the RAM.
+Hence in such situations, the programs will become noticeably slower.
+
+By default, Gnuastro's programs and libraries will notify you with a statement like below, the moment that an intermediate dataset is memory-mapped (can happen in any phase of their analysis).
+It shows the location of the memory-mapped file and its size, complemented with a small description of the cause and a pointer to this section of the book for more information on how to deal with it (if necessary).
+
+@example
+astarithmetic: ./.gnuastro_mmap/B1QgVf: temporary memory-mapped file
+(XXXXX bytes) for intermediate data that is not stored in RAM (see
+the "Memory management" section of Gnuastro's manual)
+@end example
+
+@noindent
+Finally, when the intermediate dataset is no longer necessary, the program will automatically delete it and let you know with a statement like this:
+
+@example
+astarithmetic: ./.gnuastro_mmap/B1QgVf: deleted
+@end example
+
+@noindent
+To disable these messages, you can run the program with @code{--quietmmap}, or set the @code{quietmmap} variable in the allocating library function to be non-zero.
+
+An important component of these messages is the name of the memory-mapped file.
+Knowing that the file has been deleted is important for the user if the program crashes for any reason: internal (for example a parameter is given wrongly) or external (for example you mistakenly kill the running job).
+In the event of a crash, the memory-mapped files will not be deleted and you have to manually delete them because they are usually large and they may soon fill your full storage if not deleted in a long time due to successive crashes.
+
+This brings us to managing the memory-mapped files in your non-volatile memory: knowing where they are saved, placing them in different places of your file system, or deleting them when necessary.
+As the examples above show, memory-mapped files are stored in a hidden sub-directory of the running directory: @file{.gnuastro_mmap}.
+If this directory doesn't exist, Gnuastro will automatically create it if it needs to memory-map an internal dataset.
+However, if the @file{.gnuastro_mmap} sub-directory exists and isn't writable, or it can't be created, then the file will be created in the running directory with a @file{.gnuastro_mmap_} prefix.
+
+Therefore one easy way to delete all memory-mapped files is to delete everything within this directory:
+
+@example
+rm -f .gnuastro_mmap/*
+@end example
+
+A much more common issue when dealing with memory-mapped files is their location.
+For example you may be running a program in a partition that is hosted by a HDD.
+But you also have another partition on an SSD (which has much faster I/O).
+So you want your memory-mapped files to be created in the SSD to speed up your processing.
+Another common scenario is this: you want your project source directory to only contain your plain-text scripts and you want your project's built products (even the temporary memory-mapped files) to be built in a different location because they are large and hard to maintain with the valuable scripts.
+
+To host the memory-mapped files in another location, you can set @file{.gnuastro_mmap} to be a symbolic link to that location.
+For example, let's assume you want your memory-mapped files to be stored in @file{/path/to/dir/for/mmap}.
+All you have to do is to run the following command before your Gnuastro analysis command:
+
+@example
+ln -s /path/to/dir/for/mmap .gnuastro_mmap
+@end example
+
+The programs will delete the memory-mapped file when it is no longer needed, but they won't delete the @file{.gnuastro_mmap} directory that hosts them.
+So if your project involves many Gnuastro programs and you want your memory-mapped files to be in a different location, you just have to make the symbolic link above once.
+
+Another memory-management scenario that may happen is this: you don't want a Gnuastro program to allocate internal datasets in the RAM at all.
+For example the speed of your Gnuastro-related project doesn't matter at that moment, and you have higher-priority jobs that are being run at the same time.
+In such cases, you can use the @option{--minmapsize} option that is available in all Gnuastro programs (see @ref{Processing options}).
+Any intermediate dataset that has a size larger than the value of this option will be memory-mapped, even if there is space available in your RAM.
+For example if you want any intermediate dataset larger than 1 megabyte to be memory-mapped, use @option{--minmapsize=1000000}.
+
+@cindex Linux kernel
+@cindex Kernel, Linux
+You shouldn't set the value of @option{--minmapsize} to be too small, otherwise even small intermediate values (that are very numerous) in the program will be memory-mapped.
+However the kernel can only host a limited number of memory-mapped files at every moment (by all running programs combined).
+For example in the default@footnote{If you need to host more memory-mapped files at one moment, you need to build your own customized Linux kernel.} Linux kernel on GNU/Linux operating systems this limit is roughly 64000.
+If the total number of memory-mapped files exceeds this number, all the programs using them will crash.
+Small/numerous intermediate values will rarely exceed a megabyte, so based on our experience, this is a good value in such scenarios.
+
+Actually, the default behavior for Gnuastro's programs (to only use memory-mapped files when there isn't enough RAM) is a side-effect of @option{--minmapsize}.
+The pre-defined value to this option is an extremely large value in the lowest-level Gnuastro configuration file (the installed @file{gnuastro.conf} described in @ref{Configuration file precedence}).
+This value is larger than the largest available RAM.
+You can check by running any Gnuastro program with a @option{-P} option.
+Because no dataset will be larger than this, by default the programs will first attempt to use the RAM for temporary storage.
+But if writing in the RAM fails (for any reason, mainly due to lack of available space), then a memory-mapped file will be created.
+
+
+
+
+
+@node Tables, Tessellation, Memory management, Common program behavior
 @section Tables
 
 ``A table is a collection of related data held in a structured format within a database.
diff --git a/lib/arithmetic.c b/lib/arithmetic.c
index fb33c6b..0138642 100644
--- a/lib/arithmetic.c
+++ b/lib/arithmetic.c
@@ -1139,6 +1139,7 @@ struct multioperandparams
       }                                                                 \
                                                                         \
     /* Clean up. */                                                     \
+    free(pixs);                                                         \
     gal_data_free(cont);                                                \
   }
 
@@ -1195,6 +1196,7 @@ struct multioperandparams
       }                                                                 \
                                                                         \
     /* Clean up. */                                                     \
+    free(pixs);                                                         \
     gal_data_free(cont);                                                \
   }
 
diff --git a/lib/data.c b/lib/data.c
index f41e29e..2d60805 100644
--- a/lib/data.c
+++ b/lib/data.c
@@ -92,6 +92,137 @@ gal_data_alloc(void *array, uint8_t type, size_t ndim, size_t *dsize,
 
 
 
+/* On the Linux kernel, due to "overcommitting" (which is activated by
+   default), malloc will not return NULL when we allocate more memory than
+   the physically available memory. It is possible to disable overcommitting
+   with root permissions, but I have not been able to find any way to do
+   this as a normal user. So the only way is to look into the
+   '/proc/meminfo' file (constantly filled by the Linux kernel) and read
+   the available memory from that.
+
+   Note that this overcommitting apparently only occurs on Linux. From what
+   I have read, other kernels are much more strict and 'malloc' will indeed
+   return NULL if there isn't any physical RAM to support it. So if the
+   '/proc/meminfo' doesn't exist, we can assume that 'malloc' works as
+   expected, until its inverse is proven. */
+static size_t
+data_available_ram()
+{
+  FILE *file;
+  int keyfound=0;
+  size_t *freemem=NULL;
+  size_t linelen=80, out=GAL_BLANK_SIZE_T;
+  char *token, *line, *linecp, *saveptr, delimiters[] = " ";
+  char *meminfo="/proc/meminfo", *keyname="MemAvailable", *units="kB";
+
+  /* If /proc/meminfo exists, read it. Otherwise, don't bother doing
+     anything. */
+  if ((file = fopen(meminfo, "r")))
+    {
+      /* Allocate space to read the line. */
+      errno=0;
+      line=malloc(linelen*sizeof *line);
+      if(line==NULL)
+        error(EXIT_FAILURE, errno, "%s: allocating %zu bytes for line",
+              __func__, linelen*sizeof *line);
+
+      /* Read it line-by-line until you find 'MemAvailable'.  */
+      while( getline(&line, &linelen, file) != -1 )
+        if( !strncmp(line, keyname, 12) )
+          {
+            /* Necessary for final check: */
+            keyfound=1;
+
+            /* We need to work on a copy of the line to avoid messing up the
+               contents of the actual line. */
+            gal_checkset_allocate_copy(line, &linecp);
+
+            /* The first token (which we don't need). */
+            token=strtok_r(linecp, delimiters, &saveptr);
+
+            /* The second token (which is the actual number we want). */
+            token=strtok_r(NULL, delimiters, &saveptr);
+            if(token)
+              {
+                /* Read the token as a number. */
+                if( gal_type_from_string((void **)(&freemem), token,
+                                         GAL_TYPE_SIZE_T) )
+                  error(EXIT_SUCCESS, 0, "WARNING: %s: value of '%s' "
+                        "keyword couldn't be read as an integer. Hence "
+                        "the amount of available RAM couldn't be "
+                        "determined. If a large volume of data is "
+                        "provided, the program may crash. Please contact "
+                        "us at '%s' to fix the problem",
+                        meminfo, keyname, PACKAGE_BUGREPORT);
+                else
+                  {
+                    /* The third token should be the units ('kB'). If it
+                       isn't, there should be an error because we currently
+                       assume kilobytes. */
+                    token=strtok_r(NULL, delimiters, &saveptr);
+                    if(token)
+                      {
+                        /* The units should be 'kB' (for kilobytes). */
+                        if( !strncmp(token, units, 2) )
+                          out=freemem[0]*1000;
+                        else
+                          error(EXIT_SUCCESS, 0, "WARNING: %s: the units of "
+                                "the value of '%s' keyword is (usually 'kB') "
+                                "isn't recognized. Hence the amount of "
+                                "available RAM couldn't be determined. If a "
+                                "large volume of data is provided, the "
+                                "program may crash. Please contact us at "
+                                "'%s' to fix the problem", meminfo, keyname,
+                                PACKAGE_BUGREPORT);
+                      }
+                    else
+                      error(EXIT_SUCCESS, 0, "WARNING: %s: the units of the "
+                            "value of '%s' keyword (usually 'kB') couldn't "
+                            "be read as an integer. Hence the amount of "
+                            "available RAM couldn't be determined. If a "
+                            "large volume of data is provided, the program "
+                            "may crash. Please contact us at '%s' to fix "
+                            "the problem", meminfo, keyname, PACKAGE_BUGREPORT);
+                  }
+
+                /* Clean up. */
+                if(freemem) free(freemem);
+              }
+            else
+              error(EXIT_SUCCESS, 0, "WARNING: %s: line with the '%s' "
+                    "keyword didn't have a value. Hence the amount of "
+                    "available RAM couldn't be determined. If a large "
+                    "volume of data is provided, the program may crash. "
+                    "Please contact us at '%s' to fix the problem",
+                    meminfo, keyname, PACKAGE_BUGREPORT);
+
+            /* Clean up. */
+            free(linecp);
+          }
+
+      /* The file existed but a keyname couldn't be found. In this case we
+         should inform the user to be aware that we can't automatically
+         determine the available memory.*/
+      if(keyfound==0)
+        error(EXIT_FAILURE, 0, "WARNING: %s: didn't contain a '%s' keyword "
+              "hence the amount of available RAM couldn't be determined. "
+              "If a large volume of data is provided, the program may "
+              "crash. Please contact us at '%s' to fix the problem",
+              meminfo, keyname, PACKAGE_BUGREPORT);
+
+      /* Close the opened file and free the line. */
+      free(line);
+      fclose(file);
+    }
+
+  /* Return the final value. */
+  return out;
+}
+
+
+
+
+
 /* Initialize the data structure.
 
    Some notes:
@@ -115,7 +246,8 @@ gal_data_initialize(gal_data_t *data, void *array, uint8_t type,
                     int clear, size_t minmapsize, int quietmmap,
                     char *name, char *unit, char *comment)
 {
-  size_t i;
+  size_t nouseram=1500000000;
+  size_t i, bytesize, availableram;
   size_t data_size_limit = (size_t)(-1);
 
   /* Do the simple copying cases. For the display elements, set them all to
@@ -188,18 +320,47 @@ gal_data_initialize(gal_data_t *data, void *array, uint8_t type,
         data->array=array;
       else
         {
+          /* If a size wasn't given, just set a NULL pointer. */
           if(data->size)
             {
-              if( gal_type_sizeof(type)*data->size > minmapsize )
-                /* Allocate the space into disk (HDD/SSD). */
+              /* Find the available RAM space (only relevant for Linux). */
+              availableram=data_available_ram();
+              bytesize=gal_type_sizeof(type)*data->size;
+
+              /* For a check: */
+              printf("check: %zu (data), %zu (ram)\n",
+                     bytesize, availableram-nouseram);
+
+              /* If the final size is larger than the user's maximum, or is
+                 larger than the available memory minus 500Mb (to leave the
+                 system some breathing space!), then read the array into
+                 disk using memory-mapping (HDD/SSD). */
+              if( bytesize > 1000000
+                  && ( bytesize > minmapsize
+                       || availableram < nouseram
+                       || bytesize > (availableram-nouseram) ) )
                 data->array=gal_pointer_allocate_mmap(data->type, data->size,
                                                       clear, &data->mmapname,
                                                       quietmmap);
               else
-                /* Allocate the space in RAM. */
-                data->array = gal_pointer_allocate(data->type, data->size,
-                                                   clear, __func__,
-                                                   "data->array");
+                {
+                  /* Allocate the necessary space in the RAM. */
+                  data->array = ( clear
+                        ? calloc( data->size,  gal_type_sizeof(data->type) )
+                        : malloc( data->size * gal_type_sizeof(data->type) ) );
+
+                  /* If the array is NULL. */
+                  if(data->array==NULL)
+                    data->array=gal_pointer_allocate_mmap(data->type,
+                                                          data->size,
+                                                          clear,
+                                                          &data->mmapname,
+                                                          quietmmap);
+
+                  /* The 'errno' is re-set to zero just in case 'malloc'
+                     changed it, which may cause problems later. */
+                  errno=0;
+                }
             }
           else data->array=NULL; /* The given size was zero! */
         }
diff --git a/lib/pointer.c b/lib/pointer.c
index e9204fd..7dfb287 100644
--- a/lib/pointer.c
+++ b/lib/pointer.c
@@ -155,8 +155,10 @@ gal_pointer_allocate_mmap(uint8_t type, size_t size, int clear,
 
   /* Inform the user. */
   if(!quietmmap)
-    error(EXIT_SUCCESS, 0, "%s: temporary %zu byte file (consider "
-          "'--minmapsize')", *filename, bsize);
+    error(EXIT_SUCCESS, 0, "%s: temporary memory-mapped file (%zu bytes) "
+          "for intermediate data that is not stored in RAM (see "
+          "the \"Memory management\" section of Gnuastro's manual)",
+          *filename, bsize);
 
 
   /* Write to the newly set file position so the space is allocated. To do
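
The 'data_available_ram' function added in 'lib/data.c' above reads the
'MemAvailable' line of '/proc/meminfo'. The following is a compact,
standalone sketch of the same idea using only the standard C library. It
is not the code of this commit: 'meminfo_available_bytes' and
'available_ram_bytes' are hypothetical names, a return value of 0 stands
for "could not be determined", and the sketch assumes that the 'kB' label
in this file denotes units of 1024 bytes (the patch above multiplies by
1000 instead).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Parse one line of text such as "MemAvailable:   8123456 kB" and return
   the value in bytes. Returns 0 when the line is not a well-formed
   'MemAvailable' line or when the unit is not 'kB'. */
static unsigned long long
meminfo_available_bytes(const char *line)
{
  char unit[8];
  unsigned long long value;

  assert(line != NULL);

  /* The literal key must match, followed by a number and a unit. */
  if (sscanf(line, "MemAvailable: %llu %7s", &value, unit) != 2)
    return 0;
  if (strcmp(unit, "kB") != 0)
    return 0;

  /* Assumption: 'kB' here means units of 1024 bytes. */
  return value * 1024ULL;
}

/* Scan the given file (normally '/proc/meminfo') line by line. Returns 0
   when the file cannot be opened or has no usable 'MemAvailable' line,
   for example on systems without a '/proc' file system. */
static unsigned long long
available_ram_bytes(const char *path)
{
  char line[256];
  unsigned long long out = 0;
  FILE *fp = fopen(path, "r");

  if (fp == NULL) return 0;
  while (fgets(line, sizeof line, fp))
    if ((out = meminfo_available_bytes(line)) != 0)
      break;
  fclose(fp);
  return out;
}
```

A caller would use 'available_ram_bytes("/proc/meminfo")' and treat a
result of 0 as "unknown", falling back to the plain allocation check.
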


