library(duflor)
#> Attaching duflor version 1.0.1 from library C:/Users/Claudius Main/AppData/Local/R/cache/R/renv/library/duflor-1030851c/windows/R-4.4/x86_64-w64-mingw32.
Introduction
Image analysis is a resource-consuming process. To give some context,
this package was built and designed for the analysis of up to 200 images
with a resolution of 6000x4000 = 24,000,000 pixels. As each pixel is
represented in HSV color space, each image consists of a
multidimensional matrix with 72,000,000 elements. As a result, each full
image consumes 576 MB of RAM.
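The 576 MB figure can be verified with quick arithmetic (R stores numeric matrices as 8-byte doubles):

``` r
px    <- 6000 * 4000   # 24,000,000 pixels per image
elems <- px * 3        # 72,000,000 elements across the H, S and V channels
elems * 8 / 1e6        # 8 bytes per element -> 576 MB
```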
The likely effectiveness of any measure outlined in this article
therefore heavily depends on your preexisting hardware and workflow
requirements. This problem is accentuated by various factors:
- General
  - number of images
  - image dimensions
  - programming language used for iteration-heavy steps
  - number of spectra extracted per image
- User-specific
  - specifications of the machine running the analysis process
  - parallelised computing
Not all of these factors can be addressed by the package author.
Within the scope of this page, *resources* is a general term for any of the following:
- RAM (both available and total)
- CPU Clock speed
- Storage speed
Finally, most of the aspects in this article are included for the sake of completeness; they are fairly general and should not be “new” to the reader.
1. General
Steps and advice outlined in this section are considered to hold true regardless of the PC’s specifications.
1.1 Number of images
It is self-explanatory that the analysis of 400 images will require considerably more resources than the analysis of fewer images.
1.2 Image dimensions
It should come as no surprise that smaller images are processed
faster, because fewer pixels must be checked against any set of color
boundaries. As a nice bonus, smaller images also contain fewer pixels
which can be considered a potential success, and thus the resulting
data objects can be smaller¹.
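If your workflow permits it, downscaling images before analysis is the simplest lever. A minimal sketch, assuming the imager package and a placeholder file name; the pixel count (and with it RAM usage and run-time) shrinks quadratically with the scale factor:

``` r
library(imager)

im       <- load.image("example.jpg")   # placeholder path; e.g. 6000 x 4000 px
im_small <- imresize(im, scale = 0.5)   # 3000 x 2000 px -> 1/4 of the pixels
dim(im)                                 # width, height, depth, color channels
dim(im_small)
```

Keep in mind that any area derived from pixel counts must then be rescaled by the square of the scale factor.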
1.3 Programming Language used for iteration-heavy steps
First things first: R is completely fine for a lot of tasks, and rewriting code in C++ can help, but does not always. In general, a re-implementation in C++ is only worthwhile if the code in question performs a very large number of iterations, or invokes recursion.
Within the scope of this package, determining for every pixel whether it belongs to one range or another can be utterly slow when implemented in raw R code. As a central and important step, quantifying matches for several HSV ranges (e.g. ‘drought’ pixels, ‘green’ pixels, ‘identifier’ pixels, …) further increases the time cost. On the author’s laptop (mobile Ryzen 5500U, 16 GB total RAM), the mean execution times over 32 sequential analyses of the same image (6000x4000 pixels) were
- R: ~4.02279 [s/iteration]
- cpp: ~2.01527 [s/iteration]
- cpp_optimised: ~0.594039 [s/iteration]
respectively. Needless to say, extrapolated over 200 images this difference adds up quickly. It could be argued that the slower implementation(s) should not be part of the package at all; however, they serve as an important stepping-stone for understanding the more optimised cpp implementation(s).
Loading is done in C++ via the CImg library, through the R package
imager. To quantify the green leaf area or root area in an image, every
pixel in the image must be checked. This can and will take time,
especially on high-resolution images. To counterbalance this, this step
is implemented in C++ via the Rcpp package, since iterating over large
matrices and checking each value against a condition is substantially
faster in C++ than in R.
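To illustrate the gap, here is a minimal sketch, not duflor’s actual implementation (the function names and logic are made up for this example), of counting the pixels that fall inside a single HSV range, once in vectorised R and once in C++ via Rcpp:

``` r
library(Rcpp)

# Plain R: vectorised comparison over an n-by-3 matrix of H, S, V values.
count_in_range_r <- function(hsv, lower, upper) {
  sum(hsv[, 1] >= lower[1] & hsv[, 1] <= upper[1] &
      hsv[, 2] >= lower[2] & hsv[, 2] <= upper[2] &
      hsv[, 3] >= lower[3] & hsv[, 3] <= upper[3])
}

# C++ via Rcpp: a single pass over the rows, no intermediate logical vectors.
cppFunction('
int count_in_range_cpp(NumericMatrix hsv,
                       NumericVector lower, NumericVector upper) {
  int n = hsv.nrow(), count = 0;
  for (int i = 0; i < n; ++i) {
    bool hit = true;
    for (int j = 0; j < 3; ++j) {
      if (hsv(i, j) < lower[j] || hsv(i, j) > upper[j]) { hit = false; break; }
    }
    if (hit) ++count;
  }
  return count;
}')
```

The vectorised R version is not slow per se, but it allocates several full-length logical vectors per range, which adds to the RAM pressure described in the introduction.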
1.4 Number of spectra extracted per image
Similar to 1.1, the number of spectra which have to be extracted from each image increases the execution time. In principle, the cost should scale proportionally with the number of spectra; however, benchmarks on my end do not support this trend. As I do not have the hardware required to run benchmarks at full efficiency, I cannot verify whether this assumption actually holds.
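For readers who want to probe this on their own hardware, a rough sketch reusing the hypothetical `count_in_range_cpp()` from the sketch in 1.3 (the HSV ranges below are arbitrary placeholders):

``` r
hsv <- matrix(runif(3e6), ncol = 3)   # 1,000,000 synthetic HSV pixels

ranges <- list(
  green      = list(lower = c(0.20, 0.3, 0.2), upper = c(0.45, 1, 1)),
  drought    = list(lower = c(0.05, 0.2, 0.2), upper = c(0.18, 1, 1)),
  identifier = list(lower = c(0.90, 0.4, 0.4), upper = c(1.00, 1, 1))
)

# Time the extraction of 1, 2 and 3 spectra from the same image data.
for (k in seq_along(ranges)) {
  t <- system.time(
    for (r in ranges[seq_len(k)]) count_in_range_cpp(hsv, r$lower, r$upper)
  )
  cat(k, "spectra:", t["elapsed"], "s\n")
}
```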
2. User-specific changes
2.1 Specifications of the machine running the analysis process
This section gives a brief overview of which parts of a computer running this package tend to be stressed the most, and is intended as a very simple baseline.
- RAM (a lot of number-crunching)
- CPU (a lot of number-crunching)
- GPU
A capable CPU is important, and so is a decent amount of RAM. It should be noted that parallelised evaluation requires significantly more RAM, since each parallel worker typically holds its own copy of the image data. See 2.2 Parallelised computing for more details.
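As a rough rule of thumb, my own heuristic rather than anything the package enforces, the number of useful parallel workers is bounded by both CPU cores and available RAM. Base R has no portable way to query free memory, so that value has to be supplied by hand:

``` r
cores       <- parallel::detectCores()
free_ram_mb <- 8000                 # e.g. 8 GB free; adjust to your machine
per_worker  <- 576 * 2              # ~576 MB per image plus working copies (a guess)
max_workers <- min(cores, free_ram_mb %/% per_worker)
max_workers
```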
2.2 Parallelised computing
Parallelisation is a simple solution to computations which take a lot
of time: simple in principle, somewhat complicated in detail. However,
the finer details of how and why parallelisation works are beyond the
scope of this article (and, to an extent, its author).
This package itself does not implement a parallel back-end. The app
provided by the duflor.gui package is capable of parallel execution;
refer to its documentation for further information.
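That said, nothing prevents you from parallelising the per-image loop yourself. A minimal sketch using base R’s parallel package, with a hypothetical analyse_image() standing in for your single-image workflow:

``` r
library(parallel)

image_paths <- list.files("images/", pattern = "\\.jpg$", full.names = TRUE)

cl <- makeCluster(4)   # remember: each worker needs RAM for its own image copy
results <- parLapply(cl, image_paths, function(path) {
  analyse_image(path)  # hypothetical per-image analysis wrapper
})
stopCluster(cl)
```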