ZEISS Knowledge Base

Imaging basics

What is an image?

This article attempts to explain one of the most basic concepts of image analysis, namely, what an image actually is.

To humans, who learn from birth to use their eyes, along with their other senses, to understand the world around them, the fact that image analysis seems rather complicated can be confusing. As humans, we can look at an image and instantly interpret its content. We can look at family pictures and recognize individuals. We might even look at photos of relatives we have never met and recognize familiar features. We can also look at micrographs and recognize cells, or individual cell components, and instantly recognize relationships and differences. But these abilities are built on years of learning, with concepts layered on top of other concepts in a way that only exists in a human brain. Many advances in artificial intelligence, and especially Deep Learning, seek to duplicate this process to simplify image analysis, but even for this task it is important to understand what an image actually IS.

To understand what an image is, perhaps it is best to start with what an image is not.

To paraphrase René Magritte, these are not cell nuclei:

This is an image of cell nuclei. More specifically, it is a matrix of intensity measurements of the light emitted by fluorophores, projected through a microscope lens onto a light-sensitive wafer of silicon, stored in a computer as binary numbers, and displayed as pixels of various intensity on the computer screen that you are looking at right now.

If we zoom in on one of these "nuclei", we can see more clearly that it is nothing like the actual object it represents:

An image, then, is a collection of pixels ("pixel" is short for picture element). But as far as the computer is concerned, each pixel is just a number:

The higher the number, the brighter the pixel is displayed on the screen.
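To make this concrete, here is a minimal sketch in Python (using NumPy and Matplotlib, with made-up numbers purely for illustration) showing that a small matrix of intensity values is all it takes to display an "image":

```python
# A tiny matrix of intensity values, displayed as a greyscale image.
# The numbers are invented for illustration only.
import numpy as np
import matplotlib.pyplot as plt

pixels = np.array([
    [  0,  10,  30,  10,   0],
    [ 10,  80, 200,  80,  10],
    [ 30, 200, 255, 200,  30],
    [ 10,  80, 200,  80,  10],
    [  0,  10,  30,  10,   0],
])

plt.imshow(pixels, cmap="gray", vmin=0, vmax=255)  # brighter pixel = higher number
plt.show()
```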

So now we can see that *image* analysis is actually *matrix* analysis. In many ways, the first step in image analysis is displaying these intensity values on the screen in a way that humans can interpret, but identifying individual objects so that we can measure their features is a process called segmentation. To learn about segmentation, please check the What is segmentation? article.

What is resolution

Resolution is a term that is talked about a lot in imaging, but its meaning can vary depending on the context. This article seeks to clarify the meaning of resolution in the context of microscopy, specifically for image analysis.

Introduction

Wiktionary lists 14 separate definitions for "resolution". In microscopy, resolution mostly refers to the ability of the optical system to separate fine features clearly. In photography, resolution usually refers to the number of pixels in a camera sensor. In imaging, resolution is something in between. It is formally defined as "the ability to separate the constituent parts of a whole".

In imaging, resolution is mostly defined by the following 4 factors:

  1. Optical resolution - how finely the optics separate points of light from their neighbors
  2. Sampling resolution - how finely the pixels on the camera sensor are separated
  3. Signal-to-noise ratio - how much light we are collecting compared to the noise generated by the system
  4. The bit depth of the imaging sensor - the degrees of separation between the minimum and maximum value recorded by the imaging device

The different types of resolution

Optical resolution

Whole books have been written on this topic by experts in optics, engineering, physics, and microscopy, and we cannot do it justice in the context of this KB. Essentially, optical resolution is defined by the numerical aperture (NA) of the lens and the wavelength of the light we are measuring. It is calculated slightly differently depending on whether we are working with transmitted light or fluorescence, but in short, a higher NA for the objective lens generally means less blurring of the light and therefore better separation of the signal from the sample. Individual points of light projected through a lens are spread into what is called an Airy disc. The diameter of the Airy disc limits the separation at which it becomes impossible to reliably tell two points of light apart.

In fluorescence microscopy, the resolution limit set by the Airy disc is usually expressed through the Rayleigh criterion, d = 0.61 λ / NA, where λ is the wavelength of the emitted light and NA is the numerical aperture of the objective. We can see from this formula that the shorter the wavelength of light, or the higher the NA, the smaller the Airy disc and the closer points of light can get to each other before we stop being able to tell them apart.
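As a quick worked example (the wavelength and NA below are illustrative values, not recommendations), the Rayleigh limit for green emission on a high-NA oil immersion objective works out to roughly a quarter of a micron:

```python
# Rayleigh criterion: d = 0.61 * wavelength / NA
wavelength_nm = 520   # illustrative green emission wavelength
na = 1.4              # illustrative high-NA oil immersion objective

d_nm = 0.61 * wavelength_nm / na
print(f"Smallest resolvable separation: {d_nm:.0f} nm")  # ~227 nm
```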

Sampling resolution

Imaging systems aren't made of just optical elements. To generate an image, we also need a sensor. In many optical systems, this sensor is a digital camera, typically based on either CCD or CMOS technology, where the light is projected onto a grid of individual light sensors we call pixels ("pixel" is a contraction of picture element). In laser scanning confocal microscopy, we use a single light sensor, scan the sample with a laser, and measure the light emission at regular intervals.

In either case, the light coming from the sample is not measured in a continuous fashion, but at specific intervals. The separation between those measurements is what we call the sampling resolution. 

Because of this, the ability to resolve objects in an image isn't limited by the optics alone; it is also limited by the sampling resolution. If the inter-pixel separation is equivalent to 1 µm when the light from the sample is projected onto the sensor, the smallest gap we can measure will be larger than this, since we need at least one pixel of separation to know reliably that a gap exists. Likewise, if we are looking for fine features of 1 µm width in the sample, we really need a sampling interval of at most half of that, so that each feature is covered by at least two pixels. In camera systems, the sampling resolution is the pixel separation on the sensor silicon divided by the overall magnification of the optics. This number also serves as the calibration for the size of features in the image. A 10x lens paired with a camera with 5 µm pixel separation has a sampling resolution of 0.5 µm.
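As a minimal sketch (with illustrative numbers only), we can compute the calibration and check whether a given camera/objective combination samples finely enough for a feature of interest:

```python
# Sampling resolution = physical pixel pitch on the sensor / total magnification.
sensor_pixel_um = 5.0      # pixel separation on the camera silicon
magnification = 10         # overall optical magnification

sampling_um = sensor_pixel_um / magnification
print(f"Sampling resolution: {sampling_um} um per pixel")    # 0.5 um

feature_um = 1.0           # smallest feature we want to resolve
# We want at least two pixels across the feature (sampling interval <= half its size)
print("Adequately sampled:", sampling_um <= feature_um / 2)  # True
```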

In the example above, the image on the left is under-sampled. This means that we cannot reliably say whether it shows one continuous object, or two distinct objects that are in close proximity.

The image on the right is over-sampled. We can see the gap between the two objects, but the pixels are so fine that we have a lot more data to process, without getting any valuable information that might allow us to make significantly better measurements of the separation in the image.

The image in the middle is a good compromise: we can still see the gap between the objects, with at least 2 pixels separating them, without collecting so much information that we put an unnecessary burden on our computer to store and process the data.

Noise

While looking at resolution so far we have only considered ideal scenarios. In practice, images are never without noise. Noise can come from a variety of factors, including:

  • Shot noise caused by the quantum nature of light
  • Electronic noise from the imaging sensor we use to measure light, including read noise and dark current
  • Out-of-focus light reaching the plane we are imaging from points further up or down the optical axis
  • Crosstalk or bleedthrough from fluorophores other than the one we are trying to measure

All of these, and their prominence relative to the signal we are trying to image, make up the signal-to-noise ratio. Let's look at how two of these work to better understand how they might affect our ability to separate objects.

When we measure the light coming from the sample, we usually have a system whereby a photon of light hits our sensor, and there is a given probability that this photon will excite an electron. Then, after a given period of time, we measure the voltage from the accumulated charge and store it as a proxy for the intensity of the light emission.

The probability that a photon will excite an electron is called the quantum efficiency. The excited electrons we measure are called photo-electrons. Shot noise comes from the fact that we may be measuring relatively few photons. If we have 10 photons of light hitting our sensor, with a quantum efficiency of 50%, the number of photo-electrons we collect could plausibly be anywhere from 0 to 10, because each photon independently may or may not excite an electron. We can think of it as flipping a coin. Over multiple series of 10 coin flips, we would expect the average to be 50/50 heads and tails, but it would not be that rare to end up with 10 heads in a row. However, if we take 1000 flips of the coin, the probability that none or all of them are heads diminishes towards 0. Likewise, as the number of photons we collect increases, the relative error in the photo-electron count of individual pixels diminishes. Because of this, generally, the longer the exposure time, or the brighter the signal, the less noisy the image will be.
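A quick simulation illustrates the coin-flip argument: the more photons we collect, the smaller the relative fluctuation in the photo-electron count becomes. This is a sketch with an assumed 50% quantum efficiency, purely for illustration:

```python
# Simulate photon detection as repeated coin flips (50% quantum efficiency).
import numpy as np

rng = np.random.default_rng(0)
quantum_efficiency = 0.5

for n_photons in (10, 100, 1000, 10000):
    # 100 000 repeated "exposures", each receiving n_photons photons
    electrons = rng.binomial(n_photons, quantum_efficiency, size=100_000)
    relative_noise = electrons.std() / electrons.mean()
    print(f"{n_photons:>6} photons -> relative noise {relative_noise:.1%}")
```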

At the same time, when we try to measure the accumulated charge from the excited photo-electrons, there is a certain degree of error in that measurement. We call this the read noise. When looking at specification sheets for cameras, the read noise is normally expressed as a number of electrons. If the read noise is 10 electrons, then we need to collect at least 10 more electrons in one pixel compared to its neighbor to reliably know that it collected more light. When we add the error in the measurement to the shot noise we get much closer to what you might actually expect from your imaging system.

In the end, the main concern is whether the signal-to-noise ratio allows us to reliably see the connections or gaps between objects. In our example above, we can see the connection between the axon and the spines in the high-exposure image, but the noise in the 100ms image is too high to reliably know that they are connected. We could, with our knowledge of the sample and its expected behavior, tweak a segmentation pipeline to make it more likely to make that connection, but that would introduce a bias into our analysis that should be clearly stated when communicating the results.

Bit depth

Finally, we need to consider how computers store those intensity measurements.

Most will be familiar with the concept that computers store values as 1s and 0s. When storing image data, each pixel is represented by a given number of bits that can each be 1 or 0; the number of bits we use to store the value is called the bit depth.

An image with a bit depth of 1 will have 1 bit per pixel, so each pixel in the image will be either 0 or 1. If we had 2 bits, each pixel can be 00, 01, 10, or 11. Those binary numbers aren't very intuitive for humans, so we instead translate this as 0, 1, 2, and 3. As we increase the number of bits, we increase the range of values by 2 to the power of the bit depth. So, in a 4-bit image, we can have 2^4 = 16 degrees of variation, and if a pixel collected no light it would have a value of 0 and if it collected as much light as we're able to collect its value would be 15. 

What does this mean for our digital camera images? As we saw, if we have a read noise of 10 electrons, we need more than a 10-electron difference to know that a pixel is truly brighter. Our camera might have the capacity to store up to 10 000 electrons per pixel; we call this the full well capacity. If our camera's full well capacity is 10 000 electrons, and our read noise is 10 electrons, the number of intensity levels we can reliably distinguish is 10 000 / 10 = 1000. So this camera can reliably measure 1000 levels of intensity, and we call this the dynamic range of the camera. 1000 isn't an exact power of 2, so instead we would most likely store the image in 10-bit, or 1024 greyscales, with the extra 24 greyscales not making a significant difference to the overall accuracy of the measurement.
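The same arithmetic can be written down directly: the bit depth needed is just the base-2 logarithm of the usable dynamic range, rounded up (using the illustrative camera numbers from the text above):

```python
import math

full_well_e = 10_000   # full well capacity, electrons
read_noise_e = 10      # read noise, electrons

dynamic_range = full_well_e / read_noise_e          # ~1000 usable grey levels
bits_needed = math.ceil(math.log2(dynamic_range))   # 10 bits
print(f"Dynamic range: {dynamic_range:.0f}:1 -> store as {bits_needed}-bit "
      f"({2**bits_needed} greyscales)")
```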

However, for a variety of reasons, computer data structures are easier to handle if they are made up of groups of 8 bits. So the most common image file formats usually store the intensity data as either 8 or 16-bit. 

8-bit gives us 256 greyscales, for values going from 0 to 255. This is not a lot compared to the 1000-greyscale dynamic range of the sensor we just discussed, but it is also roughly twice as many shades as the human eye can normally distinguish. In fact, because of this, most computer displays are limited to 8-bit display ranges, and consequently, higher bit depth images are displayed with this 8-bit display range.

With this in mind, we might ask what the utility is of storing the images in 16-bit, and this brings us back to what we mentioned earlier about measuring low levels of light.

A camera with a full well capacity of 10 000 electrons and a read noise of 10 electrons might be fine for a brightfield system where we are collecting a lot of light, but a high-end fluorescence imaging sensor for low-light imaging is more likely to have a full well capacity of 80 000 electrons and a read noise of 2 electrons, which would give us an effective dynamic range of 40 000 greyscales. However, in low-light imaging scenarios, the exposure time required for the highest intensity pixel to collect 80 000 photo-electrons might be measured in seconds. Exposing our sample for that length of time will probably be impractical for a range of reasons:

  • our sample might be very dynamic and might move or change during that time
  • such long exposure time might cause our excitation light to bleach the sample or kill it through photo-toxicity
  • waiting several seconds to take a single snapshot would also make acquiring multiple Z planes impractical

For all these reasons we might compromise and settle for a much shorter exposure time. 

So, let's consider a case where we might need an ideal exposure time of 10 s to collect 80 000 photo-electrons in at least one pixel. Instead, we might compromise and choose a 100 ms exposure time and only collect 800 photo-electrons, or 1/100th of the maximum intensity range. If we stored this image in 8-bit, we would only have 2, maybe 3, greyscales. If we stored it in 16-bit instead, we would have around 600 greyscales, and measurements of the difference between the brightest and darkest pixels in the image would be a lot more precise.

So, in the end, we store the images in 16-bit, not because we expect to accurately differentiate between 65 536 levels of intensity, but because we might only collect a small fraction of the maximum intensity range and we still want to be able to work with that.

Conclusions

We discussed a lot of factors that affect what image analysts call resolution, but in the end, the consideration of what is useful resolution is fairly simple. We just need to ask ourselves the question:

When looking at the image at the pixel level, can I reliably identify the links connecting objects or the gaps separating them?

If we can answer that question in the affirmative, then we can say that we have enough resolution in the image. If not, then considering the factors above should help in formulating a strategy to try and improve the resolution of our images. Of course, in the end, some compromises will most likely be required, but we should then be in a better position to judge the limitations of our analysis capabilities.

What is segmentation

What is segmentation?

In a nutshell, segmentation is the process by which software converts the pixels in the image, which are just intensity measurements, into objects whose features we can then characterize. Those features can include counts of how many objects there are, morphological properties (volume, surface area, sphericity, etc.), intensity characteristics to measure signal expression, or more complex inter-object relationships (number of children, distance to the nearest neighbor, etc.).

Segmentation is generally a two-step process: first, the pixels in the image are classified as belonging to either the background or the object class, and then the actual segments are created by establishing the relationships between neighboring pixels. The simplest and most common kind of segmentation is based on an intensity threshold, where pixels are classified according to whether their value is above or below a given threshold, and contiguous groups of object-class pixels are then identified as objects.

To better understand this process consider this very simple image:

Humans have no issue intuitively recognizing dozens of individual objects, and biologists might even interpret these as nuclei, but software doesn't work like that. What the software has is pixels, which are essentially just intensity measurements:

As a fun exercise, you can try to download this image as a CSV, import it into Excel, and use conditional formatting rules to color the pixels so that they are black if they have an intensity of 0, and white if they have an intensity of 255.

From this map of intensity values, we can set up a simple rule to the effect that all pixels with an intensity above a certain threshold are part of the object class. Here we can see highlighted in red all the pixels that meet our rule (value greater than 50).

Once the rule is established to classify each pixel in the image we then identify contiguous groups of pixels within the class as objects:

By analyzing the pixels in each group we can extract features of the object such as:

  • Volume (total number of pixels multiplied by the image calibration)
  • Surface Area (total number of exposed pixels on the outside of the group)
  • Mean intensity (sum of every pixel intensity in the group divided by the total number of pixels)
  • ...
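As a minimal sketch of this whole workflow (threshold, group, measure), here is a generic Python/SciPy version. The file name, threshold, and calibration are assumptions made purely for illustration, and this is not the arivis implementation:

```python
# A minimal sketch of intensity-based segmentation on a 2D greyscale image
# loaded as a NumPy array (e.g. from the CSV mentioned above).
import numpy as np
from scipy import ndimage

image = np.loadtxt("nuclei.csv", delimiter=",")  # hypothetical file name

# Step 1: classify pixels as object (True) or background (False)
mask = image > 50  # the same threshold rule as in the text

# Step 2: group contiguous object pixels into labelled objects
labels, n_objects = ndimage.label(mask)
print(f"Found {n_objects} objects")

# Step 3: extract simple features per object
pixel_size_um = 0.5                      # hypothetical calibration
for obj_id in range(1, n_objects + 1):
    obj_pixels = image[labels == obj_id]
    area_um2 = obj_pixels.size * pixel_size_um ** 2
    mean_intensity = obj_pixels.mean()
    print(f"Object {obj_id}: area {area_um2:.2f} um^2, "
          f"mean intensity {mean_intensity:.1f}")
```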

Instance vs. Semantic Segmentation

As we can see from the example above, once the pixels have been classified into their various classes, identifying objects is relatively easy. However, objects in an image are not usually so conveniently resolved. In many cases, objects can appear to touch or even overlap.

In such cases, a method that simply classifies pixels to identify contiguous groups would produce only one contiguous mass rather than discrete objects.

Such segmentation is known as semantic segmentation.

Separating this singular mass into individual objects is known as instance segmentation.
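To illustrate the difference, here is a generic instance-segmentation sketch using a distance-transform watershed in scikit-image. It only illustrates the general approach, not how any particular arivis tool works; the file name and parameters are assumptions:

```python
# A minimal sketch of separating touching objects with a distance-transform
# watershed (a common generic recipe, not the arivis implementation).
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed
from skimage.feature import peak_local_max

image = np.loadtxt("touching_nuclei.csv", delimiter=",")  # hypothetical file
mask = image > 50                  # semantic segmentation: one merged mass

# Distance to the background: peaks sit near the centre of each object
distance = ndimage.distance_transform_edt(mask)
peak_coords = peak_local_max(distance, min_distance=10, labels=mask.astype(int))
markers = np.zeros_like(mask, dtype=int)
markers[tuple(peak_coords.T)] = np.arange(1, len(peak_coords) + 1)

# The watershed splits the merged mass into one labelled region per marker
instance_labels = watershed(-distance, markers, mask=mask)
print(f"Separated into {instance_labels.max()} objects")
```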

Various methods exist in arivis for both semantic and instance segmentation, using both traditional intensity-based techniques and newer ML and DL tools. Examples of semantic segmentation in arivis Pro include:

  • Threshold based segmenter
  • Machine Learning segmenter (using a random forest pixel classifier)
  • Deep Learning Segmenter using ONNX models

And instance segmentation tools include:

  • Blob Finder
  • Region Growing
  • Watershed segmenter
  • Membrane based segmenter
  • Cellpose based segmenter
  • Deep Learning segmenter using arivis Cloud Instance models

To find out more about using Deep Learning for instance segmentation, please check the Deep Learning segmentation pipelines article. 

How does arivis handle large datasets

arivis solutions are known for their ability to handle large datasets. This article explains in a little more detail why and how this is done.

How do computers process image data?

Most people are familiar with the basic concept that computers deal with binary data, where every morsel of information that is stored and processed in a computer is reduced to a series of 1s and 0s. However, few truly understand how that affects the many ways computers store and process data. A complete explanation of the intricacies of computer data processing is much too large for the scope of this article, but some basic explanation may help clarify why arivis solutions do things the way they do.

The first thing to consider is how images are stored. Most will be familiar with the concept of Pixels. A pixel is very simply an intensity value for a specific point in a dataset. 

Looking at the example above, most human beings will, with little thought or effort, recognize a picture of an eye. But the data that is stored and displayed by the computer is a matrix of intensity values. What imaging software does is translate the numbers representing intensity values stored in the image file and display them as intensities on the screen for the purpose of visualization, or apply rules to individual pixels based on those intensity values to identify specific patterns like recognizing objects. To translate it into a more familiar concept, an image is essentially a spreadsheet of intensity values. 

Now, to come back to the idea of computers as binary machines that process 1s and 0s, these pixel values need to be stored in a way that makes sense to the computer. The smallest amount of information that a computer can process is called a bit. It is a value that can be either on or off, 1 or 0. Of course, when talking about intensity values, 1 or 0 is a very narrow range of possibilities. So, instead of using just one bit of data to store a pixel, we use multiple bits. For a 2-bit image, each pixel would be represented by two bits, so the complete range of possibilities would be 00, 01, 10, and 11. This gives us 4 degrees of variation. Each time we add a bit to our data structure we double the range of possible values. The more bits we use, the more degrees of separation we have between the minimum and maximum value.

But each bit also represents a certain amount of computing resource used to store and process the data contained therein. So, for example, an 8-bit image gives us 2^8 degrees of intensity variation, for intensities between 0 and 255, and requires 8 bits, or one byte, per pixel to store the file. In contrast, a 16-bit image gives us 2^16 degrees of variation (intensities between 0 and 65535), but also requires twice as much space on the disk and twice as much processing resource as the 8-bit image, since we are using twice as many bits to store and process the data.

This is important because all the image processing will be done by the computer, which has limited resources, and the data that is being processed needs to be held in the system's memory (RAM) for it to be available to the CPU. Moreover, the memory not only needs to hold the data the CPU is processing, but also the result of those operations, meaning that the computer really needs at least twice as much RAM as the data we are trying to process. 

Typically in data processing, when an application opens a file, the entirety of that file's data is loaded into the system memory. Any processing on that image then outputs the results to the memory, and when the application closes, only the necessary information is written back to the hard disk, either as a new file or as modifications to the existing files. The problem with this approach comes when the files to be processed grow beyond the available memory. This is a very common problem in imaging, as it is very easy when acquiring multidimensional datasets to generate files that are significantly larger than the available memory. Remember that each pixel usually represents 1 or 2 bytes of data. If we use a 1-million-pixel imaging sensor, each image will represent 1 or 2 megabytes of data. If we acquire a Z-stack, we can multiply that by the number of planes in the stack. If we acquire multiple channels, as is common in fluorescence microscopy, we multiply the size of the dataset by the number of channels. If we want to measure changes over time, we need to capture multiple time points. Each time we multiply the size of any of these dimensions, we also multiply the size of the dataset by the same factor. It is relatively trivial in today's microscopy environment to generate datasets that represent several terabytes of data.
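A quick back-of-the-envelope calculation shows how quickly this adds up; the acquisition parameters below are purely illustrative:

```python
# Rough estimate of raw dataset size for a hypothetical multidimensional acquisition.
width, height = 2048, 2048      # pixels per plane
bytes_per_pixel = 2             # 16-bit data
z_planes = 200
channels = 4
time_points = 500

size_bytes = width * height * bytes_per_pixel * z_planes * channels * time_points
print(f"Raw data: {size_bytes / 1024**4:.2f} TiB")   # ~3.05 TiB
```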

So what can we do to optimize the processing of such potentially large datasets?

Generally, in computing, there are a few specific bottlenecks that limit the ability to view and process data. These are:

  • The number of display pixels - Most computer systems only have between 2 and 8 million pixels of screen real estate. A larger display with more pixels will also typically require more memory and a better GPU.
  • Amount of memory available to the system - Most computers come with 8-16GB of memory as standard. High-end workstations can be configured with up to 2-4TB of memory, but these workstations typically cost tens of thousands of dollars, of which the memory alone could account for as much as two-thirds.
  • Central Processing Unit capabilities - Processing power is generally limited by several factors, from clock speed to core architecture. A common strategy to boost computing power is to use parallelization to split the processing across multiple cores. Not all tasks can be parallelized. 
  • Graphics Processing Unit capabilities - GPUs are essentially small computing units inside your computer dedicated to the task of displaying information in 3D. They also have limitations with regards to the number and speed of the cores that can be built into a chip, and the amount of video memory available.
  • Read/write speeds and storage capacity - As we'll see below, we can use temporary documents on the hard disk to work around memory limitations, but hard disks read/write speed can become important, especially when considering the cost/speed/capacity compromise for differing storage technologies. Also, many file formats don't readily allow a program to load only part of a file. 

Visualizing larger datasets

First, in terms of pure visualization, as mentioned above most computer displays only have 2 to 8 million pixels of display real estate. When a terabyte-scale image is loaded, it is almost impossible to physically display every pixel from the dataset on the screen at one time. Therefore, a system that spends minutes or hours loading the entire image into memory only to display a small fraction of these pixels at any one time is generally wasteful. Instead, arivis built a system that only loads the pixels that we need, at the resolution we need, as and when we need them.

Typically this is done by creating a so-called pyramidal file structure, where the image is stored with multiple levels of redundancy with regard to zoom levels, so that at the lowest zoom level only the lowest resolution layer is loaded, and as the user zooms in we remove that from memory and load the next, higher-resolution layer. This is generally very efficient with regard to memory use, but highly inefficient with regard to data storage, as the pyramidal file can end up being 1.5 times larger than the raw data just to give this ability to load the resolution as needed. Instead, arivis uses a redundancy-free file structure where the resolution is calculated as needed, meaning that even without compression the converted SIS file will typically be as little as a few hundred megabytes larger than the raw image data. On top of this, we can also apply lossless compression, of the same type as is used in standard compressed file formats like ZIP, to get 20-80% compression ratios depending on the image data, without any loss of information, which is critical for scientific image analysis.

We cover 3D visualization of large datasets in more detail here, but in short, here is a summary of the visualization concerns in 3D.

When rendering a dataset in 3D, similar constraints exist with regard to the number of pixels that can be displayed on the screen, but the rendering capabilities of the system are also limited by the graphics capabilities of the GPU and the available video memory. So, likewise, when rendering in 3D, arivis will first load into memory only as much of the image data as can be readily handled by the GPU to enable a smooth user experience. This leads to a small, but usually acceptable, loss in resolution for the purpose of interactive visualization and navigation. arivis Vision4D also gives the possibility to render at the highest level of resolution as and when needed and when user interactivity is no longer required.

Analyzing and processing larger datasets

When it comes to image processing, a simple workaround for memory availability is what is sometimes referred to as a divide-and-conquer approach or blocking. When using blocking, the software loads blocks of image data into the memory, then processes that block of data, and finally writes the results to a temporary document on the hard disk before loading the next block for processing. This very efficient method of processing does impose some constraints.

First, we need to have enough spare hard disk space to hold the processed images. However, since disk storage is generally significantly cheaper than RAM and there is usually a lot more of it available this is actually an asset of this approach. Our article outlining the System requirements has further details on this point. 

Second, a lot of image processing algorithms will be affected by the boundary pixels, meaning that we need to load a large enough margin on the side of the block we are interested in to allow for seamless stitching of the processed blocks.

Third, some image processing algorithms require the entire image to be held in memory. Because we are committed to making the software work with images of any size on any computer, we have specifically decided not to use these algorithms.
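To make the blocking idea concrete, here is a minimal generic sketch of block-wise processing with an overlap margin. It only illustrates the principle; it is not the arivis ImageCore implementation, and the filter and block sizes are arbitrary:

```python
# Block-wise ("divide and conquer") processing with an overlap margin.
import numpy as np
from scipy import ndimage

def process_in_blocks(image, block_size=512, margin=16):
    """Apply a filter block by block, using a margin to avoid edge artifacts."""
    result = np.empty_like(image)
    for y in range(0, image.shape[0], block_size):
        for x in range(0, image.shape[1], block_size):
            # Load the block plus a margin on each side (clamped to the image)
            y0, y1 = max(0, y - margin), min(image.shape[0], y + block_size + margin)
            x0, x1 = max(0, x - margin), min(image.shape[1], x + block_size + margin)
            block = ndimage.gaussian_filter(image[y0:y1, x0:x1], sigma=2)
            # Keep only the central region, discarding the margin
            result[y:y + block_size, x:x + block_size] = \
                block[y - y0:y - y0 + block_size, x - x0:x - x0 + block_size]
    return result
```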

This topic is covered in more detail in this article that explains why it may look like arivis is not using the full system resources.

The arivis file format

All of the approaches mentioned above require a method of file access that permits the software to readily load arbitrary blocks of data from the file, which not all file formats allow. To this end, arivis developed what we call arivis ImageCore. ImageCore is a combination of a file format that enables arbitrary access to any part of the file and the libraries implemented in the software that allow us to make use of this feature. Because this is central to the way arivis solutions operate, it also means that we do not *open* files such as TIFF, PNG, or any of the common microscopy imaging formats (CZI, LIF, OIF, ND2, etc), but instead import the data contained in those files into a new file capable of storing all the information in a way that makes it readily accessible to our software.

The files that arivis creates use the ".SIS" extension. SIS files are multidimensional, redundancy-free, pyramidal file structures with lossless compression support. This means that:

  • We can load the resolution that we need to display the images as and when we need them without needing huge amounts of RAM
  • We do so without adding a lot of redundant data to handle the different zoom levels, keeping the imported file size relatively small
  • We can handle images of virtually infinite sizes in a range of dimensions. SIS files support virtually limitless numbers of channels, planes, time points, and tiles of any width and height. The only real limit is the size of the storage in your computer. SIS files are technically 7-dimensional, where each pixel can be anything from 8-bit to 32-bit floating-point, and each pixel belongs to a specific X, Y, and Z location in a specific channel, time point, and image set. 
  • With lossless compression, we can further reduce the size of the imported file by 20-80% depending on the image data. 

So how does Vision4D work with my microscopy images?

Simply put, we don't. At least not directly. arivis Vision4D doesn't open any file in formats other than SIS. This means that to view and process your images we must first import them. To import a file into an SIS file we can simply drag and drop any supported file into an open viewer or our stand-alone SIS importer to start the import process. More details on the importing process can be found in the Getting Started user guide in the arivis Vision4D help menu. 

Once imported, the original file can be kept as backup storage for the raw data and the SIS file can be used as a working document for visualization and analysis. The importing process will, in most cases, automatically read and copy all the relevant image metadata (calibration, channel colours, time intervals etc). If some of the metadata is missing for any reason it can easily be updated and stored in the SIS file. SIS files support a range of storage features, including:

  • Calibration information
  • Time information
  • Visualization parameters (colors, display range, LUTs)
  • 3D Bookmarks (including opacity settings, clipping plane settings, 4D zoom and location...)
  • Measurements (including lines, segments, tracks etc)
  • Version history, allowing for undos even after closing and saving a file
  • Original file metadata not normally used in image analysis (e.g. microscope parameters like laser power, lens characteristics, camera details etc)

Because the original file is left unchanged, users can work with SIS files in arivis Vision4D and VisionVR with full confidence that the original data is safe, that changes to the SIS file are reversible, and that the imported file can be opened and processed on virtually any computer that has enough storage space to hold the data.

How does arivis render datasets that are larger than the video memory in 3D?

This article explains in some detail how arivis handles large data in the context of the 4D viewer.

Introduction

Generally, whatever is displayed on your computer screen has to be held in the computer's memory. This is true for 2D and 3D images. The memory requirement is mostly defined by the number of pixels we must display. This means that a JPEG image and a TIFF image of the same dimensions will need the same amount of memory, regardless of the amount of space they take up on the hard drive. So, a 1-megapixel image (1 million pixels) at 8 bits per pixel typically requires 1 megabyte of memory. For color images, where we have 3 channels for each pixel, that image would require 3MB of RAM. Some modern computer systems can display 10 bits of intensity resolution, which requires a little more memory, but this is not particularly relevant to this topic.

Most computer displays only have between 1 and 8 million pixels with which to display images, along with the rest of the user interface. So if we are dealing with very large images in 2D (images that are much larger than the display canvas) it is not usually worth loading the entire image in memory to then only display a fraction of the pixels. Because of this, many imaging software packages make use of Just-in-Time (JiT) loading to only put in memory those pixels that we can display, and if we zoom in on the image we will purge the memory of those pixels we are no longer displaying and load the pixels we were unable to show before.

arivis software solutions have all been built around these JiT strategies, and use an efficient file format to enable us to do this, which is why arivis only "opens" SIS files and any other file (CZI/TIFF, etc) must first be imported.

However, when dealing with 3D stacks, JiT loading becomes a lot more difficult since we now need to display multiple planes simultaneously, from different perspectives, while remaining interactive to keep the viewer useful. Because of this, we need a slightly different strategy.

Rendering 3D stacks as volumes

When we do volumetric rendering a few different things happen that we need to bear in mind. 

First, 3D rendering is best handled by graphics cards, which have their own dedicated memory, usually much smaller than the system memory. While most PCs have between 8-32GB of system memory, most GPUs only have between 4-12GB of video RAM. Of course, high-end computer systems can have a lot more than 32GB of RAM, but even very high-end GPUs rarely go above 32GB of VRAM.

32GB of VRAM is clearly a very large amount of video memory, and plenty to hold all but the largest imaging datasets, but beyond a certain point the amount of memory available becomes less important than the speed at which the GPU can process and render that amount of data.

One really important requirement for a volumetric viewer is interactivity. Without it, it is very difficult to get any sense of depth and of the spatial relationships within a dataset. At a minimum, we need to be able to render images 10-20 times per second. The only way we can currently do this is by only loading into the GPU a manageable amount of information. What is considered a "manageable" amount of information will depend greatly on how fast the GPU is, and sometimes on the amount of VRAM available.

The first time a user opens the 4D viewer after installing arivis, the software performs an automatic optimization of the 3D settings. This optimization is based on the amount of VRAM and the speed of the GPU. The amount of VRAM defines how much of a time series we can hold in cache to display it quickly when navigating through time (as far as the 4D viewer is concerned, each time point is essentially a whole separate stack). The speed of the GPU dictates the level of detail that we can display, or more precisely, the number of pixels we can display efficiently. For a low-end GPU, like what is included in so-called "on-board graphics", this might be as little as 16MPixels; for very high-end GPUs we might manage 8GPixels.

For reference, the Mega/Giga-pixel count for an image is the width x height x depth. So if your image is a 1000x1000x1000 cube, your image represents 1 billion pixels or 1GPixel.

So what happens if our image is larger than the maximum the GPU can handle smoothly? Simply, we downscale it to something more manageable. If the dataset is 3x larger than the GPU can handle we will load only a third of the total image. This happens automatically when you switch to the 4D viewer when the "loading data" progress bar shows at the bottom of the screen. 

The other thing we can see right at the bottom of the window is the GPU that arivis detected and is using and the current resolution we have loaded for the current dataset. In our example above, we have downsampled the volume to 44% of the original image stack, which represents a volume of 1490x1615x404 pixels, or 0.9GPixels.

Note that for volume rendering we can only use one GPU since this is not a task that can be shared across multiple cards.

So what effect does this have on what we see on the screen?

First, it means that you can dynamically navigate through the dataset and change visualization settings without a significant lag.

But second, it also means that you may not see some fine details, so small objects might not be clearly distinguishable. Also, the level of downscaling, together with patterns in the image, can sometimes result in Moiré patterns or stripes. The vertical stripes visible in the example below are one such sub-sampling artifact.

This kind of pattern is caused by the downsampling strategy we use as we try to strike a balance between the quality of the render and the time it takes to downsample the data. 

Note, however, that it is always possible to render full-resolution images if needed, though users should bear in mind that producing such renders requires a lot more computing resources and time:

How can we improve the rendering in the 4D viewer?

It depends on what one means by "improve". Generally, until computer graphics technology improves much beyond what is currently available, when it comes to large datasets we will always have to choose between quality and time. If we want a higher quality it will take more time to load, more time to render, and therefore a less interactive experience. If we want faster rendering we need to sacrifice quality. But first, it is important to remember that the purpose of the 4D viewer is to provide a way to manually interact with 3D data, which requires a certain level of interactivity and therefore a reasonable rendering frame rate. If our intention is to produce a high-quality screenshot or animation then we should use the HD Screenshot or Storyboard tools.

So what are the options available?

Upgrading the GPU

It may be worth considering whether there are faster GPUs available. If your PC is old, or your GPU is on the lower end of capabilities, then replacing the GPU could significantly reduce the amount of downscaling required. Laptop GPUs are generally significantly slower than their desktop equivalents, and so using a desktop workstation may be preferable if the quality of the view in the 4D viewer is important to you. Of course, if you already have a workstation with the software installed this may not be a practical solution, and in any case, it could be quite expensive. However, as mentioned above, once datasets get larger than a certain size, there are increasingly few GPUs that can handle them at full resolution, and beyond a certain size, downscaling is simply inevitable.

Note also that a more expensive GPU may not necessarily be faster. Gaming GPUs are generally just as fast as their "professional" equivalents, but significantly cheaper, with the difference in price coming down mostly to the type and amount of VRAM available and the availability of certified drivers for certain specific applications. As an example, you can get comparable rendering speed from an NVIDIA RTX 4090 as from an NVIDIA RTX 6000, but the RTX 6000 is 4x the price. See our article on recommended hardware configurations for more information.

Clipping the dataset to a manageable volume

Changing application preferences

Beyond hardware changes and region clipping, there are a few options available in the preferences that users can set to optimize the performance according to their requirements. We access the Preferences through the Extras menu.

First, it may be worth using the "Run Auto-Detection" button to get a good starting point based on our recommendation.

This will set the caching based on the available amount of VRAM, which is important if you frequently visualize time series in the 4D viewer. It will also set the Data Quality parameter, which is the amount of VRAM that will be used to display a single volume. The data quality scales from "Low" to "Ultra", but it is also possible to set the amount manually with the "Custom" option.

Note that the maximum will be dictated by the amount of VRAM available on the GPU.

A higher level of quality will generally result in finer details and fewer downscaling artifacts but will result in longer loading times and slower frame rates. Try increasing the quality by one rank and see if the performance remains acceptable for your personal use.

Note that displaying large numbers of segmented objects will add significantly to the GPU load, which combined with higher data quality can lead to very significant slow-downs.

Using the "Reduce down-scaling artifacts" option can improve image quality but takes longer to load.

Additional options are available that also have an effect on rendering speeds under the Render Settings tab. The details of these options can be found in the help files.

Conclusions

There has always been a race between the largest datasets we can collect and the computing capabilities available to process those datasets. The limit for what is considered a "large" dataset today will be different from what is considered a "large" dataset in the future, and while computer graphics might one day evolve to readily handle what we currently consider large, by then much larger datasets will likely exist and the problem will remain. Various strategies exist to handle this disparity, but in the end, we need to pick one. arivis currently uses the strategy described above because it is what we currently consider to provide the best compromise. These and other considerations are also discussed in this article about HD screenshots and storyboard processing times.

Please do let us know if you found this page useful, and feel free to contact your local support team for more information.

Why does processing a 3D animation movie export take so long?

This article examines the factors that affect the rendering time for exporting high-resolution snapshots and movies.

Overview

The time it takes to render a single snapshot is mostly dependent on the Data Resolution and Image Resolution settings. Higher data resolution means more data to process and more time. Reduce the Data and Image Resolution parameters to speed up the rendering time.

When rendering animations the framerate is also important. The higher the framerate the more images we have to render, and the longer it will take.

Introduction

Vision4D has been designed to enable scientists to work on images of virtually unlimited size. As long as you have enough hard disk space to hold the data, all functions of Vision4D are available regardless of the size of the dataset. However, processing more data will take more time.

Regarding the 4D viewer performance, the main limiting factor is the graphics card's capabilities. Specifically, the GPU is limited by the amount of video memory available (VRAM), and the speed/number of graphics processing cores.

When the 4D viewer is activated for the first time on a new installation, the software will run a system check to optimize the performance based on the given hardware. This means that depending on the speed of the GPU we will calculate how large a volume we can render at an acceptable level of performance. For a low-end GPU, like onboard graphics systems, this may mean a maximum of around 256 x 256 x 256 pixels. For high-end GPUs, we may be able to render up to 2000 x 2000 x 2000 pixels.

If the dataset is larger than what the GPU can handle smoothly, Vision4D will automatically subsample the image to a more practicable volume. This is why we see a "loading data" progress bar at the bottom of the viewer:

The other indication that we are subsampling is the Resolution values in the status bar. In our example above, the subsampled resolution is 1505x2090x306, which represents 17% of the actual image data.

While working with the 4D viewer, interactivity is clearly a very important factor. This means that we need to be able to render the volume speedily to enable a high frame rate to reduce lag. To enable speedy renders, Vision4D will usually temporarily drop the image data resolution while the volume is moving/rotating, and then enable a higher resolution once the volume is immobile.

Moving

Static

The drop in resolution is considered an acceptable trade-off for the speed increase that enables interactivity.

All this means that under normal circumstances, users can always smoothly visualize and navigate through even very large datasets on any computer they might be using the software on.

How is producing a high-resolution snapshot different?

When in the 4D viewer, we can always use the Snapshot button to quickly take a snapshot of the viewer that we can then paste into another application.

This process is instantaneous, but the resolution of the image is as it was in the viewer. It is essentially a screengrab.

In many cases, snapshots at that resolution are perfectly fine, but for an important presentation, or if the image will be displayed in a larger format (e.g. poster), getting a better quality image may be desirable. In those cases, we can use the high-resolution snapshot button instead. 

When using the high-resolution snapshot button, we can select from additional options to change the size and quality of the image output.

The two options that affect the quality and processing time are Image Resolution and Data Resolution.

The image resolution dictates how big the output image is in pixels. For a slideshow or text document illustration, a Full HD render is usually sufficient. For a large poster or visualization on a high-resolution screen, a higher resolution may be preferable. Going from 1080p HD to 4K UHD can lead to a 3-4x increase in rendering time.

The Data Resolution changes how much of the image data is used to render the image. A higher data resolution will result in a smoother image and finer detail. The scale of the Data Resolution bar goes from 64MB on the left, to whatever is the full data resolution on the right. Typically, each graduation means a doubling of the amount of image data that is used to render the image. Using more data to render the image results in finer details as mentioned above, but also results in longer loading times because we're loading more data, and longer rendering times because we're processing more data. 

Note that the Data Resolution scale is color-coded according to the size of the dataset and the computing resources available.

The green part of the scale represents the amount of video memory available. As long as we stay within the green range, the data selected will be loaded into the GPU's VRAM and the processing time will generally be fast.

Most computers have more system memory (RAM) than video memory (VRAM), so if the dataset is larger than the amount of VRAM available, we may be able to load it into the RAM instead. The loading time is likely to be longer because we are loading more data, but the rendering time will be longer still because the GPU now has to access and process the data from the RAM, which is slower than reading from the VRAM. So, while doubling the data resolution while staying in VRAM can result in a 3x rendering time increase, doing so while switching to RAM causes a 5x increase in rendering time.

Large datasets (those larger than the RAM available on the system) can't be loaded all at once in either the RAM or VRAM. In these cases Vision4D still allows users to generate high-resolution snapshots and videos, but the software will proceed by loading smaller chunks of the data, one at a time, processing each in turn before loading the next block and so on until the full dataset has been processed. Clearly, this method is much slower, not only because of the larger amounts of data we must process but also because the process of subdividing the image is comparatively inefficient. Consequently, processing datasets at a resolution larger than what is possible to store in RAM will be much slower. 

Note that using a very high level of Data Resolution coupled with a low Image Resolution is likely to lead to much longer processing times without significant improvements in image quality, at least at a low zoom level. A high level of Data Resolution is most useful if the snapshot uses a highly zoomed-in region of the volume.

So what about movies?

We can use the storyboard to create 3D animations, and in that case we have to render multiple images, which will clearly take longer than a single snapshot. So what affects the rendering time for a movie?

First, everything mentioned above concerning high-resolution snapshots is also valid for animations. So again, the data resolution and image resolution both affect the time it takes to render individual frames.

Additionally, when exporting animation we have a few additional options:

The first two, file name and video format, have no significant effect on production time. They are just necessary because, unlike snapshots that will be relatively small and can be held in memory before we save them, movies create much larger file sizes.

We mentioned the video resolution and data resolution above, and clearly, these will have an effect. Note also that if your animation includes progressions through time, the system will have to load each time point it renders, which will also increase the total processing time.

But now we have one more setting, which is the framerate. This defines how many individual images the render will produce. If your animation is 10s long, and your production framerate is 30FPS, the software will need to render 10x30=300 images. So, if the software takes 2 seconds, not including the loading time, to render 1 frame, it will need 600 seconds (10min) to render the full movie. If instead, we render at 10FPS we only need to render 100 images and the production time will be about 3 minutes. Note that reducing the frame rate also leads to significantly smaller file sizes.
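The arithmetic is simple enough to write down as a small helper; the 2-second per-frame figure is just the example above, and real per-frame times depend on the data resolution, image resolution, and loading times:

```python
# Back-of-the-envelope estimate of movie export time, ignoring loading time.
def estimate_export_time(duration_s, fps, seconds_per_frame=2.0):
    frames = duration_s * fps
    return frames * seconds_per_frame

print(estimate_export_time(10, 30) / 60)  # 10 s clip at 30 FPS -> 10.0 minutes
print(estimate_export_time(10, 10) / 60)  # 10 s clip at 10 FPS -> ~3.3 minutes
```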

In most cases, a framerate of 25FPS is ample for smooth animations. For smaller videos that fit more easily in a slide deck, reducing the framerate to 10FPS may be acceptable. The video will not be as smooth, but this may be enough for your purposes.

When rendering videos for visualization in a 360 video that people might look at through a headset a higher framerate may be preferable to reduce the flicker effect, but you will also need to use a very performant video player to ensure it is capable of playing back the larger amounts of data at the increased frame rate.

Conclusions

The time it takes to render images is dependent on several factors, including data resolution, the size of the image or video we are producing, and the framerate. As with many things, higher quality and bigger datasets lead to longer processing times. 

At arivis, we decided that everything a user might want to do should be possible regardless of the size of the dataset, so rendering videos at the highest possible quality is always an option, but that does not mean the process won't take a lot of time, or that some compromises aren't necessary if time is an important factor.

What is metadata, and why is it important?

This article explains what metadata is, why it is important, and how arivis deals with metadata during import and image processing.

What is metadata?

Metadata is information about an image

As explained in this article about what an image is, an image file is essentially a collection of intensity values. However, a list of numbers on its own isn't an image, and in any case, every type of computer file depends on metadata of some sort for the computer to handle the file correctly.

Some metadata is universal to all computer files. This includes things like the size of the file, the date the file was created and last edited, read & write privileges, etc. But most file metadata is format specific. A PDF document will include formatting and pagination information, an Excel spreadsheet will include the number and names of worksheets and formulas, and an image file will also have its own metadata, kept in a different structure depending on the exact file format. JPEGs and TIFF images might well keep the same metadata, but in different locations and with different labels.

So what kind of metadata does an image file typically include?

First, metadata will include a lot of structural information. This includes things like:

  • Bit depth of the image
  • Width and height of the image
  • Additional dimensional information concerning time series, Z-planes and image sets

In the context of scientific image analysis, some other metadata might also be really important, including:

  • Pixel dimension calibrations
  • Time intervals between timepoints
  • Exposure parameters
  • Lens information such as magnification, numerical aperture and refractive indices
  • Channel information like laser power and emission wavelength

And many more.
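As a small illustration of structural metadata, here is a sketch that reads basic information from a hypothetical TIFF file with Pillow. Proprietary microscopy formats generally need their own format-specific readers, so this only shows the general idea:

```python
# Inspect basic image metadata with Pillow (file name is hypothetical).
from PIL import Image

with Image.open("example.tif") as img:
    print("Width x height:", img.size)          # structural metadata
    print("Pixel mode:", img.mode)               # relates to bit depth, e.g. 'I;16'
    print("Number of frames:", getattr(img, "n_frames", 1))
    # Raw TIFF tags (resolution, software, acquisition details, etc.)
    for tag_id, value in img.tag_v2.items():
        print(tag_id, value)
```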

Some of those bits of metadata are clearly more important than others when it comes to image analysis. Information like the spatial and temporal calibrations is crucial to the correct interpretation and visualisation of the images. Channel colour information and display range parameters are also clearly very useful for visualisation, but have no effect on segmentation results. Information about the name and model of the microscope used can be useful for debugging file compatibility issues, but is generally of little interest to image analysts.

The choice of what metadata to include or omit when writing the file is up to the engineering team that creates and maintains the specific file format. Typically this will be down to the company that manufactures the imaging device, or at least to the engineer who codes the software controlling the various devices connected to an imaging setup.

Most microscopy files include all this metadata, plus device specific information, and arivis has been engineered to read it and store it in the SIS files it creates.

Metadata such as calibration information and channel visualisation options (colours, display range etc) are automatically translated into their equivalent fields in the SIS metadata where possible. Other metadata which is not relevant to the image analysis process is usually kept in a special metadata container with the SIS file.

Finally, arivis will also create and add its own metadata. This includes objects that are created on the image (segments, tracks, ROIs, etc.), object features (volume, area, intensity, etc.), but also pipelines a user creates, the modification history, and more. Not all of this information is kept in the actual SIS file. In particular, the objects created by a user, either manually or with a pipeline, are stored in a .OBJECTS file that is separate from the SIS file but uses the same name. As an example, here you can see a CZI file and the files created by arivis during the import process:

If we transfer the SIS file to another location, it is important to also copy these additional files to ensure that we do not lose any metadata.
