Massive Remote Batch Visualizer (Porting AVS/Express to HECToR)
Figure: AVS/Express distributed rendering of CT data
The visualization of large datasets has become a key bottleneck in applications where validation of results or data acquisition from scientific equipment is required at an early stage. Such validation would allow the correctness of methods (such as the setup of a physical experiment) to be determined before further computational or imaging machine resources are spent.
In this project we are specifically considering the volume datasets acquired from new X-ray imaging technologies, including the Diamond Light Source (mainly the I12 JEEP beamline: Joint Engineering, Environmental and Processing) and local facilities available at the University of Manchester. Such facilities are able to generate volume datasets of the order of 128 GB (4096^3 voxels at 16 bits per voxel).
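The quoted dataset size can be checked with a few lines of arithmetic. The sketch below assumes a raw, uncompressed cubic volume with no header; the function name is illustrative only.

```python
def volume_bytes(side, bits_per_voxel):
    """Raw storage, in bytes, for a cubic volume with `side` voxels per axis."""
    voxels = side ** 3
    return voxels * bits_per_voxel // 8


size = volume_bytes(4096, 16)
print(size)          # 137438953472 bytes
print(size / 2**30)  # 128.0 (GiB)
```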
To validate that the physical experiment has been set up correctly, such datasets must be rendered using suitable visualization techniques. They far exceed the capabilities of modern graphics hardware (GPUs), and so visualization systems are turning to parallel compute facilities, performing rendering tasks on multiple CPUs. While the GPU is often considered faster at rendering than the CPU, there are two advantages to using compute facilities. The first is availability of resources: the number of GPUs in a rendering cluster is often limited compared to the number of CPUs available in a compute system. The second is location of data: the volume datasets will typically already be available to compute facilities for processing and analysis, whereas moving them to a dedicated graphics cluster can introduce another bottleneck.
This project has ported an existing parallel rendering code to HECToR and improved the code's scalability so that large datasets can be visualized. Traditionally, visualization codes have been used interactively, allowing exploration of the dataset. Our intention was to provide a batch rendering facility in which predefined visualization applications generate suitable images for experimental validation without interactive use. This is similar to the work carried out by Bethune, where a smaller scale visualization system was ported to HPCx. While this batch mode of operation can be used, the performance of the code on HECToR also allows interactive use in some cases (dependent on dataset size and the amount of geometry being rendered).
The existing visualization code comprises a number of components. The main application is AVS/Express, a commercial visualization application developed by AVS Inc. and ourselves. The Distributed Data Renderer (DDR) version of this product, developed at the University of Manchester, is able to render data on distributed compute nodes where no GPU hardware is available. In essence, data is distributed to a number of compute processes (via MPI), each of which maps its portion of the data to geometry (e.g., by executing an isosurface algorithm). The geometry is then passed to a companion rendering process, which renders only its own small sub-volume of the overall dataset and so produces a small image of that sub-volume. These images are then composited together using a parallel compositing library, forming a final complete image of the dataset for the user to view. This type of rendering is referred to as sort-last rendering and allows much larger datasets to be rendered than could be handled by a single GPU, or indeed a single CPU.
Hence the components that have been ported are the main AVS/Express DDR application, an open source parallel compositing library (Paracomp from HP) and an open source software implementation of OpenGL (namely MesaGL). These components have been ported by the proposers to various versions of Linux, Irix and Solaris systems with up to 32 cores, but required further optimisation for the scales involved on HECToR.
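The compositing step of sort-last rendering can be illustrated with a minimal sketch. It assumes opaque geometry, so that each pixel of the final image is simply the nearest fragment produced by any rendering process. The pixel representation, the function name and the sample data are all hypothetical and are not taken from AVS/Express or Paracomp.

```python
INF = float("inf")
BG = (INF, "bg")  # background pixel: infinitely far away


def composite_depth(images):
    """Merge per-process images, keeping the nearest fragment at each pixel.

    Each image is a list of (depth, colour) pixels; a smaller depth is nearer.
    """
    result = list(images[0])
    for img in images[1:]:
        for i, (depth, colour) in enumerate(img):
            if depth < result[i][0]:
                result[i] = (depth, colour)
    return result


# Three hypothetical rendering processes, each producing a 4-pixel image.
img_a = [(0.5, "red"), BG, BG, (0.9, "red")]
img_b = [(0.7, "green"), (0.2, "green"), BG, BG]
img_c = [BG, (0.4, "blue"), BG, (0.1, "blue")]

final = composite_depth([img_a, img_b, img_c])
print([colour for _, colour in final])  # ['red', 'green', 'bg', 'blue']
```

Because the depth test is applied per pixel, the order in which the partial images arrive does not matter, which is what lets the sorting be deferred to the last stage.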
AVS/Express allows specific visualization techniques to be composed as an application (or network in its terminology). Such applications can be used as batch rendering jobs where the user simply supplies the dataset to be rendered and the number of processes to use for rendering. The flexibility of this type of application development system is that special consideration can be given to certain stages of the visualization pipeline. In particular, we have produced a module that provides parallel I/O facilities within the visualization application. This allows large datasets that have already been decomposed into sub-volumes to be stored as such and then read in efficiently by the visualization application. Alternatively, a single large volume of data can be read by the MPI processes using the parallel file system on HECToR. This is essential for the scalability of the proposed system, as initial tests have shown that dataset I/O, rather than the visualization techniques themselves, is the limiting factor.
To enable AVS/Express DDR to run on the HECToR system, a number of changes have been made to the express architecture. Normally express is run as an MPI job consisting of three process types: the main express process (always rank 0), which displays the familiar network editor user interface and visualization window; a number of parallel module processes, which execute the AVS module codes (such as p_read_field, p_isosurface) on a dataset decomposed into a number of domains; and a number of rendering processes, which receive geometry from the parallel module processes. The rendering processes render their subset of geometry and execute an image compositing stage to generate the final image back in the visualization window. Figure 1 summarises this architecture.
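When a single large volume is read by many MPI processes, each process needs to know which part of the file belongs to it. The sketch below shows one possible decomposition, into slabs of contiguous z-slices, and is not a description of the project's I/O module. It assumes a headerless raw file stored with z as the slowest-varying axis, and the function name and parameters are hypothetical.

```python
def slab_for_rank(rank, nprocs, nz, ny, nx, bytes_per_voxel=2):
    """Return (z_start, z_count, byte_offset, byte_count) for one process.

    Slices are shared as evenly as possible; the first `nz % nprocs`
    ranks receive one extra slice.
    """
    base, extra = divmod(nz, nprocs)
    z_count = base + (1 if rank < extra else 0)
    z_start = rank * base + min(rank, extra)
    slice_bytes = ny * nx * bytes_per_voxel
    return z_start, z_count, z_start * slice_bytes, z_count * slice_bytes


# Example: 10 slices of 8 x 8 16-bit voxels shared among 4 processes.
for rank in range(4):
    print(rank, slab_for_rank(rank, 4, nz=10, ny=8, nx=8))
```

Under these assumptions each process reads one contiguous byte range, so the ranges can be fetched independently with no process having to hold the whole volume.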
Figure 1: AVS/Express DDR MPI processes
On HECToR the express process has been removed from the MPI job so that it can run on a login node where X11 facilities are available (but MPI functionality is not). To reduce the impact on the AVS code base an MPI forwarding library has been developed that allows the express process to make MPI function calls but have them executed by a proxy process running on a HECToR back-end node. See Figure 2.
Figure 2: MPI Forwarding
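The forwarding idea can be sketched as a simple remote-call pattern: a stub on the login node serialises each call, and a proxy on the back-end decodes and executes it. Everything in the sketch is an assumption made for illustration. The JSON message format, the class names and the two placeholder handlers are hypothetical, no real MPI calls are made, and the transport is a direct function call standing in for whatever link connects the two nodes.

```python
import json


def encode_call(name, *args):
    """Serialise a call request as bytes (hypothetical wire format)."""
    return json.dumps({"call": name, "args": list(args)}).encode()


class Proxy:
    """Stands in for the back-end proxy; handlers are placeholders, not MPI."""

    def __init__(self, size):
        self.size = size
        self.outbox = []  # records what a real proxy would send on
        self.handlers = {"comm_size": self._comm_size, "send": self._send}

    def _comm_size(self):
        return self.size

    def _send(self, dest, payload):
        self.outbox.append((dest, payload))
        return None

    def handle(self, message):
        request = json.loads(message.decode())
        result = self.handlers[request["call"]](*request["args"])
        return json.dumps({"result": result}).encode()


class ForwardingStub:
    """Login-node side: looks like a local call, executes on the proxy."""

    def __init__(self, transport):
        self.transport = transport  # callable: request bytes -> reply bytes

    def call(self, name, *args):
        reply = self.transport(encode_call(name, *args))
        return json.loads(reply.decode())["result"]


proxy = Proxy(size=4)
stub = ForwardingStub(proxy.handle)
print(stub.call("comm_size"))
stub.call("send", 2, "geometry")
print(proxy.outbox)
```

Keeping the stub's interface close to the calls the application already makes is what limits the changes needed in the calling code.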
Another significant change was to replace the compositor library used by AVS, which contains no MPI communication layer, and provide an alternative compositing mechanism. In this case an implementation of 2-3 Swap compositing was developed. The core image compositing routines from the existing compositor may still be used to perform the blending of source images but all pixel data transport is now performed using MPI between the back-end rendering processes and the express process (for final image display).
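The sketch below illustrates the swap style of compositing using binary swap, the simpler power-of-two scheme that 2-3 swap generalises to arbitrary process counts. It is not the 2-3 swap implementation developed in this project. It simulates the processes in a single Python program, uses a depth test as the blend, and all names and sample data are hypothetical.

```python
def nearer(a, b):
    """Depth-test blend of two (depth, colour) pixels: keep the nearer one."""
    return a if a[0] <= b[0] else b


def binary_swap(images):
    """Simulate binary swap over len(images) processes (a power of two)."""
    p = len(images)
    assert p > 0 and p & (p - 1) == 0, "binary swap needs a power-of-two count"
    width = len(images[0])
    bufs = [list(img) for img in images]
    region = [(0, width)] * p  # pixel range [lo, hi) each process still owns
    step = 1
    while step < p:
        old = [list(b) for b in bufs]  # snapshot: exchanges are simultaneous
        for r in range(p):
            partner = r ^ step
            lo, hi = region[r]
            mid = (lo + hi) // 2
            # One partner keeps the lower half, the other keeps the upper half.
            keep = (lo, mid) if r & step == 0 else (mid, hi)
            for i in range(*keep):
                bufs[r][i] = nearer(old[r][i], old[partner][i])
            region[r] = keep
        step *= 2
    # Gather: each process contributes the piece it finished with.
    final = [None] * width
    for r in range(p):
        lo, hi = region[r]
        final[lo:hi] = bufs[r][lo:hi]
    return final


width, nprocs = 8, 4
images = [
    [(((r * 7 + i * 3) % 11) + r * 0.01, f"p{r}") for i in range(width)]
    for r in range(nprocs)
]
print([colour for _, colour in binary_swap(images)])
```

In each round a process exchanges half of its current region with a partner, so the amount of pixel data it handles halves every round, and at the end each process holds a fully composited 1/P of the image.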
There is a wiki within RCS at http://wiki.rcs.manchester.ac.uk/community/mrbv