(Radiographics. 1999;19:783-806.)
© RSNA, 1999
IMAGING & THERAPEUTIC TECHNOLOGY
Three-dimensional Visualization and Analysis Methodologies: A Current Perspective1
Jayaram K. Udupa, PhD
1 From the Department of Radiology, University of Pennsylvania, 423 Guardian Dr, Philadelphia, PA 19104-6021. Received April 21, 1998; revision requested May 21 and received December 14; accepted December 21. Address reprint requests to the author.
Abstract
Three-dimensional (3D) imaging was developed to provide both qualitative and quantitative information about an object or object system from images obtained with multiple modalities including digital radiography, computed tomography, magnetic resonance imaging, positron emission tomography, single photon emission computed tomography, and ultrasonography. Three-dimensional imaging operations may be classified under four basic headings: preprocessing, visualization, manipulation, and analysis. Preprocessing operations (volume of interest, filtering, interpolation, registration, segmentation) are aimed at extracting or improving the extraction of object information in given images. Visualization operations facilitate seeing and comprehending objects in their full dimensionality and may be either scene-based or object-based. Manipulation may be either rigid or deformable and allows alteration of object structures and of relationships between objects. Analysis operations, like visualization operations, may be either scene-based or object-based and deal with methods of quantifying object information. There are many challenges involving matters of precision, accuracy, and efficiency in 3D imaging. Nevertheless, 3D imaging is an exciting technology that promises to offer an expanding number and variety of applications.
Index Terms: Computed tomography (CT); Computers; Computers, simulation; Images, analysis; Images, display; Images, processing; Magnetic resonance (MR); Single-photon emission tomography (SPECT); Ultrasound (US)
INTRODUCTION
The main purpose of three-dimensional (3D) imaging is to provide both qualitative and quantitative information about an object or object system from images obtained with multiple modalities including digital radiography, computed tomography (CT), magnetic resonance (MR) imaging, positron emission tomography (PET), single photon emission computed tomography (SPECT), and ultrasonography (US). Objects that are studied may be rigid (eg, bones), deformable (eg, muscles), static (eg, skull), dynamic (eg, heart, joints), or conceptual (eg, activity regions in PET, SPECT, and functional MR imaging; isodose surfaces in radiation therapy).
At present, it is possible to acquire medical images in two, three, four, or even five dimensions. For example, a two-dimensional (2D) image might be a digital radiograph or a tomographic section obtained with CT, MR imaging, PET, SPECT, or US; a 3D image might represent a volume of tomographic sections of a static object; a time sequence of 3D images of a dynamic object constitutes a four-dimensional image; and images of a dynamic object acquired over a range of parameters (eg, MR spectroscopic images of a dynamic object) constitute a five-dimensional image.
It is not currently feasible to acquire truly realistic-looking four- and five-dimensional images; consequently, approximations are made. In most applications, the object system being investigated consists of only a few static objects. For example, a 3D MR imaging study of the head may focus on white matter, gray matter, and cerebrospinal fluid.
A textbook with a systematic presentation of 3D imaging is not currently available. However, edited works may be helpful for readers unfamiliar with the subject (1–3). The reference list at the end of this article is representative but not exhaustive.
In this article, we provide an overview of the current status of the science of 3D imaging, identify the primary challenges now being encountered, and point out the opportunities available for advancing the science. We describe and illustrate the main 3D imaging operations currently being used. In addition, we delineate major concepts and attempt to clear up some common misconceptions. Our intended audience includes developers of 3D imaging methods and software as well as developers of 3D imaging applications and clinicians interested in these applications. We assume the reader has some familiarity with medical imaging modalities and a knowledge of the rudimentary concepts related to digital images.
CLASSIFICATION OF 3D IMAGING OPERATIONS
Three-dimensional imaging operations can be broadly classified into the following categories: (a) preprocessing (defining the object system to create a geometric model of the objects under investigation), (b) visualization (viewing and comprehending the object system), (c) manipulation (altering the objects [eg, virtual surgery]), and (d) analysis (quantifying information about the object system). These operations are highly interdependent. For example, some form of visualization is essential to facilitate the other three classes of operations. Similarly, object definition through an appropriate set of preprocessing operations is vital to the effective visualization, manipulation, and analysis of the object system. We use the phrase "3D imaging" to collectively refer to these four classes of operations.
A monoscopic or stereoscopic video display monitor of a computer workstation is the most commonly used viewing medium for images. However, other media such as holography and head-mounted displays are also available. Unlike the 2D computer monitor, holography offers a 3D medium for viewing. The head-mounted display basically consists of two tiny monitors positioned in front of the eyes as part of a helmetlike device worn by the user. This arrangement creates the sensation of being free from one's natural surroundings and immersed in an artificial environment. However, the computer monitor is by far the most commonly used viewing medium, mainly because of its superior flexibility, speed of interaction, and resolution compared with other media.
A generic 3D imaging system is represented in Figure 1. A workstation with appropriate software implementing 3D imaging operations forms the core of the system. A wide variety of input or output devices are used depending on the application. On the basis of the core of the system (ie, independent of input or output), 3D imaging systems may be categorized as those having (a) physician display consoles provided by imaging equipment vendors, (b) image processing and visualization workstations supplied by other independent vendors, (c) 3D imaging software supplied independent of the workstation, and (d) university-based 3D imaging software (often freely available via the Internet).
Systems produced by scanner manufacturers and workstation vendors usually provide effective solutions but may cost $50,000–$150,000. For users with expertise in accessing, installing, and running the software, university-based 3D imaging software is available that can provide very effective, inexpensive solutions. For example, for under $5,000 it is possible to configure a complete system running on modern personal computers (eg, Pentium 300; Intel, Santa Clara, Calif) that performs as well as or even better than the costly systems described in the other three categories.
Terminology
Some frequently used terms in 3D imaging are defined in the Table and illustrated in Figure 2.

Figure 2. Drawing provides graphic representation of the basic terminology used in 3D imaging. abc = scanner coordinate system, rst = display coordinate system, uvw = object coordinate system, xyz = scene coordinate system.
The region of the image corresponding to the anatomic region of interest is divided into rectangular elements (Fig 2). These elements are usually referred to as pixels for 2D images and voxels for 3D images; in this article, however, we refer to them as voxels for all images. In 2D imaging, the voxels are usually squares, whereas in 3D imaging they are cuboids with a square cross section.
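As a small illustration of this voxel geometry, the sketch below computes the scene coordinates of a voxel center and the volume of one voxel. It assumes (for illustration only) that the square cross section lies in the xy plane with side `pixel_size`, that each voxel extends `slice_spacing` along z, and that the scene origin sits at the corner of voxel (0, 0, 0); real scanners and file formats use a variety of conventions.

```python
def voxel_center(index, pixel_size, slice_spacing, origin=(0.0, 0.0, 0.0)):
    """Scene (xyz) coordinates of the center of voxel (i, j, k).

    Assumed conventions: square cross section of side pixel_size in the
    xy plane, extent slice_spacing along z, origin at the corner of voxel (0, 0, 0).
    """
    i, j, k = index
    ox, oy, oz = origin
    return (
        ox + (i + 0.5) * pixel_size,
        oy + (j + 0.5) * pixel_size,
        oz + (k + 0.5) * slice_spacing,
    )


def voxel_volume(pixel_size, slice_spacing):
    """Volume of one cuboid voxel with a square cross section."""
    return pixel_size * pixel_size * slice_spacing


# Example: 0.5 x 0.5 mm pixels with 2 mm between sections.
print(voxel_center((2, 3, 1), 0.5, 2.0))  # (1.25, 1.75, 3.0)
print(voxel_volume(0.5, 2.0))             # 0.5
```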
Object Characteristics in Images
There are two important object characteristics whose careful management is vital in all 3D imaging operations: graded composition and "hanging-togetherness."
Graded Composition. Most objects in the body have a heterogeneous material composition. In addition, imaging devices introduce blurring into acquired images due to various limitations. As a result, regions corresponding to the same object display a gradation of scene intensities. On the knee CT scan shown in Figure 3, both the patella and the femur exhibit this property (ie, the region corresponding to bone in these anatomic locations has not just one CT value but a gradation of values).

Figure 3. Graded composition and hanging-togetherness. CT scan of the knee illustrates graded composition of intensities and hanging-togetherness. Voxels within the same object (eg, the femur) are assigned considerably different values. Despite this gradation of values, however, it is not difficult to identify the voxels as belonging to the same object (hanging-togetherness).
Hanging-togetherness (Gestalt). Despite the gradation described in the preceding paragraph, when one views an image, voxels seem to "hang together" (form a gestalt) as an object. For example, the high-intensity voxels of the patella do not (and should not) hang together with similar voxels in the femur, although voxels with dissimilar intensities in the femur hang together (Fig 3).
PREPROCESSING
Preprocessing operations take a set of scenes and output either computer object models or another set of scenes that facilitates the creation of such models. The most commonly used operations are volume of interest, filtering, interpolation, registration, and segmentation.
Volume of Interest
Volume of interest converts a given scene into another scene. Its purpose is to reduce the amount of data by specifying a region of interest and a range of intensity of interest.
A region of interest is specified by creating a rectangular box that delimits the scene domain in all dimensions (Fig 4a). A range of intensity of interest is specified by designating an intensity interval. Scene intensities within this interval are transferred unaltered to the output; intensities below the interval are set to its lower limit, and intensities above it are set to its upper limit. The range of intensity of interest is indicated as an interval on a histogram of the scene (Fig 4b). The corresponding section in the output scene is shown in Figure 4c. This operation can often reduce storage requirements for scenes by a factor of 2–5. It is advisable to use the volume of interest operation first in any sequence of 3D imaging operations.
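A minimal sketch of the volume of interest operation follows. It assumes a scene stored as nested lists indexed [z][y][x] and a box given as half-open index ranges; both are illustrative conventions, not those of any particular system. The function crops the scene to the box and clamps intensities to the range of interest.

```python
def volume_of_interest(scene, box, intensity_range):
    """Crop a scene to a box and clamp intensities to a range of interest.

    scene: nested lists indexed scene[z][y][x].
    box: ((z0, z1), (y0, y1), (x0, x1)), half-open index ranges.
    intensity_range: (low, high); values inside pass unaltered, values
    below low become low, and values above high become high.
    """
    (z0, z1), (y0, y1), (x0, x1) = box
    low, high = intensity_range
    return [
        [[min(max(v, low), high) for v in row[x0:x1]] for row in plane[y0:y1]]
        for plane in scene[z0:z1]
    ]
```

Cropping reduces the number of voxels, and clamping narrows the range of values each voxel must hold, which is where the storage savings come from.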

Figure 4. Preprocessing with a volume of interest operation. (a) Head CT scan includes a specified region of interest (rectangle). (b) Histogram depicts the intensities of the scene designated in a and includes a specified intensity of interest. (c) Resulting image corresponds to the specified region of interest in a.
The challenge in making use of the volume of interest operation is to completely automate this operation and to do so in an optimal fashion, which requires explicit delineation of objects at the outset.
Filtering
Filtering converts a given scene into another scene. Its purpose is to enhance wanted (object) information and suppress unwanted (noise, background, other object) information in the output scene. Two kinds of filters are available: suppressing filters and enhancing filters. Ideally, unwanted information is suppressed without affecting wanted information and wanted information is enhanced without affecting unwanted information.
The most commonly used suppressing filter is a smoothing operation used mainly for suppressing noise (Fig 5a, 5b). In this operation, a voxel v in the output scene is assigned an intensity that represents a weighted average of the intensities of voxels in the neighborhood of v in the input scene (4). Methods differ as to how neighborhoods are determined and how weights are assigned (5). Another commonly used method is median filtering, in which the voxel v in the output scene is assigned the middle value (median) of the intensities of the voxels in the neighborhood of v in the input scene when those intensities are arranged in ascending order.
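The two suppressing filters just described can be sketched for a 2D scene. This minimal version assumes a square (2r + 1) × (2r + 1) neighborhood clipped at the image border and, for the smoothing filter, equal weights for all neighbors; actual methods differ in both choices (5).

```python
import statistics


def neighborhood(image, r, c, radius=1):
    """Intensities in the square window around (r, c), clipped at the borders."""
    rows, cols = len(image), len(image[0])
    return [
        image[i][j]
        for i in range(max(0, r - radius), min(rows, r + radius + 1))
        for j in range(max(0, c - radius), min(cols, c + radius + 1))
    ]


def smooth(image, radius=1):
    """Weighted average with equal weights (a box filter), one of many weightings."""
    out = []
    for r in range(len(image)):
        row = []
        for c in range(len(image[0])):
            nb = neighborhood(image, r, c, radius)
            row.append(sum(nb) / len(nb))
        out.append(row)
    return out


def median_filter(image, radius=1):
    """Median of the neighborhood (mean of the two middle values if the count is even)."""
    return [
        [statistics.median(neighborhood(image, r, c, radius)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


# A single noisy spike in a uniform region.
img = [[10, 10, 10], [10, 100, 10], [10, 10, 10]]
print(smooth(img)[1][1])         # 20.0: the spike is spread into its neighbors
print(median_filter(img)[1][1])  # 10: the spike is removed
```

The example shows the usual trade-off: averaging dilutes an outlier but leaks it into the surroundings, whereas the median discards it outright.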

Figure 5. Preprocessing with suppressing and enhancing filters. (a) Head CT scan illustrates the appearance of an image prior to filtering. (b) Same image as in a after application of a smoothing filter. Note that noise is suppressed in regions of uniform intensity, but edges are also blurred. (c) Same image as in a after application of an edge-enhancing filter. Note that regions of uniform intensity are unenhanced because the gradient in these regions is small. However, the boundaries (especially of skin and bone) are enhanced.
In another method (5), often used in processing MR images, a process of diffusion and flow is considered to govern the nature and extent of smoothing. The idea is that in regions of voxels with a low rate of change in intensity, voxel intensities diffuse and flow into neighboring regions, whereas voxels with a high rate of change in intensity block this process. Certain parameters control the extent of diffusion and determine which magnitudes of the rate of change in scene intensity are considered "low" and which are considered "high." This method is quite effective in suppressing noise yet sensitive enough not to suppress subtle details or blur edges.
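One widely used formulation of diffusion-based smoothing is the Perona–Malik scheme, in which a conductance that decays with the local intensity difference lets intensities flow within homogeneous regions but not across edges. The 1D sketch below uses that scheme as a representative example; it is not necessarily the method of reference 5, and `kappa` (the scale separating "low" from "high" rates of change), `step`, and `iterations` are illustrative parameters.

```python
import math


def anisotropic_diffusion_1d(profile, iterations=20, kappa=10.0, step=0.25):
    """Perona-Malik-style diffusion of a 1D intensity profile.

    kappa sets the scale separating "low" from "high" intensity change;
    step is the explicit time step, kept small for stability.
    """
    u = [float(v) for v in profile]
    n = len(u)
    for _ in range(iterations):
        new = u[:]
        for i in range(n):
            flux = 0.0
            for j in (i - 1, i + 1):
                if 0 <= j < n:  # no flow across the ends of the profile
                    d = u[j] - u[i]
                    # Conductance near 1 for small differences, near 0 across edges.
                    flux += math.exp(-((d / kappa) ** 2)) * d
            new[i] = u[i] + step * flux
        u = new
    return u


noisy_edge = [0, 2, 0, 2, 100, 98, 100, 98]
smoothed = anisotropic_diffusion_1d(noisy_edge)
```

On this profile the small fluctuations on each side are flattened, while the jump between positions 3 and 4 is essentially untouched because the conductance across it is vanishingly small.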
The most commonly used enhancing filter is an edge enhancer (Fig 5c) (4). With this filter, the intensity of a voxel v in the output is the rate of change in intensity at v in the input. If we think of the input scene as a function, then this rate of change is given by the magnitude of the gradient of the function. Because this function is not known in analytic form, various digital approximations are used for this operation. The gradient has a magnitude (rate of change) and a direction in which this change is maximal. For filtering, the direction is usually ignored, although it is used in the operations that create renditions. Methods differ as to how the digital approximation is determined, a question that has been studied extensively in computer vision (6).
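A digital approximation of the gradient magnitude can be sketched with central differences. Central differences (one-sided at the borders) and unit voxel spacing are assumptions made for illustration; other approximations (6) weight neighbors differently.

```python
import math


def _derivative(line, i):
    """Central difference at index i of a 1D list; one-sided at the ends."""
    n = len(line)
    if n == 1:
        return 0.0
    if i == 0:
        return line[1] - line[0]
    if i == n - 1:
        return line[-1] - line[-2]
    return (line[i + 1] - line[i - 1]) / 2.0


def gradient_magnitude(scene):
    """Edge-enhanced scene: |grad f| at each voxel of scene[z][y][x] (unit spacing assumed)."""
    nz, ny, nx = len(scene), len(scene[0]), len(scene[0][0])
    out = []
    for z in range(nz):
        plane = []
        for y in range(ny):
            row = []
            for x in range(nx):
                gx = _derivative(scene[z][y], x)
                gy = _derivative([scene[z][k][x] for k in range(ny)], y)
                gz = _derivative([scene[k][y][x] for k in range(nz)], z)
                row.append(math.sqrt(gx * gx + gy * gy + gz * gz))
            plane.append(row)
        out.append(plane)
    return out
```

Keeping (gx, gy, gz) instead of only their length would retain the gradient direction that rendering operations use.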
Unfortunately, most existing suppressing filters also suppress object information, and most enhancing filters also enhance unwanted information. Explicit incorporation of object knowledge into these operations is necessary to minimize these effects.
Interpolation
Like filtering, interpolation converts a given scene into another scene. Its purpose is to change the level of discretization (sampling) of the input scene. Interpolation becomes necessary when the objective is (a) to change the nonisotropic discretization of the input scene to isotropic discretization or to a desired level of discretization, (b) to represent longitudinal scene acquisitions in a registered common coordinate system, (c) to represent multimodality scene acquisitions in a registered coordinate system, or (d) to re-section the given scene. Two types of interpolation are currently available: scene-based interpolation and object-based interpolation.
Scene-based Interpolation. The intensity of a voxel v in the output scene is determined on the basis of the intensity of voxels in the neighborhood of v in the input scene. Methods differ as to how the neighborhoods are determined and what form of the functions of the neighboring intensities is used to estimate the intensity of v (3,6,7). In 3D interpolation, the simplest solution is to estimate new sections between sections of the input scene, keeping the pixel size of the output scene the same as that of the input scene. This leads to a one-dimensional interpolation problem: estimating the scene intensity of any voxel v in the output scene from the intensities of voxels in the input scene on the two sides of v in the z direction (the direction orthogonal to the sections). In "nearest neighbor" interpolation, v is assigned the value of the voxel that is closest to v in the input scene. In linear interpolation, two voxels v1 and v2 (one on either side of v) are considered. The value of v is determined with the assumption that the input scene intensity changes linearly from the intensity at v1 to that at v2. In higher-order (eg, cubic) interpolations, more neighboring voxels are considered. When the size of v in the output scene differs in all dimensions from that of voxels in the input scene, the situation becomes more general, and intensities are assumed to vary linearly or as a higher-order polynomial in each of the three directions in the input scene.
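The one-dimensional case, in which new sections are estimated between existing ones, can be sketched as follows. This is a minimal illustration assuming that each section is a list of rows with identical shape, that at least two sections are given with positions along z in increasing order, and that only nearest neighbor and linear estimates are needed.

```python
import bisect


def interpolate_section(sections, z_positions, new_z, method="linear"):
    """Estimate a 2D section at new_z from the two acquired sections around it."""
    if not z_positions[0] <= new_z <= z_positions[-1]:
        raise ValueError("new_z lies outside the acquired range")
    k = bisect.bisect_right(z_positions, new_z) - 1
    k = min(k, len(z_positions) - 2)  # keep k + 1 valid at the top end
    z1, z2 = z_positions[k], z_positions[k + 1]
    t = (new_z - z1) / (z2 - z1)  # fractional distance from section k
    lower, upper = sections[k], sections[k + 1]
    if method == "nearest":
        chosen = lower if t <= 0.5 else upper
        return [row[:] for row in chosen]
    if method == "linear":
        # Intensity assumed to change linearly from v1 (lower) to v2 (upper).
        return [[(1 - t) * a + t * b for a, b in zip(ra, rb)]
                for ra, rb in zip(lower, upper)]
    raise ValueError(f"unknown method: {method}")


secs = [[[0, 10]], [[20, 30]]]  # two 1 x 2 sections acquired at z = 0 and z = 4
print(interpolate_section(secs, [0.0, 4.0], 1.0))             # [[5.0, 15.0]]
print(interpolate_section(secs, [0.0, 4.0], 1.0, "nearest"))  # [[0, 10]]
```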
Object-based Interpolation. Object information derived from scenes is used to guide the interpolation process. At one extreme (8), the given scene is first converted to a "binary" scene (ie, a scene with only two intensities: 0 and 1) with a segmentation operation (see "Segmentation"). The voxels with a value of 1 represent the object of interest, whereas the voxels with a value of 0 represent the rest of the scene domain. The "shape" of the region represented by the "1" voxels (the object) is then used to create an output binary scene with a similar shape (9,10) by way of interpolation. This is done by first converting the binary scene back into a (gray-valued) scene by assigning every voxel in this scene a value that represents the shortest distance between the voxel and the boundary between the "0" voxels and the "1" voxels. The "0" voxels are assigned a negative distance, whereas the "1" voxels are assigned a positive distance. This scene is then interpolated with a scene-based technique and is subsequently converted back to a binary scene by setting a threshold at 0. At the other extreme, the shape of the intensity profile of the input scene is itself considered an "object" to be used to guide interpolation so that this shape is retained as faithfully as possible in the output scene (11). For example, in the interpolation of a 2D scene with this method, the scene is converted into a 3D surface of intensity profile wherein the height of the surface represents pixel intensities. This (binary) object is then interpolated with a shape-based method. Several methods exist between these two extremes (12,13). The shape-based methods have been shown to produce more accurate results (8–11) than most of the commonly used scene-based methods.
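The binary shape-based steps just described (signed distance, scene-based interpolation, threshold at 0) can be sketched for two 2D binary sections. Assumptions for this illustration: distances are Euclidean and measured to the nearest pixel of the opposite value, so the implied boundary lies halfway between pixels; they are computed by brute force; linear interpolation is used between the two distance maps; and `t` is the fractional position of the new section.

```python
import math


def signed_distance(binary):
    """Positive inside ("1" pixels), negative outside ("0" pixels)."""
    rows, cols = len(binary), len(binary[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            v = binary[r][c]
            # Distance to the nearest pixel of the opposite value (brute force).
            others = [math.hypot(r - i, c - j) for i, j in coords if binary[i][j] != v]
            d = min(others) if others else float(max(rows, cols))  # fallback: no boundary
            row.append(d if v == 1 else -d)
        out.append(row)
    return out


def shape_based_interpolate(slice_a, slice_b, t):
    """Binary section a fraction t (0..1) of the way from slice_a to slice_b."""
    da, db = signed_distance(slice_a), signed_distance(slice_b)
    return [[1 if (1 - t) * a + t * b > 0 else 0 for a, b in zip(ra, rb)]
            for ra, rb in zip(da, db)]


a = [[0, 0, 1, 1, 1, 0, 0]]  # narrow object in one section
b = [[0, 1, 1, 1, 1, 1, 0]]  # wider object in the next section
print(shape_based_interpolate(a, b, 0.75))  # [[0, 1, 1, 1, 1, 1, 0]]
```

Interpolating distances rather than the 0/1 values themselves is what lets the object outline move gradually from one section to the next instead of producing a blocky step.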
Figure 6 demonstrates binary shape-based interpolation of an image derived from CT data at coarse and fine levels of discretization. The original 3D scene was first thresholded to create a binary scene. This binary scene was then interpolated at coarse (Fig 6a) and fine (Fig 6b) levels and surface rendered.

Figure 6. Shape-based interpolation of a binary CT scene created by designating a threshold. (a) CT scene after shape-based interpolation at a "coarse" resolution and subsequent surface rendering. (b) The same scene after interpolation at a "fine" resolution and surface rendering.
The challenge in interpolation is to identify specific object information and incorporate it into the process. With such information, the accuracy of interpolation can be improved.
Registration
Registration takes two scenes or objects as input and outputs a transformation that, when applied to the second scene or object, matches it as closely as possible to the first. Its purpose is to combine scene or object information from multiple modalities and protocols to determine change, growth, motion, and displacement of objects as well as aid in object identification. Registration may be either scene-based or object-based.
Scene-based Registration. To match two scenes, a rigid transformation consisting of translation and rotation (and often scaling) is calculated for the second scene (S2) such that the intensity pattern of the transformed scene (S2') matches that of the first scene (S1) as closely as possible (Fig 7) (14). Methods differ with respect to the matching criterion used and the means of determining which of the infinite number of possible translations and rotations is optimal (15). Scene-based registration methods are also available for cases in which objects undergo elastic (nonrigid) deformation (16).
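The search for an optimal transformation can be illustrated in its simplest form. The sketch below is a deliberately reduced setting chosen for illustration: translation only (no rotation or scaling), integer shifts, exhaustive search within `max_shift`, and mean squared intensity difference over the overlapping region as the matching criterion. Practical methods use other criteria and optimizers (15).

```python
def register_translation(fixed, moving, max_shift=3):
    """Integer shift (dy, dx) that best matches moving to fixed.

    Shifting by (dy, dx) defines S2'(r, c) = S2(r - dy, c - dx); the score is
    the mean squared difference between S1 and S2' where they overlap.
    """
    rows, cols = len(fixed), len(fixed[0])
    best = None
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            total, count = 0.0, 0
            for r in range(rows):
                for c in range(cols):
                    rm, cm = r - dy, c - dx  # moving pixel landing on (r, c)
                    if 0 <= rm < rows and 0 <= cm < cols:
                        diff = fixed[r][c] - moving[rm][cm]
                        total += diff * diff
                        count += 1
            if count:
                score = total / count
                if best is None or score < best[0]:
                    best = (score, dy, dx)
    return best[1], best[2]


# Synthetic test pattern; moving is the fixed pattern displaced by (-2, -1).
def pattern(r, c):
    return r * r + 3 * c * c + r * c

fixed = [[pattern(r, c) for c in range(8)] for r in range(8)]
moving = [[pattern(r + 2, c + 1) for c in range(8)] for r in range(8)]
print(register_translation(fixed, moving))  # (2, 1)
```

Exhaustive search is only feasible for a handful of parameters; with rotations, scaling, or subvoxel shifts, the optimal parameters are usually found with an iterative optimizer instead.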

Figure 7. Scene-based registration. (a) Three-dimensional scenes corresponding to proton-density (PD)–weighted MR images of the head obtained in a patient with multiple sclerosis demonstrate a typical "preregistration" appearance. The scenes were acquired at four different times. (b) Same scenes as in a after 3D registration. The progression of the disease (hyperintense lesions around the ventricles) is now readily apparent. During registration, the scenes were re-sectioned with a scene-based interpolation method to obtain sections at the same location.
Object-based Registration. In object-based registration, two scenes are registered on the basis of object information extracted from the scenes. Ideally, the two objects should be as similar as possible. For example, to match 3D scenes of the head obtained with MR imaging and PET, one may use the outer skin surface of the head as computed from each scene and match the two surfaces (17). Alternatively (or in addition), landmarks such as points, curves, or planes that are observable in and computable from both scenes as well as implanted objects may be used (18–20). Optimal translation and rotation parameters for matching the two objects are determined by minimizing some measure of "distance" between the two (sets of) objects. Methods differ as to how distances are defined and optimal solutions are computed.
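For paired point landmarks, the rotation and translation that minimize the sum of squared distances have a closed form. The 2D sketch below uses that classical least-squares solution (rotation angle from the dot- and cross-product sums of centroid-centered coordinates, translation from the centroids). It assumes the landmarks are already paired one to one and is an illustration, not the specific method of references 18–20.

```python
import math


def apply_rigid(theta, t, p):
    """Rotate point p by theta about the origin, then translate by t."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + t[0], s * p[0] + c * p[1] + t[1])


def rigid_landmark_registration_2d(fixed_pts, moving_pts):
    """Least-squares (theta, (tx, ty)) mapping moving_pts onto paired fixed_pts."""
    n = len(fixed_pts)
    fx = sum(p[0] for p in fixed_pts) / n
    fy = sum(p[1] for p in fixed_pts) / n
    mx = sum(p[0] for p in moving_pts) / n
    my = sum(p[1] for p in moving_pts) / n
    s_dot = s_cross = 0.0
    for (ax, ay), (bx, by) in zip(fixed_pts, moving_pts):
        ax, ay, bx, by = ax - fx, ay - fy, bx - mx, by - my  # center both sets
        s_dot += ax * bx + ay * by
        s_cross += ay * bx - ax * by
    theta = math.atan2(s_cross, s_dot)  # angle maximizing the summed alignment
    c, s = math.cos(theta), math.sin(theta)
    # Translation carries the rotated moving centroid onto the fixed centroid.
    return theta, (fx - (c * mx - s * my), fy - (s * mx + c * my))


moving = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
fixed = [apply_rigid(math.pi / 6, (2.0, -1.0), p) for p in moving]
theta, (tx, ty) = rigid_landmark_registration_2d(fixed, moving)
```

With noise-free landmarks, as here, the true 30° rotation and (2, -1) translation are recovered up to rounding; with real landmarks the result is the best compromise in the least-squares sense.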
Rigid object-based registration is illustrated in Figure 8. Deformable matching operations can also be used on objects (21,22) and may be more appropriate than rigid matching for nonrigid soft-tissue structures. Typically, a global approximate rigid matching operation is performed, followed by local deformations for more precise matching. Deformable registration is also used to match computerized brain atlases to brain scene data obtained in a given patient (23); for this purpose, some object information must first be identified in the scene. This procedure has several potential applications in functional imaging, neurology, and neurosurgery as well as in object definition per se.

Figure 8. Rigid object-based registration. Sequence of 3D MR imaging scenes of the foot allows kinematic analysis of the midtarsal joints. The motion (ie, translation and rotation) of the talus, calcaneus, and navicular and cuboid bones from one position to the other is determined by registering the bone surfaces in the two different positions.
The challenge in registration is that scene-based methods require the intensity patterns in the two scenes to be similar, which is often not the case. Converting scenes into fuzzy (nonbinary) object descriptions that retain object gradation can potentially overcome this problem while retaining the strengths of scene-based methods. Deformable fuzzy object matching seems natural and appropriate in most situations but will require the development of fuzzy mechanics theory and algorithms.
Segmentation
From a given set of scenes, segmentation outputs computer models of object information captured in the scenes. Its purpose is to identify and delineate objects. Segmentation consists of two related tasks: recognition and delineation.
Recognition. Recognition consists of roughly determining the whereabouts of an object in the scene. In Figure 3, for example, recognition involves determining that "this is the femur and this is the patella." This task does not involve the precise specification of the region occupied by the object.
Recognition may be accomplished either automatically or with human assistance. In automatic (knowledge- and atlas-based) recognition, artificial intelligence methods are used to represent knowledge about objects and their relationships (24–26). Preliminary delineation is usually needed in these methods to extract object components and to form and test hypotheses related to whole objects.
In atlas-based recognition, a carefully created "atlas" consisting of a complete description of the geometry and interrelationships of objects is used (16,27,28). Some delineation of object components in the given scene is necessary. This information is used to determine the mapping necessary to transform voxels or other geometric elements from the scene space to the atlas. Conversely, the information is also used to deform the atlas so that it matches the delineated object components in the scene.
In human-assisted recognition, simple assistance is often sufficient to help solve a segmentation problem. This assistance may take several forms: for example, specification of several "seed" voxels inside the 3D region occupied by the object or on its boundary, creation of a box (or some other simple geometric shape) that just encloses the object and can be quickly specified, or a click of a mouse button to accept a real object (eg, a lesion) or reject a false object.
Delineation. Delineation involves determining the precise spatial extent and composition of an object including gradation. In Figure 3, if bone is the object system of interest, then delineation consists of defining the spatial extent of the femur and patella separately and specifying an "objectness" value for each voxel in each object. Once the objects are defined separately, the femur and the patella can be visualized, manipulated, and analyzed individually.
Delineation may be accomplished with a variety of methods. Often, delineation is itself considered to be the entire segmentation problem, in which case related solutions are considered to be solutions to the segmentation problem. However, it is helpful to distinguish between recognition and delineation to understand and help solve the difficulties encountered in segmentation. Approaches to delineation can be broadly classified as boundary-based or region-based.
In boundary-based delineation, an object description is output in the form of a boundary surface that separates the object from the background. The boundary description may take the form of a hard set of primitives (eg, points, polygons, surface patches, voxels) or a fuzzy set of primitives such that each primitive has an assigned grade of "boundariness."
In region-based delineation, an object description is output in the form of the region occupied by the object. The description may take the form of a hard set of voxels or of a fuzzy set such that each voxel in the set has an assigned grade of "objectness." With the former method, each voxel in the set is considered to contain 100% object material; with the latter method, this value may be anywhere from 0% to 100%.
Object knowledge usually facilitates recognition and delineation of that object. Paradoxically, this implies that segmentation is required for effective object segmentation. As we have noted, segmentation is needed to perform most of the preprocessing operations in an optimal fashion. It will be seen later that segmentation is essential for most visualization, manipulation, and analysis tasks. Thus, segmentation is the most crucial among all 3D imaging operations and also the most challenging.
Knowledgeable human beings usually outperform computer algorithms in the high-level task of recognition. However, carefully designed computer algorithms outperform human beings in achieving precise, accurate, and efficient delineation. Clearly, human delineation cannot account for graded object composition. Most of the challenges in completely automating segmentation may be attributed to shortcomings in computerized recognition techniques and the lack of delineation techniques that can handle graded composition and hanging-togetherness.
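Because recognition may be automatic or human-assisted, and delineation may be hard or fuzzy as well as boundary-based or region-based, there are eight (2 × 2 × 2) possible combinations of approaches to recognition and delineation, resulting in eight different methods of segmentation.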
With hard, boundary-based automatic segmentation, thresholding and isosurfacing are most commonly used (29–32). In these techniques, a scene intensity threshold is specified and the surface that separates voxels with an intensity above the threshold from those with an intensity below the threshold is computed. Methods differ as to how the surface is represented and computed and whether surface connectedness is taken into account. The surface may be represented in terms of voxels, voxel faces, points, triangles, or other surface elements. If connectedness is not used, the surface obtained from a scene will combine discrete objects (eg, the femur and the patella in Fig 3); with connectedness, each of the objects can be represented as a separate surface (assuming they are separated in the 3D scene). In Figure 6, the isosurface is connected and is represented as a set of faces of voxels (29).
In addition to scene intensity threshold, intensity gradient has also been used in defining boundaries (33).
Another method of segmentation is fuzzy, boundary-based automatic segmentation. Concepts such as connectedness, closure, and orientedness, which are well established for hard boundaries, are difficult to formulate for fuzzy boundaries and remain undeveloped. However, computational methods have been developed that identify only those voxels that are in the vicinity of the object boundary and that assign each voxel a grade of "boundariness" (34,35). These methods use scene intensity or intensity gradient to determine boundary gradation (Fig 9).
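As a rough illustration of grading boundariness from the intensity gradient, the sketch below maps gradient magnitude to [0, 1] with a linear ramp. The ramp and its parameters `g_low` and `g_high` are assumptions made for this example; the cited methods (34,35) are not claimed to work this way.

```python
import numpy as np

def boundariness(scene, g_low, g_high):
    """Grade of 'boundariness' in [0, 1] from intensity-gradient magnitude.

    Assumption: a linear ramp, 0 at or below g_low and 1 at or above g_high.
    """
    grads = np.gradient(scene.astype(float))
    if scene.ndim == 1:  # np.gradient returns a bare array for 1D input
        grads = [grads]
    mag = np.sqrt(sum(g ** 2 for g in grads))
    return np.clip((mag - g_low) / (g_high - g_low), 0.0, 1.0)
```

Voxels far from any intensity change receive 0, and those straddling a sharp edge receive 1, so only a thin band near the boundary carries nonzero grades.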
In hard, boundary-based, human-assisted segmentation, the degree of human assistance ranges from tracing the boundary entirely by hand (manual recognition and delineation) to specifying only a single point inside the object or on its boundary (manual recognition and automatic delineation) (36–41). In routine clinical applications, manual boundary tracing is perhaps the most commonly used method. On the other hand, boundary detection methods requiring simple user assistance based on intensity (36,37) and gradient criteria (38–41) have been developed. However, these methods cannot be guaranteed to work correctly every time in large-scale applications.
There are many user-assisted methods besides those just described that require different degrees of human assistance for segmentation of each scene (42–48). In view of the inadequacy of the minimally user-assisted methods mentioned earlier, much effort is currently being directed toward developing methods that take a largely manual approach to recognition and a more automatic approach to delineation. These methods go under various names: active contours or snakes (42–44), active surfaces (45,46), and live-wire (live-lane) (47,48).
In active contour and active surface methods, a boundary is first specified (eg, by drawing a rectangle or a rectangular box close to the boundary of interest). The boundary is considered to have certain stiffness properties. In addition, the given scene is considered to exert forces on the boundary whose strength depends on the intensity gradients; for example, a voxel exerts a strong attractive force on the boundary if the rate of change in intensity at that voxel is high. Within this static mechanical system, the initial boundary deforms and eventually assumes a shape for which the combined potential energy is at a minimum. Unfortunately, the steady-state shape is usually impossible to compute exactly. Furthermore, whatever shape is accepted as an approximation may not match the desired boundary, in which case further correction of the boundary is needed. In assessing the effectiveness of these segmentation methods, it is important to evaluate their precision (repeatability) and efficiency (the number of scenes that can be segmented per unit time). Such evaluations have not been performed for the methods described in the literature.
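One common way to make the mechanical picture of an active contour concrete is the discrete energy of the classical snake formulation, shown here for a closed 2D contour with points v_1, ..., v_n (indices taken cyclically). The weights α, β, and γ are assumptions of this sketch: the first two terms model stiffness (stretching and bending), and the last term is lowest where the intensity gradient is high, which produces the attractive force described above.

```latex
E(\mathbf{v}) = \sum_{i=1}^{n} \Big[
    \alpha \,\lVert \mathbf{v}_{i+1} - \mathbf{v}_i \rVert^2
  + \beta  \,\lVert \mathbf{v}_{i+1} - 2\mathbf{v}_i + \mathbf{v}_{i-1} \rVert^2
  - \gamma \,\lVert \nabla I(\mathbf{v}_i) \rVert^2 \Big]
```

Iterative minimization of E from the initial boundary typically reaches only a local minimum, which is consistent with the need for further correction noted above.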
The principles underlying live-wire (live-lane) methods (47,48) differ from those of active boundary methods. In live-wire methods, every pixel edge is considered to represent two directed edges with opposite orientations. The "inside" of the boundary is taken to be to the left of a directed edge, and the outside to its right. Each directed edge is assigned a cost that is inversely related to the "boundariness" of the edge. Several local features are used to determine the cost, including the intensity to the left (inside) and to the right (outside) as well as the intensity gradient and its direction. In the 2D live-wire method, the user first selects a point (pixel vertex) v0 on the boundary of interest. The computer then shows a "live-wire" segment from v0 to the current mouse cursor position v. This segment is an oriented path consisting of a connected sequence of directed pixel edges that represents the minimum-cost ("shortest") path from v0 to v. As the user changes v through mouse movement, the optimal path is computed and displayed in real time. If v is on or close to the boundary, the live wire "snaps" onto the boundary (Fig 10); v is then deposited and becomes the new starting point, and the process continues. Typically, two to five points are sufficient to segment a boundary (Fig 10). This method and its derivatives have been shown to be two to three times faster and statistically significantly more repeatable than manual tracing (47). Its 3D version (48) is about 3–15 times faster than manual tracing. Note that, in this method, recognition is manual but delineation is automatic.
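A simplified sketch of the live-wire computation. It assumes pixels (rather than the directed pixel edges described above) are the graph nodes, uses 4-connectivity, and derives the cost of stepping onto a pixel from gradient magnitude alone (low where the gradient is high); the published method combines several features and oriented edges. Dijkstra's algorithm, via Python's `heapq`, returns the minimum-cost path from v0 (`start`) to the cursor position v (`end`). The function name `live_wire_path` is hypothetical.

```python
import heapq
import numpy as np

def live_wire_path(image, start, end):
    """Minimum-cost 4-connected pixel path from start to end, as (row, col) tuples."""
    gy, gx = np.gradient(image.astype(float))
    grad = np.hypot(gx, gy)
    # Assumed cost: cheap on high-gradient ("boundary-like") pixels, never zero.
    cost = 1.0 - grad / (grad.max() + 1e-12) + 1e-3
    rows, cols = image.shape
    dist = np.full(image.shape, np.inf)
    prev = {}
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == end:
            break                      # optimal path to the cursor is settled
        if d > dist[r, c]:
            continue                   # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost[nr, nc]
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (nd, (nr, nc)))
    path = [end]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1]
```

In an interactive tool, this search would be rerun (or incrementally updated) each time the mouse moves, and the endpoint would become the new `start` once the user deposits it.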

Figure 10. Live-wire segmentation. Section created on the basis of data from an MR image of the foot shows a live-wire segment representing a portion of the boundary of interest, which in this case outlines the talus.
To our knowledge, no fuzzy, boundary-based, human-assisted methods have been described in the literature.
The most commonly used hard, region-based, automatic method of segmentation is thresholding (Fig 11). A voxel is considered to belong to the object region if its intensity lies between a lower and an upper threshold, inclusive. If the object is the brightest structure in the scene (eg, bone in CT scans), only the lower threshold needs to be specified. In Figure 11b, the threshold interval is specified on a scene intensity histogram, and the segmented object is shown as a binary scene in Figure 11c.
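A minimal NumPy sketch of hard thresholding as just described: the output is a binary scene, and omitting the upper threshold handles the brightest-object case. The function name `threshold_segment` is hypothetical, and the intensity values in any usage are made up for illustration.

```python
import numpy as np

def threshold_segment(scene, lower, upper=None):
    """Binary scene: True where lower <= intensity <= upper.

    If upper is None, the object is assumed to be the brightest structure
    (eg, bone in CT), so only the lower threshold is applied.
    """
    if upper is None:
        return scene >= lower
    return (scene >= lower) & (scene <= upper)
```

The same function works unchanged on 2D sections or full 3D scenes, because the comparisons are elementwise.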

Figure 11. Hard, region-based, automatic segmentation with use of thresholding. Once the desired scene is selected (a), an intensity interval is specified on a histogram (b). The segmented object is then depicted as a binary scene (c).
Another commonly used method is clustering (Fig 12). If, for example, two values are determined for each voxel (eg, T2 and PD values), then a 2D histogram (also known as a scatter plot) records the number of voxels in the given 3D scene for each possible value pair. The 2D space of all possible value pairs is usually referred to as a feature space. The idea in clustering is that feature values corresponding to the objects of interest cluster together in the feature space. Therefore, to segment an object, one need only identify and delineate this cluster. In other words, the problem of segmenting the 3D scene becomes the problem of segmenting the 2D scene represented by the 2D histogram. In addition to T2 and PD values, it is possible to use computed values such as the rate of change in T2 and PD at every voxel; in this case, the feature space would be four-dimensional. There are several well-developed techniques in the area of pattern recognition (49) for automatically identifying clusters, and these techniques have been extensively applied to medical images (50–56).
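The sketch below builds the 2D feature-space histogram from a T2/PD scene pair with `numpy.histogram2d` and then segments by delineating a cluster. For simplicity it assumes the cluster outline is an axis-aligned rectangle in feature space; the outline in Figure 12b is free-form, and the function names are hypothetical.

```python
import numpy as np

def feature_space(t2, pd, bins=64):
    """2D histogram: number of voxels for each (T2, PD) value pair (binned)."""
    hist, t2_edges, pd_edges = np.histogram2d(t2.ravel(), pd.ravel(), bins=bins)
    return hist, t2_edges, pd_edges

def segment_by_box(t2, pd, t2_range, pd_range):
    """Voxels whose (T2, PD) pair falls inside an assumed rectangular cluster outline."""
    return ((t2 >= t2_range[0]) & (t2 <= t2_range[1]) &
            (pd >= pd_range[0]) & (pd <= pd_range[1]))
```

Because the segmentation is defined entirely in feature space, the resulting mask has the shape of the original scene, with no spatial connectivity imposed.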

Figure 12. Clustering. (a) Sections from an MR imaging scene with T2 (top) and PD (bottom) values assigned to voxels. (b) Scatter plot of the sections in a. A cluster outline for cerebrospinal fluid is indicated. (c) Segmented binary section demonstrates cerebrospinal fluid.
One of the popular cluster identification methods is the k-nearest-neighbor (kNN) technique (49). Assume, for example, that the problem is to segment the white matter (WM) region in a 3D MR imaging scene in which the T2 and PD values have been determined for each voxel. The first step is to identify two sets, XWM and XNWM, of points in the 2D feature space that correspond to white matter and nonwhite matter regions, respectively. These sets of points will be used to determine whether a voxel in the given scene belongs to white matter. The sets are determined with use of a "training" set: suppose that one or more scenes were previously segmented manually. Each voxel in the white matter and nonwhite matter regions in each such scene contributes a point to set XWM or set XNWM, respectively. The next step is to assign a value to k (eg, k = 7), which is a fixed parameter. For each voxel v in the scene to be segmented, its location P in the feature space is determined, and the seven points from sets XWM and XNWM that are "closest" to P are found. If a majority (at least four) of these points are from XWM, then v is considered to be in white matter; otherwise, v does not belong to white matter. Note that the step of obtaining XWM and XNWM need not be repeated for every scene to be segmented.
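A minimal NumPy sketch of the kNN rule just described, assuming Euclidean distance in the (T2, PD) feature space and arrays `x_wm` and `x_nwm` holding the training points XWM and XNWM. Besides the hard majority vote, it also returns m/k, the fraction of the k nearest training points drawn from XWM. The function name is hypothetical.

```python
import numpy as np

def knn_white_matter(features, x_wm, x_nwm, k=7):
    """kNN classification of voxels as white matter (WM) or not.

    features: (n, 2) array of (T2, PD) locations P, one per voxel.
    x_wm, x_nwm: (m, 2) training points for WM and non-WM.
    Returns (hard, membership): hard is True where a majority of the k
    nearest training points are WM; membership is that count divided by k.
    """
    train = np.vstack([x_wm, x_nwm])
    is_wm = np.concatenate([np.ones(len(x_wm), bool), np.zeros(len(x_nwm), bool)])
    # Euclidean distance from every P to every training point: shape (n, m_total).
    d = np.linalg.norm(features[:, None, :] - train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    m = is_wm[nearest].sum(axis=1)
    hard = m > k // 2          # for k = 7: at least 4 of the 7 neighbors
    return hard, m / k
```

The training arrays are built once and reused for every new scene, matching the observation that XWM and XNWM need not be recomputed per scene.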
Note also that thresholding is essentially clustering in a one-dimensional feature space. All clustering methods have parameters whose values must be determined in some way. If these parameters are fixed in an application, the effectiveness of the method in routine processing cannot be guaranteed, and some user assistance usually becomes necessary.
Examples of other nonclustering methods have been described by Kamber (57) and Wells (58).
The simplest of the fuzzy, region-based, automatic methods of segmentation is fuzzy thresholding, which is a generalization of the thresholding concept (Fig 13) (59). Fuzzy thresholding requires the specification of four intensity thresholds (t1–t4, with t1 < t2 ≤ t3 < t4). If the intensity of a voxel v is less than t1 or greater than t4, the objectness of v is 0. If the intensity is between t2 and t3, its objectness is 1 (100%). For intensities between t1 and t2 or between t3 and t4, objectness lies between 0% and 100%. Other functional forms have also been used. Figure 14 shows a rendition of bone and soft tissue identified with fuzzy thresholding on the basis of the CT data from Figure 3.
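A minimal sketch of fuzzy thresholding, assuming linear ramps between t1 and t2 and between t3 and t4 (a trapezoidal membership function); as noted above, other functional forms are possible. The function name is hypothetical.

```python
import numpy as np

def fuzzy_threshold(scene, t1, t2, t3, t4):
    """Objectness in [0, 1]: 0 outside [t1, t4], 1 on [t2, t3], linear ramps between.

    Assumes t1 < t2 <= t3 < t4.
    """
    x = scene.astype(float)
    rise = (x - t1) / (t2 - t1)   # 0 at t1, 1 at t2
    fall = (t4 - x) / (t4 - t3)   # 1 at t3, 0 at t4
    return np.clip(np.minimum(rise, fall), 0.0, 1.0)
```

Setting t1 close to t2 and t3 close to t4 makes the ramps steep, so the result approaches hard thresholding with the interval [t2, t3].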
Many of the clustering methods can be generalized to output fuzzy object information. For example, in the kNN method described previously, if a number m of the k points closest to P is from XWM, then the objectness ("white matterness") of v is m/k. Note that the fuzzy thresholding described earlier is a form of fuzzy clustering. One approach to more generalized fuzzy clustering is the fuzzy c-means method (60). The application of this method has been investigated for segmenting brain tissue components in MR images (50). The idea is as follows: Suppose there are two types of tissues, white matter and gray matter, to be segmented in a 3D MR imaging scene, and that the feature space is 2D (composed of T2 and PD values). Actually, three classes must be considered: white matter, gray matter, and everything else. The task is to define three clusters in the 2D scatter plot of the given scene that correspond to these three classes. The set X of points to which the given scene maps in the feature space can be partitioned into three clusters in a large (although finite) number of ways. In the hard c-means method, the objective is to choose the particular cluster arrangement for which the sum (over all clusters) of the squared distances between the points in each cluster and the cluster center is the smallest. In the fuzzy c-means method, each point in X is allowed to have an objectness value between 0 and 1 for each cluster, making the number of cluster arrangements infinite. Each squared distance in the criterion to be minimized is weighted by (a power of) the corresponding objectness value. Algorithms have been described for both methods that are designed to find clusters that approximately minimize the pertinent criterion.
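The fuzzy c-means iteration can be sketched in a few lines of NumPy. This is the textbook alternating update (centers from membership-weighted means, memberships from relative distances), assuming Euclidean distance, a fuzzifier m = 2, random initial memberships, and a fixed iteration count; none of these settings is taken from the cited study (50).

```python
import numpy as np

def fuzzy_c_means(x, c=3, m=2.0, n_iter=200, seed=0):
    """Fuzzy c-means on feature points x of shape (n_points, n_features).

    Returns (centers, u), where u[i, j] is the objectness of point j in cluster i.
    Approximately minimizes sum_ij u[i, j]**m * ||x[j] - centers[i]||**2.
    """
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(x)))
    u /= u.sum(axis=0)                     # each point's memberships sum to 1
    for _ in range(n_iter):
        w = u ** m                         # fuzzifier m > 1
        centers = (w @ x) / w.sum(axis=1, keepdims=True)
        d = np.linalg.norm(x[None, :, :] - centers[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)              # guard against a point sitting on a center
        # u[i, j] = 1 / sum_k (d[i, j] / d[k, j]) ** (2 / (m - 1))
        ratio = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
        u = ratio.sum(axis=1) ** -1.0
    return centers, u
```

For the three-class example, `x` would hold the (T2, PD) pairs of all voxels and `c = 3`; taking `u.argmax(axis=0)` recovers a hard partition if one is needed.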
As with hard clustering methods, the effectiveness of fuzzy clustering methods in routine applications cannot be guaranteed, and some user assistance on a per-scene basis is usually needed.
The simplest of the hard, region-based, human-assisted methods of segmentation is manual painting of regions with a mouse-driven paintbrush (61). This method is an alternative to manual boundary tracing.
In contrast to this completely manual recognition and delineation scheme, there are methods in which recognition is manual but delineation is automatic. Region growing is a popular technique in this group (62–64). At the outset, the user specifies a seed voxel within the object region with use of (for example) a mouse pointer on a section display. A set of criteria for inclusion of a voxel in the object is also specified; for example, (a) the scene intensity of the voxel should be within an interval t1 to t2, (b) the mean intensity of voxels included in the growing region at any time during the growing process should be within an interval t3 to t4, and (c) the intensity variance of voxels included in the growing region at any time during the growing process should be within an interval t5 to t6. Starting with the seed voxel, the algorithm examines its 3D neighbors (usually the closest six, 18, or 26 neighbors) for inclusion. Those that are included are marked so that they will not be reconsidered for inclusion later. The neighbors of the voxels selected for inclusion are in turn examined, and the process continues until no more voxels can be selected for inclusion.
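A minimal sketch of region growing with the six closest neighbors, implementing criterion a and, optionally, criterion b (the running mean of the region must stay within [t3, t4] after a voxel is added); criterion c is omitted for brevity. Only included voxels are marked, as described above, so a voxel rejected under criterion b may be reconsidered later. The function name and breadth-first visiting order are assumptions of this sketch.

```python
from collections import deque
import numpy as np

SIX_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def region_grow(scene, seed, t1, t2, t3=None, t4=None):
    """Grow a 6-connected region from seed (a tuple of 3 ints) in a 3D scene."""
    region = np.zeros(scene.shape, dtype=bool)
    total, count = 0.0, 0
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        if region[v]:
            continue                      # already included: never reconsidered
        val = float(scene[v])
        if not (t1 <= val <= t2):         # criterion (a)
            continue
        if t3 is not None:                # optional criterion (b): running mean
            new_mean = (total + val) / (count + 1)
            if not (t3 <= new_mean <= t4):
                continue
        region[v] = True
        total += val
        count += 1
        for d in SIX_NEIGHBORS:
            n = tuple(a + b for a, b in zip(v, d))
            if all(0 <= i < s for i, s in zip(n, scene.shape)) and not region[n]:
                queue.append(n)
    return region
```

With criterion b enabled, the result can depend on the visiting order and on which seed is chosen, which is the robustness problem discussed next.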
If only criterion a in the preceding paragraph is used and t1 and t2 are fixed during the growing process, this method outputs essentially a connected component of voxels satisfying a hard threshold interval. Note also that when criteria a and b are combined, or when t1 and t2 are not fixed, it is not possible to guarantee that the set of voxels (object) O(v1) obtained with a seed voxel v1 is the same as the object O(v2), where v2 ≠ v1 is a voxel in O(v1). This lack of robustness constitutes a problem with most region-based methods.
In the sense that the fuzzy region-based methods of segmentation described earlier eventually entail human assistance, they fall into the category of fuzzy, region-based, human-assisted methods. A recent technique that was designed to make use of human assistance is the fuzzy connected technique (65). In this method, recognition is manual and involves pointing at an object in a section display. Delineation is automatic and takes both graded composition and hanging-togetherness into account. It has been effectively applied in several applications including quantification of multiple sclerosis lesions (66–69), MR angiography (70), and soft-tissue display for planning of craniomaxillofacial surgery (71).
In the fuzzy connected technique, nearby voxels in the voxel array are thought of as having a fuzzy adjacency relation that indicates their spatial nearness. This relation, which varies in strength from 0 to 1, is independent of any scene intensity values and is a nonincreasing function of the intervening distance. Fuzzy adjacency roughly captures the blurring characteristic of imaging devices.
Similarly, nearby voxels in a scene are thought of as having a fuzzy affinity relation that indicates how they hang together locally in the same object. The strength of this relation (varying from 0 to 1) between any two voxels is a function of their fuzzy adjacency as well as their scene intensity values. For example, this function may be the product of the strength of their adjacency and (1 - |I[v1] - I[v2]|), where I[v1] and I[v2] are the intensity values of voxels v1 and v2 scaled in some appropriate way to the range between 0 and 1. Affinity expresses the degree to which voxels hang together to form a fuzzy object. Of course, the intent is that this is a local property; voxels that are far apart will have negligible affinity as defined in this function. The real "hanging-togetherness" of voxels in a global sense is captured through a fuzzy relation called fuzzy connectedness. A strength of connectedness is assigned to each pair of voxels (v1, v2) as follows: There are numerous possible paths between two voxels v1 and v2, any one of which consists of a sequence of voxels starting from v1 and ending at v2. Successive voxels are nearby and have a certain degree of adjacency. The "strength" of a path is simply the smallest of the affinities associated with pairs of successive voxels along the path. The strength of connectedness between v1 and v2 is simply the largest of the strengths associated with all possible paths between v1 and v2. A fuzzy object is a pool of voxels together with a membership (between 0 and 1) assigned to each voxel that represents its objectness. The pool is such that the strength of connectedness between any two voxels in the pool is greater than a small threshold value (typically about 0.1) and the strength between any two voxels (only one of which is in the pool) is less than the threshold value.
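The max-min definition of connectedness can be evaluated without enumerating paths by a Dijkstra-style propagation that always expands the voxel with the largest current strength. The 2D sketch below is only an illustration under stated assumptions: 4-adjacency with adjacency strength 1 (0 for all other pairs), the example affinity (1 - |I[v1] - I[v2]|) with intensities rescaled to [0, 1], and connectedness measured from a single seed pixel. It is not necessarily the algorithm used in the cited work (65), and the function name is hypothetical.

```python
import heapq
import numpy as np

def fuzzy_connectedness(scene, seed):
    """Strength of connectedness from seed to every pixel of a 2D scene.

    Path strength = weakest affinity along the path; connectedness = strongest path.
    """
    img = scene.astype(float)
    span = img.max() - img.min()
    img = (img - img.min()) / (span if span > 0 else 1.0)  # scale to [0, 1]
    conn = np.zeros(img.shape)
    conn[seed] = 1.0
    heap = [(-1.0, seed)]                  # max-heap via negated strengths
    rows, cols = img.shape
    while heap:
        neg, (r, c) = heapq.heappop(heap)
        s = -neg
        if s < conn[r, c]:
            continue                       # stale entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                affinity = 1.0 - abs(img[r, c] - img[nr, nc])
                cand = min(s, affinity)    # extend the path: keep its weakest link
                if cand > conn[nr, nc]:    # keep the strongest path found so far
                    conn[nr, nc] = cand
                    heapq.heappush(heap, (-cand, (nr, nc)))
    return conn
```

Thresholding the result, for example `conn > 0.1`, gives the pool of voxels connected to the seed in the sense described above.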
Obviously, computing fuzzy objects even for this simple affinity function is computationally impractical if we proceed straight from the definitions. However, the theory allows us to reduce the complexity considerably for a wide variety of affinity relations, so that fuzzy object computation can be done in practical time (about 15–20 minutes for a 256 × 256 × 64 3D scene [16 bits per voxel] on a SPARCstation 20 workstation [Sun Microsystems, Mountain View, Calif]). A wide spectrum of application-specific knowledge of image characteristics can be incorporated into the affinity relation.
Figure 15 shows an example of fuzzy connected segmentation (in 3D) of white matter, gray matter, cerebrospinal fluid, and multiple sclerosis lesions in a T2, PD scene pair. Figure 16a shows an MIP rendition of an MR angiography data set, whereas Figure 16b demonstrates a rendition of a 3D fuzzy connected vessel tree detected from a point specified on the vessel.

Figure 15. Fuzzy connected segmentation. (a, b) Sections from an MR imaging scene with T2 (a) and PD (b) values assigned to voxels. (c–e) Sections created with 3D fuzzy connected segmentation demonstrate the union of white matter and gray matter objects (c), the cerebrospinal fluid object (d), and the union of multiple sclerosis lesions (e) detected from the scene in a and b.

Figure 16. Fuzzy connected segmentation. (a) Three-dimensional maximum-intensity-projection (MIP) rendition of an MR angiography scene. (b) MIP rendition of the 3D fuzzy connected vessels detected from the scene in a. Fuzzy connectedness has been used to remove the clutter that obscures the vasculature.
There are a number of challenges associated with segmentation, including (a) developing general segmentation methods that can be easily and quickly adapted to a given application, (b) keeping the human assistance required on a per-scene basis to a minimum, (c) developing fuzzy methods that can realistically handle uncertainties in data, and (d) assessing the efficacy of segmentation methods.
VISUALIZATION
Visualization operations create renditions of given scenes or object systems. Their purpose is to facilitate the visual perception of object information. Two approaches are available: scene-based visualization and object-based visualization.
Scene-based Visualization
In scene-based visualization, renditions are created directly from given scenes. Within this approach, two further subclasses may be identified: section mode and volume mode.
Section Mode. Methods differ as to what constitutes a "section" and how this information is displayed. Natural sections may be axial, coronal, or sagittal; oblique or curved sections are also possible. Information may be displayed as a montage, with roam-through (fly-through), and in gray scale or pseudocolor. Figure 17 shows a montage display of the natural sections of a CT scene. Figure 18 demonstrates a 3D display-guided extraction of an oblique section from a CT scene of a pediatric patient's head. This re-sectioning operation illustrates how visualization is needed to perform visualization itself. Figure 19 illustrates pseudocolor display with two sections from a brain MR imaging study in a patient with multiple sclerosis. The two sections, which represent approximately the same location in the patient's head, were taken from 3D scenes that were obtained at different times and subsequently registered. The sections are assigned red and green hues. The display shows yellow (produced by a combination of red and green hues) whe