1 From the Department of Biomedical Informatics, Ohio State University, 3190 Graves Hall, 333 W 10th Ave, Columbus, OH 43210 (T.C.P., M.N.G., S.A.L., S.W.O., S.L.H., A.S., B.G.R., D.W.E., T.M.K., J.H.S.); VA Maryland Health Care System, Baltimore, Md (K.M.S., E.L.S.); and University of Maryland School of Medicine, Baltimore, Md (E.L.S.). Presented as an infoRAD exhibit at the 2005 RSNA Annual Meeting. Received August 17, 2006; revision requested September 22 and received November 8; accepted December 20. Supported in part by the National Cancer Institute, the National Science Foundation (CNS-0509326, CNS-0403342, ANI-0330612), the National Institutes of Health (NIBIB BISTI P20EB000591), and the Ohio Board of Regents (BRTTC BRTT02 0003, ODOD-AGMT-TECH-04 049). M.N.G. is a stockholder in iCAD. K.M.S. is a speaker for TeraRecon, San Mateo, Calif; cofounder of iVirtuoso, Baltimore, Md; and a member of the advisory board of GE Healthcare IT, Barrington, Ill. E.L.S. received research funding from GE Healthcare. All other authors have no financial relationships to disclose. Address correspondence to T.C.P. (e-mail: tpan{at}bmi.osu.edu).
| Abstract |
|---|
© RSNA, 2007
| Introduction |
|---|
Grid computing grew out of the work of many computer scientists in the 1980s and 1990s. Today, the most widely accepted grid software system is the Globus toolkit, which was developed by Ian Foster, Carl Kesselman, and Steve Tuecke. The Globus toolkit is an open-source framework for computer processing and storage management, security, data movement, and monitoring. Unlike traditional computer clusters or distributed computing, grid computing provides support for computation across multiple disparate administrative domains. Grid computing makes it possible to unite resources from different computer platforms, with different architectures, using different computer languages, and in multiple locations, over a single network by using open standards. This enables what has been referred to as virtualization of computing resources. It provides efficient and secure ways to share resources, including data, software applications, computational capabilities, and storage capacity, by using open protocols and standardized service interfaces. It creates unprecedented possibilities for new forms of collaborative investigation as well as more powerful clinical and research applications. The Globus toolkit has been used to support a wide variety of applications in the physical sciences, engineering, and biomedicine.
Perhaps the best-known application of grid computing is the project SETI@home (http://setiweb.ssl.berkeley.edu), in which hundreds of thousands of personal computers with many different software platforms (including Microsoft Windows, Linux, and Macintosh operating systems) are connected via the Internet in a cooperative and coordinated search for radio signals with nonnatural patterns that might indicate extraterrestrial intelligence. The application uses data collected by the Arecibo radio telescope in Puerto Rico and employs the Berkeley Open Infrastructure for Network Computing grid toolkit. The application coordinates the processing of the complex radio telescope signals by dividing them into multiple small segments that can be analyzed individually on personal computers using the SETI@home screen saver programs.
Grid computing is particularly well suited to complex and computationally demanding applications in medical imaging, such as computer-aided detection (CAD). Especially intriguing is the possibility of using CAD software programs from different vendors in a cooperative manner to enhance the performance and accuracy of lung nodule detection in a single image data set. The nodule candidates selected in a composite automated reading then could be combined or compared with those identified by one or more human observers for research purposes or for clinical image interpretation.
Lung cancer is currently the most common cause of cancer deaths among both men and women. According to a recent report from the American Cancer Society (2), more people in the United States died of this disease in 2005 than of breast, prostate, and colon cancers combined. A number of authors have suggested that a substantial percentage of clinically significant lung lesions missed in routine clinical interpretation of thoracic computed tomographic (CT) studies might be detected with the use of CAD systems (3–7). This technology is increasingly being applied to other imaging modalities and disease states as well. For example, CAD systems are used for the detection of masses and microcalcifications at mammography, lung nodules at chest radiography, and polyps at CT colonography (so-called virtual colonoscopy) (8,9). CAD algorithms have improved in accuracy and ease of use, while medical image data sets have rapidly increased in size, making such algorithms increasingly valuable in clinical practice. Changes in reimbursement also are spurring the rapid incorporation of CAD algorithms into routine practice, as is a growing body of literature supporting the ability of CAD systems to increase diagnostic accuracy (particularly sensitivity) when used in combination with human readers. However, the additional time required to use CAD systems suggests a need to modify both the workflow and the way CAD is used, so that the interpretation process is streamlined.
CAD can be thought of as a computer vision system that uses advanced pattern recognition and image analysis techniques to automatically detect medical abnormalities. Current commercially available and experimental CAD systems operate on local data sources. In most practices, a CAD system from a single vendor is used at a specific location. In this article, we describe gridCAD, a software system that integrates into a grid framework different CAD programs from multiple vendors, thereby creating an infrastructure that allows invocation of multiple CAD algorithms in parallel on one or more image data sets. The innovative use of grid computation in gridCAD offers the potential to greatly increase the accuracy and speed of image analysis by sharing data as well as computational resources. This approach also enables the creation of a consensus among multiple CAD systems and the combination of the CAD system-based interpretation with interpretations from one or more radiologists in one or more locations.
| Anatomy of a Health Care Grid |
|---|
The infrastructure provided by the health care grid data service allows a researcher, radiologist, or clinician to easily create, manage, and advertise the availability of new data; to search within existing data sources; to query and retrieve data subsets of interest; and, finally, to integrate the retrieved data according to specified research or clinical requirements. The retrieved data conform to well-defined, published standard schemas, so users do not need to know the particular data formats used by different researchers and clinicians.
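As an illustration of how a data service can hide source-specific formats behind a shared schema, the minimal Python sketch below federates two hypothetical sources. The record fields (`series_uid`, `modality`, `body_part`, `site`), the native formats of "site A" and "site B," and the adapter functions are all assumptions made for the example; they are not the caGrid data service interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ImageSeries:
    """Hypothetical common schema that every data service returns."""
    series_uid: str
    modality: str
    body_part: str
    site: str


# Each source keeps records in its own native format; an adapter maps a
# native record onto the shared ImageSeries schema.
def site_a_adapter(rec: dict) -> ImageSeries:
    return ImageSeries(rec["uid"], rec["mod"], rec["region"], "site-a")


def site_b_adapter(rec: dict) -> ImageSeries:
    # Site B stores modality and region in one "MODALITY/REGION" field.
    modality, body_part = rec["desc"].split("/")
    return ImageSeries(rec["series"], modality, body_part, "site-b")


@dataclass
class DataService:
    records: list
    adapter: Callable[[dict], ImageSeries]

    def query(self, **criteria) -> list:
        """Return records, in the shared schema, that match all criteria."""
        out = []
        for rec in self.records:
            series = self.adapter(rec)
            if all(getattr(series, k) == v for k, v in criteria.items()):
                out.append(series)
        return out


def federated_query(services: Iterable[DataService], **criteria) -> list:
    """Query every data service and merge results into one virtual repository."""
    return [s for svc in services for s in svc.query(**criteria)]


site_a = DataService([{"uid": "1.2.1", "mod": "CT", "region": "CHEST"},
                      {"uid": "1.2.2", "mod": "MR", "region": "BRAIN"}],
                     site_a_adapter)
site_b = DataService([{"series": "9.8.1", "desc": "CT/CHEST"}], site_b_adapter)

chest_ct = federated_query([site_a, site_b], modality="CT", body_part="CHEST")
print([s.series_uid for s in chest_ct])  # ['1.2.1', '9.8.1']
```

The caller never sees the two native layouts; only the adapters do, which is the property the schema-conformance requirement is meant to provide.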
Analysis Services
Once data (eg, CT images of the thorax) have been integrated into a virtual data repository, they can be processed to extract meaningful information. The algorithms, tools, and applications that perform image analysis and data mining operations are advertised as grid analysis services that are shared across the Internet. Existing applications can be incorporated into the grid as analysis services, or new tools may be developed, to leverage the distributed processing capability of the grid.
Computing Services
As data sets increase in size and resolution and as algorithms increase in sophistication, computational requirements may exceed local storage and computational capacities. In such cases, "compute farms," which consist of dedicated systems for computation, can enable large-scale analysis. Compute farms can advertise their computational capabilities on a grid. Users can direct their data and applications to these farms and can access the farms' resources according to a preassigned level of authorization. Users and institutions can use grid computing to pool data, applications, and computing resources, thus creating a better and more collaborative research environment. Alternatively, users can offer excess computing capacity back to the grid, in a process analogous to an individual selling excess solar electricity to the power company on a supply grid.
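A compute farm that grants access "according to a preassigned level of authorization" might dispatch jobs roughly as in the Python sketch below. The farm names, the numeric authorization levels, and the rule of picking the eligible farm with the most free cores are illustrative assumptions, not features documented for gridCAD.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeFarm:
    name: str
    free_cores: int
    min_auth_level: int                      # lowest user level the farm accepts
    jobs: list = field(default_factory=list)


class NoFarmAvailable(Exception):
    pass


def submit(job_id: str, cores: int, user_level: int, farms: list) -> ComputeFarm:
    """Send a job to the authorized farm with the most free cores."""
    eligible = [f for f in farms
                if user_level >= f.min_auth_level and f.free_cores >= cores]
    if not eligible:
        raise NoFarmAvailable(f"no farm can run {job_id} ({cores} cores)")
    farm = max(eligible, key=lambda f: f.free_cores)
    farm.free_cores -= cores                 # reserve capacity for the job
    farm.jobs.append(job_id)
    return farm


farms = [ComputeFarm("university-cluster", free_cores=64, min_auth_level=2),
         ComputeFarm("partner-hospital", free_cores=16, min_auth_level=1)]

print(submit("cad-run-1", cores=8, user_level=1, farms=farms).name)   # partner-hospital
print(submit("cad-run-2", cores=32, user_level=2, farms=farms).name)  # university-cluster
```

A level-1 user is kept off the university cluster even though it has more capacity, which is the authorization constraint described in the text.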
Middleware Support Services
The distribution of resources within a grid necessitates means of locating services within that grid environment. Figure 1 shows the service registry, semantic registry, protocol registry, and security infrastructure, which are key components of the generic grid middleware infrastructure. The service registry provides a way for data and analysis services to advertise their existence and their capabilities. A user then can locate the appropriate services on the basis of capabilities, hardware and software requirements, input and output data formats, and parameters. Once the service location and invocation parameters are identified, the service can be invoked by using a properly formatted request.
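The advertise-then-locate pattern of a service registry can be sketched as follows. This is a minimal in-memory Python illustration with assumed field names (`capabilities`, `input_format`, `output_format`) and made-up endpoint URLs; real grid registries are distributed, persistent, and considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceAd:
    """Hypothetical advertisement a service publishes to the registry."""
    name: str
    endpoint: str
    capabilities: frozenset
    input_format: str
    output_format: str


class ServiceRegistry:
    def __init__(self):
        self._ads = {}

    def advertise(self, ad: ServiceAd) -> None:
        """A data or analysis service announces itself (re-advertising replaces)."""
        self._ads[ad.name] = ad

    def locate(self, capability: str, input_format: str) -> list:
        """Find services that offer a capability and accept the given input."""
        return sorted(
            (ad for ad in self._ads.values()
             if capability in ad.capabilities and ad.input_format == input_format),
            key=lambda ad: ad.name)


registry = ServiceRegistry()
registry.advertise(ServiceAd("cad-vendor-a", "https://grid.example.org/cadA",
                             frozenset({"lung-nodule-detection"}), "DICOM", "XML"))
registry.advertise(ServiceAd("cad-vendor-b", "https://grid.example.org/cadB",
                             frozenset({"lung-nodule-detection"}), "DICOM", "XML"))
registry.advertise(ServiceAd("colon-cad", "https://grid.example.org/colon",
                             frozenset({"polyp-detection"}), "DICOM", "XML"))

for ad in registry.locate("lung-nodule-detection", "DICOM"):
    print(ad.name, ad.endpoint)
```

Once `locate` returns an endpoint, the client would send it a request formatted according to the service's published protocol.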
Data and analysis service requests and responses must follow standard published protocols to ensure interoperability between the client and the service as well as among services. The protocols are published in globally and publicly accessible protocol registries. Communication protocols are typically specified by extensible markup language (XML) schemas, and requests and responses are transmitted as XML instance documents. Some grids (eg, the Cancer Biomedical Informatics Grid [caBIG]) contain semantic registries that are designed to manage vocabulary and data elements used by grid applications and services. These vocabularies and common data elements generally include controlled vocabularies and data models produced by various medical application communities and enable much more complex data mining and analysis than would be possible otherwise.
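To make the idea of requests and responses as XML instance documents concrete, the Python sketch below builds a CAD analysis request with the standard library's `xml.etree.ElementTree` and parses a response. The element names, attribute names, and namespace URI are invented for illustration; they are not a published caGrid or caBIG schema, and a production service would also validate each document against its registered XML schema.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:gridcad"  # hypothetical namespace, not a published schema
ET.register_namespace("cad", NS)


def build_request(series_uid: str, algorithm: str) -> bytes:
    """Serialize an analysis request as an XML instance document."""
    root = ET.Element(f"{{{NS}}}AnalysisRequest")
    ET.SubElement(root, f"{{{NS}}}Algorithm").text = algorithm
    ET.SubElement(root, f"{{{NS}}}SeriesInstanceUID").text = series_uid
    return ET.tostring(root, encoding="utf-8")


def parse_response(xml_bytes: bytes) -> list:
    """Extract (slice, x, y) nodule candidates from a response document."""
    root = ET.fromstring(xml_bytes)
    return [(int(c.get("slice")), float(c.get("x")), float(c.get("y")))
            for c in root.iter(f"{{{NS}}}Candidate")]


request = build_request("1.2.840.99999.1", "vendor-a-lung")
print(request.decode())

response = (b'<cad:AnalysisResponse xmlns:cad="urn:example:gridcad">'
            b'<cad:Candidate slice="42" x="101.5" y="88.0"/>'
            b'</cad:AnalysisResponse>')
print(parse_response(response))  # [(42, 101.5, 88.0)]
```

Because both sides agree on the namespace and element names, the client and the service can be written in different languages and still interoperate.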
Data and algorithm sharing in a health care environment requires careful planning of the security infrastructure. The security service is responsible for maintaining proper access to the data through authentication and authorization of users and analysis services. Some data may include patient information and thus must conform to requirements of the Health Insurance Portability and Accountability Act (HIPAA) and the Joint Commission on Accreditation of Healthcare Organizations, as well as to specific requirements of individual states and institutional review board-approved research protocols. In a research setting, the security service also may remove patient-identifying information from clinical data. The security service can be used to restrict access to proprietary analysis services to specified personnel.
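In its simplest form, a security service that both authorizes requests and removes patient-identifying information might look like the Python sketch below. The role names, the access policy, and the short list of identifying header fields are hypothetical; real HIPAA de-identification covers many more identifiers (including any burned into pixel data), and a real deployment would rely on the grid's credential-based authentication rather than on a lookup table like this one.

```python
# Hypothetical subset of identifying header fields (DICOM-style keywords).
IDENTIFYING_FIELDS = {"PatientName", "PatientID", "PatientBirthDate",
                      "InstitutionName", "ReferringPhysicianName"}

# Hypothetical policy: which roles may call which services.
POLICY = {
    "proprietary-cad-a": {"radiologist", "cad-researcher"},
    "image-data-service": {"radiologist", "cad-researcher", "student"},
}


def authorize(role: str, service: str) -> bool:
    """True if the role is listed for the service; unknown services deny all."""
    return role in POLICY.get(service, set())


def deidentify(header: dict, research_id: str) -> dict:
    """Drop identifying fields and substitute a study-specific pseudonym."""
    clean = {k: v for k, v in header.items() if k not in IDENTIFYING_FIELDS}
    clean["PatientID"] = research_id
    return clean


def fetch_for_research(role: str, header: dict, research_id: str) -> dict:
    if not authorize(role, "image-data-service"):
        raise PermissionError(f"role {role!r} may not use image-data-service")
    return deidentify(header, research_id)


header = {"PatientName": "DOE^JANE", "PatientID": "MRN0012345",
          "Modality": "CT", "SliceThickness": "1.25"}
print(fetch_for_research("student", header, "STUDY-0007"))
# {'Modality': 'CT', 'SliceThickness': '1.25', 'PatientID': 'STUDY-0007'}
print(authorize("student", "proprietary-cad-a"))  # False
```

Keeping authorization and de-identification in one service means no client ever receives identified data it was not entitled to.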
Health Care Grid Application User Interfaces
The components of the grid are linked via an application-specific user interface (Fig 2). The user interface defines workflow and, consequently, the flow of data among different components in the grid environment. The user interface also provides mechanisms for retrieving and reviewing data and analysis results.
| Data Sharing and Aggregation |
|---|
| Cancer Biomedical Informatics Grid |
|---|
GridCAD, the software application described in this article and demonstrated at the 2005 RSNA Scientific Assembly and Annual Meeting, was constructed by using caGrid, which was developed as part of the caBIG initiative (Fig 2). CaGrid is a middleware infrastructure that provides a communications layer for applications to interact across different hardware and network environments. It is also a toolkit that supports the development of grid-enabled, caBIG-compliant applications. The caGrid toolkit leverages the Globus toolkit, NCI Cancer Data Standards Repository, Global Model Exchange (an XML schema management system produced by the Mobius project), and Open Grid Services Architecture-Data Access Integration. The Globus toolkit provides the following set of core components for the development of grid applications (10): security, including authentication, authorization, and credential management; data management, including data transfer and associated optimizations; execution management and resource allocation; information services for monitoring and discovery; and a common runtime library for application development support.
| GridCAD Implementation |
|---|
As discussed previously, a grid computing environment has several components, including data services, analysis services, middleware support services, computing infrastructure, and user interfaces. Each of these is implemented and utilized in gridCAD for the lung cancer application. The objectives of the gridCAD framework are achieved by exposing a CAD algorithm as a grid-aware service and by facilitating the easy and secure exchange of images and CAD results. One approach for exposing an application or a data source as a grid service is to wrap it inside a layer that facilitates interaction and communication with other grid components while leaving the original application unmodified. The wrappers that we used come from the caGrid toolkit. In gridCAD (Fig 2), the following grid components are implemented: CAD analysis services, which invoke CAD systems and manage the flow of data; image data services, which provide interfaces to the data repositories (eg, a picture archiving and communication system [PACS] server); middleware support services, which provide operational support such as storage and communication schemas, data security, application invocation, and CAD result storage (in repositories such as the Mako XML database, a product of the open-source Mobius project); and user interfaces, which allow query, original image preview, and CAD system marking review.
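The division of labor among these components can be sketched in Python as below. Every class is a hypothetical stand-in: an in-memory dictionary replaces the PACS server, a toy threshold detector replaces a vendor CAD algorithm, and a dictionary replaces the Mako XML database. The sketch only shows how data flow from an image data service through CAD analysis services to result storage.

```python
from typing import Protocol


class ImageDataService(Protocol):
    def retrieve(self, series_uid: str) -> list: ...


class CadAnalysisService(Protocol):
    name: str

    def analyze(self, slices: list) -> list: ...


class InMemoryPacs:
    """Stand-in for a PACS-backed image data service."""
    def __init__(self, studies: dict):
        self._studies = studies

    def retrieve(self, series_uid: str) -> list:
        return self._studies[series_uid]


class ThresholdCad:
    """Toy 'CAD' that flags slices whose peak value exceeds a threshold."""
    def __init__(self, name: str, threshold: int):
        self.name, self.threshold = name, threshold

    def analyze(self, slices: list) -> list:
        return [i for i, s in enumerate(slices) if max(s) > self.threshold]


class ResultStore:
    """Stand-in for an XML result repository keyed by (series, algorithm)."""
    def __init__(self):
        self.results = {}

    def save(self, series_uid: str, algorithm: str, findings: list) -> None:
        self.results[(series_uid, algorithm)] = findings


def run_gridcad(series_uid: str, images: ImageDataService, cads: list,
                store: ResultStore) -> dict:
    slices = images.retrieve(series_uid)                       # image data service
    for cad in cads:                                           # CAD analysis services
        store.save(series_uid, cad.name, cad.analyze(slices))  # result storage
    # What a review user interface would query for this series.
    return {alg: f for (uid, alg), f in store.results.items() if uid == series_uid}


pacs = InMemoryPacs({"1.2.3": [[10, 20], [300, 15], [90, 250]]})
store = ResultStore()
cads = [ThresholdCad("cad-a", 200), ThresholdCad("cad-b", 280)]
print(run_gridcad("1.2.3", pacs, cads, store))  # {'cad-a': [1, 2], 'cad-b': [1]}
```

Because each component is reached only through its interface, a real PACS, a vendor CAD wrapper, or an XML database could replace a stand-in without changes to `run_gridcad`.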
| Reference Implementation in Lung Cancer Detection |
|---|
CAD analysis services were implemented as wrappers for lung nodule CAD algorithms from Siemens Medical Solutions (Malvern, Pa) and iCAD (Nashua, NH). Each of these algorithms was treated as a black box with a well-defined command-line interface. A separate analysis service wrapper was developed for each CAD algorithm because the algorithms were not developed for a grid computing environment and have different interfaces for invocation and different format requirements for data input and output.
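A per-vendor wrapper around a command-line "black box" might look like the Python sketch below: the wrapper translates a common request into each program's own arguments, runs it with `subprocess.run`, and normalizes its output to (x, y, z) tuples. The command lines and output formats are hypothetical, since the vendors' actual interfaces are not described here; to keep the example runnable, the two "vendors" are simulated by small inline Python programs.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable


@dataclass
class CadWrapper:
    name: str
    build_args: Callable[[str], list]    # input path -> vendor-specific argv
    parse_output: Callable[[str], list]  # vendor stdout -> [(x, y, z), ...]

    def run(self, input_path: str, timeout: float = 600.0) -> list:
        # check=True raises CalledProcessError if the program exits nonzero.
        proc = subprocess.run(self.build_args(input_path), capture_output=True,
                              text=True, timeout=timeout, check=True)
        return self.parse_output(proc.stdout)


# Simulated vendor A: prints one "x,y,z" line per candidate.
vendor_a = CadWrapper(
    "vendor-a",
    lambda path: [sys.executable, "-c", "print('12,40,7'); print('80,33,21')", path],
    lambda out: [tuple(int(v) for v in line.split(",")) for line in out.split()],
)

# Simulated vendor B: prints "z x y" triples separated by semicolons.
vendor_b = CadWrapper(
    "vendor-b",
    lambda path: [sys.executable, "-c", "print('7 13 41;30 5 5')", path],
    lambda out: [(x, y, z) for z, x, y in
                 (map(int, item.split()) for item in out.strip().split(";"))],
)

for cad in (vendor_a, vendor_b):
    print(cad.name, cad.run("/data/series-1.2.3"))  # hypothetical input path
```

Keeping vendor quirks inside `build_args` and `parse_output` is what lets the rest of the grid treat both algorithms identically.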
The outputs of the two CAD algorithms were presented to the user as overlays on the CT images. Nodule candidates were marked with circles or squares of different colors so that the markings from the two algorithms could be distinguished. The user could scroll through the images as well as change the intensity window and level during the review. Figure 3 shows the outputs of the two CAD algorithms. The demonstration of gridCAD at the 2005 RSNA annual meeting showed how a radiologist could incorporate this system into his or her routine workflow by performing image interpretation and then using a combination or consensus of markings from multiple CAD algorithms. The demonstration also illustrated how interpretations from other radiologists could be combined with the CAD system markings.
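One simple way to form a combination or consensus of markings is to treat two candidates from different algorithms as the same nodule when their centers lie within a distance tolerance. The Python sketch below does this with greedy nearest-neighbor matching; the 5-mm tolerance, the millimeter coordinates, and the greedy rule are illustrative assumptions rather than gridCAD's documented method.

```python
import math


def match_candidates(a: list, b: list, tol_mm: float = 5.0):
    """Greedily pair candidates (x, y, z in mm) from two CAD systems."""
    pairs, only_a, unmatched_b = [], [], list(b)
    for pa in a:
        best = min(unmatched_b, key=lambda pb: math.dist(pa, pb), default=None)
        if best is not None and math.dist(pa, best) <= tol_mm:
            pairs.append((pa, best))
            unmatched_b.remove(best)
        else:
            only_a.append(pa)
    return pairs, only_a, unmatched_b


def consensus(a: list, b: list, tol_mm: float = 5.0):
    """Return marks found by both systems (averaged) and the union of all marks."""
    pairs, only_a, only_b = match_candidates(a, b, tol_mm)
    both = [tuple((u + v) / 2 for u, v in zip(pa, pb)) for pa, pb in pairs]
    union = both + only_a + only_b
    return both, union


cad_a = [(10.0, 20.0, 30.0), (55.0, 60.0, 12.0)]
cad_b = [(11.0, 21.0, 30.0), (100.0, 40.0, 80.0)]
both, union = consensus(cad_a, cad_b)
print(both)        # [(10.5, 20.5, 30.0)]
print(len(union))  # 3
```

The intersection favors specificity and the union favors sensitivity; a radiologist's own marks could be added as a third input in the same way.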
| Summary and Future Prospects |
|---|
The number of radiology-related applications of grid computing is increasing. Such applications include MammoGrid, a pan-European database that allows access to mammograms by using grid-based software (11). One objective of the MammoGrid project is to extract tissue-level information (eg, the number and location of microcalcifications) for use in clinical studies. In another European effort, a CAD system for mammographic analysis (Computer Assisted Library for Mammography) has been integrated with a grid-based mammographic reading environment for use in the detection of masses and microcalcifications (12). Grid-based CAD applications for the detection of breast cancer and Alzheimer disease also have been under development by the Medical Application on a Grid Infrastructure Connection (MAGIC-5) project group (http://www.magic5.unile.it).
One of the most intriguing practical applications of the grid in health care is its use as a mechanism to achieve unified access to multiple analysis services. In addition to cancer detection, tasks such as tumor volume measurement, therapy efficacy assessment, and parameter extraction from dynamic contrast material-enhanced magnetic resonance (MR) imaging data could be performed at the same time by using the most suitable algorithms.
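Unified access to several analysis services can be pictured as fanning one request out to all of them concurrently and collecting the results. The Python sketch below uses `concurrent.futures.ThreadPoolExecutor`; the three service functions are hypothetical placeholders for remote calls that sleep briefly to simulate latency and return canned values.

```python
import time
from concurrent.futures import ThreadPoolExecutor


# Hypothetical placeholders for remote analysis services.
def detect_nodules(study_id):
    time.sleep(0.1)                      # simulate network and compute latency
    return {"candidates": 3}


def measure_tumor_volume(study_id):
    time.sleep(0.1)
    return {"volume_ml": 4.2}


def dce_mri_parameters(study_id):
    time.sleep(0.1)
    return {"ktrans_per_min": 0.18}


SERVICES = {"nodule-cad": detect_nodules,
            "volumetry": measure_tumor_volume,
            "dce-mri": dce_mri_parameters}


def analyze_all(study_id: str) -> dict:
    """Invoke every service concurrently and gather results by service name."""
    with ThreadPoolExecutor(max_workers=len(SERVICES)) as pool:
        futures = {name: pool.submit(fn, study_id) for name, fn in SERVICES.items()}
        return {name: fut.result() for name, fut in futures.items()}


start = time.perf_counter()
results = analyze_all("study-001")
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed {elapsed:.2f} s")  # close to one call's latency, not the sum of three
```

Threads suit this case because the work is waiting on remote services, not computing locally.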
The development and evaluation of effective algorithms require access to a large number of cases from different geographic locations so that variations in the population are adequately modeled. A traditional approach would involve the collection and transfer of image data to a central location. With the use of grid computing, widely distributed data are easily shared and accessible for development and evaluation. Once developed, image processing and analysis algorithms can be validated by using a large number of cases from several institutions before regulatory approval is sought. A grid system thus may facilitate a faster and hence more affordable regulatory approval process.
One of the primary goals of grid computing is to remove the dependence of an application on the underlying hardware and to deliver the application as a utility. As the complexity of a clinical or research imaging study increases, more computing resources can be recruited easily. The gridCAD system can provide this capability by means of CAD compute farms where dedicated clusters of high-performance computers run various CAD algorithms. A user or an institution can leverage these grid computing resources without having to support and maintain the computer equipment and can obtain resources that are scalable to the complexity of the task.
Moreover, grid support for algorithm and application workflows allows algorithm components to be assembled into data processing pipelines. A typical CAD program consists of several independent software modules and is built for a single computer. For example, a lung nodule detection program may consist of modules for the detection of lung contours, the segmentation of nodule candidates, and the reduction of false-positive nodule candidates. Different modules have different computational and memory requirements. The ability to rewrite these algorithms to support the distribution of computational modules and data storage across many computing resources has significant potential for improving the speed and performance of CAD and other complex algorithms.
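The modular structure described above maps naturally onto a pipeline whose stages can be placed on resources of different capacity. The Python sketch below chains three toy stages named after the modules in the text and gives each distributable stage its own worker pool. The stage logic (thresholds on tiny integer "images") is invented purely to make the example runnable and does not resemble a real lung nodule detector, and local threads stand in for remote grid nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def detect_lung_contour(image):
    """Toy stage 1: zero out pixels outside the 'lung' (values of 500 or more)."""
    return [[p if p < 500 else 0 for p in row] for row in image]


def segment_candidates(lung):
    """Toy stage 2: every pixel brighter than 300 inside the lung is a candidate."""
    return [(r, c) for r, row in enumerate(lung)
            for c, p in enumerate(row) if p > 300]


def reduce_false_positives(candidates, n_rows, n_cols):
    """Toy stage 3: drop candidates that lie on the image border."""
    return [(r, c) for r, c in candidates
            if 0 < r < n_rows - 1 and 0 < c < n_cols - 1]


def pipeline(slices, stage_workers=(4, 2)):
    # Each per-slice stage gets its own pool, mimicking stages placed on
    # resources with different capacity.
    with ThreadPoolExecutor(max_workers=stage_workers[0]) as pool:
        lungs = list(pool.map(detect_lung_contour, slices))
    with ThreadPoolExecutor(max_workers=stage_workers[1]) as pool:
        candidates = list(pool.map(segment_candidates, lungs))
    # The final stage runs on one node after all slices are gathered.
    return [reduce_false_positives(c, len(img), len(img[0]))
            for img, c in zip(slices, candidates)]


slices = [[[100, 350, 100], [100, 400, 900], [350, 100, 100]],
          [[100, 100, 100], [100, 100, 100], [100, 100, 320]]]
print(pipeline(slices))  # [[(1, 1)], []]
```

Splitting the program at module boundaries is what allows memory-heavy and compute-heavy stages to be assigned to different machines.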
The service-based architecture of gridCAD allows easy integration of additional algorithms and data sources by using caBIG tools and standards. This ease of integration facilitates the development and deployment of other disease-specific applications, such as mammographic CAD. Unified communication protocols allow additional data sources and analysis services to be added easily and in an ad hoc fashion.
In the current implementation of gridCAD, much of the focus has been on creating an operational infrastructure for running CAD algorithms on a grid. For clinical and research deployment, additional security features must be incorporated, including authentication and authorization of users and grid services, masking or deletion of confidential or patient-identifying information from clinical data, and secure transmission of information. Our primary implementation focus, consequently, is to address these features. Security in gridCAD will leverage an ongoing effort in the caBIG architecture workspace, specifically caGrid's Grid User Management Service and Common Attribute Management Service. These services provide security and authentication capabilities, leveraging the Globus toolkit security components for management of users, their credentials, and usage permissions.
As the diagnostic imaging data obtained with CT and MR imaging increase in spatial and temporal resolution as well as in overall complexity, the amount of data stored per patient also increases dramatically. Latency in the transfer of data across the grid for remote image review increases with data size, thus adversely affecting the user's ability to dynamically interact with databases. An efficient method of transferring large amounts of image data is important to the success of grid-based systems such as gridCAD. Efforts are already under way in the biomedical field to support large data transfers by using approaches such as data streaming, data compression with the Joint Photographic Experts Group 2000 Interactive Protocol, multiresolution data compression, and region-of-interest-based data transfer (the transfer of portions of images). However, some of these technologies have not yet been incorporated as use cases for caGrid's communication protocols. High-performance data transfer remains an area of future work for gridCAD as well as caGrid.
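The payoff of region-of-interest-based transfer is easy to estimate. The Python sketch below compares the bytes needed to send a whole CT series with those needed to send a small cube around a nodule candidate, converts both to transfer times, and shows the cropping step on a tiny synthetic volume. The series dimensions (400 slices of 512 × 512 pixels at 2 bytes per pixel), the 64-voxel cube, and the 100-Mbit/s link are assumed values, not measurements from gridCAD.

```python
LINK_MBIT_PER_S = 100   # assumed network bandwidth
BYTES_PER_PIXEL = 2     # assumed 16-bit CT pixels


def volume_bytes(n_slices, rows, cols):
    return n_slices * rows * cols * BYTES_PER_PIXEL


def transfer_seconds(n_bytes, link_mbit_per_s=LINK_MBIT_PER_S):
    return n_bytes * 8 / (link_mbit_per_s * 1_000_000)


def crop_roi(volume, center, half):
    """Cut a cube of up to 2*half voxels per side, starting half voxels before
    center (z, y, x) and clipped at the volume edges; only it is transmitted."""
    z, y, x = center

    def lo(c):
        return max(c - half, 0)

    return [[row[lo(x):x + half] for row in sl[lo(y):y + half]]
            for sl in volume[lo(z):z + half]]


full = volume_bytes(400, 512, 512)
roi = volume_bytes(64, 64, 64)
print(f"full series: {full:,} B, {transfer_seconds(full):.1f} s")  # 209,715,200 B, 16.8 s
print(f"ROI cube:    {roi:,} B, {transfer_seconds(roi):.3f} s")    # 524,288 B, 0.042 s

tiny = [[[z * 100 + y * 10 + x for x in range(8)] for y in range(8)] for z in range(8)]
cube = crop_roi(tiny, center=(4, 4, 4), half=2)
print(len(cube), len(cube[0]), len(cube[0][0]), cube[0][0][0])  # 4 4 4 222
```

Under these assumptions the ROI is 400 times smaller than the full series, which is the reason region-based transfer helps interactive remote review.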
Although high-performance data transfer may improve the transfer latency for an individual data set, large numbers of image data sets still present challenges for efficient data movement. Clinical trials that involve multiple sites, each with large numbers of image data sets, may further compound the problem. The resulting data transfer time on the grid can become a bottleneck for CAD system performance.
Moving algorithms to data repositories instead of moving data to analysis services reduces the amount of data transfer and may enhance overall system performance in certain scenarios (Fig 4). This approach requires substantial middleware support for transmitting algorithms at run time and installing them at remote grid services. We plan to implement this capability in future gridCAD versions, with the support of the caGrid toolkit and CAD system vendors.
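Whether to ship the algorithm to the data or the data to the algorithm can be framed as a comparison of bytes moved. The Python sketch below uses a deliberately simple cost model (transfer time only, ignoring queueing, installation time, and whether the remote site can run the algorithm at all); the site names, data set counts, sizes, and link speed are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    name: str
    n_series: int
    mb_per_series: float


def transfer_s(megabytes, link_mbit_per_s):
    return megabytes * 8 / link_mbit_per_s


def plan(sites, algorithm_mb, result_mb_per_series, link_mbit_per_s=100):
    """For each site, pick the option that moves fewer megabytes over the link."""
    decisions = {}
    for s in sites:
        move_data = s.n_series * s.mb_per_series
        move_algorithm = algorithm_mb + s.n_series * result_mb_per_series
        choice = "ship algorithm" if move_algorithm < move_data else "ship data"
        seconds = transfer_s(min(move_data, move_algorithm), link_mbit_per_s)
        decisions[s.name] = (choice, round(seconds, 1))
    return decisions


sites = [Site("trial-site-1", n_series=1200, mb_per_series=200),
         Site("small-clinic", n_series=2, mb_per_series=200)]
print(plan(sites, algorithm_mb=500, result_mb_per_series=0.05))
# {'trial-site-1': ('ship algorithm', 44.8), 'small-clinic': ('ship data', 32.0)}
```

The large multisite trial favors moving the algorithm, while a site with only a couple of studies is still better served by moving the data, which matches the text's caveat that the benefit holds only in certain scenarios.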
Grid computing has tremendous potential to benefit health care, and the medical imaging community has only begun to explore the possibilities. Promising grid computing applications include teleradiology services, distributed and remote image processing and analysis, quality assurance and research, and clinical data mining.
| TAKE-HOME POINTS |
|---|
Grid computing allows efficient and secure sharing of data, software applications, computational resources, and storage capacity by using open protocols and standardized service interfaces.
A generic grid may be composed of several services (eg, data, analysis, computation, and middleware support) and one or more user interfaces or applications that connect and interact with the grid components.
GridCAD is a software application that makes innovative use of grid computing to increase the speed and accuracy of radiologic image interpretation through the sharing of data and analysis resources.
GridCAD may be used to obtain a consensus interpretation by multiple CAD systems and human readers in one or more geographic locations.
GridCAD is built on the NCI Cancer Biomedical Informatics Grid (caBIG) architecture and is semantically and syntactically interoperable with services and applications in caBIG.
| Acknowledgments |
|---|
| Footnotes |
|---|
Abbreviations: CAD = computer-aided detection, HIPAA = Health Insurance Portability and Accountability Act, NCI = National Cancer Institute, PACS = picture archiving and communication system, RIDER = Reference Image Database to Evaluate Response, XML = extensible markup language
| References |
|---|