Editorial
1 From the Department of Radiology, Thomas Jefferson University Hospital, 132 S Tenth St, Suite 1080B Main Building, Philadelphia, PA 19107. Received and accepted July 13, 2007. The author has no financial relationships to disclose. Address correspondence to the author (e-mail: Adam.flanders{at}jefferson.edu).
More than an estimated 11.5 billion pages exist on the publicly indexable Web (ie, surface or visible Web) (1), with another 9 billion pages available on the deep Web, that is, Web pages and content that are either hidden or potentially not reachable by the spider or Web crawler programs that search engines use to constantly index all available content on the public Internet. Moreover, hundreds of thousands of new Web pages spawn into life every day, all of which need to be indexed by a Web crawler before they can be easily reached by the average user.
Wading through billions of Web pages requires use of a search engine to find exactly what one needs. The most popular search portals, such as Google, MSN/Live Search, and Yahoo, lead the user to other Web sites that may (or may not) contain the information that was initially requested. A user rarely finds what he or she is looking for in the first few tries; use of too few terms or of terms that are too general returns too many results, whereas use of a search string that is too specific may yield nothing.
Not all search engines are created equal. In the competitive world of Internet search, no established standards exist in the area of Web indexing. Each indexing service closely guards its proprietary method(s) to catalog the content of the public Web, to measure relevance of the index terms within Web pages, and to present these data to the user. Because of these differences, the estimated intersection of the total indexes of the four largest search engines—Google, Yahoo, Ask, and MSN/Live Search—is low (28.85%) (1). This result suggests that loyalty to one search engine will not provide you with a complete search of the entire public Web and that routine use of two or more search engines is preferred.
Physicians have embraced the Web as a rapid means to retrieve reference material because it offers greater accessibility to information. Although all physicians today find themselves spending a greater proportion of their workday in front of a computer screen, radiologists are very seldom more than a few feet from an Internet-connected computer. It is no surprise that the Web search portal on the desktop computer has taken the place of the radiology reference library and has become the most used decision-support tool in the radiology reading room. The reference library and film teaching file have been relegated to the back office in favor of the instant gratification attainable through Web searching as part of the daily work flow. Gone are the days of sifting through a stack of textbooks or a trip to the radiology library in search of a clinical pearl or a relevant reference image to solve a diagnostic dilemma. Fingertip access to what seems to be an almost limitless amount of medical information can be both time-efficient and addictive. However, despite the compelling convenience of Web-delivered content, a recently published survey showed that a majority of radiologists (67%) still prefer printed media over Web-delivered content for education and continuing medical education (2).
Many Web-savvy radiologists have already discovered that the "quality" of searches obtainable with modern search engines varies substantially. Most often, although the quantity of material returned by the search engine is more than sufficient, the value or quality of that material is very heterogeneous. Reasons for this variability are multifactorial, ranging from poor search term choices by the user to differences in proprietary search algorithms to variable quality of the returned information. As is often the case, answering a specific clinical or radiologic question by using conventional Web search mechanisms requires persistence and luck.
Ultimately, the measure of "success" for any Web query is the relevance of the retrieved material to the user's question. Relevance, in search engine parlance, is a numeric score assigned to a search result that indicates how well that result fulfills the needs of the original query. Although search engines such as Google or Yahoo rank returned results partially on the basis of the frequency or proximity of search terms, it is up to the user to assess the value of the returned content and to determine whether his or her query has been addressed appropriately. Even Google's relatively impartial PageRank algorithm, which places increased "value" on Web pages that have more cross-link references from other pages, falls short in helping a user easily evaluate the quality of the content. Industry surveys suggest that domestic users are less satisfied with search engines than they were in the past. With the exponential growth of content in the public Web, this trend is likely to progress (3).
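The link-based idea described above can be illustrated with a small sketch. This is the textbook power-iteration formulation of link ranking, not Google's production algorithm, which is proprietary. The page names, the damping value of 0.85, and the iteration count are assumptions chosen only for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration sketch of link-based ranking.

    links maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page divides its current score among the pages it cites.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


# Hypothetical four-page web in which three pages link to "atlas".
toy_web = {
    "atlas": ["review"],
    "review": ["atlas"],
    "blog": ["atlas"],
    "forum": ["atlas", "review"],
}
scores = pagerank(toy_web)
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

The most-cited page rises to the top, yet nothing in the score says whether its content is accurate, which is the limitation noted above.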
Radiologists want a more satisfying experience in using the Web as a point-of-care, decision-support resource. We want to use the Web to expand differential diagnoses, to obtain clinical snippets relevant to a specific entity, and to retrieve pertinent images and pathologic information. However, most of us do not have the time or motivation to develop the talents of a skilled reference librarian. What we need is a tool that innately understands what the radiologist is looking for. Innovation in the Web search experience lies in dynamically filtering searches or supplementing a search string with synonyms provided by pathology- or radiology-specific vocabularies.
Two inventive alternatives to conventional Web searching were demonstrated at the 2006 Annual Meeting of the Radiological Society of North America (RSNA). A custom search engine (ARRS GoldMiner; http://goldminer.arrs.org) was built to provide instant access to over 90,000 images published in selected peer-reviewed radiology journals. Images were extracted and stored, and a novel indexing schema was developed to map each figure's caption to pertinent concepts in controlled vocabularies such as those compiled in the Unified Medical Language System (UMLS) and its subset vocabulary, the Systematized Nomenclature of Medicine (SNOMED). Use of these vocabularies enriched the index by adding synonymy and hierarchical relationships between the terms. This enhanced indexing schema allows the search engine to return content based on conceptual relationships to the search term, instead of simple matches to the term itself. Image references from a GoldMiner search include conceptually related items in addition to exact matches to the search terms (4). This integrated semantic intelligence greatly improves the search experience.
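A minimal sketch of concept-based caption indexing, under stated assumptions, may make the idea concrete. It is not GoldMiner's implementation. The vocabulary, concept identifiers, captions, and function names are all hypothetical stand-ins for a controlled vocabulary such as UMLS or SNOMED.

```python
import re

# Hypothetical mini-vocabulary: concept ID -> preferred name, synonyms, parent.
# These entries are illustrative and are not taken from UMLS or SNOMED.
VOCABULARY = {
    "C1": {"name": "neoplasm", "synonyms": ["tumor", "mass"], "parent": None},
    "C2": {"name": "glioma", "synonyms": [], "parent": "C1"},
    "C3": {"name": "astrocytoma", "synonyms": [], "parent": "C2"},
    "C4": {"name": "fracture", "synonyms": [], "parent": None},
}


def concepts_in(text):
    """Return IDs of concepts whose name or synonym appears in the text."""
    text = text.lower()
    found = set()
    for concept_id, entry in VOCABULARY.items():
        for term in [entry["name"], *entry["synonyms"]]:
            if re.search(r"\b" + re.escape(term) + r"\b", text):
                found.add(concept_id)
                break
    return found


def with_ancestors(concept_id):
    """Return the concept followed by its broader concepts up the hierarchy."""
    chain = []
    while concept_id is not None:
        chain.append(concept_id)
        concept_id = VOCABULARY[concept_id]["parent"]
    return chain


def build_index(captions):
    """Index each figure under its caption's concepts and their ancestors."""
    index = {}
    for figure_id, caption in captions.items():
        for concept_id in concepts_in(caption):
            for indexed_id in with_ancestors(concept_id):
                index.setdefault(indexed_id, set()).add(figure_id)
    return index


def search(index, query):
    """Look up figures by the concepts named in the query, not by its words."""
    hits = set()
    for concept_id in concepts_in(query):
        hits |= index.get(concept_id, set())
    return sorted(hits)


# Hypothetical captions.
CAPTIONS = {
    "fig1": "Axial T2-weighted MR image shows a frontal lobe astrocytoma.",
    "fig2": "Chest radiograph shows a right upper lobe mass.",
    "fig3": "Lateral radiograph shows a distal radius fracture.",
}
INDEX = build_index(CAPTIONS)
print(search(INDEX, "tumor"))
```

In this toy data, a search for "tumor" retrieves both the figure captioned with the synonym "mass" and the figure captioned "astrocytoma", although neither caption contains the word "tumor".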
Another custom radiology-centric search engine (Yottalook; http://www.yottalook.com) was built as a supplement to the popular Google search engine. The goal of this project was to automate the process of refining radiology-specific searches by dynamically filtering search requests. The custom Yottalook search portal sits between the user and the Google search engine, filtering and brokering modifications to submitted search query strings by using semantic knowledge derived from UMLS, SNOMED, and RadLex (Radiology Lexicon), to improve the probability that the content returned is relevant to the radiologist's needs. The Yottalook search engine uses a number of proprietary core technologies to analyze each query string for meaning and then automatically supplements the search with additional concepts derived from controlled vocabularies. The intent is to return information that is closer to what the radiologist originally intended to find. Content can also be actively filtered by the user for items commonly requested by radiologists, ranging from specific modalities to books, protocols, journals, and products.
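Because the Yottalook technologies are proprietary, the following is only a rough sketch of what brokering a query might look like: a function that rewrites the user's string before it is handed to a general-purpose search engine. The synonym table, the function name, and the boolean output syntax are assumptions made for illustration.

```python
# Hypothetical synonym table standing in for a controlled vocabulary lookup.
SYNONYMS = {
    "hepatoma": ["hepatocellular carcinoma", "HCC"],
    "kidney stone": ["nephrolithiasis", "renal calculus"],
}


def expand_query(query, modality=None):
    """Rewrite a query so that known terms also match their synonyms.

    Recognized terms become OR-groups placed at the front of the result;
    unrecognized words are passed through unchanged. An optional modality
    is appended as an extra required phrase (a simple user-chosen filter).
    """
    remaining = query.lower()
    clauses = []
    # Longest terms first so multi-word entries match before their parts.
    for term in sorted(SYNONYMS, key=len, reverse=True):
        if term in remaining:
            alternatives = [term, *SYNONYMS[term]]
            group = " OR ".join(f'"{alt}"' for alt in alternatives)
            clauses.append(f"({group})")
            remaining = remaining.replace(term, " ")
    clauses.extend(remaining.split())
    if modality:
        clauses.append(f'"{modality}"')
    return " ".join(clauses)


expanded = expand_query("hepatoma enhancement", modality="MRI")
print(expanded)
```

A real broker would need proper tokenization and disambiguation rather than the substring matching used here, but the shape of the transformation (the original string in, an enriched string out) is the point of the sketch.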
Both of these custom search engines are great examples of query expansion based on existing knowledge representations (in the artificial intelligence sphere, a knowledge representation is a system, process, or design for storing knowledge about a subject). Incorporation of automated reasoning and knowledge representation into the Web is not a novel concept. Nearly a decade ago, one of the architects of the Web (Tim Berners-Lee) envisioned that the Web would evolve into a state in which machines could also process the concepts contained within Web pages, enabling computers and people to work in a cooperative, rather than a master-servant, relationship (5). The application of intrinsic semantic knowledge and embedded logic to perform a controlled expansion of a Web query exploits some of the technology that futurists envision as the next major evolutionary extension of the World Wide Web: the Semantic Web. The fundamental construct of the Semantic Web is to extend the current human-readable Web content into a form that machines can process or "understand." The Semantic Web introduces new technologies that make the semantics (or meaning) of human-readable Web content explicit and machine processable, so that automated, roaming software agents can identify similarities among Web resources. Although neither GoldMiner nor Yottalook is, by definition, an actual component of the Semantic Web, both provide a compelling demonstration of how the value of a Web application can be improved by exploiting one of the core technology components of the Semantic Web: ontologies that enable queries to include concepts instead of being limited to simple phrases.
Natural language is the standard "currency" of the Web. It is used to initiate most informational transactions and to return information to the user. As such, most of the content on the Web was designed for humans to read and understand. Computers are particularly well suited for displaying Web page content; however, computers do not have a reliable method to process or make inferences from the concepts contained within the pages themselves. Teaching machines the concepts and relationships that are innate to basic human cognition requires development of knowledge representations to give a formal structure or framework to the concepts. Some of the technologies used to create these knowledge representations for the Semantic Web include XML (eXtensible Markup Language), RDF (Resource Description Framework), and OWL (Web Ontology Language).
An essential component of a knowledge representation is the implementation of ontologies. An ontology is a taxonomy that defines classes of objects, the relationships between the objects, and attributes of the objects. Ontologies can greatly enhance the accuracy of Web searches by matching pages that fit a precise concept rather than a particular word or phrase (especially since there are often many words for any given concept). Ontologies are the "glue" that binds disparate data and helps aggregate related concepts from diverse sources. The RSNA-sponsored RadLex project is an ontologic representation of the majority of radiology concepts. RadLex is designed to be compatible with other Semantic Web technologies (eg, OWL), so that it can easily link to other medical knowledge representations. The RadLex vocabularies and concepts will form the basis of communications for completely new man-to-machine and machine-to-machine interfaces that will augment clinical work flow, clinical reporting, research, and education.
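As a rough illustration of classes, relationships, and attributes, an ontology fragment can be written as subject-predicate-object statements, loosely in the spirit of RDF. The sketch below uses plain Python rather than an RDF or OWL toolkit. The concept names and predicates are hypothetical and are not actual RadLex entries.

```python
# Hypothetical ontology fragment as (subject, predicate, object) statements.
TRIPLES = [
    ("astrocytoma", "is_a", "glioma"),
    ("glioma", "is_a", "brain_neoplasm"),
    ("brain_neoplasm", "is_a", "neoplasm"),
    ("brain_neoplasm", "located_in", "brain"),
    ("astrocytoma", "has_synonym", "astrocytic tumor"),
]


def objects(subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def superclasses(concept):
    """Return all classes reachable through is_a links, nearest first."""
    found = []
    frontier = [concept]
    while frontier:
        current = frontier.pop(0)
        for parent in objects(current, "is_a"):
            if parent not in found:
                found.append(parent)
                frontier.append(parent)
    return found


def inherited(concept, predicate):
    """Return values stated on the concept itself or on any superclass."""
    values = []
    for cls in [concept, *superclasses(concept)]:
        for value in objects(cls, predicate):
            if value not in values:
                values.append(value)
    return values


print(superclasses("astrocytoma"))
print(inherited("astrocytoma", "located_in"))
```

No statement says directly that an astrocytoma is located in the brain; the program derives it by following the class hierarchy, which is the kind of simple inference that a formal structure makes possible for a machine.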
Radiologists now have the opportunity to test-drive some of the newest, most sophisticated decision-support tools at RSNA 2007. Some of the exhibitors are already incorporating semantic inference engines into their applications to improve both the efficiency and accuracy of radiology information retrieval. The RSNA is also exploiting these and other technologies to improve the lifelong learning experience for radiologists by creating more useful information-aggregation methods for education and research. If you want to glimpse into your future, take a trip to the Informatics section in the Lakeside Learning Center at RSNA 2007.
Acknowledgments
The author thanks Daniel Rubin, MD, MS, and David E. Avrin, MD, PhD, for their comments.
References