KNAW

Research

Visualization of Concept Relations

Title Visualization of Concept Relations
Period 01 / 2005 - unknown
Status Current
Dissertation Yes
Research number OND1333831
Data Supplier Website ERIM

Abstract

The amount of information available in scientific literature is immense and is still growing at an incredible rate. A scientist can only deal with a limited amount of information. As a consequence, no individual scientist is capable of studying all available scientific literature. Complete coverage is simply impossible. This forces scientists to specialize in a certain field of science. But even keeping up with everything that is going on in a scientist's own field takes too much time. In other words, scientists face an overload of information.

An illustrative example of the information overload in scientific fields is the rapidly evolving biomedical field. Due to the discovery of new diseases, medicines, and treatments, scientific publications appear at high speed in this field. As a consequence, the number of biomedical journals has doubled every 19 years since 1870 [34]. An estimated two million articles are published in the biomedical literature each year [1]. For a scientist to read everything of possible biomedical relevance, it would be necessary to read about 6,000 articles a day [15]. Clearly, due to time constraints, it is far from possible for a scientist to read all these publications. The same phenomenon of information overload can be seen in other scientific fields as well, for example in the fields of business and economics.

Due to the information overload in scientific fields, it is hard for a scientist to keep an overview of the structure and the development of his field. A possible effect is that relevant information is ignored because it is never uncovered or read. This effect brings about some unfavorable consequences. A scientist without knowledge of all relevant information on his research topics probably makes less progress than he would have made if he had had all relevant information.
Also, different scientists might be solving the same problem without being aware of each other's efforts.

In order to tackle the problem of information overload, attempts have been made since the 1940s to create computer systems for the retrieval of relevant information (e.g., [2, 32]). Traditional information retrieval systems try to find documents in a given document collection that satisfy a given information need of the user. Typically, information retrieval systems operate using a retrieval process which consists of three phases. First, the user of the retrieval system has to express his information need by a query in the language provided by the system. This normally implies specifying a set of keywords which describe the information need of the user as well as possible. Then, the retrieval system investigates how closely the contents of each document in the collection satisfy the query of the user. In this phase, the retrieval system estimates the relevance of each document. Finally, the documents that are considered to be sufficiently relevant are presented to the user. The retrieved documents are usually ranked according to their likelihood of relevance.

One of today's best-known and most popular information retrieval systems is the Web search engine Google. In addition to normal Web searching, Google also provides a tool that is specifically intended for searching and retrieving scientific literature. This tool is called Google Scholar, and it enables users to search the Web for scientific literature, such as scientific articles, books, theses, preprints, abstracts, and technical reports.

Recently, Microsoft has launched a similar tool called Windows Live Academic Search. Like Google Scholar, this tool enables users to search for scientific literature. Currently, users can only search literature in the fields of computer science, electrical engineering, and physics.
Microsoft plans to add more subject areas in the near future.

Although information retrieval systems help to find relevant documents, they do not directly help to obtain an overview of the structure and the development of a scientific field. Scientists have to obtain such an overview themselves, because the documents retrieved using an information retrieval system have to be read in order to uncover the information that is enclosed in them. This still takes a lot of time.

The above analysis indicates that there is a need for systems that do more than just information retrieval. To assist scientists in the knowledge discovery process, there is a need for systems that extract and review information that is enclosed in the literature of a scientific field. The proposed research is concerned with the development of such a system. The research focuses on a system that is able to extract the most important information from individual documents, aggregate the extracted information from multiple documents, and visually present the extracted and aggregated information in an interactive way.

The system that will be developed is expected to be of importance in a number of ways. First of all, the visualizations generated by the system will provide an overview of the structure of a scientific field. This can be accomplished by showing the important concepts in a field as well as the relations between these concepts (for an example, see Figure 1). The visualizations can be used in several ways, for example to familiarize oneself with the terminology of a field, to obtain indications of a field's important research topics, and to get more insight into the various subfields that exist within a field. In addition, the visualizations offered by the system may reveal developments that are going on in a scientific field, like introductions of new concepts, increases or decreases in the popularity of research topics, and changes in the relations between different subfields.
The system to be developed also aims to significantly increase the efficiency of the process of scientific knowledge discovery. The system will assist scientists in the process of knowledge discovery not only by providing the aforementioned visualizations, but also by suggesting concept relations that are currently not explicitly known in the literature of a scientific field.

Before presenting our research questions in Section 5, we will first discuss a text data mining approach to knowledge discovery in Section 4. The research questions can then be related to the different steps in the text data mining process.
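
As a rough illustration of the extract-and-aggregate idea described in the abstract, the following Python sketch counts how often pairs of concepts are mentioned in the same document. It is not the method used in this research: the substring-based concept matching, the sample documents, the concept list, and the function name cooccurrence_counts are all assumptions made for the sake of a small, self-contained example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, concepts):
    """Count, for each pair of concepts, how many documents mention both.

    Hypothetical helper: concepts are expected in lowercase and are
    matched by plain substring search, which is a deliberate simplification.
    """
    counts = Counter()
    for text in documents:
        lowered = text.lower()
        # Extraction step (simplified): a concept is "found" in a document
        # if its name occurs as a substring of the lowercased text.
        found = sorted(c for c in concepts if c in lowered)
        # Aggregation step: each unordered pair of found concepts
        # co-occurs once in this document.
        counts.update(combinations(found, 2))
    return counts


# Hypothetical sample data, for illustration only.
documents = [
    "Information retrieval systems rank documents by relevance.",
    "Text mining extends information retrieval with knowledge discovery.",
    "Visualization supports knowledge discovery in text mining.",
]
concepts = [
    "information retrieval",
    "text mining",
    "knowledge discovery",
    "visualization",
]

pair_counts = cooccurrence_counts(documents, concepts)
for (first, second), n in pair_counts.most_common():
    print(f"{first} -- {second}: {n}")
```

Pair counts of this kind could serve as edge weights in a concept map, where strongly co-occurring concepts are drawn closer together. How concepts are actually identified and how relations are weighted in the proposed system is not specified in the abstract.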

Related organisations

Related people

Supervisor Prof.dr.ir. R. Dekker
Co-supervisor Dr.ir. J. van den Berg
Doctoral/PhD student Dr. N.J. van Eck

Classification

D16400 Information systems, databases
