Mapping Scientific Frontiers: The Quest for Knowledge Visualization


Springer London Berlin Heidelberg New York Barcelona Hong Kong Milan Paris Singapore Tokyo

Chaomei Chen

Mapping Scientific Frontiers: The Quest for Knowledge Visualization

Springer

Chaomei Chen, PhD, MSc, BSc
College of Information Science and Technology, Drexel University, Philadelphia, USA

British Library Cataloguing in Publication Data
Chen, Chaomei, 1960-
Mapping scientific frontiers : the quest for knowledge visualization
1. Knowledge representation (Information theory) 2. Visualisation 3. Discoveries in science
I. Title
006.3'32
ISBN 1852334940

Library of Congress Cataloging-in-Publication Data
Chen, Chaomei, 1960-
Mapping scientific frontiers : the quest for knowledge visualisation / Chaomei Chen.
p. cm.
Includes bibliographical references and index.
ISBN 1-85233-494-0 (acid-free paper)
1. Communication in science-Graphic methods. 2. Visual communication. I. Title.
Q223.C48 2002
501'.4-dc21 2002026827

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licences issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.

ISBN 1-85233-494-0 Springer-Verlag London Berlin Heidelberg
a member of BertelsmannSpringer Science+Business Media GmbH
http://www.springer.co.uk

© Springer-Verlag London Limited 2003

The use of registered names, trademarks etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.

The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.

Whilst we have made considerable efforts to contact all holders of copyright material contained within this book, we have failed to locate some of them. Should holders wish to contact the Publisher, we will be happy to come to some arrangement.

Typesetting: Gray Publishing, Tunbridge Wells, UK
Printed and bound at Kyodo Printing Co (S'pore) Pte Ltd
34/3830-543210 Printed on acid-free paper SPIN 10335562

Contents

Preface

1 The Growth of Scientific Knowledge
  1.1 Scientific Frontiers
  1.2 Message in a Bottle
  1.3 Mapping Scientific Frontiers
  1.4 Organization of the Book

2 Mapping the Universe
  2.1 Cartography
  2.2 Terrestrial Maps
  2.3 Celestial Maps
  2.4 Biological Maps

3 Mapping the Mind
  3.1 Introduction
  3.2 Dimensionality Reduction I: Classic Methods
  3.3 Concept Mapping
  3.4 Dimensionality Reduction II: Isomap and Locally Linear Embedding Algorithms
  3.5 Network Analysis

4 Enabling Techniques for Science Mapping
  4.1 Information Visualization
  4.2 Displaying Structures
  4.3 Behavioral Semantics
  4.4 Discussion
  4.5 Conclusion

5 On the Shoulders of Giants
  5.1 Success Breeds Success
  5.2 Co-word Maps
  5.3 Co-citation Analysis
  5.4 Other Examples
  5.5 Summary

6 Tracing Competing Paradigms
  6.1 Domain Analysis in Information Science
  6.2 Case Study I: The Mass Extinction Debates
  6.3 Case Study II: Supermassive Black Holes
  6.4 Conclusions

7 Tracking Latent Domain Knowledge
  7.1 Introduction
  7.2 Knowledge Discovery
  7.3 Case Study I: Swanson's Impact
  7.4 Case Study II: Pathfinder Networks
  7.5 Case Study III: BSE and vCJD
  7.6 Summary
  7.7 The Future

Appendix: List of Figures

Index

Preface

Mapping scientific frontiers is a topic that has been persistently pursued by generations of scholars and engineers from a diverse range of perspectives. Science and technology have long been an integral part of modern life. Scientific frontiers are where we meet the unknown. Philosophers, social scientists, information scientists, computer scientists, cognitive psychologists, and many others study various aspects of scientific knowledge and scientific literature. For example, philosophy of science is concerned with the nature of scientific knowledge and the patterns of scientific discoveries. Information scientists seek the best use of scientific literature. Computer scientists investigate techniques to augment our abilities to handle mountains of data so that we can extract salient interrelationships. Scientometrics is a field in which researchers use quantitative methods to study science, as opposed to the qualitative studies typically seen in the social sciences. Theories and methods for mapping scientific frontiers have existed for several decades. Different approaches to mapping scientific frontiers over recent years are like streams running from several different sources. There are now clear signs that these streams are merging into something very big: the quest for knowledge visualization.

Our perception is closely associated with our cognition and our understanding. There is an argument that all scientific activities share an ultimate goal, that is, to create images of the world so that we can see them and understand them. Scientific discoveries frequently involve visual thinking, from the discovery of the structure of DNA to the discovery of the "Great Wall" of galaxies. The first goal of this book is to provide a broad overview of similar ways of thinking and visualizing a variety of phenomena in different scientific disciplines. We want to identify the most fundamental aspects of mapping across these disciplines. This book describes mapping scientific frontiers from the perspective of visual thinking and visual exploration. The central theme is the construction of visual-spatial representations that may convey insights into the dynamic structure of scientific frontiers. Concepts such as intellectual structures, invisible colleges, and competing paradigms are instantiated by real examples of scientific debates. An integrated approach is taken to highlight the great potential of the synergy of several contributing disciplines, including philosophy of science, information retrieval, scientometrics, domain analysis, and information visualization.

The steadily growing interest in information visualization and the established field of studying scientific literature are among the driving forces of such integration. On the one hand, the technical advantages of information visualization have reached a critical turning point. It is time to consider design issues that go beyond the pretty pictures, and even beyond the excitement at the first sight of a revealing visualization. The question is: how do we fit a neat picture smoothly into the information flow of our work? On the other hand, "on the shoulders of giants" is a metaphor used in this book as we unfold the roadmap of science mapping. Focusing on the structure and dynamics of science as a whole, and on that of specialties and knowledge domains, is a long tradition in information science in general and scientometrics in particular. Information scientists have developed theories and methodologies to a great extent independently from technological and engineering disciplines such as computer science, knowledge engineering, knowledge discovery, and data mining. If we regard information visualization and information science as two camps, until recently there have been relatively few cross-camp intellectual fusions. This book aims to stimulate and foster interdisciplinary research between the two fields. The book is intended to provide the basic touchstones for readers with different disciplinary backgrounds. Mapping scientific frontiers provides an exciting and comprehensive challenge to information visualization, while sophisticated information visualization techniques offer the opportunity of augmenting our abilities to handle the phenomenon of knowledge growth on a very large scale. In this book, we contrast the classic methods and the new developments to form a basis for the new generation of innovations and applications to take place.

Another goal of the book is to introduce a specific way to operationalize the identification of scientific paradigms. This approach emphasizes a problem-driven process as opposed to general visual exploration. Users of a knowledge visualization system have a specific research question in mind. Research questions are distinguished from search questions. Their focuses are on different levels of cognition.

The thoughts developed in this book are influenced and inspired by a series of pioneering works in several fields of study. The idea of virtual link structures outlined in Frank Halasz's seminal seven-issue paper and the idea of dynamic linking implemented in the Microcosm system of Wendy Hall and her group at Southampton University, UK, provided the earliest signposts that led us onto the road of visualizing an intrinsic information structure. The idea of dynamic linking led to the development of our generic framework for structuring and visualizing information: Generalized Similarity Analysis (GSA). Two keynote speeches at the ACM hypertext conferences were particularly inspiring: the opening keynote speech by John Smith of the University of North Carolina at Hypertext'97, entitled The King is dead, long live the King, and the opening keynote speech by John Leggett of Texas A&M at Hypertext'98, entitled Camping on the banks of the hypermedia literature: waiting for (a hyperliterate) civilization to arrive. Both speeches addressed issues at the level of scientific communities. John Smith's talk highlighted the issues raised by the World Wide Web for the hypertext community. Compared with many elegantly crafted hypertext systems, the World Wide Web in 1997 was seen as an ugly duckling that ignored the hard-won knowledge of the hypertext community, simplified the data model, ignored problems of large-scale navigation, and declared that link integrity is irrelevant. The message of his speech, however, was that if the hypertext community wants to continue and to create value for its knowledge, it must embrace the web, not just tolerate it. John Leggett's keynote summed up the history of the hypertext community in an analogy of invisible camps and tribes and traced a list of missing persons to illustrate the phenomenon of runners between camps. When we came across Howard White and Katherine McCain's article on visualizing disciplines, we realized that integrating discipline-oriented co-citation analysis and information visualization would be a fruitful route to follow. White and McCain illustrated their effort in tracking high-level movements of a scientific community in terms of their citation-based groupings. Henry Small's work in specialty narratives and visualizing science was also a major source of inspiration, especially in the connection between citation analysis and Thomas Kuhn's notion of scientific paradigms. The next major signpost was from the BBC's 50-minute science program series Horizon in 2000, which featured supermassive black holes. The fact that new evidence for supermassive black holes could overturn existing theories of galaxy formation provided a concrete example to take our paradigm-focused visualization approach for a test drive. This book in part reflects the research built on these pioneering works and I am grateful for these intellectual milestones.

A lot of work needs to be done to cultivate knowledge visualization as a unifying subject matter that can join several disciplines. A special issue of the Journal of the American Society for Information Science and Technology is scheduled for 2003 on visualizing scientific paradigms. The first inter­national symposium on knowledge domain visualization will take place in 2002 at the IEEE International Conference on Information Visualization in London, UK. The first issue of a new peer-reviewed international journal, Information Visualization, is now published. This new journal provides a unique forum for knowledge domain visualization and the synergy between various disciplines.

I hope you will enjoy reading this book.

Chaomei Chen College of Information Science and Technology

Drexel University Philadelphia, Pennsylvania


Acknowledgements

I would like to thank a number of people for their constant encouragement and support from the fields of information visualization, information science and hypertext, in particular, Ben Shneiderman, George Robertson, Mary Czerwinski, Daniel Keim and Ebad Banissi from the information visualization camp, Eugene Garfield, Henry Small, Howard White, Katherine McCain, and Tony Cawkell from the science mapping camp, and Wendy Hall, Leslie Carr, and Roy Rada from the hypertext community.

I would like to thank my colleagues and collaborators with whom I have the pleasure to work at various places, especially the members of the VIVID Research Centre at Brunel University in England, including Ray J. Paul, Jasna Kuljis, Lynne Baldwin, Timothy Cribbin, Sonali Morar, and Chiladda Chennawasin.

Thanks to all copyright holders for kindly allowing the reproduction of their fascinating works as a unique feature of the book. Thanks to Mary Ondrusz, Rebecca Mowat, and others at Springer-Verlag, London, for all the effort they have put into the book.

Special thanks to Katherine McCain for detailed comments and discussions on an earlier draft of a number of chapters.

To my family, my wife Baohuan, Calvin (9), and Steven (3): I simply cannot thank them enough for the love and happiness, and for their understanding and encouragement.


Abbreviations

ACA       Author Co-citation Analysis
ACM       Association for Computing Machinery
AGN       Active Galactic Nuclei
AIDS      Acquired Immunodeficiency Syndrome
ANT       Actor Network Theory
ASIS      The American Society for Information Science
ASIS&T    The American Society for Information Science and Technology
BFS       Breadth-First Search
BSE       Bovine Spongiform Encephalopathy
CFA       Harvard-Smithsonian Center for Astrophysics
CBIR      Content-Based Image Retrieval
CISC      Complex Instruction Set Computing
CJD       Creutzfeldt-Jakob Disease
DCA       Document Co-citation Analysis
2dF       Two-Degree Field Spectrograph
DFS       Depth-First Search
DNA       Deoxyribonucleic Acid
ETM+      Enhanced Thematic Mapper Plus
GSA       Generalized Similarity Analysis
GSS       Gerstmann-Sträussler-Scheinker disease
HMM       Hidden Markov Model
HST       Hubble Space Telescope
IEEE      Institute of Electrical and Electronics Engineers
IC        Index Catalogue
ISI       Institute for Scientific Information
JASIS     Journal of the American Society for Information Science
JASIS&T   Journal of the American Society for Information Science and Technology
LLE       Locally Linear Embedding
LSI       Latent Semantic Indexing
MCN       Minimum-Cost Network
MDS       Multidimensional Scaling
MST       Minimum Spanning Tree
NASA      National Aeronautics and Space Administration
NGC       New General Catalogue
PCA       Principal Component Analysis
PFNET     Pathfinder Network
PNNL      Pacific Northwest National Laboratory
PrP       Prion Protein
RISC      Reduced Instruction Set Computing
SCI       Science Citation Index
SOM       Self-Organized (Feature) Map
SSCI      Social Sciences Citation Index
SPSS      Statistical Package for Social Sciences
SVD       Singular Value Decomposition
TREC      Text Retrieval Conference
TSE       Transmissible Spongiform Encephalopathy
TSP       Traveling Salesman Problem
USPTO     United States Patent and Trademark Office
vCJD      New Variant CJD
VLSI      Very Large-Scale Integration

Metric units

km        kilometer
cm        centimeter

Names

Francis Bacon (1561-1626)
John Bernal (1901-1971)
Samuel Bradford (1878-1948)
Pieter Bruegel (1525-1569)
Vannevar Bush (1890-1974)
John Louis Emil Dreyer (1852-1926)
Maurits Cornelis Escher (1898-1972)
Leonhard Euler (1707-1783)
John Flamsteed (1646-1719)
Alexander Fleming (1881-1955)
Belver Griffith (1931-1999)
Arthur Holmes (1890-1965)
Aldous Huxley (1894-1963)
Manfred Kochen (1928-1989)
Thomas Samuel Kuhn (1922-1996)
Alfred Lotka (1880-1949)
René Magritte (1898-1967)
Charles Messier (1730-1817)
Robert King Merton (1910-)
Charles Joseph Minard (1781-1870)
Ithiel de Sola Pool (1917-1984)
Wilhelm Conrad Röntgen (1845-1923)
Derek John de Solla Price (1922-1983)
Gerard Salton (1927-1995)
John Godfrey Saxe (1816-1887)
John Snow (1813-1858)
Kekulé von Stradonitz (1829-1896)
Alfred Lothar Wegener (1880-1930)