



A Proposal to the National Science Foundation

CISE RESEARCH INFRASTRUCTURE: SCALABLE MULTIMEDIA INFORMATION PROCESSING

Principal Investigator:

A. V. Aho
Department of Computer Science

Columbia University
[email protected]

Phone: 212-939-7000; FAX: 212-666-0140


A TABLE OF CONTENTS

A TABLE OF CONTENTS

B EXECUTIVE SUMMARY

C RESEARCH INFRASTRUCTURE DESCRIPTION
  C.1 Overview
  C.2 Requested Experimental Facilities

D RESOURCE ALLOCATION
  D.1 Current Departmental Facilities
  D.2 Description of Requested Equipment
  D.3 Rationale
  D.4 Maintenance Costs
  D.5 Access by Local and Remote Users
  D.6 Space Renovation
  D.7 Institutional Cost Sharing

E MANAGEMENT STRUCTURE

F BUDGET

G RESEARCH
  G.1 HIGH PERFORMANCE INTEGRATED MULTIMEDIA INFORMATION SYSTEMS
    G.1.1 ABSTRACT
    G.1.2 BACKGROUND OF PROPOSED RESEARCH
    G.1.3 SUMMARY OF PROPOSED RESEARCH
    G.1.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE
    G.1.5 INTERACTIONS WITH OTHER PROJECTS
  G.2 VISUAL INFORMATION PROCESSING
    G.2.1 ABSTRACT
    G.2.2 BACKGROUND OF PROPOSED RESEARCH
    G.2.3 SUMMARY OF PROPOSED RESEARCH
    G.2.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE
    G.2.5 INTERACTIONS WITH OTHER PROJECTS
  G.3 MOBILE MULTIMEDIA USER INTERFACES
    G.3.1 ABSTRACT
    G.3.2 BACKGROUND OF PROPOSED RESEARCH
    G.3.3 SUMMARY OF PROPOSED RESEARCH
    G.3.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE
    G.3.5 INTERACTIONS WITH OTHER PROJECTS
  G.4 SCALABLE SYSTEMS FOR MOBILE AND PORTABLE COMPUTING
    G.4.1 ABSTRACT
    G.4.2 BACKGROUND OF PROPOSED RESEARCH
    G.4.3 SUMMARY OF PROPOSED RESEARCH
    G.4.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE
    G.4.5 INTERACTIONS WITH OTHER PROJECTS
  G.5 Bibliography

H STAFF CREDENTIALS

I RESULTS FROM PRIOR RI AWARD


B EXECUTIVE SUMMARY

Over the last 25 years, computing and communications have been converging to create a powerful information infrastructure that can benefit all of society. This convergence has resulted in fundamental changes in the ways in which people use computers and access information. The introduction of the ARPANET in the early 70's allowed researchers to communicate by computer (albeit slowly, over low bandwidth) and to successfully form an online community, with limited sharing of facilities and knowledge. Development of local network technology in the early 80's changed the computing model from one based on mainframes to one based on linked workstations that share software and hardware resources. In each case, the merging of computing and communications resulted in dramatic increases in the complexity of the problems researchers could attack, and in the numbers of participants and resources they could bring to bear on problems. In the 90's we have seen the emergence of the World Wide Web, and the ability to access information that is truly global in scope and beyond all expectations of size.

As computing and communications have merged to make information potentially available to many millions of users, many hard technical problems must still be solved to create systems for global-scale information storage and access that are effective and efficient. Even though users may have the resources of the entire global computing community at their disposal, accurately identifying relevant information sources, accessing them effectively, and integrating them with new and different environments that change dynamically over space and time remain challenging unsolved problems.

This group of researchers is engaged in research and development of new algorithms and systems to harness the power of networked information systems. An array of interrelated and mutually supportive research projects is bound by a common dream: to provide a computing environment of moderate to low cost that enables any user anywhere to effectively and efficiently access and present any desired information in any format and modality.

In order to make substantial progress in solving these problems, we propose to develop new algorithms and systems guided by theory and demonstrated by working prototypes. The areas of research covered by our faculty that directly relate and contribute to this effort include the following:

- high performance storage systems to handle the scale and scope of growing information sources while maximizing available network bandwidth;
- database theory and practice focused on efficient query processing over massive data;
- search and pattern matching algorithms to develop a new generation of powerful and intelligent browsers and search engines that access relevant information efficiently;
- natural language processing, knowledge representation, graphics and user interfaces that provide new means for presenting information in various modalities and on portable devices;
- vision, robotics and computer perception that aim to enhance and integrate the modalities of sight, touch and sound when interfacing and operating over network information sources; and
- low-cost and low-power mobile systems that provide the hardware and the distributed computing and communications underpinnings for access anywhere and anytime to the global information repository.

The interrelationships between these areas of research are best illustrated by the type of new computing environment we envisage, with the following components and capabilities:

- Mechanisms for effective mobile access to information in a variety of formats, including text, speech, sound, images, video, animation, 3D models, and virtual worlds.
- Inexpensive, low-power, portable computing devices.
- Efficient network-management and service-management systems.
- Intelligent information filters geared to the current computing state.
- Agent-based browsers able to access and interface to the new global computing model, mining multimedia data, including text, video, graphics, and sound.
- User interfaces that communicate information effectively by designing multimedia presentations on the fly. These presentations will be customized to the individual user and situation, taking into account the available displays and interaction devices, which may change as the user moves about.
- Plug-and-play software and hardware that allow computing and information models to be upgraded dynamically.
- Scalable high-performance storage systems to meet the growing demands of an information-intensive environment.
- Systems that can dynamically construct models of the surrounding 3D world, making it possible to create user interfaces that integrate virtual information with the physical environment.

We are requesting funds to create a computing infrastructure that will enable realistic and practical demonstrations of our collective vision for future computing environments. The proposed infrastructure will allow us to perform large scale experiments, develop systems in a reasonable time frame, and ultimately better understand and solve many of the problems that have arisen from this new computing paradigm. The explosion of vast collections of information resources, and the concomitant and relentless performance improvements and cost reductions in computing hardware and storage devices, have provided an unprecedented opportunity to develop a new computing environment: an environment that will provide a realistic and relatively inexpensive means of intelligently accessing and presenting a very large and rich set of multimedia data, including text, speech, video and animation.

This proposal involves four primary areas of research, grouping our 13 faculty in a natural way so that each group draws upon its expertise and technical leadership to cooperatively solve one class of problems:

HIGH PERFORMANCE INTEGRATED MULTIMEDIA INFORMATION SYSTEMS: Professors Al Aho, Leana Golubchik, Gail Kaiser, Kenneth Ross and Sal Stolfo. The enormous amount of information that is now online or near-line has become difficult to manage and process effectively. Searching data resources with a mixture of modalities, as available on the WWW, has become increasingly slow, expensive, and ineffectual when based solely on keyword searches over texts or indices. We propose new algorithms that will allow complexes of integrated learning and searching agents to cooperatively access large data sources in parallel. Our goal is to make information access scalable. These searching agents will utilize new pattern matching algorithms that use context and approximate measures to find matches across all multimedia objects, including images and sound. We also propose intelligent indexing using a new database join algorithm that has many favorable scaling qualities. Scale also affects storage media, where consistent, paced information access over time and space is desired. We are proposing work in staged storage systems that can deliver multimedia information in a cost-effective manner with accompanying high quality of service, while users move geographically. Finally, we are proposing to build collaborative hypermedia software environments that will allow users to fully utilize distributed resources such as the WWW, and to easily interoperate with existing applications in these new multimedia environments.

VISUAL INFORMATION PROCESSING: Professors Peter Allen, John Kender and Shree Nayar. Imagery is fast becoming one of the dominant modes of multimedia information. The dominant means through which humans interface with their environment is vision, so it is only natural that future information access appliances will need to interact through vision as well. However, the use of imagery, and in particular 3-D data, poses serious research challenges in the areas of data acquisition, storage, search, compression, fidelity, and summarization. We are proposing research across this spectrum, drawing from our past work in image understanding. This work includes building a new real-time, frame-rate 3-D camera; using this system to automatically generate 3-D models of objects for both real and virtual worlds; creating fast hardware and software search engines that can search multimedia data sets and be used to derive spatial location from imagery; creating systems to send and receive 3-D objects via a FAX-like interface; integrating language with imagery to automate diagnosis and help systems; and extending the range of multimedia input devices to include human gesture.


MOBILE MULTIMEDIA USER INTERFACES: Professors Mukesh Dalal, Steven Feiner, and Kathleen McKeown. We foresee an environment in which information and the computing power to process it become ubiquitous. Just as communications devices have become portable and omnipresent through miniaturization and wireless technology, so will information appliances. Mobile computing is now in the prototype stage, with low-bandwidth networks and low-capability hardware. As it matures, high-bandwidth, high-capability, mobile multimedia computing will allow users to routinely access information from any location, at any time, in a form appropriate to the individual and situation.

For this vision to succeed, we must develop user interfaces that address the convergence between computing and communications. The user-interface research that we propose explores three key directions that will make this possible: generation of concise written and spoken summaries of otherwise unmanageably large amounts of text and data; automated design of multimedia presentations that combine synthesized speech, text, graphics, and animation to communicate information effectively; and development of a user-interface infrastructure that coordinates a mobile user's constantly changing set of available displays and devices to create a cohesive, hybrid information space.

SCALABLE SYSTEMS FOR MOBILE AND PORTABLE COMPUTING: Professors Steven Nowick and Yechiam Yemini. Networks are fundamental to making multimedia information ubiquitous. Our research is directed at using agent technologies to support extensible, dynamic networks of information, including highly mobile applications. As users move through space and time, the network support must likewise be adjusted and updated to meet the user's new requirements. A focus of our research is language-independent agents, which are also capable of supporting the more widely known agent scripting languages.

As we continue to include mobility and portability in our computing model, issues relating to performance, storage and efficient management of constrained resources such as power become increasingly important. This problem can be addressed with self-timed systems, which demand only low power since there is no need to compute on every clock cycle. Self-timed designs may also provide high performance, since clock synchronization issues disappear altogether. Modularity and plug-and-play capabilities are also inherent in self-timed systems, since they can easily be composed from integrated components that operate at different speeds. We expect self-timed systems to be an important part of the computing landscape of the near future, and future information access appliances will likely utilize these design features.
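To make this concrete, the following toy, software-only sketch (added for exposition; the stage names, delays, and handshake discipline are assumptions, not a description of our hardware designs) shows two blocks with different natural speeds composed through a request/acknowledge handshake, with each block active only while it has work:

```python
# Toy, software-only illustration of the self-timed idea: two stages with
# different natural delays are composed through a request/acknowledge
# handshake. There is no global clock, and each stage is "active" only while
# it has work to do. (Stage names and delays are invented for exposition;
# this simple version is not pipelined: the producer waits for the acknowledge.)

def simulate(items, producer_delay=2.0, consumer_delay=5.0):
    now = 0.0
    producer_active = consumer_active = 0.0
    events = []
    for item in items:
        now += producer_delay                 # producer computes at its own pace
        producer_active += producer_delay
        request_time = now                    # raises "request" along with the data
        now = request_time + consumer_delay   # consumer starts only when the request arrives
        consumer_active += consumer_delay     # and raises "acknowledge" when done
        events.append((item, request_time, now))
    return events, producer_active, consumer_active, now

if __name__ == "__main__":
    events, p_active, c_active, total = simulate(["a", "b", "c"])
    for item, req, ack in events:
        print(f"item {item!r}: request at t={req:.1f}, acknowledge at t={ack:.1f}")
    print(f"producer busy {p_active:.1f} of {total:.1f} time units; "
          f"consumer busy {c_active:.1f} of {total:.1f}")
```

The point is only that composition requires no shared clock and that an idle stage does no work, which is the source of the modularity and power arguments above.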

SUMMARY This proposal has four key foundations: 1) the realization that the changes in the way computing is done that we have outlined above are already underway, highlighting the importance and timeliness of our proposed research; 2) the recognized strength of the Columbia faculty in the relevant areas of research proposed; 3) the demonstrable synergism among our faculty in working together on these projects (the various interactions among faculty are detailed in four subsections in Section G of the proposal); and 4) the strong commitment of the University administration to match this grant with real dollars above the minimum matching funds level. To further enhance our efforts and increase the synergism, each of the four groups of faculty researchers has appointed an individual group leader (Professors Allen, Feiner, Nowick, and Stolfo) who will work collaboratively with Professor Aho, Department Chair and Principal Investigator, to provide overall project coordination and responsibility for managing the staging, acquisition and deployment of equipment over the five-year term of the grant. These five faculty will ensure that the overall goals of the grant and the needs of the individual efforts are met in an appropriate and satisfactory manner.


C RESEARCH INFRASTRUCTURE DESCRIPTION

C.1 Overview

In this section, we provide a summary description of the requested equipment, describing why it is needed for the proposed research. The research facilities created by this proposal will form a scalable multimedia information processing testbed and will be treated as a shared resource among all project participants.

The research infrastructure of this proposal would create leverage and synergy among four major areas of interrelated research in the department:

1. High Performance Integrated Multimedia Information Systems

2. Visual Information Processing

3. Mobile Multimedia User Interfaces

4. Scalable Systems for Mobile and Portable Computing

This Research Infrastructure grant would allow us to upgrade our storage, networking, and processing capabilities to conduct the proposed research. High performance multimedia information processing is computationally intensive, and requires large amounts of storage and fast networking to transfer data and results between servers and clients. Furthermore, since most projects are transferring, searching, storing, or creating multimedia data, we need high-performance processors that support multimedia, particularly animated graphics and video. Requested equipment that is critical to all projects and will be treated as a shared resource includes a data farm, a multimedia/visualization supercomputer, multimedia workstations, high-performance multimedia PCs, and high-bandwidth networking hardware. The equipment acquisition strategy is to put in place the basic storage, computational, and networking components in the first two years of the project and then use university-provided equipment matching grants to augment the infrastructure to maximize its impact in the later years. Section D of this proposal contains a listing of the equipment requested each year.

C.2 Requested Experimental Facilities

In this section, we first describe the shared resources that are critical to the research success of each area, and then provide details on other items that are specific to the needs of one or more individual groups.

Data Farm: Massive amounts of storage and high-bandwidth networking for transferring multimedia and other data are essential for all areas.

The study and implementation of data layout, scheduling, and fault tolerance schemes in Prof. Golubchik's research on multimedia storage servers demands significant amounts of storage space, as well as computational and communication power. To store multimedia data for experimentation, we require storage devices at all levels of the storage hierarchy. Prof. Kaiser's laboratory consumes nearly 33GB of disk space for current prototype software and documentation, including some imported utilities. Prof. McKeown's research requires storage of large text corpora for analysis in order to derive constraints on vocabulary for summarization systems. Statistical analysis also requires space for intermediate results (typically as large as the corpora themselves). Thus, for training data alone, her group places high demands on disk usage.

To experimentally verify the jive-join techniques proposed by Prof. Ross, we need to create a realistic operating environment with high-bandwidth storage units (fast disks or disk arrays) and high-bandwidth networks. Since these two components are typically the bottleneck in a distributed query processing system, the speed of such components is crucial. Funding for the purchase and maintenance of such equipment would have a significant impact on the research described here. Additionally, once our query processing system is operational, it may be made available to students and faculty within the department for posing multimedia queries. A high-bandwidth storage and network infrastructure would lead to reduced response times.


The Visual Information Processing group has large storage needs for 2-D and 3-D image data (including motion data sets at a 30 frames/sec rate), CAD models of objects, and 3-D surface reconstructions. The appearance matching algorithms also require substantial data storage for storing multiple views of the world.

High-end Multimedia/Visualization Supercomputer: This machine will supplement the requested desktop multimedia workstations to provide far higher processing, multimedia, and graphics performance. An example configuration is an SGI 4-processor Onyx with the (soon to be available) RealityEngine3 graphics subsystem. Profs. Feiner, Golubchik, Allen, Nayar, and Kender will use this machine for its ability to manipulate and visualize significantly larger 3D models than can be supported by the desktop multimedia workstations. In fact, for Golubchik's proposed work on real-time delivery of multimedia data to 3D display applications (e.g., various virtual world environments), it is the only SGI machine that can handle the necessary graphics in real time. The multimedia search-engine-generation research proposed by Prof. Aho requires networking and server capacities that allow experimentation with massive multimedia data sets. Search engines of varying capabilities will be generated, but the complex ones will require processing power on the order of GIPS for experiments involving very large information repositories.

Multimedia Workstations: To conduct the proposed research in active hypermedia, testing of multimedia delivery to 3D graphic displays, development of automatically-generated 3D multimedia briefings, and searching and presentation of multimedia data, we require high-performance workstations that are multimedia ready. We request funds for desktop 3D graphics workstations (e.g., the SGI Indigo2 High IMPACT). These machines are tuned to provide high 3D graphics performance. (Note that we have been experimenting with high-end Pentium PCs with OpenGL 3D graphics boards, such as the 3DLabs GLINT 300SX processor, to provide a lower-priced alternative to graphics workstations. Although their paper specs are impressive, these graphics boards do not currently provide anywhere near the performance of a graphics workstation, for a variety of architectural reasons. We will continue to monitor progress in this arena, however, so that we can make the most cost-effective purchase decision.) We also require funds to upgrade aging SparcStation 2 workstations to SparcStation 20s that support multimedia, particularly animated graphics and video. This means much faster CPUs with more memory and disk resources, video boards and cameras, a new operating system (the video boards require Solaris instead of SunOS), large color monitors with adequate resolution, and networking fast enough for live video transmission without overloading the departmentally shared facility. Given the need to transport multimedia data from remote as well as various in-house sites, reliable network connectivity is also a must.

High-performance PCs: Similarly, since many of our projects (in particular our research group in High Performance Integrated Multimedia Information Systems) intend to provide a system to a wide audience of Internet users, we shall develop our code both for UNIX-based workstations, generally available in our current environment, and for PCs running Windows with Linux partitions. We prefer to run our experiments and develop our research tools using a large number of state-of-the-art PCs, based upon the forthcoming P6 processor. Our experiments require a substantial amount of computing power. For example, the computing power for our meta-learning experiments can be measured in the many billions of operations per experiment, and this can be delivered by a network of low cost PCs. Furthermore, a user of an information-seeking development environment will demand a low cost computational environment in which to design research tools. For these reasons, we believe it is much more cost effective to use a network of PCs to allow wide distribution of our results and to maximize the number of processing sites available for our experiments.

The requested equipment includes high-performance PCs with the network configuration envisioned for the experiments on meta-learning and data mining. We expect that the PC marketplace will continue its fast-paced change over the course of the five-year term of the proposed research. Therefore, this item should be regarded as a generic term for the type of networked multi-PC system one may expect to see over the next few years.

High-speed Networking: At present most of the research groups in the department are confined to 10Mb shared bandwidth, and the department has only 10Mb access to the campus backbone. With the proposed networking equipment, each research group can have 100Mb Fast Ethernet, switched 10Mb Ethernet access between the desktop workstations and the servers within the group, and 100Mb access to the high speed router. This will increase the network throughput of each research group. Additional 10Mb Ethernet ports will make it feasible to create more virtual LANs and more subnets for each research group. The router, with a 2Gb backplane and an FDDI interface, will increase the departmental connection to the campus backbone from 10Mb to 100Mb, thus increasing the network uplink to the Internet. This high-speed state-of-the-art router has slots for future expansion, with the possibility of housing ATM, Fast Ethernet, and FDDI in the same chassis. Overall, the proposed networking infrastructure will provide higher speed, better throughput, and better switching technology within each group and to the outside world.

The specific equipment we are requesting includes a Cisco 7523, a state-of-the-art router with a 2Gb backplane which can house FDDI, ATM and Fast Ethernet adapters. It will have 11 empty slots, thus giving room for future expansion. Ethernet interfaces are requested to allow us to have more subnets. Fast Ethernet cards will give us a 100Mb LAN on twisted pair in the department. This will allow each research group to have 100Mb bandwidth, instead of our current limit of 10Mb. Each Fast Ethernet hub will have 100Mb, switched 10Mb, FDDI and ATM combo ports in it, so that the servers and dedicated graphics workstations needing high speed I/O can be connected to 100Mb ports, while personal desktop light-duty workstations can be connected to switched 10BT ports.

Robotic Tape Storage Library: We are requesting a Terra Box robotic tape storage library (Model MLL4C-52-4DLT4) to provide research facilities for multimedia storage experiments, as well as facilities for backup of data and systems.

Color Printer: We request a high-resolution, continuous-tone, dye-sublimation color printer (e.g., a Tektronix Phaser 440 dye sublimation printer) to provide photographic quality output for publication and presentation by all the multimedia-related projects. This will supplement the "draft-quality" wax color printer that we currently have.

Wall-sized Display: We request a high-resolution (1280×1024) color video projector (e.g., an Electrohome ECP4500), which we will install in one of the department's common areas for use in multimedia presentations. In addition, in Prof. Feiner's research it will serve as a "wall-mounted" display whose graphics will be overlaid with additional material presented on see-through head-worn displays.

Dual-Frequency Carrier-Phase Measurement GPS System: The requested equipment consists of one base station and one rover (mobile unit), capable of achieving centimeter-accuracy position fixes in real time with a 2 Hz update rate. This level of accuracy is possible only with a dual-frequency carrier-phase measurement GPS system, such as the Trimble Site Surveyor SSi with the real-time kinematic option. (A high-quality differential GPS system, such as the one that we currently have, can achieve only submeter accuracy.) The rover must be in constant contact with the base station. This will be accomplished using our existing spread-spectrum radio infrastructure, supplemented by additional wireless modems, funded by the grant, which will cover a larger portion of the campus. (The bandwidth required by base-station-to-rover communications is only about 1200 baud.)

The GPS system will be used in the collaboration between Profs. Nayar and Feiner on guiding vision-based tracking with initial estimates from GPS position sensors and orientation sensors, to provide precise position and orientation tracking for augmented reality. It will be used by Prof. Nayar to obtain rough estimates of a moving user's position while he or she travels around the physical world. Visual images obtained by the user's head-mounted camera will then be used to refine these rough positional estimates to determine the user's exact coordinates. These coordinates will enable precise overlay of information on the head-mounted display as proposed by Prof. Feiner. In Prof. Golubchik's work, it will make it possible to know a user's position and velocity within a wireless base station's cell, allowing the development of better predictive algorithms to handle cell-to-cell handoffs when transmitting multimedia streams.
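As a rough illustration of this coarse-to-fine tracking idea, the sketch below uses a GPS fix to prune a hypothetical table of surveyed landmarks and then refines the user's position from stand-in vision-derived offsets to the matched landmarks; the landmark table, the 5 m search radius, and the simple averaging step are assumptions made for exposition, not the proposed tracking algorithm:

```python
# Illustrative sketch only: a coarse GPS fix prunes a hypothetical landmark
# table to a small neighborhood, and stand-in "vision" offsets to the matched
# landmarks refine the user's position. The table, the 5 m radius, and the
# averaging step are assumptions for exposition.
import math

LANDMARKS = {                       # hypothetical surveyed landmarks (x, y in meters)
    "library_corner": (10.0, 42.0),
    "quad_statue":    (55.5, 18.2),
    "lab_entrance":   (57.0, 20.0),
}

def candidates_near(gps_fix, radius=5.0):
    """Use the coarse GPS estimate to restrict visual matching to nearby landmarks."""
    gx, gy = gps_fix
    return {name: (x, y) for name, (x, y) in LANDMARKS.items()
            if math.hypot(x - gx, y - gy) <= radius}

def refine_position(gps_fix, visual_offsets):
    """visual_offsets: {landmark: (dx, dy)} = landmark position relative to the
    camera, as recovered from imagery (a stand-in for vision-based tracking)."""
    estimates = [(x - dx, y - dy)
                 for name, (x, y) in candidates_near(gps_fix).items()
                 if name in visual_offsets
                 for dx, dy in [visual_offsets[name]]]
    if not estimates:
        return gps_fix                         # fall back to the coarse GPS estimate
    xs, ys = zip(*estimates)                   # crude refinement: average the estimates
    return (sum(xs) / len(xs), sum(ys) / len(ys))

if __name__ == "__main__":
    coarse = (56.4, 19.7)                      # noisy GPS fix for the user
    offsets = {"quad_statue": (-0.9, -1.4), "lab_entrance": (0.7, 0.4)}
    print("refined user position:", refine_position(coarse, offsets))
```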

Photometric Camera and Datacube Image Processing Hardware: The most widely used CCD cameras produce 512×480 pixels per image with 8 bits of intensity per pixel. CCD technology has made significant strides in the past several years, leading to the development of new CCD cameras with high spatial and brightness resolution. The Photometric camera we wish to purchase has 1024×1024 pixels and provides up to 16 bits of precision in brightness. Such image sensors are key to ensuring realism in virtual worlds. Profs. Nayar and Allen plan to acquire realistic 3D models of the physical world from a stream of visual images. The recovery of 3D shape, as well as visual recognition for dead reckoning of the moving observer, can be robust only if the visual features measured by the sensor are robust. The Photometric camera would enable us to create and use virtual worlds to an extent that is impossible with commonly-used video cameras.

A set of standard image processing operations needs to be applied to the acquired images before they are used for model acquisition and visual recognition. To make model acquisition practical, it must be done in real time as the user moves around in the world. Such real-time performance demands fast image processing hardware. The Datacube boards have been optimized for real-time operations such as convolution.

Electronic Whiteboard: The requested electronic whiteboard (Softboard, supplied by Microfield Graphics) can interface with a computer, recording color pen strokes so that a user can display them on his or her computer screen. Software is provided that enables one to "replay" the writing session as it occurred, stroke by stroke. There are two significant uses for such a device. First, it can be used as an input device for multimedia applications that require the recording of text and graphical information generated on the fly at informal meetings; this data can be stored for easy lookup. Second, several groups will use it to generate hardcopy of information written during meetings rather than requiring participants to make explicit notes.

Mobile Personal Computers: We request wireless base stations and mobile personal computers with compatible wireless transceivers to supplement our current wireless infrastructure. To maintain compatibility, we propose to use, at first, AT&T NCR WaveLan PC card transceivers for the mobile computers and AT&T NCR WaveLan ISA cards for the base stations (which can be generic PCs), switching to newer technologies later in the grant period if appropriate. The base stations will be attached to the departmental LAN and will serve the mobile computers. Multiple base stations are needed to fully cover our two buildings and surrounding parts of the campus because of the limited range of current spread-spectrum radio systems. (Our current wireless installation serves only parts of one building, facing rooms of some other buildings that are in line of sight, and an outdoor area in the middle of campus.) Each mobile notebook computer (e.g., an IBM 760CD with a Flexcam CCD camera and a WaveLan PC card transceiver) will support multimedia I/O. We will use these facilities to explore the use of agents to facilitate adaptive multimedia exchanges in a bandwidth-constrained mobile environment. Agents will be dispatched to support compression algorithms and configuration of remote resources as needed for such exchanges. We will also use the base stations to support our work in mobile augmented reality.

CAD Package for Digital Design: We request the Cadence CAD (computer-aided design) Synthesis Package (including FPGA PIC Designer, Synergy Synthesizer and Optimizer, Composer, Verilog XL Turbo, Leapfrog VHDL Simulation, and ASIC design tools) for the development, design, simulation and analysis of digital designs for low power. It includes FPGA design tools and hardware, schematic/textual design capture, a cell-library-based layout system, and VHDL and Verilog simulation tools. We currently have no support packages for building prototypes; the Cadence package will allow new approaches to be explored and provide high-quality support for our design efforts. This package will be used both by Prof. Nowick's group and by Prof. Nayar's group.

Speech Recognition System: A speech recognition system (such as the Corona Developer's Toolkit, including the Speech Recognition Engine, Application Programming Interface, and Toolkit, which runs on SGI and Sun platforms) is needed in order to provide an additional mode of input, speech, to our multimedia user interfaces. (We already support speech output in our research.) Speech recognition would be used by the mobile multimedia user interface group, including Prof. McKeown, Prof. Feiner, and Prof. Dalal, and would also be useful for interfaces to some of the information search facilities.


D RESOURCE ALLOCATION

In this section we describe the way in which the requested funds will be used to acquire the experimental facilities discussed in Section C to support the research projects.

D.1 Current Departmental Facilities

The department has laboratory areas for research in robotics, computer vision, distributed and mobile computing, computer graphics and user interfaces, natural language processing, programming environments, and parallel architectures. The research equipment and computing facilities currently in the department include:

Fileservers and Parallel Machines

2 Sun S1000 (2 processors)
5 Sun 4/630 (4 processors)
1 Sun 4/670
3 Sun 4/390
1 Sun 4/490
8 HP 9000/735 parallel cluster (FDDI ring)
1 Sun S10 (2 processors)

Workstations and Xterminals

2 DEC Alpha
1 DEC 5000
4 DEC 3100
3 HP 9000 715/735
4 HP 9000 370/350

32 Sun 4/75
7 Sun 4/40

34 Sun 4/50
9 Sun 4/65

11 Sun 4/60
19 Sun Sparc 5
18 Sun Sparc 10
19 Sun Sparc 20
2 Sun 3/260
1 Sun 3/160
1 Sun 4/260
2 Sun 4/280

70 Tektronix Xterminals (Xp 334/354/358/11/115)
2 SGI Indigo 2 Extreme

Microcomputers

50 Pentium/486
25 Macintosh
6 Toshiba 5200 and 486 base stations (with NCR WaveLan radios)
4 Toshiba 1910 portables (with NCR WaveLan radios)
2 Toshiba 2400 portables (with NCR WaveLan radios)

60 ASCII HP 700/43 terminals


Printers and Other Peripherals

14 HP printers (IIIsi, IVsi, IV, 5MP)
4 LaserWriters (IIntx, II)
1 Tektronix color printer (wax process)
1 Envision PC scanner
4 8mm Exabyte tape drives
1 Exabyte automated tape backup system (Freezer Box)
5 CD drives
2 1/4" tape drives

105 GB disk drives

Networking Equipment

3 Cisco routers (AGS+, AGS, IGS)
2 Crescendo FDDI hubs
1 ATM FORE switch

30 multi-port repeater hubs (3Com)
3 Fast Ethernet hubs

36 high-speed modem lines (Multitech, Telebit, US Robotics)
5 ISDN lines

10BT connection and Fiber connection to each office

Other Equipment

1 Adept 1 robot
1 DataCube image processor
9 CCD cameras
1 Denali graphics accelerator
2 PUMA 560 robots
3 robotic hands (Utah, Barrett, Toshiba)
1 PIPE image processing engine
1 Servorobot laser scanner
8 Logitech ultrasonic trackers
1 Ascension Flock of Birds magnetic tracking system
1 Origin Instruments Dynasight optical radar 3D position tracker
3 Virtual i.O color see-through head-mounted displays with orientation tracking
1 Virtual i.O high-resolution greyscale see-through head-mounted display
1 Digital Image Design Cricket 3D interaction device
1 custom-built see-through head-mounted display
1 VPL DataGlove
1 Crystal River Beachtron spatial sound processor
1 Trimble DSM submeter differential GPS receiver
2 StereoGraphics CrystalEyes stereo eyewear
Assorted 3D graphics accelerators for SGI, Sun, HP, and PC machines

D.2 Description of Requested Equipment

A description of the equipment, maintenance, and support requested each year is shown in the equipment acquisition schedule. Detailed descriptions of the individual pieces of equipment, including a representative manufacturer and model number, are given in Section C as part of the description of requested experimental facilities.


D.3 Rationale

The basic high-performance storage, computational, and networking infrastructure will be put in place in the first two years of the grant. The university-provided equipment-matching funds will be used to augment the infrastructure in the later years to maximize its impact in the areas where it has proven most successful. The rationale for the specific pieces of equipment is contained in Section C. The overall goal is to provide a scalable multimedia information processing research testbed on which a variety of experiments can be conducted in collaboration with other groups inside and outside the department.

We have included support for a technical staff member with a starting salary of $50,000 per year and a staff programmer with a starting salary of $35,000 per year. The technical staff member would be responsible for installing and maintaining the equipment in the research testbed. This is a full-time job that is essential for the success of this project. The second staff member would assist the first and maintain key software systems needed for the operation of the facility. We have shown increasing university support for the second staff person in the later years of the project.

D.4 Maintenance Costs

We have budgeted the following maintenance costs for this project:

Year 1: $20,000
Year 2: $20,000
Year 3: $25,000
Year 4: $30,000
Year 5: $35,000
Total: $130,000

This estimated budget is based on our past experience with maintenance, our current maintenance expenses of $20,000 per year, and preliminary maintenance contract quotations from vendors for selected pieces of new equipment. Our computing facility staff provides much of the support for maintaining our equipment. For routine maintenance, we do not purchase maintenance contracts; instead we purchase parts when needed and have our technical staff install them. We purchase maintenance contracts when we do not have the in-house expertise to maintain the equipment ourselves.

For the equipment requested, we propose to purchase maintenance contracts on the SGI Onyx (quoted at $12,000 from the vendor for the initial configuration), on the SGI High IMPACTs, and on the Cisco 7523 router. We also anticipate that we will build up an inventory of spare parts for some of the equipment.

All of the maintenance costs would come from the matching contributions of Columbia University to this project.

D.5 Access by Local and Remote Users

Equipment purchased under this grant will be made readily available to all research projects within the department. However, pieces of equipment that are project-specific will be used primarily within the requesting group. Local access to equipment will be facilitated both by our current networking environment and by the proposed high-speed networking equipment we are requesting under this grant. Regular modems and ISDN modems are already in place for remote access. In addition, equipment can be accessed remotely over the Internet. Finally, the wireless modems requested by our mobile and portable computing area will allow high-speed local access by mobile users walking around campus and will also facilitate remote access from home.

D.6 Space Renovation

No renovation is needed to accommodate the new equipment.

D.7 Institutional Cost Sharing

Columbia University will provide $885,884 in cost sharing, which is 45.9% of the proposed NSF budget of $1,928,656. The University cost sharing grows from $134,200 in the first year of the project to $257,279 in the last. The details of both budgets are contained in Section F. Columbia University's matching budget for this proposal includes salary and benefits of $78,119 for one programmer, $160,000 for equipment maintenance and software licenses, $147,765 in indirect costs, and $500,000 for matching equipment purchases. (The University has agreed to match half of NSF's equipment contributions.)


E MANAGEMENT STRUCTURE

The current NSF CISE Institutional Infrastructure grant has contributed significantly to the research capabilities of the department. The management and operation of the current CISE II grant are functioning smoothly and successfully. We propose to adapt our current management organization to the oversight of this grant.

The project will be supervised by a Research Infrastructure Steering Committee consisting of representatives from each of the four major research areas (Profs. Allen, Feiner, Nowick, and Stolfo), the Department Chairman (Prof. Alfred Aho), and the Computing Research Facility manager. This committee will meet biweekly with the facilities staff to oversee the equipment and research issues associated with the project. The facility staff members will be supervised by the Computing Research Facility manager.

The steering committee will have two primary responsibilities. At the start of each grant year, it will consult with the project researchers and formulate equipment purchase priorities in accordance with this proposal.

The installation and maintenance of the equipment will be the responsibility of the Computing Research Facility. The research facilities created by this proposal will form a scalable multimedia information processing testbed. The equipment will become part of the department's overall computing research facility and will be accessible to all research faculty and staff, with priority given to project members.

The other main responsibility of the steering committee will be monitoring and guiding the progress of the proposed research. The representatives from each area will coordinate research activities within their areas and communicate the results of the research to the rest of the department at its regularly scheduled faculty meetings. Publication of research findings to the external community will be done through the established scientific research societies, meetings, journals, and proceedings.


F BUDGET


G RESEARCH

G.1 HIGH PERFORMANCE INTEGRATED MULTIMEDIA INFORMATION SYSTEMS:

Alfred Aho, Leana Golubchik, Gail Kaiser, Kenneth Ross, Salvatore Stolfo

G.1.1 ABSTRACT

This group of researchers is engaged in developing algorithms and experimental systems to facilitate the effective representation, storage, and access of various forms of multimedia data. A common theme binds all of the proposed efforts: how might we empower a user or system developer with tools and systems that effectively and efficiently find and present relevant information from a large sea of expanding multimedia information sources? A number of solutions to various aspects of this problem are proposed. High performance storage systems are required to maximize network and I/O bandwidth and throughput, so as to quickly deliver the results from intelligent and powerful search engines that operate over multimedia information. Certain searches may involve ad hoc queries that require fast query processing over massive warehouses of data. Systems that provide these capabilities need to be constructed with programming environments that manage the integration of large complexes of cooperating information systems.

G.1.2 BACKGROUND OF PROPOSED RESEARCH

Networked information is rapidly changing the way people conduct business, educate themselves, deliver entertainment, and interact with one another. As storage technology improves, there is greater demand for putting more and more data either on-line or near-line. Organizations see greater benefit in putting all of their data into large knowledge repositories against which increasingly complex queries can be made. Multimedia capabilities provide the potential for supporting new educational, medical, informational, and entertainment applications.

The new networked multimedia information environment presents substantial technical problems that must be solved before it can deliver its full benefits to society. We need to understand how to design storage hierarchies that can deliver multimedia information in a cost-effective manner with the requisite quality of service. Designing multimedia storage systems has recently been an active research area, but most of the efforts have concentrated on video-on-demand servers. In designing and building large high performance multimedia storage systems, one must consider a whole spectrum of applications, from the relatively low bandwidth, high throughput, "just-in-time" delivery of video-on-demand servers to the very high bandwidth, relatively low volume, "ASAP" delivery of supercomputing/scientific applications. Such systems must be able to accommodate the various storage, performance, and reliability requirements of the different types of media and applications. Our goal is to study cost-effective designs of multilevel multimedia storage hierarchies to support the needs of a variety of applications; part of the effort will simply be to understand the constraints and goals well enough to appreciate what is possible.

As more corporations put their business data into information repositories, they need efficient decision-support systems that operate over extremely large amounts of "warehoused" data. In this context, the traditional relational query processing algorithms in use today are stressed to their limits of performance. We propose a new algorithm, the jive-join algorithm, that has many favorable scaling characteristics for efficiently executing complex ad hoc queries over very large data repositories.

As more of the world's information becomes accessible over the Internet, users need more effective mechanisms to determine what information exists and how to find what they need. Most search engines provide the means of efficiently searching indices or text-based keywords. We propose a new approach to developing effective search engines: building agent-based systems that launch machine learning programs to remote databases to learn models or classifiers of that data. Integrated collections of these data models then serve as complex patterns for searching over remote data sources more intelligently than simple keyword or text retrieval searches.
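As a rough sketch of this agent-based idea (with toy data and invented helper names, not the actual meta-learning system), the example below learns a simple classifier locally at each of two "remote" partitions and then integrates them with an accuracy-weighted vote computed by a meta-level combiner:

```python
# Rough sketch of the meta-learning idea (assumed names and toy data, not the
# actual system): classifiers are learned locally at each "remote" site, and a
# simple meta-level combiner -- here an accuracy-weighted vote over a small
# shared validation set -- integrates them into a single model that can then
# be used as a search pattern against other data sources.

def train_threshold_classifier(rows, feature):
    """Toy base learner: pick the threshold on one feature that best separates labels."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        preds = [1 if r[feature] >= threshold else 0 for r in rows]
        acc = sum(p == r["label"] for p, r in zip(preds, rows)) / len(rows)
        if best is None or acc > best[0]:
            best = (acc, threshold)
    _, t = best
    return lambda r, f=feature, t=t: 1 if r[f] >= t else 0

def meta_combine(classifiers, validation_rows):
    """Weight each site's classifier by its validation accuracy; predict by weighted vote."""
    weights = []
    for clf in classifiers:
        acc = sum(clf(r) == r["label"] for r in validation_rows) / len(validation_rows)
        weights.append(acc)
    def combined(row):
        score = sum(w for w, clf in zip(weights, classifiers) if clf(row) == 1)
        return 1 if score >= sum(weights) / 2 else 0
    return combined

if __name__ == "__main__":
    site_a = [{"size": s, "label": int(s > 5)} for s in range(10)]        # "remote" partition A
    site_b = [{"size": s, "label": int(s > 4)} for s in range(3, 13)]     # "remote" partition B
    validation = [{"size": s, "label": int(s > 5)} for s in (2, 4, 6, 8)]
    local = [train_threshold_classifier(rows, "size") for rows in (site_a, site_b)]
    model = meta_combine(local, validation)
    print([model({"size": s}) for s in (1, 5, 7, 11)])   # integrated model applied to new data
```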

We foresee the need for customizable multimedia search engines that can efficiently locate multimedia information stored in diverse distributed knowledge repositories. Over the past few years, a succession of more versatile information-presentation tools has appeared on the Internet. The progression has included systems such as NetNews, FTP, Gopher, WAIS, Mosaic and Netscape. These tools have resulted in an explosion of Internet traffic, but they are having difficulty coping with the volume and diversity of the various network-accessible information sources. We propose a research program to develop search-engine generators for emerging multimedia application areas.

We also see the need to develop hypermedia collaboration environments to make hypermedia a truly active information resource. Today hypermedia collaboration is based primarily on individual accesses by browsers or clients to independent servers supplying individual files one at a time. To support active hypermedia, we propose a new programming environment that allows servers to negotiate services that they will supply to enabled clients and to each other in a collaborative large scale information system.

Each of these topics is described separately in the following pages.

G.1.3 SUMMARY OF PROPOSED RESEARCH

Large High Performance Multimedia Storage Systems: Leana Golubchik. Our main goal in this research is to develop cost-effective designs of multimedia mass storage servers: integrated systems consisting of a multitude of (possibly network attached) storage devices, organized in some hierarchical fashion, whose function is to provide cheap and rapid access to vast amounts of multimedia data. The "solution" to such problems has never been to adhere to the limitations of the current technology, but rather to store and retrieve data intelligently. Through proper resource management, which includes data layout schemes, scheduling techniques, access control methods, and so on, we can perform load balancing, reduce latencies, and improve the data transfer rates as well as the throughputs of storage systems (e.g., [Golubchik et al.-September 1995]). Thus proper resource management is essential for providing, in a cost-effective manner, the required quality of service in large high performance multimedia storage systems.

A great deal of the research in the area of multimedia storage servers thus far has concentrated on the design of efficient video-on-demand storage servers, including our own work, where we emphasized cost-effective fault tolerant designs [Berson et al.-May 1995] and novel data sharing techniques [Golubchik et al.-May 1995]. Although many important aspects of multimedia information servers have been illustrated in the context of such systems, this is only the tip of the iceberg, in the sense that video-on-demand is a relatively simple application. There exists a multitude of interesting and important applications, and they all require the support of an efficient storage system. We briefly describe a few of these applications and the associated storage challenges below.

As wireless and mobile communication matures, mechanisms for effective mobile access to multimedia information are becoming increasingly important. We would like to investigate delivery of multimedia data, including continuous media (such as video), over relatively low bandwidth channels to devices with relatively little memory. In our current work [Berson et al.-May 1995, Golubchik et al.-May 1995], we have already assumed very simple display stations in the context of video-on-demand servers; that is, our data layout, scheduling, and related schemes assume that the "pacing" of data delivery and the associated burden of resource management (e.g., buffering) fall on the storage server. Thus, we expect good results when applying and modifying these techniques for use in wireless and mobile access to multimedia data. Furthermore, new problems, unique to mobile computing environments, must also be addressed; for instance, the issue of hand-off (from one base station to another) during a long video delivery.

As a user moves about in a mobile computing environment, or even as new users arrive at a more "traditional" multimedia access system, there is a need to dynamically adjust the delivery of multimedia information to changes in the system's workload. One of our research goals is to investigate selective multimedia data delivery, which will allow data delivery to be adjusted to the current availability of resources. For instance, we may exploit the "additive" property of image representation to adjust the quality of the display, depending on the availability of communication bandwidth. Or, due to lack of resources, we may dynamically substitute delivery of a video object with delivery of a combination of "corresponding" audio and image objects.
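A minimal sketch of such a selective-delivery policy follows; the bit rates, layer sizes, and substitution rule are invented for exposition and are not measurements or the proposed server's actual policy:

```python
# Illustrative sketch of selective multimedia delivery under a bandwidth budget.
# The layer sizes, bit rates, and substitution rule are assumptions for
# exposition, not the proposed server's actual policy.

VIDEO_KBPS = 1500                        # hypothetical full-rate video stream
AUDIO_KBPS = 64                          # hypothetical audio-only substitute
IMAGE_LAYERS_KBPS = [50, 100, 200, 400]  # additive image layers, coarse to fine

def plan_delivery(available_kbps):
    """Pick what to send given the bandwidth currently available to this client."""
    if available_kbps >= VIDEO_KBPS:
        return {"video": VIDEO_KBPS}
    # Fall back to audio plus as many additive image layers as the budget allows.
    plan, budget = {"audio": AUDIO_KBPS}, available_kbps - AUDIO_KBPS
    layers = []
    for layer_kbps in IMAGE_LAYERS_KBPS:
        if layer_kbps <= budget:
            layers.append(layer_kbps)
            budget -= layer_kbps
    plan["image_layers"] = layers
    return plan

if __name__ == "__main__":
    for bw in (2000, 600, 150):
        print(bw, "kbps ->", plan_delivery(bw))
```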

Another interesting extension of our current work is to investigate (real-time) delivery of multimedia data to 3D display or virtual/augmented reality applications. These applications exhibit real-time requirements similar to those of VOD servers but are far more interactive. Examples include architectural applications, such as visualization of building structures, various scientific visualization problems, distributed multimedia warfare simulations, medical imaging applications, and so on. Most current, state-of-the-art systems can only handle 3D or virtual worlds that fit in main memory. Current needs call for worlds that will not even fit on disk-based systems. Thus, to allow access to reasonably sized virtual worlds, we must consider the design of multilevel storage hierarchies.

When it comes to dealing with a multilevel storage hierarchy, one of the challenging issues that must be addressed is the migration of multimedia data through the multiple levels of the hierarchy. Our goal is to investigate the problem of properly modeling different storage architectures (disks, tapes, etc.) in a simple but effective manner, so that the movement of data through the storage hierarchy and its reorganization on different architectural devices can be optimized in the context of various applications.

In summary, there is a multitude of real-time and interactive applications of multimedia storage servers, including video-on-demand, scientific visualization, simulated world environments, and many others. Technological advances in digital signal processing, data compression techniques, and high speed communication networks have made such systems feasible. The proper infrastructure will allow us to show that, through efficient use of resources in multimedia storage servers, we can make these applications accessible.

Data Warehousing: Kenneth Ross. Data warehousing is the process of putting an organization's data into a single large repository from which complex queries can be made. Typically, data warehousing systems are separate from the day-to-day operational information systems and serve to enable "decision-support" queries that have value in the domain of the application. One model for multimedia information repositories is that of a data warehouse, from which vast collections of information can be queried.

Efficient decision support systems promise significant added value to an organization's information resources. The basic challenge today is to develop a range of sophisticated query processing techniques that can turn this promise into reality for a wide variety of potential users. One of the main problems considered in our research is how to process queries over very large databases, where the data is much larger than can fit in main memory. In such a scenario, a query plan has to read the data in stages, keeping only a fraction of the total amount of data in main memory at any time. One would like to avoid plans that have to read the same data more than once at different stages of query execution. The largest component of the time spent in a database system is the time taken to transfer data between secondary storage (such as disks) and main memory. It is thus essential to try to minimize this cost.

Jive-Join: One way to speed up queries is to make use of precomputed access structures such as join indexes. Valduriez studied this problem and proposed an algorithm for computing a join using a join index [Valduriez-1987]. More recently, we have proposed a new technique called "Jive-join" [Ross and Li-1995, Ross-1995]. For very large databases, whose size is many times that of the machine's main memory, we demonstrate that Jive-join has very good performance characteristics, and outperforms Valduriez's algorithm, often by a large margin.

Jive-join requires one pass through each input relation, one pass through the join index, and two passes through a temporary file whose size is half that of the join index. For small memories and large input relations this performance is a significant improvement over Valduriez's algorithm, which needs to make multiple passes over one of the input relations. Jive-join has a number of good properties:

• Jive-join applies under a relatively lenient condition in which the smaller relation is assumed to take a number of blocks less than half the square of the number of blocks in memory.

• Jive-join is skew-resistant, and will perform equally well independent of the skew.

• Jive-join writes the output result in two phases, into two separate files. By doing so it avoids having to reread part of the join result.

• Jive-join performs better than its competitors in a wide range of settings. For small memories and large input relations the improvement can be dramatic.

• Jive-join’s performance is worse than Hybrid-hash join only when the size of the join index is particularly large. In this case, the join result will also be particularly large, and the cost of writing the output result will dominate the total cost.

• Jive-join has recently been implemented at Columbia, and our initial experimental results are very encouraging. Jive-join, and a closely related algorithm developed at Columbia called Slam-join, appear to perform much better than their competitors for large databases, as predicted.
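
To make the preceding description concrete, the following toy sketch (in Python) computes a join from a precomputed join index entirely in memory. It deliberately omits the partitioned temporary files and two-phase output writing that give Jive-join its I/O advantage, and the relations and index shown are hypothetical.

    # Toy illustration of joining via a precomputed join index.  Real
    # Jive-join works out-of-core: one pass over each relation, one pass
    # over the join index, and output organized so no data is reread.
    # This sketch keeps everything in memory.
    def join_via_index(r1, r2, join_index):
        """r1, r2: lists of records (dicts); join_index: list of (rid1, rid2) pairs."""
        result = []
        for i, j in join_index:            # single pass over the join index
            result.append({**r1[i], **r2[j]})
        return result

    employees = [{"name": "Ann", "dno": 0}, {"name": "Bob", "dno": 0}]   # hypothetical data
    depts = [{"dno": 0, "dept": "CS"}]
    jindex = [(0, 0), (1, 0)]              # precomputed (employee RID, dept RID) pairs
    print(join_via_index(employees, depts, jindex))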

Our proposed research plan is as follows:


• We plan to design general query-processing techniques based on Jive-join. We have a preliminary architecture for such a query-processing system, and have shown that it can, under reasonable circumstances with large input relations, perform much better than traditional approaches. This component of the research may involve new algorithms in addition to new ways of combining well-understood techniques.

• We plan to build an actual query processing system based on the techniques discussed in [Ross and Li-1995, Ross-1995] and techniques developed during the initial period of the CISE grant. The implementation will focus on achieving good response times on realistic decision-support queries over large data sets. An important component of the implementation is extensive experimentation and performance measurement. Once a robust centralized algorithm is developed, parallelization techniques will be investigated and implemented, as will techniques for minimizing contention between concurrent queries.

• We expect to be able to derive techniques for automatically generating good plans for a significant class of SQL queries, including selections, projections, joins, and aggregations.

The work outlined in this proposal tackles the hard problem: answering queries that require accessing gigabytes or terabytes of data. Much recent research in decision-support query processing has focused on special kinds of selective queries that can be answered efficiently by looking only at a small fraction of the data, even when the database is very large. While this recent research is useful for the special cases considered, it does not address the more difficult problem addressed in this proposal.

This research will have a fundamental impact on the practice of answering decision-support queries. Our research is distinguished from commercial research and other research efforts by its use of the novel Jive-join algorithm, for which a patent has already been filed. We have demonstrated that a query processing system based on Jive-join can, in principle, outperform traditional techniques. With this research we shall be able to demonstrate this performance in practice, and quantify the performance gains.

Meta-learning Agents for Scalable Data Mining: Sal Stolfo Access to information sources over the Internet today is easily available either to the most knowledgeable and sophisticated user, or to those with seemingly unbounded time to surf about hoping to find relevant information. Once some source is found, the user is responsible for defining the most appropriate search criteria in terms of syntactic constraints imposed by the information provider, i.e., keywords. At times, the keywords available for searching may bear no apparent relationship to the desired topic. (For example, point your favorite web searching engine to find information at stis.nsf.gov, and try to map the keyword “6855” to its intended meaning.) Although the paper document metaphor with hyperlinked hot words and scrolling lists provides a useful standard interface to aid and organize the user’s search, it is still the end-user’s responsibility to formulate their own search criteria with little or no direction. The user must plan their attack on some set of seemingly relevant sources, and trace down as many links as is necessary to scroll through pages of text and graphics to quarry the few nuggets of information they seek. This browse/keyword-search/stream/sift paradigm of information extraction naturally consumes large amounts of user time and network bandwidth, and stresses current Internet capabilities to their limits at peak usage times.

A number of research efforts seeking to provide intelligent information extraction over a network base their efforts upon (logic-based) semantic models of the information sources available on the network. In this way, users, or their agent proxies, may search a data source based upon high-level, semantic-based queries, rather than keyword searches. Rather than demanding that each information source provider produce their own constantly updated and consistently maintained semantic model, we propose to explore a radically different path. We propose to explore Intelligent Information Retrieval by empowering the user with the ability to automatically form their own semantic model of some arbitrary information source. We view the extraction of relevant information from some source of data as a user-initiated learning task, and the data as training data from which knowledge may be gleaned automatically by a machine learning agent. Machine learning agents launched by end-users for their own directed purposes may extract knowledge in the form of classifiers or models that can be used directly to support semantically driven access to other data sources (for example, general query processing or selection of relevant data, merging data on common topics from multiple sources, and in support of tasks to build ontologies or thesauri).


Most modern machine learning research focuses on generating and evaluating the single best model, or at most selecting one model from a set of learned models. Recently, however, there has been a considerable amount of research on various techniques that aim to compute a number of different models and then to “merge” their collective predictions in some principled fashion. Such an approach allows one to exploit the variety inherent in a set of different learned models; by integrating the models, often the predictive accuracy and/or the training efficiency of the overall system can be improved.

The various multiple models learned over a set of training data can be generated by using multiple learning algorithms, different training example distributions, different output classification schemes, different hyperparameter settings or training heuristics of the learning algorithm, disjoint data partitions, and/or different parameters of the learned model. Previous work has shown that there is utility in all of these approaches. In many of the cases reported, a classification system formed by the integration of a number of separately learned classifiers or models tends to improve on the overall accuracy achievable by any individual model. Furthermore, several proposed methods are amenable to direct parallel or distributed computation for improved efficiency and scalability of machine learning applied to very large databases. The latter is perhaps most important for contexts where scaling is crucial to operate over large amounts of distributed data available over a network of remote sites, e.g., web database sites.

Our approach to solve the scaling problem is to execute a number of learning processes (each implemented as a distinct serial program) on a number of data subsets (a data reduction technique) in parallel (e.g., over a network of separate processing sites) and then to merge the collective results through a process we call meta-learning. Here, meta-learning serves as the means of “gluing” multiple knowledge sources together, guided by a number of previously reported techniques that have consistently shown improvements in overall accuracy. The various meta-learning strategies we have implemented and reported in several articles (most recently in [Chan and Stolfo-1995a, Chan and Stolfo-1995b, Chan and Stolfo-1996]) do not require a mapping from complex “concept representation languages” into one standard language to share knowledge among classifiers. Instead, meta-learning is implemented by integrating the “behavior” of a number of models on common training data. This means that we learn how classifiers correlate with each other on common training data. This approach, therefore, provides a relatively easy means of relating distributed knowledge sources without the daunting problem of either developing a “logical crossbar switch” to convert between multiple representation languages or forcing standard algorithms to adopt a single standard representation language.
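
As an illustration of the flavor of this approach, the sketch below trains several base classifiers on disjoint partitions of a training set and then trains a meta-classifier on their predictions over common validation data. The learning algorithms, the synthetic data, and the use of scikit-learn are stand-ins for illustration only, not the methods of the cited work.

    # Minimal stacking-style meta-learning sketch (hypothetical data).
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    X = rng.normal(size=(600, 5))
    y = (X[:, 0] + X[:, 1] > 0).astype(int)                 # synthetic labels
    X_train, y_train = X[:400], y[:400]
    X_val, y_val = X[400:500], y[400:500]                   # common validation data
    X_test, y_test = X[500:], y[500:]

    # One base classifier per disjoint data partition (these could run in parallel).
    parts = np.array_split(np.arange(len(X_train)), 4)
    base = [DecisionTreeClassifier(max_depth=3).fit(X_train[p], y_train[p]) for p in parts]

    # Meta-learning: learn how the base classifiers' predictions relate to the
    # true class on data they all see.
    meta_features = np.column_stack([c.predict(X_val) for c in base])
    meta = LogisticRegression().fit(meta_features, y_val)

    test_features = np.column_stack([c.predict(X_test) for c in base])
    print("combined accuracy:", (meta.predict(test_features) == y_test).mean())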

The general meta-learning techniques we have demonstrated are independent of the underlying learning algorithms that may be employed. Furthermore, the approach is independent of the computing platform used. Thus, meta-learning is scalable as well as portable and extensible, and amenable to direct execution in network computing environments.

The system capability we intend to demonstrate will provide an important new way to view remote data of all sorts. By extending data sources from collections of (seemingly) associated keyword terms to structured collections of user-classified data, end-users can explore a new paradigm for intelligently extracting information from any source. Each database may be viewed by a user in their own way as training data to learn a semantically richer model of the domain from which the data is sampled. What a user may be able to automatically learn from one source can be used as a semantically guided means of extracting data from other sources. (And indeed, the user might learn automatically what the keyword “6855” at stis.nsf.gov actually means!)

Multimedia Search-Engine Generators: Al Aho The ultimate goal of this research is to develop effective methods for finding relevant multimedia information based on search criteria that include textual, graphic, video, and audio information. In the past twenty-five years, string pattern-matching research has developed remarkably effective algorithms for finding patterns in textual databases [Aho-1990]. Efficient algorithms have been developed for finding keywords, sets of keywords, regular-expression patterns, hierarchical patterns, and various kinds of graphical patterns. These algorithms are now used routinely to construct indexes for textual databases, search for keywords in documents, find lexical and syntactic patterns in programs, look for bibliographic data, and perform many other related tasks.

We seek to develop information-finding algorithms that can be applied to the new problems presented by the growing volume of on-line information and the increasing diversity of current applications. We would like to incorporate sophisticated models of context information with existing and future search strategies in order to more accurately find relevant information. A technique we would like to explore is the incorporation of Boyer-Moore methods with the Aho-Corasick algorithm to create search engines that can very efficiently look for complex search patterns. Preliminary experiments indicate that these methods are successful in efficiently finding patterns with hundreds of thousands of elements in massive amounts of data. In addition, we propose to investigate the use of these methods for approximate search where we can find information that is close to what we ask for under various distance metrics. It would also be interesting to see whether these automata capable of searching for complex patterns can be successfully harnessed to look for patterns in image databases.
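
For reference, the sketch below implements a plain Aho-Corasick keyword matcher, without the Boyer-Moore-style skipping that the proposed hybrid would add; the keyword set and text are hypothetical.

    # Standard Aho-Corasick multi-keyword matching: build a trie with failure
    # links, then scan the text once, reporting every keyword occurrence.
    from collections import deque

    def build_automaton(patterns):
        goto, fail, out = [{}], [0], [[]]
        for pat in patterns:                      # build the keyword trie
            s = 0
            for ch in pat:
                if ch not in goto[s]:
                    goto.append({}); fail.append(0); out.append([])
                    goto[s][ch] = len(goto) - 1
                s = goto[s][ch]
            out[s].append(pat)
        q = deque(goto[0].values())               # breadth-first failure-link construction
        while q:
            r = q.popleft()
            for ch, s in goto[r].items():
                q.append(s)
                f = fail[r]
                while f and ch not in goto[f]:
                    f = fail[f]
                fail[s] = goto[f].get(ch, 0)
                out[s] += out[fail[s]]
        return goto, fail, out

    def search(text, patterns):
        goto, fail, out = build_automaton(patterns)
        s, hits = 0, []
        for i, ch in enumerate(text):             # single pass over the text
            while s and ch not in goto[s]:
                s = fail[s]
            s = goto[s].get(ch, 0)
            for pat in out[s]:
                hits.append((i - len(pat) + 1, pat))
        return hits

    print(search("ushers", ["he", "she", "his", "hers"]))   # [(1, 'she'), (2, 'he'), (2, 'hers')]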

If these methods are successful, then the algorithms can be incorporated into search-engine generators that can be used by various applications to construct pattern-matchers for specific application areas. The Glimpse search engine in Harvest provides an example of approximate pattern matching in a more limited context.

Interoperability and feature-interaction problems threaten to impede the development and large-scale deployment of new services and applications in large distributed environments [Aho and Griffeth-1995]. While the automatic construction of search engines can facilitate interoperability of application programs with heterogeneous information sources, we also propose to develop algorithms for tools that can test for seamless interoperability between search engines and information sources. A good foundation for this approach has been laid by our earlier work in the development of the now widely used algorithms for protocol conformance testing [Aho et al.-1991].

Active Hypermedia Collaboration Services: Gail E. Kaiser We propose to develop an infrastructure for HYpermedia COllaboration Environments (HYCOEs). The rationale for introducing HYCOEs, compared to conventional hypermedia environments, is to make hypermedia truly active rather than serve primarily as a passive information resource with relatively limited interaction via submission of “forms”. Annotation of shared hypermedia documents, desktop video conferencing, and individual groupware tools allow a degree of relatively informal collaboration, but currently do not facilitate the structure needed to support engineering design, software development, business practices, managed healthcare, and other applications over hypermedia.

HYCOEs center on hypermedia, rather than hypertext documents alone, so that all materials plausibly relevant to the users’ daily work and/or educational activities will be at their fingertips via search and consistency maintenance engines. Hypermedia entities may represent source and executable code, designs and layouts, architectural models, configurations, test cases and diagnostics, documents and forms, email and newsgroups, scanned-in diagrams sketched on napkins, 3D visualizations, video snippets and audio annotations, digital library resources, etc. HYCOEs organize such entities externally to the conventional unstructured links via structured composition, cooperating systems of tools, task-oriented dependencies, workflow, and synchronization.

HYCOEs cannot be constructed using only currently available technology: many critical pieces are missing. We focus on process and transactions, since we see these services as among the most challenging as well as ones where we can bring substantial expertise to bear. (Note we use the term “process” as it is employed in the software engineering community, with no relation to an operating system process or programming language thread but closer to the business notion of workflow, and the term “transaction” approximately as it is employed in the database community, not to mean a round-trip network request and response.) These services will employ a novel tool integration mechanism and operate on shared, writable (as well as readable) hypermedia workspaces (e.g., exploiting the emerging what-you-see-is-what-you-get or even what-you-see-is-what-I-see hypermedia authoring tools). There are numerous commercial and research systems supporting process and/or transactions, but none of them are suited to operating on and within generic hypermedia.

We propose to develop process (workflow) theories, languages and algorithms to model processes for hypermedia, retrofit existing processes to operate on hypermedia, intelligently filter, augment and present hypermedia entities according to their dynamic use within the collaborative workflow, and organize webs of existing non-hypermedia artifacts. A new process engine, based on our Marvel system but with tailorable enactment semantics as well as process model [Tong et al.-1994], will enforce application-specific constraints on hypermedia (to ensure that all relevant entities are updated during a change); sequence workflow steps, their prerequisites and consequences (to guarantee that required tasks are completed, consistency is maintained, and management is continually notified of status); guide users regarding what to do next (e.g., ordering an agenda of enabled tasks to reflect deadlines and priorities); monitor all activities (to automatically capture rationale and history); and assist resource management (e.g., regarding dependent and concurrent tasks).


We also propose to develop extensible transaction models and supporting algorithms for application-specific concurrency control and recovery over writable hypermedia stores. Long-duration transactions subsume the “checkout” model often used in software engineering and CAD/CAM, but updates to multiple entities are committed, or aborted (undone), together. Recovery rolls back or compensates to restore a consistent state, and may be triggered by request or a system failure. Concurrency control prevents unprivileged users, tools, etc. from viewing or overwriting partial results of in-progress work. We will extend our previous work on “Split Transactions” [Kaiser and Pu-1992] to modeling and enacting coordination policies that relax conventional failure-atomicity (e.g., to record unsuccessful or interrupted attempts in the project history) and concurrency-serializability (e.g., to allow collaboration rather than the traditional database isolation, as well as managerial or instructor access to fine-grained status).

Both services will depend on a tool integration and structured hypermedia substrate. The main investigations here will be concerned with enveloping (or “wrapping”) of cooperating systems of off-the-shelf tools that may not be hypermedia-aware (most are not), and structuring of distributed hypermedia subwebs. It would be nonsensical (and prohibitively expensive) to support, say, transactions potentially distributed across the entire Internet. Instead, a subweb will determine a virtual boundary on those hypermedia entities of relevance for a particular instance of a service (there may be numerous instances, which may or may not be aware of each other). For workflow, this means that access to entities within the relevant subweb must abide by the defined process model and enactment semantics, but interleaved accesses to entities outside the subweb (e.g., following links into digital libraries or on-line entertainment) would be ignored. Analogously, if a user, tool, etc. happens to access an entity outside the indicated subweb while in the midst of a transaction, locks would not be requested.
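
The subweb boundary idea can be illustrated with a small sketch: within a transaction, locks are requested only for entities that belong to the subweb, while accesses outside it pass through untouched. The entity names, class structure, and locking policy here are invented purely for illustration.

    # Toy model of subweb-scoped concurrency control (hypothetical names).
    class Subweb:
        def __init__(self, members):
            self.members = set(members)       # entities inside the virtual boundary

    class Transaction:
        def __init__(self, subweb):
            self.subweb = subweb
            self.locks = set()

        def access(self, entity):
            if entity in self.subweb.members:
                self.locks.add(entity)        # concurrency control applies here
                return "locked and read " + entity
            return "read " + entity + " outside the subweb; no lock requested"

    project = Subweb({"design.doc", "module_a.c", "testplan.html"})
    txn = Transaction(project)
    print(txn.access("design.doc"))
    print(txn.access("digital-library/paper123"))     # e.g., following an external link
    print("locks held:", sorted(txn.locks))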

All proposed facilities require negotiation and interoperability among decentralized, autonomous hypermedia servers and clients in order to scale to the many tens of thousands of Internet sites and many millions (potentially billions) of users. Here we propose to adapt the International Alliance metaphor pioneered in our previous work on Oz [Ben-Shaul and Kaiser-1995], and develop formal models and efficient algorithms to dynamically define “Treaties” among independent and private servers and clients regarding the expectations and requirements imposed on each other with respect to process, transactions, and other services, and to support “Summits” that execute within the virtual boundaries to verify and enforce Treaties and implement the services.

Since the World Wide Web (WWW) is non-proprietary and nearly universally available, and the industry Object Management Group has announced plans to formulate the CORBA “plug-and-play” distributed computing standard as a WWW service, we plan to employ WWW as our main implementation vehicle. This will ease technology transition of our HYCOE infrastructure, since most target user organizations and vendors already operate WWW sites and have developed some expertise with WWW applications and toolkits. However, our technology will not be inherently tied to either WWW or CORBA protocols and formats (among other reasons because these are rapidly changing even as this proposal is written, and can be expected to evolve in unanticipated directions over the next five years).

G.1.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE

An NSF Research Infrastructure grant would allow us to upgrade our storage, networking, and processing capabilities to conduct the research proposed. In order to be able to experimentally verify the database query processing techniques proposed by Professor Ross, we need to be able to create a realistic operating environment with high bandwidth storage units (fast disks or disk arrays) and high bandwidth networks. The multimedia search-engine-generation research proposed by Professor Aho requires networking and server capacities approaching hundreds of MIPS to allow experimentation with massive multimedia data sets. Professor Stolfo’s experiments require a substantial amount of computing power, measured in the many billions of operations per experiment, that can be delivered by a network of low-cost state-of-the-art PCs. The study and implementation of data layout, scheduling, and fault tolerance schemes in Professor Golubchik’s research requires significant amounts of storage space as well as computational and communication power. To conduct the proposed research in active hypermedia, Professor Kaiser requires SparcStation 20s that support multimedia, particularly animated graphics and video. A detailed description of our infrastructure needs is given in Section C.


G.1.5 INTERACTIONS WITH OTHER PROJECTS

Professor Aho’s algorithms for search-engine generation can be used in projects at Columbia and other institutions that require pattern matchers. Aho is also collaborating with researchers at AT&T Bell Laboratories on algorithms for protocol conformance testing and with researchers at Bellcore on resolving the feature-interaction problem in telecommunications services.

Our work on large high performance multimedia storage systems will benefit from interactions with many projects described in other parts of this proposal. For instance, our efforts in designing a storage system for 3D or augmented/virtual world environments can greatly benefit from Professor Allen’s work on automatic 3D model generation; similarly we can benefit from the collaborative work of Professors Feiner and Nayar on 3D visual mobile databases as well as Professor Feiner’s work on augmented reality. Furthermore, these projects can benefit from a storage system designed for applications that have very large bandwidth and storage requirements with real-time constraints. The issues of providing necessary QoS through efficient resource management, which arise in the design of high performance multimedia storage systems, are very similar to the issues currently being addressed (in the same context) by the networking community. In order to build successful systems, it is essential that storage and networking projects coordinate their efforts; hence, we also plan to collaborate with Professor Yemini’s QuAL group.

To accomplish the task of demonstrating meta-learning agents in realistic environments, Professor Stolfo’s group has teamed with a partner who will provide a commercial product that implements an infrastructure to apply remote learning processes, and data access mechanisms for direct use over server sites on the Internet. SMARTS’ (System Management Arts Inc.) SOS agent technology provides a simple and direct means for developing network agents of all sorts. SMARTS’ agent technology is a derivative of the work conducted by Professor Yemini and the DCC Laboratory, whose research is described elsewhere in this proposal. Joint efforts will naturally continue between these two groups. Therefore, the learning agents developed shall encapsulate all of our existing tools as SOS agents and demonstrate their operation over a number of sites on the Internet. The results we achieve will be easily portable to other agent infrastructures that may be developed in the DCC lab or by other groups.

Professor Ross’s work on decision support queries will interact strongly with the Medical Informatics department at Columbia. Professor Stephen Johnson of that department is working with Professor Ross on the development of an object-oriented database system for storing and managing patient and hospital data. As a component of this collaboration, our decision-support system will be applied to queries over huge medical datasets. There is also potential for collaboration between Professor Golubchik and Professor Ross on developing physical data storage architectures specifically for supporting decision-support databases.

Hypermedia for collaborative work may consist of seemingly random snippets of information from arbitrary or even anonymous authors (not necessarily all involved in the collaboration), which makes it difficult to search and to draw useful task-oriented and user-appropriate knowledge from it. The hypermedia content and its meaning are generally domain-specific, with a priori known user “roles”, for the particular collaborative work application. But since there are so many plausible domains, and more are emerging daily, a “generator” system of explanatory utilities based on domain/role knowledge representation is needed. For these reasons, we see a natural synergy between the work of Professor Kaiser’s group and the work on Computer Human Interaction of Professors Feiner and McKeown. Furthermore, collaborative work has a strong mobile nature. Participants may access at low bandwidth (dialup telecommuting, infrared/wireless wandering around the organizational campus) and may disconnect entirely for arbitrary periods of time. Workflow provides semantics for intelligent prefetching, to download in advance data that will be needed for the new work the user plans to do. It is natural, therefore, that interactions will develop with the research on mobile systems described elsewhere in this proposal.


G.2 VISUAL INFORMATION PROCESSING:

Peter K. Allen, John R. Kender, Shree K. Nayar

G.2.1 ABSTRACT

We propose eight projects, each of which is concerned with exploiting visual and spatial information in service to the needs of more traditional man-machine interface issues. These projects capture, encode, store, and process visual data into forms more amenable to human consumption, ranging from depth and location maps, through full object reconstructions, to descriptive English sentences. Our eight projects include the development of novel real-time sensors for the acquisition of three-dimensional depth information; the design of a head-mounted camera system to refine estimates of a person’s three-dimensional location; further exploitation of patented algorithms for recognizing objects regardless of lighting conditions; the design and construction of a search engine with applications (among others) to storage and retrieval within multimedia image databases; the development of a complete system for the reverse engineering of physical objects by capturing their detailed three-dimensional spatial construction; the use of this system to encode and transmit spatial information and reconstruct objects via a three-dimensional FAX; the development of representations and filtering techniques for deep spatial relationships in a system that describes the contents of medical radiographs; and the exploitation of the knowledge of human gesture grammars in a system that enables screen menus to be selected visually without a mouse.

G.2.2 BACKGROUND OF PROPOSED RESEARCH

Each of these projects has its own history and significance. We enumerate them in turn.

Real-Time Three-Dimensional Sensors A large fraction of the databases of multimedia systems can be expected to be visual in nature. Almost all of today’s databases include 2D images that have been acquired using off-the-shelf CCD cameras. While these sensors may suffice in the case of 2D scenes (such as documents, paintings, engineering drawings, etc.), our visual world (including terrain, buildings, trees, sculptures, etc.) remains 3D and its recovery and representation requires the use of more sophisticated sensors that can directly extract 3D structure. Such sensors have long been acknowledged as key components of future virtual reality engines. Given a small number of views of a scene taken from different viewpoints, the recognition of an intermediate (novel) view is known to be ill-defined in most cases. Both recognition [Nayar and Bolle-1996] and rendering [Oren and Nayar-1994] [Nayar and Oren-1995] of novel views become possible when the 3D shape and reflectance properties of the scene are known.

Head-Mounted Camera for Location Refinement Related to the topic of 3D visual databases is the problem of self-calibration. Prof. Feiner has outlined a research plan that addresses the problem of interacting with visual databases. Such technology would enable a user wearing a head-mounted display to walk around a university campus, for instance, and have textual information overlaid on the image of the scene seen by the user. In order for such an augmented reality system to function, it would require fairly precise knowledge of the location of the user at any point in time. Dead reckoning estimates can be provided by devices that use, for instance, a GPS system. However, estimates provided by GPS could be anywhere from a few inches to a few meters away from the actual coordinates of the user. Depending on the viewpoint of the user and the structure of the 3D scene, such discrepancies could result in large errors in the user’s perspective of the scene.

Appearance Matching The research problem described above lies in the realm of visual recognition. In the above example, the system recognizes low-level visual features scattered around the scene to aid registration. In contrast, a recognition system that can identify and estimate the pose of large objects would be valuable to an interactive multimedia system. For instance, in the above example, if it is possible to recognize objects in the camera image taken from the user’s perspective, it would be possible to communicate detailed object-level information to the user. Of course, if it were possible to store the entire visual world in the database, such a recognition capability would be of little use. However, any 3D visual database is prone to change, given that most scenes are dynamic in nature and can be assumed to be constantly updated with new physical objects. Hence, the multimedia system would have to cope with the appearance of previously unseen objects. In this context, a recognition system that can identify a new object and provide information regarding it can be used to (a) provide additional information to the user and (b) update the 3D visual database.

Search Engine for Multimedia The final stage of visual recognition, namely, matching, can be viewed as a high-dimensional search problem. As such, the problem is no longer restricted to vision; other sensory domains, and even information domains that are purely abstract, can benefit from efficient implementations of multi-dimensional search algorithms, particularly if they exploit custom-designed but inexpensive hardware [Nene et al.-1994].

Automated 3-D Model Generation The ability to model the 3-D world with high precision is an important goal in computer science. Increases in memory size, computing power, and graphics engine capabilities have resulted in the creation of very realistic simulations and virtual worlds that can often replicate the real world. This has affected engineering disciplines, medicine, and training/education functions. Rather than using real machines and environments, significant cost savings and ancillary benefits can be derived from interactive, computer-modeled environments.

A key component in accurate and complete modeling of the physical world is the automatic generation of 3-D models in appropriate computer-compatible formats (e.g., Computer Aided Design data structures). Currently, much of model building is done “by hand”. A designer will use a combination of measurements and 3-D viewing graphics to create a “realistic” model of an object, and then hope it is both accurate and complete. Simple objects are easily generated, but as the complexity of the object increases, so does the design/acquisition time. Complex objects usually can only be modeled by skilled CAD designers.

Further, there are real issues with fidelity of the modeled object, particularly when viewed on a 2-D screen. Many times the graphical appearance of an object is all that matters, and tricks can be used to create the correct visual effect (“that’s how they do it in the movies...”), while corrupting or ignoring the physical reality of the modeled object. However, as the demands of simulation and virtual worlds change to include physical effects such as forces, collisions, occlusions, etc., the models are required to be, in fact, physical analogs of the real objects.

Spatial Information for OBJECT FAX This model building is also useful in the emerging field of Rapid Prototyping (RP). This is a new manufacturing technology which allows physical models of parts to be quickly built directly from Computer Aided Design (CAD) data. These systems aim to cut manufacturers’ product development cycles by decreasing the time from design to model and thereby reduce the time-to-market of new products. The models produced are used for visualization purposes, testing, or casting. In addition, physical prototypes are often more useful than drawings for estimating manufacturing costs. The end result of the successful integration of RP with computer aided design and manufacturing (CAD/CAM) is a faster time to market, reduced product delivery times, and increased overall efficiency in the design/manufacturing cycle.

We foresee an opportunity to develop an important new class of technology products and office machines. These new systems, which we call OBJECT FAX machines, will allow physical objects to be scanned at one site, transmitted efficiently and with minimal loss of data over standard telephone links, and then reconstructed by a Rapid Prototyping machine in another, remote location. As an example, a product designer in Michigan may have sculpted a small clay model of a new auto body, and the executives in New York would like to see it at a meeting. With an OBJECT FAX system, the car body can be scanned in Michigan and a 3-D replica of it can be produced in New York. Sales organizations needing models of new products can create them with a phone call, and can even have each client in the field dial up a model, transport it over the phone, and replicate it using their office’s OBJECT FAX machine. Advertising agencies can obtain a physical copy of a client’s new products directly, and an opportunity for samples of new products can be created on Internet sites, for easy distribution to each office’s OBJECT FAX machine.

Imagery to English Descriptions Computer vision has had a long history of concentration on its lower levels (signal-related image processing) and middle levels (surface and object model matching). The higher level aspects of vision, in which the imagery frees itself from two-dimensional and three-dimensional coordinate systems and becomes useful information in a symbolic form, have largely been neglected.


The most significant shortfall has been in the communication and use of imagery, where there has been precious little work on the description of the contents of imagery. These descriptions depend upon the integration of what is known of human visual and spatial processes. Transitioning from parametric models of objects to the full concepts of multiple objects within complex contexts is a difficult system task. It requires the ability to selectively interpret the significance of objects, depending on the objects around them, and to map these deep relationships within a context into useful output, whether it is English language descriptions or direct commands to a computer.

We believe that the hardware platforms are now robust enough, and computer vision algorithms are now mature enough, that full systems that emphasize the human interface aspect are beginning to be possible. We propose to develop and extend two running prototypes of systems that interpret spatial relationships from imagery input.

Human Gesture Understanding Data is not information. Image data is even less so, as there usually is so much more of it. It is difficult to transform data into information without a vision of its use in a full system, including the needs and limitations of the ultimate human user. Although much is known about how a computer can imitate and exploit the strengths and limits of the human visual system, comparatively little is known about how machines can interpret the stylized and grammatically well-formed suite of human hand gestures. Replacing the mouse with a camera would not only free up a user’s hand, it would enable a more natural interface for standard menu tasks.

G.2.3 SUMMARY OF PROPOSED RESEARCH

Real-Time Three-Dimensional Sensors: Shree Nayar We propose the development of new sensors for real-time estimation of 3D structures. Our initial effort in this direction [Nayar et al.-1995] has resulted in a real-time depth sensor that uses focus analysis to produce 512x480 depth estimates (high resolution depth maps) at 30 Hz (see Figure 1). This is the first demonstration of a high-resolution, video-rate 3D camera. This sensor, however, relies on the use of active illumination for depth estimation and hence is operational only in structured environments. Insights obtained from this work will be exploited to develop passive 3D vision sensors that can recover structures of outdoor scenes in real time. To acquire a 3D database, such a sensor can be hand-held and used to scan an entire scene without user interaction. We expect such sensors to have a profound impact on multimedia technology, enabling future systems to accomplish recognition, tracking, and editing tasks that are known to be unreliable when only 2D images are available. For instance, a 3D camera would allow a user to visit a virtual museum (stored as a 3D visual database) and view pieces of art from any desired viewpoint.

Head-Mounted Camera for Location Refinement: Shree Nayar Collaborating with Prof. Feiner, we plan to develop computer vision algorithms that can refine an initial location estimate of the user’s vantage point to quickly converge on an accurate estimate. The idea here is to use the GPS coordinates and our 3D visual database to predict the locations of a set of visual features in the 2D image obtained by a camera mounted on the user’s head-mounted unit. The discrepancies between predicted and actual features can be used to refine the user’s coordinates. We propose the development of fast and robust algorithms for coordinate refinement. We estimate that such algorithms should be able to perform at video rate, making it possible for the user to be continuously calibrated with respect to the world. This would allow the overlay of textual or other forms of information at precise locations on the head-mounted display as the user walks around the world.
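
The correction loop can be illustrated with a deliberately simplified sketch: predicted feature positions (computed from the rough GPS estimate and the landmark database) are compared with the features actually observed, and the mean discrepancy corrects the estimate. A real system would solve for full 3D pose from image measurements; here only a 2D ground-plane position is refined, and all coordinates are invented.

    # Simplified location-refinement sketch (hypothetical coordinates, metres).
    import numpy as np

    true_position = np.array([10.0, 5.0])      # unknown to the system
    gps_estimate = np.array([10.8, 4.4])       # coarse dead-reckoning estimate
    landmarks = np.array([[0.0, 0.0], [20.0, 0.0], [10.0, 15.0]])   # from the 3D database

    observed = landmarks - true_position       # feature offsets the camera actually sees
    predicted = landmarks - gps_estimate       # offsets predicted from the rough estimate

    # The mean prediction error tells us how far off the estimate is.
    correction = (predicted - observed).mean(axis=0)
    refined = gps_estimate + correction
    print("refined position:", refined)        # recovers approximately [10, 5]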

Appearance Matching: Shree Nayar We plan to investigate a new class of recognition algorithms that are based on the notion of appearance matching [Murase and Nayar-1995]. These are real-time algorithms that are based not solely on object geometry but rather on object appearance, which is a function of object shape, reflectance, pose, and illumination conditions. First, a set of images will be acquired by showing an object to the system in a variety of poses and under different illumination conditions. This image set is then compressed to obtain a low-dimensional subspace in which object appearance is represented as a parametrized manifold. Novel object images can then be mapped to the subspace, and the closest manifold point reveals the object’s identity and its pose in the scene. The proposed subspace approach will enable the development of recognition algorithms that are efficient in both space and time.
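
The subspace idea can be sketched in a few lines: training views are projected onto a low-dimensional basis obtained from principal component analysis, and a novel view is identified by its nearest neighbor among the stored projections. The random stand-in "images", labels, and basis dimension below are hypothetical; the cited work uses real image sets and a parametrized manifold rather than raw nearest neighbors.

    # PCA-subspace appearance-matching sketch (random stand-ins for images).
    import numpy as np

    rng = np.random.default_rng(1)
    n_images, n_pixels, k = 40, 256, 8
    images = rng.normal(size=(n_images, n_pixels))              # hypothetical training views
    labels = [f"object{i % 4}/pose{i // 4}" for i in range(n_images)]

    mean = images.mean(axis=0)
    _, _, vt = np.linalg.svd(images - mean, full_matrices=False)
    basis = vt[:k]                                              # k-dimensional eigenspace
    coords = (images - mean) @ basis.T                          # stored subspace points

    novel = images[17] + 0.01 * rng.normal(size=n_pixels)       # a slightly perturbed view
    q = (novel - mean) @ basis.T
    nearest = int(np.argmin(np.linalg.norm(coords - q, axis=1)))
    print("recognized as:", labels[nearest])                    # expected: labels[17]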


Figure 1: (a) A video-rate 3D camera based on depth from defocus and active illumination. (b) Two images taken simultaneously by the 3D camera at different focus levels. (c) A 512x480 pixel depth map of the scene computed from the two images in (b).

Search Engine for Multimedia: Shree Nayar We propose the development of a search engine for multimedia applications that is efficient in finding nearest neighbors in high-dimensional spaces. We have already evaluated a software implementation of our algorithm [Nene et al.-1994]. Our benchmarks have shown our algorithm to be several times more efficient than previously proposed ones based on kd-trees and R-trees. We have estimated a further speed-up of 100 times if the algorithm were implemented in hardware using off-the-shelf programmable FPGA chips. Our design for the search engine architecture permits it to be interfaced with almost any commercially available workstation. This allows the search engine to be used like any other standard peripheral device such as a hard drive or a video card.
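
To indicate the kind of search being accelerated, the sketch below trims candidates to points lying within a small range of the query in every dimension before computing any full distances; simple per-dimension comparisons of this sort are easy to implement in hardware. The data, dimensionality, and threshold are hypothetical, and this is a software caricature rather than the proposed FPGA design.

    # Slab-trimming nearest-neighbor sketch (hypothetical data and threshold).
    import numpy as np

    rng = np.random.default_rng(2)
    data = rng.uniform(size=(10000, 8))        # 10,000 points in 8 dimensions
    query = rng.uniform(size=8)
    eps = 0.3

    # Cheap per-dimension trimming: keep points close to the query in every coordinate.
    candidates = np.flatnonzero(np.all(np.abs(data - query) <= eps, axis=1))

    # Full distance computation only over the survivors.
    if candidates.size:
        d = np.linalg.norm(data[candidates] - query, axis=1)
        print("nearest neighbor:", candidates[np.argmin(d)], "of", candidates.size, "candidates")
    else:
        print("no candidate within the slab; widen eps")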

Automated 3-D Model Generation: Peter Allen Our research goal is to create realistic environments automatically, thus speeding up the simulation and world modeling task. We are currently at a point where the appropriate sensor technology, software algorithms, CAD data structures, and high performance computing engines are converging to make this task very feasible, with a large payoff in an increase in the scope and efficiency with which automated 3-D models can be acquired.

We propose an approach to automated model acquisition that combines work in range data acquisition, segmentation, and polyhedral model construction. We motivate the use of Binary Space Partitioning trees (BSPTs) as an intermediate data structure that can easily be derived from low-level scanned range data, and from which multiple views can be efficiently merged into a single B-rep description from which a CAD model may be derived. The BSP tree represents volumes by partitioning space with planes, and therefore is limited in that it may only represent polyhedra. It does, however, have other attributes which make it a very attractive primitive for modeling 3-D objects, including both robustness and the existence of efficient algorithms for set operations. We have already demonstrated this technique on real range data objects in our lab [Reed et al.-1995].
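
For readers unfamiliar with the representation, a minimal BSP-tree sketch is given below: each internal node stores a partitioning plane, and a query point is classified as inside or outside the represented solid by walking the tree. The unit cube and the class layout are illustrative only; the actual trees are built from segmented range data.

    # Minimal BSP-tree sketch: planes are (a, b, c, d) with the "front"
    # half-space defined by a*x + b*y + c*z + d >= 0.
    class BSPNode:
        def __init__(self, plane=None, front=None, back=None, leaf=None):
            self.plane, self.front, self.back, self.leaf = plane, front, back, leaf

    def classify(node, p):
        if node.leaf is not None:
            return node.leaf
        a, b, c, d = node.plane
        side = a * p[0] + b * p[1] + c * p[2] + d
        return classify(node.front if side >= 0 else node.back, p)

    IN, OUT = BSPNode(leaf="in"), BSPNode(leaf="out")

    # Build the unit cube [0,1]^3 from the inside out: each plane carves
    # away one half-space; everything behind any plane is outside.
    node = IN
    for plane in [(0, 0, -1, 1), (0, 0, 1, 0), (0, -1, 0, 1),
                  (0, 1, 0, 0), (-1, 0, 0, 1), (1, 0, 0, 0)]:
        node = BSPNode(plane, front=node, back=OUT)
    cube = node

    print(classify(cube, (0.5, 0.5, 0.5)), classify(cube, (2.0, 0.0, 0.0)))   # prints: in out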

Our approach consists of a series of steps which are repeated until a satisfactory model is built. Generation of a model from a single image is the basic step in the process of building a complete model of an object. It involves acquiring a single range image by moving the laser scanner to a specific position/orientation and then scanning an image. The image is then filtered in some way to remove spike noise and smooth the data somewhat. A segmentation phase then splits the data into regions of data points that are on the same geometric entity. Surfaces are then fit to these entities, and entered into a cyclic graph that keeps track of their topological attributes. From this graph a model is built that includes not just the imaged surfaces but also the volume of occlusion. Figure 2 shows laser range data for three views of an object, Figure 3 shows the segmentation of the data, and Figure 4 shows the models built from each view with their occlusion volumes. Each time a single-image model is generated, it is added to the composite model using Boolean operations; in this way the single-view models are merged into the composite model.

Figure 2: Real range data of a part taken from three different views.

Figure 3: Segmentations of the real range data.

Figure 4: BSP tree models of the segmented faces and occlusion volumes.


For planning the next view, we take advantage of the model data acquired thus far. It is possible to use the data acquired to plan the next scan in such a way that it 1) maximizes the new information obtained, minimizing the redundant data, 2) reorients the scanner to look at regions which were obscured by occlusions, and 3) minimizes the total number of scans required to obtain the complete surface of the part. The net result is a scanning process which obtains more complete data of a wider array of parts in less time. Our past research in planning viewpoints for machine vision tasks has led to the development of algorithms which compute regions of occlusion, given a model of an object to be viewed and models of other objects in the environment [Tarabanis et al.-1991, Tarabanis et al.-1995a, Tarabanis et al.-1995b, Tarabanis et al.-1994]. These occlusion volumes represent the set of points in space from which a target object cannot be completely viewed due to occlusions from some other objects. This has also been extended to include sensor planning in the presence of object motions [Abrams et al.-1993, Abrams and Allen-1995].

A scanner planning algorithm can make use of this information to compute the next scan so that it looks at these regions. For example, once a few views have been merged, we can generate an occlusion/visibility object model that will let us analyze the unexplored volume of the object and plan where to scan next. We can also use this method to mark significant features which will be used as the basis for the merging of the BSPTs which represent the scanned object so far. This may also entail tracking of object features over time to correctly register the scans [Allen et al.-1993].

Spatial Information for OBJECT FAX: Peter Allen Development of hardware and software (some of which is the focus of the project) will reduce the cost and increase the functionality of an OBJECT FAX system to allow it to be purchased and used in most manufacturing, design, graphics, sales, and production organizations. Many analogies with image transmission and standard FAX issues map directly into this new 3-D environment, which is in many ways a natural merging of computer and manufacturing technology.

An important issue is the quantity of data produced by a scan. An .STL file (the standard input format for a rapid prototyping machine) defining an object with a (relatively modest) 30,000 triangular facets will be over 1.5 MB; point data to represent this same shape will be significantly larger (perhaps by an order of magnitude). Standard compression algorithms, which typically reduce data by a factor of, at best, 2.5, will not suffice for reducing the transmission of OBJECT FAX data to a reasonable time. What is needed is a data structure and (possibly lossy) compression algorithm which yields significant reduction in data while still producing output geometrically within tolerance to the original part. The lossy compressions used in JPEG encoding of images and PASC encoding of audio significantly reduce the required data rate and produce output which, for all but the most critical applications, is nearly indistinguishable from the input. We would like to develop a 3-D shape encoding technique which similarly reduces the data needed to transmit a shape while still producing output that is nearly indistinguishable from the input. In 2-D image processing Radha et al. [Radha et al.-1991] have used a BSPT representation of a 2-D image as a method of data compression. This method allows non-orthogonal segmentation of complex images. We envision using a similar BSPT scheme for 3-D object transmission.
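
A quick back-of-the-envelope check of these volumes, assuming the binary .STL layout (12 four-byte floats plus a 2-byte attribute per facet) and taking the order-of-magnitude figures above at face value:

    # Rough data-volume estimates for a 30,000-facet scan (assumed binary .STL layout).
    facets = 30_000
    stl_bytes = facets * (12 * 4 + 2)                 # 50 bytes per facet
    print(f".STL file:             {stl_bytes / 1e6:.1f} MB")   # about 1.5 MB
    print(f"raw point data (~10x): {10 * stl_bytes / 1e6:.0f} MB")
    print(f"after 2.5x lossless:   {stl_bytes / 2.5 / 1e6:.2f} MB")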

In order to resolve the above issues, more research into the nature of the replication problem is needed. How much scanned data is necessary to represent a given shape adequately? How redundant is this data? In how few scans can this data be obtained? Can the data be reduced if the definition of “adequately” is relaxed? How much can this definition be relaxed before the output shows visible artifacts of the data reduction process? What can be done to reduce these artifacts while maintaining data compression? What is the complexity of the compression and expansion algorithms?

Another issue is the encoding of color and other material properties. Current rapid prototyping systems produce monochromatic parts. However, neither the .STL data format nor any of the current slice data formats have provision for storing color, reflectance, or material properties. The research described above, needed for developing an OBJECT FAX system, will be more useful if it includes provisions for multiple colors and materials. This increases the complexity of the scanner hardware, the planning algorithms, the data formats, and, therefore, the compression algorithms. Significant research will be needed to incorporate these features into an OBJECT FAX system.

In summary, research is needed to determine the appropriate tradeoffs in scanning, computation, transmission, and build times so that an OBJECT FAX system can be made practical. Our research group has a unique blend of experience in the fields of model acquisition, sensor planning, and rapid prototyping to address many of these overall system issues affecting OBJECT FAX technology.


Imagery to English Descriptions: John Kender We propose to explore and extend one of the very few systems that goes from imagery to text. It is the only known system that rests on research on human processing of prepositions and other grammar constructs related to spatial location, and the only one known that filters its knowledge based on dependency relations between “deep” semantic prepositional forms [Abella et al.-1995].

One of its objectives is to identify and describe in English kidney stones and other related densities in radiographs, producing sentences of comparable quality to those recorded by radiologists. More specifically, since some of that is already possible, the objectives are to improve and extend the research system already demonstrated, which produced from pelvic X-rays statements such as “The right upper quadrant contains a density which probably represents a stone in the upper pole calyx,” and “A stone is seen level to L4 on the left. This probably represents a mid-ureteral stone.”

A second objective is to improve and extend these concepts as they apply to a second, very different domain: that of generating directions to buildings, given maps of Epcot Center and related sites. The existing system already describes the spatial location of a particular physical site (here, an attraction of the park). Subjects were asked to “follow” these descriptions of a location given by the system; high rates of success were achieved, even given highly cluttered imagery, even if sometimes the description was three sentences long.

Research emphasis will be on a keener understanding of people’s use of relative locations [Abella and Kender-1993], and exploration of the ways people differ in their use of prepositions, such as “above”. In addition, concepts of relationships in three dimensions of space, and the additional dimension of time (“towards”, “through”, etc.) will be explored. Related results in allied research by the principal investigator (for example, the definition and use of landmarks for “navigation in the large” [Park and Kender-1996, Park and Kender-1993]) have indicated that, in general, individual spatial relationships tend to be errorful, but careful selection of multiple objects and relationships can reduce error, even to the point of creating error-correcting descriptions.

Human Gesture Understanding: John Kender We propose to develop a second image-to-symbol system that is an innovative human-computer interface based on the combined expertise of computer vision and human psychology. Specifically, it uses specialized image hardware, neural net and task grammar software, and an understanding of human gesturing to provide a direct, mouse-less pointing interface to a menu selection system [Kjeldsen and Kender-1995]. The foundation of the system is the observation that humans are limited by the cognitive complexity of using their hands and their eyes. Consequently, the meaning of a hand gesture appears to be structured by and derived from a “gestural grammar” that both provides and disambiguates additional meaning from context.

The existing system captures images of the computer user by means of a camera attached to the workstation. By means of a short training program, it tunes itself to the colors of the user’s skin and the background; additionally, it learns to discriminate several single hand gestures, such as “point up” or “open hand”. Using special purpose hardware to track the hand, it exploits the observation that human gestures follow a prepare-move-pose sequence. Only when the image of the hand is relatively still does it use neural net software to classify the pose; a three-dimensional model of the hand and its joint structure is cleverly obviated. A gestural grammar further disambiguates what is seen, whether it is the motion or the pose; motions, for example, are also used to indicate spatial extents. The net result is that menu selections on the screen can be made, and icons moved and sized, without the use of a mouse.
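
The prepare-move-pose idea can be caricatured in a few lines: the hand is tracked frame by frame, a pose classifier is consulted only when the hand is nearly still, and a small gestural "grammar" maps the recognized pose in the current context to an interface action. The frames, poses, thresholds, and grammar below are all invented for illustration.

    # Toy prepare-move-pose pipeline (hypothetical frames and grammar).
    def is_still(track, window=3, tol=2.0):
        recent = track[-window:]
        return len(recent) == window and all(
            abs(a[0] - b[0]) + abs(a[1] - b[1]) < tol
            for a, b in zip(recent, recent[1:]))

    grammar = {("menu", "point_up"): "select item",
               ("menu", "open_hand"): "dismiss menu",
               ("canvas", "point_up"): "place anchor"}

    frames = [{"xy": (10, 10), "pose": None}, {"xy": (40, 12), "pose": None},
              {"xy": (70, 11), "pose": None}, {"xy": (71, 11), "pose": "point_up"},
              {"xy": (71, 12), "pose": "point_up"}]

    track, context = [], "menu"
    for f in frames:
        track.append(f["xy"])                       # hand tracking (move phase)
        if is_still(track) and f["pose"]:           # classify only when the hand pauses
            print(grammar.get((context, f["pose"]), "no action"))   # prints "select item"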

Research continues to be needed, however, in extending the system to handle a wider range of gestures, particularly in full three dimensions, and in making more robust and accurate certain other skills at which mouse input still has an advantage. In particular, human gestures can be used to approximately indicate objects and concepts that range in scale over several magnitudes; using hand gestures to indicate scale, and to select within a scale to better than plus or minus five percent relative error, remains to be explored.

Further, relatively little is known of the “parts of gesture” and the universality of gestures. Artificial languages such as American Sign Language clearly indicate that there are constraints on both the transmission and reception of gestural information, but these limits are largely unexplored and unexploited in computer vision. Being able to quantify these concepts so that they can be used to develop gestural interfaces that are convenient, but nevertheless technically feasible, remains a challenge.


G.2.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE

All the above projects are massively computationally intensive. Further, the sensor technology in all of them is demanding. Raw power is necessary not only to make any of these contemplated systems useful, but also to allow us as researchers to better explore the design and solution spaces of each project. Our equipment needs include a fast Ethernet interface between our machines that do sensing, computing, and display. We will also need a high performance graphics engine and associated workstations (such as the SGI Onyx and High Impact machines) to properly acquire, display, and manipulate these large visual imagery data sets, which will also require substantial disk space. Further, a Datacube image processing engine will be required to offload low-level image acquisition functions as a preprocessing step, along with a high resolution Photometric camera. The GPS system will be used with the spatial location matching to give a rough estimate of position that will be refined by the appearance matching techniques described above.

G.2.5 INTERACTIONS WITH OTHER PROJECTS

These eight projects have rich and varied overlap with other projects; we present them in turn.

The real-time three-dimensional sensor effort (Prof. Nayar) generates massive amounts of data: full frames of video at full frame rates. No more effective input can be imagined for driving and stressing the large high-performance multimedia storage systems (Prof. Golubchik), or their proposed accessing engines (Prof. Aho). The head-mounted camera for location refinement (Prof. Nayar) was inspired by the difficulty of obtaining accurate head-tracking information from the technologies currently being used in Prof. Feiner’s user interfaces project. The existing and proposed system to determine objects based upon their appearances under varied lighting conditions (Prof. Nayar) is a novel type of indexing for imagery databases, and is easily subsumed as one of the variations on the theme of multimedia search engines (Prof. Aho). The proposed hardware implementation of a search engine for nearest neighbors in high-dimensional spaces overlaps with virtually all other multimedia work, making it more efficient, and enriching the experimental and production environment. It allows the fuller exploitation, on an applications level, of the underlying large high-performance storage systems (Prof. Golubchik) and their search engines (Prof. Aho); it can be used to speed references through semantic databases, including those that provide a foundation for natural language summaries (Prof. McKeown); and it can serve as a target implementation for the developing technology of scalable low-power digital systems (Prof. Nowick).

The system for accurately generating three-dimensional models (Prof. Allen) will serve as a test system for the new 3-D sensor (Prof. Nayar). The model generation system can also be used to create real and virtual multimedia worlds (Prof. Feiner). It will also provide large amounts of data to test and exploit the large high-performance storage systems (Prof. Golubchik); the data, because of its various forms (both range data and binary space partitioning trees), will provide unusual problems of interaction and synchronization. Once models have been derived, the problems of efficiently accessing and transmitting them as object faxes (Prof. Allen) will expand the definition of hypermedia, and challenge the implementations of active hypermedia collaboration services (Prof. Kaiser). The system for turning visual imagery into English descriptions (Prof. Kender), pursued in collaboration with the Department of Medical Informatics, is an additional beneficiary as well as stressor of the large high-performance multimedia storage systems (Prof. Golubchik) and their proposed accessing engines (Prof. Aho). However, its closest ties are with the project for the generation of natural language summaries (Prof. McKeown); in fact, the project borrows its generator from the existing work on summaries, modifying it to handle concepts unique to visual input. The project to replace the mouse with camera imagery of human gestures (Prof. Kender) will provide a new form of database query, interacting with the data warehousing project (Prof. Ross) and the search engine effort (Prof. Aho), and it will open up enriched interface possibilities for the project on active hypermedia collaboration (Prof. Kaiser).


G.3 MOBILE MULTIMEDIA USER INTERFACES:

Mukesh Dalal, Steven Feiner, Kathleen McKeown

G.3.1 ABSTRACT

If we are to take advantage of the union of computing and communications, we must develop user interfaces that address the ways in which these technologies interact. The user-interface research that we are proposing explores several key aspects of this interaction:

• Language generation. To help guide users through the increasingly unmanageable thicket of online information, we will develop systems that create written and spoken summaries of textual documents and data.

• Multimedia. Complementing current efforts to improve the ways we hand-craft multimedia presentations, we will develop systems that can generate concise multimedia presentations automatically to meet the immediate needs of an individual user and situation. This research will involve generation of individual media, coordination of multiple media, and representation and reasoning about temporal and spatial relationships among the media.

• Mobility. Overcoming the “one-user/one-display” paradigm ingrained in most current user interfaces, we will develop an infrastructure that exploits the changing set of displays and interaction devices in a mobile user’s environment, combining hand-held, head-worn, desk-top, and wall-mounted units to create a hybrid information space.

G.3.2 BACKGROUND OF PROPOSED RESEARCH

In this age of information overload, systems that could automatically summarize text and data would make it possible for computer users to control the quantity of information that they process. Online summaries could aid users in determining if a set of documents or data is relevant to their goals. Alternatively, they could allow users to get the gist of a document by reading only its summary. Automatically generated written and spoken summaries could also be combined with other generated media to provide high-level multimedia briefings that concisely convey key information from a variety of data sources to enable quick decision making in time-critical situations.

Multimedia documents are presently authored by hand using multimedia editing systems. However, developing effective multimedia presentations requires great skill, and is difficult and time-consuming for even the most talented and experienced authors. Furthermore, handcrafted presentations cannot in general be customized on the fly to the needs of the individual user and situation. To overcome these barriers to effective information presentation, we will develop knowledge-based presentation systems that communicate the desired information by automatically creating multimedia presentations tailored to the needs of the individual user and situation.

Current computers work only or mostly on a desk or a lap, typically offering a single CRT or flat-panel display to a single user. Mobile systems, as represented by the early generations of PDAs, trade off power for portability, which is all too evident in the size and functionality of their present user interfaces. Over the coming decade, however, advances in hardware and wireless networking will make it possible to radically change the shape of our computational environment to provide user-interface support for mobile, interacting users as they move about among large numbers of wall-mounted, desk-top, hand-held, and head-worn displays. For this to happen, it will be necessary to transcend today’s one-user/one-display user-interface metaphor. We propose to explore this possibility by developing software support for user interfaces that take advantage of a mobile user’s dynamically changing set of displays and interaction devices.

G.3.3 SUMMARY OF PROPOSED RESEARCH

Generating Natural Language Summaries (McKeown) While some recent approaches use statistical techniques for summarizing a single document with modest success, summarization in general has remained an elusive task. We propose to develop a system to summarize full text input using symbolic natural language techniques integrated with statistical tools. Unlike previous approaches, our system will summarize a series of news articles on the same event, producing a summary paragraph of one or more sentences. The issues we will investigate include:


• Use of wording and sentence structure that allows concise expression of the facts.

• Use of symbolic natural language tools to produce summaries of news articles in a specific domain.

• Integration into the summary of facts from relevant online data sources, as well as facts extracted from the articles themselves.

• Use of statistical techniques to extend our domain-specific approach to handle wider domains.

We have developed a prototype system, SUMMONS (SUMMarizing Online NewS articles), to summarize full news articles, focusing on techniques to summarize how perception of an event changes over time, using multiple points of view over the same event or series of events. Input to SUMMONS is a set of templates, modeled on those produced by the ARPA message understanding systems. Each template is a record consisting of attribute-value pairs that represent specific pieces of information that have been extracted from the original article. For example, in a domain focusing on acts of terrorism, these systems extract the identities of the victims and perpetrator, number of victims, type of terrorist event, etc.
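
To make the template representation concrete, the following is a minimal sketch, in Python, of this kind of attribute-value record and of reconciling records from successive articles on one event; the slot names and values are hypothetical, not the actual ARPA message understanding slot set.

    # Hypothetical sketch of a template as an attribute-value record.
    # Slot names and values are illustrative only.
    template = {
        "message_id":        "TST-0001",
        "incident_type":     "bombing",        # type of terrorist event
        "perpetrator":       "unknown group",
        "victims":           ["two civilians"],
        "number_of_victims": 2,
        "location":          "downtown area",
        "source_article":    "newswire-0042",
    }

    def merge(templates):
        """Toy illustration: later articles refine or extend earlier facts,
        mirroring how a summary must reconcile multiple reports of one event."""
        merged = {}
        for t in templates:    # templates assumed ordered by publication time
            merged.update({k: v for k, v in t.items() if v not in (None, "unknown")})
        return merged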

Our work will build on and extend SUMMONS, drawing on our related current work on summary generation [Robin and McKeown-1993, Robin and McKeown-1995, McKeown et al.-1994, McKeown et al.-1995], which shows that summaries typically pack information in through the use of modifiers, ellipsis (the deletion of redundant words in similar clauses), and words that can simultaneously convey several pieces of information. This is in direct contrast to previous generation systems that typically associate a single word with each concept in the input. We will use a variety of existing tools that will reduce the development effort, including a sentence-generation system (FUF [Elhadad-1991, Elhadad-1993]) and a large grammar of English (SURGE [Elhadad-1993, Robin-1994]). To speed lexicon development, we will use tools we developed in earlier work [Smadja-1991, Smadja and McKeown-1990, Hatzivassiloglou and McKeown-1993], which automatically identify words, phrases, and constraints on their use in a given domain by statistically analyzing large text corpora.

Our work involves further design and development of SUMMONS to improve its robustness, to incorporate other data sources in a summary, and to use statistical techniques to extend the approach to other domains. For the first task, we are collecting a corpus to empirically shape the development of all components of the system and to evaluate the results. Our corpus includes threads of articles on the same event, and we have noted that later articles usually contain several sentences summarizing earlier articles. These summary phrases serve as our targets for automated generation. For the second task, we are moving towards an agent-based architecture that will allow for incremental incorporation of dynamically changing data sources on the web to incorporate facts about an event that are not expressed in the article. Finally, we are investigating domain-independent creation of templates using statistical techniques that segment the article, identify key sentences for each segment, and use shallow parsing techniques to build templates that can then be fed back into the summary generator.
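
The domain-independent path just described can be summarized as a small pipeline sketch; the helper names below (segment, key_sentence, shallow_parse, generate_summary) are hypothetical placeholders for the statistical and generation components, not implemented modules of SUMMONS.

    # Sketch of the proposed statistical template-creation pipeline.
    # Each helper is a placeholder callable supplied by the caller.
    def summarize(article_text, segment, key_sentence, shallow_parse, generate_summary):
        templates = []
        for piece in segment(article_text):            # statistical segmentation
            sentence = key_sentence(piece)              # pick the key sentence of the segment
            templates.append(shallow_parse(sentence))   # build an attribute-value template
        return generate_summary(templates)              # feed templates to the generator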

Automated Generation of Multimedia Briefings: Mukesh Dalal, Steven Feiner, Kathy McKeown Our research will address generation of a variety of individual media (written text, speech, static graphics, and animation); fine-grained coordination of multiple media to create an effective, coherent presentation; and knowledge representation and reasoning algorithms to represent both the data being presented and the complex temporal and spatial constraints among the presentation’s media.

In previous work, we developed a system that designs multimedia presentations in which the generation of written text and static 3D graphics is coordinated, including the use of cross references [Feiner and McKeown-1991, McKeown et al.-1992]. We are extending this work significantly to address the use of temporal media, such as animation and speech, and to explore techniques for summarization. In essence, we are interested in building systems that generate multimedia briefings. Much like natural language summaries, multimedia briefings should succinctly convey information at the right level of abstraction using minimal words and appropriate graphics. In addition, multimedia briefings should use whatever combination of media is most appropriate to convey the desired information, and should coordinate these media effectively in both space and time.

Working with researchers from Columbia Presbyterian Medical Center, we have begun to apply these concepts by building a system for presenting healthcare data. Figure 5 shows an early version of the output created by the individual speech and graphics generation components of this system. In this case, the user is a nurse in an intensive care unit, who is being briefed by the system about the imminent arrival of a new patient. To determine how to use different media effectively, our work has involved formative evaluation with potential end users, including nurses, residents, and an anesthesiologist. Our findings provide constraints on how and when to use different media to convey information quickly, a primary concern of busy caregivers. For example, spoken language can be more qualitative and casual than written text, as long as the accompanying text (which can be perused while listening) provides the more precise, and often lengthier, full description. Further research is required to identify such constraints and encode them in a general manner so that they can be used in multiple domains.

Voice: Ms. Jones is an 80-year-old, hypertensive, diabetic, female patient of Dr. Smith, undergoing CABG. Presently, she is 30 minutes post-bypass and will be arriving in the unit shortly. The existing infusion lines are two IVs, an arterial line, and a Swan-Ganz with Cordis. The patient has received massive vasotonic therapy, massive cardiotonic therapy, and massive-volume blood-replacement therapy. . . .

Figure 5: Automatically generated graphics, text, and speech from a multimedia briefing to be presented to an intensive-care-unit nurse. The briefing describes the condition of a patient who has just undergone a coronary bypass operation and is being brought to the unit. It is generated from data captured during the operation.

Another issue is how to select words and syntactic form for spoken output. With few exceptions, previous work in language generation has focused on written text. While written text could simply be passed on to a speech synthesizer, it is well known that spoken language differs substantially from text. Although it is unlikely that we will want our system to generate grammatical errors (a common feature of spoken language), we do want to use language that can be easily understood when spoken. For example, long, complex sentences, which are common in written text, may not be easily understood when spoken. Thus, one part of our effort will be to characterize differences in spoken and written text that influence comprehension. While we will not look at the effects of intonation and prosody, we will study the effects of word choice, sentence length, and sentence structure. We will modify our grammar and lexical chooser to reflect these effects, developing a tool for spoken sentence generation. This is an entirely new direction for language generation research that will be necessary as we move to UIs that incorporate spoken language as output.

Our graphics generation work will build on our previous work on knowledge-based graphics, which addresses the use of AI techniques to automate the design of graphics in a variety of domains, including maintenance and repair documentation [Seligmann and Feiner-1991], explanatory animation [Karp and Feiner-1993], visualization of abstract multivariate relations [Beshers and Feiner-1993], and user-interface interaction history [Kurlander and Feiner-1992]. We will be characterizing a variety of graphical techniques, ranging from 2D highlighting to 3D camera motion, for use in automated presentation planning.

Effective coordination of media is a key issue for our research. For example, as references to objects are spoken, their representations in graphics and written text could be highlighted. This requires temporal synchronization among media. This is a difficult issue since the exact order and length of each spoken reference is not known until the full spoken sentence has been generated. Meanwhile, the graphics generator could position objects in a spatial and temporal order that conflicts with their spoken references. Together we will address these problems of media coordination, which require knowledge of and control over the individual media generators and the complex temporal and spatial constraints they impose on each other. We are developing an architecture that uses interacting partial-order planners to resolve these constraints to fulfill a presentation’s communicative goals.
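
As a toy illustration of the kind of temporal constraint involved (not the planners' actual representation), a graphical highlight might be required to span the interval of the corresponding spoken reference, which is known only after the sentence has been generated:

    # Toy temporal-coordination check: a highlight interval must cover the
    # spoken reference to the same object. Intervals are (start, end) in seconds.
    def covers(highlight, spoken):
        return highlight[0] <= spoken[0] and highlight[1] >= spoken[1]

    # The spoken reference's timing becomes known only after speech generation;
    # the planner must then adjust the highlight interval to satisfy the constraint.
    spoken_ref = (2.4, 3.1)
    highlight = (2.0, 3.5)
    assert covers(highlight, spoken_ref)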

Tractable Knowledge Representation and Reasoning for Multimedia Generation: Mukesh Dalal The goal of our research in knowledge representation and reasoning (KR) is to build and understand computer systems that explicitly represent large amounts of knowledge and efficiently reason with it to solve a variety of problems. We are particularly interested in developing techniques for declaratively representing different kinds of knowledge (for example, categorical and probabilistic), for efficient deductive reasoning (exact or approximate) with it, and for revising and integrating knowledge from possibly-conflicting sources.

Over the past year, we have been investigating how to represent and reason with complex temporal and spatial constraints for knowledge-based generation and coordination of multimedia briefings. Knowledge-based systems for accessing relevant information and generating multimedia briefings are typically built around several other kinds of knowledge bases, including:

• Dynamic user models representing a user’s intentions and preferences, which are useful for filtering information and customizing presentations;

• Presentation models representing display and interaction capabilities and techniques, which are useful for knowledge-based multimedia generation;

• Data models representing the structure of available information, which are useful for accessing it efficiently.

Efficient techniques for knowledge representation and reasoning are crucial for constructing and exploiting these knowledge bases. Developing these techniques for general-purpose knowledge bases is the focus of our KR research.

In [Dalal-1995c, Dalal-1995b, Dalal-1992a], we presented a tractable method, Fact Propagation (FP), for incomplete reasoning with propositional knowledge bases. FP extends Boolean constraint propagation (BCP) [McAllester-1990], a widely-used linear-time incomplete reasoner for clausal theories, to non-clausal theories. Our quadratic-time algorithm for FP runs in linear time for clausal theories. FP was proved to be more complete than CNF-BCP, a previously-proposed extension of BCP to non-clausal theories [de Kleer-1990]. We have implemented FP and are currently testing it on small knowledge bases.
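
For readers unfamiliar with BCP, here is a minimal sketch of Boolean constraint propagation (unit propagation) over clausal theories, written in Python; it illustrates the fast but incomplete style of reasoning that FP generalizes to non-clausal theories, and is not the FP algorithm itself.

    # Minimal Boolean constraint propagation (unit propagation) over clauses.
    # Literals are nonzero integers: p denotes a variable, -p its negation.
    def bcp(clauses):
        assignment = {}                      # variable -> True/False
        changed = True
        while changed:
            changed = False
            for clause in clauses:
                unassigned, satisfied = [], False
                for lit in clause:
                    val = assignment.get(abs(lit))
                    if val is None:
                        unassigned.append(lit)
                    elif val == (lit > 0):
                        satisfied = True
                        break
                if satisfied:
                    continue
                if not unassigned:
                    return None              # conflict: clause falsified
                if len(unassigned) == 1:     # unit clause forces its literal
                    lit = unassigned[0]
                    assignment[abs(lit)] = lit > 0
                    changed = True
        return assignment

    # Example: the clauses (p) and (not p or q) force both p and q to be true.
    print(bcp([[1], [-1, 2]]))               # {1: True, 2: True}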

In [Dalal-1995c, Dalal-1995a], we extended FP to an anytime family of sound and tractable reasoners. An anytime family of reasoners is a sequence ⊢₀, ⊢₁, … of reasoners such that each ⊢ᵢ is tractable, each ⊢ᵢ₊₁ is at least as complete as ⊢ᵢ, and for each theory there is some ⊢ᵢ complete for reasoning with it. Given any reasoning task, one could start with ⊢₀, and successively proceed to the next reasoner if more time is available. Our technique will generate such an anytime family given any reasoner satisfying certain simple properties.
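
The intended use of such a family can be sketched as a simple driver loop (hypothetical; this is not the construction from [Dalal-1995c, Dalal-1995a]): run successively more complete, and more expensive, reasoners until the time budget runs out, keeping the best answer obtained so far.

    # Hypothetical anytime driver over a family of sound, tractable reasoners.
    import time

    def anytime_entails(reasoners, theory, query, budget_seconds):
        answer = "unknown"
        deadline = time.monotonic() + budget_seconds
        for reasoner in reasoners:            # ordered from least to most complete
            if time.monotonic() >= deadline:
                break
            result = reasoner(theory, query)  # each reasoner is sound, so a definite
            if result is not None:            # yes/no answer can be returned at once
                answer = result
                break
        return answer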

In [Dalal-1995c, Dalal-1992b], we presented a new property, called bounded intricacy, which is shared by a variety of tractable classes that have been identified in the literature, for example, in the areas of propositional satisfiability, constraint satisfaction, and OR-databases. We have used bounded intricacy to uncover new tractable classes in constraint satisfaction and propositional satisfiability. Filtering out classes with unbounded intricacy may be used as a “first cut” in eliminating intractable classes of constraint problems.

In [Dalal and Etherington-1992], we presented a framework for obtaining efficient but approximate representations and reasoners. Our approach led to two special kinds of approximations: those that are unsound but complete and those that are incomplete but sound. These two, taken together, provide approximate, but tractable, upper and lower bounds on the results of exact, but intractable, sound and complete reasoning.

Our proposed work involves implementing, testing, and modifying these tractable reasoners on large knowledge bases requiring short response times in several domains, including constraint satisfaction, temporal reasoning, user models, and multimedia presentation models. In particular, we propose to use the anytime family of reasoners in generating multimedia presentations, where the quality of and coordination among the various media improve with the time available for reasoning with the various constraints. We are also interested in using these tractable reasoners for developing multimedia knowledge representation systems with content-based querying using descriptive features. Shortcomings in these reasoners discovered during this phase of work will be used in developing new techniques for designing the next generation of tractable reasoners.


Figure 6: Architectural anatomy. View through our see-through head-mounted display shows the steel reinforcing bars in a concrete column in the wall of our lab and part of its structural analysis in an X11 window.

Hybrid User Interfaces for a Constantly Changing World: Steven Feiner We propose to develop a flexible, hybrid user-interface infrastructure for mobile computing in which head-worn, see-through displays are used in synergistic combination with available stationary and hand-held displays, to benefit from the advantages of each. For example, multiple users wearing see-through head-worn displays could participate in a shared virtual environment within which selected public information was presented on a physical wall-mounted display. Each user’s head-worn display could privately overlay their personal annotations on the wall-mounted display, while additional personal material might appear on each user’s hand-held display. We refer to the idea of managing large numbers of objects on possibly large numbers of displays in the virtual and real surround as environment management, in analogy to window management. This is an especially challenging task when the set of available devices can change rapidly as a mobile user moves.

The subfield of virtual environments that addresses the use of see-through head-worn displays is often known as augmented reality and builds on Ivan Sutherland’s pioneering work [Sutherland-1968]. Our augmented reality research has explored the coordinated use of head-worn and flat-panel displays [Feiner and Shamash-1991], the support of conventional 2D window systems within 3D virtual worlds [Feiner et al.-1993a], and applications for knowledge-based maintenance and repair documentation [Feiner et al.-1993b] and architectural training [Feiner et al.-1995]. For example, Figure 6 shows the user’s view through our see-through head-mounted display as they inspect the steel reinforcing bars inside a support column in our lab. At the right of the column is an X11 window that contains a text-based structural analysis of the column created by a commercial tool.

To explore how these concepts can be extended to a mobile environment, we are building COTERIE [MacIntyre and Feiner-1995], a toolkit that provides language-level support for distributed virtual environments. COTERIE is based on the distributed data-object paradigm for distributed shared memory. Any data object in COTERIE can be declared to be a shared object that is replicated fully in any process that is interested in it. These shared objects support asynchronous data propagation with atomic serializable updates, and asynchronous notification of updates. On top of COTERIE, we are implementing a rule-based environment-management component. In collaboration with colleagues in the School of Architecture, we are developing an outdoor mobile testbed application that will use this infrastructure. The user wears a backpack containing a computer and differential GPS (Global Positioning System) tracker, connected to our department computer facilities through a spread-spectrum radio modem. A combination of displays and interaction devices will be used to present information, in this case an architectural tour of the Columbia campus. A see-through head-worn display (with magnetometer/inclinometer-based orientation tracking) will allow information, such as 3D models and multimedia explanations, to be positioned near physical buildings in the environment. The head-worn display will be used in conjunction with a hand-held display, 2D stylus, and 3D mouse.
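
The shared-object idea can be illustrated with a small single-process sketch in Python; COTERIE itself works over distributed shared memory, and the class and method names here are invented for illustration rather than taken from its API.

    # Single-process caricature of a replicated shared object with asynchronous
    # update notification, loosely in the spirit of COTERIE's data-objects.
    class SharedObject:
        def __init__(self, **state):
            self._state = dict(state)
            self._watchers = []          # callbacks standing in for remote replicas

        def watch(self, callback):
            self._watchers.append(callback)

        def update(self, **changes):
            # In the real system this would be an atomic, serializable update
            # propagated asynchronously to every interested process.
            self._state.update(changes)
            for notify in self._watchers:
                notify(dict(self._state))

    # A display process reacting to a tracked user's position updates:
    user_pose = SharedObject(x=0.0, y=0.0, heading=0.0)
    user_pose.watch(lambda s: print("replica sees", s))
    user_pose.update(x=12.5, y=3.0)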

The tracking technologies currently used for this project provide submeter-accuracy position tracking (at 2 Hz) and 1–2 degree accuracy orientation tracking. While this is barely adequate for coarse positioning of virtual objects in the proximity of distant physical ones, it is insufficient for the precise registration of real and virtual objects that we would ideally like to accomplish. We are especially interested in collaborating with Prof. Nayar to use this coarse tracking data (or, far better, the centimeter accuracy provided by the requested dual-frequency carrier-phase GPS system), along with a 3D model of the environment, to guide a precise vision-based tracker, using a small head-worn camera. (While GPS will only work outdoors with an adequate view of the sky, there are other tetherless position tracking technologies, such as infrared optical radar, that we will substitute indoors.) We also propose to work with Prof. Nayar to use his real-time depth map computation approach to model the user’s changing 3D environment (especially moving objects within it), so that the virtual objects that we generate can be obscured by physical objects when appropriate.
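
One way the coarse pose could seed the vision-based tracker is sketched below; this is purely illustrative, with a simplified projection model, invented function names, and placeholder numbers, and is not the tracker we propose to build with Prof. Nayar. The coarse GPS position and heading predict where a known landmark from the 3D environment model should appear in the head-worn camera image, so the vision tracker needs to search only a small window around that prediction.

    # Illustrative seeding of a vision-based tracker from coarse GPS/orientation.
    # Simplified pinhole projection; all parameters are placeholders.
    import math

    def predict_pixel(landmark, camera, heading_deg, focal_px=800, cx=320, cy=240):
        dx, dy, dz = (landmark[i] - camera[i] for i in range(3))
        h = math.radians(heading_deg)              # heading-only rotation for simplicity
        xc = math.cos(h) * dx + math.sin(h) * dy   # camera-frame right
        zc = -math.sin(h) * dx + math.cos(h) * dy  # camera-frame forward (depth)
        yc = dz                                    # camera-frame up
        if zc <= 0:
            return None                            # landmark behind the camera
        return (cx + focal_px * xc / zc, cy - focal_px * yc / zc)

    def search_window(pixel, position_error_m, depth_m, focal_px=800, pad=20):
        # Coarse position error (meters) maps to a pixel radius at the landmark depth.
        radius = focal_px * position_error_m / depth_m + pad
        return (pixel[0] - radius, pixel[1] - radius, pixel[0] + radius, pixel[1] + radius)

    guess = predict_pixel((10.0, 40.0, 5.0), (0.0, 0.0, 1.7), heading_deg=15.0)
    if guess:
        print(search_window(guess, position_error_m=1.0, depth_m=40.0))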

G.3.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE

Our work is both time and space intensive. For example, our research on summarization and mobile user interfaces will require massive amounts of storage for large text corpora, 3D models, and domain knowledge bases. Fast, tightly coupled processors, dedicated to individual media (especially temporal media, such as speech and animation), and high-speed networking to other processors, will be vital for good performance. Since some of the tractable reasoning algorithms involve independently executable subcomponents, the requested infrastructure would make it feasible to experiment with parallelization. The multimedia/visualization supercomputer, multimedia workstations, and speech recognition system are needed for the multimedia briefings that our systems will generate and for evaluating tractable reasoners for content-based querying of multimedia knowledge representations. Our work on mobile user interfaces will benefit significantly from the requested wireless infrastructure, dual-frequency carrier-phase measurement GPS system, and high-resolution video projection system.

G.3.5 INTERACTIONS WITH OTHER PROJECTS

Prof. McKeown’s work on summarization is relevant to the search algorithms and data mining techniques addressed by Profs. Aho and Stolfo. When a search returns more than one document or multiple pieces of data, summarization can be used to give the user an idea of whether the retrieved information is relevant. Prof. McKeown interacts with Prof. Kender on the generation of descriptions from images. In particular, his project has made use of the FUF sentence generation tool developed by the natural language group.

Profs. McKeown, Feiner, and Dalal collaborate with members of Columbia Presbyterian’s Center for Medical Informatics in their work on multimedia interfaces to health care data.

Profs. Feiner and Nayar will collaborate on building hybrid 3D tracking algorithms and on using real-time depth-map computation to create augmented realities in which virtual and physical objects interact. The 3D models and multimedia databases to be developed by the School of Architecture for Prof. Feiner’s campus tour testbed will offer an excellent proving ground for Prof. Golubchik’s work on multimedia delivery to position-tracked mobile users.

Prof. Dalal will make his tractable representation and reasoning systems available for use by the other projects, and together with Prof. Ross will investigate query processing where approximations are made to answer prohibitively expensive subqueries.

G.4 SCALABLE SYSTEMS FOR MOBILE AND PORTABLE COMPUTING:

Steven Nowick, Yechiam Yemini

G.4.1 ABSTRACT

An important element of universal information access is the design of scalable distributed systems for mobile and portable computing. Scalability means the ability to extend a system gracefully, without affecting its various performance measures such as reliability and response time. This ability is fundamental to modern applications, from large networks to digital systems. Mobility means that computations and services can adapt to dynamic changes in underlying network topology and location of resources.


We consider two related problems in large-scale distributed computing: (i) agent-based distributed computing for large-scale and mobile networks; and (ii) hardware design support for modular and low-power digital systems. An agent is a program that is dispatched to, linked with and executed at a remote host. For (i), we will develop agent technologies to support extensible networks and mobile systems. For (ii), we will develop techniques for extensible and low-power portable digital systems. In each domain, our focus is on the design and support of scalable and robust distributed systems.

G.4.2 BACKGROUND OF PROPOSED RESEARCH

Computing with Mobile Agents An important component of universal information access is agent-based distributed computing in large-scale and mobile networks. An agent is a program that is dispatched to, linked with and executed at a remote host. Agents enable new modes of distributed computing based on dynamic extensibility of remote systems.

Agent technologies have recently attracted significant interest in addressing the challenges of emerging networked systems. Agents can be dispatched to analyze and transact with remote information servers. They can be used, for example, to search and analyze Web stores or databases and identify and learn information of interest. Agents can be dispatched to accomplish complex transactions involving multiple service providers, such as booking airline tickets, hotel reservations, and rental cars for a vacation. Agents can be dispatched by a Web server to a remote browser in order to dynamically extend the browser with new capabilities (e.g., to play interactive games).

Agents can also be used to manage remote systems. For example, agents can be dispatched to monitor, diagnose, and control the behavior of a remote router. They can be used to extend a remote system with new protocols or services. For example, agents can be dispatched to a remote system to support a new video-on-demand protocol.

Agents are particularly useful to support adaptive computing in mobile networks. Consider a mobile unit that roams through a network and attaches physically to a new domain. Agents can be dispatched by the unit to configure services in its previous domain to adapt to its new location and bandwidth availability. Agents can be dispatched to the unit by servers of the new domain to configure it with the resources needed to interact with the domain services. Consider also a multi-hop packet network all of whose nodes can be mobile; there, agents offer a means of reconfiguring protocols and services as the connectivity topology itself changes.

Preliminary proposals for agent technologies have pursued a language-based paradigm. One develops a special interpreted language, such as Telescript or Java. An agent is a script in such a language that is dispatched to and executed by a remote interpreter.

In contrast, the delegation agent technology developed at Columbia University [Yemini et al.-1991] is language-independent. Agents can be developed in an arbitrary compiled or interpreted language such as C, C++ or tcl. They are dispatched to, linked with and executed at a remote system using a delegation protocol. The remote system supports execution of agents through minimal portable OS extensions.

Language-independent agents support language-dependent agents as a special case. For example, one can delegate a tcl or a Java interpreter as an agent to a remote system and then delegate script agents to it. More importantly, language-independent agents enable a significantly more powerful computing paradigm than interpreted agents. For example, interpreted agents are limited in performing real-time tasks such as monitoring and controlling a remote system, compressing sensor data, or processing a new protocol for multimedia communications. Such tasks are more effectively handled by compiled agents constructed in C or C++.
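
The contrast can be sketched abstractly in Python (the class, its methods, and the example agent are invented for illustration; this is not the Columbia delegation protocol): a language-independent host accepts any executable artifact plus an entry point, whereas an interpreted-agent host can accept only scripts in its interpreter's language.

    # Abstract sketch of delegation-style agent dispatch; names are invented.
    class AgentHost:
        """Remote host that accepts, links, and runs delegated agents."""
        def __init__(self):
            self.agents = {}

        def delegate(self, name, executable):
            # In the real setting 'executable' could be compiled C/C++ code that
            # is dynamically linked; any Python callable stands in for it here.
            self.agents[name] = executable

        def run(self, name, *args):
            return self.agents[name](*args)

    host = AgentHost()
    # A monitoring agent dispatched to the host and executed locally there:
    host.delegate("link_monitor", lambda samples: sum(samples) / len(samples))
    print(host.run("link_monitor", [0.91, 0.84, 0.96]))   # mean link utilization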

Our studies of delegated agent technologies, funded in large part by a previous NSF grant, resulted in the only language-independent agent technologies. We have applied these technologies extensively to decentralize network management. These technologies were exported through collaborations with industry and have already had a significant impact on the commercial network management products of several companies.

Scalable and Low-Power Digital Systems Another critical component of universal information access is the design of the hardware infrastructure. Scalable, modular, and low-power hardware systems are required. Scalability is the ability to extend a system gracefully, without affecting reliability. Modularity means the use of “plug-and-play” components, where portions of the system can be upgraded without concern for global issues. Low power is required, since portable systems must be lightweight and inexpensive.

To this end, our research is focused on the design of low-power self-timed digital systems. These systems have no global clock, unlike the more common synchronous systems. Instead, system components communicate with each other using local “handshaking protocols”. Somewhat analogous to object-oriented software systems, these components are basically self-contained: once local protocols are observed, there are no global timing requirements that must be met.
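
The handshaking idea can be illustrated with a toy four-phase (request/acknowledge) exchange between two components, written here in Python; this is a software caricature of the signal-level behavior, not a circuit design.

    # Toy four-phase handshake: the receiver latches data when the request rises,
    # raises its acknowledge, and lowers the acknowledge when the request falls.
    # All timing is purely local; there is no global clock.
    class Receiver:
        def __init__(self):
            self.ack = False
            self.latched = None

        def observe(self, req, data):
            if req and not self.ack:        # request raised: latch data, raise ack
                self.latched = data
                self.ack = True
            elif not req and self.ack:      # request lowered: lower ack
                self.ack = False
            return self.ack

    def send(receiver, data):
        assert receiver.observe(True, data) is True    # phases 1-2: req up, ack up
        assert receiver.observe(False, None) is False  # phases 3-4: req down, ack down
        return receiver.latched

    r = Receiver()
    print(send(r, 0b1011))    # data transferred with purely local signaling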

By eliminating the global synchronization requirement, self-timed systems have three potential benefits:

• low power, since components consume power only when they compute (in contrast, synchronous systems may consume power on every clock cycle, even when no computation is needed);

• high performance, because of the avoidance of clock synchronization issues;

• modularity, since systems can be composed out of components operating at different speeds.

Though self-timed design has been around for many years, it has posed many difficult design problems. However, significant progress has been made in the last 5–8 years, and there are a number of successes: (i) a low-power infrared communications chip for portable devices, developed at HP/Stanford using some of our design tools [Marshall et al.-1994]; (ii) a low-power digital compact cassette (DCC) error correcting chip at Philips [van Berkel et al.-1994]; (iii) a self-timed divider chip which is twice as fast as existing synchronous designs [Williams and Horowitz-1991].

Our recent research has focused on the design and optimization of self-timed controllers. This work has been supported by two NSF awards (1993 RIA, 1995 CAREER), a 1995 Alfred P. Sloan Foundation fellowship, and a grant from IBM Research.

Our work combines three areas: (i) optimization algorithms for hardware design; (ii) software tools for computer-aided design (CAD); and (iii) novel hardware design methods. Our main focus to date has been on high-performance hardware (not low power).

We have produced a CAD tool package for self-timed controller design that is now being used at eight leading companies and universities. The package was successfully used at AMD for an experimental design of a self-timed SCSI controller [Yun and Dill-1995], and at Hewlett-Packard for the design of a portable communications chip (indicated above). With IBM Research, we are involved in incorporating our tools into their internal synchronous design package, Booledozer.

Funds from our current departmental NSF CISE infrastructure grant have greatly aided this research. The departmental computing facilities have been important, since our software tools often run on compute-intensive and memory-intensive design problems. The quality of infrastructure support has been critical in developing optimization algorithms, designing and running synthesis programs, and evaluating results.

G.4.3 SUMMARY OF PROPOSED RESEARCH

Computing with Mobile Agents: Yechiam Yemini We propose to continue our studies of language-independent agent technologies and focus on the following issues:

• Adaptive agent-based computing for mobile networks. This study will explore the use of agent technologies to support adaptive computing in a mobile network environment. A mobile network experiences dynamic changes in connectivity topology and bandwidth/delay characteristics for accessing resources. We will explore the use of agents to dynamically relocate resources and reconfigure services to adapt to such changes.

• Secure agent technologies. Security is an issue for both the delegation mechanism and the target system. For example, one may use the delegation mechanism to dispatch viruses to remote hosts or to send agents that gain access to and manipulate restricted resources at a remote site. We will investigate infrastructures to authenticate agents and to control their access to remote resources.

• Stability and control of agent systems. Agents create dynamic, distributed, coupled computation structures. Controlling the dynamics of such distributed computations is of great importance to the success of agent computing. Uncontrolled dynamics can lead to instabilities. For example, one can experience a “sorcerer’s apprentice” behavior where agents multiply rapidly and spread throughout a system to consume its resources. This study will investigate technologies to monitor and control agent activities. We will study instrumentation of agent systems to monitor and control resource allocation and consumption by agents and to coordinate computations by multiple agents spread at multiple sites.

• Applications of agent technologies to intelligent information processing in large-scale networks. The complexity of accessing and manipulating information spread in large-scale networked systems is a central bottleneck in accomplishing universal information access. To reduce this complexity one needs a new generation of tools that can search, locate, correlate, and manipulate large-scale distributed information. Such tools are proposed in various other sections of this proposal. This study will explore the use of agent technologies to support such tools. We will study applications of agents to search and locate resources and to extract and correlate information from distributed stores.

Scalable and Low-Power Digital Systems: Steven Nowick For future work, we intend to expand our efforts in three directions in self-timed design: (i) CAD tools and algorithms for low-power controllers; (ii) application to portable DSP (digital signal processing) design; (iii) low-power datapaths.

Low-power design is the main goal of this research, since it is a driving force in the commercial market for portable electronics. Our remaining goals, scalability and modularity, are inherent in self-timed systems. While our current CAD tools are optimized for performance, different algorithms and optimizations are needed to produce low-power designs. We also intend to apply our techniques to portable examples from industry.

Finally, we will focus on a critical part of hardware systems: datapath components, such as adders and incrementers. The design of low-power synchronous datapaths has received increasing attention. However, an advantage of self-timed datapaths is that they have an inherent potential for low power, since they compute “on demand”. We will explore self-timed datapath design techniques where power consumption is significantly reduced. We will consider novel structures, encoding techniques for low power, and optimization algorithms.

G.4.4 IMPORTANCE OF PROPOSED INFRASTRUCTURE

The computing resources proposed in this grant support mobile computing systems that are particularly suitable for research in agent computing. In particular, mobile PC computers with wireless links, and other related equipment, will be used to explore and develop agent technologies.

Resources will also support research on low-power portable systems. Computing support from a CISE grant will enable us to design effective CAD tools, apply these tools to industrial examples, and simulate and evaluate results. Hardware support will also allow us to create rapid FPGA prototypes.

G.4.5 INTERACTIONS WITH OTHER PROJECTS

This grant will spawn interaction with several groups at Columbia and in industry:

• Artificial Intelligence (Prof. Stolfo). This collaboration will study techniques (such as learning, searching, planning, etc.) that agents must incorporate to perform their tasks.

• Mobile Computing (Prof. Duchamp). This collaboration will apply the agent paradigm to support mobile computing.

• Industrial Application of Self-Timed Digital Design (IBM/Intel). We expect collaboration with groups at IBM and Intel, in applying self-timed methods to industrial design problems.

• Low-Power Circuit Techniques (Prof. Zukowski, EE Department). This collaboration will explore issues in low-level circuit design for low power.


G.5 Bibliography

References

[Abella and Kender, 1993] A. Abella and J. R. Kender. Qualitatively describing objects using spatial prepositions. In Proceedings of the National Conference on Artificial Intelligence, July 1993.

[Abella et al., 1995] A. Abella, J. Starren and J. R. Kender. Automated natural language description of radiographs. In Proceedings of the 19th Symposium on Computer Applications in Medical Care, October 1995.

[Abrams and Allen, 1995] Steven Abrams and Peter K. Allen. Swept volumes and their use in viewpoint computation in a robotic workcell. In International Symposium on Assembly and Task Planning (ISATP), Pittsburgh, August 1995.

[Abrams et al., 1993] Steven Abrams, Peter K. Allen and Konstantinos A. Tarabanis. Dynamic sensor planning. In Proceedings of DARPA Image Understanding Workshop, pages 599–610, Washington, D.C., April 1993.

[Aho and Griffeth, 1995] A. V. Aho and N. D. Griffeth. Feature interactions in the global information infrastructure. In Proc. 1995 ACM Symp. Foundations of Software Engineering, Washington, D.C., 1995. To appear.

[Aho et al., 1991] A. Aho, A. Dahbura, D. Lee, and M. Uyar. An optimization technique for protocol conformance test generation based on UIO sequences and rural Chinese postman tours. IEEE Trans. on Communication, pages 1604–1615, November 1991.

[Aho, 1990] A. V. Aho. Algorithms for finding patterns in strings. Handbook of Theoretical Computer Science, pages 255–300, 1990.

[Allen and Michelman, 1990] Peter K. Allen and Paul Michelman. Acquisition and interpretation of 3-D sensor data from touch. IEEE Transactions on Robotics and Automation, pages 397–404, August 1990.

[Allen et al., 1990] Peter K. Allen, Paul Michelman and Kenneth Roberts. A system for programming and controlling a multi-sensor robotic hand. IEEE Transactions on Systems, Man, and Cybernetics, 20(6):1450–1456, Nov/Dec 1990.

[Allen et al., 1993] P. K. Allen, Aleksandar Timcenko, Billibon Yoshimi, and Paul Michelman. Automated tracking and grasping of a moving object with a robotic hand-eye system. IEEE Transactions on Robotics and Automation, pages 152–165, April 1993.

[Beerel et al., 1995] P.A. Beerel, K.Y. Yun, S.M. Nowick, and P.-C. Yeh. Estimation and bounding of energy consumption in burst-mode control circuits. In IEEE/ACM International Conference on Computer-Aided Design. IEEE Computer Society Press, November 1995.

[Ben-Shaul and Kaiser, 1995] Israel Ben-Shaul and Gail E. Kaiser. A Paradigm for Decentralized Process Modeling. Kluwer Academic Publishers, Boston, 1995.

[Berson et al., May 1995] S. Berson, L. Golubchik and R. R. Muntz. Fault Tolerant Design of Multimedia Servers. In Proceedings of the 1995 ACM SIGMOD Conf., pages 364–375, San Jose, CA, May 1995.

[Beshers and Feiner, 1993] C. Beshers and S. Feiner. AutoVisual: Rule-based design of interactive multivariate visualizations. IEEE Computer Graphics and Applications, 13(4):41–49, July 1993.

[Chan and Stolfo, 1995a] P. Chan and S. Stolfo. A comparative evaluation of voting and meta-learning on partitioned data. In Proc. Twelfth Intl. Conf. Machine Learning, 1995. To appear.

[Chan and Stolfo, 1995b] P. Chan and S. Stolfo. Learning arbiter and combiner trees from partitioned data for scaling machine learning. In Proc. Intl. Conf. Knowledge Discovery and Data Mining, 1995. To appear.


[Chan and Stolfo, 1996] P. Chan and S. Stolfo. On the accuracy of meta-learning for scalable data mining. J. Intelligent Information Systems, 1996. To appear.

[Dalal and Etherington, 1992] M. Dalal and D. W. Etherington. Tractable approximate deduction using limited vocabularies. In Proc. AI'92, pages 206–212, 1992.

[Dalal, 1992a] M. Dalal. Efficient propositional constraint propagation. In Proc. AAAI-92, pages 409–414, 1992.

[Dalal, 1992b] M. Dalal. Tractable deduction in knowledge representation systems. In B. Nebel, C. Rich and W. Swartout, editors, Proc. KR'92, pages 393–402, 1992.

[Dalal, 1995a] M. Dalal. Anytime families of tractable propositional reasoners. Submitted to 4th Int. Symp. AI and Maths, 1995.

[Dalal, 1995b] M. Dalal. A rewrite system for tractable propositional reasoning. Submitted to 4th Int. Symp. AI and Maths, 1995.

[Dalal, 1995c] M. Dalal. Tractable reasoning in knowledge representation systems. Technical Report CUCS-017-95, Dept. Computer Science, Columbia University, NY, 1995.

[de Kleer, 1990] J. de Kleer. Exploiting locality in a TMS. In Proc. AAAI-90, pages 264–271, 1990.

[Dewan et al., 1994] H. Dewan, M. Hernandez, S. Stolfo, and J. Wong. Predictive dynamic load balancing of parallel and distributed rule and query processing. In Proc. 1994 Intern. Conf. on Management of Data, SIGMOD-94, pages 277–288, May 1994.

[Elhadad, 1991] M. Elhadad. FUF: The universal unifier—user manual, version 5.0. Technical Report CUCS-038-91, Columbia University, 1991.

[Elhadad, 1993] M. Elhadad. Using argumentation to control lexical choice: a unification-based implementation. PhD thesis, Computer Science Department, Columbia University, 1993.

[Feiner and McKeown, 1991] S. Feiner and K. McKeown. Automating the generation of coordinated multimedia explanations. IEEE Computer, 24(10):33–41, October 1991.

[Feiner and Shamash, 1991] S. Feiner and A. Shamash. Hybrid user interfaces: Breeding virtually bigger interfaces for physically smaller computers. In Proc. UIST '91 (ACM Symp. on User Interface Software and Technology), pages 9–17, Hilton Head, SC, November 11–13, 1991.

[Feiner et al., 1993a] S. Feiner, B. MacIntyre, M. Haupt, and E. Solomon. Windows on the world: 2D windows for 3D augmented reality. In Proc. UIST '93 (ACM Symp. on User Interface Software and Technology), pages 145–155, Atlanta, GA, November 3–5, 1993.

[Feiner et al., 1993b] S. Feiner, B. MacIntyre and D. Seligmann. Knowledge-based augmented reality. Communications of the ACM, 36(7):52–62, July 1993.

[Feiner et al., 1995] S. Feiner, A. Webster, T. Krueger, B. MacIntyre, and E. Keller. Architectural anatomy. Presence, 4(3):318–325, Summer 1995.

[Fuhrer et al., 1995] R.M. Fuhrer, B. Lin and S.M. Nowick. Symbolic hazard-free minimization and encoding of asynchronous finite state machines. In IEEE/ACM International Conference on Computer-Aided Design. IEEE Computer Society Press, November 1995.

[Goldszmidt and Yemini, 1993] German Goldszmidt and Yechiam Yemini. Evaluating management decisions via delegation. In Third International Symposium on Integrated Network Management, San Francisco, CA, USA, April 1993.


[Goldszmidt and Yemini, 1995] German Goldszmidt and Yechiam Yemini. Distributed management by delegating mobile agents. In 15th International Conference on Distributed Computing Systems, Vancouver, British Columbia, Canada, June 1995.

[Goldszmidt, 1993] German Goldszmidt. Distributed system management via elastic servers. In IEEE First International Workshop on Systems Management, pages 31–35, Los Angeles, California, USA, April 1993.

[Golubchik et al., May 1995] L. Golubchik, J. C.-S. Lui and R. R. Muntz. Reducing I/O Demand in Video-On-Demand Storage Servers. In Proceedings of the ACM SIGMETRICS Conf., pages 25–36, Ottawa, Canada, May 1995.

[Golubchik et al., September 1995] L. Golubchik, R. R. Muntz and R. W. Watson. Analysis of Striping Techniques in Robotic Storage Libraries. In Proc. of the 14th IEEE Symposium on Mass Storage Systems, pages 225–238, Monterey, CA, September 1995.

[Hatzivassiloglou and McKeown, 1993] V. Hatzivassiloglou and K.R. McKeown. Towards the automatic identification of adjectival scales: Clustering adjectives according to meaning. In Proc. 31st Conf. of the ACL, Columbus, Ohio, 1993. Assoc. for Computational Linguistics.

[Kaiser and Pu, 1992] Gail E. Kaiser and Calton Pu. Dynamic restructuring of transactions. In Ahmed K. Elmagarmid, editor, Database Transaction Models for Advanced Applications, chapter 8, pages 265–295. Morgan Kaufmann, San Mateo CA, 1992.

[Karp and Feiner, 1993] P. Karp and S. Feiner. Automated presentation planning of animation using task decomposition with heuristic reasoning. In Proc. Graphics Interface '93, pages 118–127, Toronto, Canada, May 17–21, 1993.

[Kjeldsen and Kender, 1995] R. Kjeldsen and J. R. Kender. Visual hand gesture recognition for window system control. In Proceedings of the IEEE International Workshop on Automatic Face- and Gesture-Recognition, Zurich, June 1995.

[Kurlander and Feiner, 1992] D. Kurlander and S. Feiner. A history-based macro by example system. In Proc. UIST '92 (ACM Symp. on User Interface Software and Technology), pages 99–106, Monterey, CA, November 15–18, 1992.

[MacIntyre and Feiner, 1995] B. MacIntyre and S. Feiner. Language-level support for exploratory programming of distributed virtual environments. Submitted for publication, September 1995.

[Marshall et al., 1994] A. Marshall, B. Coates and P. Siegel. The design of an asynchronous communications chip. IEEE Design and Test of Computers, 11(2):8–21, Summer 1994.

[McAllester, 1990] D. McAllester. Truth maintenance. In Proc. AAAI-90, pages 1109–1116, 1990.

[McKeown et al., 1992] K. McKeown, S. Feiner, J. Robin, D. Seligmann, and M. Tanenblatt. Generating cross-references for multimedia explanation. In Proc. AAAI-92, pages 9–16, San Jose, CA, July 12–17, 1992.

[McKeown et al., 1994] K.R. McKeown, K.K. Kukich and J. Shaw. Practical issues in automatic documentation generation. In Proc. ACL Applied Natural Language Conf., Stuttgart, Germany, October 1994.

[McKeown et al., 1995] K.R. McKeown, K.K. Kukich and J. Robin. Generating concise natural language summaries. Journal of Information Processing and Management, to appear 1995.

[Meyer et al., 1995] Kraig Meyer, Mike Erlinger, Joe Betser, Carl Sunshine, German Goldszmidt, and Yechiam Yemini. Decentralizing control and intelligence in network management. In Fourth International Symposium on Integrated Network Management, Santa Barbara, CA, USA, May 1995.


[Murase and Nayar, 1995] H. Murase and S. K. Nayar. Visual learning and recognition of 3D objects from appearance. International Journal of Computer Vision, 14(1):5–24, January 1995.

[Nayar and Bolle, 1996] S. K. Nayar and R. M. Bolle. Reflectance based object recognition. International Journal of Computer Vision, (to appear) 1996.

[Nayar and Oren, 1995] S. K. Nayar and M. Oren. Visual appearance of matte surfaces. Science, 267:1153–1156, February 1995.

[Nayar et al., 1995] S. K. Nayar, M. Watanabe and M. Noguchi. Real-time focus range sensor. Proc. of Intl. Conf. on Computer Vision, pages 995–1001, June 1995.

[Nene et al., 1994] S. A. Nene, S. K. Nayar and H. Murase. SLAM: A software library for appearance matching. Proc. of ARPA Image Understanding Workshop, November 1994.

[Nowick et al., 1995] S.M. Nowick, N.K. Jha and F.-C. Cheng. Synthesis of asynchronous circuits for stuck-at and robust path delay fault testability. In VLSI Design 95. IEEE Computer Society Press, January 1995.

[Oren and Nayar, 1994] M. Oren and S. K. Nayar. Generalization of Lambert's reflectance model. Proc. of ACM SIGGRAPH, July 1994.

[Oren and Nayar, 1995] M. Oren and S. K. Nayar. Generalization of the Lambertian model and implications for machine vision. International Journal of Computer Vision, 14(2-3):227–251, April 1995.

[Park and Kender, 1993] I. P. Park and J. R. Kender. Using isolated landmarks and trajectories in robot navigation. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, June 1993.

[Park and Kender, 1996] I. P. Park and J. R. Kender. Topological direction-giving and visual navigation in large environments. Artificial Intelligence Journal (Special Issue on Computer Vision), in press 1996.

[Radha et al., 1991] H. Radha, R. Leonardi, M. Vetterli, and B. Naylor. Binary space partitioning tree representation of images. Journal of Visual Communication and Image Representation, 2(3):201–220, September 1991.

[Reed et al., 1995] Michael Reed, Peter K. Allen and Steven Abrams. CAD model acquisition using BSP trees. In IROS International Conference on Intelligent Robots and Systems, pages 335–339, August 1995.

[Robin and McKeown, 1993] J. Robin and K.R. McKeown. Corpus analysis for revision-based generation of complex sentences. In Proc. Nat. Conf. on Artificial Intelligence, Washington, D.C., July 1993.

[Robin and McKeown, 1995] J. Robin and K.R. McKeown. Empirically designing and evaluating a new revision-based model for summary generation. Artificial Intelligence Journal, submitted January 1995.

[Robin, 1994] J. Robin. Revision-Based Generation of Natural Language Summaries Providing Historical Background. PhD thesis, Computer Science Department, Columbia University, 1994.

[Ross and Li, 1995] K. A. Ross and Z. Li. Jive-join and Smash-join: Efficient join techniques for large relations and small main memory. Submitted for publication, 1995.

[Ross, 1995] K. A. Ross. Efficiently following object references for large object collections and small main memory. In Proceedings of the International Conference on Deductive and Object-Oriented Databases, 1995.

[Seligmann and Feiner, 1991] D. Seligmann and S. Feiner. Automated generation of intent-based 3D illustrations. In Proc. ACM SIGGRAPH '91 (Computer Graphics, 25:4, July 1991), pages 123–132, Las Vegas, NV, July 28–August 2, 1991.


[Smadja and McKeown, 1990] F. Smadja and K.R. McKeown. Automatically extracting and representing collocations for language generation. In Proc. 28th Annual Meeting of the Assoc. for Computational Linguistics, Pittsburgh, Pa., June 1990.

[Smadja et al., 1995] F. Smadja, K.R. McKeown and V. Hatzivassiloglou. Automatic development of bilingual lexicons. Journal of Computational Linguistics, to appear, 1995.

[Smadja, 1991] F. Smadja. Retrieving Collocational Knowledge from Textual Corpora. An Application: Language Generation. PhD thesis, Department of Computer Science, Columbia University, New York, NY, 1991.

[Stolfo et al., 1990] S. J. Stolfo, L. Woodbury, J. Glazier, and P. Chan. The ALEXSYS mortgage pool allocation expert system: A case study of speeding up rule-based systems. In AI and Business Workshop, AAAI-90, 1990.

[Sutherland, 1968] I. Sutherland. A head-mounted three dimensional display. In Proc. FJCC 1968, pages 757–764, Washington, DC, 1968. Thompson Books.

[Tarabanis et al., 1991] K. Tarabanis, Roger Tsai and Peter K. Allen. Automated sensor planning for robotic vision tasks. In IEEE International Conference on Robotics and Automation, Sacramento, April 9–11, 1991.

[Tarabanis et al., 1994] K. Tarabanis, Roger Tsai and Peter K. Allen. Analytical characterization of the feature detectability constraints of resolution, focus and field-of-view for vision sensor planning. Computer Vision, Graphics, and Image Processing, 59(3):340–358, May 1994.

[Tarabanis et al., 1995a] K. Tarabanis, Roger Tsai and Peter Allen. The MVP sensor planning system for robotic vision tasks. IEEE Transactions on Robotics and Automation, 11(1):72–85, February 1995.

[Tarabanis et al., 1995b] K. Tarabanis, Roger Tsai and Peter Allen. Sensor planning in computer vision. IEEE Transactions on Robotics and Automation, 11(1):86–105, February 1995.

[Theobald et al., 1995] M. Theobald, S.M. Nowick and T. Wu. Espresso-hf: A heuristic hazard-free minimizer for two-level logic. Submitted to a conference, 1995.

[Tong et al., 1994] Andrew Z. Tong, Gail E. Kaiser and Steven S. Popovich. A flexible rule-chaining engine for process-based software engineering. In 9th Knowledge-Based Software Engineering Conference, pages 79–88, Monterey CA, September 1994. IEEE Computer Society Press.

[Valduriez, 1987] P. Valduriez. Join indices. ACM Transactions on Database Systems, 12(2):218–246, 1987.

[van Berkel et al., 1994] K. van Berkel, R. Burgess, J. Kessels, M. Roncken, F. Schalij, and A. Peeters. Asynchronous circuits for low power: A DCC error corrector. IEEE Design and Test of Computers, 11(2):22–32, Summer 1994.

[Van Gelder et al., 1991] A. Van Gelder, K. A. Ross and J. S. Schlipf. The well-founded semantics for general logic programs. JACM, 38(3):620–650, 1991.

[Williams and Horowitz, 1991] T.E. Williams and M.A. Horowitz. A zero-overhead self-timed 54b 160ns CMOS divider. IEEE Journal of Solid-State Circuits, 26(11):1651–1661, November 1991.

[Yemini et al., 1991] Y. Yemini, German Goldszmidt and Shaula Yemini. Network management by delegation: The design of a management delegation engine. In Second International Symposium on Integrated Network Management, Washington, DC, USA, April 1991.

[Yoshimi and Allen, 1995] Billibon Yoshimi and Peter Allen. Active uncalibrated visual servoing. IEEE Transactions on Robotics and Automation, 11(5):516–521, August 1995.

[Yun and Dill, 1995] K. Yun and D.L. Dill. A high-performance asynchronous SCSI controller. In IEEE International Conference on Computer Design. IEEE Computer Society Press, October 1995.


H STAFF CREDENTIALS


I RESULTS FROM PRIOR RI AWARD

The Department of Computer Science has grown significantly during the period (1991-95) of the current CISE grant, with the addition of 5 new junior faculty bringing the total strength of the faculty to 18. The following is a very brief outline of the major accomplishments of the faculty who participated in the CISE grant (1991-95).

(a) Two faculty received the NSF National Young Investigator Award: Shree K. Nayar (1993) and Kenneth Ross (1994). One faculty member received the NSF Faculty Early Career Development Award: Steven Nowick (1995). Four faculty received the NSF Research Initiation Award: Shree K. Nayar (1991), Kenneth Ross (1992), Steven Nowick (1993), and Mukesh Dalal (1993). Two faculty received the ONR Young Investigator Award: Steven Feiner (1991) and Daniel Duchamp (1992).

(b) Two faculty received the David and Lucile Packard Fellowship, awarded to only 20 scientists and engineers nationwide: Shree K. Nayar (1992) and Kenneth Ross (1993). Kathleen McKeown was appointed Fellow of AAAI (1994). Two faculty received the Alfred P. Sloan Research Award: Kenneth Ross (1994) and Steven Nowick (1995).

(c) The research funds in the department (from NSF, ARPA, NY State, and several private foundations and industry) have grown in the past 5 years to $4.6M/year, and the total number of Ph.D. students increased from 61 in 1991 to 85 in 1995. These numbers reflect substantial growth in the department's research activities.

(d) Our faculty members and Ph.D. students have received outstanding paper awards at several prestigious conferences: the Anton Philip Best Student Paper Award at the IEEE Conference on Robotics and Automation (Sacramento, 1991), the Best Paper Award at the IEEE International Conference on Computer Design (1991), the 20th Pattern Recognition Award at the International Conference on Pattern Recognition (Jerusalem, 1994), the Siemens Outstanding Paper Award at the IEEE Conference on Computer Vision and Pattern Recognition (Seattle, 1994), the Best Industry Related Paper Award at the IAPR International Conference on Pattern Recognition (Jerusalem, 1994), and the David Marr Prize at the International Conference on Computer Vision (Boston, 1995).

The CISE grant has been instrumental in supporting all the experimental research conducted in the department. The following is a summary of a few selected research accomplishments. All references cited here can be found in the main bibliography of the proposal.

Computer Graphics and User Interfaces: Prof. Steven Feiner's group has pursued research in the exploratory design of new user interface metaphors. Research projects have included knowledge-based generation of 3D graphics and virtual worlds in domains ranging from maintenance documentation [Seligmann and Feiner-1991] to multivariate visualization [Beshers and Feiner-1993], novel approaches to demonstrational programming [Kurlander and Feiner-1992], and experimental infrastructure for and applications of augmented reality using see-through head-worn displays [Feiner and Shamash-1991, Feiner et al.-1993b, Feiner et al.-1993a, Feiner et al.-1995].

Physics Based Computational Vision: Prof. Shree Nayar's group has developed a series of physics-based models, sensors, and algorithms for computational vision. Recent accomplishments include the generalization of Lambert's law for diffuse reflectance [Oren and Nayar-1995, Oren and Nayar-1994, Nayar and Oren-1995], a video-rate 3D camera that produces 512x480 depth estimates at 30 Hz [Nayar et al.-1995], and a software library for appearance matching (SLAM) that has been licensed to 35 research institutions [Murase and Nayar-1995, Nene et al.-1994]. Papers authored by Prof. Nayar's group have received outstanding paper awards at the following conferences: ICCV (Osaka, 1990), IEEE CVPR (Seattle, 1994), ICPR (Jerusalem, 1994), and ICCV (Boston, 1995).
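
To indicate the flavor of this generalization, the following is a minimal sketch in standard notation (the exact formulation appears in the cited Oren-Nayar papers; the simplified form and symbol names below are assumptions for illustration). Lambert's law predicts radiance proportional only to the cosine of the incidence angle,
    L_{Lambert} = \frac{\rho}{\pi} E_0 \cos\theta_i ,
while the commonly quoted simplified roughness-dependent generalization has the form
    L_r = \frac{\rho}{\pi} E_0 \cos\theta_i \left[ A + B \, \max(0, \cos(\phi_r - \phi_i)) \, \sin\alpha \, \tan\beta \right],
    A = 1 - \frac{0.5\,\sigma^2}{\sigma^2 + 0.33}, \quad B = \frac{0.45\,\sigma^2}{\sigma^2 + 0.09}, \quad \alpha = \max(\theta_i, \theta_r), \quad \beta = \min(\theta_i, \theta_r),
where \sigma is the surface roughness; setting \sigma = 0 recovers the Lambertian case.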

Middle-Level and High-Level Vision: Prof. John Kender's group has produced three vision systems. The first formalized the theory of topological navigation [Park and Kender-1993], and demonstrated in real time a novel system for giving navigational directions without the use of either absolute coordinates or numeric quantities [Park and Kender-1996]. The second system took real images and produced English language output that described objects' physical location in terms of relationships to landmarks, working equally robustly in the two very different domains of medical radiograph interpretation [Abella et al.-1995] and theme park direction giving [Abella and Kender-1993]. The final system takes visual input of human gestures in place of mouse input to drive a menu selection system [Kjeldsen and Kender-1995].

Robotics and Model Based Vision: Prof. Peter Allen's research has focused on the integration of both touch and vision for robotic tasks. Some of the first work on haptic shape recovery [Allen and Michelman-1990] using a dexterous robotic hand system with 22 degrees of freedom [Allen et al.-1990] was reported, and this work has been extended to merge grasping with real-time vision processing, creating a unique hand-eye grasping system capable of picking up moving objects [Allen et al.-1993] and using uncalibrated vision [Yoshimi and Allen-1995]. More recent work has focused on understanding the constraints imposed by sensors and on developing algorithms that automatically plan sensor locations and parameters based upon task-level constraints [Tarabanis et al.-1994, Tarabanis et al.-1995a, Tarabanis et al.-1995b, Abrams and Allen-1995, Reed et al.-1995].
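
To illustrate the kind of constraint checking that sensor planning involves, the sketch below tests whether a candidate viewpoint keeps a feature inside the camera's field of view and images it at sufficient resolution. This is a hedged illustration only, not the MVP system of [Tarabanis et al.-1995a]; the function and parameter names are hypothetical, and all distances are assumed to be in millimeters.

    import numpy as np

    def feature_visible(camera_pos, view_dir, feature_pos, feature_size_mm,
                        half_fov_rad, focal_length_mm, pixel_size_mm,
                        min_pixels=2.0):
        """Check field-of-view and resolution constraints for one feature."""
        v = np.asarray(feature_pos, float) - np.asarray(camera_pos, float)
        distance = np.linalg.norm(v)
        if distance == 0.0:
            return False
        # Field of view: the angle between the optical axis and the ray to
        # the feature must not exceed the half field-of-view angle.
        d = np.asarray(view_dir, float)
        cos_angle = np.dot(v, d) / (distance * np.linalg.norm(d))
        if np.arccos(np.clip(cos_angle, -1.0, 1.0)) > half_fov_rad:
            return False
        # Resolution: under perspective projection, the feature's image must
        # span at least `min_pixels` pixels on the sensor.
        image_size_mm = focal_length_mm * feature_size_mm / distance
        return image_size_mm / pixel_size_mm >= min_pixels

A planner can then search over candidate camera positions and parameters, keeping only those viewpoints for which every task-relevant feature passes such tests.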

Natural Language Generation: Prof. Kathleen McKeown's research focuses on natural language generation, multimedia explanation, and statistical analysis of corpora to identify constraints on word usage. Recent results include summarization [Robin and McKeown-1993, Robin and McKeown-1995] and automated documentation developed jointly with Bellcore [McKeown et al.-1995], all building on our widely distributed sentence generation tool, FUF [Elhadad-1991]. Research on multimedia explanation has examined coordination of generated text and graphics [Feiner and McKeown-1991], while in the area of statistical analysis our research addresses learning of collocations [Smadja and McKeown-1990], learning of semantically related words [Hatzivassiloglou and McKeown-1993], and translation of collocations [Smadja et al.-1995].
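
As a small illustration of the statistical style of analysis (a generic sketch only, not the Xtract system of [Smadja and McKeown-1990]; the function name and threshold are hypothetical), adjacent word pairs in a corpus can be ranked by a simple association score such as pointwise mutual information to surface candidate collocations:

    import math
    from collections import Counter

    def collocation_candidates(tokens, min_count=5):
        """Rank adjacent word pairs by pointwise mutual information (PMI)."""
        unigrams = Counter(tokens)
        bigrams = Counter(zip(tokens, tokens[1:]))
        n = float(len(tokens))
        scored = []
        for (w1, w2), count in bigrams.items():
            if count < min_count:
                continue
            # PMI compares the pair's observed frequency with what would be
            # expected if the two words occurred independently.
            pmi = math.log((count / n) /
                           ((unigrams[w1] / n) * (unigrams[w2] / n)))
            scored.append(((w1, w2), pmi))
        return sorted(scored, key=lambda item: item[1], reverse=True)

High-scoring pairs (e.g., fixed phrases and technical terms) are then candidate collocations to be filtered and represented for use in generation or translation.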

Database Systems and Theory: Prof. Kenneth Ross and his researchers have proposed several solutions to significant semantic problems in deductive databases, including the “well-founded semantics,” which has been widely accepted within the deductive database community [Van Gelder et al.-1991]. A declarative language for object-oriented databases has been developed and forms the basis for the SWORD database language currently being implemented. Comprehensive techniques have been invented for adapting a materialized view when the view definition changes, using the old materialization to speed up the process. Other research describes various query optimization techniques for deductive databases, relational databases, and distributed databases. A patent has been filed for the “Jive-join” technique in [Ross and Li-1995, Ross-1995].
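
For readers unfamiliar with join indices, the sketch below shows the basic idea of [Valduriez-1987]: precompute the pairs of row identifiers that join, then answer the join by direct lookups rather than recomputing the match. This is an illustrative sketch only, not the patented Jive-join algorithm; the function names are hypothetical.

    from collections import defaultdict

    def build_join_index(r_rows, s_rows, r_key, s_key):
        """Precompute (row id in R, row id in S) pairs of tuples that join."""
        s_index = defaultdict(list)
        for sid, s in enumerate(s_rows):
            s_index[s[s_key]].append(sid)
        return [(rid, sid)
                for rid, r in enumerate(r_rows)
                for sid in s_index.get(r[r_key], [])]

    def join_with_index(r_rows, s_rows, join_index):
        """Answer the join by direct lookups into the precomputed index."""
        return [{**r_rows[rid], **s_rows[sid]} for rid, sid in join_index]

The index is small relative to the base relations and can be maintained incrementally, which is what makes it attractive for repeated joins over large data.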

Parallel and Distributed Expert Database Systems: Prof. Sal Stolfo's research group has developed data mining systems that combine multiple predictors in inductive learning tasks for large distributed databases [Chan and Stolfo-1996]. Results on predictive load balancing for parallel database systems have been transferred for use in an internal database application at a large financial institution [Dewan et al.-1994]. The ALEXSYS system (for mortgage-backed security trading) has been licensed to Thomson Financial Software, a Boston-based company that plans at least 200 installations. The techniques embodied by the system for fast parallel deductive inference have been submitted for patent protection [Stolfo et al.-1990].
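
The simplest instance of combining multiple predictors is sketched below: base classifiers are trained independently on separate data partitions (for example, one per site) and their predictions are merged by majority vote. This is a generic, hedged illustration; the cited work develops considerably more sophisticated meta-learning strategies, and the function names here are hypothetical.

    from collections import Counter

    def train_base_classifiers(partitions, train_fn):
        """Train one base classifier per data partition (e.g., one per site)."""
        return [train_fn(partition) for partition in partitions]

    def combined_predict(classifiers, example):
        """Combine the base classifiers' predictions by simple majority vote."""
        votes = Counter(clf(example) for clf in classifiers)
        return votes.most_common(1)[0][0]

Because each partition can be processed in parallel and only the trained classifiers need to be exchanged, this style of combination scales naturally to large distributed databases.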

Process-Centered Software Development Environments: Prof. Gail Kaiser's research program has introduced the concept of decentralized process (or “workflow”) modeling and enactment, supporting interoperability among autonomously defined processes and geographical dispersion of multi-site collaborative work environments [Ben-Shaul and Kaiser-1995]. The Oz system has been used for software development, document authoring, teaching assistant support, and managed healthcare. Marvel, the single-process, single-site (per environment) predecessor of Oz, has been licensed to over 40 institutions, including 12 companies. AT&T employs Marvel as a component of its Provence and Improvise process monitoring systems.
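
Process enactment in such environments is typically driven by condition/action rules that fire as their preconditions become satisfied. The sketch below is a minimal forward-chaining loop of that general kind; it is an illustrative sketch only, not the Marvel/Oz rule engine or the engine of [Tong et al.-1994], and the fact and rule encodings are hypothetical.

    def forward_chain(facts, rules):
        """facts: set of strings; rules: list of (preconditions, consequence)."""
        facts = set(facts)
        fired = True
        while fired:
            fired = False
            for preconditions, consequence in rules:
                if consequence not in facts and all(p in facts for p in preconditions):
                    facts.add(consequence)   # rule fires, possibly enabling others
                    fired = True
        return facts

    # Example: editing a file makes recompilation (and then retesting) due.
    # forward_chain({"edited(foo.c)"},
    #               [({"edited(foo.c)"}, "needs_compile(foo.c)"),
    #                ({"needs_compile(foo.c)"}, "needs_test(foo)")])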

Scalable Systems for Portable Computing: The research program of Prof. Steven Nowick has developed a computer-aided design (CAD) tool for portable self-timed digital systems. Several optimization algorithms were developed and incorporated into the tool: optimal state assignment [Fuhrer et al.-1995], synthesis for testability [Nowick et al.-1995], logic minimization [Theobald et al.-1995], and power estimation [Beerel et al.-1995]. This tool was used by HP/Stanford in the Stetson project to produce a low-power chip for infrared communications, and at AMD to design a SCSI controller. The package is now being incorporated into the internal IBM CAD system, Booledozer, and is used in several other companies and universities.

Computing with Mobile Agents: The research of Prof. Yechiam Yemini and his group has led to the design and implementation of the only language-independent agent technology presently known [Goldszmidt and Yemini-1995]. They have particularly investigated the use of delegation agent technology to decentralize network management [Meyer et al.-1995, Goldszmidt and Yemini-1993, Goldszmidt-1993]. Several companies have based key components of their management products and even strategy on the delegation technology (including Synoptics superagent technology, Ungermann-Bass, LANNet, and others).
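
The intent of management by delegation can be conveyed by a small sketch: instead of a central manager repeatedly polling a device, a small program is shipped to run next to the managed element and only a condensed result is returned. This is a hedged illustration of the idea, not the delegation engine of [Yemini et al.-1991] or [Goldszmidt and Yemini-1995]; all names, counters, and thresholds are hypothetical.

    def delegated_check(read_counter, threshold=100, samples=10):
        """Runs at the managed device: samples a counter locally and returns
        only a condensed result instead of streaming raw polling data."""
        values = [read_counter("ifInErrors") for _ in range(samples)]
        return {"alarm": max(values) - min(values) > threshold}

    def dispatch(agent, remote_execute):
        """The central manager hands the agent to the device's execution
        environment and receives only the summary it produces."""
        return remote_execute(agent)

Decentralizing the computation in this way reduces management traffic and lets monitoring logic be changed at run time without redeploying the managed devices' firmware.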
