

Intent Parser: a tool for codifying experiment design

Tramy Nguyen¹, Nicholas Walczak¹, Jacob Beal¹, Daniel Sumorok¹, Mark Weston²∗

¹Raytheon BBN Technologies, ²Netrias, Cambridge MA
[email protected], [email protected]

1 INTRODUCTION

Many biological experiments are described in text documents, such as laboratory notebooks, capturing information such as the purpose, execution, and results of an experiment. In such descriptions, however, authors typically present information in a highly personal and idiosyncratic manner, at varying levels of detail and often omitting critical information. Consequently, this lack of consistency leads to a variety of issues commonly encountered when attempting to compare experimental reports created by different authors or to build upon those results in new work. Humans can sometimes infer sufficient information to interpret such informal documentation of experiment designs, but this is typically an ad hoc, challenging, and error-prone process, not particularly susceptible to automation. At the same time, precise and unambiguous specifications of both elements and their combinations can be expressed in machine-readable representations such as SBOL [2], but making use of these tools is difficult for many investigators.

A “middle-ground” approach combining both accessibility and representational precision, however, has been known at least as far back as Winograd’s SHRDLU system [4], using machine feedback and prompting to shape human input into a semi-structured form that can be readily interpreted and checked by machines. We have applied this approach to develop Intent Parser, a tool that combines a word-processing interface with structured tables and assisted linking to definitions to provide a simple interface for incremental codification of experiment designs. Use of this tool can help synthetic biology collaborations by reducing the time and skills required to produce precise experiment designs, enabling automatic checking for errors and ambiguities, and simplifying interpretation of experimental data.

∗Supported by AFRL and DARPA under contract FA875017CO184. This document does not contain technology or technical data controlled under either U.S. International Traffic in Arms Regulation or U.S. Export Administration Regulations. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of AFRL and DARPA.

IWBDA 2020, August 2020, Online

2 INTENT PARSER ARCHITECTURE

Fundamentally, the Intent Parser acts as a link between a user-friendly document editor (in this case Google Docs) and a repository of formal definitions (in this case SynBioHub [3]). By linking these and adhering to certain document conventions, the tool allows users to generate unambiguous machine-readable descriptions of experiment design. In our implementation, we have chosen to use Google Docs for the editor, SynBioHub [3] as the repository (in combination with the SBOL Project Dictionary interface [1]), and to output experiment design specifications in JSON.

[Figure 1 diagram. Labels: Google Docs, Experiment Description, Intent Parser Add-On, Intent Parser Server, Google Sheets, SBOL Project Dictionary, User Requests, Document Updates, DB Queries & Updates.]

Figure 1: Architecture of Intent Parser: a Google Docs add-on sends user requests to a “back-end” server that interacts with databases of definitions (SynBioHub) and terminology (SBOL Project Dictionary). Results return to the add-on, which offers the user choices and alters the document.

Intent Parser is implemented as an “add-on” to Google Docs using the Google Docs API to implement the architecture shown in Figure 1. Once installed, this add-on provides a file menu of operations that can be performed on any Google Doc. Users conceiving of an experiment write up an experiment description in a Google Doc and invoke requests around two workflows: 1) grounding document text with links to definitions in SynBioHub [3], and 2) defining, validating, and exporting experiment requests that make use of those definitions. These requests can be made at any time, supporting an incremental and collaborative approach to experiment design.

When the user makes a request in the add-on, an HTTP request is sent to the Intent Parser Server, which then parses the document and returns an HTTP response with the result back to the add-on. The server is a backend that carries out requests by interacting with SynBioHub [3], which provides information about referenced elements in SBOL format [2], and with the SBOL Project Dictionary [1], which provides a spreadsheet interface that tracks shorthand and lab-specific names. Figure 2 shows an example of linking information found on SynBioHub to names and terminology in the Google Doc.

Figure 2: Screenshot of Intent Parser in action on a document from the DARPA SD2 program, showing a measurement table with reagents linked to definitions in SynBioHub [3]. The navigation panel on the right suggests links to add, in this case a link for the term “Glucose” (document location not shown), providing both a best guess and a number of potential alternatives.

Experiment designs are based on tables following a specific format, for which templates can be generated from an add-on menu item. In this abstract, these tables are referred to as Intent Parser tables. Validation and export requests are sent to the Intent Parser Server to validate the contents of Intent Parser tables parsed from the Google Doc. Validation follows a predefined data schema that checks for the required information, using SynBioHub and the SBOL Project Dictionary to validate that all terms extracted from the table are properly grounded. From these, the server generates both reports on validity and JSON representations of experiment design.
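The validate-then-export step described above can be pictured with a minimal sketch. This is not the Intent Parser Server's code: the required field names, the row layout, the `validate_rows` and `export_request` helpers, and the placeholder URI are all assumptions made for illustration, since the actual schema is not given here. It only shows the general shape of checking for required information, checking that every term is grounded, and producing both a validity report and a JSON representation.

```python
import json

# Hypothetical required columns; the real Intent Parser schema is not given in the text.
REQUIRED_FIELDS = ("lab", "measurement_type", "reagents")


def validate_rows(rows, grounded_terms):
    """Return a list of human-readable problems found in the table rows."""
    problems = []
    for number, row in enumerate(rows, start=1):
        # Check that each required piece of information is present and non-empty.
        for field in REQUIRED_FIELDS:
            if not row.get(field):
                problems.append(f"row {number}: missing '{field}'")
        # Check that every reagent name has a linked definition.
        for term in row.get("reagents") or []:
            if term not in grounded_terms:
                problems.append(f"row {number}: '{term}' is not grounded")
    return problems


def export_request(rows, grounded_terms):
    """Return (validity report, JSON text); the JSON is None if validation fails."""
    problems = validate_rows(rows, grounded_terms)
    if problems:
        return problems, None
    linked = [
        {
            **row,
            "reagents": [
                {"name": term, "uri": grounded_terms[term]} for term in row["reagents"]
            ],
        }
        for row in rows
    ]
    return [], json.dumps({"measurements": linked}, indent=2)


# Illustrative data only: the URI is a placeholder, not a real SynBioHub entry.
terms = {"Glucose": "https://example.org/definitions/glucose"}
good = [{"lab": "Lab A", "measurement_type": "plate reader", "reagents": ["Glucose"]}]
bad = [{"lab": "", "measurement_type": "plate reader", "reagents": ["Kanamycin"]}]

report, text = export_request(good, terms)
bad_report, bad_text = export_request(bad, terms)
```

In this sketch an invalid table yields only a report and no JSON, so a request cannot be exported until every problem is resolved. Whether the real server behaves this way is not stated in the text.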

3 CASE STUDY: DARPA SD2 PROGRAM

The DARPA Synergistic Discovery and Design (SD2) program aims to accelerate scientific discovery by machine-assisted integration of experimental design, build, test, and learning, and is testing these aims via a collaboration of over 100 researchers across over a dozen organizations. In SD2, Intent Parser is used by both data scientists and experimentalists to define and request experiments via Google Docs.

Stakeholders including data scientists, subject matter experts, and experimental labs were consulted to help determine a format that was sufficiently general to specify experiment designs spanning multiple challenge problems, protocols, laboratories, and experiment designs. The information recorded in these experiment requests includes the name of the lab to execute the experiment, which measurements are to be taken and at what time points, amounts of reagents, strains, and media to be used in each sample, as well as experimental conditions and parameters such as culturing temperature. As users describe the experiment, they also check its validity and required number of samples with Intent Parser. Finally, when the experiment design is validated and all parties are satisfied, the users can request that the experiment be executed.

The generality of the approach is demonstrated by the breadth of usage in this program: during a four-month period, 19 SD2 users from various organizations generated 34 experimental requests in multiple different areas of investigation, resulting in a total of 16,876 experimental samples executed using three different protocols and collecting data from a variety of instruments. Because these experiments are generated systematically with grounded definitions, metadata assignment and analysis have been greatly simplified and accelerated, helping enable faster analysis and more effective sharing of data and analyses across the SD2 program.
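The request fields listed earlier in this section, and the check on the required number of samples, can be pictured with the following sketch. It is illustrative only: the `ExperimentRequest` class, its field names, and above all the counting rule are assumptions. The sketch supposes a full-factorial design in which every combination of strain, temperature, time point, and reagent amount is one sample, multiplied by a replicate count. The text does not say how Intent Parser actually computes the number.

```python
from dataclasses import dataclass
from math import prod


@dataclass
class ExperimentRequest:
    """Hypothetical container for the fields an experiment request records."""
    lab: str
    strains: list
    reagent_amounts: dict  # reagent name -> list of amounts to test
    temperatures: list
    timepoints: list
    replicates: int = 1

    def sample_count(self):
        """Count samples under an ASSUMED full-factorial design."""
        factor_sizes = [len(self.strains), len(self.temperatures), len(self.timepoints)]
        factor_sizes += [len(amounts) for amounts in self.reagent_amounts.values()]
        return prod(factor_sizes) * self.replicates


# Illustrative values only; none of these come from an actual SD2 request.
request = ExperimentRequest(
    lab="Lab A",
    strains=["strain 1", "strain 2"],
    reagent_amounts={"Glucose": [0, 1, 2]},
    temperatures=[30],
    timepoints=[0, 6, 12],
    replicates=2,
)
count = request.sample_count()  # 2 strains * 1 temperature * 3 time points * 3 amounts * 2 replicates
```

Surfacing the count while the document is still being edited lets collaborators see the cost of adding a condition before the request is submitted.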

4 CONTRIBUTIONS

Intent Parser provides a user-friendly process for describing experiments, grounding narrative design descriptions in links to a SynBioHub repository, and generating and validating specifications for wet-lab experiments. The positive experiences of users in the SD2 program suggest that this approach has value and should continue to be elaborated. Potential future directions include improving integration and UI, increasing scope of descriptions, and using natural language processing to extract additional semantic content from prose.

This tool is actively developed at SD2E’s GitHub repository https://github.com/SD2E/experimental-intent-parser.

REFERENCES

[1] Beal, J., et al. Collaborative terminology: SBOL project dictionary. Submitted to IWBDA 2020.

[2] Cox, R. S., et al. Synthetic Biology Open Language (SBOL) version 2.2.0. Journal of Integrative Bioinformatics 15, 1 (2018).

[3] McLaughlin, J. A., et al. SynBioHub: A standards-enabled design repository for synthetic biology. ACS Synthetic Biology 7, 2 (2018), 682–688. PMID: 29316788.

[4] Winograd, T. Procedures as a representation for data in a computer program for understanding natural language. Tech. rep., Massachusetts Institute of Technology, Project MAC, 1971.
