
Introduction: Explanation of the importance and relevance of the proposed work as well as a review of the relevant literature. How is it done today, and what are the limits of current practice? Across many science areas, relevant AI workflow information published across a multitude of conferences and journals remains difficult to find, access, and reuse. Consequently, it becomes increasingly difficult to track the abundance of tasks and datasets for training AI models that are already described across thousands of publications. Currently, individual researchers must discover for themselves which datasets are available for training machine learning (ML) models in their area of interest, as well as which models have already been applied. Although there are existing community-driven efforts toward sharing datasets and comparing model performance on benchmark datasets [1, 2], these are manually curated and offer limited coverage of existing research (e.g., Papers With Code hosts 22,000 publications as of April 2020, representing just over 1% of arXiv). Because of the exponential growth of AI applications across all subject areas, keeping these AI workflow summaries up to date requires ever-increasing effort. Furthermore, existing efforts toward extracting AI workflow information from research literature [3, 25, 26, 27] (1) are limited to specific disciplines, (2) extract only specific types of information (e.g., tasks and metrics), (3) typically focus on extraction from abstracts or specific sections, and (4) do not consider information shared in tables or figures. For example, consider the publication of Kurc et al. [4]. From the abstract alone, the authors state the following:

“…four deep learning-based image analysis methods… One method is a segmentation method… Three methods are classification methods developed to categorize adult diffuse glioma cases into oligodendroglioma and astrocytoma classes using radiographic and histologic image data. These methods achieved accuracy values of 0.75, 0.80, and 0.90, measured as the ratio of the number of correct classifications to the number of total cases, with the challenge datasets.”

From this information alone, the reader can identify: (1) the tasks (i.e., image segmentation and image classification), (2) the data (i.e., radiographic and histologic image data), and (3) the accuracy metric (i.e., the ratio of the number of correct classifications to the number of total cases) and the reported measures (i.e., accuracy values of 0.75, 0.80, and 0.90). Furthermore, within the full content of the publication, the authors state, “The datasets for the 2018 CPM challenge were obtained from TCGA (Tomczak et al., 2015) and The Cancer Imaging Archive (TCIA) (Clark et al., 2013; Prior et al., 2013) repositories, and the images had been scanned at the highest resolution.” This provides even more detailed information regarding the data and where to acquire them. Additionally, the authors state, “The method implements an application of the Mask-RCNN network (He et al., 2017) with a novel MASK non-maximum suppression (MASK-NMS) module, which can increase the robustness of the model.” This provides detailed information about the AI model that was used and where to learn more about it. Finally, within the full content of the publication, the authors state:

“Moreover, rather than training the network end-to-end from the start, we initialized the model using weights from the pre-training on the MSCOCO dataset (Lin et al., 2014). We train the layers in multiple stages. We first train the network heads after they are randomly initialized. We later train the upper layers of the network. After this, we reduce the learning rate by a factor of 10 and train the entire network end to end. In our experiments, the training took 300 epochs using stochastic gradient descent with momentum set to 0.9. During training and testing, input tissue images were cropped to 600 × 600.”

This provides detailed information regarding the training methodology. The information identified in this example is only a sample; the publication contains additional details relating to the data, AI model, task, training methodology, and accuracy metrics and measures. Domain scientists would need to read this publication in detail and manually extract this information. The challenge is that there are many more publications like this example in this science domain alone. Scaling out across venues (e.g., multiple conferences and journals), time (e.g., multiple years), and science domains yields thousands of publications that contain information regarding all aspects of an AI workflow, such as the data, AI model, task, training methodology, and accuracy metrics and measures. No framework exists that automatically extracts and links AI workflow information from thousands of scientific publications across all science domains.
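To make the target representation concrete, the workflow information quoted above from Kurc et al. [4] could be captured in a structured record along the following lines. This is a minimal, hypothetical sketch: the field names are illustrative and are not a finalized schema.

```python
# Hypothetical structured record assembled only from the statements quoted above
# from Kurc et al. [4]; field names are illustrative, not a fixed schema.
kurc_2020_workflow = {
    "tasks": ["image segmentation", "image classification"],
    "datasets": [
        {"name": "TCGA", "citation": "Tomczak et al., 2015"},
        {"name": "TCIA", "citation": "Clark et al., 2013; Prior et al., 2013"},
    ],
    "data_types": ["radiographic images", "histologic images"],
    "models": [
        {
            "name": "Mask-RCNN with MASK-NMS",
            "citation": "He et al., 2017",
            "training": {
                "pretraining": "MSCOCO (Lin et al., 2014)",
                "optimizer": "stochastic gradient descent, momentum 0.9",
                "epochs": 300,
                "input_size": "600 x 600",
            },
        }
    ],
    "metrics": ["accuracy (correct classifications / total cases)"],
    "performance": {"accuracy": [0.75, 0.80, 0.90]},
}
```

A record of this kind, produced automatically for each publication, is what would allow the cross-publication comparisons described below.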


Proposed Research and Methods: Identify the hypotheses to be tested (if any) and details of the methods to be used including the integration of experiments with theoretical and computational research efforts. What's new in your approach and why do you think it will be successful? Who cares? If you're successful, what difference will it make? What are the risks and the payoffs? To alleviate this issue, we propose to develop a framework for summarizing AI workflow information by automatically extracting the mentions of and links to datasets, tasks, models, evaluation metrics, and results (as well as equations, chemical formulae, tables, and figures) from scientific literature. Combined with visual analytics, this framework will enable scientists to find, access, and reuse datasets and models that have been applied in their domain and to compare their performance. Furthermore, this framework will advance our understanding of AI and provide new insights to help researchers apply AI techniques. In previous work [9–11], we demonstrated the feasibility of extracting specific information from scientific literature and the advantages of text visualization for revealing trends in documents and social media text streams. Although the goal is for the system to be applicable to any domain, we will focus on easing access to dataset and model information in areas in which AI approaches are underexplored because of the difficulty of obtaining datasets or a lack of interest from the mainstream AI community. This proposal will deliver a scalable platform prototype that leverages state-of-the-art natural language processing, ML, and visualization methods. The platform will help researchers quickly sift through millions of publications, discover datasets and models in their science domain, compare results obtained on these datasets, and develop a holistic understanding of prior work, which will consequently help jumpstart the development of new AI applications in the US Department of Energy (DOE) Office of Science mission space. We expect this platform to hasten knowledge transfer, provide a common forum for comparing different solutions to shared problems, and facilitate discussions converging toward solutions to open problems in various disciplines. There are two main risks to our approach. First, information extraction models typically depend on labeled training data, which are often unavailable in new application areas. However, we have already published preliminary results indicating that accurate models can be trained to extract information from scientific publications using very little labeled training data. Second, although our aim is to develop domain-agnostic extraction models, domain-specific fine-tuning might still be required to maximize performance. Our ideal goal is a generalizable, domain-agnostic system for extracting AI workflow information across all domains; however, a system that requires only minimal fine-tuning for a specific target domain would still have extremely high utility. Despite these risks, the payoff is substantial: our approach will significantly speed up access to AI workflow information for the entire scientific community, thus closing the gap between general AI research and scientific AI applications.


Timetable of Activities: Timeline for all major activities including milestones and deliverables. What are the midterm and final "exams" to check for success? Our midpoint metrics for success will be: (1) a preliminary set of extraction models for each AI workflow information type being targeted with example results for at least three application domains, which will help further inform the design of domain-agnostic extraction models; and (2) a visual interface prototype for querying and analyzing relationships between the data and models mentioned in scientific literature. As a final metric of success, our method will automatically generate a summary of all datasets, tasks, and models that have been applied within three select scientific areas based on all available literature within those areas. These summaries can then help identify challenge areas regarding scientific datasets and open areas of research on which the community can focus. Finally, we will demonstrate and publish our framework for applications relating to DOE’s Biological and Environmental Research, Basic Energy Sciences, High Energy Physics, and Energy Efficient Mobility Systems programs.

Milestone name/description | End date | Type
FY21 Q1 milestone | 12/31/2020 | Quarterly progress measure
FY21 Q2 milestone | 3/31/2021 | Quarterly progress measure
Midterm | 3/31/2021 | Quarterly progress measure
FY21 Q3 milestone | 6/30/2021 | Quarterly progress measure
FY21 Q4 deliverable | 9/30/2021 | Annual milestone
FY21 Q4 deliverable | 9/30/2021 | Quarterly progress measure
FY21 Annual Deliverable | 9/30/2021 | Annual milestone


Project Objectives: This section should provide a clear, concise statement of the specific objectives/aims of the proposed project. What are you trying to do? Articulate your objectives using absolutely no jargon. Our proposed framework for summarizing AI workflow information by automatically extracting the mentions of and links to datasets, tasks, models, and results from the scientific literature will enable scientists to find, access, and reuse datasets and models that have been applied and published in their domain and provide a holistic view of their performance. To accomplish this, our goal is to provide scientists with a summary of the results of ML experiments performed in their area of interest. We want this summary to enable comparisons of experimental results across different publications in terms of tasks, datasets, and evaluation metrics. This kind of summary is sometimes referred to as evidence synthesis, particularly in biomedical research. We aim to extract information that will allow scientists to (1) quickly identify well-performing models that have been applied in their area of interest, (2) quickly identify available datasets, and (3) obtain a holistic overview of the relationship between existing datasets and model performance. To this end, we propose to develop tools for extracting the following information from research publications:

● Datasets: information about which datasets were used to develop models. This could include the dataset name, citation reference, URL, or description.

● Tasks: a description of specific tasks being addressed in the paper. For example, “image segmentation,” such as in the case of the example from the introduction [4], or “temperature prediction,” such as in Fernandez et al. [5].

● Models: information about which models were used, along with any citation references to the original publications that have introduced these models. This could include model names, natural language descriptions, images or tables capturing model architecture, and other relevant information, such as the training method and hyperparameter optimization procedures.

● Evaluation metrics: list of the metrics used to evaluate model performance (e.g., accuracy, as in the example from the introduction [4]).

● Performance: performance scores corresponding to the evaluation metrics. These could be reported in prose, tables, or figures.

This list is not exhaustive, and other relevant information could be identified during the project based on subject-specific needs and opportunities. For example, in some domains it is becoming commonplace to provide URLs to the source code used to produce the reported experiments. Where available, these URLs and other relevant information will also be extracted. An illustrative example is shown in Figure 1. Given the example publication [4], the goal is to extract a list of the tasks described in the paper, along with the datasets and models used for each task and the evaluation metrics and reported performance.
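As an illustrative sketch only, the five information types listed above might be organized into a schema such as the following. All class and field names are assumptions made for discussion rather than a finalized design.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative schema mirroring the five information types listed above;
# names and fields are assumptions, not a finalized design.

@dataclass
class DatasetMention:
    name: str
    citation: Optional[str] = None          # reference to the originating publication
    url: Optional[str] = None
    description: Optional[str] = None

@dataclass
class ModelMention:
    name: str
    citation: Optional[str] = None
    description: Optional[str] = None       # free-text architecture/training notes
    source_code_url: Optional[str] = None   # extracted where authors provide it

@dataclass
class Result:
    metric: str                              # e.g., "accuracy"
    value: float                             # e.g., 0.90
    dataset: Optional[str] = None
    model: Optional[str] = None

@dataclass
class TaskRecord:
    task: str                                # e.g., "image segmentation"
    datasets: List[DatasetMention] = field(default_factory=list)
    models: List[ModelMention] = field(default_factory=list)
    results: List[Result] = field(default_factory=list)

@dataclass
class PublicationWorkflow:
    doi: str
    tasks: List[TaskRecord] = field(default_factory=list)
```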


Figure 1. An illustrative example of the process for extracting AI workflow information from a sample article.

To achieve results similar to those illustrated in Figure 1, we propose the framework shown in Figure 2. This framework automatically finds and extracts information relevant to the AI workflow from unstructured natural language text in scientific publications and conveys the results in a structured and visually informative display. All data associated with this framework, except the full publication content, will be made publicly available through a web-based interface. The framework will be realized through the following objectives.


Figure 2. FAIR framework for mining AI workflow information from scientific publications.

Objective 1: Develop a collection system for retrieving publications, datasets, and results that are relevant to a given research area. We will create an information retrieval system that will enable scientists to identify publications most relevant to their area of interest. This system will leverage access to publication repositories, such as Web of Science (WoS) [12], Scopus [13], PubMed [14], CiteseerX [15], arXiv [16], bioRxiv [17], CORE [18], and the Microsoft Academic Graph [19]. Collectively, these repositories contain tens of millions of publications spanning the full range of science topic areas. The system will quickly mine these repositories to identify publications with a high probability of AI-related content by using natural language processing to scan the readily available metadata, such as the title, abstract, keywords, and venue. The system will also identify AI-related publications using citation analysis and tracking. Using open-source tools such as ElasticSearch [20], we will then create a narrowed repository so that these selected publications can be further processed with more computationally and time-intensive techniques that provide higher extraction accuracy. If available, the system will also retrieve the full publication content for extraction purposes. Because of copyright and access restrictions, the full content of some publications might not be available for this project to release openly. However, the results of mining them will be made openly available, along with the corresponding link or DOI, so that a third party can access the full content if needed.
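The following is a minimal sketch of how the narrowed repository described above could be populated and queried, assuming a local ElasticSearch [20] instance and its official Python client; the index name, document fields, and example query are illustrative assumptions, not a finalized design.

```python
from elasticsearch import Elasticsearch  # assumes a local ElasticSearch instance

es = Elasticsearch("http://localhost:9200")

# Index readily available publication metadata (title, abstract, keywords, venue)
# into the narrowed repository; the index name and fields are illustrative.
es.index(
    index="ai-workflow-candidates",
    id="10.3389/fnins.2020.00027",  # e.g., a DOI used as the document id
    document={
        "title": "Segmentation and Classification in Digital Pathology ...",
        "abstract": "...four deep learning-based image analysis methods...",
        "keywords": ["deep learning", "glioma", "segmentation"],
        "venue": "Frontiers in Neuroscience",
    },
)

# Retrieve candidate publications for a given research area so that only this
# smaller set is passed to the more expensive extraction steps.
hits = es.search(
    index="ai-workflow-candidates",
    query={"multi_match": {"query": "glioma image segmentation",
                           "fields": ["title", "abstract", "keywords"]}},
)
for hit in hits["hits"]["hits"]:
    print(hit["_id"], hit["_source"]["title"])
```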


The work in this objective will include the following tasks.

Task 1.1: Develop a system for retrieving publication abstracts and full-text content from repositories. Attempting to retrieve all possible documents from each of our proposed sources (e.g., WoS, Scopus) would place a tremendous strain on computing, networking, and storage resources and would take a significant amount of time, even when automated. Consequently, a more intelligent approach is needed, similar to the system developed in Patton, Potok, and Worley [21], which won the Best Paper Award at that venue. Such a recommender system reduces the necessary resources. The work described in Patton, Potok, and Worley [21] used Really Simple Syndication (RSS) feeds as data sources. This task will adapt and tune that work for sources containing scientific publications [12–19] and leverage readily available publication metadata (e.g., title, abstract, keywords, venue).

Task 1.2: Create a method for identifying publications that applied ML methods to subject areas of interest. Simple keyword searches will not provide the level of filtering necessary for eliminating unwanted publications. For example, in May 2020, a topic search on WoS [12] for “artificial intelligence” returned 47,271 results. Within these results was a publication titled “Profit driven decision trees for churn prediction,” published in the European Journal of Operational Research. Although this work applies ML, it does not apply the neural network ML approaches of interest for this work and does not pertain to any DOE subject area of interest. To address this filtering challenge, this task will develop novel deep learning-based methods for classifying publications as relevant to DOE subject areas of interest. We previously created a system called Multinode Evolutionary Neural Networks for Deep Learning (MENNDL) [22], which won a 2018 R&D 100 Award, and used it in work nominated for the Association for Computing Machinery's Gordon Bell Award in 2018 [23] and 2019 [24]. This system automatically creates and tunes neural networks for classification and prediction tasks. This task will leverage MENNDL to create classifiers that provide intelligent content filtering for use in Task 1.1.

Task 1.3: Develop a system for converting scientific publications into a format usable by information extraction models. Because publications were historically intended for print rather than machine consumption, the digital document formats used today (primarily PDF) are not directly usable by extraction models. Therefore, this task will develop a system for converting scientific documents stored in PDF and other formats into a structured representation usable by information extraction models. More specifically, the system output should be the plain-text content of scientific articles with denoted sections, paragraphs, and front-matter metadata. Additionally, the system will extract figures and tables from the full content of the publications. We will leverage our existing capabilities in vision-based extraction [10], as well as existing tools for extracting text [29], tables [30], and figures [31] from scientific articles, and improve upon these tools where necessary.
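As a minimal sketch of the conversion step in Task 1.3, the snippet below sends a PDF to a locally running GROBID [29] service and receives structured TEI XML in return; the file name and service location are assumptions made for illustration.

```python
import requests  # assumes a GROBID service running locally on its default port

GROBID_URL = "http://localhost:8070/api/processFulltextDocument"

def pdf_to_tei(pdf_path: str) -> str:
    """Convert a publication PDF into GROBID's structured TEI XML
    (sections, paragraphs, and front-matter metadata)."""
    with open(pdf_path, "rb") as f:
        response = requests.post(GROBID_URL, files={"input": f}, timeout=120)
    response.raise_for_status()
    return response.text  # TEI XML consumed by the downstream extraction models

tei_xml = pdf_to_tei("example_publication.pdf")  # illustrative file name
```

Table and figure extraction would be handled by complementary tools [10, 30, 31]; this sketch covers only the text-conversion path.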


Objective 2: Develop a system for extracting mentions of datasets, tasks, models, and results from scientific publications. Our goal for this objective is to create a capability for extracting AI workflow information from unstructured corpora of scientific literature without being domain-specific, a capability that could be used to compare results from different disciplines. Such a capability does not currently exist. Given the full text of a scientific article in PDF or another format, the objective is to create a process for analyzing the entire publication, including any tables and figures, and identifying and extracting AI workflow information (e.g., datasets, tasks, models, metrics, performance). Many prior works have studied information extraction from scientific literature, including the extraction of tasks, processes, and materials [25, 26]; tasks, datasets, metrics, and performance scores [3]; and datasets, experiments, and results [27]. These prior efforts typically operate on abstracts or specific sections, do not consider information shared in tables or figures, and are domain-specific. In contrast, our goal is to create a system that will: (1) enable the extraction of all relevant information, including from tables and figures, where appropriate; (2) be capable of processing entire full-text scientific articles, where available, but work with abstracts where full-text content cannot be obtained; and (3) work across disciplines and thus enable comparisons of results from different disciplines. Based on our own prior experience with materials science research, biomedical research, and ML research, we believe these models can be multidisciplinary because AI workflow information is described in similar ways across disciplines (i.e., there are only so many ways one can report model names, training processes, and performance metrics). The work in this objective will include the following tasks.

Task 2.1: Define all information to be extracted and collect necessary datasets for training extraction models. Because any work in information extraction depends on labeled training data, our first goal is to identify existing datasets and to label new data where existing datasets are unavailable or limited to a single discipline. First, we will manually curate examples from several disciplines to capture as many different cases as possible. For example, considering publications such as Kurc et al. [4] and Fernandez et al. [5], which represent two disciplines (biomedical research and nuclear research), we will manually annotate these publications to understand the different ways of referencing each type of information being extracted (e.g., URLs, citation references, and in-text mentions for datasets). The goal of this task is to understand where existing datasets, such as those of Augenstein et al. [25] and Choi et al. [27], can be applied and in which cases new datasets must be labeled. We expect the extraction of information about which models were applied to require new labeled data; to the best of our knowledge, no labeled datasets have been released in this area. As the final step, we will create new labeled datasets as needed while leveraging methods such as active learning [28] to speed up annotation.
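As a minimal sketch of how active learning could prioritize annotation in Task 2.1, the following uncertainty-sampling loop (one common active learning strategy, shown here with an illustrative sentence-level "dataset mention" classifier) selects the unlabeled sentences the current model is least certain about; all texts and labels are placeholders.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Hypothetical pools: a few labeled sentences and an unlabeled pool drawn from
# the publications to be annotated. Labels mark whether a sentence mentions a
# dataset (1) or not (0); the examples are illustrative only.
labeled_texts = ["The datasets were obtained from TCGA and TCIA repositories.",
                 "We thank the reviewers for their comments."]
labels = [1, 0]
unlabeled_texts = ["Models were pre-trained on the MSCOCO dataset.",
                   "The conference was held in Florence, Italy.",
                   "Accuracy was measured on the challenge dataset."]

vectorizer = TfidfVectorizer()
X_labeled = vectorizer.fit_transform(labeled_texts)
X_unlabeled = vectorizer.transform(unlabeled_texts)

clf = LogisticRegression().fit(X_labeled, labels)

# Uncertainty sampling: ask annotators about the sentences the current model is
# least sure about, so each labeling round yields the most new information.
probs = clf.predict_proba(X_unlabeled)[:, 1]
uncertainty = 1.0 - np.abs(probs - 0.5) * 2.0
query_order = np.argsort(-uncertainty)
for idx in query_order[:2]:
    print(f"annotate next: {unlabeled_texts[idx]!r} (uncertainty={uncertainty[idx]:.2f})")
```

In practice the newly annotated sentences would be added to the labeled pool and the model retrained, repeating until annotation effort is exhausted or performance plateaus.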
Task 2.2: Develop models for extracting information regarding datasets, tasks, models, metrics, and performance. Using the datasets from Task 2.1, we will develop a collection of models for extracting datasets, tasks, models, metrics, and performance scores from scientific articles, as illustrated in Figure 1. We will leverage our existing work on similarity-based extraction [9], vision-based extraction [10], and character-based extraction models [11], along with existing semi-supervised approaches [32] and transformer-based models [33], which have demonstrated promising results on related tasks [32, 33, 34]. We will also extract all information necessary to form the various graph structures (e.g., authorship, citation, and semantic graphs) that are inherently embedded in the data.
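A minimal sketch of how a transformer-based model [33] could surface candidate mentions for Task 2.2 is shown below; the checkpoint path is a placeholder for a model fine-tuned on the Task 2.1 annotations, and the label set (e.g., DATASET, TASK, MODEL, METRIC) is an assumption for illustration.

```python
from transformers import pipeline

# Sketch of mention extraction with a transformer token-classification model.
# The checkpoint name is a placeholder for a model fine-tuned on the Task 2.1
# annotations with labels such as DATASET, TASK, MODEL, and METRIC.
extractor = pipeline(
    "token-classification",
    model="path/to/ai-workflow-ner-checkpoint",  # hypothetical fine-tuned model
    aggregation_strategy="simple",               # merge word pieces into spans
)

sentence = ("The method implements an application of the Mask-RCNN network "
            "with a novel MASK-NMS module, pre-trained on the MSCOCO dataset.")
for mention in extractor(sentence):
    print(mention["entity_group"], mention["word"], round(mention["score"], 2))
```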

Objective 3: Develop visual analytics for exploring and understanding relationships between the data and models mentioned in scientific literature. We will develop a web-based visual analytics interface that enables users to interpret the output of the automated algorithms developed under Objectives 1 and 2. This interface will leverage decades of text visualization research [6–8] to provide visual representations of the complex graph-based associations between the literature and the datasets and models it describes. It will provide new, intuitive visualizations that show collaboration networks between documents and related entities (e.g., topics, domains, institutions) found in the source corpus. Rather than simply showing force-directed network diagrams, which are prone to interpretation issues, these visualizations will use the underlying graphs (e.g., authorship, citation, and semantic graphs) to drive intuitive charts (e.g., bar charts, line graphs, multivariate views) that are perceptually optimal based on previous human-subject studies in the data visualization domain. The motivation for using intuitive graphics stems from the visual analytics philosophy of system development, whereby multiple linked views and interactions that are easily decoded are preferred over complicated visualizations that are difficult to read. A key feature of this objective will be integrating a level-of-detail technique into these visualizations that begins with a high-level overview and allows users to gradually increase the granularity of the information displayed as they “drill down” into the data. These visualizations will be highly interactive and will allow users to find, access, and reuse various levels of detail on demand using direct (i.e., accessed from within the visualization) and indirect (i.e., accessed from linked user interface widgets) techniques. Interactions will link multiple views of the data to allow users to explore new hypotheses and interpret automated analytical results. The work in this objective will include the following tasks.

Task 3.1: Develop visual representations of networks and summary information derived in Objectives 1 and 2. We will develop multiscale visual representations of the graphs and other summarized information produced in Objectives 1 and 2. This task will involve mapping the data to graphical encodings to generate informative displays that reveal key information about the literature under investigation. The focus will be on developing intuitive graphics that users can interpret effectively. Specific encodings for multivariate information (e.g., model hyperparameters) and network connections (e.g., author and citation networks) will be employed and connected in coordinated views. These visualizations will also support drill-down/roll-up analysis, whereby the views adapt the granularity of the information shown through clustering and statistical summaries. These views will be developed as web-based techniques and deployed as a web application.

Task 3.2: Develop interactive techniques to query, filter, and explore the data visualizations. The data visualizations developed in Task 3.1 will be interactive to support dynamic data exploration, queries, and filtering.
These interaction techniques will be directly embedded in the visualizations (e.g., clicking and pointing directly in the visualization images) and indirectly connected to user interface components (e.g., sliders, menus, dialogs). In addition to allowing users to ask questions of the data and probe the resulting literature, these interactions will link multiple views of the data so that a change in one view is propagated to the other views.
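As a small illustration of driving intuitive charts from an underlying graph rather than drawing a force-directed diagram, the sketch below aggregates a toy citation-style graph into counts that a bar chart in the web interface could encode directly; all node names are placeholders.

```python
from collections import Counter
import networkx as nx

# Toy graph: edges point from a citing publication to a dataset or model it
# mentions; node attributes carry the entity type. All names are placeholders.
g = nx.DiGraph()
g.add_node("TCGA", kind="dataset")
g.add_node("Mask-RCNN", kind="model")
g.add_edge("pub:example_2020", "TCGA")
g.add_edge("pub:example_2020", "Mask-RCNN")
g.add_edge("pub:example_2021", "TCGA")

# Summarize the graph into per-entity usage counts suitable for a bar chart,
# instead of rendering the raw network as a node-link diagram.
usage = Counter(
    target for _, target in g.edges()
    if g.nodes[target].get("kind") in {"dataset", "model"}
)
for entity, count in usage.most_common():
    print(f"{entity}: referenced in {count} publications")
```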


The proposed visual analytics, combined with the information extracted in Objectives 1 and 2, will leverage the vast amount of published work that already exists; provide the information necessary to advance AI understanding; offer new insights into applications of AI techniques in different science domains; and provide an environment in which novel approaches to AI can be explored.


APPENDIX 3: REFERENCES

[1] “Papers With Code.” [Online]. Available: https://paperswithcode.com/. [Accessed 3 May 2020].
[2] “Benchmarking Every Open Source Model.” [Online]. Available: https://www.sotabench.com/. [Accessed 3 May 2020].
[3] Y. Hou, C. Jochim, M. Gleize, F. Bonin, and D. Ganguly. “Identification of Tasks, Datasets, Evaluation Metrics, and Numeric Scores for Scientific Leaderboards Construction,” in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 2019.
[4] T. Kurc, S. Bakas, X. Ren, A. Bagari, A. Momeni, Y. Huang, L. Zhang, A. Kumar, M. Thibault, Q. Qi, Q. Wang, A. Kori, O. Gevaert, Y. Zhang, D. Shen, M. Khened, X. Ding, et al. “Segmentation and Classification in Digital Pathology for Glioma Research: Challenges and Deep Learning Approaches,” Frontiers in Neuroscience, vol. 14, no. 27, 2020.
[5] M. G. Fernandez, A. Tokuhiro, K. Welter, and Q. Wu. “Nuclear Energy System’s Behavior and Decision Making Using Machine Learning,” Nuclear Engineering and Design, vol. 324, no. 1, pp. 27–34, 2017.
[6] C. Steed, C. Symons, F. DeNap, and T. Potok. “Guided Text Analysis Using Adaptive Visual Analytics,” in Proceedings of the Visualization and Data Analysis Conference, pp. 61–74, January 2012.
[7] C. Steed, M. Drouhard, J. Beaver, J. Pyle, and P. Bogen II. “Matisse: A Visual Analytics System for Exploring Emotion Trends in Social Media Text Streams,” in Proceedings of the IEEE International Conference on Big Data (IEEE Big Data 2015), pp. 807–814, October 2015.
[8] M. Elmore, J. Reed, T. Potok, and R. Patton. “Real-Time Document Cluster Analysis for Dynamic Data Sets,” 2005. [Online]. Available: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.109.1626&rep=rep1&type=pdf. [Accessed May 2020].
[9] D. Herrmannova, S. Young, R. Patton, C. Stahl, N. Kleinstreuer, and M. Wolfe. “Unsupervised Identification of Study Descriptors in Toxicology Research: An Experimental Study,” in Proceedings of the Ninth International Workshop on Health Text Mining and Information Analysis, pp. 71–82, 2018.
[10] C. Stahl, S. Young, D. Herrmannova, R. Patton, and J. Wells. “DeepPDF: A Deep Learning Approach to Extracting Text from PDFs,” in 7th International Workshop on Mining Scientific Publications, Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Miyazaki, Japan, 2018.
[11] S. Young, A. Maksov, M. Ziatdinov, Y. Cao, M. Burch, J. Balachandran, L. Li, et al. “Data Mining for Better Material Synthesis: The Case of Pulsed Laser Deposition of Complex Oxides,” Journal of Applied Physics, vol. 123, no. 11, p. 115303, 2018.
[12] “Web of Science.” [Online]. Available: https://www.webofknowledge.com/. [Accessed 3 May 2020].
[13] “Scopus.” [Online]. Available: https://www.scopus.com/. [Accessed 3 May 2020].
[14] “PubMed.” [Online]. Available: https://www.ncbi.nlm.nih.gov/pubmed/. [Accessed 3 May 2020].
[15] “CiteseerX.” [Online]. Available: https://citeseerx.ist.psu.edu/. [Accessed 3 May 2020].
[16] “arXiv.” [Online]. Available: https://arxiv.org/. [Accessed 3 May 2020].
[17] “bioRxiv.” [Online]. Available: https://www.biorxiv.org/. [Accessed 3 May 2020].
[18] “CORE.” [Online]. Available: https://core.ac.uk/. [Accessed 3 May 2020].
[19] “Microsoft Academic Graph.” [Online]. Available: https://www.microsoft.com/en-us/research/project/microsoft-academic-graph/. [Accessed 3 May 2020].
[20] “ElasticSearch.” [Online]. Available: https://www.elastic.co/. [Accessed 3 May 2020].
[21] R. Patton, T. Potok, and B. Worley. “Discovery and Refinement of Scientific Information via a Recommender System,” in Proceedings of the Second International Conference on Advanced Communications and Computation (INFOCOMP 2012), Venice, Italy, 2012.
[22] S. R. Young, D. C. Rose, T. Johnston, W. T. Heller, T. P. Karnowski, T. E. Potok, R. M. Patton, G. Perdue, and J. Miller. “Evolving Deep Networks Using HPC,” in Proceedings of the Machine Learning on HPC Environments, ACM, 2017.
[23] R. M. Patton, J. T. Johnston, S. R. Young, C. D. Schuman, D. D. March, T. E. Potok, D. C. Rose, S. H. Lim, T. P. Karnowski, M. A. Ziatdinov, and S. V. Kalinin. “167-PFlops Deep Learning for Electron Microscopy: From Learning Physics to Atomic Manipulation,” in Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis (SC’18), IEEE, 2018.
[24] R. M. Patton, J. T. Johnston, S. R. Young, C. D. Schuman, T. E. Potok, D. C. Rose, S.-H. Lim, J. Chae, L. Hou, S. Abousamra, D. Samaras, and J. Saltz. “Exascale Deep Learning to Accelerate Cancer Research,” in 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, California, 2019.
[25] I. Augenstein, M. Das, S. Riedel, L. Vikraman, and A. McCallum. “SemEval 2017 Task 10: ScienceIE - Extracting Keyphrases and Relations from Scientific Publications,” in Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), 2017.
[26] Y. Luan, L. He, M. Ostendorf, and H. Hajishirzi. “Multi-Task Identification of Entities, Relations, and Coreference for Scientific Knowledge Graph Construction,” in Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018.
[27] E. Choi, M. Horvat, J. May, K. Knight, and D. Marcu. “Extracting Structured Scholarly Information from the Machine Translation Literature,” in Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), 2016.
[28] M. Lu, S. Bangalore, G. Cormode, M. Hadjieleftheriou, and D. Srivastava. “A Dataset Search Engine for the Research Document Corpus,” in 2012 IEEE 28th International Conference on Data Engineering, 2012.
[29] P. Lopez. “GROBID: Combining Automatic Bibliographic Data Recognition and Term Extraction for Scholarship Publications,” in Research and Advanced Technology for Digital Libraries, 2009.
[30] N. Siegel, N. Lourie, R. Power, and W. Ammar. “Extracting Scientific Figures with Distantly Supervised Neural Networks,” in Proceedings of the 18th ACM/IEEE Joint Conference on Digital Libraries, 2018.
[31] C. A. Clark and S. Divvala. “Looking Beyond Text: Extracting Figures, Tables and Captions from Computer Science Papers,” in Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
[32] S. Mishra and J. Diesner. “Semi-Supervised Named Entity Recognition in Noisy-Text,” in Proceedings of the 2nd Workshop on Noisy User-Generated Text (WNUT), pp. 203–212, 2016.
[33] K. Hakala and S. Pyysalo. “Biomedical Named Entity Recognition with Multilingual BERT,” in Proceedings of the 5th Workshop on BioNLP Open Shared Tasks, pp. 56–61, 2019.
[34] V. Yadav and S. Bethard. “A Survey on Recent Advances in Named Entity Recognition from Deep Learning Models,” in Proceedings of the 27th International Conference on Computational Linguistics, pp. 2145–2158, 2018.