29
University of Groningen Genomics-based discovery and engineering of biocatalysts for conversion of amines Heberling, Matthew Michael IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record Publication date: 2017 Link to publication in University of Groningen/UMCG research database Citation for published version (APA): Heberling, M. M. (2017). Genomics-based discovery and engineering of biocatalysts for conversion of amines. [Groningen]: University of Groningen. Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum. Download date: 08-08-2020

University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

University of Groningen

Genomics-based discovery and engineering of biocatalysts for conversion of aminesHeberling, Matthew Michael

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite fromit. Please check the document version below.

Document VersionPublisher's PDF, also known as Version of record

Publication date:2017

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):Heberling, M. M. (2017). Genomics-based discovery and engineering of biocatalysts for conversion ofamines. [Groningen]: University of Groningen.

CopyrightOther than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of theauthor(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policyIf you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediatelyand investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons thenumber of authors shown on this cover page is limited to 10 maximum.

Download date: 08-08-2020

Page 2: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 9PDF page: 9PDF page: 9PDF page: 9

9

1Introduction

Matthew M. Heberling1 and Dick B. Janssen2

1 Peptone – The Protein Intelligence Company, Hullenbergweg 280, 1101BV Amsterdam, The Netherlands2 Biotransformation and Biocatalysis, Groningen Biomolecular Sciences and Biotechnology Institute, University of Groningen, Nijenborgh 4, 9747 AG Groningen, The Netherlands

Page 3: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 10PDF page: 10PDF page: 10PDF page: 10

10 | Chapter 1

Enzyme biocatalysis and its path to industrial stardom

Abstract

Research in industrial biotechnology has delivered impressive biocatalysts as pivotal green chemistry solutions for the pharma, energy, and chemical commodities sectors. Next-generation DNA sequencing technologies enriched enzyme discovery methods with vast genomics data from microbial communities and single genomes. This review summarizes discovery methods like soil enrichments, metagenomics, and bioinformatics mining with a general focus on transaminases for their industrial prowess. Improvements enabling industrial enzymatic applications through protein engineering and directed evolution are briefly highlighted. The causes of a costly imbalance between collecting genomics data and knowledge generation are also discussed. Once this imbalance is overcome, we will be on the verge of unmasking exquisite commercial potential of sequence databases and go beyond the proverbial ‘tip of the iceberg’ for biocatalysts.

Page 4: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 11PDF page: 11PDF page: 11PDF page: 11

11Introduction |

11 | Introduction

As the world population continues to increase alongside an ever-increasing demand for natural resources and production goods to sustain Humankind’s way of living, the environmental footprint of humans on Earth has become a global concern [1]. Suggestions and solutions exist that address many environmental issues [2]. Unfortunately, putting these solutions into action at the industrial level usually requires relentless governmental lobbying with cohesive industrial support, which is most often dictated and hindered by capitalism [3]. Regardless, many governments and industries have taken notice of our environmental dilemmas and have begun to enforce eco-friendly routes toward the production of consumer goods, drug therapies, and alternative fuels. Consequently, industries are slowly implementing sustainable or ‘green’ processes that still meet market demands, while concurrently saving energy and reducing, or even preventing, hazardous waste. Although in mere infancy as far as its full economic potential, biocatalysis has proved to be an effective tool for implementing sustainable/greener processes in the production of certain textiles, foods, and chemicals [4,5]. In recent decades, biological research has made tremendous contributions to the discovery of novel biocatalysts that are capable of unconventional chemical conversions and waste reductions [6-8]. These conversions include the degradation of toxic chemical waste and synthesis of exceptionally pure compounds to be used as pharmaceuticals. The source of such novel biocatalysts and microorganisms can be found in our natural environment. Although a single unperturbed soil sample may contain up to 104 different microbial species, only ~1% of these microorganisms are cultivable [9,10]. Fortunately, sophisticated culture-independent technologies are available that tap into the natural biodiversity (metagenome) for the discovery of new biocatalysts. However, exploiting the metagenome to bring biocatalysts to the market requires keen collaborations between many scientific disciplines and industrial partners. An emerging powerful tool that complements and can even replace metagenome-based discovery is genomics-based research that has rapidly developed into a field of its own. The eminent role of genomics in the discovery of new genes is due to not only the advent of high-throughput DNA sequencing technologies, but also to the impressive computer algorithms that underlie the bioinformatics tools, enabling massive processing of an already inundated genome sequence data bank. However, we are far from annotating most unidentified genes in public databases because of a major disparity between bioinformatics and genomics. This disparity is simply explained by the fact that biochemists cannot experimentally verify all gene functions (functional genomics), especially if the gene product has a novel function and/or amino acid sequence that cannot be implicated in any pathway. Nonetheless, the combination of genomics and bioinformatics continually proves to be a useful and efficient approach to discover genes encoding novel enzyme biocatalysts.

Page 5: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 12PDF page: 12PDF page: 12PDF page: 12

12 | Chapter 1

2 | Biocatalysis

Enzymes and microorganisms have been used for centuries as biocatalysts to generate food products and in the manufacturing of commodities, such as linen and leather. Until the end of the 19th century, the use of the biocatalysts had only limited applications since they could not be used in any well-characterized or pure form. Owing to the rapid evolution of fermentation processes and biotechnology after the late 19th century, the industrial prevalence of biocatalysis has increased immensely. Regardless of whether the isolated enzymes or whole cells are used, the catalytic properties of the biocatalysts often make them far superior to the traditional chemical catalysts. For example, biocatalysts can not only produce orders of magnitude higher reaction rate enhancements compared with chemical catalysts; biocatalysts also operate efficiently under mild conditions, have a high reaction specificity to give exceptional product purity, they are biodegradable, and they can be regenerated with a relatively benign impact on the environment. Such defining properties of biocatalysts translate into less chemical waste and energy input, higher quality of products, as well as more economical, safer, and eco-friendly production routes [4,6,11]. These attributes have propelled biocatalysts as the most obvious and promising solution for relevant, sustainable processes. This has brought them at the forefront of industrial biotechnology, or white biotechnology, driving initiatives that aim to diminish the environmental footprint of industries. For instance, in less than a decade after pioneer discoveries in enzyme optimization technologies [6], biocatalysis has emerged as a standard technology in the fine chemicals industry with 134 applications by 2002 [12]. Since then, further technological advances put biocatalytic applications in pharmaceutical manufacturing on the rise [6,7].

3 | White biotechnology

Biotechnology has found its application in various sectors, such as in the fields of medicine (red biotechnology) and agriculture (green biotechnology). White biotechnology (WB) is regarded as the third wave of biotechnology that applies modern biotechnology, using enzymes and/or living cells, to sustainably produce materials, chemicals, and fuels [13]. WB was coined by the European Association for Bioindustries (EuropaBio) in a 2003 report, which provides case studies of pioneering companies in WB in order to demonstrate its potential as a key to sustainable production processes [14]. However, implementing sustainable processes like industrial biotechnology is a far cry from meeting its full global market potential due to the lack of governmental mandates. This notion is typified in an interview by Stephan Herrera with the vice president of IDC Life Sciences (Framingham, MA/U.S.A.), Jim Golden, who comments on implementing industrial biotechnology:

There is no question that from a scientific standpoint, industrial biotech has a great story to tell, … But the commercial issues are the killers. The world’s industrial complex wants to change, needs to change, but sees change as very expensive and risky. I don’t see industry investing in industrial biotechnology until government regulations force them to [3].

Page 6: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 13PDF page: 13PDF page: 13PDF page: 13

13Introduction |

1Golden painted an accurate picture of the WB landscape in developed nations. This is not to say that societal support does not exist in leading industrialized nations to implement WB, but there are major obstacles and differences in the approach – for instance, between the USA and the EU [15]. The US implemented a top-down approach, initiated by the federal government in 2000, to promote WB practices by adopting the Biomass Research and Development Act of 2000 [16]. As energy was declared a national security issue, the Act is premised on R&D for alternative energy from biomass feedstocks to relieve their dependence on foreign petroleum. The ensuing multi-agency advisory committee generated a Vision plan that challenged industry and policy makers to create and procure growth for bio-based products in various areas; such as feedstock, inexpensive sugars, conversion of biomass into resourceful building blocks, and integrated biorefineries. Despite the shale oil boom in the U.S. from 2010–2014 that, indeed, made the U.S. the world’s largest oil producer and independent of foreign oil, the most recent Act amendments (Agricultural Act of 2014) still encourage biofuel development through enhanced subsidies and incentives [17-19]. The European initiative for a bio-based revolution followed a bottom-up approach that stemmed from global industry leaders in chemical production, such as Evonik Industries (fka Degussa Corp.), DSM, and BASF. With their competiveness in the chemical industry in jeopardy due to developing nations taking part (e.g., EU’s global market share slashed by almost half between 2005 and 2015), European chemical companies sought to boost their market share by increasing innovation and lower production costs [20,21]. Reducing the energy input and costs were prime targets. The workhorse behind this initiative is the European Technology Platform for Sustainable Chemistry (SusChem), which was jointly launched by the European Chemical Industry Council, the European Commission, and The European Association for Bioindustries (EuropaBio) in 2004 [20]. The goals of SusChem relate to eco-friendly routes towards chemical synthesis, positive societal impressions of chemical companies, novel molecules with multi-faceted use, and competitiveness [20]. Implementing such program required strategic and effective collaborations between academia and industry. The international BE-Basic Foundation is an example of such an effort to put the SusChem goals into action by creating industrial-academic collaborations to discover and apply new biotechnologies for sustainable processes [22]. Also in 2004, the Swiss Industrial Biocatalysis Consortium (SIBC), made up of several major industry players, aimed to elevate the industrial potential of biocatalysis by addressing the general lack of broad panels of enzyme-catalyzed reaction platforms (“enzyme toolboxes”) [23]. By invigorating enzyme discovery to enable new types of chemistry, the SIBC established approaches to create enzyme toolboxes that could compete with synthetic organic chemistry. In 2013, the largest EU public-private partnership to promote green chemistry in the pharma industry was achieved with the CHEM21 program that involved 13 European universities and 6 companies [11]. As a result and 12 years after the birth of SusChem, Europe has harnessed the power of WB by producing more than 50% of the world’s enzymes and microorganisms supply and bridging the gap between public and private support to advance green chemistry in the pharma and chemical industries [24]. Overall, the dedicated financial support by the federal government coupled with the massively productive agricultural sector should propel the US as a world-leader in biofuel research and development. Recent legislation upholding their bio-based efforts despite a bolstering oil shale boom paints a promising future for WB in the US [17,19]. The EU approach places less priority on this

Page 7: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 14PDF page: 14PDF page: 14PDF page: 14

14 | Chapter 1

research area for sustainable economies due to the geopolitics of individual mobility and focuses more on sustainable processes in the chemical industry to mitigate the environmental impact [15]. Regardless of any industrialized nation’s aspirations and contributions to a sustainable society, the traditional gatekeeper to success has been the price of gas and oil. So long as our global economy has not surpassed the critical threshold of key commodities, governmental mandates and financial support will play a crucial role in showcasing the power of WB. However, the current and 2008 oil price dives have broken the interdependence between oil price and overall market fluctuations since demand has remained low even during low oil prices [25]. One new factor affecting this demand slump is the emerging global switch away from oil, which is certainly encouraging for the WB sector and our environment [18].

4 | Green biocatalysis in industry

Inspired by the atom economy principle to evaluate efficiency of chemical processes, Anastas and Warner laid the path for green chemistry by devising the 12 Principles of Green Chemistry that have guided chemical and pharmaceutical companies to implement sustainable processes and generate green products [26,27]. For instance, the ACS Green Chemistry Institute (GCI) and several global pharmaceutical companies formed the ACS GCI Pharmaceutical Roundtable to encourage innovation while integrating green principles [8]. Shortly after this establishment, four other roundtables were formed, including one for chemical manufacturers and a forum recently established that aims to enable the bio-based and chemicals renewable economy [28]. A roundtable for hydraulic fracturing was also formed as a direct response to the oil shale boom in the U.S. These accomplishments show a strong commitment to reducing environmental impact in chemical processes that reverberates across many industrial sectors. While pure chemical processes like multiple component reactions (MCRs) are emerging as green chemistry options, the biocatalytic complement to MCRs, such as multi-enzymatic in vitro catalysis, is gaining traction [29,30]. Thus, combining or even replacing traditional chemical processes with enzymatic alternatives is becoming a pivotal industrial move towards greener synthetic routes [4,5,23]. The environmental factor (E factor) has served as the guiding metric, which is the ratio of waste over product [31,32]. The initial adoption of green biocatalysis in industry was led by the fine chemical industry (E factor range of 5–50), but the alarming E factor range (25–100) hindering the pharmaceutical industry have prompted impressive biocatalytic applications in this sector [4]. For instance, Codexis (U.S.A.) reduced total waste by 19% using an enzymatic route towards the diabetic drug, Januvia [33], and Pfizer (U.S.A.) reduced total waste by 90% after incorporating an enzymatic step in the synthesis of the antidepressant drug reboxetine (trade name Edromax) [34]. Codexis also made an impact in green biocatalysis with an E factor of 5.8 for a chemoenzymatic route towards atorvastatin (Lipitor), which is a former global top-selling pharmaceutical [35,36]. Among other impressive biocatalytic applications, Wong and co-workers showcased deoxy ribose aldolase (DERA) enzymes as a viable biosynthetic tool towards a statin side-chain of the cholesterol-lowering drug rosuvastatin (Crestor) – one of the top 5 blockbuster drugs [4,37-40]. Follow-up engineering

Page 8: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 15PDF page: 15PDF page: 15PDF page: 15

15Introduction |

1studies with DERA at DSM Pharma Chemicals (Netherlands) improved its industrial fitness while maintaining the single-step formation of two C–C bonds and chiral centers that trumped prior multi-step commercial processes for other statin drugs [35,36,41]. Sandoz (Germany) recently patented a multistep one-pot chemoenzymatic route towards rosuvastatin that lacks intermediate isolation [42,43]. While these examples demonstrate the industrial capacity of green biocatalysis, its full potential will not be realized until the ‘biocatalytic retrosynthesis’ mindset proposed by Turner and O’Reilly becomes second-nature to process chemists [44]. Such mindset will benefit generic pharmaceuticals as expiring patents on blockbuster drugs will trigger cost-effective synthetic processes, as seen in the Sandoz example [42], but also for second-generation manufacturing processes by brand manufacturers to enhance sustainability and profitability [44]. Codexis vice-president of R&D, Jim Lalonde, notes in a recent interview that pharma companies realize how green biocatalytic alternatives could still benefit ~25–75% of their pipelines [45]. With such a huge market potential for green biocatalysts in pharma alone, even more impressive is its staggering €30 billion impact on the EU economy via Industrial Biotech applications, which has created 0.5 million jobs with a projected growth to 0.9–1.5 million jobs and a €100 billion economic impact by 2030 [46]. So, how do enzymes find their way into practical applications and why the commercial hype?

4.1 | Advantages of biocatalysts For the remainder of this review, ‘biocatalyst’ will refer to isolated enzymes, unless noted otherwise. Although green chemistry is finding its way into chemocatalytic processes, major challenges still exist; like the avoidance of toxic metals, solvents, and high waste volumes [47-50]. Biocatalytic applications overcome many of these challenges through relatively benign process conditions, usually better reaction rates, and (enantio)selectivities, including major advances in controlling chirality [51]. Enhanced atom economy (efficiency of atoms from reactants incorporated into products) is another key driver for green biocatalytic processes [27,47]. E-factors help to quantify the actual waste reduction, which is a major component in judging the overall environmental impact of the green process [31,32]. The complete lack of protection groups on reactants has also made biocatalysts key components in large-scale production of many synthetic and natural industrial compounds [52]. The key features of biocatalysts have even inspired chemists to mimic enzyme active sites in their own macromolecular complexes, some of which exceeded the functionalities of the original active sites they were molded after [53].

4.2 | Challenges of enzyme biocatalyst applications The attractively high specificity of enzymes traditionally comes at a cost with limited substrate scopes [54]. However, two general solutions have emerged to side-step this shortcoming. Nature’s solution is the evolved promiscuous enzyme function achieved through malleable substrate scopes and mechanisms [55-61]. Natural enzyme promiscuity appears to be a remnant of ancestral activities that renders many enzymes with inherent catalytic versatility [57,60,62]. As a second solution, directed evolution and engineering of enzymes in the lab have either introduced de novo promiscuous functions or improved otherwise meager promiscuous activities [58,63]. One study demonstrated

Page 9: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 16PDF page: 16PDF page: 16PDF page: 16

16 | Chapter 1

impressive commercial potential of the berberine bridge enzyme for non-natural product synthesis by extracting maximal value from its native substrate promiscuity without laboratory evolution or engineering efforts [64,65]. Only reaction conditions were tuned for the desired conversions. Clearly, high enzyme specificity is being progressively mitigated through studies on promiscuous activities. Other factors that affect the potential of biocatalytic applications are thermostability, activity, reaction scopes, cost, and reusability [6,23,66]. Considering the aforementioned commercial determinants of biocatalysts, it has become clear that a delicate balance among these is crucial to enable the most economically viable route of application [7]. This balance culminates into an overarching factor, Volume-Time-Output (VTO), which was determined to be the most influential driver of good chemical manufacturing processes by Boehringer Ingelheim [7,67]. From the VTO metric, it was determined that minimal product concentrations generated from typical low molecular weight substrates in a single reactor process should be in the range of 100–250 mM [7,67]. Whereas this is within the inhibitory concentration range for many enzymes, directed evolution and engineering approaches have overcome this hurdle by innovative scientific approaches [33,36]. Nevertheless, novel biocatalysts are still in high demand that can catalyze a plethora of reactions [5,23,68].

5 | Finding the ideal industrial biocatalyst

In a review over enzyme use in organic synthesis and the life sciences, an industrial biotech consortium stressed the importance of the “toolbox” feature of commercial enzyme applications [23]. The toolbox characteristic refers to groups of enzymes similar in reaction type, but with versatile substrate scopes and selectivities that may surpass or equal that of traditional chemical catalysts [23,69]. The consortium further outlined approaches to fully embrace the potential of enzymes in industrial biotech, which include tapping into underdeveloped enzyme classes using meaningful enzyme test kits, enabling ready-to-use stable enzymes spanning all classes, unleashing functional potential in sequence databases, scale-up technology advances, and effective collaboration models to accelerate access to biocatalysts. Acquiring a toolbox of enzymes is only one aspect of finding the “ideal” biocatalyst. Matching biocatalysts with bioprocess conditions is the other critical aspect. With the help of directed evolution and enzyme engineering, a paradigm shift emerged from a non-ideal to an ideal fit between bioprocess conditions and the biocatalyst [71]. As seen in Figure 1a, the “non-ideal” situation of fitting a process to the biocatalyst has shifted to tuning the biocatalyst to fit the bioprocess in the current “ideal” situation. Manipulating characteristics of the enzyme (Figure 1b), such as specificity, activity, stability, and efficiency through directed evolution and engineering approaches augmented this ideal state in industrial biocatalysis [70]. Now that laboratory techniques can make biocatalysts fit the bioprocess, finding pools of suitable enzyme targets for engineering to replace the vast array of chemical processes is still the bottleneck. However, we may be at the verge of unlocking the treasures of sequence databases to extract the immense commercial value of biocatalysts that awaits. General methods that brought us to this point, along with selected enzyme discoveries that arose, are reviewed here. Transaminase (TA) highlights

Page 10: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 17PDF page: 17PDF page: 17PDF page: 17

17Introduction |

1will be the prime focus, due to their industrial prowess. Directed evolution examples are interlaced within the relevant discovery method. A thorough review of TAs by Guo and Berglund provides an exquisitely deep insight into TA research [72]. As complex as the TA field has become within only a decade, the review should serve as the ‘Holy-Grail’ for researchers in this field to gain clear oversight and improve impact of future works [72].

Figure 1 | The ideal biocatalyst. (a) Conventional laboratory methods enabled a paradigm shift to an ideal industrial bioprocess, whereby the biocatalyst is designed to meet the industrial parameters. (b) Biocatalyst feature profile to evaluate industrial fitness (adapted from Lorenz et al. [70].

5.1 | Bioinformatics: the gatekeeper to knowledge generation Just under a decade ago, the onset of next generation sequencing (NGS) technologies began to exponentially slice DNA sequencing costs much faster than the decline of computational costs (data storage, processing, downstream analysis, etc.) to accommodate the big data influx [73-77]. These misaligned cost trends have pinned bioinformatics as the key bottleneck to maximizing value of public sequence databases [78]. The fact that less 1% of the ~71 M sequences in the UniProtKB database have high-quality annotations underscores the magnitude of this bottleneck [79]. On the data acquisition side, the relatively faster NGS tech development compared to traditional developmental trends in other tech fields, as described by Moore’s law, exacerbated this disparity between data generation and knowledge generation. Our computational capacity simply cannot digest the big data generated by NGS tech in a scalable fashion. On the knowledge generation side, an inability to develop standardized, reliable automated pipelines to annotate predicted genes has made bioinformatics the addressable bottleneck in the post-2008 genomics era, since experimental validation is not feasible for all database sequences. In a 2009

Page 11: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 18PDF page: 18PDF page: 18PDF page: 18

18 | Chapter 1

study to gauge the accuracy of automated annotations, three of the major public protein sequence databases (GenBank NR, UniProtKB/TrEMBL, and KEGG) harbored misannotation rates between 5–63% across six enzyme superfamilies [80]. Of the total 37 families studied, 10 of them showed a staggering error of >80% in at least one of the databases. Unsurprisingly, the manually curated SwissProt database had an error rate near 0%. However, within this manually curated database that contains experimentally validated annotations, biases have apparently been introduced from high-throughput screening results, which may not be reporting prominent or native functions [81]. Despite that relatively more sophisticated computational annotation methods are in place now, the prevalent homology-based annotation was the key driver of error propagation in databases during the first decade of the 2000s [80]. Thus, researchers had been largely addressing the multifaceted problem of defining a protein function with a single-prong solution [82]. In 2010, a major U.S. consortium, the Enzyme Function Initiative (EFI), was created to address these challenges of enzyme function predictions in bacterial genomes [83]. The EFI initiative employed a multidisciplinary approach to predict functions of uncharacterized enzymes spanning five functionally diverse superfamilies. The five-year project resulted in the annotation of 6,444 proteins among the original targeted superfamilies, and an additional 4,132 proteins annotated covering 3 additional superfamilies via community efforts as of November 2016 (http://kiemlicz.med.virginia.edu/efi/targets/index). To gauge the accuracy of protein prediction tools in 2013, the first Critical Assessment of protein Function Annotation algorithms (CAFA1) experiment evaluated the accuracy of 54 state of the art tools using 866 uncharacterized target proteins spanning 11 organisms [82]. Function predictions were then scored against community-based experimental validations. The results showed that, although second generation prediction tools show vast accuracy improvements compared to first generation tools (eg., BLAST), a large room for improvements remained. Subsequently, a second assessment in 2016 of prediction accuracy at a larger scale was performed in CAFA2, where 126 prediction methods were scored against 3,681 uncharacterized proteins [84]. Accuracy did improve since CAFA1 experiments, but individual usefulness and interpretation of results were context-dependent. Aside from misannotation blunders, redundancies have also infested databases. This was apparent after the highly-regarded UniProtKB database was shrunk in half after filtering out the redundancies in 2015 [85]. Clearly an effective digest of genomics big data is desperately needed to maximize output of public sequence databases. A major force in addressing this issue was recently formed by Intel Corp. and the Broad Institute of MIT and Harvard [86]. They aim to integrate diverse genomic datasets and empower analytics. Achieving this will transform the roadmap for enzyme discoveries via soil enrichments, metagenomics, bioinformatics mining, and protein engineering. Tremendous discoveries have still been made with these methods, which are highlighted in the following sections. Notable directed evolution and protein engineering efforts that elevated industrial potential of the highlighted enzymes are mentioned throughout.

5.1.1 | Bioinformatics mining for enzymes The onset of NGS tech transformed bioinformatics mining for enzymes as data grew exponentially and predictive bioinformatics tools developed [87,88]. The intrinsically higher throughput with this

Page 12: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 19PDF page: 19PDF page: 19PDF page: 19

19Introduction |

1approach dwarfs traditional ‘wet lab’ approaches like PCR and colony hybridization to identify new homologs [87]. As reviewed by Luo et al., two general approaches for bioinformatics mining have prevailed to fish out novel enzymes from databases: genome hunting and data mining [88]. Genome hunting, or genome mining, relates to “hunting” for enzymes confined to a single microorganism. The strains are usually selected by soil enrichments or from strain collections after filtering with target reactions. In silico homology-based screening is often used to retrieve genes of interest after the genome is sequenced. PCR-based screening or functional screening of DNA libraries are also commonly applied. With genome hunting today, laborious genomic libraries can usually be avoided. Data mining refers to construction of homology-based selection and alignments of sequences from public databases to discover enzymes of interest. While this approach covers much larger amounts of sequence space than genome hunting, diligence in choosing curated input sequences is crucial for high-quality output. The following examples highlight discoveries of amine-converting enzymes using bioinformatics mining. This group of enzymes is considered because approximately 75% of the pharmaceuticals contain a chiral amine group [89]. Furthermore, reductive amination of ketones for chiral amine production is one of the most challenging enzymatic reactions [90]. Thus, genome and data mining have become an invaluable discovery method for novel aminating enzymes; especially amine transaminases (ATA) [91], which are mostly fold-type I, class II pyridoxal-5’-phosphate (PLP)-dependent enzymes. Although ATAs are also referred to as class III members, the original classification established by Mehta et al. and reviewed by Schiroli and Peracchi is used here [92,93]. An example of the data mining approach was described by Höhne et al., who discovered rare (R)-ATAs transpiring from fold-type IV PLP enzymes [94]. Using sequence-based prediction of substrate specificity and enantiopreference, 21 putative (R)-ATAs were mined out of 5,980 filtered sequences from the NCBI database that were mostly classified as branched-chain TAs. Out of these 21 candidates tested in the lab, 17 proved to be true (R)-ATAs (80% hit rate) and >50% proved to be useful biocatalysts. The streamlined approach avoided library screening altogether by implementing a rational design filter upfront for an in silico ‘smart library’ screening. Seven of these new (R)-ATAs were further evaluated for industrial potential, yielding a diverse ATA toolbox for synthesis of structurally diverse amines [95]. Another example of data mining for ATAs was reported by Park et al., who mined four ATAs from different microorganisms based on the query sequence of an ATA from Paracoccus denitrificans (PdATA) [96]. PdATA was previously shown to give a 91% synthetic yield of an unnatural amino acid (L-homoalanine, ee>99%) from a natural one (L-threonine) in a novel coupled enzyme reaction with a threonine deaminase [97]. Although the three characterized ATAs share diverse sequence identities (17–42%) with PdATA, all showed similar substrate specificities. For instance, all four ATAs showed similar activities towards amino donors, (S)-1-aminoindan and (S)-α-ethylbenzylamine, which were higher and lower than the activities with reference substrate (S)-α-MBA, respectively. Through homology modeling and substrate dockings, six active site residues were earmarked as the determinants of substrate specificity and demonstrated the prevalent disconnect between sequence and function among ATAs [91,96]. Another impressive example of data mining for amine enzymes was by Mayol et al. [90]. They recently discovered the first sequences of natural amine dehydrogenases (AmDH) that convert ketones with a carboxylic acid group beyond the β-position.

Page 13: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 20PDF page: 20PDF page: 20PDF page: 20

20 | Chapter 1

After searching for homologs of a known (2R,4S)-2,4-diaminopentanoate dehydrogenase from Clostridium sticklandii that acts on the γ-amino group, twenty-six candidate sequences were selected for characterization out of the 169 clustered protein sequences that were identified. Twenty of the selected protein sequences were successfully produced and six showed activity toward non-α/β keto acids, yielding a 28% hit-rate of the characterized candidates. After reaction optimization (Figure 2a), enzyme AmDH4 from the thermophilic bacterium Petrotoga mobilis emerged with industrial potential for sustainable synthesis of (4S)-4-aminopentanoic acid, a valuable therapeutic building block (Figure 2b) [98-100]. Although Mayol et al. deemed their method as ‘genome mining,’ it is classified here as ‘data mining,’ since more than one genome was mined in the process of selecting candidate sequences.

Figure 2 | Characterization of a novel amine dehydrogenase (AmDH4) mined from the genome of Petrotoga mobilis strain DSM 10674. Modified from Mayol et al. [90]. (a) Optimization for biocatalytic conversion of 4-oxopentanoic acid (4-OPA) to (S)-4-aminopentanoic acid ((S)-4-APA) using AmDH4 at various conditions. (b) Optimized biocatalytic synthesis of (S)-4-APA using an AmDH4/formate dehydrogenase (FDH) coupled system at semi-preparative scale. Final conditions and concentrations are indicated with small fonts and brackets, resp.

Page 14: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 21PDF page: 21PDF page: 21PDF page: 21

21Introduction |

1 Massive genome mining for ATAs was nicely demonstrated by Seo et al. [101]. This study extracted 10 putative ATA sequences from a Mesorhizobium loti strain, which contains 2–4 fold more TA genes than E. coli or Bacillus subtilis due to its nitrogen-fixation ability. All 10 sequences were verified as ATAs with broad specificities (Figure 3). Since the active sites were all predictively similar, the differing residues lining active site entrances were predicted to control specificity. This study underscored the importance of high-quality input query sequences to maximize hit rates. Cerioli et al. recently mined an ATA from the moderate halophilic bacterium, Halomonas elongata (HEWT) [102]. HEWT was discovered through a conserved domain, structural, and catalytic residue motif analyses.

Figure 3 | Substrate profile of 10 ω-TAs mined from genome of Mesorhizobium loti strain MAFF303099. Summary of work by Seo et al. [101]. The inset shows the reference reaction with (S)-α-methylbenzylamine (MBA). The distribution of relative activities to (S)-α-MBA among the ω-TAs are shown for each amine donor tested: A2, 1-phenylpropylamine; A3, phenylethylamine; A4, 3-phenylpropylamine; B1, ethylamine; B2, propylamine; B3, butylamine; B4, sec-butylamine; B5, sec-amylamine; C1, β-alanine; C2, 3-amino-n-butyric acid; D1, β-phenylalanine. Substrate numbering is consistent with original work.

Page 15: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 22PDF page: 22PDF page: 22PDF page: 22

22 | Chapter 1

High enantioselectivity, a broad substrate profile, and organic solvent tolerance confirmed HEWT’s strong potential for industrial synthesis of chiral amines. Wilding et al. recently mined 14 putative TAs from a Pseudomonas strain to rationalize its metabolism of a nylon-12 building block, 12-aminododecanoic acid (12-ADA) [103]. Of the 11 putative TAs that were successfully produced, nine showed TA activity and three converted 12-ADA. Follow-on work unveiled the impressive promiscuity harbored in the best-performing 12-ADA TA (KES23458), and confirmed its native function as a β-alanine transaminase [104]. This work highlights the importance of dormant substrate promiscuities encoded within enzymes and how it empowers microbial versatility.

5.2 | Soil enrichments Functional screening of novel enzymes usually transpires from two traditional experimental routes depicted in Figure 4. In the first route (Figure 4a), metagenomic DNA (metDNA) is used to a make a gene expression library that is screened for enzymes with target activities in a suitable expression host. Although metDNA library screening offers immense diversity by overcoming culture bias, this method most often leads to dead-ends when screening for non-abundant enzymes, and more so if pre-enrichment with target activities was not applied. The next section highlights discoveries with this method. Soil enrichments overcome the gene abundancy issue since the target activities are enriched for (Figure 4b). However, immense biodiversity is lost since ~99% of microbes are not cultivable [10]. Nevertheless, great biocatalytic discoveries have been made with soil enrichments. As a nice ‘wet lab’ complement to the aforementioned in silico screening for (R)-ATAs by Höhne et al., Pavkov-Keller et al. recently discovered two novel (R)-ATAs through soil enrichments [108]. Interestingly, the (R)-ATA sequence motif defined by Höhne et al. was missing in the novel enzymes. Thorough characterization placed them in a new subgroup within the fold IV family of PLP-enzymes, further confirming the diverse nature of this fold family. One differentiator of the new subgroup is its inability to catalyze the reverse of the reaction screened for (amination of ketones). While the bitter-sweet reality of “you get what you screen for” applies here, the new subgroup obviously would have never been discovered by screening the reverse reaction. Thus, as impressive as the Höhne et al. work is, Pavkov-Keller et al. showed how we are still at the proverbial ‘tip of the iceberg’ in grasping the true diversity that flourishes within enzyme families. The most studied (S)-ATA originates from the Vibrio fluvialis JS17 strain (VfATA), which was isolated 15 years ago by Shin et al. using soil enrichments (Figure 5) [102,105]. Although two other isolated strains showed (S)-ATA activity, the JS17 strain prevailed as the most promising biocatalyst for chiral amine production. Subsequent characterization of VfATA demonstrated its novelty among four other homologous ATAs by not exhibiting β-alanine activity and missing half of the 159 conserved residues contained within the homologs [106]. Directed evolution using random mutagenesis of VfATA led to a double mutant with product inhibition reduced by up to 6-fold relative to wild type and improved activities with long-chain alkyl amines (Figure 5) [107]. Interestingly, the double mutant arose after only one round of random mutagenesis. Site-saturation mutagenesis targeting the mutated positions in the double mutant did not yield improved variants. To date, VfATA has become one of the most broadly applied ATAs in chiral amine synthesis [72].

Page 16: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 23PDF page: 23PDF page: 23PDF page: 23

23Introduction |

1

Figure 4 | Workflows of biocatalyst discoveries. (a) Metagenomics library screening and (b) microbial screening methods with defined media growth selection utilizing various β-amino acids as sole nitrogen sources.

Shortly after the initial characterization of VfATA, Iwasaki et al. isolated two strains, Arthrobacter sp. KNK168 and Pseudomonas sp. KNK425, from soil enrichments selected for degradation of (R)- and (S)-3,4-dimethoxyamphetamine (DMA), respectively (Figure 6) [109]. DMA is an important synthetic intermediate in anti-diabetic and anti-obesity agents [109]. Characterization of the (R)-ATA

Page 17: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 24PDF page: 24PDF page: 24PDF page: 24

24 | Chapter 1

from the KNK168 strain (ArtATA) was reported nine years later, and the crystal structure determined soon thereafter by Guan et al. [110,111]. The latter reported 3D structures of ArtATA and the famed ATA-117-Rd11 from Codexis, Inc., which rationalized stereospecificity of (R)-ATAs and pinpointed an active-site loop as a key determinant of substrate specificity (Figure 6). Although it is not evaluated by Guan et al., the aforementioned (R)-ATA motif is present within these (R)-ATAs [94,108,111]. ATA-117-Rd11, derived from a homolog of ArtATA (one amino acid difference [111], was previously engineered by Codexis, Inc. for synthesis of the diabetic drug sitagliptin (Figure 6) [33].

Figure 5 | Discovery and directed evolution of VfATA. (1) Discovery through enrichment cultures (green flask) of Vf JS17 strain that was applied for kinetic resolution of rac-sec-butylamine to give (R)-enantiomer (Shin et al. [105]). The VfATA enabling the amine conversions by strain JS17 was later characterized by Shin et al. [106]. (2) Directed evolution of VfATA and attributes of the evolved mutant (Yun et al. [107]).

Page 18: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 25PDF page: 25PDF page: 25PDF page: 25

25Introduction |

1

Figure 6 | Discovery and directed evolution of ATAs for industrial synthesis of sitagliptin. (1) Two strains discovered by Iwasaki et al. [109] that synthesize enantiopure (R)- and (S)-3,4-dimethoxyamphetamine (DMA) from 3,4-dimethoxy-phenylacetone (DMPA). Characterization of the ArtATA enabling (R)-DMA conversion in Arthrobacter strain KNK168 was reported by Iwasaki et al. [110]. (2) Directed evolution of ArtATA homolog, ATA-117, was performed by Savile et al. [33] yielding the ATA-117-Rd11 mutant (‘R11’), which was applied for industrial synthesis of sitagliptin. Accumulation of the 27 mutations in the R11 mutant from each round are highlighted in different colors with indicated activity effects relative to the prior round mutant. Note that the crystal structure of the R11 mutant (PDB: 3WWJ) determined by Guan et al. [111] is represented in each evolution round since a structure of each mutant is not available. Green flask indicates liquid growth selection step.

Page 19: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 26PDF page: 26PDF page: 26PDF page: 26

26 | Chapter 1

5.3 | Metagenomics: the yellow brick road to biodiversity Microbial metagenomics (MG) is the study of microbial communities at the genome level to quantify and exploit biodiversity. From this, we can exploit the commercial potential of novel biocatalysts that affect disease, industry, and our environment [112]. The culture-independent route using MG surmounts the daunting fact that only 1% of microbes are cultivable in the lab [10]. Thus, MG is the premier method to explore the microbial frontier at the genome level [113]. Just as NGS tech has generated a deluge of sequencing data for genomics, the same has happened for MG. As such, unfortunately, an imbalance between data generation and knowledge acquisition plagues the MG field as well. With the ever-increasing number of bioinformatics tools and databases amassing for MG analyses [114-121], an unsustainable situation both financially and computationally has transpired [122]. The gridlock of complexities from each step has obviated development of a gold-standard for an MG analysis pipeline [115,121]. A major complexity is developing standardized, replicated experimental design to go beyond quantifying apparent diversity [112]. Overcoming this allows system-level studies to extract global value of Earth’s microbiome. The evolution of developing systemic, standardized MG experiments is seen in prior marine ecosystem studies. Marine MG projects have made massive contributions to our sequence databases to study the climate and biogeochemical regulation by the world’s marine ecosystem, as well as bioprospecting [112,123,124]. The first major ocean sampling expedition in the genomics era (cloned-based sequencing) reported ~1 Gb of non-redundant sequences originating from the Sargasso Sea [125]. Insights into oceanic microbial diversity were obtained that shed light on efficient nitrogen and phosphorus utilization by Sargasso Sea inhabitants in a nutrient-deficient environment. Three years later, the Sorcerer II Global Ocean Sampling expedition extended the Sargasso dataset by more than 5-fold to give the largest collection at that time of MG sequences yielding 6.3 Gb [126]. This study provided insight into the evolution mechanisms employed by marine microbes and underscored the spectacular diversity within microbial communities. The expedition achieved this through novel comparative genomic and assembly methods. Both seminal expeditions sampled only microbes, however. The paradigm shift in MG induced by NGS tech through cheaper sequencing resulted in the largest marine MG sequencing expedition, Tara Oceans [127]. This 3-year expedition across many oceans employed a holistic approach to study the globe’s marine ecosystem by sampling plankton and analyzing 5 major organisms, including microbes. In total, ~35,000 plankton samples generated 40 million genes. Five concurrent Science publications reported initial analyses from the Tara Oceans dataset that begin to shed light on how climate change has impacted marine ecosystems [128-132]. The Ocean Sampling Day (OSD) was another massive initiative to employ marine MG on diverse global shores [133]. The MicroB3 EU project initiated OSD and rallied up 191 sampling sites in 2014 and nearly the same in 2015 to acquire the largest standardized microbial dataset in a single day (https://www.microb3.eu/osd). Processing the MG data requires key multidisciplinary collaborations to develop relevant analysis pipelines for high-throughput functional annotation. For the Tara Oceans and OSD projects, the EBI Metagenomics (EMG) cloud-based platform emerged as the sole analysis pipeline [116]. EMG generated impressive throughput during its seven-month endeavor that yielded 23 billion predicted protein sequences (9 billion curated matches) of the Tara Oceans microbiome subset.

Page 20: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 27PDF page: 27PDF page: 27PDF page: 27

27Introduction |

1Authors claim that the streamlined EMG pipeline could now process a similar dataset about three-fold faster. Analysis of the OSD MG 2014 dataset using EMG led to the prediction of 180 million protein sequences (68 million curated matches). Although the EMG database now rivals the size of other major MG datasets (eg., MG-RAST, iMicrobe, IMG/MG), a major challenge will be to synchronize organization and analyses across all datasets to enable cross-platform comparisons [116]. Implementation of the Minimum Information about any (x) Sequence (MIxS) checklists is a major step towards standardized data input that makes meta-analyses of many studies easier – potentially eradicating laborious literature research [112,134,135]. The aforementioned Intel-Broad Institute collaboration could likely make valuable contributions in these efforts, as they aim to help scientists integrate and analyze diverse genomic datasets [86].

5.3.1 | Enzyme discoveries through metagenomics Enzyme discovery rates with MG DNA sequence data have substantially increased with improved bioinformatics mining, compared to activity-based screening of MG expression libraries. This is easily seen in Table 1 of a review by Iqbal et al., which summarized 13 different types of enzymes discovered through genome mining and traditional functional screenings [87]. The hit rates derived from activity-based screening of random expression clones reached only 0.1%, whereas bioinformatics-based selection of candidate sequences followed by laboratory evaluation produced hit rates of >50%. Of course, relative gene abundancies will dictate discovery rates of functional screening and should be noted when comparing such metrics. Both metagenomic sequence databases and metagenomic expression libraries have become key tools for novel enzyme discoveries [136]. Such discoveries advance industrial biotech applications and avoids patent infringements. Novelty is measured at the sequence, functional, and structural levels. Exploring unchartered sequence space has become a trend, but this does not ensure novel functions or structures [136]. For functional MG library screening, barriers to novelty transpire from biased expression methods mostly confined to E. coli host strains and limited assay methods, which have caused production of more novel sequences than functions [136]. In other words, the degree of novelty discovered depends on the novelty of the screening method. By 2009, functional metagenomics made impressive leaps into novel sequence space, especially for glycosyl hydrolases and lipase/esterases [136]. However, these discoveries are biased by their industrial relevance, established screening, and high gene abundance [136,137]. Colin et al. recently overcame these biases and accessed new sequence space across three superfamilies through promiscuous activities [137]. They discovered novel hydrolases with an ultra-throughput microfluidic droplet screening method using fluorogenic bait substrates (Figure 7). This cutting-edge functional screening method unveiled 14 rare hits in a MG library size of 1.25 million clones with a 15-fold coverage. Intriguingly, the rare activities detected were promiscuous activities with xenobiotics in most of the hits, and sequence analyses could not have predicted these promiscuities (Figure 7). This seminal work marks the first MG-based discovery of triesterases or sulfatases and demonstrates how promiscuity links non-homologous sequences. Now, new access points in promiscuous activity networks are possible with the droplet screening method. This will empower annotation capacities of previously unassigned sequences and help mitigate the 30–40% constant rate of unknown genes per genome being deposited in our public databases [76].

Page 21: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 28PDF page: 28PDF page: 28PDF page: 28

28 | Chapter 1

Figure 7 | Picodroplet functional metagenomics screening. Summary of work by Colin et al. [137], who used ultrahigh-throughput screening for novel hydrolases converting fluorogenic bait substrates 1 and 2. Six hits were discovered with substrate 1, including a new adenosine 3-phosphate-5-phosphosulfate (PAPS)-independent aryl sulfotransferase (AST). One predicted and seven novel phosphotriesterases (PTE) were discovered with substrate 2.

Page 22: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 29PDF page: 29PDF page: 29PDF page: 29

29Introduction |

1The key challenge will be to pick ‘generic’ substrates compatible with droplet screening. This will allow selection of multiple specificities at once and improve the cost-benefit ratio of pursuing such method. So far, the droplet screening method has been applied to the discovery or engineering of hydrolases, aldolases, polymerases, and dehydrogenases [138]. Considering extensive reviews [72,139], class II TA discoveries via functional MG routes seem nonexistent. However, MG database mining has generated a novel acyl-CoA β-TA involved in an alternative route for lysine fermentation using gene context analyses [140], thermostable (S)-selective ATAs with broad substrate ranges using 15 query sequences of (R)- and (S)-ATAs [141], and additional ATAs for the synthesis of allylic amines through a Pfam-based filtering of enzyme classes [142]. With the numerous high-throughput screening options [72] and prevalent disconnect between sequence and function for class II TAs [91], ramping up functional MG screenings may provide powerful insight into selectivity determinants for this class of TAs and new conversions.

6 | Conclusions

Impressive advancements in enzyme discovery methods have propelled biocatalysts as sustainable solutions in green chemistry. This review briefly summarizes enzyme discovery for applied biocatalysis and provides examples of industrial applications. Less than a decade ago, NGS tech empowered enzyme discovery resources with rich genomics data, but the route to exploit it was severely underestimated. Furthermore, misannotation and lack of clear sequence-structure-activity relationships have infiltrated many facets of enzyme research, leaving researchers with confusing amounts of bioinformatics tools to extract value from sequence databases. The goal now is to start correcting the imbalance between data and knowledge generation to speed up the discovery of industrial biocatalysts. Massive efforts like the Enzyme Function Initiative and the Intel-Broad Institute collaboration are the exact routes we need to make sense of the data we already have [83,86]. Full economic support for such efforts will be crucial in defining sustainable solutions for our big data problems. Once achieved, a smoother path to industrial stardom of biocatalysts should prevail.

7 | Thesis goals and outline

In this thesis, complementary approaches for biocatalyst discovery and engineering described above are applied to obtaining novel or better enzymes that catalyze (de)amination reactions on a series of model organic compounds. The emphasis is on aminomutases (AM), transaminases (TA), and a CoA-dependent amino acid ligase/ammonia lyase pair. After this Introduction (Chapter 1), we describe the discovery of new enzymes that form or cleave carbon-nitrogen bonds from nature’s metabolic diversity using metagenomic screening and soil enrichments. The results are reported in Chapter 2, where a β-TA from Variovorax paradoxus strain M7V is discovered after enrichment with β-valine. Activity measurements are reported for the enzyme and compared to previously characterized β-TAs. Bioinformatics mining led also to strong

Page 23: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 30PDF page: 30PDF page: 30PDF page: 30

30 | Chapter 1

implications of β-valine catabolic enzymes in strain M7V and Acidovorax sp. strain MG01, which was also enriched on β-valine. A potentially novel ω-TA with a likely broad substrate scope was also implicated from mining. The work contributes to the available toolbox of enzymes for applied biocatalysis. A second target is the engineering of a phenylalanine aminomutase (PAM) to improve its industrial potential for β-phenylalanine (β-Phe) synthesis, an anticancer drug building block. PAMs catalyze the reversible regioisomerization of α-phenylalanine and β-phenylalanine. Significant sequence and structural similarity suggest that their mode of action is similar to that of phenylalanine ammonia lyases. Chapter 3 reviews the therapeutic and industrial potential of these enzymes. The similarity between the mutases and the lyases suggests that the mutases can also be used for direct amination of cinnamic acids to produce the β-phenylalanines. However, the catalytic features are very different. Control of catalytic rate, regioselectivity, and stereoselectivity emerge as important engineering targets. The results reported in Chapters 4 and 5 show that a flexible loop protruding into the active site of PAM dictates conversion rates for β-phenylalanine. Chapter 5 reports the first empirical evidence of this structural determinant that separates phenylalanine lyases from aminomutases. As reviewed in this Chapter, function prediction tools will be crucial in processing the immense sequencing data. Thus, Chapter 6 examines potential starting points towards developing in silico predictions of substrate specificity for fold-type I, class II TAs. After extensive substrate profiling of TAs, the respective protein crystal structures or homology models are inspected to see if general trends can be explained. An in silico method was developed that could possibly be a foundation for an enantioselectivity prediction tool for β-TAs. Finally, Chapter 7 gives an overview of the most important conclusions and future directions of the presented work.

Page 24: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 31PDF page: 31PDF page: 31PDF page: 31

31Introduction |

1References

1. Stokes B, Wike R, Carle J. Global Concern about Climate Change, Broad Support for Limiting Emissions [Internet]. Pew Res. Cent. | Glob. Attitudes Trends. 2015. Available from: http://www.pewglobal.org/2015/11/05/global-concern-about-climate-change-broad-support-for-limiting-emissions/

2. Denchak M. How You Can Stop Global Warming [Internet]. NRDC. 2016. Available from: https://www.nrdc.org/stories/how-you-can-stop-global-warming

3. Herrera S. Industrial biotechnology - A chance at redemption. Nat. Biotechnol. 2004;22:671–5. 4. Dunn PJ. The importance of reen Chemistry in Process Research and Development. Chem. Soc. Rev.

2012;41:1452–61. 5. Reetz MT. Biocatalysis in organic chemistry and biotechnology: past, present, and future. J. Am. Chem. Soc.

2013;135:12480–96. 6. Bornscheuer UT, Huisman GW, Kazlauskas RJ, Lutz S, Moore JC, Robins K. Engineering the third wave of

biocatalysis. Nature. 2012;485:185–94. 7. Huisman GW, Collier SJ. On the development of new biocatalytic processes for practical pharmaceutical

synthesis. Curr. Opin. Chem. Biol. 2013;17:284–92. 8. Constable DJC, Dunn PJ, Hayler JD, Humphrey GR, Leazer, Jr. JL, Linderman RJ, et al. Key green chemistry

research areas—a perspective from pharmaceutical manufacturers. Green Chem. 2007;9:411–20. 9. Torsvik V. Prokaryotic diversity- magnitude, dynamics, and controlling factors. Science. 2002;296:1064–6. 10. Amann R, Ludwig W, Schleifer K. Phylogenetic identification and in situ detection of individual microbial cells

without cultivation. Microbiol Rev. 1995;59:143–69. 11. Aldridge S. Industry backs biocatalysis for greener manufacturing. Nat. Biotechnol. 2013;31:95–6. 12. Straathof AJ., Panke S, Schmid A. The production of fine chemicals by biotransformations. Curr. Opin. Biotechnol.

2002;13:548–56. 13. Riese J. Surfing the third wave: new value chain creation opportunities in industrial biotechnology. Frankfurt,

Germany: McKinsey & Company; 2003. 14. Schepens H, Sijbesma F. White Biotechnology: gateway to a more sustainable future. Frankfurt, Germany:

McKinsey & Company; 2003. 15. Lorenz P, Zinke H. White biotechnology: differences in US and EU approaches? Trends Biotechnol. 2005;23:570–4. 16. Biomass Research and Development Act of 2000. 2000. 17. Sheikhs v shale; The new economics of oil. Econ. 2014 Dec 6;413:17. 18. E.L. Why the oil price is falling [Internet]. Econ. 2014. p. 1–2. Available from: http://www.economist.com/blogs/

economist-explains/2014/12/economist-explains-419. Agricultural Act of 2014. 2014. 20. SusChem stakeholders. SusChem: Strategic Innovation and Research Agenda. 2015. 21. European Chemical Industry Facts and Figures Report 2016 [Internet]. 2016. Available from: http://www.cefic.

org/Facts-and-Figures/22. BE-Basic Foundation [Internet]. Available from: http://www.be-basic.org/about/about-be-basic.html23. Meyer H-P, Eichhorn E, Hanlon S, Lütz S, Schürmann M, Wohlgemuth R, et al. The use of enzymes in organic

synthesis and the life sciences: perspectives from the Swiss Industrial Biocatalysis Consortium (SIBC). Catal. Sci. Technol. 2013;3:29–40.

24. Desaint N. EuropaBio calls for reality check as Commission’s Circular Economy proposal hits the mat. 2015. 25. Mackintosh J. What Is the “Magic Number” for the Price of Oil? Wall Str. J. 2016 Apr 19;3. 26. Anastas PT, Warner JC. Green Chemistry: Theory and Practice. 1998. 27. Trost BM. The atom economy-A Search for synthetic efficiency. Science. 1991;254:1471–7. 28. ACS GCI Roundtable website [Internet]. Available from: https://www.acs.org/content/acs/en/greenchemistry/

industry-business.html

Page 25: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 32PDF page: 32PDF page: 32PDF page: 32

32 | Chapter 1

29. Cioc RC, Ruijter E, Orru RVA. Multicomponent reactions: advanced tools for sustainable organic synthesis. Green Chem. 2014;16:2958.

30. Köhler V, Turner NJ, Kroutil W, Rueping M, Bruggink A, Schoevaart R, et al. Artificial concurrent catalytic processes involving enzymes. Chem. Commun. 2015;51:450–64.

31. Sheldon RA. Consider the environmental quotient. Chemtech. 1994;24:3. 32. Sheldon RA. The E Factor: fifteen years on. Green Chem. 2007;9:1273. 33. Savile CK, Janey JM, Mundorff EC, Moore JC, Tam S, Jarvis WR, et al. Biocatalytic asymmetric synthesis of chiral

amines from ketones applied to sitagliptin manufacture. Science. 2010;329:305–9. 34. Assaf G, Checksfield G, Critcher D, Dunn PJ, Field S, Harris LJ, et al. The use of environmental metrics to evaluate

green chemistry improvements to the synthesis of (S,S)-reboxetine succinate. Green Chem. 2012;14:123–9. 35. Ma SK, Gruber J, Davis C, Newman L, Gray D, Wang A, et al. A green-by-design biocatalytic process for

atorvastatin intermediate. Green Chem. 2010;12:81–6. 36. Fox RJ, Davis SC, Mundorff EC, Newman LM, Gavrilovic V, Ma SK, et al. Improving catalytic function by

ProSAR-driven enzyme evolution. Nat. Biotechnol. 2007;25:338–44. 37. Chen L, Dumas DP, Wong CH. Deoxyribose 5-phosphate aldolase as a catalyst in asymmetric aldol condensation.

J. Am. Chem. Soc. 1992;114:741–8. 38. Gijsen HJM, Wong C-H. Unprecedented asymmetric aldol reactions with three aldehyde substrates catalyzed by

2-Deoxyribose-5-phosphate aldolase. J. Am. Chem. Soc. 1994;116:8422–3. 39. Wong C-H, Garcia-Junceda E, Chen L, Blanco O, Gijsen HJM, Steensma DH. Recombinant 2-deoxyribose-5-

phosphate adolase in organic synthesis: Use of sequential two-substrate and three-substrate aldol reactions. J. Am. Chem. Soc. 1995;117:3333–9.

40. [Liu J, Hsu C-C, Wong C-H. Sequential aldol condensation catalyzed by DERA mutant Ser238Asp and a formal total synthesis of atorvastatin. Tetrahedron Lett. 2004;45:2439–41.

41. Jennewein S, Schürmann M, Wolberg M, Hilker I, Luiten R, Wubbolts M, et al. Directed evolution of an industrial biocatalyst: 2-deoxy-D-ribose 5-phosphate aldolase. Biotechnol. J. 2006;1:537–48.

42. Metzner R, Hummel W, Wetterich F, König B, Gröger H. Integrated biocatalysis in multistep drug synthesis without intermediate isolation: A de novo approach toward a rosuvastatin key building block. Org. Process Res. Dev. 2015;19:635–8.

43. König B, Wetterich F, Gröger H, Metzner R. Process for enantioselective synthesis of 3-hydroxy-glutaric acid monoesters and use thereof. PCT Int. Appl. WO 2014/140006, Sept. 18, 2014;

44. Turner NJ, O’Reilly E. Biocatalytic retrosynthesis. Nat. Chem. Biol. 2013;9:285–8. 45. Challener CA. Going Green with Biocatalysis. Pharm. Technol. 2016;40:24–5. 46. Debergh, Pieterjan; Bilsen, Valentijn; Van de Velde E. Jobs and growth generated by industrial biotechnology in

Europe (prepared for: EuropaBio — the European Association for Bioindustries). 2016. 47. Li C-J, Trost BM. Green chemistry for chemical synthesis. Proc. Natl. Acad. Sci. 2008;105:13197–202. 48. Anastas P, Eghbali N. Green chemistry: Principles and practice. Chem. Soc. Rev. 2010;39:301–12. 49. DeVito SC. On the design of safer chemicals: a path forward. Green Chem. 2016;18:4332–47. 50. Challener CA. Biocatalysis versus chemocatalysis [Internet]. Manuf. Chem. Pharma. 2016. Available from:

https://www.manufacturingchemist.com/news/article_page/Biocatalysis_versus_chemocatalysis/11762251. Turner NJ. Controlling chirality. Curr. Opin. Biotechnol. 2003;14:401–6. 52. Wohlgemuth R. The locks and keys to industrial biotechnology. N. Biotechnol. 2009;25:204–13. 53. Wiester MJ, Ulmann PA, Mirkin CA. Enzyme mimics based upon supramolecular coordination chemistry.

Angew. Chemie Int. Ed. 2011;50:114–37. 54. Milner SE, Maguire AR. Recent trends in whole cell and isolated enzymes in enantioselective synthesis.

ARKIVOC. 2012;2012:321–82. 55. Arora B, Mukherjee J, Gupta MN. Enzyme promiscuity: using the dark side of enzyme specificity in white

biotechnology. Sustain. Chem. Process. 2014;2:25. 56. Steffen-Munsberg F, Vickers C, Thontowi A, Schätzle S, Meinhardt T, Svedendahl Humble M, et al. Revealing the

structural basis of promiscuous amine transaminase activity. ChemCatChem. 2013;5:154–7.

Page 26: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 33PDF page: 33PDF page: 33PDF page: 33

33Introduction |

157. Babtie A, Tokuriki N, Hollfelder F. What makes an enzyme promiscuous ? Curr. Opin. Chem. Biol. 2010;14:200–7. 58. Kazlauskas RJ. Enhancing catalytic promiscuity for biocatalysis. Curr. Opin. Chem. Biol. 2005;9:195–201. 59. Khersonsky O, Tawfik DS. Enzyme promiscuity: A mechanistic and evolutionary perspective. Annu. Rev.

Biochem. 2010;79:471–505. 60. Yadav VG. Unraveling the multispecificity and catalytic promiscuity of taxadiene monooxygenase. J. Mol. Catal.

B Enzym. 2014;110:154–64. 61. Baas B-J, Zandvoort E, Geertsema EM, Poelarends GJ. Recent advances in the study of enzyme promiscuity in the

tautomerase superfamily. ChemBioChem. 2013;14:917–26. 62. Bornscheuer UT, Kazlauskas RJ. Catalytic promiscuity in biocatalysis: using old enzymes to form new bonds and

follow new pathways. Angew. Chemie Int. Ed. 2004;43:6032–40. 63. Renata H, Wang ZJ, Arnold FH. Expanding the enzyme universe: Accessing non-natural reactions by mechanism-

guided directed evolution. Angew. Chemie - Int. Ed. 2015;54:3351–67. 64. Vick JE, Schmidt-Dannert C. Expanding the enzyme toolbox for biocatalysis. Angew. Chemie Int. Ed.

2011;50:7476–8. 65. Schrittwieser JH, Resch V, Sattler JH, Lienhart WD, Durchschein K, Winkler A, et al. Biocatalytic enantioselective

oxidative C-C coupling by aerobic C-H activation. Angew. Chemie - Int. Ed. 2011;50:1068–71. 66. Tufvesson P, Lima-Ramos J, Nordblad M, Woodley JM. Guidelines and cost analysis for catalyst production in

biocatalytic processes. Org. Process Res. Dev. 2011;15:266–74. 67. Dach R, Song JJ, Roschangar F, Samstag W, Senanayake CH. The eight criteria defining a good chemical

manufacturing process. Org. Process Res. Dev. 2012;16:1697–706. 68. Nestl BM, Hammer SC, Nebel BA, Hauer B. New generation of biocatalysts for organic synthesis. Angew. Chemie

- Int. Ed. 2014;53:3070–95. 69. Grumwald P. Enzyme toolboxes to obtain high product diversity. Ind. Biocatal. 2014;1232. 70. Lorenz P, Eck J. Metagenomics and industrial applications. Nat. Rev. Microbiol. 2005;3:510–6. 71. Burton SG, Cowan D a, Woodley JM. The search for the ideal biocatalyst. Nat. Biotechnol. 2002;20:37–45. 72. Guo F, Berglund P. Transaminase biocatalysis: optimization and application. Green Chem. 2017;13:2285–314. 73. Ansorge W. Next generation DNA sequencing (II): Techniques, applications. J. Next Gener. Seq. Appl. 2015;1:1–10. 74. Ansorge WJ. Next-generation DNA sequencing techniques. N. Biotechnol. 2009;25:195–203. 75. Sboner A, Mu XJ, Greenbaum D, Auerbach RK, Gerstein MBM, Metzker M, et al. The real cost of sequencing:

higher than you think! Genome Biol. 2011;12:125. 76. Galperin MY, Koonin E V. From complete genome sequence to “complete” understanding? Trends Biotechnol.

2010;28:398–406. 77. Temperton B, Giovannoni SJ. Metagenomics: microbial diversity through a scratched lens. Curr. Opin. Microbiol.

2012;15:605–12. 78. Muir P, Li S, Lou S, Wang D, Spakowicz DJ, Salichos L, et al. The real cost of sequencing: scaling computation to

keep pace with data generation. Genome Biol. 2016;17:53. 79. UniProt Consortium TU. UniProt: a hub for protein information. Nucleic Acids Res. 2015;43:D204-12. 80. Schnoes AM, Brown SD, Dodevski I, Babbitt PC. Annotation error in public databases: Misannotation of

molecular function in enzyme superfamilies. PLOS Comput. Biol. 2009;5:e1000605. 81. Schnoes AM, Ream DC, Thorman AW, Babbitt PC, Friedberg I. Biases in the experimental annotations of protein

function and their effect on our understanding of protein function space. PLOS Comput. Biol. 2013;9:e1003063. 82. Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, et al. A large-scale evaluation of

computational protein function prediction. Nat. Methods. 2013;10:221–7. 83. Gerlt J a., Allen KN, Almo SC, Armstrong RN, Babbitt PC, Cronan JE, et al. The enzyme function initiative.

Biochemistry. 2011;50:9950–62. 84. Jiang Y, Oron TR, Clark WT, Bankapur AR, D’Andrea D, Lepore R, et al. An expanded evaluation of protein

function prediction methods shows an improvement in accuracy. Genome Biol. 2016;17:184. 85. Cook CE, Bergman MT, Finn RD, Cochrane G, Birney E, Apweiler R. The European Bioinformatics Institute in

2016: Data growth and integration. Nucleic Acids Res. 2016;44:D20-6.

Page 27: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 34PDF page: 34PDF page: 34PDF page: 34

34 | Chapter 1

86. Broad Institute teams up with Intel to integrate genomic data from diverse sources and enhance genomic data analytic capabilities [Internet]. http://www.broadinstitute.org; 2016 [cited 2016 Dec 14]. p. 1. Available from: https://www.broadinstitute.org/news/broad-institute-teams-intel-integrate-genomic-data-diverse-sources-and-enhance-genomic-data

87. Iqbal HA, Feng Z, Brady SF. Biocatalysts and small molecule products from metagenomic studies. Curr. Opin. Chem. Biol. 2012;16:109–16.

88. He Xu J. Genomic data mining: An efficient way to find new and better enzymes. Enzym. Eng. 2012;1:1–4. 89. Hilterhaus L, Liese A, Kettling U, Antranikian G. Chapter 10: Transaminases - A Biosynthetic Route for Chiral

Amines. Appl. Biocatal. From Fundam. Sci. to Ind. Appl. 2016. p. 400. 90. Mayol O, David S, Darii E, Debard A, Mariage A, Pellouin V, et al. Asymmetric reductive amination by a wild-

type amine dehydrogenase from the thermophilic bacteria Petrotoga mobilis. Catal. Sci. Technol. 2016;6:7421–8. 91. Steffen-Munsberg F, Vickers C, Kohls H, Land H, Mallin H, Nobili A, et al. Bioinformatic analysis of a PLP-

dependent enzyme superfamily suitable for biocatalytic applications. Biotechnol. Adv. 2015;33:566–604. 92. Mehta PK, Hale TI, Christen P. Aminotransferases: demonstration of homology and division into evolutionary

subgroups. Eur. J. Biochem. 1993;214:549–61. 93. Schiroli D, Peracchi A. A subfamily of PLP-dependent enzymes specialized in handling terminal amines. Biochim.

Biophys. Acta - Proteins Proteomics. 2015;1854:1200–11. 94. Höhne M, Schätzle S, Jochens H, Robins K, Bornscheuer UT. Rational assignment of key motifs for function

guides in silico enzyme identification. Nat. Chem. Biol. Nature Publishing Group; 2010;6:807–13. 95. Schätzle S, Steffen-Munsberg F, Thontowi A, Höhne M, Robins K, Bornscheuer UT. Enzymatic asymmetric

synthesis of enantiomerically pure aliphatic, aromatic and arylaliphatic amines with (R)-selective amine transaminases. Adv. Synth. Catal. 2011;353:2439–45.

96. Park E-S, Kim M, Shin J-S. Molecular determinants for substrate selectivity of ω-transaminases. Appl. Microbiol. Biotechnol. 2012;93:2425–35.

97. Park E, Kim M, Shin JS. One-pot conversion of L -threonine into L -homoalanine: Biocatalytic production of an unnatural amino acid from a natural one. Adv. Synth. Catal. 2010;352:3391–8.

98. Trotter NS, Brimble MA, Harris PWR, Callis DJ, Sieg F. Synthesis and neuroprotective activity of analogues of glycyl-l-prolyl-l-glutamic acid (GPE) modified at the α-carboxylic acid. Bioorg. Med. Chem. 2005;13:501–17.

99. Sumi K, Inoue Y, Nishio M, Naito Y, Hosoya T, Suzuki M, et al. IOP-lowering effect of isoquinoline-5-sulfonamide compounds in ocular normotensive monkeys. Bioorganic Med. Chem. Lett. 2014;24:831–4.

100. Otsuka M, Masuda T, Haupt A, Ohno M, Shiraki T, Sugiura Y, et al. Synthetic studies on antitumor antibiotic, bleomycin. 27. Man-designed bleomycin with altered sequence specificity in DNA cleavage. J. Am. Chem. Soc. 1990;112:838–45.

101. Seo J-H, Hwang J-Y, Seo S-H, Kang H, Hwang B-Y, Kim B-G. Computational selection, identification and structural analysis of ω-aminotransferases with various substrate specificities from the genome sequence of Mesorhizobium loti MAFF303099. Biosci. Biotechnol. Biochem. 2012;76:1308–14.

102. Cerioli L, Planchestainer M, Cassidy J, Tessaro D, Paradisi F. Characterization of a novel amine transaminase from Halomonas elongata. J. Mol. Catal. B Enzym. 2015;120:141–50.

103. Wilding M, Walsh EFA, Susan J, Scott C. Identification of novel transaminases from a 12-aminododecanoic acid-metabolizing Pseudomonas strain. Microb. Biotechnol. 2015;8:665–72.

104. Wilding M, Peat TS, Newman J, Scott C. A β-alanine catabolism pathway containing a highly promiscuous ω-transaminase in the 12-aminododecanate-degrading Pseudomonas sp. strain AAC. Appl. Environ. Microbiol. 2016;82:3846–56.

105. Shin J-S, Kim B-G. Comparison of the ω-transaminases from different microorganisms and application to production of chiral amines. Biosci. Biotechnol. Biochem. 2001;65:1782–8.

106. Shin J-S, Yun H, Jang J-W, Park I, Kim B-G. Purification, characterization, and molecular cloning of a novel amine:pyruvate transaminase from Vibrio fluvialis JS17. Appl. Microbiol. Biotechnol. 2003;61:463–71.

Page 28: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 35PDF page: 35PDF page: 35PDF page: 35

35Introduction |

1107. Yun H, Hwang BY, Lee JH, Kim BG. Use of enrichment culture for directed evolution of the Vibrio

fluvialis JS17 ω-transaminase, which is resistant to product inhibition by aliphatic ketones. Appl. Environ. Microbiol. 2005;71:4220–4.

108. Pavkov-Keller T, Strohmeier GA, Diepold M, Peeters W, Smeets N, Schürmann M, et al. Discovery and structural characterisation of new fold type IV-transaminases exemplify the diversity of this enzyme fold. Sci. Rep. 2016;6:38183.

109. Iwasaki A, Yamada Y, Ikenaka Y, Hasegawa J. Microbial synthesis of (R)- and (S)-3,4-dimethoxyamphetamines through stereoselective transamination. Biotechnol. Lett. 2003;25:1843–6.

110. Iwasaki A, Matsumoto K, Hasegawa J, Yasohara Y. A novel transaminase, (R)-amine:pyruvate aminotransferase, from Arthrobacter sp. KNK168 (FERM BP-5228): purification, characterization, and gene cloning. Appl. Microbiol. Biotechnol. 2012;93:1563–73.

111. Guan L-J, Ohtsuka J, Okai M, Miyakawa T, Mase T, Zhi Y, et al. A new target region for changing the substrate specificity of amine transaminases. Sci. Rep. 2015;5:10753.

112. Knight R, Jansson J, Field D, Fierer N, Desai N, Fuhrman J a, et al. Unlocking the potential of metagenomics through replicated experimental design. Nat. Biotechnol. 2012;30:513–20.

113. Committee on Metagenomics: Challenges and Functional Applications NRC. The New Science of Metagenomics. Washington, D.C.; 2007.

114. Oulas A, Pavloudi C, Polymenakou P, Pavlopoulos GA, Papanikolaou N, Kotoulas G, et al. Metagenomics: tools and insights for analyzing next-generation sequencing data derived from biodiversity studies. Bioinform. Biol. Insights. 2015;9:75–88.

115. Wooley JC, Godzik A, Friedberg I. A primer on metagenomics. PLOS Comput. Biol. 2010;6:e1000667. 116. Mitchell A, Bucchini F, Cochrane G, Denise H, Hoopen P ten, Fraser M, et al. EBI metagenomics

in 2016 - an expanding and evolving resource for the analysis and archiving of metagenomic data. Nucleic Acids Res. 2016;44:D595–603.

117. Kim M, Lee K-H, Yoon S-W, Kim B-S, Chun J, Yi H. Analytical tools and databases for metagenomics in the next-generation sequencing era. Genomics Inform. 2013;11:102–13.

118. Ladoukakis E, Kolisis FN, Chatziioannou AA. Integrative workflows for metagenomic analysis. Front. cell Dev. Biol. 2014;2:70.

119. Dudhagara P, Bhavsar S, Bhagat C, Ghelani A, Bhatt S, Patel R. Web resources for metagenomics studies. Genomics, Proteomics Bioinforma. 2015;13:296–303.

120. Abbasian F, Lockington R, Megharaj M, Naidu R. The integration of sequencing and bioinformatics in metagenomics. Rev. Environ. Sci. Bio/Technology. 2015;14:357–83.

121. Teeling H, Glöckner FO. Current opportunities and challenges in microbial metagenome analysis-A bioinformatic perspective. Brief. Bioinform. 2012;13:728–42.

122. Desai N, Antonopoulos D, Gilbert JA, Glass EM, Meyer F. From genomics to metagenomics. Curr. Opin. Biotechnol. 2012;23:72–6.

123. Kodzius R, Gojobori T. Marine metagenomics as a source for bioprospecting. Mar. Genomics. 2015;24:21–30. 124. Kennedy J, Flemer B, Jackson SA, Lejon DPH, Morrissey JP, O’Gara F, et al. Marine metagenomics: new tools for

the study and exploitation of marine microbial metabolism. Mar. Drugs. 2010;8:608–28. 125. Venter JC. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004;304:66–74. 126. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, et al. The Sorcerer II Global Ocean

Sampling expedition: Northwest Atlantic through eastern tropical Pacific. PLoS Biol. 2007;5:0398–431. 127. Karsenti E, Acinas SG, Bork P, Bowler C, De Vargas C, Raes J, et al. A holistic approach to marine eco-systems

biology. PLoS Biol. 2011;9:e1001177. 128. Sunagawa S, Coelho LP, Chaffron S, Kultima JR, Labadie K, Salazar G, et al. Structure and function of the global

ocean microbiome. Science. 2015;348. 129. Brum JR, Ignacio-Espinoza JC, Roux S, Doulcier G, Acinas SG, Alberti A, et al. Patterns and ecological drivers of

ocean viral communities. Science. 2015;348. 130. de Vargas C, Audic S, Henry N, Decelle J, Mahé F, Logares R, et al. Eukaryotic plankton diversity in the sunlit

ocean. Science. 2015;348.

Page 29: University of Groningen Genomics-based discovery and … · 2017-10-05 · from global industry leaders in chemical production, such as Evonik Industries ( a Degussa Corp.), DSM,

514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-Heberling514067-L-bw-HeberlingProcessed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017Processed on: 27-9-2017 PDF page: 36PDF page: 36PDF page: 36PDF page: 36

36 | Chapter 1

131. Villar E, Farrant GK, Follows M, Garczarek L, Speich S, Audic S, et al. Environmental characteristics of Agulhas rings affect interocean plankton transport. Science. 2015;348.

132. Lima-Mendez G, Faust K, Henry N, Decelle J, Colin S, Carcillo F, et al. Determinants of community structure in the global plankton interactome. Science. 2015;348.

133. Kopf A, Bicak M, Kottmann R, Schnetzer J, Kostadinov I, Lehmann K, et al. The ocean sampling day consortium. Gigascience. 2015;4:27.

134. Yilmaz P, Kottmann R, Field D, Knight R, Cole JR, Amaral-Zettler L, et al. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications. Nat. Biotechnol. 2011;29:415–20.

135. Lozupone CA, Knight R. Global patterns in bacterial diversity. Proc. Natl. Acad. Sci. U. S. A. 2007;104:11436–40. 136. Tuffin M, Anderson D, Heath C, Cowan DA. Metagenomic gene discovery: How far have we moved into novel

sequence space? Biotechnol. J. 2009;4:1671–83. 137. Colin P-Y, Kintses B, Gielen F, Miton CM, Fischer G, Mohamed MF, et al. Ultrahigh-throughput discovery of

promiscuous enzymes by picodroplet functional metagenomics. Nat. Commun. 2015;6:10008. 138. Mair P, Gielen F, Hollfelder F. Exploring sequence space in search of functional enzymes using microfluidic

droplets. Curr. Opin. Chem. Biol. 2017;37:137–44. 139. Ferrer M, Martínez-Martínez M, Bargiela R, Streit WR, Golyshina O V., Golyshin PN. Estimating the success of

enzyme bioprospecting through metagenomics: current status and future trends. Microb. Biotechnol. 2016;9:22–34.

140. Perret A, Lechaplais C, Tricot S, Perchat N, Vergne C, Pellé C, et al. A novel acyl-CoA beta-transaminase characterized from a metagenome. PLoS One. 2011;6:e22918.

141. Ferrandi EE, Previdi A, Bassanini I, Riva S, Peng X, Monti D. Novel thermostable amine transferases from hot spring metagenomes. Appl. Microbiol. Biotechnol. 2017;1–17.

142. Baud D, Jeffries JWE, Moody TS, Ward JM, Hailes HC. A metagenomics approach for new biocatalyst discovery: application to transaminases and the synthesis of allylic amines. Green Chem. 2017;19:1134–43.