Ashutosh (Ash) Jogalekar (http://wavefunction.fieldofscience.com)
About me
• Medicinal and computational chemist working in the biopharma industry in Cambridge, MA.
• Blogger: “The Curious Wavefunction”
• Contact:
- Blog: http://wavefunction.fieldofscience.com
- Twitter: @curiouswavefn
- Email: [email protected]
Two kinds of scientific revolutions
• Idea-driven (Kuhn): physics (quantum theory), astronomy (expanding universe), biology (evolution).
• Tool-driven (Galison): engineering (transistor), biology (sequencing), astronomy (telescope).
Thomas Kuhn, The Structure of Scientific Revolutions (1962)
Peter Galison, Image and Logic (1997)
Chemistry as an experimental science has benefited much more from tool-driven revolutions.
Major tool-driven revolutions in chemistry
Our latest (and greatest) tool
The Computer
“I think it's fair to say that personal computers have become the most empowering tool we've ever created. They're tools of communication, they're tools of creativity, and they can be shaped by their user.” – Bill Gates
A brief history of computers in chemistry
• 1950s: Driven by quantum chemistry and crystallography.
• Early efforts needed access to centralized machines, travel. Computations enormously expensive: 1.5 years (1959) vs one day (2014).
Punched card (2014)
Punched card (1960)
UNIVAC 1: 1.5 yrs to calculate 12 molecules. Apple MacBook Air: 4 hours for same calculation.
• 1965: Moore’s Law; transistor counts doubling roughly every two years (1975 revision).
• 1970s: Use of computers started becoming routine. Still slow.
• 1990s: Exponential developments in desktop computing, software, internet.
• 2000s: Applications to biology, materials science become routine.
How have computers affected chemistry?
• Publications: ~25 major journals, also described in others.
• Companies: Schrödinger, OpenEye, CCG, Perkin-Elmer etc.
• Conferences: Gordon Conference, IAQMS.
• ACS Division of Computers in Chemistry.
• Awards: ACS Award for Computers in Chemistry.
Data
Simulation & Analysis
Sociology
The Future
Data
“It is a capital mistake to theorize before one has data.”
-- Arthur Conan Doyle (“Sherlock Holmes: A Scandal in Bohemia”)
Chemical data has grown exponentially
Growth of the Cambridge Structural Database (Image: CSD)
Why? Better tools to determine and record structures, properties.
Data repositories have enabled easy and instant global access to data.
• Chemical Abstracts Service (CAS): 75 million registered substances.
• Protein Data Bank (PDB): 97,000 protein structures.
• Cambridge Structural Database (CSD): 40,000 added every year.
• SciFinder, Google Scholar.
Standardization
• Chemical structure representation: drawing, manipulation. Multiple standard, compact file formats (e.g. SMILES strings) allow error-free sharing of data.
• E-Notebook: Standardized and safe record keeping, organization, analysis and visualization.
ChemDraw SMILES
Data is easier to compare, verify and reproduce.
What can we do with all this data?
Visualization
• Instant visualization of data in various forms, user-friendly presentation; e.g. Spotfire, Instant JChem etc.
• Tools ranging from basic plots to advanced, on-the-fly statistical analysis (e.g. principal component analysis, regression) now available.
• Instant comprehension of complex biomolecular and inorganic structures (e.g. PyMOL).
Much easier to make sense of data and property relationships.
Software for chemical analysis
• What do you use software for? Analytical, spectroscopic, purification?
• Advanced techniques now more easily accessible.
• Enormous savings in time and labor.
NMR, crystallography, GC-MS
Software has affected everyday chemical research and the work of bench chemists everywhere.
Using data intelligently: Cheminformatics
• Applying tools from informatics and computer science to extract meaning from data.
• Most common problems: Searching, finding trends, correlating chemical structures to various properties (descriptors).
If only all correlations were this good…
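As a rough illustration of "correlating chemical structures to various properties", the sketch below computes a Pearson correlation coefficient between one descriptor and one measured property. The descriptor values, the measured values and the function name `pearson_r` are all made up for illustration; real work would more likely use a statistics package.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: one computed descriptor and one measured property
# for five made-up compounds (not taken from the talk).
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [2.1, 3.9, 6.2, 8.1, 9.8]

r = pearson_r(descriptor, measured)
print(f"r = {r:.4f}")
```

A value of r close to +1 or -1 indicates a strong linear trend; it says nothing about causation or about how well the trend extends to new compounds.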
Case Study I: Similarity searching
- Simplified representations (e.g. bit strings) make searches of millions of molecules very fast.
- Tanimoto similarity: efficient, can be calculated for any property that can be encoded as a bit string.
- Drug side effects similarity prediction especially promising.
Tanimoto similarity between molecules: J. Med. Chem. 2010, 53, 4830
Drug side effects: Nature Biotechnology 2007, 25, 197
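A minimal sketch of Tanimoto similarity on bit-string fingerprints: the number of bits set in both fingerprints divided by the number of bits set in either. The fingerprints below are invented, and returning 1.0 for two empty fingerprints is a convention chosen here, not something the slides specify.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def to_int(bits):
    """Pack a set of bit positions into a single Python integer."""
    value = 0
    for bit in bits:
        value |= 1 << bit
    return value


def tanimoto_int(x, y):
    """Same measure on integer-packed fingerprints, using bitwise AND/OR."""
    either = bin(x | y).count("1")
    if either == 0:
        return 1.0  # same convention as above
    return bin(x & y).count("1") / either


# Hypothetical fingerprints: positions of the bits that are switched on.
fp_a = {1, 4, 7, 9, 15}
fp_b = {1, 4, 8, 9, 15, 20}

print(tanimoto(fp_a, fp_b))                      # 4 shared bits / 7 bits in total
print(tanimoto_int(to_int(fp_a), to_int(fp_b)))  # same value
```

The integer version hints at why such searches are fast: each comparison reduces to a few bitwise operations and bit counts.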
Case Study II: Diversity analysis
• Humans are pattern-seeking; often ignore diversity to focus on similarity.
• Maximizing diversity = maximizing the probability of finding new molecules with novel properties.
• Create molecular libraries of millions of compounds; screening collections for drug discovery, materials science etc.
Shape diversity: Nat. Chem. Biol. 2012, 8, 358
Voltage vs safety of Li-ion batteries: Nat. Mat. 2013, 12, 191
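One common way to pick a diverse subset from a library is greedy MaxMin selection: repeatedly add the compound whose nearest already-picked neighbour is farthest away. The sketch below uses 1 - Tanimoto as the distance; the slides do not name a particular algorithm, so this is only one possible choice, and the tiny library is invented.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bits."""
    if not a and not b:
        return 1.0  # assumption: two empty fingerprints count as identical
    return len(a & b) / len(a | b)


def maxmin_pick(fingerprints, n_pick, first=0):
    """Greedy MaxMin selection; returns indices of the picked fingerprints."""
    if not 0 < n_pick <= len(fingerprints):
        raise ValueError("n_pick must be between 1 and the library size")
    picked = [first]
    # min_dist[i] = distance from item i to its closest picked item so far.
    min_dist = [1.0 - tanimoto(fp, fingerprints[first]) for fp in fingerprints]
    while len(picked) < n_pick:
        remaining = (i for i in range(len(fingerprints)) if i not in picked)
        candidate = max(remaining, key=lambda i: min_dist[i])
        picked.append(candidate)
        # Update nearest-picked distances with the newly added item.
        for i, fp in enumerate(fingerprints):
            d = 1.0 - tanimoto(fp, fingerprints[candidate])
            if d < min_dist[i]:
                min_dist[i] = d
    return picked


# Hypothetical library: two pairs of near-duplicates plus one outlier.
library = [
    {1, 2, 3, 4},
    {1, 2, 3, 5},
    {10, 11, 12},
    {10, 11, 13},
    {20, 21},
]

print(maxmin_pick(library, 3))  # [0, 2, 4]: one compound from each cluster
```

This version costs on the order of (library size) x (number picked) comparisons, which is fine for a sketch but would need optimizing for libraries of millions of compounds.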
Simulation and Analysis
“Nobody believes a theoretical result, except the person who calculated it. Everybody believes an experimental result, except the person who measured it.” -- Paul Labute (Chemical Computing Group)
How it happened
Michael Levitt, Nobel Lecture (http://tinyurl.com/jvhsjvr)
Major applications: QM and MM
• Quantum chemistry made computers; computers made quantum chemistry.
• Molecular mechanics: Classical mechanics applied to molecules.
• QM equations cannot be solved exactly for anything beyond the simplest systems. Need approximations, iterative processing, and computing power.
• Useful for calculating many properties (energies, dipole moments, reactivity).
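To make "classical mechanics applied to molecules" concrete, here is a minimal sketch of two typical molecular mechanics energy terms: a harmonic bond stretch and a 12-6 Lennard-Jones interaction. The coordinates and parameters are invented and in arbitrary units; real force fields add angle, torsion and electrostatic terms and use carefully fitted parameters.

```python
import math


def bond_energy(r, r0, k):
    """Harmonic bond-stretch term: E = 0.5 * k * (r - r0)^2."""
    return 0.5 * k * (r - r0) ** 2


def lennard_jones(r, epsilon, sigma):
    """12-6 Lennard-Jones term: E = 4*eps*((sigma/r)^12 - (sigma/r)^6)."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


def total_energy(coords, bonds, nonbonded):
    """Sum bonded and non-bonded terms over the listed atom pairs."""
    energy = 0.0
    for i, j, r0, k in bonds:
        energy += bond_energy(math.dist(coords[i], coords[j]), r0, k)
    for i, j, epsilon, sigma in nonbonded:
        energy += lennard_jones(math.dist(coords[i], coords[j]), epsilon, sigma)
    return energy


# Hypothetical three-atom system (arbitrary units, made-up parameters).
coords = [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (4.0, 0.0, 0.0)]
bonds = [(0, 1, 1.0, 300.0)]      # (atom i, atom j, r0, k)
nonbonded = [(0, 2, 0.1, 3.5)]    # (atom i, atom j, epsilon, sigma)

print(total_energy(coords, bonds, nonbonded))
```

Every term is a cheap closed-form expression of interatomic distances, which is why MM scales to systems far larger than QM can handle.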
Potential energy surface for chemical reactions
Fullerene from graphene: Nat. Chem. 2010, 2, 450
The 2013 Nobel Prize
• Tradeoff: Quantum mechanics (QM) - accurate but expensive. Molecular mechanics (MM) - inaccurate but cheap.
• QM/MM: Best of both worlds, multiscale.
• Applicable to large biological systems (proteins, DNA), extended materials (zeolites, polymers).
A Few Good Applications
Molecular Dynamics
• Molecular Dynamics (MD): Newton’s laws of motion applied to molecules, millions of steps; large amounts of data.
• Parallel processing, special-purpose machines allow MD to surpass Moore’s Law.
• Simulations approaching biological timescales becoming routine.
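The core loop of an MD simulation can be sketched with the velocity Verlet integrator, a widely used scheme for stepping Newton's equations forward in time. The slides do not say which integrator is meant, so this is an illustrative choice; the system is a single particle on a spring in one dimension, with made-up units.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equations for one particle in 1D.

    Returns the trajectory as a list of (position, velocity) tuples,
    including the starting point.
    """
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # finish the velocity update
        trajectory.append((x, v))
    return trajectory


# Hypothetical system: unit mass on a spring with unit force constant.
K = 1.0
MASS = 1.0


def spring_force(x):
    return -K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x


traj = velocity_verlet(x=1.0, v=0.0, force=spring_force,
                       mass=MASS, dt=0.01, n_steps=1000)

x_end, v_end = traj[-1]
drift = abs(total_energy(x_end, v_end) - total_energy(*traj[0]))
print(f"x(t=10) = {x_end:.5f}, exact = {math.cos(10.0):.5f}")
print(f"energy drift = {drift:.2e}")
```

Even this toy run stores a thousand frames for one particle; with many thousands of atoms and millions of steps, the "large amounts of data" in the bullet above follow directly.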
Knowledge-Based Protein Folding
• Knowledge-based protein structure prediction: Taking advantage of existing information in PDB to predict folded structures.
• Use advanced statistical methods based on PDB data for assigning probabilities to various solutions: Rosetta.
• Outstanding success in CASP (Critical Assessment of protein Structure Prediction).
“The amazing thing is that Rosetta had 31 points and the next best group had 8 points. It is like baseball in 1927, when Babe Ruth hit 60 home runs and the runner up hit 14, and entire teams didn't hit as many as he did.” – Peter Kollman (UCSF), CASP 2000.
Overlap between predicted (red) and experimental (green) protein structures
Protein design
• Protein design: Given a structure, find alternative sequences.
• Uses of alternative sequences: Enzymes catalyzing new reactions, new small molecule-binding proteins (e.g. for environmental cleanups).
• 2003: First protein designed entirely de novo.
• 2008: First enzyme catalyzing reaction with no natural precedent.
• As PDB grows, protein design becomes better.
Top7: Protein designed from scratch. (Science, 2003, 302, 1364)
Kemp eliminase enzyme from scratch (Nature, 2008, 453, 190)
Structure-Based Drug Design
• Predict structure of drug bound to protein, suggest modifications to improve properties.
• Combination of crystallography data and simulation.
• Outstanding success in some areas: e.g. HIV protease inhibitors against AIDS.
Impact of addition of HIV protease inhibitors to antiretroviral therapy among AIDS patients in San Francisco (Am J Epidemiol. 152, 2, 2000)
HIV protease bound to indinavir
Katharine Holloway
The wisdom of crowds (and clouds)
• FoldIt: Computer game to solve protein folding and design problems.
• Led to the structure of a retroviral (M-PMV) protease related to HIV's, and to algorithm discovery.
PNAS, 2011, 108, 18949
Comparison of Folding@Home with leading supercomputers
• Distributed computing, Folding@Home: 100 million hours logged on Sony PlayStation 3 (PS3), also enabled on cloud.
• Used to study folding of proteins involved in cancer, Alzheimer’s disease; drug design.
Exciting Future Areas
New materials for the new millennium
• Based on Density Functional Theory (Nobel Prize 1998).
• Application of materials simulations and computational screening:
- Hydrogen storage (metal-organic frameworks)
- Photovoltaics and solar cells
- Alloys and new materials for batteries
- Semiconductor design
The age of biology
• Human Genome Project: Computers made it possible.
• Sequencing has greatly surpassed Moore’s Law. New techniques; IonTorrent, Nanopore etc.
• Computational Biology and Bioinformatics: Comparing genomes, predicting diseases, mapping ancestral differences.
• Aided by massive amounts of data: GenBank, Cancer Genome, Ensembl, UniProt etc.
• Ripe territory for Big Data and new informatics techniques.
Sociology
“The democratization of information and expertise that springs from the world wide web, and the power of groups of motivated amateurs to strike out on their own in technical subjects, is weakening the authority of “experts” in society.” -- George Whitesides.
The chemical blogosphere
• Chemistry blogs took off in 2002, initially focused on research, grad school hijinks.
• Quickly diversified; peer review, job market, academic culture, publishing, issues in industry, safety culture.
Derek Lowe: drug discovery, industry
Chemjobber: The Job Market, safety culture, industry
Paul Bracher: academic culture, peer review
Ash Jogalekar: Nature and evolution of chemistry, peer review
SeeArrOh: Chemophobia, food, peer review
C&EN official blog
James Ashenhurst: Org Chem tutoring
What are blogs good for?
Peer Review 2.0
• Timely, democratic review of latest research.
• Interesting research highlighted immediately.
• Critiqued by large audience.
• Self-selecting.
• Instrumental in spotting: fallacious research, self-plagiarism, dubious methodologies and fabrication.
Non-research contributions
• Lab safety (C&EN).
• Academic culture (ChemBark).
• The (sad) state of the job market (Chemjobber).
• Representation of women, minorities (Dr Rubidium).
• Chemophobia, Industry (SeeArrOh, Derek Lowe).
Peer Review 2.0: case study I
• First reported instance of comprehensive informal peer review of chemical literature.
• 2006: 37-step synthesis of hexacyclinol described in single-author paper in Angew. Chem. by James LaClair (Xenobe Institute).
• Commenter on blog of Stanford grad student Dylan Stiles points out inconsistencies in structure, others weigh in and point out many more.
• Other official papers refute data, suggest alternative structure.
• Extensive discussion of problems with paper on multiple blogs, hundreds of comments. Paper retracted in 2012, long after problems were clear.
“The proof is in the product”.
Peer Review 2.0: case study II
• April 2012: Paper in JACS on amino acid chirality and origin of life.
• Two issues: Bad scientific communication and charges of self-plagiarism.
• Extensive similarities with two previous articles highlighted by Nature Chemistry editor Stuart Cantrill exclusively on Twitter.
• Paper retracted in May 2012.
• Case illustrates peer-review operating entirely outside formal channels.
ACS Press Release: “New scientific research raises the possibility that advanced versions of T. rex and other dinosaurs — monstrous creatures with the intelligence and cunning of humans — may be the life forms that evolved on other planets in the universe.”
Photo uploaded by Stuart Cantrill on Twitter
Chemists: Embrace open access
• Open-access, arXiv@Cornell
– ASAP publishing
– Open access
– Instant and free peer review by large community
• Chemical community less open to sharing and arXiv-style publishing?
• Cultural differences between various scientific communities (e.g. particle physicists vs total synthesis chemists).
The Future
“Prediction is difficult, especially about the future” -- Niels Bohr.
Challenges and promises
• Data:
- Bigger, better annotated databases with quality control.
- Statistics becoming more useful and appreciated.
- Greater awareness of data mining tools among experimental chemists.
• Simulation:
- Long molecular dynamics simulations approaching realistic timescales.
- Insights from network theory used in synthetic planning.
- First quantum chemistry calculation on quantum computer (2010).
- Better statistical validation of results, quality control.
Challenges and promises
• Sociology
- More open access journals, more open access options.
- Widespread publicity of research results.
- Discussion, criticism on blogs being taken seriously. Retraction Watch.
- Better use of multimedia (Twitter, Skype, podcasts).
- Cultural changes:
  - More cross-talk between chemists, statisticians and computer scientists.
  - More cross-talk between academia and industry.
  - Willingness to share data, code, experimental results.
  - Willingness to present and discuss negative data.
But…be afraid of the hype
Fortune Magazine, October 1981
• Jetpacks
• Artificial intelligence
• Nuclear fusion
• Robot maids
In 30 years… (1950-2014)
Translating hype into reality
• Fearlessness; ability to jump across boundaries, question received wisdom.
• Resilience; ability to bounce back from failure.
• Adaptability; ability to welcome change.
• Teamwork; ability to collaborate and share.
• Imagination; ability to think outside the box.
The future of information technology in chemistry…
…is us
“The best way to predict the future is to invent it.” – Alan Kay.
“Be the change that you wish to see in the world.” – Gandhi.