Ashutosh (Ash) Jogalekar (http://wavefunction.fieldofscience.com)

The Impact of Information Technology on Chemistry and Related Sciences


Page 1: The Impact of Information Technology on Chemistry and Related Sciences

Ashutosh (Ash) Jogalekar (http://wavefunction.fieldofscience.com)

Page 2: The Impact of Information Technology on Chemistry and Related Sciences

About me

• Medicinal and computational chemist working in the biopharma industry in Cambridge, MA.
• Blogger: “The Curious Wavefunction”
• Contact:
  - Blog: http://wavefunction.fieldofscience.com
  - Twitter: @curiouswavefn
  - Email: [email protected]

Page 3: The Impact of Information Technology on Chemistry and Related Sciences

Two kinds of scientific revolutions

• Idea-driven (Kuhn): physics (quantum theory), astronomy (expanding universe), biology (evolution).
• Tool-driven (Galison): engineering (transistor), biology (sequencing), astronomy (telescope).

Thomas Kuhn, The Structure of Scientific Revolutions (1962)

Peter Galison, Image and Logic (1997)

Chemistry as an experimental science has benefited much more from tool-driven revolutions.

Page 4: The Impact of Information Technology on Chemistry and Related Sciences

Major tool-driven revolutions in chemistry

Page 5: The Impact of Information Technology on Chemistry and Related Sciences

Our latest (and greatest) tool

The Computer

“I think it's fair to say that personal computers have become the most empowering tool we've ever created. They're tools of communication, they're tools of creativity, and they can be shaped by their user.” – Bill Gates

Page 6: The Impact of Information Technology on Chemistry and Related Sciences

A brief history of computers in chemistry

• 1950s: Driven by quantum chemistry and crystallography.
• Early efforts needed access to centralized machines, travel. Computations enormously expensive: 1.5 years (1959) vs one day (2014).

Punched card (2014)

Punched card (1960)

UNIVAC 1: 1.5 yrs to calculate 12 molecules

Apple MacBook Air: 4 hours for same calculation

• 1965: Moore’s Law; transistor counts double roughly every two years.
• 1970s: Use of computers started becoming routine. Still slow.
• 1990s: Exponential developments in desktop computing, software, internet.
• 2000s: Applications to biology, materials science become routine.
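The doubling rule and the UNIVAC vs laptop comparison on this slide reduce to simple arithmetic. A minimal sketch, assuming a fixed two-year doubling period and a 365-day year; the function names are illustrative and not from the talk:

```python
import math


def growth_factor(years, doubling_period_years=2.0):
    """Multiplicative growth after `years` if a quantity doubles every period."""
    return 2.0 ** (years / doubling_period_years)


def equivalent_doublings(speedup):
    """Number of doublings needed to reach a given speedup factor."""
    return math.log2(speedup)


# Slide figures: 1.5 years on the UNIVAC 1 vs 4 hours on a laptop.
univac_hours = 1.5 * 365 * 24   # assumes a 365-day year
laptop_hours = 4.0
speedup = univac_hours / laptop_hours   # 3285.0

print(f"Growth over 10 years of doubling: {growth_factor(10):.0f}x")
print(f"Wall-clock speedup from the slide: {speedup:.0f}x")
print(f"Equivalent doublings: {equivalent_doublings(speedup):.1f}")
```

The wall-clock ratio compares two specific machines on one calculation, so it need not track transistor counts directly.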

Page 7: The Impact of Information Technology on Chemistry and Related Sciences

How have computers affected chemistry?

• Publications: ~25 major journals, also described in others.
• Companies: Schrodinger, OpenEye, CCG, Perkin-Elmer etc.
• Conferences: Gordon Conference, IAQMS.
• ACS Division of Computers in Chemistry.
• Awards: ACS Award for Computers in Chemistry.

Page 8: The Impact of Information Technology on Chemistry and Related Sciences

Data

Simulation & Analysis

Sociology

The Future

Page 9: The Impact of Information Technology on Chemistry and Related Sciences

Data

“It is a capital mistake to theorize before one has data.”

-- Arthur Conan Doyle (“Sherlock Holmes: A Scandal in Bohemia”)

Page 10: The Impact of Information Technology on Chemistry and Related Sciences

Chemical data has grown exponentially

Growth of the Cambridge Structural Database (Image: CSD)

Why? Better tools to determine and record structures, properties.

Data repositories have enabled easy and instant global access to data.

• Chemical Abstracts Service (CAS): 75 million registered substances.
• Protein Data Bank (PDB): 97,000 protein structures.
• Cambridge Structural Database (CSD): 40,000 structures added every year.
• SciFinder, Google Scholar.

Page 11: The Impact of Information Technology on Chemistry and Related Sciences

Standardization

• Chemical structure representation: drawing, manipulation. Standard, multiple compressed file formats (e.g. SMILES strings), error-free sharing of data.
• E-Notebook: Standardized and safe record keeping, organization, analysis and visualization.

ChemDraw

SMILES

Data is easier to compare, verify and reproduce.

Page 12: The Impact of Information Technology on Chemistry and Related Sciences

What can we do with all this data?

Page 13: The Impact of Information Technology on Chemistry and Related Sciences

Visualization

• Instant visualization of data in various forms, user-friendly presentation; e.g. Spotfire, Instant JChem etc.
• Tools ranging from basic plots to advanced, on-the-fly statistical analysis (e.g. principal component analysis, regression) now available.
• Instant comprehension of complex biomolecular and inorganic structures (e.g. PyMOL).

Much easier to make sense of data and property relationships.

Page 14: The Impact of Information Technology on Chemistry and Related Sciences

Software for chemical analysis

• What do you use software for? Analytical, spectroscopic, purification?
• Advanced techniques now more easily accessible.
• Enormous savings in time and labor.

NMR

Crystallography

GC-MS

Software now touches everyday chemical research and the work of bench chemists everywhere.

Page 15: The Impact of Information Technology on Chemistry and Related Sciences

Using data intelligently: Cheminformatics

• Applying tools from informatics and computer science to extract meaning from data.
• Most common problems: Searching, finding trends, correlating chemical structures to various properties (descriptors).

If only all correlations were this good…
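Correlating a descriptor with a measured property often starts with a correlation coefficient. A minimal sketch of the Pearson coefficient in plain Python; the function name and the toy numbers are made up for illustration and are not data from the talk:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Toy, made-up values: a hypothetical descriptor vs a hypothetical property.
descriptor = [1.0, 2.0, 3.0, 4.0]
measured = [2.0, 4.0, 6.0, 8.0]
print(pearson_r(descriptor, measured))
```

Real structure-property data rarely sits on a perfect line, which is the point of the caption above.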

Page 16: The Impact of Information Technology on Chemistry and Related Sciences

Case Study I: Similarity searching

- Simplified representations (e.g. bit strings) make searches of millions of molecules very fast.
- Tanimoto similarity: Efficient, can be calculated for any property.
- Drug side effects similarity prediction especially promising.

Tanimoto similarity between molecules: J. Med. Chem., 2010, 53, 4830

Drug side effects: Nature Biotechnology 2007, 25, 197
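The Tanimoto similarity of two bit-string fingerprints is the number of bits set in both divided by the number set in either. A minimal sketch, assuming fingerprints are stored as Python integers used as bit masks; the function names, the toy library and the choice to return 0.0 for two empty fingerprints are illustrative assumptions:

```python
def popcount(bits):
    """Number of bits set to 1 in a non-negative integer."""
    return bin(bits).count("1")


def tanimoto(a, b):
    """Tanimoto similarity: |A and B| / |A or B| over the set bits."""
    union = popcount(a | b)
    if union == 0:
        return 0.0  # assumed convention for two empty fingerprints
    return popcount(a & b) / union


def rank_by_similarity(query, library):
    """Names in `library` (name -> fingerprint), most similar to `query` first."""
    return sorted(library, key=lambda name: tanimoto(query, library[name]),
                  reverse=True)


# Hypothetical 4-bit fingerprints; real ones have hundreds to thousands of bits.
library = {"mol_a": 0b1011, "mol_b": 0b0100, "mol_c": 0b0011}
print(rank_by_similarity(0b0011, library))
```

Because the comparison is only bitwise AND, OR and a bit count, it stays cheap even across millions of molecules.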

Page 17: The Impact of Information Technology on Chemistry and Related Sciences

Case Study II: Diversity analysis

• Humans are pattern-seeking; often ignore diversity to focus on similarity.
• Maximizing diversity = Maximizing probability of finding new molecules with novel properties.
• Create molecular libraries of millions of compounds; screening collections for drug discovery, materials science etc.

Shape diversity: Nat. Chem. Biol. 2012, 8, 358

Voltage vs safety of Li-ion batteries: Nat. Mat. 2013, 12, 191
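One common way to build a diverse subset is greedy MaxMin selection: repeatedly pick the molecule whose nearest already-picked neighbour is farthest away. The slides do not name a method, so this is a sketch of one widely used option, assuming integer bit-mask fingerprints and Tanimoto distance; all names are illustrative:

```python
def tanimoto_distance(a, b):
    """1 - Tanimoto similarity for two integer bit-mask fingerprints."""
    union = bin(a | b).count("1")
    if union == 0:
        return 0.0  # assumed convention: two empty fingerprints are identical
    return 1.0 - bin(a & b).count("1") / union


def maxmin_pick(fingerprints, k, seed_index=0):
    """Greedily pick up to k indices that are far apart from one another."""
    if not fingerprints:
        return []
    picked = [seed_index]
    # Distance from every molecule to its nearest picked molecule so far.
    min_dist = [tanimoto_distance(fp, fingerprints[seed_index])
                for fp in fingerprints]
    while len(picked) < min(k, len(fingerprints)):
        candidates = [i for i in range(len(fingerprints)) if i not in picked]
        nxt = max(candidates, key=lambda i: min_dist[i])
        picked.append(nxt)
        for i, fp in enumerate(fingerprints):
            min_dist[i] = min(min_dist[i],
                              tanimoto_distance(fp, fingerprints[nxt]))
    return picked


# Hypothetical 4-bit fingerprints, for illustration only.
fps = [0b1111, 0b1110, 0b0001, 0b1101]
print(maxmin_pick(fps, k=2))
```

This version costs roughly k passes over the collection, which is why greedy schemes are preferred over exhaustive subset search for large libraries.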

Page 18: The Impact of Information Technology on Chemistry and Related Sciences

Simulation and Analysis

“Nobody believes a theoretical result, except the person who calculated it. Everybody believes an experimental result, except the person who measured it.” -- Paul Labute (Chemical Computing Group)

Page 19: The Impact of Information Technology on Chemistry and Related Sciences

How it happened

Michael Levitt, Nobel Lecture (http://tinyurl.com/jvhsjvr)

Page 20: The Impact of Information Technology on Chemistry and Related Sciences

How it happened

Michael Levitt, Nobel Lecture (http://tinyurl.com/jvhsjvr)

Page 21: The Impact of Information Technology on Chemistry and Related Sciences

Major applications: QM and MM

• Quantum chemistry made computers; computers made quantum chemistry.
• Molecular mechanics: Classical mechanics applied to molecules.
• QM equations cannot be solved exactly. Need approximations, iterative processing, and computing power.
• Useful for calculating many properties (energies, dipole moments, reactivity).

Potential energy surface for chemical reactions

Fullerene from graphene: Nat. Chem. 2010, 2, 450

Page 22: The Impact of Information Technology on Chemistry and Related Sciences

The 2013 Nobel Prize

• Tradeoff: Quantum mechanics (QM) - accurate but expensive. Molecular mechanics (MM) - inaccurate but cheap.
• QM/MM: Best of both worlds, multiscale.
• Applicable to large biological systems (proteins, DNA), extended materials (zeolites, polymers).

Page 23: The Impact of Information Technology on Chemistry and Related Sciences

A Few Good Applications

Page 24: The Impact of Information Technology on Chemistry and Related Sciences

Molecular Dynamics

• Molecular Dynamics (MD): Newton’s laws of motion applied to molecules, millions of steps; large amounts of data.
• Parallel processing, special-purpose machines allow MD to surpass Moore’s Law.
• Simulations approaching biological timescales becoming routine.
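The core loop of MD, Newton's laws applied one small time step at a time, can be sketched with the velocity Verlet integrator. The slides do not name an integrator, so that choice, the one-dimensional harmonic "bond" and every name and parameter value below are illustrative assumptions:

```python
def harmonic_force(x, k=1.0, x0=0.0):
    """Restoring force of a harmonic spring, F = -k (x - x0)."""
    return -k * (x - x0)


def total_energy(x, v, mass, k=1.0, x0=0.0):
    """Kinetic plus potential energy of the harmonic oscillator."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2


def velocity_verlet(x, v, mass, dt, n_steps, force=harmonic_force):
    """Integrate Newton's equations; returns a list of (x, v) per step."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # complete the velocity update
        trajectory.append((x, v))
    return trajectory


traj = velocity_verlet(x=1.0, v=0.0, mass=1.0, dt=0.01, n_steps=1000)
e_start = total_energy(*traj[0], mass=1.0)
e_end = total_energy(*traj[-1], mass=1.0)
print(f"energy drift over {len(traj) - 1} steps: {abs(e_end - e_start):.2e}")
```

A production simulation repeats this update for every atom over millions of steps, which is where the data volume and the need for parallel hardware come from.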

Page 25: The Impact of Information Technology on Chemistry and Related Sciences

Knowledge-Based Protein Folding

• Knowledge-based protein structure prediction: Taking advantage of existing information in PDB to predict folded structures.
• Use advanced statistical methods based on PDB data for assigning probabilities to various solutions: Rosetta.
• Outstanding success in CASP (Critical Assessment of protein Structure Prediction).

“The amazing thing is that Rosetta had 31 points and the next best group had 8 points. It is like baseball in 1927, when Babe Ruth hit 60 home runs and the runner up hit 14, and entire teams didn't hit as many as he did.” – Peter Kollman (UCSF), CASP 2000.

Overlap between predicted (red) and experimental (green) protein structures

Page 26: The Impact of Information Technology on Chemistry and Related Sciences

Protein design

• Protein design: Given a structure, find alternative sequences.
• Uses of alternative sequences: Enzymes catalyzing new reactions, new small molecule-binding proteins (e.g. for environmental cleanups).
• 2003: First protein designed entirely de novo.
• 2008: First enzyme catalyzing reaction with no natural precedent.
• As PDB grows, protein design becomes better.

Top7: Protein designed from scratch. (Science, 2003, 302, 1364)

Kemp eliminase enzyme from scratch (Nature, 2008, 453, 190)

Page 27: The Impact of Information Technology on Chemistry and Related Sciences

Structure-Based Drug Design

• Predict structure of drug bound to protein, suggest modifications to improve properties.
• Combination of crystallography data and simulation.
• Outstanding success in some areas: e.g. HIV protease inhibitors against AIDS.

Impact of addition of HIV protease inhibitors to antiretroviral therapy among AIDS patients in San Francisco (Am J Epidemiol. 152, 2, 2000)

HIV protease bound to indinavir

Katharine Holloway

Page 28: The Impact of Information Technology on Chemistry and Related Sciences

The wisdom of crowds (and clouds)

• FoldIt: Computer game to solve protein folding and design problems.
• Led to a retroviral protease structure (relevant to HIV) and algorithm discovery.

PNAS, 2011, 108, 18949

Comparison of Folding@Home with leading supercomputers

• Distributed computing, Folding@Home: 100 million hours logged on Sony PlayStation 3 (PS3), also enabled on cloud.
• Used to study folding of proteins involved in cancer, Alzheimer’s disease; drug design.

Page 29: The Impact of Information Technology on Chemistry and Related Sciences

Exciting Future Areas

Page 30: The Impact of Information Technology on Chemistry and Related Sciences

New materials for the new millennium

• Based on Density Functional Theory (Nobel Prize 1998).
• Application of materials simulations and computational screening:
  - Hydrogen storage (metal-organic frameworks)
  - Photovoltaics and solar cells
  - Alloys and new materials for batteries
  - Semiconductor design

Page 31: The Impact of Information Technology on Chemistry and Related Sciences

The age of biology

• Human Genome Project: Computers made it possible.
• Sequencing has greatly surpassed Moore’s Law. New techniques: Ion Torrent, Nanopore etc.
• Computational Biology and Bioinformatics: Comparing genomes, predicting diseases, mapping ancestral differences.
• Aided by massive amounts of data: GenBank, Cancer Genome, Ensembl, UniProt etc.
• Ripe territory for Big Data and new informatics techniques.

Page 32: The Impact of Information Technology on Chemistry and Related Sciences

Sociology

“The democratization of information and expertise that springs from the world wide web, and the power of groups of motivated amateurs to strike out on their own in technical subjects, is weakening the authority of “experts” in society.” -- George Whitesides.

Page 33: The Impact of Information Technology on Chemistry and Related Sciences

The chemical blogosphere

• Chemistry blogs took off in 2002, initially focused on research, grad school hijinks.
• Quickly diversified; peer review, job market, academic culture, publishing, issues in industry, safety culture.

Derek Lowe: drug discovery, industry

Chemjobber: the job market, safety culture, industry

Paul Bracher: academic culture, peer review

Ash Jogalekar: nature and evolution of chemistry, peer review

SeeArrOh: chemophobia, food, peer review

C&EN official blog

James Ashenhurst: Org Chem tutoring

Page 34: The Impact of Information Technology on Chemistry and Related Sciences

What are blogs good for?

Peer Review 2.0

• Timely, democratic review of latest research.
• Interesting research highlighted immediately.
• Critiqued by large audience.
• Self-selecting.
• Instrumental in spotting: fallacious research, self-plagiarism, dubious methodologies and fabrication.

Non-research contributions

• Lab safety (C&EN).
• Academic culture (ChemBark).
• The (sad) state of the job market (Chemjobber).
• Representation of women, minorities (Dr Rubidium).
• Chemophobia, Industry (SeeArrOh, Derek Lowe).

Page 35: The Impact of Information Technology on Chemistry and Related Sciences

Peer Review 2.0: case study I

• First reported instance of comprehensive informal peer review of chemical literature.
• 2006: 37-step synthesis of hexacyclinol described in single-author paper in Angew. Chem. by James LaClair (Xenobe Institute).
• Commenter on blog of Stanford grad student Dylan Stiles points out inconsistencies in structure, others weigh in and point out many more.
• Other official papers refute data, suggest alternative structure.
• Extensive discussion of problems with paper on multiple blogs, hundreds of comments. Paper retracted in 2012, long after problems were clear.

“The proof is in the product”.

Page 36: The Impact of Information Technology on Chemistry and Related Sciences

Peer Review 2.0: case study II

• April 2012: Paper in JACS on amino acid chirality and origin of life.
• Two issues: Bad scientific communication and charges of self-plagiarism.
• Extensive similarities with two previous articles highlighted by Nature Chemistry editor Stuart Cantrill exclusively on Twitter.
• Paper retracted in May 2012.
• Case illustrates peer review operating entirely outside formal channels.

ACS Press Release: “New scientific research raises the possibility that advanced versions of T. rex and other dinosaurs – monstrous creatures with the intelligence and cunning of humans – may be the life forms that evolved on other planets in the universe.”

Photo uploaded by Stuart Cantrill on Twitter

Page 37: The Impact of Information Technology on Chemistry and Related Sciences

Chemists: Embrace open access

• Open-access, arXiv@Cornell
  – ASAP publishing
  – Open access
  – Instant and free peer review by large community
• Chemical community less open to sharing and arXiv-style publishing?
• Cultural differences between various scientific communities (e.g. particle physicists vs total synthesis chemists).

Page 38: The Impact of Information Technology on Chemistry and Related Sciences

The Future

“Prediction is difficult, especially about the future” -- Niels Bohr.

Page 39: The Impact of Information Technology on Chemistry and Related Sciences

Challenges and promises

• Data:
  - Bigger, better annotated databases with quality control.
  - Statistics becoming more useful and appreciated.
  - Greater awareness of data mining tools among experimental chemists.
• Simulation:
  - Long molecular dynamics simulations approaching realistic timescales.
  - Insights from network theory used in synthetic planning.
  - First quantum chemistry calculation on quantum computer (2010).
  - Better statistical validation of results, quality control.

Page 40: The Impact of Information Technology on Chemistry and Related Sciences

Challenges and promises

• Sociology:
  - More open access journals, more open access options.
  - Widespread publicity of research results.
  - Discussion, criticism on blogs being taken seriously. Retraction Watch.
  - Better use of multimedia (Twitter, Skype, podcasts).
  - Cultural changes:
    - More cross-talk between chemists, statisticians and computer scientists.
    - More cross-talk between academia and industry.
    - Willingness to share data, code, experimental results.
    - Willingness to present and discuss negative data.

Page 41: The Impact of Information Technology on Chemistry and Related Sciences

But…be afraid of the hype

Fortune Magazine, October 1981

• Jetpacks
• Artificial intelligence
• Nuclear fusion
• Robot maids

In 30 years… (1950-2014)

Page 42: The Impact of Information Technology on Chemistry and Related Sciences

Translating hype into reality

• Fearlessness; ability to jump across boundaries, question received wisdom.
• Resilience; ability to bounce back from failure.
• Adaptability; ability to welcome change.
• Teamwork; ability to collaborate and share.
• Imagination; ability to think outside the box.

Page 43: The Impact of Information Technology on Chemistry and Related Sciences

The future of information technology in chemistry…

…is us

“The best way to predict the future is to invent it.” – Alan Kay.

“Be the change that you wish to see in the world.” – Gandhi.