CAS REGISTRY℠: The quality of comprehensiveness is not strained
Yvonne Pope (on behalf of Roger Schenck), Major Account Manager
EMBL-EBI Industry Programme Workshop, Chemical Structure Resources
Hinxton, Cambridge, December 1, 2010

DESCRIPTION

Presented at the EMBL-EBI Industry Programme Workshop, Chemical Structure Resources, Hinxton, Cambridge, on December 1, 2010

Page 1: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS REGISTRY℠: The quality of comprehensiveness is not strained

Yvonne Pope (on behalf of Roger Schenck)
Major Account Manager

EMBL-EBI Industry Programme Workshop
Chemical Structure Resources
Hinxton, Cambridge
December 1, 2010

Page 2: CAS REGISTRY: The quality of comprehensiveness is not strained

December 21, 2010

CAS is a division of the American Chemical Society. Copyright 2010 American Chemical Society. All rights reserved.

Agenda

• How has the CAS substance collection grown over the years?
• What are the sources of these substances?
• How is CAS responding to the challenge of the accelerating discovery of substances?
• How does CAS maintain the REGISTRY “gold standard” of substance information?
• What can framework analysis tell us about the REGISTRY?

Page 3: CAS REGISTRY: The quality of comprehensiveness is not strained

ACS Mission

To advance the broader chemistry enterprise and its practitioners for the benefit of Earth and its people.

CAS Mission

To be the world’s leader in meeting the needs of users of chemical and related scientific information.

CAS supports the mission of the ACS

Page 4: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS builds the world’s most authoritative and comprehensive databases

Page 5: CAS REGISTRY: The quality of comprehensiveness is not strained

Growth in published chemistry literature has stayed strong in the last decade

Journal articles and patents from 2003-2010 in CAplus

[Figure: total articles and patents per year, 2003 to 2010 (2010 projected); y-axis from 0 to 1,400,000]

Page 6: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS analyzes global chemical information, including publications from Asia

Each year, CAS covers
• 10,000 serial journal titles and 61 patent authorities worldwide
• 2,100 Asian serial journal titles
• All major Asian patent authorities, including offices in
  – People’s Republic of China
  – South Korea
  – Japan
  – India

Chinese, Japanese, and Korean language publications account for 33% of new CAplus database records

[Figure: percent of new publications in Chinese, Japanese, or Korean languages, by year]

Page 7: CAS REGISTRY: The quality of comprehensiveness is not strained

Patenting of new chemical research has accelerated, especially patenting of Chinese chemical research

Chemistry Patents Published 1999-present

[Figure: documents in CAS databases per year, 1999 to 2010 (2010 projected); y-axis from 0 to 120,000; series: China, Japan, USA, WIPO]

Page 8: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS continues to uncover new small molecules in significant numbers

CAS REGISTRY Growth, 2003-2010

Year    Millions
2003    22.3
2004    25.0
2005    27.0
2006    30.0
2007    33.3
2008    41.5
2009    51.3
2010    56.0 (projection)
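The year-over-year growth implied by this chart can be worked out with a few lines of arithmetic. The sketch below is illustrative only: it assumes the eight bar labels on the slide read in ascending order for 2003 to 2010 and that the 2010 value is the projection. The function name `yearly_growth` is hypothetical.

```python
# Values transcribed from the growth chart, in millions.
# Assumption: labels map to years in ascending order; 2010 is a projection.
registry_millions = {
    2003: 22.3, 2004: 25.0, 2005: 27.0, 2006: 30.0,
    2007: 33.3, 2008: 41.5, 2009: 51.3, 2010: 56.0,
}


def yearly_growth(series):
    """Return percent growth for each year relative to the previous year."""
    years = sorted(series)
    return {
        later: (series[later] - series[earlier]) / series[earlier] * 100
        for earlier, later in zip(years, years[1:])
    }


growth = yearly_growth(registry_millions)
for year, pct in growth.items():
    print(f"{year}: {pct:+.1f}%")
```

Under these assumptions the largest relative jumps fall in 2008 and 2009, which matches the slide's point that new substances continue to arrive in significant numbers.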

Page 9: CAS REGISTRY: The quality of comprehensiveness is not strained

What were the sources of these molecules in 2009?

[Figure: substance count by source]

Page 10: CAS REGISTRY: The quality of comprehensiveness is not strained

Increasingly, new chemical discoveries are being disclosed through patent activities

[Figure: Percentage of New Compounds added to CAS REGISTRY from Patents; y-axis: percentage of total]

*CA Database annual average is 23% patents

Page 11: CAS REGISTRY: The quality of comprehensiveness is not strained

CHEMCATS continues to grow and remains a source of new small molecules

[Figure: Number of Catalog Products and Unique Substances in the CAS CHEMCATS Database; axes: number of catalog products, number of unique substances]

Page 12: CAS REGISTRY: The quality of comprehensiveness is not strained

Chemical substances from web-based sources provide a moderate addition to the small molecules in REGISTRY

1.6M substances have been captured from Internet substance collections

[Figure: number of substances from web sources (axis from 0 to 400,000) for ZINC, ChemSpider, ChemDB, Broad Inst, Ambinter, NIST, Mass Spec, NCI 3D]

This chart illustrates some of the larger collections

Page 13: CAS REGISTRY: The quality of comprehensiveness is not strained

The CAS databases reveal some small molecule trends

• New substances still come mainly from journals and patents, but more and more new substances are coming from the patent literature
• Unique substances are found in chemical catalogs and chemical libraries
• Internet sources provide some otherwise undisclosed substance information
• The Pacific Rim, especially China, is increasingly productive
• Chemists are very inventive: more new chemical entities, not fewer, are being disclosed every year

Page 14: CAS REGISTRY: The quality of comprehensiveness is not strained

What criteria must a substance meet to be included in the CAS REGISTRY?

A substance must be

• Identified by CAS as coming from a reputable source, including but not limited to patents, journals, chemical catalogs, and substance collections on the Web

• Described in largely unambiguous terms

• Characterized by physical methods or described in a patent document example or claim

• Consistent with the laws of atomic covalent organization

Page 15: CAS REGISTRY: The quality of comprehensiveness is not strained

For complex chemistry, CAS chemists classify substance information and verify graphical processes and structures

1. Review reaction and structure
2. Create registration record

Page 16: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS chemists interpret when compounds are described in terms other than singular structures or names

Page 17: CAS REGISTRY: The quality of comprehensiveness is not strained

Since 1997, patents have provided more new small molecules than journals have

CAS analysis of a typical PCT application
• 917 indexed compounds from Examples and Claims
• 576 new compounds added to CAS REGISTRY
• 613 single-step reactions
• 5,394 multi-step reactions
• 1,029 reaction participants
• 2,119 substituent definitions for Markush structures added to MARPAT®

Page 18: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS specialists in many fields of chemistry interpret author terminology to register compounds

Author identified this compound only as D4GlcUA-GlcNAc-(GlcUA-GlcNAc)5-PA

Page 19: CAS REGISTRY: The quality of comprehensiveness is not strained

Patents regularly describe substances in ambiguous ways:

In WO 2007089907, this “desired product” is fully characterized

Page 20: CAS REGISTRY: The quality of comprehensiveness is not strained

Relatively new substance classes can be registered

Metal-organic frameworks show great potential for capture of H2 or CO2 or in other gas separation processes

Page 21: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS REGISTRY substances are enhanced with spectra, numeric properties, tags, and published sources

Spectra

• More than 84M calculated NMR spectra (1H, 13C), with 17M added in 2010

• More than 700,000 experimental spectra (MS, NMR, IR, Raman), with another 190,000 newly acquired MS to be added in 2010

Numeric

• More than 4.2M experimental property values (m.p., b.p., optical rotatory power, etc.)

• 9.9M data tags linked to indexed documents

• 2.9B calculated metrics (bio-concentration, Log P, Lipinski, etc.)

Page 22: CAS REGISTRY: The quality of comprehensiveness is not strained

Chemical libraries are the second-largest source of new molecules

What are chemical libraries?

• Often a collection of drug-like small molecules to be used as leads in high-throughput screening or industrial manufacture

• Each substance has associated information stored in some kind of database, such as the
  – Chemical structure
  – Purity
  – Quantity
  – Physicochemical characteristics

Page 23: CAS REGISTRY: The quality of comprehensiveness is not strained

Chemical catalogs with products “in stock” are a growing source of new molecular descriptions

Page 24: CAS REGISTRY: The quality of comprehensiveness is not strained

Other sources of new small molecules are national chemical regulatory inventories

Page 25: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS scientists (biologists, chemists, and information scientists) are substance experts with advanced degrees

• Collectively they know 50 different languages

• They monitor the entire range of scientific literature that contains chemical information

Page 26: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS maintains the REGISTRY gold standard of quality substance information on a daily basis

A recent example

Page 27: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS maintains the REGISTRY gold standard of quality substance information on a daily basis

Page 28: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS maintains the REGISTRY gold standard of quality substance information on a daily basis

Substance WR319535 is the 1R, 4S enantiomer as drawn.

Page 29: CAS REGISTRY: The quality of comprehensiveness is not strained

CAS maintains the REGISTRY gold standard of quality substance information on a daily basis

Substance WR319581 is the 1S, 4R enantiomer of WR319535

Page 30: CAS REGISTRY: The quality of comprehensiveness is not strained

Framework analysis of REGISTRY can reveal the structural diversity of organic chemistry

Data from more than 24 million compounds was examined.

category                             number
compounds (a)                        24,282,284
frameworks, graph level              836,708
frameworks, graph/node level         2,594,176
frameworks, graph/node/bond level    3,380,334

(a) Single-component, cyclic organic compounds registered as of the end of June 2007.
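A quick way to read these counts is as the average number of compounds that share one framework at each level of detail. The sketch below only divides the figures reported on this slide; the variable and function names are hypothetical, and a mean says nothing about how unevenly compounds are spread across frameworks.

```python
# Counts as reported on the slide (compounds registered as of end of June 2007).
compounds = 24_282_284
frameworks = {
    "graph": 836_708,
    "graph/node": 2_594_176,
    "graph/node/bond": 3_380_334,
}


def mean_compounds_per_framework(n_compounds, n_frameworks):
    """Average number of compounds sharing a single framework."""
    return n_compounds / n_frameworks


ratios = {
    level: mean_compounds_per_framework(compounds, count)
    for level, count in frameworks.items()
}
for level, ratio in ratios.items():
    print(f"{level} level: about {ratio:.1f} compounds per framework")
```

The average falls as the framework definition becomes more specific, from roughly 29 compounds per framework at the graph level to roughly 7 at the graph/node/bond level.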

Page 31: CAS REGISTRY: The quality of comprehensiveness is not strained

Hetero and graph frameworks both have very top-heavy distributions

A small percentage of frameworks occur in a large percentage of compounds

Page 32: CAS REGISTRY: The quality of comprehensiveness is not strained

The top 30 framework shapes occur in 35% of organic compounds

Half of all compounds are described by only 143 shapes.
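Coverage statements like these are cumulative shares over a frequency-ranked list of shapes. The sketch below shows one way such a figure could be computed; it is not the method CAS used, the function name `shapes_needed` is hypothetical, and the counts are made-up toy values rather than REGISTRY data.

```python
from itertools import accumulate


def shapes_needed(counts, target_share):
    """Return how many of the most frequent shapes cover target_share of compounds."""
    ranked = sorted(counts, reverse=True)
    total = sum(ranked)
    for rank, running in enumerate(accumulate(ranked), start=1):
        if running / total >= target_share:
            return rank
    return len(ranked)


# Hypothetical compound counts per framework shape (toy data, 1,000 compounds).
toy_counts = [400, 200, 100, 100, 50, 50, 40, 30, 20, 10]

print(shapes_needed(toy_counts, 0.50))  # shapes needed to cover half the compounds
print(shapes_needed(toy_counts, 0.75))  # shapes needed to cover three quarters
```

In a top-heavy distribution the first few ranks account for most of the total, so the number returned stays small even for large target shares.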

Page 33: CAS REGISTRY: The quality of comprehensiveness is not strained

Thank you for your attention.

Questions?