Presented at the EMBL-EBI Industry Programme Workshop, Chemical Structure Resources, Hinxton, Cambridge, on December 1, 2010
CAS REGISTRY℠: The quality of comprehensiveness is not strained
Yvonne Pope (on behalf of Roger Schenck)
Major Account Manager
EMBL-EBI Industry Programme Workshop
Chemical Structure Resources
Hinxton, Cambridge
December 1, 2010
December 21, 2010
CAS is a division of the American Chemical Society. Copyright 2010 American Chemical Society. All rights reserved.
Agenda
• How has the CAS substance collection grown over the years?
• What are the sources of these substances?
• How is CAS responding to the challenge of the accelerating discovery of substances?
• How does CAS maintain the REGISTRY “gold standard” of substance information?
• What can framework analysis tell us about the REGISTRY?
ACS Mission
To advance the broader chemistry enterprise and its practitioners for the benefit of Earth and its people.
CAS Mission
To be the world’s leader in meeting the needs of users of chemical and related scientific information.
CAS supports the mission of the ACS
CAS builds the world’s most authoritative and comprehensive databases
Growth in published chemistry literature has stayed strong in the last decade
Journal articles and patents from 2003-2010 in CAplus
[Bar chart: total articles and patents per year, 2003 to 2010 (2010 projected); y-axis from 0 to 1,400,000]
CAS analyzes global chemical information, including publications from Asia
Each year, CAS covers
• 10,000 serial journal titles and 61 patent authorities worldwide
• 2,100 Asian serial journal titles
• All major Asian patent authorities, including offices in
– People’s Republic of China
– South Korea
– Japan
– India
Chinese, Japanese, and Korean language publications account for 33% of new CAplus database records
[Chart: percent of new publications in Chinese, Japanese, or Korean languages, by year]
Patenting of new chemical research has accelerated, especially patenting of Chinese chemical research
Chemistry Patents Published 1999‐present
[Chart: documents in CAS databases per year, 1999 to 2010 (2010 projected), for China, Japan, USA, and WIPO; y-axis from 0 to 120,000]
CAS continues to uncover new small molecules in significant numbers
CAS REGISTRY Growth, 2003-2010
Year                 Substances (millions)
2003                 22.3
2004                 25.0
2005                 27.0
2006                 30.0
2007                 33.3
2008                 41.5
2009                 51.3
2010 (projection)    56.0
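The growth figures on this slide can be turned into year-over-year additions with a few lines of arithmetic. The sketch below is illustrative only: it assumes the eight data labels on the chart map to 2003 through 2010 in ascending order and that the 2010 value is the projection; the function name `yearly_additions` is made up for this example.

```python
# Registry size in millions of substances, read from the chart's data labels.
# Assumption: the eight labels map to 2003-2010 in ascending order,
# and the 2010 value is the projection shown on the slide.
registry_millions = {
    2003: 22.3, 2004: 25.0, 2005: 27.0, 2006: 30.0,
    2007: 33.3, 2008: 41.5, 2009: 51.3, 2010: 56.0,
}


def yearly_additions(sizes):
    """Return {year: millions of substances added since the previous year}."""
    years = sorted(sizes)
    return {
        year: round(sizes[year] - sizes[prev], 1)
        for prev, year in zip(years, years[1:])
    }


additions = yearly_additions(registry_millions)
for year, added in additions.items():
    print(f"{year}: +{added}M substances")
```

Under those assumptions, the largest single-year increases fall in 2008 and 2009.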
What were the sources of these molecules in 2009?
[Chart: substance count by source]
Increasingly, new chemical discoveries are being disclosed through patent activities
Percentage of New Compounds added to CAS REGISTRY from Patents
[Chart: y-axis, percentage of total (%)]
*CA Database annual average is 23% patents
CHEMCATS continues to grow and remains a source of new small molecules
Number of Catalog Products and Unique Substances in the CAS CHEMCATS Database
[Chart: number of catalog products and number of unique substances]
Chemical substances from web-based sources provide a moderate addition to the small molecules in REGISTRY
1.6M substances have been captured from Internet substance collections
[Bar chart: number of substances from web sources (y-axis from 0 to 400,000) for ZINC, ChemSpider, ChemDB, Broad Inst, Ambinter, NIST Mass Spec, and NCI 3D]
This chart illustrates some of the larger collections
The CAS databases reveal some small molecule trends
• New substances still come mainly from journals and patents, but more and more new substances are coming from the patent literature
• Unique substances are found in chemical catalogs and chemical libraries
• Internet sources provide some otherwise undisclosed substance information
• The Pacific Rim, especially China, is increasingly productive
• Chemists are very inventive: more new chemical entities, not fewer, are being disclosed every year
What criteria must a substance meet to be included in the CAS REGISTRY?
A substance must be
• Identified by CAS as coming from a reputable source, including but not limited to patents, journals, chemical catalogs, and substance collections on the Web
• Described in largely unambiguous terms
• Characterized by physical methods or described in a patent document example or claim
• Consistent with the laws of atomic covalent organization
For complex chemistry, CAS chemists classify substance information and verify graphical processes and structures
1. Review reaction and structure
2. Create registration record
CAS chemists interpret when compounds are described in terms other than singular structures or names
Since 1997, patents have provided more new small molecules than journals have
CAS analysis of a typical PCT application
• 917 indexed compounds from Examples and Claims
• 576 new compounds added to CAS REGISTRY
• 613 single-step reactions
• 5,394 multi-step reactions
• 1,029 reaction participants
• 2,119 substituent definitions for Markush structures added to MARPAT®
CAS specialists in many fields of chemistry interpret author terminology to register compounds
Author identified this compound only as D4GlcUA-GlcNAc- (GlcUA-GlcNAc)5-PA
Patents regularly describe substances in ambiguous ways
In WO 2007089907, this “desired product” is fully characterized
Relatively new substance classes can be registered
Metal-organic frameworks show great potential for capture of H2 or CO2 or in other gas separation processes
CAS REGISTRY substances are enhanced with spectra, numeric properties, tags, and published sources
Spectra
• More than 84M calculated NMR spectra (1H, 13C), with 17M added in 2010
• More than 700,000 experimental spectra (MS, NMR, IR, Raman), with another 190,000 newly acquired MS to be added in 2010
Numeric
• More than 4.2M experimental property values (m.p., b.p., optical rotatory power, etc.)
• 9.9M data tags linked to indexed documents
• 2.9B calculated metrics (bio-concentration, Log P, Lipinski, etc.)
Chemical libraries are the second-largest source of new molecules
What are chemical libraries?
• Often a collection of drug-like small molecules to be used as leads in high-throughput screening or industrial manufacture
• Each substance has associated information stored in some kind of database, such as the
– Chemical structure
– Purity
– Quantity
– Physicochemical characteristics
Chemical catalogs with products “in stock” are a growing source of new molecular descriptions
Other sources of new small molecules are national chemical regulatory inventories
CAS scientists (biologists, chemists, and information scientists) are substance experts with advanced degrees
• Collectively they know 50 different languages
• They monitor the entire range of scientific literature that contains chemical information
CAS maintains the REGISTRY gold standard of quality substance information on a daily basis
A recent example
Substance WR319535 is the 1R, 4S enantiomer as drawn.
Substance WR319581 is the 1S, 4R enantiomer of WR319535
Framework analysis of REGISTRY can reveal the structural diversity of organic chemistry
Data from more than 24 million compounds was examined.
Category                             Number
Compounds (a)                        24,282,284
Frameworks, graph level              836,708
Frameworks, graph/node level         2,594,176
Frameworks, graph/node/bond level    3,380,334

(a) Single-component, cyclic organic compounds registered as of the end of June 2007.
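To put the table's counts in perspective, the short sketch below divides the number of compounds by the number of frameworks at each level of detail. It is plain arithmetic on the figures above, not a reproduction of the CAS analysis; the names `COMPOUNDS`, `FRAMEWORKS`, and `compounds_per_framework` are invented for illustration.

```python
# Counts copied from the slide's table (compounds registered through June 2007).
COMPOUNDS = 24_282_284
FRAMEWORKS = {
    "graph": 836_708,
    "graph/node": 2_594_176,
    "graph/node/bond": 3_380_334,
}


def compounds_per_framework(compounds, frameworks):
    """Average number of compounds that share one framework, per level."""
    return {level: compounds / count for level, count in frameworks.items()}


ratios = compounds_per_framework(COMPOUNDS, FRAMEWORKS)
for level, ratio in ratios.items():
    print(f"{level} level: about {ratio:.1f} compounds per framework")
```

These are averages only. They say nothing about how unevenly compounds are spread across frameworks.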
Hetero and graph frameworks both have very top-heavy distributions
A small percentage of frameworks occur in a large percentage of compounds
The top 30 framework shapes occur in 35% of organic compounds
Half of all compounds are described by only 143 shapes.
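A statement such as "half of all compounds are described by only 143 shapes" is a cumulative-coverage figure. The sketch below shows one way to compute such a figure from per-framework compound counts. The function `frameworks_needed` and the toy counts are hypothetical and are not REGISTRY data.

```python
from itertools import accumulate


def frameworks_needed(counts, fraction):
    """Smallest number of most-frequent frameworks covering `fraction` of compounds.

    `counts` holds the number of compounds per framework, in any order.
    """
    ordered = sorted(counts, reverse=True)
    target = fraction * sum(ordered)
    for rank, covered in enumerate(accumulate(ordered), start=1):
        if covered >= target:
            return rank
    return len(ordered)


# Invented, deliberately top-heavy toy data: ten frameworks, 1,000 compounds.
toy_counts = [500, 200, 100, 50, 50, 40, 30, 20, 5, 5]

half = frameworks_needed(toy_counts, 0.5)
most = frameworks_needed(toy_counts, 0.8)
print(f"{half} framework(s) cover 50% of the toy compounds")
print(f"{most} framework(s) cover 80% of the toy compounds")
```

With real per-framework counts in place of the toy list, the same function would yield the kind of number quoted on the slide.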
Thank you for your attention.
Questions?