
Chemical Information Sources/Chemical Name and Formula Searches

Introduction

Chemical Abstracts Service's Registry File is the largest single collection of data that can be used to identify a chemical substance. Each unique chemical substance is assigned a Registry Number, which CAS uses in preference to a chemical name to index documents in the CA or CAPlus Files. Much of the descriptive information about a compound (its molecular formula, variant names for the substance, as well as much detailed information about its makeup, including the structure) is found in the Registry File. Furthermore, in recent years, actual data (experimental or calculated data) have been added to the file, making it much more like a huge handbook. The Registry Number serves as the unique identifier of the record. The Registry File includes a number of search techniques that are built on the chemical name and other fields included in the Registry File records.

In the printed CA, there is no Registry Number Index. Instead, the "Chemical Substance Index" ("CSI") links the preferred CA Index Name for the substance to the documents that have information on it. However, names for classes of compounds are indexed in the "General Subject Index". Also, in the printed Chemical Abstracts, supplemental access to the printed product is found in the "Formula Indexes". The "CSI" has dictated much of the indexing policy for supplemental terms used to describe the role of the chemical substance in the document. The broad indexing terms found in the CAS Roles in the CA File and the Standard Subject Divisions in the printed CSI can be of considerable use in retrieving the precise information of interest about a compound on which much has been written.

Molecular formula searching in CA is based on the Hill Formula system (described below). The concept of the dot-disconnected formula for salts, addition compounds, and mixtures is important in both the database and the printed "Molecular Formula Index" to Chemical Abstracts.

A search for information on a single chemical substance may start with the name of the substance, its molecular formula, or various other words or codes that can be associated with it. (See: Locating All CA File References Citing a Chemical Substance [1] and CAS Registry: Finding CAS Registry Numbers [2]) In this chapter, we will encounter various coding systems that have been applied to the retrieval of chemical substances from both printed and computer-based sources. The main database to search for such information is the CAS Registry File, which now has in excess of 70,000,000 records for chemical substances (including biosequences). Many of the entries in the Registry File are for sequences of biological macromolecules. The bulk of the remaining small molecule entries are for organic compounds, either simple organics (esters, steroids, heterocycles, stereoisomers, etc.) or such things as mixtures, polymers, and organic salts. Just over 10% of the file consists of inorganic compounds.

Chemical Nomenclature

Mastery of formal chemical nomenclature is a skill possessed by few chemists nowadays. The International Union of Pure and Applied Chemistry (IUPAC) determines the recommended practices for assigning official names to chemical substances. With a knowledge of the IUPAC nomenclature rules, a chemist can visualize and depict the correct structure of even complex chemical compounds. However, creating such a name from scratch is another matter. An excellent Web guide to chemical nomenclature is Charles H. Davis's Chemical Nomenclature Lite. [3] Fox and Powell's classic work, Nomenclature of Organic Compounds: Principles and Practice, appeared in a 2nd edition in 2001. For other types of substances (and nomenclature in specific areas of chemistry), see the so-called "color books" of the IUPAC. [4] The Enzyme Commission assigns EC numbers [5] for enzymes that are very useful in computer searching.

Until late 2006, Chemical Abstracts Service (CAS) made major changes to their chemical nomenclature policies only at the boundaries of the five-year collective index periods. They have now abandoned that policy, preferring to make changes to CA Index Names as needed to ensure that the CAS Registry System has the most current, usable information. The names will now conform more closely with the names that chemists typically use. Among the nomenclature improvements to be implemented are more uniformly cited locants, reduction in the number of stereoparent names, and the elimination of nearly 3,000 obscure stereoparents. Unexpressed amides also will be disregarded.

Substance Searching Using Chemical Abstracts Service Registry Numbers

One very effective method of retrieving chemical substance information from a reference source is to use the Chemical Abstracts Service REGISTRY NUMBER for the substance. The Registry Number is a unique number assigned to each substance indexed by CAS. The CAS RN is a number of the format Y-XX-X, where Y was originally two to six digits and each X is one digit, for example, 494-12-2. (Recently, the RN has been expanded to a maximum of 10 digits, so Y can now have up to seven digits.) The Registry Number is found in many databases [6] and increasingly as an index to printed reference works. The Registry File started in 1965 with new substances that were encountered from that date forward. Older substances have now been entered into the system for records that date from 1907-65. Now that CAS has finished this task, all compounds discovered post-1907 should be in the database. For compounds discovered prior to 1907, it is wise to search the Beilstein and Gmelin databases on Reaxys [7], which have coverage back to the 18th century.

The Registry Number appears in the indexing of CA and CAPlus File records in preference to the formal name of the compound. In volume 106 of Chemical Abstracts is found abstract number 195826 for the following article:

Grieco, Paul A.; Bahsas, Ali. Reactions of allylstannanes with in situ generated immonium salts in protic solvent: a facile aminomethano destannylation process. J. Org. Chem. (1987), 52(7), 1378-80. CODEN: JOCEAH ISSN: 0022-3263. CAN 106:195826 AN 1987:195826 CAPLUS

The indexing below includes some of the Registry Numbers for compounds discussed in the article.

[Figure: SciFinder Example of Registry Number Indexing] (Reproduced with permission of CAS, a division of the American Chemical Society.)

CAS Registry Numbers are assigned to organic and inorganic substances, metals, alloys, minerals, polymers, coordination compounds, elements, isotopes, peptides, enzymes, biomolecular sequences, and nuclear particles. However, the mere mention of a compound in a document is not enough to ensure that the indexers at Chemical Abstracts Service will tie a CAS RN to the record for that document. To get an entry in the CA indexes, there must be something new reported about the substance. It may be a new method of preparation, a new source for the substance, a new reaction, a new kinetic or mechanistic study, new chemical or physical properties, a new method of analysis, a new use or application, or a new biological effect. Chemical reactants and the resulting products are routinely indexed, but reagents are not indexed unless there is a new preparation of the reagent itself or a novel use of a standard reagent.

In 2008, CAS entered into a cooperative venture with Wikipedia to provide CAS Registry Numbers for chemical substances of widespread general interest. The result is Common Chemistry [8], a Web resource where approximately 7,900 substances can be searched without cost by chemical name or CAS Registry Number. Entering the CAS RN for Isatin, 91-56-5, brings up a record with the CAS Preferred Name, 1H-Indole-2,3-dione, 18 other names for Isatin, the molecular formula, a 2D structural drawing, and the link to the Wikipedia article on Isatin.
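The Y-XX-X format of a Registry Number lends itself to a quick mechanical check before a search is run. The chapter does not describe it, but the final digit of a CAS RN is a check digit: the remaining digits are weighted 1, 2, 3, ... from right to left, summed, and the sum modulo 10 must equal the last digit. The following is a minimal sketch of that check; the function name `is_valid_cas_rn` and the assumption that the first block holds two to seven digits are illustrative choices, and passing the check says nothing about whether the number has actually been assigned to a substance.

```python
import re

# Assumed layout: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_RN_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas_rn(rn: str) -> bool:
    """Return True if rn has the Y-XX-X layout and a consistent check digit."""
    match = CAS_RN_PATTERN.match(rn.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)   # all digits except the check digit
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost one.
    total = sum(weight * int(digit)
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == check_digit


if __name__ == "__main__":
    # Registry Numbers quoted in this chapter, plus one deliberately wrong value.
    for rn in ("494-12-2", "91-56-5", "68-04-2", "120-72-9", "494-12-3"):
        print(rn, is_valid_cas_rn(rn))
```

For flavan, 494-12-2, the weighted sum is 2×1 + 1×2 + 4×3 + 9×4 + 4×5 = 72, and 72 mod 10 = 2, which matches the final digit. A check like this catches most transcription errors before a number is sent to a database.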


The "Index Guide" and Chemical Name Searching in the Printed Chemical Substance Indexes

Just as the "Index Guide" controls the vocabulary that must be used in the Chemical Abstracts "General Subject Index," it also provides the correct name to use in searching the CA "Chemical Substance Index". For example, a check of the "Index Guide" for "Flavan" finds the following:

Flavan See 2H-1-Benzopyran, 3,4-dihydro-2-phenyl- [494-12-2]

In alphabetizing chemical substance names in the index, locant numbers, stereo designators, etc. are ignored. Thus, we must look in the "B" section of the printed CA "Chemical Substance Index" for "Benzopyran" in order to find index entries on the compound. Note that the CAS Index Name for Flavan is inverted, with the name of the so-called HEADING PARENT listed first. This keeps structurally related compounds in the same area of the index. The basic Heading Parent compound is listed first, followed by derivatives and other structurally related compounds. The entries in the "Chemical Substance Index" include the TEXT MODIFICATIONS (other subject words) that give more information about the documents that are indexed.

From 2007, CAS no longer categorizes information by collective index periods, so the new CA index names no longer carry a "CI" label such as 6CI, 7CI, 8CI, or 9CI.

Qualified Substances in CAS Files and Indexes

If not much has been written about the substance during the indexing period, all of the indexed information is found in a single alphabetical sequence under the Index Name in the printed "Chemical Substance Index". However, when the index entries become voluminous, CAS divides them into Standard Subject Divisions. The compounds so treated are referred to as QUALIFIED SUBSTANCES. Originally seven qualifiers were used, but two additional terms (formation and processes) were added in 1994, and one phrase (uses and miscellaneous) was subsequently split apart. The qualifiers are:

• ANALYTICAL STUDY (ANST) - for methodology of detection or determination of the substance, or its analysis; also for separation if the intent is analytical.
• BIOLOGICAL STUDY (BIOL) - for biochemical uses and for processes, properties, occurrence, and formation in biological systems (including nonfossil by-products of living matter, food, etc.). Studies on the herbicidal, pesticidal, and pharmaceutical use of the material are also placed in this subdivision.
• FORMATION, NONPREPARATIVE (FORM) - for the incidental formation of the substance in a nonpreparative study (from v. 121 onward).
• MISCELLANEOUS (MSC) - studies not otherwise classifiable.
• OCCURRENCE (OCCU) - for natural occurrence (in other than biological systems).
• PREPARATION (PREP) - for synthesis, manufacture, incidental formation (other than biochemical), recovery, separation, and purification.
• PROCESS (PROC) - for nonreactive treatment of the substance, nonpreparative removal of the substance, and complex treatments of the substance (from v. 121 onward).
• PROPERTIES (PRP) - for physical and chemical properties and related non-reaction processes.
• REACTIONS (RACT) - for chemical changes that lead to products differing chemically from the starting material, including nuclear interactions (other than simple scattering), corrosion, neutralization, enolization, isomerization, and tautomerism.
• USES (USES) - for applications (other than biochemical), removal (in purification procedures), industrial processing.


CAS Roles [9] in the CA and other Files

ROLES are CAS indexing terms assigned to every indexed substance and to controlled index terms for classes of compounds. Roles were first applied to new online CA File records with v. 121 (July 1994). They were then applied retrospectively to all CA File records by means of a computer algorithm. Since there are over 60 specific roles and 9 broad super roles, they substantially expand the indexing terms that were used prior to their introduction. The role terms give a more precise link to the substance. For example, it is now possible to specify not only that you want the preparation of the substance, but also that the preparation be a synthetic preparation, as opposed to industrial manufacture. In the past, there was no distinction made in the use of the term "Preparation" in such cases.

Searching the Registry File with a Chemical Name

The Registry File is the largest single source of chemical names in existence. It can be searched on the STN command-language system by a trade or common name for a substance (CN), by its CAS Index Name (CN) in inverted order, or by fragments of the CAS Index Name (CNS field). (See: Tips for Chemical Name Searching [10]) Just as we had a Basic Index that is formed from subject words in a bibliographic database, there is also a basic index for the Registry File when searched on STN. The BASIC INDEX of the Registry File includes both chemical name fragments and molecular formula fragments. It may be necessary to follow certain protocols for special characters in order to search for a chemical name. Greek characters, for example, are spelled out in their entirety with a period before and after the Greek part of the name. An example of such a chemical name search in SciFinder Scholar is below. Note that in the SciFinder Scholar system, the search will work with or without the periods around the "alpha," but in STN command-language searching, the dots are mandatory.

[Figure: SciFinder Explore by Substance Name Search for alpha-Methylbenzoin] (Reproduced with permission of CAS, a division of the American Chemical Society.)

[Figure: SciFinder Record for alpha-Methylbenzoin] (Reproduced with permission of CAS, a division of the American Chemical Society.)

Note that in SciFinder Scholar, you should not invert the name when searching a CA Index Name. For example, entering Benzene, 1,4-dibromo will not work, but searching 1,4-dibromobenzene will.
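The Greek-letter convention described above (the letter spelled out, with a period before and after it) is easy to apply automatically when names are pasted from a document that uses the actual Greek characters. The sketch below is only an illustration: the function name `to_stn_name` is hypothetical, and the mapping covers just a handful of lowercase letters rather than the full alphabet.

```python
# Illustrative subset of lowercase Greek letters and their spelled-out names.
GREEK_TO_NAME = {
    "α": "alpha", "β": "beta", "γ": "gamma", "δ": "delta",
    "ε": "epsilon", "λ": "lambda", "μ": "mu", "π": "pi", "ω": "omega",
}


def to_stn_name(name: str) -> str:
    """Replace each known Greek letter with its dotted, spelled-out form."""
    return "".join(
        f".{GREEK_TO_NAME[ch]}." if ch in GREEK_TO_NAME else ch
        for ch in name
    )


if __name__ == "__main__":
    print(to_stn_name("α-Methylbenzoin"))  # prints .alpha.-Methylbenzoin
```

Characters that are not in the mapping pass through unchanged, so a name without Greek letters is returned as entered.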

Searching the Registry File and Printed CA Indexes with a Molecular Formula: The Hill System

The system most commonly used today for arranging molecular formulas in indexes is the HILL SYSTEM. The Hill System covers both organic and inorganic compounds according to the following rules:

1. Sum individually all like atoms within the molecule.
2. If carbon is present, place it and the total number of C's first in the formula.
3. If both carbon and hydrogen are present, place hydrogen and the total number of H's second. Note that if carbon is not present, rule 4 applies to the substance, and the H is placed in its regular position in the alphabet.
4. All other atoms in the molecule are arranged alphabetically. That means that for inorganic substances without carbon, the arrangement is alphabetical.

Within the index itself, the numbers of elements come into play. Here is an example of compounds arranged for a Hill System Index:


Al6 Ca5 O14        C5 H8 O2
B2 O3              C8 H5 N O2
B2 Zr3             C15 H24 N2
Br H               C22 H24 F N3 O2
C Cl4              Ca O3 Ti
C H Cl3            Cl H
C H N O            H2 O4 S
C2 Ca              H4 Sn
C2 H4 Br Cl        O5 P14 Zn7
C2 H4 O3           Pb Rb2
C2 H5 Al Br2       Sn Zr4

(The sequence reads down the left column, then down the right column.)
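The four Hill rules above translate directly into a short routine. The sketch below is a simplified illustration rather than the procedure CAS itself uses: the helper names (`parse_formula`, `hill_order`, `hill_formula`, `hill_sort_key`) are hypothetical, the parser handles only flat formulas without parentheses, charges, or dot-disconnected components, and the index ordering is assumed to compare element symbol first and atom count second, position by position.

```python
import re
from collections import Counter


def parse_formula(text):
    """Rule 1: sum like atoms in a flat formula such as 'H2SO4' or 'C2 H5 Al Br2'."""
    counts = Counter()
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", text):
        counts[symbol] += int(number) if number else 1
    return counts


def hill_order(counts):
    """Rules 2-4: C first, then H (only if C is present), then the rest alphabetically."""
    symbols = sorted(counts)
    if "C" in counts:
        head = ["C"] + (["H"] if "H" in counts else [])
        symbols = head + [s for s in symbols if s not in ("C", "H")]
    return [(symbol, counts[symbol]) for symbol in symbols]


def hill_formula(text):
    """Return the formula rewritten in Hill order, without spaces."""
    return "".join(symbol + (str(n) if n > 1 else "")
                   for symbol, n in hill_order(parse_formula(text)))


def hill_sort_key(text):
    """Assumed index ordering: compare (symbol, count) pairs from left to right."""
    return hill_order(parse_formula(text))


if __name__ == "__main__":
    print(hill_formula("H2SO4"))  # prints H2O4S
    print(hill_formula("HBr"))    # prints BrH
    print(hill_formula("NaCl"))   # prints ClNa
    print(sorted(["H2SO4", "CCl4", "HBr", "CHCl3", "B2O3"], key=hill_sort_key))
```

Comparing counts as integers rather than as text is what places C8 before C15 in the sorted list; a plain string sort would put "C15" ahead of "C8".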

Note that in the Registry File (including the SciFinder approach), the formulas may be searched with or without spaces between the element symbols. The spaces are included here for clarity. The Hill System gives rise to some formulas that are quite different from those a chemist is used to seeing, e.g., H2O4S for sulfuric acid or BrH for hydrobromic acid.

The printed CA "Formula Indexes" do not have entries for the 600 or so qualified substances that have lots of information written about them. Thus, we find in the Chemical Abstracts "Formula Index" from the 10th Collective Index period (1977-81):

C8H5NO2
  1H-Indole-2,3-dione [91-56-5]. See Chemical Substance Index
    sodium salt [3486-31-5], 90: 6180p; 91: 157670v; 94: 209034z

This tells us that the printed CA "Chemical Substance Index" must be used for detailed information on isatin itself, but it gives direct information that three documents dealt with the sodium salt of isatin during the period. When a substance would have more than 20 entries in a 6-month volume index or more than 50 entries in the 5-year collective "Formula Indexes," a "See" reference is made to the name of the substance in the "Chemical Substance Index". We find in the "Formula Index" the abstract numbers for the sodium salt of isatin since there were relatively few documents written about that compound during the 10th Collective Index period.

More than one substance may share a given Hill System formula. For a given formula, isomers are arranged alphabetically by the CAS Index Name.

In the online molecular formula index of the Registry File (/MF), salts, addition compounds, and mixtures have the molecular formulas for the components arranged separately, with ratios for salts and addition compounds specified when known. If the ratios are unknown, a lower case "x" before the second formula or subsequent formulas is used, e.g.,

C15 H24 N2 . 2 Cl H
C22 H24 F N3 O2 .x H2 O4 S

These are examples of the so-called DOT-DISCONNECTED FORMULAS. (See: Tips for Molecular Formula Searching [11])
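A dot-disconnected formula can be taken apart mechanically into its components and their ratios, which is useful when comparing formulas copied from Registry File records. The sketch below assumes the written form used in this chapter: components separated by a period, and an optional multiplier (a whole number, a fraction such as 1/2, or a lower case "x") followed by a space before each later component. The function name `parse_dot_disconnected` is hypothetical, and an unknown ratio is represented as `None`.

```python
import re
from fractions import Fraction

# Optional leading multiplier: "x", a whole number, or a fraction like "1/2".
MULTIPLIER = re.compile(r"^(x|\d+(?:/\d+)?)\s+(.+)$")


def parse_dot_disconnected(formula):
    """Split a formula into (ratio, component) pairs; ratio is None when written as 'x'."""
    components = []
    for part in re.split(r"\s*\.\s*", formula.strip()):
        match = MULTIPLIER.match(part)
        if match is None:
            components.append((Fraction(1), part))            # no multiplier means 1
        elif match.group(1) == "x":
            components.append((None, match.group(2)))         # unknown ratio
        else:
            components.append((Fraction(match.group(1)), match.group(2)))
    return components


if __name__ == "__main__":
    for text in ("C15 H24 N2 . 2 Cl H",
                 "C22 H24 F N3 O2 .x H2 O4 S",
                 "C7 H6 O2 . 1/2 Cu"):
        print(parse_dot_disconnected(text))
```

Using `Fraction` keeps a multiplier such as 1/2 exact instead of turning it into a floating-point value.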


Molecular Formulas of Types of Compounds in CA/STN

A. Salts.

Simple salts such as sodium chloride are treated as any other Hill Formula: ClNa.

1. Metal Salts of Complex Organic or Organometallic Acids

In general these substances have the molecular formula of the acid followed by the dot-disconnect symbol (the period) and a multiplier times the symbol of the metal.

For metal salts of organic acids, the metal replaces one or more hydrogens attached to N, O, P, As, Se, or Te in an organic substance. The CAS structuring conventions treat these substances in the following manner:

• The organic portion is treated as a neutral molecule, including the acidic hydrogen atoms.
• The metal is viewed as a separate, unattached fragment.
• The ratio between the organic acid and the metal atom is expressed. (If unknown, the ratio is expressed as "x".)

The multiplier for the organic acid is always 1. For the metal, the multiplier gives the ratio of metal to acid and may be a fraction that reflects the metal's oxidation state, e.g., C7 H6 O2 . 1/2 Cu

Example: C6 H8 O7 . 3 Na
1,2,3-Propanetricarboxylic acid, 2-hydroxy-, trisodium salt
CAS RN: 68-04-2

A search of the SciFinder Scholar product for the molecular formula yielded ten answers at the time of the search, among them:

[Figure: SciFinder Molecular Formula Answer: Trisodium Citrate] (Reproduced with permission of CAS, a division of the American Chemical Society.)

Other examples:

• Unknown ratio: C6 H8 O7 . x Na
• Mixed metal salt: C6 H8 O7 . Ca . Na
• Metal salt of an alcohol: C6 H6 O2 . 1/2 Ba
• Metal salt of a radical ion: C10 H8 . Na

Exceptions:

• Metal salts of two or more different acids have the hydrogens removed, and bonds are formed from the heteroatoms of the acids to the metal.
• Metal salts of dithiocarbamates (and Se or Te analogs) are represented as N-C(=Q)-Q, where Q = S or Se.
• Likewise, metal salts of dithiophosphates are represented as R2P(=Q)-Q, where R = halide, halogenoid, or carbon-containing substituent.
• Salts of coordination compounds, e.g., C7 H4 Cu O3 and C18 H18 O8 Zn.

Organometallic compounds in the Registry File are substances which have a carbon atom directly bonded to a metal atom, e.g., Phenyl Lithium: C6 H5 Li. Note, however, that carbonium ions and carbanions are generally found as dot-disconnects in the Registry File.

Coordination compounds in the Registry File are substances in which an atom or group of atoms is bound to a central metal atom by a pair of electrons supplied by the coordinate group and not by the central metal atom, e.g., metallocenes. These substances have the Class Identifier code CCS in the Registry File records.

B. Polymers.

Polymers are indicated with the molecular formula of the repeating unit(s) in parentheses to which is appended an "x". The "x" indicates a repeating unit. For example, the molecular formula for the homopolymer of 1,3-butadiene is (C4H6)x. A search for a polymer by molecular formula may retrieve variant forms of the substance, because the syndiotactic, isotactic, graft or co-polymer will all have separate Registry Numbers.

Molecular Formulas in The Basic Index of the Registry File

The Registry File's Basic Index contains chemical name fragments and molecular formula fragments (including molecular formulas for individual components of multi-component substances and single component substances). Formula fragments searched in the Basic Index must be entered without spaces.

Element Information

In command-driven searching, it is possible to search for various information about the elements comprising a chemical substance, such as:

• Element Symbol, indicating the presence of an element (/ELS), e.g., => S B/ELS and H/ELS
• Element Count, to specify the number of unique elements in a component or substance (/ELC or /ELC.SUB)
• Element Formula, the molecular formula of components without the numbers that depict the ratios (/ELF), e.g., => S AL CO LA O/ELF
• Periodic Group, the column and row designations for elements, e.g., => S B6/PC or => S LNTH/PG
• Material Composition, when looking for alloys

There are many more options for such searching on the STN command-language system.
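To see what kind of values the Element Symbol and Element Count fields describe, it can help to derive them from a formula by hand or with a few lines of code. The sketch below is only an offline illustration of the two ideas, not a description of how STN builds its indexes; the function names `element_counts` and `element_summary` are hypothetical, and only flat formulas without parentheses are handled.

```python
import re


def element_counts(formula):
    """Sum the atoms of each element in a flat formula such as 'C8H5NO2'."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def element_summary(formula):
    """Return the distinct element symbols and how many distinct elements there are."""
    symbols = sorted(element_counts(formula))
    return {"symbols": symbols, "count": len(symbols)}


if __name__ == "__main__":
    # Isatin, C8H5NO2: four distinct elements.
    print(element_summary("C8H5NO2"))
```

For isatin the summary lists C, H, N, and O with a count of 4, which is the sort of value an Element Count search would be expected to target.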

Ring System Data and Ring Indexes

The Ring Identifier information (RID) lets you search a database for everything from the number of rings in a substance to the Ring Formula (minus hydrogens). The Registry File now has much information about rings that can be searched online, such as the Elemental Sequence for the Smallest Ring (/ESS), the number of rings in the ring system (/NRRS), etc. These search techniques can be valuable in refining a substance search in the Registry File. See the Registry File Database Summary Sheet [12] for more options.

The Ring Systems Handbook provides an easy way to find the Heading Parent name for ring compounds. This name can then be used in the printed CA "Chemical Substance Index" or, for an online search, either the name or the Registry Number can be used to retrieve the Registry File record. It is important to know that the compound found in the Ring Systems Handbook may not actually exist. That is, there may be no information in the CA File on the substance. When a new ring system is identified, the substituents are stripped off, and a new ring system entry placed in the RSH.

The access to the entries in the Ring Systems Handbook is by name or ring analysis (and then by molecular formula of the rings making up the compound, ignoring hydrogens). The main part of the set is arranged by the number of rings comprising the compounds and the individual sizes of the smallest set of smallest rings. Thus, the number of component rings, the sizes of those rings, and the elements comprising them are enough information to find a ring compound. A section in the main body of the work might be labeled:

2 RINGS: 5,6 C4N-C6

We would find in the section an entry for 1H-Indole [120-72-9]

[Structural drawing of 1H-Indole]

with the molecular formula C8H7N and a 2-dimensional structural drawing of the molecule.

It would not be too difficult then to assign the proper Chemical Abstracts Index Name for isatin: 1H-Indole-2,3-dione

Chemical Abstracts includes an "Index of Ring Systems" with each Formula Index, beginning with the 7th Collective Index period (1962-66).

Compound Class Identifiers

There are a number of other indexes that can be used in an online search of the Registry File, e.g., Compound Class Identifiers (/CI).

Class Name                          Code
Alloy                               AYS
Coordination Compound               CCS
Registered Concept                  CTS
Generic Registration                GRS
Incompletely Defined Substance      IDS
Manually Registered Substance       MAN
Mineral                             MNS
Mixture                             MXS
Polymer                             PMS
Radical Ion                         RIS
Ring Parent                         RPS

An example of the use of the CI field in command-level searching is:

=> SEARCH PMS/CI (retrieves polymers)

Such searches are of use in combination with other Registry File searches in order to narrow an answer set. See the Registry File Summary Sheet [12] for additional possibilities.

NLM's Online Chemical Dictionary Files, PubChem and ChemSpider

Databases such as the Registry File are referred to as ONLINE CHEMICAL DICTIONARY FILES. They exist to help you identify substances, to gather like substances into a set, and to discover which files on the database vendor's system have information on the substance(s).

In the past there was an online chemical dictionary file from the National Library of Medicine. Although not nearly as large as the Registry File, NLM's CHEMLINE file contained over 1,360,000 records as of mid-1995. Work ceased on the CHEMLINE file in 1998. NLM publishes Supplementary Concept Records (formerly, Supplementary Chemical Records). For many years this was an annual printed compilation that contained all of the compound names used in indexing records in the Medline system. See the record on this page [13] for a summary of the data fields included in the Supplementary Concept Records. Various Medical Subject Heading (MeSH) files are available for download [14].

A smaller NLM file is ChemIDplus [15], with nearly 380,000 compounds, over 263,000 of which have structure data. There is also a ChemIDplus Lite [16] version for those who just need to do name or Registry Number searching and do not want to use a plugin or applet. An important feature of the ChemIDplus file is the link to SuperList [17]. SuperList designates a collection of lists of chemical substances maintained by key federal and state government regulatory agencies, as well as by scientific organizations concerned with health and environmental hazards of chemical substances. ChemIDplus provides directory assistance to those lists. Searching the NLM files is considerably cheaper than searching the CAS Registry file.

Unlike CAS, the National Library of Medicine has attempted to group compounds with related substances in their index in a hierarchical fashion. From 1963 through 1995, a chemical was generally "treed" in two places: in one Tree showing its chemical structure and in a second Tree under its function, or pharmacological action. The arrangement of chemical headings in MeSH (Medical Subject Headings) has not changed, but NLM no longer puts all drugs under the functional trees.

The NIH's PubChem [18] is a free database covering over 27 million unique substances. PubChem has numerous search options, including the capability to search by InChI [19], the IUPAC International Chemical Identifier. PubChem includes substance information, compound structures, and BioActivity data in three primary databases, PubChem Substance, PubChem Compound, and PubChem BioAssay, respectively.

The RSC's ChemSpider [20] is also a free database containing around 25 million compounds from 400 data sources. The easiest way to search in ChemSpider is to use a common name or trade name. For example, benzyl azide is a versatile reaction intermediate. What information can be found about this compound in ChemSpider?

STEP 1 Go to www.chemspider.com. On the home page there is a search box; simply type the name of the compound of interest and click Search. Alternatively, select the Search tab from the top toolbar and choose Simple Search from the drop-down menu.

STEP 2 Look at the results. The default record view will give you the structure, SMILES, InChIKey, alternative names, and synonyms.

Scroll down the record view to see more information. The record view comprises a number of info boxes, which may include a number of different tabs indicating the different pieces of information that are available.

In the Associated data sources box, for example, those data sources that are commercial vendors from whom you can purchase the chemicals are indicated with a shopping cart. Other sources may include links to biological data, toxicology data, physical properties, spectral data, and safety data.

Scroll down the record to view all of the different sections of the page (if they aren't visible, click on the 'expand' icon in the section heading to expand them).

There will be info boxes for links to patent information from SureChem and literature links providing access to RSC journals, books, and databases. The Search Google Scholar link will enable you to expand a search into the wider scientific literature based on the approved names and synonyms in ChemSpider.

Records may also have a link to reactions in ChemSpider SyntheticPages. You can view the full article in CS|SP at http://cssp.chemspider.com

There is also a link to spectral data. This can be HNMR, CNMR, IR, or Mass Spectra. The spectra can be viewed in a Java applet and can also be downloaded.
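Name lookups in the free databases described in this section can also be scripted. The sketch below builds a request for PubChem's PUG REST service, which is documented by PubChem but not covered in this chapter, so the URL pattern, the property names `MolecularFormula` and `InChIKey`, and the layout of the JSON reply should all be treated as assumptions to check against the current PubChem documentation. The function names are hypothetical, and `fetch_properties` needs network access.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumed base address of PubChem's PUG REST service.
PUG_REST = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def pubchem_property_url(name, properties=("MolecularFormula", "InChIKey")):
    """Build a name-based property request; the name is percent-encoded for the URL."""
    return (f"{PUG_REST}/compound/name/{quote(name)}"
            f"/property/{','.join(properties)}/JSON")


def fetch_properties(name):
    """Send the request and return the list of property records (requires network)."""
    with urlopen(pubchem_property_url(name), timeout=30) as response:
        # Assumed reply layout: {"PropertyTable": {"Properties": [...]}}
        return json.load(response)["PropertyTable"]["Properties"]


if __name__ == "__main__":
    print(pubchem_property_url("benzyl azide"))
```

Building the URL separately from sending it makes it possible to inspect or log the exact request, and to paste it into a browser, before any query is run.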


Beilstein and Gmelin

The factual databases Beilstein and Gmelin are organized a little differently. Structure searching is the most appropriate way to find information in these sources on Reaxys [21]. Although both can be queried using chemical name and formula searching, for the inorganic compounds in Gmelin, formula searching is actually the most appropriate approach.

Searching by Name

In both Beilstein and Gmelin, there is a field "Chemical Name (CN)" containing the chemical names of the substances in the databases. Select the field from the data structure or use it in advanced mode like cn=*searchterm*. Truncation can be used left and right. It is advisable to use the list (expand) function to look for different spellings of the same name that might be found from different authors in different publications.

The field "Chemical Name Segment" contains the fragmented pieces of the field CN. Querying for "Indole" using this field retrieves a list of compounds containing the term "Indole" in their chemical name.

While these two fields contain the names and name fragments of registered substances, the field "All Chemical Names" also includes the names of solvents, derivatives, and other fields with chemical names, and thus allows a broader approach to searches using chemical names.

Searching by Formula

Using the molecular formula for searches in Beilstein can be very powerful, and there are a few options for such a search.

The field Molecular Formula (MF) contains the exact molecular formula for single- and multi-fragment compounds. It is calculated from the chemical structure in Hill order, with no charge or isotope information. For multi-fragment compounds like salts, the molecular formulas corresponding to the individual fragments are separated from each other by an asterisk and have normalized stoichiometric multipliers.

For the sodium salt of Isatin the Molecular Formula accordingly is C8H4NO2*Na

Positional isomers can be searched very effectively when the molecular formula is combined with a Lawson Number (LN).

The field Linear Structure Formula (LSF) adds the option to explicitly include charges or isotope labels with the exception of Deuterium and Tritium.

For the above mentioned Isatin salt this would be C8H4NO2(1-)*Na(1+)

The field "Search MF Range" allows searching for derivatives of a certain carbon skeleton or for ranges in the molecular formula. Thus queries like "C(2-4) H(4-8)" or "C8 H7 *" are allowed using this field. Note that it is not possible to use greater-than or less-than signs. If you want to require more than 3 oxygens to be present in the resulting structures, use "O(4-99)".

For Gmelin, searching by molecular formula is the method of choice, especially when it comes to inorganic compounds.
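The range notation of the "Search MF Range" field can be pictured as a set of per-element limits tested against a compound's atom counts. The sketch below mimics that idea offline and is not the database's own matching logic. It rests on two assumptions that the text above does not state: that an element without a range must match its count exactly, and that elements not named in the query are permitted only when the query ends with an asterisk. The function names are hypothetical.

```python
import re

# One query term: an element symbol followed by "(low-high)" or an optional exact count.
TERM = re.compile(r"([A-Z][a-z]?)(?:\((\d+)-(\d+)\)|(\d*))")


def parse_range_query(query):
    """Return ({element: (low, high)}, open_ended) for a query such as 'C(2-4) H(4-8)'."""
    ranges = {}
    for symbol, low, high, exact in TERM.findall(query):
        if low:
            ranges[symbol] = (int(low), int(high))
        else:
            count = int(exact) if exact else 1
            ranges[symbol] = (count, count)
    return ranges, query.strip().endswith("*")


def matches(counts, query):
    """Test a dict of atom counts against the query (assumed semantics, see lead-in)."""
    ranges, open_ended = parse_range_query(query)
    for symbol, (low, high) in ranges.items():
        if not low <= counts.get(symbol, 0) <= high:
            return False
    # Assumption: without a trailing "*", no other elements may be present.
    if not open_ended and any(symbol not in ranges for symbol in counts):
        return False
    return True


if __name__ == "__main__":
    indole = {"C": 8, "H": 7, "N": 1}
    print(matches(indole, "C8 H7 *"))
    print(matches({"C": 3, "H": 6}, "C(2-4) H(4-8)"))
    print(matches({"C": 2, "H": 6, "O": 1}, "C(2-4) H(4-8) O(4-99)"))
```

Under these assumptions, indole (C8H7N) satisfies "C8 H7 *" because the asterisk admits the nitrogen, while a compound with a single oxygen fails "O(4-99)".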

Summary

Chemical nomenclature is an area of expertise claimed by few chemists today, but there are powerful search capabilities in databases and printed reference works that make use of chemical names, both trivial and formal names. On the other hand, all chemists use molecular formulas, and a system such as the Hill System for arranging molecular formulas in an index provides a useful retrieval mechanism. Chemical Abstracts Service uses the Registry Number to index documents in the CA database. Many tags have been developed to use with the Registry Numbers for more effective searching in the CA databases. An increasingly popular search site is the PubChem database, and the Beilstein and Gmelin databases are useful complements to the others.

CIIM Chemical Nomenclature
CIIM Link for further study
SIRCh Link for Chemical Name and Formula Searches
Problem Set on this topic [22]

References

[1] http://www.indiana.edu/~cheminfo/C471/chemall.html
[2] http://www.cas.org/ASSETS/2F6CF61DE9D843F9B5AF0A6B199E57BE/casregnumbersweb.pdf
[3] http://php.indiana.edu/~davisc/Abstract.htm
[4] http://www.iupac.org/publications/books/seriestitles/nomenclature.html
[5] http://www.genome.jp/dbget-bin/get_htext?ECtable+-f+T+w+D
[6] http://www.indiana.edu/~cheminfo/C471/stnfiles.html
[7] http://www.reaxys.com/
[8] http://www.commonchemistry.org/
[9] http://www.cas.org/ASSETS/EB85B919049C4E448DCF8D391788F0DD/casroles.pdf
[10] http://www.indiana.edu/~cheminfo/C471/cnametip.html
[11] http://www.indiana.edu/~cheminfo/C471/molftip.html
[12] http://info.cas.org/ONLINE/DBSS/registryss.html
[13] http://www.nlm.nih.gov/mesh/ctype.html
[14] http://www.nlm.nih.gov/mesh/filelist.html
[15] http://chem.sis.nlm.nih.gov/chemidplus/
[16] http://chem.sis.nlm.nih.gov/chemidplus/chemidlite.jsp
[17] http://sis.nlm.nih.gov/chem/superlist.html
[18] http://pubchem.ncbi.nlm.nih.gov/
[19] http://en.wikipedia.org/wiki/International_Chemical_Identifier
[20] http://chemspider.com/
[21] http://www.reaxys.com
[22] http://www.indiana.edu/~cheminfo/C471/471ps3.html


Article Sources and Contributors

Chemical Information Sources/Chemical Name and Formula Searches  Source: http://en.wikibooks.org/w/index.php?oldid=2063862  Contributors: Adrignola, Avicennasis, Daviesje7, GaryDorman Wiggins

Image Sources, Licenses and Contributors

File:Isatine.svg  Source: http://en.wikibooks.org/w/index.php?title=File:Isatine.svg  License: Public Domain  Contributors: Dschanz

License

Creative Commons Attribution-Share Alike 3.0 Unported
//creativecommons.org/licenses/by-sa/3.0/