Simplified molecular input line entry specification
From Wikipedia, the free encyclopedia
smiles | |
---|---|
File extension: | .smi |
Type of format: | chemical file format |
The simplified molecular input line entry specification or SMILES is a specification for unambiguously describing the structure of chemical molecules using short ASCII strings. SMILES strings can be imported by most molecule editors for conversion back into two-dimensional drawings or three-dimensional models of the molecules.
The original SMILES specification was developed by Arthur Weininger and David Weininger in the late 1980s. It has since been modified and extended by others, most notably by Daylight Chemical Information Systems Inc. Other 'linear' notations include the Wiswesser Line Notation (WLN), ROSDAL and SLN (Tripos Inc). More recently, IUPAC introduced InChI as a standard for formula representation. SMILES is generally considered slightly more human-readable than InChI; it also has a wide base of software support and extensive theoretical (e.g., graph theory) backing.
Canonical SMILES and Isomeric SMILES
The term Canonical SMILES refers to the version of the SMILES specification that includes rules for ensuring that each distinct chemical molecule has a single unique SMILES representation. A common application of Canonical SMILES is for indexing and ensuring uniqueness of molecules in a database.
The term Isomeric SMILES refers to the version of the SMILES specification that includes extensions to support the specification of isotopes, chirality, and configuration about double bonds. A notable feature of these rules is that they allow rigorous partial specification of chirality.
Graph-based definition
In terms of a graph-based computational procedure, SMILES is a string obtained by printing the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes. Parentheses are used to indicate points of branching on the tree.
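The procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the algorithm used by any particular toolkit: the function name write_smiles and the atom-list/bond-list input format are assumptions made for this example, and it handles only single bonds, organic-subset atoms, and fewer than ten ring closures.

```python
from collections import defaultdict


def write_smiles(atoms, bonds, start=0):
    """Write a SMILES string for a hydrogen-suppressed, single-bonded graph.

    atoms: list of element symbols, e.g. ["C", "F", "F", "F"]
    bonds: list of (i, j) atom-index pairs
    start: index of the atom where the traversal begins
    (Hypothetical helper for illustration only.)
    """
    neighbours = defaultdict(list)
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    children = defaultdict(list)     # spanning-tree edges found by the DFS
    ring_labels = defaultdict(list)  # atom index -> ring-closure digits
    visited = set()
    closed = set()                   # bonds that were broken to open a cycle

    def explore(atom, parent):
        visited.add(atom)
        for nbr in neighbours[atom]:
            if nbr == parent:
                continue
            if nbr in visited:
                # Bond back to an already visited atom: a broken cycle.
                edge = frozenset((atom, nbr))
                if edge not in closed:
                    closed.add(edge)
                    label = str(len(closed))
                    ring_labels[atom].append(label)
                    ring_labels[nbr].append(label)
            else:
                children[atom].append(nbr)
                explore(nbr, atom)

    def emit(atom):
        text = atoms[atom] + "".join(ring_labels[atom])
        kids = children[atom]
        for kid in kids[:-1]:
            text += "(" + emit(kid) + ")"  # every branch but the last is parenthesised
        if kids:
            text += emit(kids[-1])
        return text

    explore(start, None)
    return emit(start)


ring = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(write_smiles(list("CCCCCC"), ring))
print(write_smiles(["C", "F", "F", "F"], [(0, 1), (0, 2), (0, 3)]))
print(write_smiles(["C", "F", "F", "F"], [(0, 1), (0, 2), (0, 3)], start=1))
```

Starting the traversal from a different atom yields a different but equally valid string for the same molecule, which is why a separate canonicalisation step is needed when a unique string is required.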
Examples
Atoms are represented by the standard abbreviation of the chemical elements, in square brackets, such as [Au] for gold. The hydroxide anion is [OH-]. Brackets can be omitted for the "organic subset" of B, C, N, O, P, S, F, Cl, Br, and I. All other elements must be enclosed in brackets. If the brackets are omitted, the proper number of implicit hydrogen atoms is assumed; for instance the SMILES for water is simply O and that for ethanol is CCO.
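The implicit-hydrogen rule for unbracketed atoms can be illustrated with a small lookup. The valence table below is not given in this article; it is an assumption based on the commonly documented "normal valence" convention for the organic subset, and the function name implicit_hydrogens is hypothetical.

```python
# Assumed normal valences for the organic subset (not stated in the text above).
NORMAL_VALENCES = {
    "B": (3,), "C": (4,), "N": (3, 5), "O": (2,), "P": (3, 5),
    "S": (2, 4, 6), "F": (1,), "Cl": (1,), "Br": (1,), "I": (1,),
}


def implicit_hydrogens(symbol, bond_order_sum):
    """Hydrogens implied for an unbracketed atom with the given explicit bonds.

    The atom is filled up to the lowest normal valence that is at least
    as large as the sum of its explicit bond orders.
    """
    for valence in NORMAL_VALENCES[symbol]:
        if valence >= bond_order_sum:
            return valence - bond_order_sum
    return 0


# Water, written as O: no explicit bonds, so two hydrogens are implied.
print(implicit_hydrogens("O", 0))

# Ethanol, written as CCO: CH3, CH2 and OH.
print([implicit_hydrogens("C", 1), implicit_hydrogens("C", 2), implicit_hydrogens("O", 1)])
```

Bracketed atoms such as [OH-] are not covered by this rule; their hydrogen count and charge are written out explicitly.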
The double-bonded carbon dioxide is represented as O=C=O and the triple-bonded hydrogen cyanide as C#N.
Branches are described with parentheses, as in CCC(=O)O for propionic acid and C(F)(F)F for fluoroform, which could also be described by the non-canonical formula FC(F)F.
Cyclohexane is represented as C1CCCCC1: the two occurrences of the digit 1 mark the two atoms joined by the ring-closing bond, thus forming a ring with six carbons. Note that the label is the numeral (in this case the 1) rather than the combination 'C1'.
Aromatic C, O, S and N atoms are written in lower case as 'c', 'o', 's' and 'n' respectively. Bonds in an aromatic cycle are rarely marked explicitly except in SMARTS search patterns. Thus benzene is c1ccccc1.
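The examples in this section can be read back into a graph with a short parser. The following is a sketch under stated assumptions rather than a complete SMILES reader: the name parse_smiles is hypothetical, and the code supports only unbracketed atoms, the bond symbols = and #, parentheses, and single-digit ring labels. Aromatic atoms are kept as lower-case symbols and aromatic bonds are not perceived.

```python
def parse_smiles(smiles):
    """Return (atoms, bonds) for a small SMILES subset.

    atoms: list of symbols in the order they are written
    bonds: list of (i, j, order) tuples
    """
    atoms, bonds = [], []
    previous = None     # index of the atom the next atom will bond to
    branch_stack = []   # atoms to return to when ")" closes a branch
    open_rings = {}     # ring digit -> (atom index, bond order)
    order = 1           # order of the pending bond
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch == "(":
            branch_stack.append(previous)
        elif ch == ")":
            previous = branch_stack.pop()
        elif ch in "=#":
            order = 2 if ch == "=" else 3
        elif ch.isdigit():
            if ch in open_rings:
                # Second occurrence of the digit: close the ring.
                partner, ring_order = open_rings.pop(ch)
                bonds.append((partner, previous, max(order, ring_order)))
            else:
                open_rings[ch] = (previous, order)
            order = 1
        elif ch.isalpha():
            two = smiles[i:i + 2]
            symbol = two if two in ("Cl", "Br") else ch
            i += len(symbol) - 1
            atoms.append(symbol)
            if previous is not None:
                bonds.append((previous, len(atoms) - 1, order))
            previous = len(atoms) - 1
            order = 1
        else:
            raise ValueError(f"unsupported character {ch!r}")
        i += 1
    return atoms, bonds


print(parse_smiles("CCC(=O)O"))   # propionic acid
print(parse_smiles("C1CCCCC1"))   # cyclohexane
print(parse_smiles("C#N"))        # hydrogen cyanide
```

Parsing C(F)(F)F and FC(F)F gives different atom orderings but the same connectivity, which is the sense in which both strings describe fluoroform.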
Isomeric SMILES
Configuration around double bonds is specified using the characters "/" and "\". For example, F/C=C/F is one representation of trans-difluoroethene, in which the Fs are on opposite sides of the double bond, whereas F/C=C\F is one possible representation of cis-difluoroethene, in which the Fs are on the same side of the double bond, as shown in the figure.
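For the simple case above, where one substituent is written before the double bond and one after it, the rule reduces to comparing the two direction marks: equal marks mean opposite sides, different marks mean the same side. The sketch below assumes exactly that string shape (X/C=C/Y with plain element symbols); the function name double_bond_geometry is hypothetical, and general SMILES strings need a full parser instead.

```python
import re

# Matches strings such as F/C=C/F or Cl\C=C/Br and nothing more complex.
PATTERN = re.compile(r"^([A-Za-z]+)([/\\])C=C([/\\])([A-Za-z]+)$")


def double_bond_geometry(smiles):
    """Classify a two-substituent C=C string as 'cis' or 'trans'."""
    match = PATTERN.match(smiles)
    if match is None:
        raise ValueError("only strings of the form X/C=C/Y are handled")
    _, first_mark, second_mark, _ = match.groups()
    # Same direction mark on both sides: substituents on opposite sides.
    return "trans" if first_mark == second_mark else "cis"


print(double_bond_geometry("F/C=C/F"))
print(double_bond_geometry("F/C=C\\F"))
```

This also shows why each isomer has more than one valid spelling: F\C=C\F describes the same trans arrangement as F/C=C/F.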
Extensions
SMARTS is a modification of SMILES that allows, in addition to the SMILES elements, the specification of wildcard atoms and bonds. This is used in specifying search structures and is widely used in chemical database search applications. This practice has led to a common misconception that chemical substructure search is achieved computationally by matching SMILES/SMARTS strings, when, in fact, it is achieved by the computationally more intensive search for subgraph isomorphism in the graphs reconstructed from the SMILES representations.
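To make the distinction concrete, the following sketch performs a substructure search as a backtracking subgraph match on graphs rather than on strings. It is a naive illustration, not the method of any specific search system: the function name has_substructure and the list-based graph format are assumptions, bond orders are ignored, and "*" stands for a wildcard atom.

```python
def has_substructure(target_atoms, target_bonds, query_atoms, query_bonds):
    """Return True if the query graph can be embedded in the target graph."""
    target_edges = {frozenset(bond) for bond in target_bonds}
    query_adj = {i: set() for i in range(len(query_atoms))}
    for i, j in query_bonds:
        query_adj[i].add(j)
        query_adj[j].add(i)

    def compatible(q, t, mapping):
        # Atom labels must agree unless the query atom is a wildcard.
        if query_atoms[q] not in ("*", target_atoms[t]):
            return False
        # Every already mapped query neighbour must be bonded to t in the target.
        return all(
            frozenset((mapping[n], t)) in target_edges
            for n in query_adj[q]
            if n in mapping
        )

    def extend(mapping):
        if len(mapping) == len(query_atoms):
            return True
        q = len(mapping)  # query atoms are mapped in index order
        for t in range(len(target_atoms)):
            if t not in mapping.values() and compatible(q, t, mapping):
                mapping[q] = t
                if extend(mapping):
                    return True
                del mapping[q]  # backtrack
        return False

    return extend({})


# Ethanol, CCO, as a hydrogen-suppressed graph.
ethanol_atoms = ["C", "C", "O"]
ethanol_bonds = [(0, 1), (1, 2)]

print(has_substructure(ethanol_atoms, ethanol_bonds, ["C", "O"], [(0, 1)]))
print(has_substructure(ethanol_atoms, ethanol_bonds, ["O", "O"], [(0, 1)]))
print(has_substructure(ethanol_atoms, ethanol_bonds, ["*", "O"], [(0, 1)]))
```

Note that the query C-O is found even though the string "CO" is written differently from the way it appears inside "CCO" when the molecule is spelled OCC; the match depends on the graph, not on the spelling.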
Conversion
SMILES can be converted back to 2-dimensional representations using Structure Diagram Generation algorithms (Helson, 1999). This conversion is not always unambiguous. Conversion to 3-dimensional representation is achieved by energy minimization approaches.
See also
- SYBYL Line Notation (another line notation)
- Molecular Query Language (a query language that also supports numerical properties, e.g. physicochemical values or distances)
- Chemistry Development Kit (2D layout and conversion)
- International Chemical Identifier (InChI), the free and open alternative to SMILES by the IUPAC.
- OpenBabel, JOELib, OELib (conversion)
References
- Anderson, E., Veith, G. D., and Weininger, D. (1987). SMILES: A line notation and computerized interpreter for chemical structures. Report No. EPA/600/M-87/021. U.S. EPA, Environmental Research Laboratory-Duluth, Duluth, MN 55804.
- Weininger, D. (1988). SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J. Chem. Inf. Comput. Sci. 28, 31-36.
- Helson, H. E. (1999). Structure Diagram Generation. In Rev. Comput. Chem., edited by Lipkowitz, K. B. and Boyd, D. B. Wiley-VCH, New York, pages 313-398.
External links
Specifications
- "SMILES - A Simplified Chemical Language"
- "SMARTS - SMILES Extension"
- Daylight SMILES tutorial
- Parsing SMILES
SMILES related software utilities
- Daylight Depict
- CACTVS at NCI
- PubChem online molecule editor
- JME molecule editor
- ACD/ChemSketch
- CSMILES aware Java based molecule editor and 2D/3D viewer
- Smormo-Ed: a molecule editor for Linux that can read and write SMILES
- InChI.info: an unofficial InChI website featuring on-line converter from InChI and SMILES to molecular drawings