About 25 years ago the former MDL introduced substance groups (Sgroups) into the world of chemical representation, mainly to address the needs of industrial chemistry, where chemical structures are not as well defined as in the small-molecule world of the pharmaceutical industry.

The main Sgroup components are:

  • pairs of square brackets [ ] to mark an entire molecule or any suitable collection of atoms and bonds within a molecule, 
  • Sgroup data as text strings or numbers that can be linked with any collection of atoms, bonds, brackets, fragments or the entire molecule 
  • the “wildcard” * atom, which matches any other atom in a search (hence “wildcard”) but has no meaning of its own unless Sgroup data supply additional information.

With these three elements MDL began to handle a variety of structural entities, including

  • Abbreviations
  • Multiple Groups
  • Polymers
  • Chemical representations for mixtures and formulations
  • Chemical structures that cannot be described by a full structure representation
  • Special cases in stereochemistry
  • Statistically distributed structural elements

Searches on Sgroup data use standard Oracle string and number operators such as “=”, “<=”, “>”, “like” and others, with “%” as the wildcard.
In substructure searches (SSS), Sgroup data act as an additional query element. A query structure without Sgroup data finds the structure in compounds both with and without Sgroup data, whereas an SSS query with one or more Sgroup data elements returns only those structures that fulfill both the structural query conditions and the text or number conditions on the Sgroup data.
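
The search behaviour described above can be illustrated with a minimal Python sketch. It is not the MDL or Oracle implementation: the structure match is replaced by a simple set-containment test, and the names `Compound`, `search`, `like`, `PURITY` and `GRADE` are hypothetical and chosen only for illustration. The sketch shows only the combination rule: a query without Sgroup data conditions ignores Sgroup data, while every condition that is present must hold in addition to the structural match.

```python
import operator
import re
from dataclasses import dataclass, field

# Comparison operators named in the text; "like" is handled separately.
_OPS = {"=": operator.eq, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, ">=": operator.ge}


def like(value: str, pattern: str) -> bool:
    """Emulate a LIKE match where '%' stands for any run of characters.

    Only the '%' wildcard mentioned in the text is handled here.
    """
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


@dataclass
class Compound:
    name: str
    fragments: set  # crude stand-in for the real chemical structure
    sgroup_data: dict = field(default_factory=dict)  # hypothetical field -> value


def data_condition_holds(value, op, operand) -> bool:
    """Evaluate one text or number condition on an Sgroup data value."""
    if op == "like":
        return like(str(value), str(operand))
    return _OPS[op](value, operand)


def search(compounds, query_fragments, data_conditions=()):
    """Return names of compounds matching the structure AND all data conditions."""
    hits = []
    for c in compounds:
        # Stand-in for the substructure match: all query fragments are present.
        if not query_fragments <= c.fragments:
            continue
        # With no data conditions, all() is True and Sgroup data are ignored.
        if all(f in c.sgroup_data and data_condition_holds(c.sgroup_data[f], op, v)
               for f, op, v in data_conditions):
            hits.append(c.name)
    return hits


# Hypothetical example records.
db = [
    Compound("A", {"benzene", "ester"}),
    Compound("B", {"benzene", "ester"}, {"PURITY": 98.5, "GRADE": "technical grade"}),
    Compound("C", {"benzene"}, {"PURITY": 90.0}),
]

print(search(db, {"benzene"}))                              # structure only
print(search(db, {"benzene"}, [("PURITY", ">", 95)]))       # structure + number
print(search(db, {"benzene"}, [("GRADE", "like", "tech%")]))  # structure + text
```

In this sketch the structure-only query returns compounds with and without Sgroup data, whereas each query carrying a data condition returns only the compound that satisfies both parts.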

To keep the results of your structure searches and of the structure duplicate check consistent, you must ensure that the collection of atoms, bonds etc. is defined consistently throughout your database. An example with more detail can be found in “Special cases in stereochemistry”.