Compounds that cannot be described by a structural representation, and no-structures

Typical examples in this category are natural products (for example, plant extracts with unknown chemical structures) and biologics such as antibodies, which are too large or too poorly characterized structurally to be fully represented in a structure database. For such cases, the *-atom has been introduced as a “wildcard” atom. It has no chemical meaning unless Sgroup data are used to specify it. Let’s assume the antibody is called SAB1400491 and that a peptide sequence forms a cysteine bridge to the antibody. A structural representation may look like the following depiction:

For compounds with an unresolved structure (like many natural products), the *-atom is frequently used as a “placeholder” for the compound in the structure database, which keeps the structure table consistent with the other relational data tables. In the following example, it is assumed that the ID / primary key of the compound is “123456”. You may use the following drawing as the database structure:

Alternatively, you may use

because the structure is not known. If the structure database enforces a duplicate check, however, it can hold only one such “no structure”. The data model may compensate for that. Alternatively, you can keep the relational data table and the structure table consistent by using the *-atom plus an ID Sgroup data field, as in the example above, so that every “no structure” gets its own ID and is therefore unique in the structure set.
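The effect of the duplicate check can be illustrated with a minimal Python sketch. It is not tied to any particular cartridge or registration system: the `StructureRecord` and `StructureTable` classes are hypothetical, the structure is stored as the SMILES wildcard `*`, and the sketch assumes that the duplicate check compares the structure together with its ID Sgroup data field. The second ID, “123457”, is made up for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class StructureRecord:
    """One row of a hypothetical structure table."""
    smiles: str                      # "*" is the SMILES wildcard atom
    sgroup_id: Optional[str] = None  # value of an ID Sgroup data field, if any

    def duplicate_key(self) -> Tuple[str, Optional[str]]:
        # Assumption: the duplicate check compares the structure
        # together with the Sgroup data attached to it.
        return (self.smiles, self.sgroup_id)


class StructureTable:
    """Toy in-memory structure table with a duplicate check."""

    def __init__(self) -> None:
        self._rows: Dict[Tuple[str, Optional[str]], StructureRecord] = {}

    def register(self, record: StructureRecord) -> bool:
        """Store the record; return False if it is a duplicate."""
        key = record.duplicate_key()
        if key in self._rows:
            return False
        self._rows[key] = record
        return True

    def __len__(self) -> int:
        return len(self._rows)


table = StructureTable()

# Two bare "no structures" collide: only the first one is accepted.
first_bare = table.register(StructureRecord("*"))
second_bare = table.register(StructureRecord("*"))

# "*" plus an ID data field: each placeholder stays unique.
with_id_a = table.register(StructureRecord("*", sgroup_id="123456"))
with_id_b = table.register(StructureRecord("*", sgroup_id="123457"))  # hypothetical ID

print(first_bare, second_bare, with_id_a, with_id_b, len(table))
```

Under these assumptions the second bare `*` is rejected, while both ID-tagged placeholders are accepted. Whether a real structure database includes Sgroup data in its duplicate check depends on the system and its configuration, so this should be verified before relying on the convention.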

*-atoms are also found as the start or end groups of polymers:


This reflects the fact that the start and end groups of polymers are unknown in most cases.