Polyethylene glycol (PEG) is a polymer with a characteristic structure repeating unit (SRU) that can be represented as follows:
The square brackets delimit the structural repeating part [-O-CH2-CH2-], and the index n marks the bracketed fragment as an SRU.
The molecular formula is calculated as H2O(C2H4O)n, whereas the molecular weight is undefined because no information is given about how many repeating units are present in our case.
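To make the relationship between n, the formula and the molecular weight concrete, here is a minimal sketch that expands H2O(C2H4O)n into element counts and sums conventional IUPAC atomic weights. The function names are hypothetical helpers, not part of any drawing or registration tool; the point is simply that the weight can only be computed once n is known.

```python
from typing import Dict, Optional

# Conventional IUPAC atomic weights in g/mol.
ATOMIC_WEIGHT = {"C": 12.011, "H": 1.008, "O": 15.999}


def peg_formula(n: int) -> Dict[str, int]:
    """Element counts of H2O(C2H4O)n, i.e. HO-[CH2-CH2-O]n-H."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return {"C": 2 * n, "H": 4 * n + 2, "O": n + 1}


def peg_molecular_weight(n: Optional[int]) -> Optional[float]:
    """Molecular weight for a known n; None when n is unknown (bare SRU)."""
    if n is None:
        return None  # the SRU alone does not fix the chain length
    counts = peg_formula(n)
    return sum(ATOMIC_WEIGHT[element] * k for element, k in counts.items())


print(peg_formula(1))              # n = 1 is ethylene glycol, C2H6O2
print(peg_molecular_weight(None))  # undefined without n
print(round(peg_molecular_weight(100), 1))
```

With n = 100 the sketch gives roughly 4423 g/mol, which is in the range of the PEG examples discussed next.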
Polymers frequently do not have a fixed molecular weight (as in the PEG example under multiple groups) but rather a statistical distribution of chain lengths, which can be described by an average molecular weight or by a range with lower and upper molecular weight limits. By introducing the Sgroup data fields Average_MW (average molecular weight), Lower_MW (lower MW limit) and Upper_MW (upper MW limit), you can use the following representations:
In the left example, the Sgroup data field Average_MW is attached to the brackets with a value of 4400. In the right example, the Sgroup data field Lower_MW holds 3900 (and has been moved manually to the left bracket), while the value 4900 is stored in the Sgroup data field Upper_MW, positioned beneath the right bracket.
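As a rough cross-check of what these field values mean chemically, the sketch below stores the two representations as plain dictionaries and inverts MW = 18.015 + 44.053 · n to estimate the number of repeating units. The dictionary layout is an illustrative assumption and is not the molfile encoding of Sgroup data; only the field names Average_MW, Lower_MW and Upper_MW come from the text.

```python
# Mass of one -O-CH2-CH2- unit (C2H4O) and of the H2O end groups, in g/mol.
UNIT_MASS = 2 * 12.011 + 4 * 1.008 + 15.999   # about 44.053
END_GROUP_MASS = 2 * 1.008 + 15.999           # about 18.015

# Hypothetical in-memory stand-ins for the left and right drawings.
average_form = {"sgroup_type": "SRU", "data": {"Average_MW": 4400.0}}
range_form = {"sgroup_type": "SRU", "data": {"Lower_MW": 3900.0, "Upper_MW": 4900.0}}


def estimated_units(mw: float) -> float:
    """Invert MW = END_GROUP_MASS + UNIT_MASS * n for the repeat count n."""
    return (mw - END_GROUP_MASS) / UNIT_MASS


for label, record in (("average", average_form), ("range", range_form)):
    for name, mw in record["data"].items():
        print(label, name, mw, "-> n ~", round(estimated_units(mw)))
```

Under these assumptions the average form corresponds to about 99 units and the range form to roughly 88 to 111 units.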
If the parameter for Sgroup data is enabled for exact searches in BIOVIA (formerly Accelrys) Direct, only those structures are returned that match exactly in both the chemical structure and the molecular weight values. If the Sgroup search key is omitted, an exact search with PEG drawn as an SRU returns all entries containing PEG drawn as an SRU, regardless of their molecular weight values.
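The difference between the two exact-search modes can be sketched as a filter over database entries. This is a simplified model, not BIOVIA Direct's implementation: the structure is reduced to a hypothetical string key ("PEG-SRU"), the registry IDs are invented, and entry PEG-C is an additional hypothetical record added only for contrast.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    reg_id: str                     # hypothetical registry ID
    structure: str                  # hypothetical key standing in for the drawn structure
    sgroup_data: Dict[str, float] = field(default_factory=dict)


def exact_search(db: List[Entry], query: Entry, use_sgroup_data: bool) -> List[str]:
    hits = []
    for entry in db:
        if entry.structure != query.structure:
            continue  # the chemical structure must always match exactly
        if use_sgroup_data and entry.sgroup_data != query.sgroup_data:
            continue  # with the Sgroup parameter on, the MW fields must match as well
        hits.append(entry.reg_id)
    return hits


db = [
    Entry("PEG-A", "PEG-SRU", {"Average_MW": 4400.0}),
    Entry("PEG-B", "PEG-SRU", {"Lower_MW": 3900.0, "Upper_MW": 4900.0}),
    Entry("PEG-C", "PEG-SRU", {"Average_MW": 2000.0}),  # hypothetical extra entry
]
query = Entry("query", "PEG-SRU", {"Average_MW": 4400.0})
print(exact_search(db, query, use_sgroup_data=True))   # structure and MW values
print(exact_search(db, query, use_sgroup_data=False))  # structure only
```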
For substructure searches (SSS) using the PEG SRU without additional values as the query, you get all PEG SRUs in the database. If you add the molecular weight as an additional search condition, however, the hit list shrinks considerably, as the following search examples demonstrate:
The first query finds all PEG structures in SRU format. The second restricts the hit list to database entries with an average molecular weight between 4200 and 4400 Da. The third query cannot find our original entry because it asks for average molecular weights above 5000, while our example entry “only” has an average molecular weight of 4400.
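The three queries can be mimicked by combining a structure hit with optional conditions on the Average_MW field. In this sketch the substructure match is replaced by simple equality of a hypothetical structure key, the rows are invented (PEG-C in particular is hypothetical), and an entry that lacks a queried field is assumed to fail that condition.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Condition = Tuple[str, Callable[[float], bool]]

# Hypothetical database rows; "PEG-SRU" stands in for a real substructure match.
DB = [
    {"id": "PEG-A", "structure": "PEG-SRU", "data": {"Average_MW": 4400.0}},
    {"id": "PEG-B", "structure": "PEG-SRU", "data": {"Lower_MW": 3900.0, "Upper_MW": 4900.0}},
    {"id": "PEG-C", "structure": "PEG-SRU", "data": {"Average_MW": 2000.0}},
]


def sss(structure: str, conditions: Sequence[Condition] = ()) -> List[str]:
    """Structure hit AND every data condition; a missing field fails its condition."""
    hits = []
    for row in DB:
        if row["structure"] != structure:
            continue
        data: Dict[str, float] = row["data"]
        if all(name in data and test(data[name]) for name, test in conditions):
            hits.append(row["id"])
    return hits


print(sss("PEG-SRU"))                                                 # query 1
print(sss("PEG-SRU", [("Average_MW", lambda v: 4200 <= v <= 4400)]))  # query 2
print(sss("PEG-SRU", [("Average_MW", lambda v: v > 5000)]))           # query 3
```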
In the second case (PEG with a molecular weight range), the lower and upper molecular weight fields are searched independently of each other, together with the structure. You can therefore search with the PEG SRU combined with one or both fields as query conditions, as in the following example:
In the first search example, the queried lower and upper molecular weights fit the values of our PEG example. The second query operates only on the lower molecular weight field, so the value 3900 of our example structure is matched while Upper_MW is not searched at all. The last example cannot find our PEG because it requires an upper molecular weight greater than 5200, whereas our example has an upper limit of 4900.
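A small sketch shows why the two range fields can be queried separately: each (field, operator, value) condition is evaluated only against its own field. The record layout and the operator table are illustrative assumptions, and the thresholds 3800 and 5000 in the first query are hypothetical, since the original query values are not reproduced here.

```python
import operator
from typing import List, Tuple

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le, "=": operator.eq}

# Hypothetical record for the range representation of PEG.
PEG_RANGE = {"id": "PEG-B", "structure": "PEG-SRU",
             "data": {"Lower_MW": 3900.0, "Upper_MW": 4900.0}}


def matches(record: dict, structure: str, conditions: List[Tuple[str, str, float]]) -> bool:
    if record["structure"] != structure:
        return False
    data = record["data"]
    # Each condition looks only at its own field, so Lower_MW and Upper_MW
    # are evaluated independently; a field that is not queried is ignored.
    return all(name in data and OPS[op](data[name], value) for name, op, value in conditions)


q1 = [("Lower_MW", ">=", 3800.0), ("Upper_MW", "<=", 5000.0)]  # hypothetical thresholds
q2 = [("Lower_MW", "=", 3900.0)]                               # Upper_MW not searched
q3 = [("Upper_MW", ">", 5200.0)]
for q in (q1, q2, q3):
    print(matches(PEG_RANGE, "PEG-SRU", q))
```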
Although our two example representations of PEG (multigroup on the left, structure repeating unit with an average molecular weight of 4400 on the right) essentially describe the same polymer, they do not find each other, because they use different Sgroup types. Their maximum common substructure is ethylene glycol.
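One way to picture this behavior is to treat the Sgroup type as part of the match key for a bracketed fragment. The sketch below assumes exactly that rule; it is a simplification for illustration, not a description of BIOVIA Direct's internal matching. "MUL" and "SRU" are the CTfile Sgroup type codes for multiple groups and structure repeating units, while the fragment string is a hypothetical key.

```python
from typing import NamedTuple


class BracketKey(NamedTuple):
    sgroup_type: str  # CTfile Sgroup type code: "MUL" (multiple group) or "SRU"
    fragment: str     # hypothetical text key for the bracketed atoms


def brackets_match(query: BracketKey, target: BracketKey) -> bool:
    # Assumed rule: identical atoms are not enough, the Sgroup type must agree too.
    return query.sgroup_type == target.sgroup_type and query.fragment == target.fragment


multigroup_peg = BracketKey("MUL", "CH2-CH2-O")
sru_peg = BracketKey("SRU", "CH2-CH2-O")
print(brackets_match(multigroup_peg, sru_peg))       # different Sgroup types
print(multigroup_peg.fragment == sru_peg.fragment)   # same repeating atoms
```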
Besides structure repeating units, the molfile format supports additional Sgroup types for all kinds of polymers, such as block (blk), alternating (alt) or random (ran) copolymers. In the following example, 80% and 20% reflect the relative composition of a copolymer of polyethylene and polystyrene with an “either unknown” repeating unit connectivity (eu = either unknown, ht = head to tail, hh = head to head):
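To illustrate what the 80%/20% annotation implies, the sketch below collects the labels quoted above and computes a composition-weighted mean repeat-unit mass for the polyethylene/polystyrene copolymer. The key assumption, flagged again in the code, is that the percentages are mole (unit-count) percentages; if they are weight percentages this calculation does not apply. The repeat-unit masses are derived from conventional IUPAC atomic weights.

```python
from typing import Dict

# Labels quoted in the text; the dictionary layout is illustrative only.
COPOLYMER_LABELS = {"blk": "block", "alt": "alternating", "ran": "random"}
CONNECTIVITY = {"eu": "either unknown", "ht": "head to tail", "hh": "head to head"}

# Repeat-unit masses in g/mol from conventional IUPAC atomic weights.
UNIT_MASS = {
    "polyethylene": 2 * 12.011 + 4 * 1.008,  # -CH2-CH2-, C2H4 (about 28.054)
    "polystyrene": 8 * 12.011 + 8 * 1.008,   # -CH2-CH(C6H5)-, C8H8 (about 104.152)
}


def mean_unit_mass(composition: Dict[str, float]) -> float:
    """Composition-weighted mean repeat-unit mass.

    ASSUMPTION: the shares are mole (unit-count) percentages, not weight percentages.
    """
    total = sum(composition.values())
    return sum(UNIT_MASS[name] * share for name, share in composition.items()) / total


pe_ps = {"polyethylene": 80.0, "polystyrene": 20.0}
print(COPOLYMER_LABELS["ran"], CONNECTIVITY["eu"], round(mean_unit_mass(pe_ps), 3))
```

Under the mole-percent assumption, an average repeat unit of this copolymer weighs about 43.27 g/mol.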