Two or More X Nodes at the Same Atom Generate Wrong Strategy
RIN Codes for Certain Non-Aromatic Versions of Polycyclic Carbocycles To Be Edited
Missing J-Codes for Certain Types of Fused Heterocycles Leading to Loss of Precision
Issue: For aromatic rings with hydroxy or thiol substituents (i.e., the hydrogen is present in the structure), mandatory codes are not generated (e.g., phenol lacks codes (H401, H441), thiophenol lacks code (H494)).
Workaround: For these types of structures, do not draw the hydrogen and do not use the respective -OH or -SH shortcut. In the case of the "naked" heteroatom (e.g., Ph-O, Ph-S), the correct strategy is generated.
Note: Saturated carbocycles are not affected (e.g., for cyclohexanol with explicitly present hydrogen, the correct strategy is generated).
Issue: For saturated heterocycles with -O and -S substituents (i.e., the hydrogen is not present in the structure), wrong codes are generated for -O substituted structures (codes J521, J522, J523 (1, 2, or >=3 Het-Oxo)) and -S substituted structures (code J592 (Het-thioxo)).
Workaround: Draw the hydrogen or use the respective -OH or -SH shortcut for these types of structures.
When searching for tautomeric structures, please make sure that your STNext structure query is in line with the coding rules.
For example, when keto-enol tautomerism is possible, the structure is coded in the keto form unless the -OH group of the enol form is bonded to a fully conjugated carbocyclic ring (e.g., benzene). For this reason, the search structures should be drawn as follows:
For detailed information, please refer to the “Tautomerism” section in Chapter 8: Functional Groups of the CPI Chemical Indexing User Guide.
Important: In some cases, the generated codes of the tautomeric structure are nevertheless 100% correct. We therefore highly recommend always checking the corresponding script again for correctness.
Issue: If two or more X nodes (generic node for halogen) are attached to the same atom (e.g., benzene-CCl2 or benzene-CF3), a wrong fragcode strategy is generated.
Workaround: Use R-groups containing F, Cl, Br, I.
The indexing of carbohydrates includes the required code L8 as well as code K0.
Issue: In the fragmentation code strategy for carbohydrates, the code K0 is included in the negation codes. As a consequence, relevant records are not found since K0 is usually indexed for carbohydrates.
Workaround: For carbohydrates, the code K0 needs to be manually deleted from the negation codes. In addition, codes K1-K9, L1-L7, and L9 should be added to the negation codes. This leads to more accurate results because it eliminates structures that contain other functional groups which are not part of the original structure.
Example: BETA-D-METHYLGALACTOSIDE (DCR-83195)
Indexed fragmentation codes
M2 *01* F012 F013 F014 F015 F016 F123 H4 H404 H423 H481 H5 H521 H8 K0 L8
L815 L821 L831 M210 M211 M272 M281 M311 M321 M342 M373 M391 M413
M431 M510 M521 M530 M540 M782 P220 P420 P943 Q261 R032 M905
M904
Note: To avoid the issue described for -OH and -SH substituents on saturated heterocycles, the hydrogens of the hydroxy groups should be explicitly present.
=>s (M210 OR M211)/M0,M2,M3,M4 \>_line1
=>s (M413(P)F123(P)H423(P)H481(P)H521)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M521(P)M510(P)M530(P)M540)/M0,M2,M3,M4 \>_line3
=>s _line3(P)((M272 OR M270)(P)M281(P)M311(P)M321(P)M342(P)(M373 OR M370)(P)M391)/M0,M2,M3,M4 \>_line4
=>s _line4(P)_line1 \>_line5
=>s _line5(P)(F012(P)F013(P)F014(P)F015(P)F016(P)H404)/M0,M2,M3,M4 \>_line6
=>s (_line2(P)M900/M0) OR (_line3(P)M901/M2,M3,M4) OR (_line5(P)M902/M2,M3,M4) OR _line6 \>_line7
=>s _line7(NOTP)(H1 OR H2 OR H3 OR H6 OR H7 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line8
Issue: The indexed code K0 is included in the negation codes and has to be deleted.
=>s (M210 OR M211)/M0,M2,M3,M4 \>_line1
=>s (M413(P)F123(P)H423(P)H481(P)H521)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M521(P)M510(P)M530(P)M540)/M0,M2,M3,M4 \>_line3
=>s _line3(P)((M272 OR M270)(P)M281(P)M311(P)M321(P)M342(P)(M373 OR M370)(P)M391)/M0,M2,M3,M4 \>_line4
=>s _line4(P)_line1 \>_line5
=>s _line5(P)(F012(P)F013(P)F014(P)F015(P)F016(P)H404)/M0,M2,M3,M4 \>_line6
=>s (_line2(P)M900/M0) OR (_line3(P)M901/M2,M3,M4) OR (_line5(P)M902/M2,M3,M4) OR _line6 \>_line7
=>s _line7(NOTP)(H1 OR H2 OR H3 OR H6 OR H7 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K1 OR K2 OR K3 OR K4 OR K5 OR K6 OR K7 OR K8 OR K9 OR L1 OR L2 OR L3 OR L4 OR L5 OR L6 OR L7 OR L9 OR M1)/M2,M3,M4 \>_line8
Solution: K0 is removed from the negation codes (mandatory). To enhance the accuracy of results, the remaining K and L codes (K1-K9, L1-L7, and L9) are added to the negation codes; the indexed codes K0 and L8 are not negated.
Issue: For the chemotype of steroids (e.g., cholesterol), a wrong fragcode strategy is generated. There is no workaround.
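The manual correction described above can also be applied to a saved strategy with a small helper. The following Python lines are only a minimal sketch: the function name fix_carbohydrate_negation and the assumption that the negation list appears as a single "(NOTP)(... OR ...)" group are illustrative and not part of STNext.

```python
import re

# Codes to add to the negation list for carbohydrates (K1-K9, L1-L7, L9),
# as recommended in the workaround. K0 and L8 are deliberately left out.
EXTRA_NEGATIONS = (
    [f"K{i}" for i in range(1, 10)]
    + [f"L{i}" for i in range(1, 8)]
    + ["L9"]
)


def fix_carbohydrate_negation(line: str) -> str:
    """Remove K0 from a (NOTP)(...) group and add K1-K9, L1-L7, L9.

    Hypothetical helper: assumes the negation codes are joined by " OR "
    inside one parenthesised group directly after (NOTP).
    """
    match = re.search(r"\(NOTP\)\(([^()]*)\)", line)
    if match is None:
        return line  # no negation group on this line; leave it unchanged

    codes = [code.strip() for code in match.group(1).split(" OR ")]
    codes = [code for code in codes if code != "K0"]
    for code in EXTRA_NEGATIONS:
        if code not in codes:
            codes.append(code)

    # Two-character codes sort naturally into the order H.., J.., K.., L.., M..
    new_group = "(NOTP)(" + " OR ".join(sorted(codes)) + ")"
    return line[: match.start()] + new_group + line[match.end():]


if __name__ == "__main__":
    negation = (
        "=>s _line7(NOTP)(H1 OR H2 OR H3 OR H6 OR H7 OR H9 OR J0 OR J1 OR "
        "J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4"
    )
    print(fix_carbohydrate_negation(negation))
```

The helper only rewrites the negation line. All other lines of the strategy are returned unchanged, and the result should still be checked by hand before it is run.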
The DCR indexing of the L9 code set is not consistent due to the complexity of complete recognition of these structural elements within a chemical structure. Therefore, fragmentation code strategies which include such codes may miss relevant records.
The L9 code set includes:
Best Practice: Delete those codes from the fragmentation code strategy.
There is RIN indexing for complete spiro systems (e.g., RIN 06706 for 9,9′-spirobifluorene). In certain cases, there is additional RIN indexing for the individual ring systems that are joined by the spiro link. Usually, ring index numbers apply to a ring system irrespective of the degree of unsaturation. However, for a small number of polycyclic carbocyclic ring systems, there is no RIN for the aromatic version of the system but a specific RIN for the non-aromatic version (the same ring system with all of the benzene rings wholly or partially hydrogenated, i.e., no intact aromatic ring system or quinoid variant thereof present in the system).
The following chemotypes are affected:
Fluorene: aromatic version G310 (no RIN); non-aromatic version G720, RIN 03126
Anthracene: aromatic version G331/G332 (no RIN); non-aromatic version G730, RIN 03618
Phenanthrene: aromatic version G341/G342 (no RIN); non-aromatic version G730, RIN 03619
Chrysene: aromatic version G410 (no RIN); non-aromatic version G800, RIN 05254
Naphthacene: aromatic version G420 (no RIN); non-aromatic version G800, RIN 05252
Dibenzo(a,d)cycloheptene: aromatic version G360 (no RIN); non-aromatic version G750, RIN 03708
Dibenzo(a,c)cycloheptene: aromatic version G380 (no RIN); non-aromatic version G750, RIN 03714
Issue: In STNext, the RIN codes for the non-aromatic versions may be included even for aromatic systems of the type described above. For instance, for 9,9′-spirobifluorene (CAS No. 159-66-0), a fully aromatic system, only RIN code 06706 relating to the spiro system should be applied. On STNext, however, RIN code 03126, which relates to non-aromatic systems and belongs with code G720, is additionally included. This leads to a reduced answer set, and consequently, relevant hits are missed.
Best Practice: Check your fragmentation code script to ensure that, for instance, RIN code 03126 is not applied if only G310 and not G720 is present (and similarly check for the other systems listed above).
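This check can be partly automated for a saved strategy. The Python lines below are a minimal sketch under stated assumptions: the mapping is copied from the list of affected chemotypes above, the function name suspicious_rins is hypothetical, and the sketch assumes that every 5-digit number in the script is a RIN and every G-code has the form G plus three digits.

```python
import re

# RIN -> non-aromatic G-code that justifies it, taken from the list of
# affected chemotypes above. Anything not in this mapping is ignored.
RIN_REQUIRES = {
    "03126": "G720",  # fluorene
    "03618": "G730",  # anthracene
    "03619": "G730",  # phenanthrene
    "05254": "G800",  # chrysene
    "05252": "G800",  # naphthacene
    "03708": "G750",  # dibenzo(a,d)cycloheptene
    "03714": "G750",  # dibenzo(a,c)cycloheptene
}


def suspicious_rins(script: str) -> list:
    """Return RINs from the mapping whose non-aromatic G-code is absent.

    Hypothetical helper: it only flags candidates for manual deletion and
    does not modify the script.
    """
    g_codes = set(re.findall(r"\bG\d{3}\b", script))
    rins = re.findall(r"\b\d{5}\b", script)
    return [
        rin
        for rin in rins
        if rin in RIN_REQUIRES and RIN_REQUIRES[rin] not in g_codes
    ]


if __name__ == "__main__":
    aromatic_only = (
        "=>s (M414(P)G041(P)G310(P)G399(P)M532)/M0,M2,M3,M4\n"
        "=>s _line3(P)(03126(P)06706)/RIN\n"
    )
    print(suspicious_rins(aromatic_only))
```

A flagged RIN is only a hint. Whether it has to be removed still depends on the structure that was actually drawn.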
Example: Comparison of fragmentation code strategies on STNext for STR1, STR2, and STR3:
This specific example relates to the following codes
Fluorene – G310: This code covers only the ring system fluorene and hydrogenated versions where at least one benzene ring retains its 3 double bonds (or a quinoid variant thereof). There is no asterisk as the code only describes one ring system.
Polyhydrofluorene – G720: Neither of the 6-membered rings is aromatic or quinoid. Note that G720 has an asterisk, indicating that a RIN is required, as it covers several possible ring systems.
Autogenerated Fragmentation Code Strategy for STR1: RIN 06706 is correct, RIN 03126 is wrong (to be deleted manually from the STNext fragmentation code script).
=>s (M414(P)G041(P)G310(P)G399(P)M532)/M0,M2,M3,M4 \>_line1
=>s _line1(P)(M610(P)M510(P)M520(P)M540)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M280(P)M320)/M0,M2,M3,M4 \>_line3
=>s _line3(P)(03126(P)06706)/RIN \>_line4
=>s _line4(P)(G031(P)G039)/M0,M2,M3,M4 \>_line5
=>s (_line1(P)M900/M0) OR (_line2(P)M901/M2,M3,M4) OR (_line4(P)M902/M2,M3,M4) OR _line5 \>_line6
=>s _line6(NOTP)(H1 OR H2 OR H3 OR H4 OR H5 OR H6 OR H7 OR H8 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line7
Autogenerated Fragmentation Code Strategy for STR2: RINs 03126 and 06706 are correct.
=>s (M414(P)G041(P)G052(P)G310(P)G720(P)M531)/M0,M2,M3,M4 \>_line1
=>s _line1(P)(M541(P)M610(P)M510(P)M520)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M280(P)M320)/M0,M2,M3,M4 \>_line3
=>s _line3(P)(03126(P)06706)/RIN \>_line4
=>s _line4(P)(G031(P)G039)/M0,M2,M3,M4 \>_line5
=>s (_line1(P)M900/M0) OR (_line2(P)M901/M2,M3,M4) OR (_line4(P)M902/M2,M3,M4) OR _line5 \>_line6
=>s _line6(NOTP)(H1 OR H2 OR H3 OR H4 OR H5 OR H6 OR H7 OR H8 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line7
Autogenerated Fragmentation Code Strategy for STR3: RINs 03126 and 06706 are correct.
=>s (M415(P)G052(P)G720(P)G799)/M0,M2,M3,M4 \>_line1
=>s _line1(P)(M542(P)M610(P)M510(P)M520(P)M530)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M280(P)M320)/M0,M2,M3,M4 \>_line3
=>s _line3(P)(03126(P)06706)/RIN \>_line4
=>s _line4(P)(G031(P)G039)/M0,M2,M3,M4 \>_line5
=>s (_line1(P)M900/M0) OR (_line2(P)M901/M2,M3,M4) OR (_line4(P)M902/M2,M3,M4) OR _line5 \>_line6
=>s _line6(NOTP)(H1 OR H2 OR H3 OR H4 OR H5 OR H6 OR H7 OR H8 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line7
Note: The STN Express strategy for STR2 and STR3 is incomplete as RIN code 03126 is missing. However, the omission of 03126 will probably have no effect on the retrieval (or at worst a marginal effect) since RIN 06706 relates to a spiro system linking 2 fluorene ring systems together. This means that when RIN 06706 is present along with G720, it implies that the ring system it applies to is a hydrogenated fluorene.
Rule: “If the atom bonded to the functional group is a non-angular C atom in a bridged ring system, and if the C atom could be seen as being a member of different sized rings, the smallest ring size is chosen, even if it is of lower priority than the larger ring.”
In the example below for STR1, the non-angular C atom that is substituted by a functional group (oxo) is part of two 6-membered rings (the bridged all-carbon ring and the nitrogen-containing ring). For STR2, the bridged all-carbon ring is 5-membered, and the nitrogen-containing heterocyclic ring is 6-membered. According to the rule, the system should generate J521 for STR1 and J561 for STR2.
Issue: For fused heterocycles of the chemotype of STR2, the respective J-code is missing in the fragmentation code strategy. For the example above, STNext fails to generate code J561 for STR2. The issue affects structures with a non-angular C atom substituted by oxo (J561 is missing) or thio (J596 is missing); the corresponding imino chemotype is not affected. Furthermore, it affects not only nitrogen heterocycles, but also heterocycles containing other heteroatoms. The consequence of this issue is loss of precision resulting in a larger answer set. For the example above, STR2 leads to 40 hits in WPIX with the automatically generated fragmentation code strategy (lacking J561), whereas the corrected strategy (including J561) leads to 7 hits.
Workaround: Add the respective J-codes for the affected chemotypes to enhance precision.