Critical Aspects for Your Search: Limitations and Best Practices

-OH and -SH Substituents on Aromatic Rings

Issue: For aromatic rings with hydroxy or thiol substituents (i.e., the hydrogen is present in the structure), mandatory codes are not generated (e.g., codes H401 and H441 are missing for phenol, and code H494 is missing for thiophenol).

Workaround: For these structures, do not draw the hydrogen and do not use the -OH or -SH shortcut. If the heteroatom is drawn "naked" (e.g., Ph-O, Ph-S), the correct strategy is generated.

Note: Saturated carbocycles are not affected (e.g., for cyclohexanol with explicitly present hydrogen, the correct strategy is generated).

-OH and -SH Substituents on Saturated Heterocycles

Issue: For saturated heterocycles with -O and -S substituents (i.e., the hydrogen is not present in the structure), wrong codes are generated: -O-substituted structures receive the Het-oxo codes J521, J522, or J523 (for 1, 2, or ≥3 Het-oxo groups), and -S-substituted structures receive the Het-thioxo code J592.

Workaround: Draw the hydrogen or use the respective -OH or -SH shortcut for these structures.

Tautomerism: Keto-Enol and Imine/Enamine Tautomers

When searching for tautomeric structures, make sure that your STNext structure query is in line with the coding rules.

For example, where keto-enol tautomerism is possible, the structure is coded in the keto form unless the -OH group of the enol form is bonded to a fully conjugated carbocyclic ring (e.g., benzene). For this reason, the search structures should be drawn as follows: in the keto form, unless the enol -OH is bonded to a fully conjugated carbocyclic ring, in which case the enol form is drawn.

For detailed information, please refer to the “Tautomerism” section in Chapter 8: Functional Groups of the CPI Chemical Indexing User Guide.

Important: Even so, the codes generated for a tautomeric structure are not 100% correct in some cases. We therefore strongly recommend that you always check the generated script for correctness.

Two or More X Nodes at the Same Atom Generate Wrong Strategy

Issue: If two or more X nodes (the generic node for halogen) are attached to the same atom (e.g., benzene-CCl2 or benzene-CF3), a wrong fragmentation code strategy is generated.

Workaround: Instead of X nodes, use R-groups defined as F, Cl, Br, I.

Carbohydrates Require Negation Code Revision

The indexing of carbohydrates includes the required code L8 as well as code K0.

Issue: The fragmentation code strategy generated for carbohydrates includes code K0 among the negation codes. Because K0 is usually indexed for carbohydrates, relevant records are not found.

Workaround: For carbohydrates, delete K0 manually from the negation codes. In addition, add codes K1-K9, L1-L7, and L9 to the negation codes; this gives more accurate results by eliminating structures that carry additional functional groups not present in the query structure.

Example: BETA-D-METHYLGALACTOSIDE (DCR-83195)

Indexed fragmentation codes

     M2 *01*   F012 F013 F014 F015 F016 F123 H4 H404 H423 H481 H5 H521 H8 K0 L8
               L815 L821 L831 M210 M211 M272 M281 M311 M321 M342 M373 M391 M413
               M431 M510 M521 M530 M540 M782 P220 P420 P943 Q261 R032  M905
               M904

Query Structure in STNext

Note: To avoid the problem described under “-OH and -SH Substituents on Saturated Heterocycles”, draw the hydrogens of the hydroxy groups explicitly.

Autogenerated Fragmentation Code Strategy from the Structure Editor

=>s (M210 OR M211)/M0,M2,M3,M4 \>_line1
=>s (M413(P)F123(P)H423(P)H481(P)H521)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M521(P)M510(P)M530(P)M540)/M0,M2,M3,M4 \>_line3
=>s _line3(P)((M272 OR M270)(P)M281(P)M311(P)M321(P)M342(P)(M373 OR M370)(P)M391)/M0,M2,M3,M4 \>_line4
=>s _line4(P)_line1 \>_line5
=>s _line5(P)(F012(P)F013(P)F014(P)F015(P)F016(P)H404)/M0,M2,M3,M4 \>_line6
=>s (_line2(P)M900/M0) OR (_line3(P)M901/M2,M3,M4) OR (_line5(P)M902/M2,M3,M4) OR _line6 \>_line7
=>s _line7(NOTP)(H1 OR H2 OR H3 OR H6 OR H7 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line8

Issue: The indexed code K0 is included in the negation codes and has to be deleted.
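The effect of K0 can be checked with a short sketch. It assumes a simplified model in which a record is dropped by the (NOTP) step as soon as any negation code also occurs among its indexed codes; field qualifiers such as /M2,M3,M4 and (P) linkage are ignored. The code lists are copied from the indexed codes of DCR-83195, from the negation line above, and from the negation line under Manually Corrected Codes; the function name is hypothetical.

```python
# Simplified model: field qualifiers and (P) linkage are ignored.

# Indexed codes of DCR-83195 (M2 fragment *01*), copied from the listing.
INDEXED = set("""
F012 F013 F014 F015 F016 F123 H4 H404 H423 H481 H5 H521 H8 K0 L8
L815 L821 L831 M210 M211 M272 M281 M311 M321 M342 M373 M391 M413
M431 M510 M521 M530 M540 M782 P220 P420 P943 Q261 R032 M905 M904
""".split())

# Negation codes of the autogenerated and the manually corrected strategy.
AUTO_NEG = "H1 H2 H3 H6 H7 H9 J0 J1 J2 J3 J4 J5 J9 K0 M1".split()
CORRECTED_NEG = ("H1 H2 H3 H6 H7 H9 J0 J1 J2 J3 J4 J5 J9 "
                 "K1 K2 K3 K4 K5 K6 K7 K8 K9 "
                 "L1 L2 L3 L4 L5 L6 L7 L9 M1").split()


def blocking_codes(indexed, negation):
    """Return the negation codes that also occur among the indexed codes.

    A non-empty result means the (NOTP) step would drop the record.
    """
    return sorted(set(negation) & set(indexed))


print(blocking_codes(INDEXED, AUTO_NEG))       # ['K0']
print(blocking_codes(INDEXED, CORRECTED_NEG))  # []
```

With the autogenerated list, K0 alone is enough to exclude the record. With the corrected list nothing blocks it, while the added K and L codes still exclude records that carry other K- or L-class groups.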

Manually Corrected Codes

=>s (M210 OR M211)/M0,M2,M3,M4 \>_line1
=>s (M413(P)F123(P)H423(P)H481(P)H521)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M521(P)M510(P)M530(P)M540)/M0,M2,M3,M4 \>_line3
=>s _line3(P)((M272 OR M270)(P)M281(P)M311(P)M321(P)M342(P)(M373 OR M370)(P)M391)/M0,M2,M3,M4 \>_line4
=>s _line4(P)_line1 \>_line5
=>s _line5(P)(F012(P)F013(P)F014(P)F015(P)F016(P)H404)/M0,M2,M3,M4 \>_line6
=>s (_line2(P)M900/M0) OR (_line3(P)M901/M2,M3,M4) OR (_line5(P)M902/M2,M3,M4) OR _line6 \>_line7
=>s _line7(NOTP)(H1 OR H2 OR H3 OR H6 OR H7 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K1 OR K2 OR K3 OR K4 OR K5 OR K6 OR K7 OR K8 OR K9 OR L1 OR L2 OR L3 OR L4 OR L5 OR L6 OR L7 OR L9 OR M1)/M2,M3,M4 \>_line8

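Solution: K0 is removed from the negation codes (mandatory). To improve the accuracy of the results, the other K codes (K1-K9) and L codes (L1-L7, L9) are added to the negation codes, so that every K and L code except the required K0 and L8 is negated.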
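The manual correction can also be scripted. The following Python sketch is a hypothetical helper, not an STNext feature; it assumes the negation list has the form (NOTP)(code OR code ...)/ as in the scripts above, removes K0, and inserts K1-K9, L1-L7, and L9 in its place while leaving all other codes untouched.

```python
import re

# Replacement negation codes for carbohydrates: K1-K9, L1-L7 and L9.
EXTRA_NEGATIONS = ([f"K{i}" for i in range(1, 10)]
                   + [f"L{i}" for i in range(1, 8)]
                   + ["L9"])


def fix_carbohydrate_negation(line: str) -> str:
    """Replace K0 in the (NOTP)(...) list with K1-K9, L1-L7 and L9.

    Assumes codes in the list are separated by " OR ". Lines without K0
    are returned unchanged.
    """
    m = re.search(r"\(NOTP\)\((.*?)\)/", line)
    if m is None:
        raise ValueError("no (NOTP)(...) negation list found")
    codes = m.group(1).split(" OR ")
    fixed = []
    for code in codes:
        if code == "K0":
            # Insert the extra codes where K0 stood, skipping duplicates.
            fixed.extend(c for c in EXTRA_NEGATIONS if c not in codes)
        else:
            fixed.append(code)
    return line[:m.start(1)] + " OR ".join(fixed) + line[m.end(1):]


auto = (r"=>s _line7(NOTP)(H1 OR H2 OR H3 OR H6 OR H7 OR H9 OR J0 OR J1 OR J2 "
        r"OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line8")
print(fix_carbohydrate_negation(auto))
```

Applied to the autogenerated negation line, the function reproduces the manually corrected line character for character.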

Code Generation for Steroids Not Supported

Issue: For steroids (e.g., cholesterol), a wrong fragmentation code strategy is generated. There is no workaround.

L9 Code Set Should Be Manually Edited

The DCR indexing of the L9 code set is not consistent, because these structural elements are difficult to recognize completely within a chemical structure. Therefore, fragmentation code strategies that include such codes may miss relevant records.

The L9 code set includes:

Best Practice: Delete those codes from the fragmentation code strategy.

RIN Codes for Certain Non-Aromatic Versions of Polycyclic Carbocycles To Be Edited

There is RIN indexing for complete spiro systems (e.g., RIN 06706 for 9,9′-spirobifluorene). In certain cases, there is additional RIN indexing for the individual ring systems that are joined by the spiro link. Usually, ring index numbers apply to a ring system irrespective of its degree of unsaturation. However, for a small number of polycyclic carbocyclic ring systems, there is no specific code for the aromatic version of the system but there is a specific code for the non-aromatic version, even though it is the same ring system with all of the benzene rings wholly or partially hydrogenated (i.e., no intact aromatic ring system or quinoid variant thereof is present).

The following chemotypes are affected:

Issue: In STNext, the RIN codes for the non-aromatic versions may be included even for aromatic systems of the type described above. For instance, for 9,9′-spirobifluorene (CAS RN 159-66-0), a fully aromatic system, only RIN code 06706 for the spiro system should be applied. However, STNext additionally includes the wrong RIN code 03126, which relates to the non-aromatic system and should only be triggered by the presence of code G720. This reduces the answer set, so relevant hits are missed.

Best Practice: Check your fragmentation code script to ensure that, for instance, RIN code 03126 is not applied if G310 is present without G720 (and perform similar checks for the other systems listed above).
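A simple text check can flag scripts that need this edit. The sketch below is a hypothetical helper, not an STNext feature: it only tests whether the codes occur as whole tokens anywhere in the script and implements the 03126/G310/G720 rule stated above; checks for the other ring systems would need their own code pairs.

```python
import re


def has_code(script: str, code: str) -> bool:
    """True if `code` occurs as a whole token (not inside a longer code)."""
    pattern = rf"(?<![A-Za-z0-9]){re.escape(code)}(?![A-Za-z0-9])"
    return re.search(pattern, script) is not None


def rin_03126_suspicious(script: str) -> bool:
    """Flag RIN 03126 when G310 is present but G720 is not."""
    return (has_code(script, "03126")
            and has_code(script, "G310")
            and not has_code(script, "G720"))


# First and fourth lines of the STR1 and STR2 strategies in this section.
str1 = (r"=>s (M414(P)G041(P)G310(P)G399(P)M532)/M0,M2,M3,M4 \>_line1" "\n"
        r"=>s _line3(P)(03126(P)06706)/RIN \>_line4")
str2 = (r"=>s (M414(P)G041(P)G052(P)G310(P)G720(P)M531)/M0,M2,M3,M4 \>_line1" "\n"
        r"=>s _line3(P)(03126(P)06706)/RIN \>_line4")
print(rin_03126_suspicious(str1))  # True: 03126 should be deleted
print(rin_03126_suspicious(str2))  # False: G720 justifies 03126
```

The whole-token test matters because codes share prefixes (e.g. M21 vs. M210), so a plain substring search could give false matches.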

Example: Comparison of fragmentation code strategies on STNext for STR1, STR2, and STR3:

This specific example relates to the following codes:

Autogenerated Fragmentation Code Strategy for STR1: RIN 06706 is correct; RIN 03126 is wrong and must be deleted manually from the STNext fragmentation code script.
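Deleting a code by hand is easy to get wrong in long (P) chains. The hypothetical helper below removes a single code together with its (P) or OR link; it assumes the operator spacing used in the autogenerated scripts ("(P)" without spaces, " OR " with spaces) and leaves a code that stands alone in its parentheses untouched.

```python
import re


def drop_code(line: str, code: str) -> str:
    """Delete `code` and its (P) or OR link from a search line.

    A code standing alone in its parentheses is kept.
    """
    c = re.escape(code)
    head = r"(?<![A-Za-z0-9])"  # not the tail of a longer code
    tail = r"(?![A-Za-z0-9])"   # not the start of a longer code
    for pattern in (rf"{head}{c}\(P\)",   # code followed by a (P) link
                    rf"\(P\){c}{tail}",   # code closing a (P) chain
                    rf"{head}{c} OR ",    # code followed by OR
                    rf" OR {c}{tail}"):   # code preceded by OR
        line = re.sub(pattern, "", line)
    return line


rin_line = r"=>s _line3(P)(03126(P)06706)/RIN \>_line4"
print(drop_code(rin_line, "03126"))
# =>s _line3(P)(06706)/RIN \>_line4
```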

=>s (M414(P)G041(P)G310(P)G399(P)M532)/M0,M2,M3,M4 \>_line1
=>s _line1(P)(M610(P)M510(P)M520(P)M540)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M280(P)M320)/M0,M2,M3,M4 \>_line3
=>s _line3(P)(03126(P)06706)/RIN \>_line4
=>s _line4(P)(G031(P)G039)/M0,M2,M3,M4 \>_line5
=>s (_line1(P)M900/M0) OR (_line2(P)M901/M2,M3,M4) OR (_line4(P)M902/M2,M3,M4) OR _line5 \>_line6
=>s _line6(NOTP)(H1 OR H2 OR H3 OR H4 OR H5 OR H6 OR H7 OR H8 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line7

Autogenerated Fragmentation Code Strategy for STR2: RINs 03126 and 06706 are correct.

=>s (M414(P)G041(P)G052(P)G310(P)G720(P)M531)/M0,M2,M3,M4 \>_line1
=>s _line1(P)(M541(P)M610(P)M510(P)M520)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M280(P)M320)/M0,M2,M3,M4 \>_line3
=>s _line3(P)(03126(P)06706)/RIN \>_line4
=>s _line4(P)(G031(P)G039)/M0,M2,M3,M4 \>_line5
=>s (_line1(P)M900/M0) OR (_line2(P)M901/M2,M3,M4) OR (_line4(P)M902/M2,M3,M4) OR _line5 \>_line6
=>s _line6(NOTP)(H1 OR H2 OR H3 OR H4 OR H5 OR H6 OR H7 OR H8 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line7

Autogenerated Fragmentation Code Strategy for STR3: RINs 03126 and 06706 are correct.

=>s (M415(P)G052(P)G720(P)G799)/M0,M2,M3,M4 \>_line1
=>s _line1(P)(M542(P)M610(P)M510(P)M520(P)M530)/M0,M2,M3,M4 \>_line2
=>s _line2(P)(M280(P)M320)/M0,M2,M3,M4 \>_line3
=>s _line3(P)(03126(P)06706)/RIN \>_line4
=>s _line4(P)(G031(P)G039)/M0,M2,M3,M4 \>_line5
=>s (_line1(P)M900/M0) OR (_line2(P)M901/M2,M3,M4) OR (_line4(P)M902/M2,M3,M4) OR _line5 \>_line6
=>s _line6(NOTP)(H1 OR H2 OR H3 OR H4 OR H5 OR H6 OR H7 OR H8 OR H9 OR J0 OR J1 OR J2 OR J3 OR J4 OR J5 OR J9 OR K0 OR M1)/M2,M3,M4 \>_line7

Note: The STN Express strategy for STR2 and STR3 is incomplete, as RIN code 03126 is missing. However, omitting 03126 will probably have no effect on retrieval (or at worst a marginal one), since RIN 06706 relates to a spiro system linking two fluorene ring systems. Consequently, when RIN 06706 is present together with G720, the ring system it applies to must be a hydrogenated fluorene.

Missing J-Codes for Certain Types of Fused Heterocycles Leading to Loss of Precision

Rule: “If the atom bonded to the functional group is a non-angular C atom in a bridged ring system, and if the C atom could be seen as being a member of different sized rings, the smallest ring size is chosen, even if it is of lower priority than the larger ring.”

In the example below, for STR1, the non-angular C atom that is substituted by a functional group (oxo) belongs to two 6-membered rings (the bridged all-carbon ring and the nitrogen-containing ring). For STR2, the bridged all-carbon ring is 5-membered and the nitrogen-containing heterocyclic ring is 6-membered. According to the rule, J521 should be generated for STR1 and J561 for STR2.

Issue: For fused heterocycles of the STR2 chemotype, the respective J-code is missing from the fragmentation code strategy; in the example above, STNext fails to generate code J561 for STR2. The issue affects structures in which a non-angular C atom is substituted by oxo (J561 is missing) or thio (J596 is missing); the corresponding imino chemotype is not affected. It affects not only nitrogen heterocycles but also heterocycles containing other heteroatoms. The consequence is a loss of precision, i.e., a larger answer set: in WPIX, STR2 retrieves 40 hits with the automatically generated strategy (lacking J561), but only 7 hits with the corrected strategy (including J561).

Workaround: Add the respective J-codes for the affected chemotypes to enhance precision.
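A minimal sketch of this edit, assuming the missing code should be linked with (P) to the end of the first chain qualified by /M0,M2,M3,M4; where exactly the code belongs in your strategy is a judgment call. The function name and the placeholder codes CODE_A and CODE_B are hypothetical and stand in for a real strategy line.

```python
import re


def add_code(line: str, code: str, fields: str = "M0,M2,M3,M4") -> str:
    """Link `code` with (P) to the end of the first chain qualified by /fields.

    Returns the line unchanged if the code already occurs in it.
    """
    if re.search(rf"(?<![A-Za-z0-9]){re.escape(code)}(?![A-Za-z0-9])", line):
        return line
    closing = f")/{fields}"
    if closing not in line:
        raise ValueError(f"no chain qualified by /{fields} found")
    return line.replace(closing, f"(P){code}{closing}", 1)


# Hypothetical line: CODE_A and CODE_B are placeholders, not real codes.
line = r"=>s (CODE_A(P)CODE_B)/M0,M2,M3,M4 \>_line1"
print(add_code(line, "J561"))
# =>s (CODE_A(P)CODE_B(P)J561)/M0,M2,M3,M4 \>_line1
```

Making the function a no-op when the code is already present means it can be run safely on strategies where STNext did generate the J-code.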
