MDL Molfiles, RGfiles, SDfiles, Rxnfiles, RDfiles

Codenames: mol, mol:V3, mol:V3ec, mol:V3ea, rgf, sdf, rxn, rxn:V3, rdf; file extensions: .mol, .sdf, .rxn, .rdf

MDL Molfiles, RGfiles, SDfiles, Rxnfiles, RDfiles formats

Marvin imports and exports MDL Molfiles, RGfiles, SDfiles, REACCS Rxnfiles and RDfiles. The following features are supported in V2.0 molfiles:

Extended molfiles (V3.0). If the number of atoms or bonds in a molecule exceeds 999, then the extended format is used. In an extended molfile, the following properties are supported:

Reaction files (V2.0). A reaction file consists of a REACTANT block, a PRODUCT block, and (optionally) an AGENT block. Reaction files containing reaction agents are non-standard.

A reaction agent is a molecule structure that does not take part in the chemical reaction, but is added to the reaction equation for informational purposes only. Agents are normally displayed above the reaction arrow and are written to the reaction file after the reactants and the products. If the number of agents is non-zero, it is written in the file header, after the number of reactants and the number of products.
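The header layout described above can be illustrated with a small reader for the count line. This is only a sketch: it assumes that the counts are stored as fixed-width, three-character fields, as in the MDL V2.0 Rxnfile count line, and that the agent count is a third field of the same width. The text above only states the order of the counts, so the field width for agents and the function name parse_rxn_counts are assumptions.

```python
def parse_rxn_counts(count_line: str) -> tuple[int, int, int]:
    """Split a V2.0 Rxnfile count line into (reactants, products, agents).

    Assumption: each count occupies a three-character field, and the
    optional agent count follows the product count.
    """
    # Pad the line so that a missing third field can still be sliced safely.
    padded = count_line.rstrip("\n").ljust(9)
    n_reactants = int(padded[0:3])
    n_products = int(padded[3:6])
    agents_field = padded[6:9].strip()
    # The agent count is only present when it is non-zero.
    n_agents = int(agents_field) if agents_field else 0
    return n_reactants, n_products, n_agents
```

A standard file with two reactants and one product yields (2, 1, 0); a file with one agent added yields (2, 1, 1).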

Extended reaction files (V3.0). This format is used automatically if a reaction includes Rgroups and/or the number of atoms or bonds exceeds 999. An extended reaction file consists of a REACTANT block, a PRODUCT block, (optionally) an AGENT block, and (optionally) RGROUP blocks.
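The rule for switching to the extended format can be summarized as a short predicate. The sketch below only restates the conditions given in this section; the function and parameter names are hypothetical and are not part of Marvin.

```python
# Largest atom or bond count for which the V2.0 format is still used.
V2_LIMIT = 999


def needs_extended_format(n_atoms: int, n_bonds: int,
                          is_reaction: bool = False,
                          has_rgroups: bool = False) -> bool:
    """Return True if the V3.0 (extended) format would be chosen automatically."""
    # More than 999 atoms or bonds always forces the extended format.
    if n_atoms > V2_LIMIT or n_bonds > V2_LIMIT:
        return True
    # A reaction that includes Rgroups is also written as V3.0.
    return is_reaction and has_rgroups
```

For example, a molecule with 1000 atoms needs the extended format, while one with exactly 999 atoms and 999 bonds does not.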

In SDfiles read by Marvin, the name field is special: it overrides the molecule name specified in the molfile part.

A special feature of Marvin RGfiles is that they can contain a reaction as the root structure. This feature is non-standard; such mixed RG/Rxnfiles can only be imported by Marvin.

Special data types in SDfile and RDfile fields

Data fields store strings normally, but other data types are also supported in Marvin, in a non-standard way. If the data starts with the "MProp:scalar:" or "MProp:array:" string, then it can have a special type:

Molfile compression

MarvinSketch and MarvinView can handle compressed molfiles that are typically five times smaller than their original, uncompressed version. This reduces the download time of HTML pages containing molecule applets.

Compressed molfiles can be created by choosing Edit/Source, then Format/Compressed Molfile in MarvinSketch or MarvinView. If you cannot find the Edit menu, click the upper left arrow in MarvinSketch, or right-click or double-click the compound in MarvinView.

Codenames: csmol, csrgf, cssdf, csrxn, csrdf; file extensions: .csmol, .cssdf, .csrxn, .csrdf

Implicit hydrogens on aromatic nitrogen

The mol family of formats cannot store the implicit hydrogens of atoms, so their number is calculated from the bond orders. This is always correct when the molecule is in Kekulé form, but causes problems when nitrogen-containing aromatic rings are saved with aromatic bond types.

To counteract the information loss, the implicit hydrogen count is stored in these formats as attached data on the nitrogen. The associated data S-group has the field name MRV_IMPLICIT_H and the value IMPL_H<n>, where n is the number of implicit hydrogens. These special data attachments are converted back to implicit hydrogens upon import. When the file is read in ISIS/Draw, the lost hydrogens will not reappear, but the attached data will be visible as a warning.
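The attached-data convention above can be sketched as a pair of helper functions. The field name and the IMPL_H<n> value pattern come from the text; the function names and the choice to return None for unrelated data are assumptions made for illustration only.

```python
import re

# Field name and value pattern as described in the text above.
FIELD_NAME = "MRV_IMPLICIT_H"
_VALUE_RE = re.compile(r"IMPL_H(\d+)")


def encode_implicit_h(count: int) -> str:
    """Build the attached-data value for a given implicit hydrogen count."""
    if count < 0:
        raise ValueError("implicit hydrogen count cannot be negative")
    return f"IMPL_H{count}"


def decode_implicit_h(field_name: str, value: str):
    """Return the hydrogen count, or None if the data is not an implicit-H record."""
    if field_name != FIELD_NAME:
        return None
    match = _VALUE_RE.fullmatch(value.strip())
    return int(match.group(1)) if match else None
```

A reader that does not know this convention simply sees an ordinary data attachment, which matches the ISIS/Draw behaviour described above.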

Multipage molecular document

To save information about a multipage molecular document, its properties are stored as attached data. The field names and values are the following:

Import options

Xsg Expand all S-groups.
Usg Ungroup all S-groups.
bXXX Set C-C bond length. The molecule file is assumed to store coordinates in 1.54Å/XXX units. Marvin uses Å units internally, so coordinates are rescaled by a factor of 1.54/XXX at import if XXX is nonzero. If XXX = 0, then coordinates are not rescaled. Examples: "caffeine.mol(b0)" or "caffeine.mol(b1.54)" (bond lengths are in angstroms), "caffeine.mol(b0.825)" (bond lengths are in ISIS/Draw's units).
Default: C-C bond length is calculated by averaging in 2D V2 molfiles, Å units are used in any other case.
nomolp Read molecule type data fields ($DTYPE $MFMT and $RFMT in RDfiles) as strings instead of Molecule objects.
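The rescaling performed by the bXXX import option amounts to multiplying every coordinate by 1.54/XXX. The following sketch illustrates that arithmetic only; the function names and the representation of coordinates as (x, y, z) tuples are assumptions, not Marvin's actual implementation.

```python
# Standard C-C single bond length in angstroms, as used in the option text.
CC_BOND_ANGSTROM = 1.54


def import_scale_factor(b_option: float) -> float:
    """Factor applied to file coordinates for a given bXXX value."""
    # b0 means "do not rescale".
    if b_option == 0:
        return 1.0
    return CC_BOND_ANGSTROM / b_option


def rescale_on_import(coords, b_option: float):
    """Convert (x, y, z) file coordinates to angstrom units."""
    factor = import_scale_factor(b_option)
    return [(x * factor, y * factor, z * factor) for x, y, z in coords]
```

With b1.54 the factor is 1, so coordinates already given in angstroms are left unchanged; with b0.825 a bond drawn 0.825 units long becomes 1.54 Å.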

Export options

... Basic options for aromatization and H atom adding/removal.
V2 or V3 Force writing V2 or V3 (extended) molfiles. The default format is V2 for simple molecules, and V3 if the number of atoms or bonds exceeds 999 or if a reaction contains Rgroups. Example: "mol:V3"
P Write floating point numbers with maximum precision. Only meaningful for V3 molfiles. Example: "mol:V3P"
bXXX Set C-C bond length. If XXX is nonzero, then the exported atom coordinates are scaled in such a way that the average C-C bond length will be the specified number. If XXX = 0, then coordinates are not rescaled.
Examples: "mol:b0" or "mol:b1.54" (bond lengths are in angstroms), "mol:b1.54a" (set bond length, aromatize).
Default: 0.825 in V2 format for 2D molecules, 1.54 (Å units) in any other case.
ec Convert to enhanced stereo representation, considering the chiral flag. Only meaningful with option V3. (Chiral centers are grouped into an ABS or an AND stereo group, depending on the chiral flag. If the input molecule contains any enhanced stereo labels, the unlabeled stereo centers will always form a new AND group.) Example: "mol:V3ec"
ea Convert to enhanced stereo representation, assuming absolute stereochemistry. Only meaningful with option V3. (Chiral centers are grouped into the ABS group. If the input molecule already contains enhanced stereo labels, the behaviour is similar to the one described at option ec above.) Example: "mol:V3ea"
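The scaling performed by the bXXX export option can be sketched as follows: compute the average C-C bond length, then scale all coordinates so that this average equals the requested value. The data layout (atoms as (symbol, x, y, z) tuples, bonds as pairs of 0-based atom indices) and the function names are assumptions. The text does not say what happens when a molecule has no C-C bonds; this sketch leaves the coordinates unchanged in that case.

```python
import math


def average_cc_bond_length(atoms, bonds) -> float:
    """Mean length of bonds whose two end atoms are both carbon (0.0 if none)."""
    lengths = [
        math.dist(atoms[i][1:], atoms[j][1:])
        for i, j in bonds
        if atoms[i][0] == "C" and atoms[j][0] == "C"
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0


def rescale_on_export(atoms, bonds, b_option: float):
    """Scale coordinates so that the average C-C bond length equals b_option."""
    # b0 means "do not rescale".
    if b_option == 0:
        return list(atoms)
    average = average_cc_bond_length(atoms, bonds)
    # Assumption: without any C-C bond there is nothing to normalize against.
    if average == 0:
        return list(atoms)
    factor = b_option / average
    return [(s, x * factor, y * factor, z * factor) for s, x, y, z in atoms]
```

Exporting with b0.825 therefore shrinks a structure whose C-C bonds average 1.54 Å so that they average 0.825 units, which corresponds to the V2 default for 2D molecules stated above.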
