Indexing in MELD

When interacting with MELD we often need to give the index of a specific atom or residue. This document describes how indexing works in MELD and explains the various methods of indexing.

MELD uses zero-based indexing internally

MELD is based on the Python programming language, which, like most modern programming languages, uses zero-based indexing. However, in structural biology, we often use one-based indexing. The difference is that zero-based indexing starts counting from zero, while one-based indexing starts from one.

Internally, MELD uses zero-based indexing, but it provides several ways to supply one-based indices.

To help eliminate errors, all functions in MELD that take an atom index require that it is of type AtomIndex. This is effectively just an integer, but it has to be labeled as an AtomIndex to indicate that it is a zero-based absolute atom index. Similarly, functions that take a residue index require that it has type ResidueIndex.
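The idea of a labeled integer can be sketched in plain Python. The classes below are illustrative stand-ins defined for this example, not MELD's own definitions, and `require_atom_index` is a hypothetical helper showing how a function might reject an unlabeled integer.

```python
class AtomIndex(int):
    """Stand-in for a zero-based absolute atom index (illustrative only)."""


class ResidueIndex(int):
    """Stand-in for a zero-based absolute residue index (illustrative only)."""


def require_atom_index(value):
    """Hypothetical check: reject integers that were never labeled."""
    if not isinstance(value, AtomIndex):
        raise TypeError(f"expected AtomIndex, got {type(value).__name__}")
    return value


labeled = AtomIndex(41)  # still behaves like the integer 41
checked = require_atom_index(labeled)

try:
    require_atom_index(41)  # a bare int carries no label
    rejected = False
except TypeError:
    rejected = True
```

The label costs nothing at runtime beyond the type check, but it makes it hard to pass a one-based or relative number where a zero-based absolute index is expected.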

Functions for indexing

The two primary ways of indexing are both available through the system object:

system.index.atom(resid, atom_name, expected_resname=None, chainid=None, one_based=False)
system.index.residue(resid, expected_resname=None, chainid=None, one_based=False)

Calls to index.atom will return a zero-based absolute AtomIndex. Calls to index.residue will return a zero-based absolute ResidueIndex.

Specifying `expected_resname` to catch errors

Indexing can be tricky, and errors can result in strange behavior. For example, restraints may be created between the wrong atoms.

To help catch errors, it is possible to specify expected_resname. When expected_resname is specified, calls to index.atom and index.residue will check that the actual residue name that is found matches expected_resname.

Note that the residue names will be those after processing by tleap, so they may not correspond exactly to those in a pdb file. Normally, the expected_resname will be three characters in all-caps, e.g. "ALA".
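The kind of check that expected_resname enables can be sketched with a toy lookup. Everything here is an assumption made for illustration: `lookup_resname` is a hypothetical function, the name list is invented, and raising ValueError is this sketch's choice rather than a statement about what MELD raises.

```python
def lookup_resname(residue_names, resid, expected_resname=None):
    """Return the name at zero-based `resid`, verifying it when asked."""
    actual = residue_names[resid]
    if expected_resname is not None and actual != expected_resname:
        raise ValueError(
            f"residue {resid} is {actual!r}, expected {expected_resname!r}"
        )
    return actual


# Illustrative names as they might look after processing. A histidine
# written as HIS in a pdb file might carry a protonation-specific name
# such as HIE afterwards.
processed_names = ["ALA", "HIE", "GLY"]

first = lookup_resname(processed_names, 0, expected_resname="ALA")

try:
    lookup_resname(processed_names, 1, expected_resname="HIS")
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```

An off-by-one mistake in resid usually lands on a residue with a different name, so a check of this kind turns a silent error into an immediate failure.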

Using one-based indexing

By default both index.atom and index.residue use zero-based indexing, where both chainid and resid start from zero. To use one-based indexing set one_based=True, which will cause both resid and chainid to be interpreted as one-based.
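The conversion implied by one_based=True is a shift by one. A minimal sketch, where `to_zero_based` is a hypothetical helper and not part of MELD:

```python
def to_zero_based(index, one_based=False):
    """Convert a user-supplied index to the zero-based form used internally."""
    lowest = 1 if one_based else 0
    if index < lowest:
        raise ValueError(f"index must be >= {lowest}, got {index}")
    return index - lowest


# The first residue is 0 when counting from zero and 1 when counting from one.
same_residue = to_zero_based(0) == to_zero_based(1, one_based=True)
tenth = to_zero_based(10, one_based=True)
```

Under this reading, resid=10, chainid=1 with one_based=True names the same residue as resid=9, chainid=0 with the default settings.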

Using relative indexing

By default, resid refers to the absolute residue index, which starts from zero (one for one-based indexing) and does not consider which chain the residue resides in. The ordering of residues corresponds to the order in which sub-systems were added when the system was built.

If chainid is set, then resid refers to the relative index of a residue within the corresponding chain. So, resid=0, chainid=0 would refer to the first residue in the first chain (assuming zero-based indexing).
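The relationship between relative and absolute residue indices can be sketched as follows. This assumes that the residues of each chain are stored contiguously and in chain order; `absolute_resid` and the chain lengths are hypothetical and only illustrate the arithmetic.

```python
def absolute_resid(chain_lengths, resid, chainid=None, one_based=False):
    """Map (resid, chainid) to a zero-based absolute residue index."""
    shift = 1 if one_based else 0
    resid -= shift
    if chainid is None:
        # No chain given: resid is already an absolute index.
        if not 0 <= resid < sum(chain_lengths):
            raise IndexError("resid out of range")
        return resid
    chainid -= shift
    if not 0 <= chainid < len(chain_lengths):
        raise IndexError("chainid out of range")
    if not 0 <= resid < chain_lengths[chainid]:
        raise IndexError("resid out of range for this chain")
    # Skip over every residue belonging to the earlier chains.
    return sum(chain_lengths[:chainid]) + resid


chain_lengths = [10, 5]  # first chain has 10 residues, second has 5

first_of_second = absolute_resid(chain_lengths, 0, chainid=1)
same_one_based = absolute_resid(chain_lengths, 1, chainid=2, one_based=True)
plain_absolute = absolute_resid(chain_lengths, 12)
```

Here the first residue of the second chain has absolute index 10, whichever counting convention is used to name it.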

Ordering of `chainids`

Chains are indexed sequentially starting from zero (one for one-based indexing). The order of chains is partially determined by the order in which sub-systems are added.

When created by sequence, each sub-system corresponds to exactly one chain. When created from a pdb file, each sub-system will have the same number of chains as the pdb file has unique chain identifiers. The ordering of the chains is alphabetical with a blank chain identifier coming first, followed by “A”, etc.

To be more concrete, consider the following example:

A sub-system is added from sequence
A second sub-system is added from a pdb file
- The pdb file contains two chain identifiers, “A” and “B”.

In this case, the chainid would be defined as follows:

0: the chain added by sequence
1: chain “A” from the pdb file
2: chain “B” from the pdb file
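The ordering rule described above can be sketched in a few lines. The data layout and the `order_chains` function are hypothetical; the sketch only mirrors the stated rule that sub-systems keep the order in which they were added and that pdb chains are sorted alphabetically, with a blank identifier first.

```python
def order_chains(subsystems):
    """Return chain labels in chainid order (zero-based list position)."""
    chains = []
    for kind, chain_ids in subsystems:
        if kind == "sequence":
            # A sub-system built from sequence contributes exactly one chain.
            chains.append("sequence")
        else:
            # Unique identifiers, sorted; a blank sorts before "A".
            for chain_id in sorted(set(chain_ids)):
                chains.append(f"pdb:{chain_id}")
    return chains


subsystems = [
    ("sequence", []),          # added first
    ("pdb", ["A", "B", "A"]),  # chain identifiers as read from the file
]

chain_order = order_chains(subsystems)
chainid_of_b = chain_order.index("pdb:B")
```

The list position of each label is its zero-based chainid, so chain “B” from the pdb file ends up with chainid 2, matching the table above.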

In some cases, MELD will add additional residues that were not present in either the sequence or pdb file. Examples include extra residues that encode RDC alignment tensors, which are added by the RdcAlignmentPatcher, and solvent and ions, which are added when explicit solvent calculations are specified. These additional residues are considered to be in an additional chain that is added in the final position.