Indexing in MELD
When interacting with MELD we often need to give the index of a specific atom or residue. This document describes how indexing works in MELD and explains the various methods of indexing.
MELD uses zero-based indexing internally
MELD is based on the Python programming language, which, like most modern programming languages, uses zero-based indexing. Structural biology, however, usually uses one-based indexing. The difference is that zero-based indexing starts counting from zero, while one-based indexing starts from one.
Internally, MELD uses zero-based indexing, but provides various methods for using one-based indexing.
To help eliminate errors, all functions in MELD that take an atom index require that it is of type AtomIndex. This is effectively just an integer, but it has been labeled as AtomIndex to indicate that it is a zero-based absolute atom index. Similarly, functions that take a residue index require that it has type ResidueIndex.
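One way to picture a labeled integer of this kind is a thin int subclass. The sketch below is illustrative only and makes no claim about how MELD defines its own index types; ToyAtomIndex and describe_atom are hypothetical names invented for this example.

```python
class ToyAtomIndex(int):
    """An int that is labeled as a zero-based absolute atom index."""


def describe_atom(index: ToyAtomIndex) -> str:
    # Reject bare ints so that an unlabeled (possibly one-based) number
    # cannot be passed in by accident.
    if not isinstance(index, ToyAtomIndex):
        raise TypeError("expected a ToyAtomIndex, got a plain value")
    return f"atom {int(index)}"


label = describe_atom(ToyAtomIndex(4))

try:
    describe_atom(4)  # a plain int carries no label
    rejected = False
except TypeError:
    rejected = True
```

The label costs nothing at runtime beyond the type check, and the value still behaves like an ordinary integer in arithmetic and comparisons.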
Functions for indexing
The two primary ways of indexing are both methods of the system object:
system.index.atom(resid, atom_name, expected_resname=None, chainid=None, one_based=False)
system.index.residue(resid, expected_resname=None, chainid=None, one_based=False)
index.atom will return a zero-based absolute AtomIndex. index.residue will return a zero-based absolute ResidueIndex.
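To make "zero-based absolute" concrete, here is a small stand-in that mimics the call shape of the two functions over a made-up two-residue topology. It is a toy model and not MELD code: the system object is built from SimpleNamespace, the atom lists contain heavy atoms only, and toy_atom and toy_residue are hypothetical helpers that accept just the first positional arguments of the signatures above.

```python
from types import SimpleNamespace

# Toy topology: (residue name, atom names), made up for illustration.
TOY_RESIDUES = [
    ("ALA", ["N", "CA", "C", "O", "CB"]),
    ("GLY", ["N", "CA", "C", "O"]),
]


def toy_residue(resid):
    """Return the zero-based absolute residue index after a bounds check."""
    if not 0 <= resid < len(TOY_RESIDUES):
        raise IndexError(f"no residue with index {resid}")
    return resid


def toy_atom(resid, atom_name):
    """Return the zero-based absolute atom index of atom_name in residue resid."""
    resid = toy_residue(resid)
    # Atoms are numbered consecutively, so skip every atom in earlier residues.
    offset = sum(len(names) for _, names in TOY_RESIDUES[:resid])
    return offset + TOY_RESIDUES[resid][1].index(atom_name)


# Stand-in for a MELD system, exposing the same system.index.<function> shape.
system = SimpleNamespace(index=SimpleNamespace(atom=toy_atom, residue=toy_residue))

ca_first = system.index.atom(0, "CA")   # second atom of the first residue
ca_second = system.index.atom(1, "CA")  # the five ALA atoms come before GLY
gly = system.index.residue(1)
```

In this toy topology the CA of the first residue is atom 1 and the CA of the second residue is atom 6, because atom numbering runs across the whole system rather than restarting in each residue.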
Using expected_resname to catch errors
Indexing can be tricky, and errors can result in strange behavior; for example, restraints may be created between the wrong atoms.
To help catch errors, it is possible to specify expected_resname. If expected_resname is specified, calls to index.atom and index.residue will check that the actual residue name that is found matches it.
Note that the residue names will be those after processing by tleap, so they may not correspond exactly to those in a pdb file. Normally, expected_resname will be three characters in all caps.
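The name check can be pictured with the following toy lookup. This is a sketch under stated assumptions, not MELD's implementation: TOY_RESNAMES and toy_residue are hypothetical, and the choice of ValueError is an assumption, since the exception MELD raises on a mismatch is not described here.

```python
TOY_RESNAMES = ["MET", "ALA", "GLY"]


def toy_residue(resid, expected_resname=None):
    """Return resid unchanged, after optionally checking the residue name."""
    actual = TOY_RESNAMES[resid]
    if expected_resname is not None and actual != expected_resname:
        # The exception type is an assumption made for this sketch.
        raise ValueError(
            f"residue {resid} is {actual}, but {expected_resname} was expected"
        )
    return resid


ok = toy_residue(1, expected_resname="ALA")

try:
    toy_residue(1, expected_resname="GLY")
    message = None
except ValueError as err:
    message = str(err)
```

Failing loudly at lookup time is the point of the check: an off-by-one residue number is reported immediately instead of silently producing a restraint on the neighboring residue.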
Using one-based indexing
By default, both index.atom and index.residue use zero-based indexing, so resid starts from zero. To use one-based indexing, set one_based=True, which will cause both resid and chainid to be interpreted as one-based.
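The effect of the flag can be sketched with a toy lookup. The helper toy_residue and the list TOY_RESNAMES are hypothetical, and the sketch assumes, in line with the description of the return values above, that the returned index is zero-based regardless of how the input was counted.

```python
TOY_RESNAMES = ["MET", "ALA", "GLY"]


def toy_residue(resid, one_based=False):
    """Return a zero-based index for resid, whichever convention the caller used."""
    if one_based:
        # Shift a one-based residue number down to the internal convention.
        resid -= 1
    if not 0 <= resid < len(TOY_RESNAMES):
        raise IndexError("resid is out of range")
    return resid


zero = toy_residue(0)                 # first residue, zero-based input
one = toy_residue(1, one_based=True)  # first residue, one-based input
last_name = TOY_RESNAMES[toy_residue(3, one_based=True)]
```

Both calls refer to the same residue. Note that in this sketch a resid of 0 is rejected when one_based=True, because one-based numbering has no residue zero.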
Using relative indexing
By default, resid refers to the absolute residue index, which starts from zero (one for one-based indexing) and does not consider which chain the residue resides in. The ordering of residues corresponds to the order in which sub-systems were added when the system was built.
If chainid is set, then resid refers to the relative index of a residue within the corresponding chain. So, resid=0, chainid=0 would refer to the first residue in the first chain (assuming zero-based indexing).
Chains are indexed sequentially starting from zero (one for one-based indexing). The order of chains is partially determined by the order that sub-systems are added in.
When created by sequence, each sub-system corresponds to exactly one chain. When created from a pdb file, each sub-system will have the same number of chains as the pdb file has unique chain identifiers. The ordering of the chains is alphabetical with a blank chain identifier coming first, followed by “A”, etc.
To be more concrete, consider the following example:
A sub-system is added from sequence.
A second sub-system is added from a pdb file.
The pdb file contains two chain identifiers, “A” and “B”.
In this case, the chainid would be defined as follows:
0: the chain added by sequence
1: chain “A” from the pdb file
2: chain “B” from the pdb file
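The chain numbering in this example, and the mapping from a relative resid to an absolute one, can be sketched as follows. This is a toy model rather than MELD code: order_chains and absolute_resid are hypothetical helpers, the chain lengths are made up, the chain added from sequence is given a blank identifier purely for convenience, and the sketch assumes that residues are stored contiguously in chainid order.

```python
def order_chains(sub_systems):
    """Flatten sub-systems into a list of (identifier, n_residues) in chainid order."""
    chains = []
    for chain_lengths in sub_systems:
        # Within one sub-system, sort by chain identifier. Python's string
        # ordering places a blank identifier before "A".
        for ident in sorted(chain_lengths):
            chains.append((ident, chain_lengths[ident]))
    return chains


def absolute_resid(chains, resid, chainid):
    """Map a zero-based (chainid, relative resid) pair to an absolute resid."""
    n_residues = chains[chainid][1]
    if not 0 <= resid < n_residues:
        raise IndexError("resid is outside the chain")
    # Assumption: all residues of earlier chains come first.
    offset = sum(n for _, n in chains[:chainid])
    return offset + resid


sub_systems = [
    {"": 5},            # added from sequence: exactly one chain
    {"A": 3, "B": 4},   # added from a pdb file with chains "A" and "B"
]
chains = order_chains(sub_systems)

first_of_a = absolute_resid(chains, resid=0, chainid=1)
last_of_b = absolute_resid(chains, resid=3, chainid=2)
```

With these made-up lengths, the first residue of chain “A” has absolute index 5 and the last residue of chain “B” has absolute index 11.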
In some cases, MELD will add additional residues that were not present in either the sequence or pdb file. Examples include extra residues added to encode RDC alignment tensors, and the solvent and ions that are added when explicit solvent calculations are specified. These additional residues are considered to be in an additional chain that is added in the final position.