Indexing in MELD
When interacting with MELD we often need to give the index of a specific atom or residue. This document describes how indexing works in MELD and explains the various methods of indexing.
MELD uses zero-based indexing internally
MELD is based on the Python programming language, which, like most modern programming languages, uses zero-based indexing. However, in structural biology, we often use one-based indexing. The difference is that zero-based indexing starts counting from zero, while one-based indexing starts from one.
Internally, MELD uses zero-based indexing, but provides various methods for using one-based indexing.
To help eliminate errors, all functions in MELD that take an atom index require that it is of type AtomIndex. This is effectively just an integer, but it has been labeled as an AtomIndex to indicate that it is a zero-based absolute atom index. Similarly, functions that take a residue index require that it has type ResidueIndex.
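The idea of an integer that carries a label can be sketched with Python's typing.NewType. This is only an illustration of the concept: the definitions below are hypothetical stand-ins, and MELD's own AtomIndex and ResidueIndex may be defined differently.

```python
from typing import NewType

# Hypothetical stand-ins for illustration; MELD ships its own definitions.
AtomIndex = NewType("AtomIndex", int)
ResidueIndex = NewType("ResidueIndex", int)


def describe_atom(index: AtomIndex) -> str:
    """Accept only values that were explicitly labeled as atom indices."""
    return f"zero-based absolute atom index {index}"


# At runtime the value is still a plain int; the label matters to a
# static type checker, which would flag describe_atom(0).
first_atom = AtomIndex(0)
message = describe_atom(first_atom)
print(message)
```

With this pattern, passing a bare integer or a ResidueIndex where an AtomIndex is expected is reported by a type checker, while the runtime cost is essentially nil.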
Functions for indexing
The two primary ways of indexing are both methods of the system object:
system.index.atom(resid, atom_name, expected_resname=None, chainid=None, one_based=False)
system.index.residue(resid, expected_resname=None, chainid=None, one_based=False)
Calls to index.atom will return a zero-based absolute AtomIndex.
Calls to index.residue will return a zero-based absolute ResidueIndex.
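As a hedged usage sketch, the two calls might be combined as follows. The helper name first_residue_ca, the atom name "CA", and the default residue name "ALA" are illustrative assumptions. The system argument is assumed to be an already-built MELD system object; how to build one is not shown here.

```python
def first_residue_ca(system, resname="ALA"):
    """Look up the first residue and its CA atom (hypothetical helper).

    `system` is assumed to be an already-built MELD system exposing the
    index.atom and index.residue methods with the signatures listed above.
    """
    # Zero-based absolute residue index of the first residue.
    residue = system.index.residue(0, expected_resname=resname)
    # Zero-based absolute atom index of that residue's CA atom.
    atom = system.index.atom(0, "CA", expected_resname=resname)
    return residue, atom
```

The returned values can then be passed to any function that requires a ResidueIndex or an AtomIndex.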
Specifying resname to catch errors
Indexing can be tricky, and errors can result in strange behavior; for example, restraints may be created between the wrong atoms.
To help catch errors, it is possible to specify expected_resname. When expected_resname is specified, calls to index.atom and index.residue will check that the actual residue name that is found matches expected_resname.
Note that the residue names will be those after processing by tleap, so they may not correspond exactly to those in a pdb file. Normally, the expected_resname will be three characters in all-caps, e.g. "ALA".
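The check can be pictured with a small toy model. The function checked_residue and the residue list are hypothetical and are not part of MELD, and the choice of ValueError is an assumption, since the exception MELD actually raises is not stated here. The "HIE" entry illustrates a histidine whose name no longer matches the "HIS" found in a pdb file.

```python
from typing import Optional, Sequence


def checked_residue(resnames: Sequence[str], resid: int,
                    expected_resname: Optional[str] = None) -> int:
    """Return resid after optionally verifying the residue name (toy model)."""
    actual = resnames[resid]
    if expected_resname is not None and actual != expected_resname:
        # ValueError is an assumption; MELD may raise a different exception.
        raise ValueError(
            f"residue {resid} is {actual!r}, expected {expected_resname!r}"
        )
    return resid


# Residue names as they might appear after processing (illustrative only).
resnames = ["ALA", "HIE", "GLY"]

ok = checked_residue(resnames, 0, expected_resname="ALA")

try:
    checked_residue(resnames, 1, expected_resname="HIS")
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```

Failing loudly at lookup time is far easier to debug than a restraint silently attached to the wrong residue.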
Using one-based indexing
By default, both index.atom and index.residue use zero-based indexing, where both chainid and resid start from zero. To use one-based indexing, set one_based=True, which will cause both resid and chainid to be interpreted as one-based.
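The effect of one_based can be summarized by a one-line conversion. The helper below is a hypothetical sketch of that arithmetic, not MELD's implementation.

```python
def to_zero_based(index: int, one_based: bool = False) -> int:
    """Convert a user-supplied index to a zero-based one (toy helper)."""
    lowest = 1 if one_based else 0
    if index < lowest:
        raise ValueError(f"index {index} is below the first valid value {lowest}")
    # One-based input is shifted down by one; zero-based input is unchanged.
    return index - 1 if one_based else index


# Residue 1 in one-based numbering is residue 0 in zero-based numbering.
same_residue = to_zero_based(1, one_based=True) == to_zero_based(0)
```

Because the flag applies to both resid and chainid, mixing conventions within a single call is not possible, which removes one common source of off-by-one errors.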
Using relative indexing
By default, resid refers to the absolute residue index, which starts from zero (one for one-based indexing) and does not consider which chain the residue resides in. The ordering of residues corresponds to the order in which sub-systems were added when the system was built.
If chainid is set, then resid refers to the relative index of a residue within the corresponding chain. So, resid=0, chainid=0 would refer to the first residue in the first chain (assuming zero-based indexing).
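The relationship between relative and absolute residue indices can be sketched as an offset calculation. The function absolute_resid and the chain lengths are hypothetical, and the sketch assumes that the residues of each chain are contiguous and that chains are laid out in chainid order.

```python
from itertools import accumulate
from typing import Optional, Sequence


def absolute_resid(chain_lengths: Sequence[int], resid: int,
                   chainid: Optional[int] = None) -> int:
    """Map a zero-based resid to a zero-based absolute index (toy model)."""
    if chainid is None:
        # Without a chainid, resid is already absolute.
        return resid
    if not 0 <= resid < chain_lengths[chainid]:
        raise IndexError(f"resid {resid} is outside chain {chainid}")
    # offsets[i] is the absolute index of the first residue in chain i.
    offsets = [0, *accumulate(chain_lengths)]
    return offsets[chainid] + resid


# Two chains with 3 and 5 residues (illustrative values).
chain_lengths = [3, 5]
first_of_second_chain = absolute_resid(chain_lengths, 0, chainid=1)
print(first_of_second_chain)
```

Here the first residue of the second chain is the same residue as absolute index 3.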
Ordering of chainids
Chains are indexed sequentially starting from zero (one for one-based indexing). The order of chains is partially determined by the order that sub-systems are added in.
When a sub-system is created from a sequence, it corresponds to exactly one chain. When a sub-system is created from a pdb file, it will have the same number of chains as the pdb file has unique chain identifiers. The ordering of the chains is alphabetical with a blank chain identifier coming first, followed by “A”, etc.
To be more concrete, consider the following example:
A sub-system is added from sequence
A second sub-system is added from a pdb file
The pdb file contains two chain identifiers, “A” and “B”.
In this case, the chainid would be defined as follows:
0: the chain added by sequence
1: chain “A” from the pdb file
2: chain “B” from the pdb file
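The numbering in this example can be reproduced with a toy model of the ordering rule. The function order_chains and the tuple layout are hypothetical; the sketch only assumes what is described above, namely that sub-systems keep their insertion order and that chain identifiers within a sub-system are sorted with a blank identifier first.

```python
from typing import List, Sequence, Tuple


def order_chains(
    subsystems: Sequence[Tuple[str, Sequence[str]]]
) -> List[Tuple[str, str]]:
    """Return (sub-system, chain identifier) pairs in chainid order (toy model)."""
    ordered = []
    for name, chain_labels in subsystems:
        # Plain string sorting places a blank identifier before "A".
        for label in sorted(set(chain_labels)):
            ordered.append((name, label))
    return ordered


subsystems = [
    ("from sequence", [""]),      # a sequence always yields a single chain
    ("from pdb", ["B", "A"]),     # order of appearance in the file is irrelevant
]

# Zero-based chainid mapped to the chain it refers to.
chain_table = dict(enumerate(order_chains(subsystems)))
for chainid, (source, label) in chain_table.items():
    print(chainid, source, repr(label))
```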
In some cases, MELD will add additional residues that were not present in either the sequence or pdb file. Examples include extra residues added to encode RDC alignment tensors, which are added by the RdcAlignmentPatcher, and solvent and ions that are added when explicit solvent calculations are specified. These additional residues are considered to be in an additional chain that is added in the final position.