PDBModel¶
-
class biskit.PDBModel(source=None, pdbCode=None, noxyz=0, skipRes=None, headPatterns=[])
Bases: object
Store and manipulate coordinates and atom information stemming from a PDB file. Coordinates are stored in the numpy array ‘xyz’; the additional atom information from the PDB (name, residue_name, and many more) is stored efficiently in a PDBProfiles instance ‘atoms’, which can also be used to associate arbitrary other data with the atoms. Moreover, a similar collection ‘residues’ can hold data associated with residues (but is initially empty). A normal dictionary ‘info’ accepts any information about the whole model.
For detailed documentation, see http://biskit.pasteur.fr/doc/handling_structures/PDBModel
- @todo:
- outsource validSource into PDBParserFactory
- prevent repeated loading of test PDB for each test
Methods Overview
- __init__ – Examples: see the __init__ entry under Method & Attribute Details.
- addChainFromSegid – Takes the last letter of the segment ID and adds it as chain ID.
- addChainId – Assign consecutive chain identifiers A - Z to all atoms.
- argsort – Prepare sorting atoms within residues according to comparison function.
- atom2chainIndices – Convert atom indices to chain indices.
- atom2chainMask – Mask (set to 0) chains for which all atoms are masked (0) in atomMask.
- atom2resIndices – Get list of indices of residues for which any atom is in indices.
- atom2resMask – Mask (set 0) residues for which all atoms are masked (0) in atomMask.
- atom2resProfile – Get a residue profile where each residue has the value that its first atom has in the atom profile.
- atomNames – Return a list of atom names from start to stop RESIDUE index.
- atomRange – >>> m.atomRange() == range( m.lenAtoms() )
- atomkey – Create a string key encoding the atom content of this model independent of the order in which atoms appear within residues.
- biomodel – Return the ‘biologically relevant assembly’ of this model according to the information in the PDB’s BIOMT record (captured in info[‘BIOMT’]).
- center – Geometric center of model.
- centerOfMass – Center of mass of PDBModel.
- centered – Get model with centered coordinates.
- chain2atomIndices – Convert chain indices into atom indices.
- chain2atomMask – Convert chain mask to atom mask.
- chainBreaks – Identify discontinuities in the molecule’s backbone.
- chainEndIndex – Get the position of each chain’s last atom.
- chainIndex – Get indices of first atom of each chain.
- chainMap – Get chain index of each atom.
- clone – Clone PDBModel.
- compareAtoms – Get list of atom indices for this and reference model that converts both into 2 models with identical residue and atom content.
- compareChains – Get list of corresponding chain indices for this and reference model.
- compress – Compress PDBModel using mask.
- concat – Concatenate atoms, coordinates and profiles.
- disconnect – Disconnect this model from its source (if any).
- equals – Compares the residue and atom sequence in the given range.
- extendIndex – Translate a list of positions that is defined, e.g., on residues (/chains) to a list of atom positions AND also return the starting position of each residue (/chain) in the new sub-list of atoms.
- extendMask – Translate a mask that is defined, e.g., on residues (/chains) to a mask that is defined on atoms.
- filter – Extract atoms that match a combination of key=values.
- filterIndex – Get atom positions that match a combination of key=values.
- fit – Least-square fit this model onto refModel.
- getAtoms – Get atom CrossViews that can be used like dictionaries.
- getPdbCode – Return pdb code of model.
- getXyz – Get coordinates, fetch from source PDB or pickled PDBModel, if necessary.
- index2map – Create a map of len_i length, giving the residue (/chain) number of each atom, from list of residue (/chain) starting positions.
- indices – Get atom indices conforming condition.
- indicesFrom – Get atom indices conforming condition applied to an atom profile.
- keep – Replace atoms, coordinates, profiles of this(!) model with sub-set.
- lenAtoms – Number of atoms in model.
- lenBiounits – Number of biological assemblies defined in PDB BIOMT record, if any.
- lenChains – Number of chains in model.
- lenResidues – Number of residues in model.
- magicFit – Superimpose this model onto a ref.
- map2index – Identify the starting positions of each residue (/chain) from a map giving the residue (/chain) number of each atom.
- mask – Get atom mask.
- maskBB – Short cut for mask of all backbone atoms.
- maskCA – Short cut for mask of all CA atoms.
- maskCB – Short cut for mask of all CB and CA of GLY.
- maskDNA – Short cut for mask of all atoms in DNA (based on residue name).
- maskF – Create list with result of atomFunction( atom ) for each atom.
- maskFrom – Create an atom mask from the values of a specific profile.
- maskH – Short cut for mask of hydrogens.
- maskH2O – Short cut for mask of all atoms in residues named TIP3, HOH and WAT.
- maskHeavy – Short cut for mask of all heavy atoms.
- maskHetatm – Short cut for mask of all HETATM.
- maskNA – Short cut for mask of all atoms in DNA or RNA (based on residue name).
- maskProtein – Short cut for mask containing all atoms of amino acids.
- maskRNA – Short cut for mask of all atoms in RNA (based on residue name).
- maskSolvent – Short cut for mask of all atoms in residues named TIP3, HOH, WAT, Na+, Cl-, CA, ZN.
- mass – Molecular weight of PDBModel.
- masses – Collect the molecular weight of all atoms in PDBModel.
- mergeChains – Merge two adjacent chains.
- mergeResidues – Merge two adjacent residues.
- plot – Get a quick & dirty overview over the content of a PDBModel.
- profile – Use: profile( name, updateMissing=0) -> atom or residue profile
- profile2atomMask – Same as profile2mask, but converts residue mask to atom mask.
- profile2mask – cutoff_min (float): low value cutoff (all values >= cutoff_min); cutoff_max (float): high value cutoff (all values < cutoff_max).
- profile2resList – Group the profile values of each residue’s atoms into a separate list.
- profileChangedFromDisc – Check if profile has changed compared to source.
- profileInfo – Use: see the profileInfo entry under Method & Attribute Details.
- remove – Convenience access to the 3 different remove methods.
- removeProfile – Remove residue or atom profile(s).
- removeRes – Remove all atoms with a certain residue name.
- renameAmberRes – Rename special residue names from Amber back into standard names (e.g. CYX -> CYS).
- renumberResidues – Make all residue numbers consecutive and remove any insertion code letters.
- report – Print (or return) a brief description of this model.
- reportAtoms – i ([ int ]): optional list of atom positions to report (default: all); returns (str) a formatted string with atom and residue names similar to PDB.
- res2atomIndices – Convert residue indices to atom indices.
- res2atomMask – Convert residue mask to atom mask.
- res2atomProfile – Get an atom profile where each atom has the value its residue has in the residue profile.
- resEndIndex – Get the position of each residue’s last atom.
- resIndex – Get the position of each residue’s first atom.
- resList – Return list of lists of atom pseudo dictionaries per residue, which allows to iterate over residues and atoms of residues.
- resMap – Get list to map from any atom to a continuous residue numbering (starting with 0).
- resMapOriginal – Generate list to map from any atom to its ORIGINAL(!) PDB residue number.
- resModels – Creates one new PDBModel for each residue in the parent PDBModel.
- residusMaximus – Take list of value per atom, return list where all atoms of any residue are set to the highest value of any atom in that residue.
- rms – Rmsd between two PDBModels.
- saveAs – Pickle this PDBModel to a file, set the ‘source’ field to this file name and mark atoms, xyz, and profiles as unchanged.
- sequence – Amino acid sequence in one letter code.
- setPdbCode – Set model pdb code.
- setSource – source: LocalPath OR PDBModel OR str
- setXyz – Replace coordinates.
- slim – Remove xyz array and profiles if they haven’t been changed and could hence be loaded from the source file (only if there is a source file…).
- sort – Apply a given sort list to the atoms of this model.
- sourceFile – Name of pickled source or PDB file.
- structureFit – Structure-align this model onto a reference model using the external TM-Align program (which needs to be installed).
- take – Extract a PDBModel with a subset of atoms.
- takeChains – Get copy of this model with only the given chains.
- takeResidues – Copy the given residues into a new model.
- transform – Transform coordinates of PDBModel.
- transformation – Get the transformation matrix which least-square fits this model onto the other model.
- unequalAtoms – Identify atoms that are not matching between two models.
- unsort – Undo a previous sorting on the model itself (no copy).
- update – Read coordinates, atoms, fileName, etc.
- validSource – Check for a valid source on disk.
- version
- writePdb – Save model as PDB file.
- xplor2amber – Rename atoms so that tleap from Amber can read the PDB.
- xyzChangedFromDisc – Tell whether xyz can currently be reconstructed from a source on disc.
- xyzIsChanged – Tell if xyz or atoms have been changed compared to source file or source object (which can be still in memory).
Attributes Overview
- PDB_KEYS – keys of all atom profiles that are read directly from the PDB file
PDBModel Method & Attribute Details
-
PDB_KEYS= ['name', 'residue_number', 'insertion_code', 'alternate', 'name_original', 'chain_id', 'occupancy', 'element', 'segment_id', 'charge', 'residue_name', 'after_ter', 'serial_number', 'type', 'temperature_factor']¶ keys of all atom profiles that are read directly from the PDB file
-
__init__(source=None, pdbCode=None, noxyz=0, skipRes=None, headPatterns=[])
Examples:
- PDBModel() creates an empty Model to which coordinates (field xyz) and PDB records (atom profiles) have still to be added.
- PDBModel( file_name ) creates a complete model with coordinates and PDB records from file_name (pdb, pdb.gz, or pickled PDBModel).
- PDBModel( PDBModel ) creates a copy of the given model.
- PDBModel( PDBModel, noxyz=1 ) creates a copy without coordinates.
Parameters: - source (str or PDBModel) – str, file name of pdb/pdb.gz file OR pickled PDBModel OR PDBModel, template structure to copy atoms/xyz field from
- pdbCode (str or None) – PDB code, is extracted from file name otherwise
- noxyz (0||1) – 0 (default) || 1, create without coordinates
- headPatterns ([(str, str)]) – [(putIntoKey, regex)] extract given REMARK values
Raises: PDBError – if file exists but can’t be read
-
residues= None¶ save atom-/residue-based values
-
xyzChanged= None¶ monitor changes of coordinates
-
initVersion= None¶ version as of creation of this object
-
info= None
to collect further information
-
report(prnt=True, plot=False, clipseq=60)[source]¶ Print (or return) a brief description of this model.
Parameters: - prnt (bool) – directly print report to STDOUT (default True)
- plot (bool) – show simple 2-D line plot using gnuplot [False]
- clipseq (int) – clip chain sequences at this number of letters [60]
Returns: if prnt==True: None, else: formatted description of this model
Return type: None or str
-
plot(hetatm=False)
Get a quick & dirty overview over the content of a PDBModel. plot simply creates a 2-D plot of all x-coordinates versus all y-coordinates, colored by chain. This is obviously not publication-quality ;-). Use the Biskit.Pymoler class for real visualization.
Parameters: hetatm (bool) – include hetero & solvent atoms (default False)
-
update(skipRes=None, updateMissing=0, force=0, headPatterns=[])[source]¶ Read coordinates, atoms, fileName, etc. from PDB or pickled PDBModel - but only if they are currently empty. The atomsChanged and xyzChanged flags are not changed.
Parameters: - skipRes (list of str) – names of residues to skip if updating from PDB
- updateMissing (0|1) – 0(default): update only existing profiles
- force (0|1) – ignore invalid source (0) or report error (1)
- headPatterns ([(str, str)]) – [(putIntoKey, regex)] extract given REMARKS
Raises: PDBError – if file can’t be unpickled or read:
-
setXyz(xyz)[source]¶ Replace coordinates.
Parameters: xyz (array) – Numpy array ( 3 x N_atoms ) of float Returns: array( 3 x N_atoms ) or None, old coordinates Return type: array
-
getXyz(mask=None)[source]¶ Get coordinates, fetch from source PDB or pickled PDBModel, if necessary.
Parameters: mask (list of int OR array of 1||0) – atom mask Returns: xyz-coordinates, array( 3 x N_atoms, Float32 ) Return type: array
-
getAtoms(mask=None)[source]¶ Get atom CrossViews that can be used like dictionaries. Note that the direct manipulation of individual profiles is more efficient than the manipulation of CrossViews (on profiles)!
Parameters: mask (list of int OR array of 1||0) – atom mask Returns: list of CrossView dictionaries Return type: [ ProfileCollection.CrossView]
-
profile(name, default=None, update=True, updateMissing=False)
Use:
profile( name, updateMissing=0) -> atom or residue profile
Parameters: - name (str) – name to access profile
- default (any) – default result if no profile is found; if None, try to update from source and raise error [None]
- update (bool) – update from source before returning empty profile [True]
- updateMissing – update from source before reporting missing profile [False]
Raises: ProfileError – if neither atom- nor rProfiles contains |name|
-
profileInfo(name, updateMissing=0)
Use:
profileInfo( name ) -> dict with infos about profile
Parameters: - name (str) – name to access profile
- updateMissing (0|1) – update from source before reporting missing profile
Guaranteed infos are:
- ‘version’ (str)
- ‘comment’ (str)
- ‘changed’ (1||0)
Raises: ProfileError – if neither atom- nor rProfiles contains |name|
-
removeProfile(*names)
Remove residue or atom profile(s).
Use:
removeProfile( str_name [,name2, name3] ) -> 1|0
Parameters: names (str OR list of str) – name or list of residue or atom profiles
Returns: 1 if at least 1 profile has been deleted, 0 if none has been found
Return type: int
-
xyzIsChanged()[source]¶ Tell if xyz or atoms have been changed compared to source file or source object (which can be still in memory).
Returns: xyz field has been changed with respect to source Return type: (1||0, 1||0)
-
xyzChangedFromDisc()
Tell whether xyz can currently be reconstructed from a source on disc. Same as xyzIsChanged() unless source is another not yet saved PDBModel instance that made changes relative to its own source.
Returns: xyz has been changed Return type: bool
-
profileChangedFromDisc(pname)[source]¶ Check if profile has changed compared to source.
Returns: 1, if profile |pname| can currently not be reconstructed from a source on disc. Return type: int Raises: ProfileError – if there is no atom or res profile with pname
-
slim()
Remove xyz array and profiles if they haven’t been changed and could hence be loaded from the source file (only if there is a source file…). Automatically called before pickling. Currently also called by deepcopy via __getstate__.
-
validSource()[source]¶ Check for a valid source on disk.
Returns: str or PDBModel, None if this model has no valid source Return type: str or PDBModel or None
-
sourceFile()[source]¶ Name of pickled source or PDB file. If this model has another PDBModel as source, the request is passed on to this one.
Returns: file name of pickled source or PDB file Return type: str Raises: PDBError – if there is no valid source
-
disconnect()[source]¶ Disconnect this model from its source (if any).
Note
If this model has an (in-memory) PDBModel instance as source, the entries of ‘atoms’ could still reference the same dictionaries.
-
sequence(mask=None, xtable={'ca': '+', 'cl-': '-', 'hoh': '~', 'na+': '+', 'nap': 'X', 'ndp': 'X', 'tip3': '~', 'wat': '~'})[source]¶ Amino acid sequence in one letter code.
Parameters: - mask (list or array) – atom mask, to apply before (default None)
- xtable (dict) – dict {str:str}, additional residue:single_letter mapping for non-standard residues (default molUtils.xxDic) [currently not used]
Returns: 1-letter-code AA sequence (based on first atom of each res).
Return type: str
-
xplor2amber(aatm=True, parm10=False)[source]¶ Rename atoms so that tleap from Amber can read the PDB. If HIS residues contain atoms named HE2 or/and HD2, the residue name is changed to HIE or HID or HIP, respectively. Disulfide bonds are not yet identified - CYS -> CYX renaming must be done manually (see AmberParmBuilder for an example). Internally amber uses H atom names ala HD21 while (old) standard pdb files use 1HD2. By default, ambpdb produces ‘standard’ pdb atom names but it can output the less ambiguous amber names with switch -aatm.
Parameters: - change (1|0) – change this model’s atoms directly (default:1)
- aatm (1|0) – use, for example, HG23 instead of 3HG2 (default:1)
- parm10 (1|0) – adapt nucleic acid atom names to 2010 Amber forcefield
Returns: [ {..} ], list of atom dictionaries
Return type: list of atom dictionaries
-
renameAmberRes()
Rename special residue names from Amber back into standard names (e.g. CYX -> CYS).
-
writePdb(fname, ter=1, amber=0, original=0, left=0, wrap=0, headlines=None, taillines=None)[source]¶ Save model as PDB file.
Parameters: - fname (str) – name of new file
- ter (int) –
Option of how to treat the terminal record:
- 0 - don’t write any TER statements
- 1 - restore original TER statements (doesn’t work if the preceding atom has been deleted) [default]
- 2 - put TER between all detected chains
- 3 - as 2 but also detect and split discontinuous chains
- amber (1||0) – amber formatted atom names (implies ter=3, left=1, wrap=0) (default 0)
- original (1||0) – revert atom names to the ones parsed in from PDB (default 0)
- left (1||0) – left-align atom names (as in amber pdbs)(default 0)
- wrap (1||0) – write e.g. ‘NH12’ as ‘2NH1’ (default 0)
- headlines (list of tuples) – [( str, dict or str)], list of record / data tuples:: e.g. [ (‘SEQRES’, ‘ 1 A 22 ALA GLY ALA’), ]
- taillines (list of tuples) – same as headlines but appended at the end of file
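The wrap option moves a trailing digit to the front of the atom name (‘NH12’ becomes ‘2NH1’). The snippet below is only a sketch of that renaming rule: `wrap_atom_name` is a hypothetical helper, not part of Biskit, and it assumes the rule applies only to four-character names that end in a digit.

```python
def wrap_atom_name(name):
    """Move a trailing digit to the front of a 4-character atom name.

    Assumption (not taken from Biskit): shorter names and names that do
    not end in a digit are returned unchanged.
    """
    if len(name) == 4 and name[-1].isdigit():
        return name[-1] + name[:-1]
    return name


wrapped = [wrap_atom_name(n) for n in ["NH12", "HD21", "CA"]]
```

With these inputs, only the two four-character names are rewritten.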
-
saveAs(path)[source]¶ Pickle this PDBModel to a file, set the ‘source’ field to this file name and mark atoms, xyz, and profiles as unchanged. Normal pickling of the object will only dump those data that can not be reconstructed from the source of this model (if any). saveAs creates a ‘new source’ without further dependencies.
Parameters: path (str OR LocalPath instance) – target file name
-
maskF(atomFunction, numpy=1)
Create list with result of atomFunction( atom ) for each atom. (Depending on the return value of atomFunction, the result is not necessarily a mask of 0 and 1. Creating masks should be just the most common usage).
Note:
This method is slow compared to maskFrom because the dictionaries that are given to the atomFunction have to be created from aProfiles on the fly. If performance matters, better combine the result from several maskFrom calls, e.g. instead of:
r = m.maskF( lambda a: a['name']=='CA' and a['residue_name']=='ALA' )
use:
r = m.maskFrom( 'name', 'CA' ) * m.maskFrom('residue_name', 'ALA')
Parameters: - atomFunction (1||0) – function( dict_from_aProfiles.toDict() ), true || false (Condition)
- numpy (int) – 1(default)||0, convert result to Numpy array of int
Returns: Numpy array( [0,1,1,0,0,0,1,0,..], Int) or list
Return type: array or list
-
maskFrom(key, cond)[source]¶ Create an atom mask from the values of a specific profile. Example, the following three statements are equivalent:
>>> mask = m.maskFrom( 'name', 'CA' )
>>> mask = m.maskFrom( 'name', lambda a: a == 'CA' )
>>> mask = N0.array( [ a == 'CA' for a in m.atoms['name'] ] )
However, the same can be also achieved with standard numpy operators:
>>> mask = numpy.array(m.atoms['name']) == 'CA'
Parameters: - key (str) – the name of the profile to use
- cond (function OR any OR [ any ]) – either a function accepting a single value or a value or an iterable of values (to allow several alternatives)
Returns: array or list of indices where condition is met
Return type: list or array of int
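The three kinds of cond described above (a function, a single value, or an iterable of alternative values) can be illustrated without Biskit. This is a plain-Python sketch of the described behaviour, not the library's implementation; `mask_from` and the `names` list are made up for the example.

```python
def mask_from(values, cond):
    """Return a 0/1 list marking the values that satisfy cond.

    cond may be a function of one value, a list/tuple/set of accepted
    values, or a single value to compare against.
    """
    if callable(cond):
        return [int(bool(cond(v))) for v in values]
    if isinstance(cond, (list, tuple, set)):
        return [int(v in cond) for v in values]
    return [int(v == cond) for v in values]


names = ["N", "CA", "C", "O", "CB"]  # stand-in for an atom name profile

by_value = mask_from(names, "CA")
by_function = mask_from(names, lambda n: n == "CA")
by_alternatives = mask_from(names, ["N", "CA", "C"])
```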
-
maskCA(force=0)[source]¶ Short cut for mask of all CA atoms.
Parameters: force (0||1) – force calculation even if cached mask is available Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskBB(force=0, solvent=0)[source]¶ Short cut for mask of all backbone atoms. Supports standard protein and DNA atom names. Any residues classified as solvent (water, ions) are filtered out.
Parameters: - force (0||1) – force calculation even if cached mask is available
- solvent (1||0) – include solvent residues (default: false)
Returns: array( 1 x N_atoms ) of 0||1
Return type: array
-
maskHeavy(force=0)[source]¶ Short cut for mask of all heavy atoms. (‘element’ <> H)
Parameters: force (0||1) – force calculation even if cached mask is available Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskH()[source]¶ Short cut for mask of hydrogens. (‘element’ == H)
Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskCB()
Short cut for mask of all CB and CA of GLY.
Returns: mask of all CB plus CA of GLY Return type: array
-
maskH2O()[source]¶ Short cut for mask of all atoms in residues named TIP3, HOH and WAT
Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskSolvent()[source]¶ Short cut for mask of all atoms in residues named TIP3, HOH, WAT, Na+, Cl-, CA, ZN
Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskHetatm()[source]¶ Short cut for mask of all HETATM
Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskProtein(standard=0)[source]¶ Short cut for mask containing all atoms of amino acids.
Parameters: standard (0|1) – only standard residue names (not CYX, NME,..) (default 0) Returns: array( 1 x N_atoms ) of 0||1, mask of all protein atoms (based on residue name) Return type: array
-
maskDNA()[source]¶ Short cut for mask of all atoms in DNA (based on residue name).
Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskRNA()[source]¶ Short cut for mask of all atoms in RNA (based on residue name).
Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
maskNA()[source]¶ Short cut for mask of all atoms in DNA or RNA (based on residue name).
Returns: array( 1 x N_atoms ) of 0||1 Return type: array
-
indicesFrom(key, cond)[source]¶ Get atom indices conforming condition applied to an atom profile. Corresponds to:
>>> numpy.nonzero( m.maskFrom( key, cond) )
Parameters: - key (str) – the name of the profile to use
- cond (function OR any OR [any]) – either a function accepting a single value or a value or an iterable of values
Returns: array of indices where condition is met
Return type: array of int
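The relation between a mask and the corresponding indices can be sketched in plain Python. `indices_from_mask` is a hypothetical helper used only for illustration; it mirrors what a nonzero lookup does on a one-dimensional mask.

```python
def indices_from_mask(mask):
    """Return the positions at which a 0/1 (or boolean) mask is set."""
    return [i for i, flag in enumerate(mask) if flag]


mask = [0, 1, 0, 0, 1]
positions = indices_from_mask(mask)
```

Note that numpy.nonzero itself returns a tuple with one index array per dimension, so for a 1-D mask the positions are in its first element.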
-
indices(what)
Get atom indices conforming condition. This is a convenience method to ‘normalize’ different kinds of selections (masks, atom names, indices, functions) to indices as they are e.g. required by PDBModel.take.
Parameters: what (function OR list of str or int OR int) – Selection:
- function applied to each atom entry, e.g. lambda a: a['residue_name']=='GLY'
- list of str, allowed atom names
- list of int, allowed atom indices OR mask with only 1 and 0
- int, single allowed atom index
Returns: N_atoms x 1 (0||1 )
Return type: Numeric array
Raises: PDBError – if what is neither of above
-
mask(what)
Get atom mask. This is a convenience method to ‘normalize’ different kinds of selections (masks, atom names, indices, functions) to a mask as it is e.g. required by PDBModel.compress.
Parameters: what (function OR list of str or int OR int) – Selection:
- function applied to each atom entry, e.g. lambda a: a['residue_name']=='GLY'
- list of str, allowed atom names
- list of int, allowed atom indices OR mask with only 1 and 0
- int, single allowed atom index
Returns: N_atoms x 1 (0||1 )
Return type: Numeric array
Raises: PDBError – if what is neither of above
-
index2map(index, len_i)
Create a map of len_i length, giving the residue (/chain) number of each atom, from list of residue (/chain) starting positions.
Parameters: - index ([ int ] or array of int) – list of starting positions, e.g. [0, 3, 8]
- len_i (int) – length of target map, e.g. 10
Returns: list mapping atom positions to residue(/chain) number, e.g. [0,0,0, 1,1,1,1,1, 2,2] from above example
Return type: array of int (and of len_i length)
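The example values given above can be reproduced with a few lines of plain Python. This is a sketch of the described mapping and not Biskit's implementation (which presumably works on numpy arrays).

```python
def index2map(index, len_i):
    """Expand residue (or chain) start positions into a per-atom map."""
    bounds = list(index) + [len_i]  # append the end of the last residue
    result = []
    for number, (start, stop) in enumerate(zip(bounds[:-1], bounds[1:])):
        result.extend([number] * (stop - start))
    return result


atom_map = index2map([0, 3, 8], 10)
```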
-
map2index(imap)[source]¶ Identify the starting positions of each residue(/chain) from a map giving the residue(/chain) number of each atom.
Parameters: imap ([ int ]) – something like [0,0,0,1,1,1,1,1,2,2,2,…] Returns: list of starting positions, e.g. [0, 3, 8, …] in above ex. Return type: array of int
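The reverse direction can be sketched the same way: a new residue (or chain) starts wherever the map value differs from the previous one. Again, this is an illustrative plain-Python version, not the library code.

```python
def map2index(imap):
    """Return the positions at which a new residue (or chain) number starts."""
    return [i for i, value in enumerate(imap)
            if i == 0 or value != imap[i - 1]]


starts = map2index([0, 0, 0, 1, 1, 1, 1, 1, 2, 2])
```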
-
extendMask(mask, index, len_i)[source]¶ Translate a mask that is defined,e.g., on residues(/chains) to a mask that is defined on atoms.
Parameters: - mask ([ bool ] or array of bool or of 1||0) – mask marking positions in the list of residues or chains
- index ([ int ] or array of int) – starting positions of all residues or chains
- len_i (int) – length of target mask
Returns: mask that blows up the residue / chain mask to an atom mask Return type: array of bool
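A small sketch of this expansion, assuming the same index convention as in index2map (start position of each residue plus the total atom count). `extend_mask` is a stand-in written for this example only.

```python
def extend_mask(mask, index, len_i):
    """Repeat each residue (or chain) flag once per atom of that residue."""
    bounds = list(index) + [len_i]
    atom_mask = []
    for flag, start, stop in zip(mask, bounds[:-1], bounds[1:]):
        atom_mask.extend([bool(flag)] * (stop - start))
    return atom_mask


# residues 0 and 2 selected; residues start at atoms 0, 3 and 8 of 10 atoms
atom_mask = extend_mask([1, 0, 1], [0, 3, 8], 10)
```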
-
extendIndex(i, index, len_i)[source]¶ Translate a list of positions that is defined, e.g., on residues (/chains) to a list of atom positions AND also return the starting position of each residue (/chain) in the new sub-list of atoms.
Parameters: - i ([ int ] or array of int) – positions in higher level list of residues or chains
- index ([ int ] or array of int) – atomic starting positions of all residues or chains
- len_i (int) – length of atom index (total number of atoms)
Returns: (ri, rindex) - atom positions & new index Return type: array of int, array of int
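The two return values can be illustrated as follows. This is a plain-Python sketch under the assumption that positions are taken in the order given; `extend_index` is not the Biskit method itself.

```python
def extend_index(i, index, len_i):
    """Translate residue positions into atom positions plus a new index.

    Returns (atom_positions, new_index), where new_index holds the start
    of each selected residue within atom_positions.
    """
    bounds = list(index) + [len_i]
    atom_positions, new_index = [], []
    for pos in i:
        new_index.append(len(atom_positions))
        atom_positions.extend(range(bounds[pos], bounds[pos + 1]))
    return atom_positions, new_index


# pick residue 2, then residue 0, from residues starting at atoms 0, 3, 8
ri, rindex = extend_index([2, 0], [0, 3, 8], 10)
```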
-
atom2resMask(atomMask)[source]¶ Mask (set 0) residues for which all atoms are masked (0) in atomMask.
Parameters: atomMask (list/array of int) – list/array of int, 1 x N_atoms Returns: 1 x N_residues (0||1 ) Return type: array of int
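Put differently, a residue is kept (1) as soon as any of its atoms is set in the atom mask. A minimal sketch, assuming the residue start positions are available as a list (here passed in explicitly as `res_index`, which the real method takes from the model):

```python
def atom2res_mask(atom_mask, res_index):
    """Mark each residue that has at least one atom set in atom_mask."""
    bounds = list(res_index) + [len(atom_mask)]
    return [int(any(atom_mask[start:stop]))
            for start, stop in zip(bounds[:-1], bounds[1:])]


atom_mask = [0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
res_mask = atom2res_mask(atom_mask, [0, 3, 8])
```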
-
atom2resIndices(indices)[source]¶ Get list of indices of residues for which any atom is in indices.
Note: in the current implementation, the resulting residues are returned in their old order, regardless of the order of input positions.
Parameters: indices (list of int) – list of atom indices Returns: indices of residues Return type: list of int
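The ordering behaviour described in the note can be shown with a per-atom residue map. The sketch below is illustrative only; `atom2res_indices` and `res_map` are stand-ins, and the residue map would normally come from the model.

```python
def atom2res_indices(indices, res_map):
    """Return each residue touched by indices once, in ascending order."""
    return sorted({res_map[i] for i in indices})


res_map = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2]  # residue number of each atom
residues = atom2res_indices([9, 0, 1], res_map)
```

Although atom 9 (residue 2) is listed first, the result starts with residue 0.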
-
res2atomMask(resMask)[source]¶ Convert residue mask to atom mask.
Parameters: resMask (list/array of int) – list/array of int, 1 x N_residues Returns: 1 x N_atoms Return type: array of int
-
res2atomIndices(indices)[source]¶ Convert residue indices to atom indices.
Parameters: indices (list/array of int) – list/array of residue indices Returns: array of atom positions Return type: array of int
-
atom2chainIndices(indices, breaks=0)[source]¶ Convert atom indices to chain indices. Each chain is only returned once.
Parameters: - indices (list of int) – list of atom indices
- breaks (0||1) – look for chain breaks in backbone coordinates (def. 0)
Returns: chains for which any atom is in indices
Return type: list of int
-
atom2chainMask(atomMask, breaks=0)[source]¶ Mask (set to 0) chains for which all atoms are masked (0) in atomMask. Put another way: Mark all chains that contain any atom that is marked ‘1’ in atomMask.
Parameters: atomMask (list/array of int) – list/array of int, 1 x N_atoms
Returns: 1 x N_chains (0||1 )
Return type: array of int
-
chain2atomMask(chainMask, breaks=0)[source]¶ Convert chain mask to atom mask.
Parameters: - chainMask (list/array of int) – list/array of int, 1 x N_chains
- breaks (0||1) – look for chain breaks in backbone coordinates (def. 0)
Returns: 1 x N_atoms
Return type: array of int
-
chain2atomIndices(indices, breaks=0)[source]¶ Convert chain indices into atom indices.
Parameters: indices (list/array of int) – list/array of chain indices Returns: array of atom positions, new chain index Return type: array of int
-
res2atomProfile(p)[source]¶ Get an atom profile where each atom has the value its residue has in the residue profile.
Parameters: p (str) – name of existing residue profile OR … [ any ], list of lenResidues() length Returns: [ any ] OR array, atom profile Return type: list or array
-
atom2resProfile(p, f=None)
Get a residue profile where each residue has the value that its first atom has in the atom profile.
Parameters: - p – name of existing atom profile OR … [ any ], list of lenAtoms() length
- f (func) – function to calculate single residue from many atom values f( [atom_value1, atom_value2,…] ) -> res_value (default None, simply take value of first atom in each res.)
Returns: [ any ] OR array, residue profile
Return type: list or array
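The effect of the optional function f can be sketched on plain lists. This is not the Biskit implementation; the residue start positions are passed in explicitly here, and the values are made up.

```python
def atom2res_profile(values, res_index, f=None):
    """Reduce per-atom values to one value per residue.

    Without f, the value of each residue's first atom is used; otherwise
    f receives the list of atom values of one residue.
    """
    bounds = list(res_index) + [len(values)]
    groups = [values[start:stop]
              for start, stop in zip(bounds[:-1], bounds[1:])]
    if f is None:
        return [group[0] for group in groups]
    return [f(group) for group in groups]


values = [1.0, 2.0, 3.0, 4.0, 0.5]  # five atoms in two residues
first_atom = atom2res_profile(values, [0, 3])
highest = atom2res_profile(values, [0, 3], f=max)
```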
-
profile2mask(str_profname[, cutoff_min, cutoff_max=None])[source]¶ Parameters: - cutoff_min (float) – low value cutoff (all values >= cutoff_min)
- cutoff_max (float) – high value cutoff (all values < cutoff_max)
Returns: mask len( profile(profName) ) x 1||0
Return type: array
Raises: ProfileError – if no profile is found with name profName
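The two cutoffs describe a half-open interval: values equal to cutoff_min are included, values equal to cutoff_max are not. A sketch on a plain list; treating a cutoff of None as "no bound" is an assumption of this example.

```python
def profile2mask(values, cutoff_min=None, cutoff_max=None):
    """Return 1 for each value with cutoff_min <= value < cutoff_max."""
    mask = []
    for value in values:
        above = cutoff_min is None or value >= cutoff_min
        below = cutoff_max is None or value < cutoff_max
        mask.append(int(above and below))
    return mask


in_range = profile2mask([0.0, 0.5, 1.0, 2.0], cutoff_min=0.5, cutoff_max=2.0)
```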
-
profile2atomMask(str_profname[, cutoff_min, cutoff_max=None])
Same as profile2mask, but converts residue mask to atom mask.
Parameters: - cutoff_min (float) – low value cutoff
- cutoff_max (float) – high value cutoff
Returns: mask N_atoms x 1|0
Return type: array
Raises: ProfileError – if no profile is found with name profName
-
profile2resList(p)
Group the profile values of each residue’s atoms into a separate list.
Parameters: p – name of existing atom profile OR … [ any ], list of lenAtoms() length
Returns: a list (one entry per residue) of lists (one entry per atom of that residue)
Return type: [ [ any ] ]
-
mergeChains(c1, id='', segid='', rmOxt=True, renumberAtoms=False, renumberResidues=True)[source]¶ Merge two adjacent chains. This merely removes all internal markers for a chain boundary. Atom content or coordinates are not modified.
PDBModel tracks chain boundaries in an internal _chainIndex. However, there are cases when this chainIndex needs to be re-built and new chain boundaries are then inferred from jumps in chain or segment labelling or residue numbering. mergeChains automatically re-assigns PDB chain and segment IDs as well as residue numbering to prepare for this situation.
Parameters: - c1 (int) – first of the two chains to be merged
- id (str) – chain ID of the new chain (default: ID of first chain)
- segid (str) – new chain’s segid (default: SEGID of first chain)
- renumberAtoms – rewrite PDB serial numbering of the adjacent chain to be consecutive to the last atom of the first chain (default: False)
- renumberResidues (bool) – shift PDB residue numbering so that the first residue of the adjacent chain follows the previous residue. Other than for atom numbering, later jumps in residue numbering are preserved. (default: True)
-
mergeResidues(r1, name='', residue_number=None, chain_id='', segment_id='', renumberAtoms=False)
Merge two adjacent residues. Duplicate atoms are labelled with alternate codes ‘A’ (first occurrence) to ‘B’ or later.
Parameters: - r1 (int) – first of the two residues to be merged
- name (str) – name of the new residue (default: name of first residue)
-
concat(*models, **kw)[source]¶ Concatenate atoms, coordinates and profiles. source and fileName are lost, so are profiles that are not available in all models. model0.concat( model1 [, model2, ..]) -> single PDBModel.
Parameters: - models (one or more PDBModel instances) – models to concatenate
- newRes (bool) – treat beginning of second model as new residue (True)
- newChain (bool) – treat beginning of second model as new chain (True)
Note: info records of given models are lost.
-
take(i, rindex=None, cindex=None, *initArgs, **initKw)[source]¶ Extract a PDBModel with a subset of atoms:
take( atomIndices ) -> PDBModelAll other PDBModel methods that extract portions of the model (e.g. compress, takeChains, takeResidues, keep, clone, remove) are ultimately using
take()at their core.Note: take employs fast numpy vector mapping methods to re-calculate the residue and chain index of the result model. The methods generally work but there is one scenario were this mechanism can fail: If take is used to create repetitions of residues or chains directly next to each other, these residues or chains can get accidentally merged. For this reason, calling methods can optionally pre-calculate and provide a correct version of the new residue or chain index (which will then be used as is).
Parameters: - i (list/array of int) – atomIndices, positions to take in the order to take
- rindex (array of int) – optional residue index for result model after extraction
- cindex (array of int) – optional chain index for result model after extraction
- initArgs – any number of additional arguments for constructor of result model
- initKw – any additional keyword arguments for constructor of result model
Returns: new PDBModel or sub-class
Return type: PDBModel
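The merging scenario described in the note can be illustrated with a minimal pure-Python sketch. It assumes that a residue index is re-derived by looking for changes between neighbouring atoms; `residue_starts` is a hypothetical helper for illustration, not the routine biskit uses.

```python
def residue_starts(residue_ids):
    """Position of the first atom of each run of identical residue ids."""
    starts = []
    previous = object()  # sentinel that never equals a real id
    for pos, rid in enumerate(residue_ids):
        if rid != previous:
            starts.append(pos)
            previous = rid
    return starts

# Hypothetical model: residue 0 has 2 atoms, residue 1 has 3 atoms.
res_of_atom = [0, 0, 1, 1, 1]

# Take residue 1 twice, directly next to each other.
picked = [2, 3, 4, 2, 3, 4]
taken = [res_of_atom[i] for i in picked]

# Re-deriving the index from changes alone sees a single residue ...
naive_index = residue_starts(taken)
# ... whereas the caller knows that a second copy starts at position 3.
explicit_index = [0, 3]
```

In this sketch `naive_index` contains only one starting position, which is why a pre-computed index such as `explicit_index` would have to be passed in (the role of rindex and cindex).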
-
keep(i)[source]¶ Replace atoms, coordinates and profiles of this(!) model with a sub-set (in-place version of numpy.take() ).
Parameters: i (list or array of int) – atom positions to be kept
-
clone()[source]¶ Clone PDBModel.
Returns: PDBModel / subclass, copy of this model, see comments to numpy.take() Return type: PDBModel
-
compress(mask, *initArgs, **initKw)[source]¶ Compress PDBModel using mask.
compress( mask ) -> PDBModel
Parameters: mask (array) – array( 1 x N_atoms of 1 or 0 ):
- 1 .. keep this atom
Returns: compressed PDBModel using mask Return type: PDBModel
-
remove(what)[source]¶ Convenience access to the 3 different remove methods. The mask used to remove atoms is returned. This mask can be used to apply the same change to another array of same dimension as the old(!) xyz and atoms.
Parameters: what (list of int or int) – Description of what to remove:
- function( atom_dict ) -> 1 || 0 (1..remove) OR
- list of int [4, 5, 6, 200, 201..], indices of atoms to remove
- list of int [11111100001101011100..N_atoms], mask (1..remove)
- int, remove atom with this index
Returns: array(1 x N_atoms_old) of 0||1, mask used to compress the atoms and xyz arrays. Return type: array Raises: PDBError – if what is none of the above
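The following sketch shows why a returned mask is useful: the same mask can be re-applied to any other per-atom list of the old length. The helpers `removal_mask` and `apply_removal` are hypothetical, use plain Python lists instead of numpy arrays, and only cover the index and single-int cases.

```python
def removal_mask(n_atoms, what):
    """Build a 0/1 mask of length n_atoms; 1 marks an atom to remove."""
    if isinstance(what, int):
        what = [what]
    mask = [0] * n_atoms
    for i in what:
        mask[i] = 1
    return mask

def apply_removal(values, mask):
    """Drop every value whose mask entry is 1."""
    return [v for v, m in zip(values, mask) if not m]

names = ['N', 'CA', 'C', 'O', 'CB']
charges = [-0.4, 0.1, 0.6, -0.5, 0.0]   # made-up per-atom data

mask = removal_mask(len(names), [3, 4])
kept_names = apply_removal(names, mask)
kept_charges = apply_removal(charges, mask)  # same change, other array
```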
-
takeResidues(i)[source]¶ Copy the given residues into a new model.
Parameters: i ([ int ]) – residue indices Returns: PDBModel with given residues in given order Return type: PDBModel
-
takeChains(chains, breaks=0, force=0)[source]¶ Get copy of this model with only the given chains.
Note, there is one very special scenario where chain boundaries can get lost: If breaks=1 (chain positions are based on normal chain boundaries as well as structure-based chain break detection) AND one or more chains are extracted several times next to each other, for example chains=[0, 1, 1, 2], then the repeated chain will be merged. So in the given example, the new model would have chainLength()==3. This case is tested for and a PDBIndexError is raised. Override with force=1 and proceed at your own risk. Which, in this case, simply means you should re-calculate the chain index after takeChains(). Example:
>>> repeat = model.takeChains( [0,0,0], breaks=1, force=1 )
>>> repeat.chainIndex( force=1, cache=1 )
This works because the new model will have back-jumps in residue numbering.
Parameters: - chains (list of int) – list of chains, e.g. [0,2] for first and third
- breaks (0|1) – split chains at chain breaks (default 0)
- maxDist (float) – (if breaks=1) chain break threshold in Angstrom
- force (bool) – override check for chain repeats (only for breaks==1)
Returns: PDBModel consisting of the given chains in the given order
Return type: PDBModel
-
addChainFromSegid(verbose=1)[source]¶ Takes the last letter of the segment ID and adds it as chain ID.
-
addChainId(first_id=None, keep_old=0, breaks=0)[source]¶ Assign consecutive chain identifiers A - Z to all atoms.
Parameters: - first_id (str) – str (A - Z), first letter instead of ‘A’
- keep_old (1|0) – don’t override existing chain IDs (default 0)
- breaks (1|0) – consider chain break as start of new chain (default 0)
-
renumberResidues(mask=None, start=1, addChainId=1)[source]¶ Make all residue numbers consecutive and remove any insertion code letters. Note that a backward jump in residue numbering (among other things) is interpreted as end of chain by chainMap() and chainIndex() when a PDB file is loaded.
Parameters: - mask (list of int) – [ 0||1 x N_atoms ] atom mask to apply BEFORE
- start (int) – starting number (default 1)
- addChainId (1|0) – add chain IDs if they are missing
-
atomRange()[source]¶ >>> m.atomRange() == range( m.lenAtoms() )
Returns: integer range for length of this model Return type: [ int ]
-
lenResidues()[source]¶ Number of residues in model.
Returns: total number of residues Return type: int
-
lenChains(breaks=0, maxDist=None, singleRes=0, solvent=0)[source]¶ Number of chains in model.
Parameters: - breaks (0||1) – detect chain breaks from backbone atom distances (def 0)
- maxDist (float) – maximal distance between consecutive residues [ None ] .. defaults to twice the average distance
- singleRes (1||0) – allow chains consisting of single residues (def 0)
- solvent (1||0) – also check solvent residues for “chain breaks” (def 0)
Returns: total number of chains
Return type: int
-
resList(mask=None)[source]¶ Return list of lists of atom pseudo dictionaries per residue, which allows iterating over residues and over the atoms of each residue.
Parameters: mask – [ 0||1 x N_atoms ] atom mask to apply BEFORE Returns: a list (one per residue) of lists (one per atom) of dictionaries [ [ CrossView{'name':'N', 'residue_name':'LEU', ..}, CrossView{'name':'CA', 'residue_name':'LEU', ..} ], [ CrossView{'name':'CA', 'residue_name':'GLY', ..}, .. ] ]
Return type: [ [ biskit.ProfileCollection.CrossView] ]
-
resModels(i=None)[source]¶ Creates one new PDBModel for each residue in the parent PDBModel.
Parameters: i ([ int ] or array( int )) – range of residue positions (default: all residues) Returns: list of PDBModels, one for each residue Return type: [ PDBModel]
-
resMapOriginal(mask=None)[source]¶ Generate list to map from any atom to its ORIGINAL(!) PDB residue number.
Parameters: mask (list of int (1||0)) – [00111101011100111…] consider atom: yes or no len(mask) == N_atoms Returns: list all [000111111333344444..] with residue number for each atom Return type: list of int
-
resIndex(mask=None, force=0, cache=1)[source]¶ Get the position of each residue’s first atom.
Parameters: - force (1||0) – re-calculate even if cached result is available (def 0)
- cache (1||0) – cache the result if new (def 1)
- mask (list of int (1||0)) – atom mask to apply before (i.e. result indices refer to compressed model)
Returns: index of the first atom of each residue
Return type: list of int
-
resMap(force=0, cache=1)[source]¶ Get list to map from any atom to a continuous residue numbering (starting with 0). A new residue is assumed to start whenever the ‘residue_number’ or the ‘residue_name’ record changes between 2 atoms.
See resList() for an example of how to use the residue map.
Parameters: - force (0||1) – recalculate map even if cached one is available (def 0)
- cache (0||1) – cache new map (def 1)
Returns: array [00011111122223333..], residue index for each atom
Return type: list of int
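The rule stated above (a new residue starts whenever ‘residue_number’ or ‘residue_name’ changes between two atoms) can be sketched in plain Python. `residue_map` and `first_atoms` are hypothetical names; this is an illustration of the rule, not biskit’s vectorized implementation.

```python
def residue_map(numbers, names):
    """Continuous residue index (starting at 0) for each atom."""
    res_map = []
    current = -1
    previous = None
    for key in zip(numbers, names):
        if key != previous:       # number or name changed -> new residue
            current += 1
            previous = key
        res_map.append(current)
    return res_map

def first_atoms(res_map):
    """Position of the first atom of each residue (cf. resIndex)."""
    return [i for i, r in enumerate(res_map)
            if i == 0 or r != res_map[i - 1]]

numbers = [10, 10, 10, 11, 11, 12, 12]
names = ['LEU', 'LEU', 'LEU', 'GLY', 'GLY', 'ALA', 'ALA']

rmap = residue_map(numbers, names)
rindex = first_atoms(rmap)
```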
-
resEndIndex()[source]¶ Get the position of each residue’s last atom.
Returns: index of the last atom of each residue Return type: list of int
-
chainIndex(breaks=0, maxDist=None, force=0, cache=0, singleRes=0, solvent=0)[source]¶ Get indices of first atom of each chain.
Parameters: - breaks (1||0) – split chains at chain breaks (def 0)
- maxDist (float) – (if breaks=1) chain break threshold in Angstrom
- force (1||0) – re-analyze residue numbering, chain and segids to find chain boundaries, use with care! (def 0)
- cache (1||0) – cache new index even if it was derived from non-default parameters (def 0) Note: a simple m.chainIndex() will always cache
- singleRes (1||0) – allow chains consisting of single residues (def 0) Otherwise group consecutive residues with identical name into one chain.
- solvent (1||0) – also check solvent residues for “chain breaks” (default: false)
Returns: array (1 x N_chains) of int
Return type: list of int
-
chainEndIndex(breaks=0, solvent=0)[source]¶ Get the position of each chain’s last atom.
Returns: index of the last atom of each chain Return type: list of int
-
chainMap(breaks=0, maxDist=None)[source]¶ Get chain index of each atom. A new chain is started between 2 atoms if the chain_id or segment_id changes, the residue numbering jumps back or a TER record was found.
Parameters: - breaks (1||0) – split chains at chain breaks (def 0)
- maxDist (float) – (if breaks=1) chain break threshold in Angstrom
Returns: array 1 x N_atoms of int, e.g. [000000011111111111122222…]
Return type: list of int
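A minimal sketch of the boundary rules listed above, assuming per-atom lists as input. `chain_map` and the `after_ter` flag list are hypothetical stand-ins; structure-based break detection (breaks=1) is not covered.

```python
def chain_map(chain_ids, seg_ids, res_numbers, after_ter=None):
    """Chain index per atom from changes between neighbouring atoms."""
    n = len(chain_ids)
    after_ter = after_ter or [0] * n   # 1 if a TER record precedes the atom
    result = []
    chain = 0
    for i in range(n):
        if i > 0 and (chain_ids[i] != chain_ids[i - 1]
                      or seg_ids[i] != seg_ids[i - 1]
                      or res_numbers[i] < res_numbers[i - 1]  # back-jump
                      or after_ter[i]):
            chain += 1
        result.append(chain)
    return result

chain_ids = ['A', 'A', 'A', 'B', 'B', 'B', 'B']
seg_ids = [''] * 7
res_numbers = [1, 1, 2, 1, 2, 1, 1]

cmap = chain_map(chain_ids, seg_ids, res_numbers)
```

In this made-up example the third chain is found only through the back-jump in residue numbering, since chain ID and segid stay the same.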
-
chainBreaks(breaks_only=1, maxDist=None, force=0, solvent=0, z=6.0)[source]¶ Identify discontinuities in the molecule’s backbone. By default, breaks are identified from the distribution of distances between the last backbone atom of a residue and the first backbone atom of the next residue. The median distance and standard deviation are determined iteratively, and outliers (i.e. breaks) are identified as any pairs of residues with a distance that is more than z standard deviations (default 6) above the median. This heuristic can be overridden by specifying a hard distance cutoff (maxDist).
Parameters: - breaks_only (1|0) – don’t report ends of regular chains (def 1)
- maxDist (float) – maximal distance between consecutive residues [ None ] .. defaults to median + z * standard deviation
- z (float) – z-score for outlier distances between residues (def 6.)
- solvent (1||0) – also check selected solvent residues (buggy!) (def 0)
- force (1||0) – force re-calculation, do not use cached positions (def 0)
Returns: atom indices of last atom before a probable chain break Return type: list of int
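One possible reading of the iterative median / standard deviation heuristic is sketched below. It is an assumption about what "determined iteratively" means: outliers are dropped and the statistics recomputed until nothing changes. `break_positions` and `max_iter` are hypothetical, and the input is a plain list of residue-to-residue distances rather than a model.

```python
import statistics

def break_positions(distances, z=6.0, max_dist=None, max_iter=20):
    """Positions whose distance exceeds median + z * stdev (or max_dist)."""
    if not distances:
        return []
    if max_dist is None:
        kept = list(distances)
        for _ in range(max_iter):
            cutoff = statistics.median(kept) + z * statistics.pstdev(kept)
            filtered = [d for d in kept if d <= cutoff]
            if len(filtered) == len(kept):
                break             # no new outliers, statistics are stable
            kept = filtered       # drop outliers and re-estimate
        max_dist = cutoff
    return [i for i, d in enumerate(distances) if d > max_dist]

# 60 made-up peptide-bond-like distances with one large gap at position 30
distances = [1.32, 1.34] * 30
distances.insert(30, 9.0)

found = break_positions(distances)                 # statistical cutoff
found_hard = break_positions(distances, max_dist=5.0)  # hard cutoff
```

Note that a single outlier among n values can never exceed roughly sqrt(n) standard deviations, so with z=6 the statistical cutoff needs a few dozen distances before it can flag anything; a hard cutoff does not have that limitation.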
-
removeRes(what)[source]¶ Remove all atoms with a certain residue name.
Parameters: what (str OR [ str ] OR int OR [ int ]) – indices or name(s) of residue to be removed
-
rms(other, mask=None, mask_fit=None, fit=1, n_it=1)[source]¶ Rmsd between two PDBModels.
Parameters: - other (PDBModel) – other model to compare this one with
- mask (list of int) – atom mask for rmsd calculation
- mask_fit (list of int) – atom mask for superposition (default: same as mask)
- fit (1||0) – superimpose first (default 1)
- n_it (int) – number of fit iterations: 1 - classic single fit (default), 0 - until convergence, kicking out outliers on the way
Returns: rms in Angstrom
Return type: float
-
transformation(refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')[source]¶ Get the transformation matrix which least-square fits this model onto the other model.
Parameters: - refModel (PDBModel) – reference PDBModel
- mask (list of int) – atom mask for superposition
- n_it (int) – number of fit iterations: 1 - classic single fit (default), 0 - until convergence
- z (float) – number of standard deviations for outlier definition (default 2)
- eps_rmsd (float) – tolerance in rmsd (default 0.5)
- eps_stdv (float) – tolerance in standard deviations (default 0.05)
- profname (str) – name of new atom profile getting outlier flag
Returns: array(3 x 3), array(3 x 1) - rotation and translation matrices
Return type: array, array
-
transform(*rt)[source]¶ Transform coordinates of PDBModel.
Parameters: rt (array OR array, array) – rotational and translation array: array( 4 x 4 ) OR array(3 x 3), array(3 x 1) Returns: PDBModel with transformed coordinates Return type: PDBModel
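The two accepted argument forms can be related with a short pure-Python sketch. It assumes the common homogeneous layout for the 4 x 4 case (rotation in the upper-left 3 x 3 block, translation in the last column); that layout and the helper names are assumptions, not taken from the biskit source.

```python
def split_4x4(matrix):
    """Split a 4 x 4 homogeneous matrix into (rotation, translation)."""
    rotation = [row[:3] for row in matrix[:3]]
    translation = [row[3] for row in matrix[:3]]
    return rotation, translation

def transform(xyz, rotation, translation):
    """Apply x' = R x + t to every coordinate triple."""
    out = []
    for x, y, z in xyz:
        out.append(tuple(r[0] * x + r[1] * y + r[2] * z + t
                         for r, t in zip(rotation, translation)))
    return out

# 90 degree rotation about the z axis, then a shift of 1 along x
rt = [[0, -1, 0, 1],
      [1,  0, 0, 0],
      [0,  0, 1, 0],
      [0,  0, 0, 1]]

rotation, translation = split_4x4(rt)
moved = transform([(1, 0, 0), (0, 1, 0)], rotation, translation)
```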
-
fit(refModel, mask=None, n_it=1, z=2, eps_rmsd=0.5, eps_stdv=0.05, profname='rms_outlier')[source]¶ Least-square fit this model onto refModel.
Parameters: - refModel (PDBModel) – reference PDBModel
- mask (list of int (1||0)) – atom mask for superposition
- n_it (int) – number of fit iterations: 1 - classic single fit (default), 0 - until convergence
- z (float) – number of standard deviations for outlier definition (default 2)
- eps_rmsd (float) – tolerance in rmsd (default 0.5)
- eps_stdv (float) – tolerance in standard deviations (default 0.05)
- profname (str) – name of new atom profile containing outlier flag
Returns: PDBModel with transformed coordinates
Return type: PDBModel
-
magicFit(refModel, mask=None)[source]¶ Superimpose this model onto a ref. model with similar atom content. magicFit( refModel [, mask ] ) -> PDBModel (or subclass )
Parameters: - refModel (PDBModel) – reference PDBModel
- mask (list of int (1||0)) – atom mask to use for the fit
Returns: fitted PDBModel or sub-class
Return type: PDBModel
-
structureFit(refModel, mask=None)[source]¶ Structure-align this model onto a reference model using the external TM-Align program (which needs to be installed).
structureFit( refModel [, mask] ) -> PDBModel (or subclass)
The result model has additional TM-Align statistics in its info record:
r = m.structureFit( ref )
r.info[‘tm_score’] -> TM-Align score
The other keys are: ‘tm_rmsd’, ‘tm_len’, ‘tm_id’.
See also biskit.TMAlign
Parameters: - refModel (PDBModel) – reference PDBModel
- mask (list of int (1||0)) – atom mask to use for the fit
Returns: fitted PDBModel or sub-class
Return type: PDBModel
-
centered(mask=None)[source]¶ Get model with centered coordinates.
Parameters: mask (list of int (1||0)) – atom mask applied before calculating the center Returns: model with centered coordinates Return type: PDBModel
-
center(mask=None)[source]¶ Geometric center of model.
Parameters: mask (list of int (1||0)) – atom mask applied before calculating the center Returns: xyz coordinates of center Return type: (float, float, float)
-
centerOfMass()[source]¶ Center of mass of PDBModel.
Returns: array(Float32) Return type: (float, float, float)
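The difference between the geometric center and the center of mass can be seen in a small sketch. The functions below are hypothetical stand-ins working on plain lists; the masses are made-up values in units of 1/12 of the C12 mass.

```python
def geometric_center(xyz):
    """Unweighted mean of all coordinates."""
    n = len(xyz)
    return tuple(sum(p[k] for p in xyz) / n for k in range(3))

def center_of_mass(xyz, masses):
    """Mass-weighted mean of all coordinates."""
    total = sum(masses)
    return tuple(sum(m * p[k] for p, m in zip(xyz, masses)) / total
                 for k in range(3))

xyz = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]
masses = [12.0, 4.0]

gc = geometric_center(xyz)
com = center_of_mass(xyz, masses)  # pulled towards the heavier atom
```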
-
masses()[source]¶ Collect the molecular weight of all atoms in PDBModel.
Returns: 1-D array with mass of every atom in 1/12 of C12 mass. Return type: array of floats Raises: PDBError – if the model contains elements of unknown mass
-
mass()[source]¶ Molecular weight of PDBModel.
Returns: total mass in 1/12 of C12 mass Return type: float Raises: PDBError – if the model contains elements of unknown mass
-
residusMaximus(atomValues, mask=None)[source]¶ Take list of value per atom, return list where all atoms of any residue are set to the highest value of any atom in that residue. (after applying mask)
Parameters: - atomValues (list) – values per atom
- mask (list of int (1||0)) – atom mask
Returns: array with values set to the maximal intra-residue value
Return type: array of float
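A minimal sketch of the per-residue maximum, assuming a residue map (residue index per atom, as returned by resMap) is already available. `residue_maximum` is a hypothetical helper and ignores the optional mask.

```python
def residue_maximum(atom_values, res_map):
    """Give every atom the highest value found within its residue."""
    best = {}
    for value, res in zip(atom_values, res_map):
        if res not in best or value > best[res]:
            best[res] = value
    return [best[res] for res in res_map]

values = [0.1, 0.7, 0.3, 0.2, 0.9]
res_map = [0, 0, 0, 1, 1]

maxed = residue_maximum(values, res_map)
```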
-
argsort(cmpfunc=None)[source]¶ Prepare sorting atoms within residues according to comparison function.
Parameters: - cmpfunc (function) – old style function(m.atoms[i], m.atoms[j]) -> -1, 0, +1
- key (function) – new style sort key function(m.atoms[i]) -> sortable
Returns: suggested position of each atom in re-sorted model ( e.g. [2,1,4,6,5,0,..] )
Return type: list of int
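The idea of a sort list that only reorders atoms within their residue can be sketched as follows. The helper takes atom names and residue starting positions (cf. resIndex) and uses a key function; it is a hypothetical illustration, not the biskit implementation, and it sorts plain names rather than atom dictionaries.

```python
def argsort_within_residues(names, res_index, key=None):
    """Suggested position of each atom; residues keep their order."""
    key = key or (lambda name: name)          # default: alphabetical
    bounds = list(res_index) + [len(names)]
    order = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        order.extend(sorted(range(start, stop),
                            key=lambda i: key(names[i])))
    return order

names = ['N', 'CA', 'C', 'O', 'CA', 'N']
res_index = [0, 4]                            # two residues

order = argsort_within_residues(names, res_index)
resorted = [names[i] for i in order]
```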
-
sort(sortArg=None)[source]¶ Apply a given sort list to the atoms of this model.
Parameters: sortArg (function) – comparison function Returns: copy of this model with re-sorted atoms (see numpy.take() ) Return type: PDBModel
-
unsort(sortList)[source]¶ Undo a previous sorting on the model itself (no copy).
Parameters: sortList (list of int) – sort list used for previous sorting. Returns: the (back)sort list used ( to undo the undo…) Return type: list of int Raises: PDBError – if sorting changed atom number
-
atomNames(start=None, stop=None)[source]¶ Return a list of atom names from start to stop RESIDUE index
Parameters: - start (int) – index of first residue
- stop (int) – index of last residue
Returns: [‘C’,’CA’,’CB’ …. ]
Return type: list of str
-
filterIndex(mode=0, **kw)[source]¶ Get atom positions that match a combination of key=values. E.g. filterIndex( chain_id=’A’, name=[‘CA’,’CB’] ) -> index
Parameters: - mode (0||1) – 0 combine with AND (default), 1 combine with OR
- kw (filter options, see example) – combination of atom dictionary keys and values/list of values that will be used to filter
Returns: positions of the matching atoms
Return type: list of int
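How key=value conditions might be combined with AND (mode=0) or OR (mode=1) is sketched below on a plain list of atom dictionaries. `filter_index` is a hypothetical stand-in that only mimics the calling convention described above.

```python
def filter_index(atoms, mode=0, **kw):
    """Positions of atoms matching the key=value(s) conditions."""
    combine = any if mode else all            # 1 -> OR, 0 -> AND
    hits = []
    for pos, atom in enumerate(atoms):
        tests = []
        for key, wanted in kw.items():
            if isinstance(wanted, (list, tuple)):
                tests.append(atom[key] in wanted)   # any listed value
            else:
                tests.append(atom[key] == wanted)
        if combine(tests):
            hits.append(pos)
    return hits

atoms = [{'name': 'N',  'chain_id': 'A'},
         {'name': 'CA', 'chain_id': 'A'},
         {'name': 'CB', 'chain_id': 'A'},
         {'name': 'CA', 'chain_id': 'B'}]

both = filter_index(atoms, chain_id='A', name=['CA', 'CB'])
either = filter_index(atoms, mode=1, chain_id='A', name=['CA', 'CB'])
```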
-
filter(mode=0, **kw)[source]¶ Extract atoms that match a combination of key=values. E.g. filter( chain_id=’A’, name=[‘CA’,’CB’] ) -> PDBModel
Parameters: - mode (0||1) – 0 combine with AND (default), 1 combine with OR
- kw (filter options, see example) – combination of atom dictionary keys and values/list of values that will be used to filter
Returns: filtered PDBModel
Return type: PDBModel
-
equals(ref, start=None, stop=None)[source]¶ Compares the residue and atom sequence in the given range. Coordinates are not checked, other profiles are not checked.
Parameters: - start (int) – index of first residue
- stop (int) – index of last residue
Returns: [ 1||0, 1||0 ], first position sequence identity 0|1, second position atom identity 0|1
Return type: list of int
-
compareAtoms(ref)[source]¶ Get list of atom indices for this and reference model that converts both into 2 models with identical residue and atom content.
E.g.
>>> m2 = m1.sort() ## m2 has now different atom order
>>> i2, i1 = m2.compareAtoms( m1 )
>>> m1 = m1.take( i1 ); m2 = m2.take( i2 )
>>> m1.atomNames() == m2.atomNames() ## m2 has again same atom order
Returns: indices, indices_ref Return type: ([int], [int])
-
unequalAtoms(ref, i=None, iref=None)[source]¶ Identify atoms that are not matching between two models. This method returns somewhat of the opposite of compareAtoms().
Not matching means: (1) residue is missing, (2) missing atom within a residue, (3) atom name is different. Differences in coordinates or other atom profiles are NOT evaluated and will be ignored.
(not speed-optimized)
Parameters: - ref (PDBModel) – reference model to compare to
- i (array( int ) or [ int ]) – pre-computed positions that are equal in this model (first value returned by compareAtoms() )
- iref – pre-computed positions that are equal in ref model (second value returned by compareAtoms() )
Returns: mismatching atoms of self, mismatching atoms of ref
Return type: array(int), array(int)
-
reportAtoms(i=None, n=None)[source]¶ Parameters: i ([ int ]) – optional list of atom positions to report (default: all) Returns: formatted string with atom and residue names similar to PDB Return type: str
-
compareChains(ref, breaks=0, fractLimit=0.2)[source]¶ Get list of corresponding chain indices for this and reference model. Use takeChains() to create two models with identical chain content and order from the result of this function.
Parameters: - ref (PDBModel) – reference PDBModel
- breaks (1||0) – look for chain breaks in backbone coordinates
- fractLimit (float) –
Returns: chainIndices, chainIndices_ref
Return type: ([int], [int])
-
biomodel(assembly=0)[source]¶ Return the ‘biologically relevant assembly’ of this model according to the information in the PDB’s BIOMT record (captured in info[‘BIOMT’]).
This removes redundant chains and performs symmetry operations to complete multimeric structures. Some PDBs define several alternative biological units: usually (0) the author-defined one and (1) software-defined – see lenBiounits.
Note: The BIOMT data are currently not updated during take/compress calls which may change chain indices and content. This method is therefore best run on an original PDB record before any other modifications are performed.
Parameters: assembly (int) – assembly index (default: 0 .. author-determined unit) Returns: PDBModel; biologically relevant assembly
-
lenBiounits()[source]¶ Number of biological assemblies defined in PDB BIOMT record, if any.
Returns: number of alternative biological assemblies defined in PDB header Return type: int
-
atomkey(compress=True)[source]¶ Create a string key encoding the atom content of this model independent of the order in which atoms appear within residues. Atom names are simply sorted alphabetically within residues and then concatenated.
Parameters: compress (bool) – compress key with zlib (default: true) Returns: key formed from sorted atom content of model Return type: str
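The construction described above (sort atom names within each residue, concatenate, optionally compress with zlib) can be sketched with the standard library. `atom_key` is a hypothetical helper; unlike the documented return type (str), this sketch returns bytes when compression is on.

```python
import zlib

def atom_key(names, res_index, compress=True):
    """Order-independent key of the atom content, residue by residue."""
    bounds = list(res_index) + [len(names)]
    parts = []
    for start, stop in zip(bounds[:-1], bounds[1:]):
        parts.append(''.join(sorted(names[start:stop])))
    key = ''.join(parts)
    if compress:
        return zlib.compress(key.encode('ascii'))
    return key

# Same atom content, different order within the single residue
key_a = atom_key(['N', 'CA', 'C'], [0], compress=False)
key_b = atom_key(['C', 'N', 'CA'], [0], compress=False)
packed = atom_key(['N', 'CA', 'C'], [0])
```

Because names are only sorted within residues, two models with the same atoms in different residues would still get different keys in this sketch.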