PTM
- CoDIAC.PTM.write_PTM_features(Interpro_ID, uniprot_ref_file, feature_dir, mapping_file='', n_term_offset=0, c_term_offset=0, gap_threshold=0.7, num_PTM_threshold=5, PHOSPHOSITE_PLUS=False)
Writes all PTM features from ProteomeScout or PhosphoSitePlus that fall on Interpro domains in a UniProt reference file, provided that more than num_PTM_threshold PTMs occur across all domains of that type in the reference. Returns the list of generated files, the ptm_count_dict for reference, and the feature dict used to write the files. Files are named Interpro_ID_<PTM_Type>.feature, and the reference FASTA is also generated so that it is clear which run of the domains the features are attached to.
- Parameters:
- Interpro_ID: string
Interpro ID. For example, in a reference line such as SH3_domain:IPR001452:82:143; SH2:IPR000980:147:246; Prot_kinase_dom:IPR000719:271:524, the Interpro ID for the SH3_domain is IPR001452 and for the SH2 domain is IPR000980.
- uniprot_ref_file: string
File location that contains the reference of interest (such as one produced by Uniprot.makeRefFile)
- feature_dir: string
Directory in which to place the feature files
- mapping_file: string
Optional location of a CSV file that holds a translation of the long header into a shorter header. If this is an empty string, no mapping is attempted.
- n_term_offset: int
Number of amino acids to extend in the N-terminal direction (up to the start of the protein)
- c_term_offset: int
Number of amino acids to extend in the C-terminal direction (up to the end of the protein)
- gap_threshold: float
Fraction of gaps allowed before dispensing with PTM translation from ProteomeScout
- num_PTM_threshold: int
Number of PTMs in all domains of a type required to generate a feature file
- PHOSPHOSITE_PLUS: bool
If True, PTMs are generated from PhosphoSitePlus instead of ProteomeScout. convert_pSiteDataFiles in PhosphoSitePlus_Tools.py can be used to update or create the API-formatted files. These resources are stored in GitHub LFS.
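The domain reference format and the two offset parameters can be illustrated with a minimal sketch. This is not CoDIAC's implementation: `parse_domains` and `extend_domain` are hypothetical helpers, the protein length of 536 is made up, and the sketch assumes domain coordinates are one-based and inclusive, which the text above does not state.

```python
# Hypothetical helpers for illustration only; they are not part of CoDIAC.

def parse_domains(domain_field):
    """Split 'name:InterproID:start:stop; ...' into (name, id, start, stop) tuples."""
    domains = []
    for entry in domain_field.split(";"):
        entry = entry.strip()
        if not entry:
            continue
        name, interpro_id, start, stop = entry.split(":")
        domains.append((name, interpro_id, int(start), int(stop)))
    return domains


def extend_domain(start, stop, protein_length, n_term_offset=0, c_term_offset=0):
    """Extend a domain by the offsets, clamped to the protein boundaries.

    Assumes one-based, inclusive coordinates (an assumption of this sketch).
    """
    new_start = max(1, start - n_term_offset)
    new_stop = min(protein_length, stop + c_term_offset)
    return new_start, new_stop


reference_line = (
    "SH3_domain:IPR001452:82:143; SH2:IPR000980:147:246; "
    "Prot_kinase_dom:IPR000719:271:524"
)

# Select only the domains matching the Interpro ID of interest.
sh2_domains = [d for d in parse_domains(reference_line) if d[1] == "IPR000980"]

# Extend each selected domain by 5 residues on both sides
# (protein length 536 is a made-up value).
extended = [
    extend_domain(start, stop, 536, n_term_offset=5, c_term_offset=5)
    for _, _, start, stop in sh2_domains
]
print(sh2_domains, extended)
```

Clamping with `max` and `min` mirrors the "up to start of protein" and "up to end of protein" wording of the offset parameters.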
- Returns:
- file_list: list
List of files generated as features
- ptm_count_dict: dict
Keys are the modification types and values are the total number encountered
- domain_feature_dict: dict of dicts
Keys are the FASTA headers. Inner dict keys are the modification types, and each value is a list of zero-based positions.
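As a rough sketch of how the returned structures relate, the snippet below builds a small dictionary in the shape described for domain_feature_dict and derives per-type counts, the types that pass the threshold, and the corresponding file names. The headers, modification names, and positions are invented, the helper functions are hypothetical, and the strict "more than" comparison follows the description above rather than the library's source.

```python
# Invented example data in the shape described for domain_feature_dict:
# FASTA header -> modification type -> list of zero-based positions.
domain_feature_dict = {
    "header_A": {"Phosphotyrosine": [3, 17], "N6-acetyllysine": [8]},
    "header_B": {"Phosphotyrosine": [3, 20, 41]},
}


def count_ptms(feature_dict):
    """Total each modification type across all headers (like ptm_count_dict)."""
    counts = {}
    for mods in feature_dict.values():
        for mod_type, positions in mods.items():
            counts[mod_type] = counts.get(mod_type, 0) + len(positions)
    return counts


def types_passing_threshold(counts, num_PTM_threshold=5):
    """Modification types with more than num_PTM_threshold occurrences.

    The strict '>' is an assumption taken from the 'more than' wording.
    """
    return sorted(m for m, n in counts.items() if n > num_PTM_threshold)


def to_one_based(positions):
    """Convert zero-based positions to one-based residue numbering."""
    return [p + 1 for p in positions]


ptm_count_dict = count_ptms(domain_feature_dict)
kept = types_passing_threshold(ptm_count_dict, num_PTM_threshold=4)

# File names follow the Interpro_ID_<PTM_Type>.feature pattern.
interpro_id = "IPR000980"
file_names = [f"{interpro_id}_{mod_type}.feature" for mod_type in kept]
print(ptm_count_dict, kept, file_names)
```

Because the stored positions are zero-based, a conversion such as `to_one_based` is needed before comparing them with conventional residue numbers.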