Annotating PTMs with Functional Information#
To facilitate downstream functional analysis, PTM-POSE provides functions that create annotation files in the .gmt file format (similar to those used by PTMsigDB) and append annotation information to PTMs impacted by splice events. With this information, you can identify individual PTMs of interest and assess global enrichment of particular functions, such as protein interactions or enzyme regulation (see the analysis gallery for examples of the analyses that can be done with PTM-POSE).
Annotating PTMs with pre-installed annotation files#
For easy analysis, PTM-POSE comes pre-installed with annotation files from various databases. Should you use data from these sources, please make sure to abide by the terms of use of the respective databases and cite them appropriately. The following databases are included in PTM-POSE:
Database | Annotation types | Version/Date of Download
---|---|---
| | v2.0.0
PTMInt | |
iKiP (Downloaded via PTMsigDB) | |
Appending these annotations directly to the PTM dataframes generated by the project module is not necessary for the analysis modules described in the analysis gallery, but you may still wish to do so. To append this information to either the spliced_ptms or altered_flanks dataframe, use the annotate.append_from_gmt() function, indicating the desired database and annotation type. For example, to append PhosphoSitePlus kinase substrate information to the spliced_ptms dataframe, you would use the following code:
from ptm_pose import annotate
spliced_ptms = annotate.append_from_gmt(spliced_ptms, database = 'PhosphoSitePlus', annot_type = 'Enzyme')
This will add a new column to the spliced_ptms dataframe called ‘PhosphoSitePlus:Enzyme’. Any rows where the PTM was associated with an annotation will contain a semicolon-separated list of the annotations, and all other rows will contain ‘NaN’.
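Because the new column stores annotations as a semicolon-separated string, it often helps to split it into one row per annotation before counting or filtering. The following is a minimal sketch using pandas only; the dataframe is a toy stand-in for an annotated spliced_ptms dataframe, and the "PTM" column and the kinase names are hypothetical placeholders rather than real PTM-POSE output.

```python
import pandas as pd

# Toy stand-in for an annotated spliced_ptms dataframe (hypothetical values).
spliced_ptms = pd.DataFrame({
    "PTM": ["ptm_1", "ptm_2", "ptm_3"],
    "PhosphoSitePlus:Enzyme": ["KinaseA;KinaseB", None, "KinaseA"],
})

col = "PhosphoSitePlus:Enzyme"

# Keep only the PTMs that carry at least one annotation.
annotated = spliced_ptms.dropna(subset=[col])

# Split the semicolon-separated string, then give each annotation its own row.
long_form = annotated.assign(**{col: annotated[col].str.split(";")}).explode(col)

# Count how many PTMs are linked to each annotation.
counts = long_form[col].value_counts()
print(counts)
```

The long-form table can then be grouped or filtered like any other dataframe, for example to list every PTM associated with a single enzyme.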
To check which annotations are available, use the annotate.get_available_gmt_annotations(format = 'dataframe') function.
Updating annotation files#
We aim to include a comprehensive set of annotations and to update them periodically, but you may wish to use the most up-to-date version of a given database. For databases already included in PTM-POSE, you can use the associated function from the annotate module, making sure to provide the required files and to set overwrite to True.
For example, to create a new gmt file for PhosphoSitePlus kinase substrates, you would download the kinase_substrate file from PhosphoSitePlus and use the following code:
from ptm_pose import annotate
annotate.construct_PhosphoSitePlus_gmt_files(kinase_substrate_file = "/path/to/file/Kinase_Substrate_Dataset.gz", overwrite = True)
This will generate a new gmt file in the PTM-POSE resource file directory that can be used with the annotate.append_from_gmt() function to update your PTM annotations with the latest data.
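If you want to inspect a gmt file yourself, the general GMT layout is one set per line: a set name, a description field, and then the set members, all separated by tabs. The sketch below parses that layout with the standard library only. The read_gmt helper and the example content are hypothetical, and the PTM labels shown are illustrative rather than the exact encoding PTM-POSE writes.

```python
import io

def read_gmt(handle):
    """Parse GMT-formatted text into {set_name: [members]}, skipping the description field."""
    sets = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank lines and sets without any members
        name, _description, *members = fields
        sets[name] = [m for m in members if m]
    return sets

# Hypothetical file content: the PTM labels are placeholders for illustration.
example = io.StringIO(
    "KinaseA\tna\tP00001_S10\tP00002_T25\n"
    "KinaseB\tna\tP00001_S10\n"
)
gmt_sets = read_gmt(example)
print(gmt_sets)
```

To read a file on disk instead, pass an open file handle in place of the io.StringIO object.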
Creating Custom Annotation files#
In some cases, you may want to use PTM-POSE analysis functions with database information that is not yet provided. For this, use the annotate.construct_custom_gmt_file() function to create a custom annotation file. This function requires a dataframe with one column holding the annotation data (annot_col) and three columns identifying each PTM: the UniProt ID (acc_col), the residue (residue_col), and the position in the protein (position_col). For best results, the UniProt ID should include an isoform label only if it refers to an alternative isoform. If you have this information, you can use the following code to create a custom annotation file:
from ptm_pose import annotate
# annotation_df is your own dataframe containing the annotation and PTM information
# indicate which columns hold the data
annot_col = 'Annotation'
acc_col = 'UniProtKB Accession'
residue_col = 'Residue'
position_col = 'Position'
# indicate how multiple annotations within a row are separated; if each row contains only one annotation, set to None
annotation_separator = None
annotate.construct_custom_gmt_file(annotation_df, annot_col = annot_col, acc_col = acc_col, residue_col = residue_col, position_col = position_col, annotation_separator = annotation_separator)
By default, this will save the gmt file in the PTM-POSE resource directory. If you would like to save it elsewhere, you can specify the odir parameter.
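Before building a custom annotation file, it can be worth checking that the input dataframe has the expected columns and that accessions follow the isoform recommendation above. The following is a minimal sketch using pandas only, with made-up accessions and annotations. It assumes that canonical isoforms are labeled with a "-1" suffix, which is a common convention but may not hold for every protein, so adjust the rule to your data.

```python
import pandas as pd

# Toy annotation table with hypothetical accessions and annotations.
annotation_df = pd.DataFrame({
    "UniProtKB Accession": ["P00001-1", "P00002", "P00003-2"],
    "Residue": ["S", "T", "Y"],
    "Position": [10, 25, 100],
    "Annotation": ["Pathway A", "Pathway B", "Pathway A"],
})

# Confirm that every column named in the construct_custom_gmt_file() call is present.
required = ["Annotation", "UniProtKB Accession", "Residue", "Position"]
missing = [c for c in required if c not in annotation_df.columns]
if missing:
    raise ValueError(f"annotation_df is missing columns: {missing}")

# Assumption: "-1" marks the canonical isoform, so drop that suffix
# and keep labels such as "-2" that mark alternative isoforms.
annotation_df["UniProtKB Accession"] = annotation_df["UniProtKB Accession"].str.replace(
    r"-1$", "", regex=True
)
print(annotation_df)
```

The cleaned annotation_df can then be passed to annotate.construct_custom_gmt_file() as described above.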
Future Directions#
If you have any suggestions or requests for additional annotation databases, please let us know! We are always looking to expand the functionality of PTM-POSE to better serve the community.