PTM-POSE Reference

Configuration

ptm_pose.pose_config.download_translator(save=False)

Using the UniProt REST API, download mapping information between UniProt IDs, gene names, and Ensembl Gene IDs. This information is used to convert between different gene identifiers and UniProt IDs.

Parameters:
save: bool, optional

Whether to save the translator file locally. The default is False.

PTM Projection

ptm_pose.project.find_ptms_in_region(ptm_coordinates, chromosome, strand, start, end, gene=None, coordinate_type='hg38')

Given a genomic region in either hg38 or hg19 coordinates (such as the region encoding an exon of interest), identify the PTMs that are mapped to that region and return them as a DataFrame. If none are found, return np.nan.

Parameters:
chromosome: str

chromosome where region is located

strand: int

DNA strand on which the region is found (1 for forward, -1 for reverse)

start: int

start position of region on the chromosome/strand (should always be less than end)

end: int

end position of region on the chromosome/strand (should always be greater than start)

coordinate_type: str

indicates the coordinate system used for the start and end positions. Either hg38 or hg19. Default is ‘hg38’.

Returns:
ptms_in_region: pandas.DataFrame

dataframe containing all PTMs found in the region. If no PTMs are found, returns np.nan.
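The core idea, keeping PTMs whose genomic location falls inside a region on the same chromosome and strand, can be sketched in plain Python. This is a simplified illustration, not the library's implementation: the PTM record, its single-position genomic location, and the ptms_in_region helper are hypothetical, and it returns None where the documented function returns np.nan.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PTM:
    # Hypothetical record; the real ptm_coordinates table is a pandas DataFrame
    uniprot: str
    site: str
    chromosome: str
    strand: int
    position: int  # simplified: one 1-based genomic location per PTM


def ptms_in_region(ptms: List[PTM], chromosome: str, strand: int,
                   start: int, end: int) -> Optional[List[PTM]]:
    """Return PTMs on the same chromosome/strand whose location lies within [start, end]."""
    if start > end:
        raise ValueError("start must not be greater than end")
    hits = [p for p in ptms
            if p.chromosome == chromosome and p.strand == strand
            and start <= p.position <= end]
    # Return a missing value when nothing is found (np.nan in PTM-POSE, None here)
    return hits if hits else None


ptms = [
    PTM("P12345", "S10", "1", 1, 1_000_050),
    PTM("P12345", "T25", "1", 1, 1_000_200),
    PTM("Q99999", "Y5", "1", -1, 1_000_060),  # reverse strand, never matched below
]
print([p.site for p in ptms_in_region(ptms, "1", 1, 1_000_000, 1_000_100)])  # ['S10']
```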

ptm_pose.project.project_ptms_onto_MATS(SE_events=None, A5SS_events=None, A3SS_events=None, RI_events=None, MXE_events=None, coordinate_type='hg38', identify_flanking_sequences=False, dPSI_col='meanDeltaPSI', sig_col='FDR', extra_cols=None, separate_modification_types=False, PROCESSES=1, ptm_coordinates=None, **kwargs)

Given splice event quantification from the MATS algorithm, annotate it with the PTMs found in the differentially included regions.

Parameters:
ptm_coordinates: pandas.DataFrame

dataframe containing PTM information, including chromosome, strand, and genomic location of PTMs

SE_events: pandas.DataFrame

dataframe containing skipped exon event information from MATS

A5SS_events: pandas.DataFrame

dataframe containing 5’ alternative splice site event information from MATS

A3SS_events: pandas.DataFrame

dataframe containing 3’ alternative splice site event information from MATS

RI_events: pandas.DataFrame

dataframe containing retained intron event information from MATS

MXE_events: pandas.DataFrame

dataframe containing mutually exclusive exon event information from MATS

coordinate_type: str

indicates the coordinate system used for the start and end positions. Either hg38 or hg19. Default is ‘hg38’.

dPSI_col: str

Column name indicating delta PSI value. Default is ‘meanDeltaPSI’.

sig_col: str

Column name indicating significance of the event. Default is ‘FDR’.

extra_cols: list

List of column names for additional information to add to the results. Default is None.

separate_modification_types: bool

Indicate whether residues with multiple modifications (e.g., phosphorylation and acetylation) should be treated as separate PTMs and placed in unique rows of the output dataframe. Default is False.

PROCESSES: int

Number of processes to use for multiprocessing. Default is 1.

**kwargs: additional keyword arguments

Additional keyword arguments to pass to the find_ptms_in_many_regions function, which will be fed into the filter_ptms() function from the helper module. These are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.
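The way filtering options travel through **kwargs can be sketched in plain Python. This is not filter_ptms(): project and filter_by_evidence are hypothetical stand-ins, and MS_observations is an assumed field name used only for illustration.

```python
from typing import Dict, List


def filter_by_evidence(ptms: List[Dict], min_MS_observations: int = 0) -> List[Dict]:
    """Keep only PTMs with at least min_MS_observations mass-spec observations.

    'MS_observations' is a hypothetical field name.
    """
    return [p for p in ptms if p.get("MS_observations", 0) >= min_MS_observations]


def project(ptms: List[Dict], **kwargs) -> List[Dict]:
    # Forward any filtering keyword arguments unchanged to the filter step
    return filter_by_evidence(ptms, **kwargs)


ptms = [
    {"site": "S10", "MS_observations": 1},
    {"site": "T25", "MS_observations": 2},
    {"site": "Y5", "MS_observations": 5},
]
kept = project(ptms, min_MS_observations=2)
print([p["site"] for p in kept])  # ['T25', 'Y5']
```

Without any keyword arguments, nothing is filtered, which matches the idea that these options only add stricter evidence requirements.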

ptm_pose.project.project_ptms_onto_SpliceSeq(psi_data, splicegraph, gene_col='symbol', dPSI_col=None, sig_col=None, extra_cols=None, coordinate_type='hg19', separate_modification_types=False, identify_flanking_sequences=False, flank_size=5, ptm_coordinates=None, PROCESSES=1, **kwargs)

Given splice event quantification from SpliceSeq (such as what can be downloaded from TCGASpliceSeq), annotate with PTMs that are found in the differentially included regions.

Parameters:
psi_data: pandas.DataFrame

dataframe containing splice event quantification from SpliceSeq. Must contain the following columns: ‘symbol’, ‘exons’, ‘splice_type’.

splicegraph: pandas.DataFrame

dataframe containing exon information from the splicegraph used during splice event quantification. Must contain the following columns: ‘Symbol’, ‘Exon’, ‘Chr_Start’, ‘Chr_Stop’.

gene_col: str

column name in psi_data that contains the gene name. Default is ‘symbol’.

dPSI_col: str

column name in psi_data that contains the delta PSI value for the splice event. Default is None, which will not include this information in the output.

sig_col: str

column name in psi_data that contains the significance value for the splice event. Default is None, which will not include this information in the output.

extra_cols: list

list of additional columns to include in the output dataframe. Default is None, which will not include any additional columns.

coordinate_type: str

indicates the coordinate system used for the start and end positions. Either hg38 or hg19. Default is ‘hg19’.

separate_modification_types: bool

Indicate whether to store PTM sites with multiple modification types as multiple rows. For example, if a site at K100 was both an acetylation and methylation site, these will be separated into unique rows with the same site number but different modification types. Default is False.

identify_flanking_sequences: bool

Indicate whether to identify and return the flanking sequences for the splice events. Default is False.

flank_size: int

Number of amino acids to include on either side of the PTM in the extracted flanking sequence. Default is 5. Only relevant if identify_flanking_sequences is True.

PROCESSES: int

Number of processes to use for multiprocessing. Default is 1 (single processing).

**kwargs: additional keyword arguments

Additional keyword arguments to pass to the find_ptms_in_many_regions function, which will be fed into the filter_ptms() function from the helper module. These are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.

ptm_pose.project.project_ptms_onto_splice_events(splice_data, annotate_original_df=True, chromosome_col='chr', strand_col='strand', region_start_col='exonStart_0base', region_end_col='exonEnd', dPSI_col=None, sig_col=None, event_id_col=None, gene_col=None, extra_cols=None, separate_modification_types=False, coordinate_type='hg38', start_coordinate_system='1-based', end_coordinate_system='1-based', taskbar_label=None, ptm_coordinates=None, PROCESSES=1, **kwargs)

Given splice event quantification data, project PTMs onto the regions impacted by the splice events. This assumes that the splice event data contains the chromosome, strand, and genomic start/end positions for the regions of interest, and that each row of splice_data corresponds to a unique region.

Important note: PTM-POSE relies on Ensembl-based (1-based) coordinates, so if your coordinates are 0-based, indicate this with the start_coordinate_system and end_coordinate_system parameters. For example, rMATS uses 0-based start coordinates but 1-based end coordinates; in this case, set start_coordinate_system = ‘0-based’ and end_coordinate_system = ‘1-based’.
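A minimal sketch of what the ‘0-based’ setting implies for a start coordinate, assuming a 0-based position simply shifts by one to become 1-based. to_one_based is a hypothetical helper for illustration, not part of PTM-POSE.

```python
def to_one_based(position: int, system: str) -> int:
    """Convert a single coordinate to the 1-based system used by Ensembl.

    Assumes '0-based' numbers the first base of the chromosome as 0,
    so every position shifts up by one.
    """
    if system == "1-based":
        return position
    if system == "0-based":
        return position + 1
    raise ValueError(f"unknown coordinate system: {system!r}")


# rMATS-style exon: 0-based start column, 1-based end column
exon_start_0base, exon_end = 999, 1100
start = to_one_based(exon_start_0base, "0-based")  # 1000
end = to_one_based(exon_end, "1-based")            # 1100
print(start, end, end - start + 1)  # 1000 1100 101
```

Mixing conventions without this adjustment shifts region boundaries by one base, which can place a PTM codon on the wrong side of an exon boundary.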

Parameters:
splice_data: pandas.DataFrame

dataframe containing splice event information, including chromosome, strand, and genomic location of regions of interest

ptm_coordinates: pandas.DataFrame

dataframe containing PTM information, including chromosome, strand, and genomic location of PTMs. If None, it will be loaded from the config file.

chromosome_col: str

column name in splice_data that contains chromosome information. Default is ‘chr’. Expects a string containing only the chromosome name (e.g., ‘1’, ‘2’, ‘Y’).

strand_col: str

column name in splice_data that contains strand information. Default is ‘strand’. Expects it to be a str with ‘+’ or ‘-’, or integers as 1 or -1. Will convert to integers automatically if string format is provided.
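PTM-POSE converts strand strings itself, but the chromosome column must already be in the expected form. If your pipeline reports names such as ‘chr1’, a small preprocessing step like the following can normalize both columns. This is a hypothetical sketch (normalize_chromosome and normalize_strand are not part of PTM-POSE), assuming a ‘chr’ prefix is the only difference from the expected format.

```python
from typing import Union


def normalize_chromosome(value: str) -> str:
    # Strip a leading 'chr' (e.g. 'chr1' -> '1'); leave bare names such as 'Y' untouched
    return value[3:] if value.lower().startswith("chr") else value


def normalize_strand(value: Union[str, int]) -> int:
    # Accept '+'/'-' strings or 1/-1 integers and return an integer
    mapping = {"+": 1, "-": -1, 1: 1, -1: -1}
    try:
        return mapping[value]
    except KeyError:
        raise ValueError(f"unrecognized strand value: {value!r}") from None


rows = [{"chr": "chr1", "strand": "+"}, {"chr": "Y", "strand": -1}]
cleaned = [{"chr": normalize_chromosome(r["chr"]), "strand": normalize_strand(r["strand"])}
           for r in rows]
print(cleaned)  # [{'chr': '1', 'strand': 1}, {'chr': 'Y', 'strand': -1}]
```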

region_start_col: str

column name in splice_data that contains the start position of the region of interest. Default is ‘exonStart_0base’.

region_end_col: str

column name in splice_data that contains the end position of the region of interest. Default is ‘exonEnd’.

event_id_col: str

column name in splice_data that contains the unique identifier for the splice event. If provided, will be used to annotate the ptm information with the specific splice event ID. Default is None.

gene_col: str

column name in splice_data that contains the gene name. If provided, it will be used to make sure the projected PTMs stem from the same gene (in some cases, genomic coordinates overlap between distinct genes). Default is None.

dPSI_col: str

column name in splice_data that contains the delta PSI value for the splice event. Default is None, which will not include this information in the output.

sig_col: str

column name in splice_data that contains the significance value for the splice event. Default is None, which will not include this information in the output.

extra_cols: list

list of additional columns to include in the output dataframe. Default is None, which will not include any additional columns.

coordinate_type: str

indicates the coordinate system used for the start and end positions. Either hg38 or hg19. Default is ‘hg38’.

start_coordinate_system: str

indicates the coordinate system used for the start position. Either ‘0-based’ or ‘1-based’. Default is ‘1-based’.

end_coordinate_system: str

indicates the coordinate system used for the end position. Either ‘0-based’ or ‘1-based’. Default is ‘1-based’.

separate_modification_types: bool

Indicate whether to store PTM sites with multiple modification types as multiple rows. For example, if a site at K100 was both an acetylation and methylation site, these will be separated into unique rows with the same site number but different modification types. Default is False.

taskbar_label: str

Label to display in the tqdm progress bar. Default is None, which will automatically state “Projecting PTMs onto regions using —– coordinates”.

PROCESSES: int

Number of processes to use for multiprocessing. Default is 1 (single processing).

**kwargs: additional keyword arguments

Additional keyword arguments to pass to the find_ptms_in_many_regions function, which will be fed into the filter_ptms() function from the helper module. These are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.

Returns:
spliced_ptm_info: pandas.DataFrame

Contains the PTMs identified across the different splice events

splice_data: pandas.DataFrame

dataframe containing the original splice data with an additional column ‘PTMs’ that contains the PTMs found in the region of interest, in the format of ‘SiteNumber(ModificationType)’. If no PTMs are found, the value will be np.nan.

Flanking Sequences

ptm_pose.flanking_sequences.get_flanking_changes(ptm_coordinates, chromosome, strand, first_flank_region, spliced_region, second_flank_region, gene=None, dPSI=None, sig=None, event_id=None, flank_size=5, coordinate_type='hg38', lowercase_mod=True, order_by='Coordinates')

Given the flanking and spliced regions associated with a splice event, identify PTMs whose flanking sequence may be altered depending on whether the spliced region is included or excluded (i.e., PTMs close to a splice boundary). For these PTMs, extract the flanking sequences associated with the inclusion and exclusion cases and translate them into amino acid sequences. If a PTM is not associated with a codon that encodes the expected amino acid, the PTM is excluded from the results.

Important note: All region coordinates are assumed to be 1-based (consistent with Ensembl), not 0-based. If using a 0-based system, adjust the coordinates accordingly before running this function.

Parameters:
ptm_coordinates: pandas.DataFrame

DataFrame containing PTM coordinate information, used to identify PTMs in the flanking regions

chromosome: str

Chromosome associated with the splice event

strand: int

Strand associated with the splice event (1 for forward, -1 for reverse)

first_flank_region: list

List containing the start and stop locations of the first flanking region (“first” is currently defined by location in the genome, not in the coding sequence)

spliced_region: list

List containing the start and stop locations of the spliced region

second_flank_region: list

List containing the start and stop locations of the second flanking region (“second” is currently defined by location in the genome, not in the coding sequence)

event_id: str, optional

Event ID associated with the splice event, by default None

flank_size: int, optional

Number of amino acids to include on either side of the PTM, by default 5

coordinate_type: str, optional

Coordinate system used for the regions, by default ‘hg38’. The other option is ‘hg19’.

lowercase_mod: bool, optional

Whether to lowercase the amino acid associated with the PTM in the returned flanking sequences, by default True

order_by: str, optional

Whether the first, spliced, and second regions are ordered by their genomic coordinates (first has the smallest coordinate, then spliced, then second) or by their order in translation (first is translated first, etc.), by default ‘Coordinates’

Returns:
pandas.DataFrame

DataFrame containing the PTMs associated with the flanking regions and the amino acid sequences of the flanking regions in the inclusion and exclusion cases
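The inclusion/exclusion comparison can be illustrated at the amino-acid level. This is a simplified sketch under stated assumptions: the sequences are hypothetical, the regions are already translated and ordered by translation, and it ignores what the real function handles (genomic coordinates, codons split across boundaries, and checking the expected residue). flanking_sequence is a hypothetical helper.

```python
def flanking_sequence(protein: str, ptm_index: int, flank_size: int = 5,
                      lowercase_mod: bool = True) -> str:
    """Return the residues within flank_size of the PTM (0-based ptm_index).

    The window is truncated at the ends of the sequence.
    """
    left = protein[max(0, ptm_index - flank_size):ptm_index]
    residue = protein[ptm_index]
    right = protein[ptm_index + 1:ptm_index + 1 + flank_size]
    return left + (residue.lower() if lowercase_mod else residue) + right


# Hypothetical translated regions, ordered as they are translated
first_flank = "MKTAYIAKQRS"   # PTM: the final S, right next to the splice boundary
spliced = "GHWEDP"
second_flank = "LLNVAR"

ptm_index = len(first_flank) - 1
inclusion = flanking_sequence(first_flank + spliced + second_flank, ptm_index)
exclusion = flanking_sequence(first_flank + second_flank, ptm_index)
print(inclusion)  # IAKQRsGHWED
print(exclusion)  # IAKQRsLLNVA
```

Only residues on the boundary side of the PTM differ between the two cases, which is why PTMs far from a splice boundary are not reported.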

ptm_pose.flanking_sequences.get_flanking_changes_from_rMATS(ptm_coordinates=None, SE_events=None, A5SS_events=None, A3SS_events=None, RI_events=None, coordinate_type='hg38', dPSI_col='meanDeltaPSI', sig_col='FDR', extra_cols=None, **kwargs)

Given splice events identified by rMATS, extract quantified PTMs that are near a splice boundary (where the flanking sequence has the potential to be altered). You can also provide columns with specific PSI or significance information; extra columns not in these categories can be provided with the extra_cols parameter.

Only use this function if you do not need information about differentially included sites; otherwise, use the project module with identify_flanking_sequences = True (project.project_ptms_onto_MATS(identify_flanking_sequences = True)).

Parameters:
ptm_coordinates: pandas.DataFrame

dataframe containing PTM information, including chromosome, strand, and genomic location of PTMs. If None, the PTM coordinates from the pose_config file will be used.

SE_events: pandas.DataFrame

dataframe containing skipped exon event information from MATS

A5SS_events: pandas.DataFrame

dataframe containing 5’ alternative splice site event information from MATS

A3SS_events: pandas.DataFrame

dataframe containing 3’ alternative splice site event information from MATS

RI_events: pandas.DataFrame

dataframe containing retained intron event information from MATS

MXE_events: pandas.DataFrame

dataframe containing mutually exclusive exon event information from MATS

coordinate_type: str

indicates the coordinate system used for the start and end positions. Either hg38 or hg19. Default is ‘hg38’.

dPSI_col: str

Column name indicating delta PSI value. Default is ‘meanDeltaPSI’.

sig_col: str

Column name indicating significance of the event. Default is ‘FDR’.

extra_cols: list

List of column names for additional information to add to the results. Default is None.

**kwargs: additional keyword arguments

Additional keyword arguments, which will be fed into the filter_ptms() function from the helper module. These are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.

ptm_pose.flanking_sequences.get_flanking_changes_from_splice_data(splice_data, ptm_coordinates=None, chromosome_col=None, strand_col=None, first_flank_start_col=None, first_flank_end_col=None, spliced_region_start_col=None, spliced_region_end_col=None, second_flank_start_col=None, second_flank_end_col=None, dPSI_col=None, sig_col=None, event_id_col=None, gene_col=None, extra_cols=None, flank_size=5, coordinate_type='hg38', start_coordinate_system='1-based', end_coordinate_system='1-based', lowercase_mod=True, **kwargs)

Given a DataFrame containing information about splice events, extract the flanking sequences of PTMs in the flanking regions whose flanking sequence has the potential to be altered. The DataFrame should contain columns for the chromosome, strand, and start and stop locations of the first flanking region, spliced region, and second flanking region. The DataFrame should also contain a column for the event ID associated with the splice event. If the DataFrame does not contain the necessary columns, the function will raise an error.

Parameters:
splice_data: pandas.DataFrame

DataFrame containing information about splice events

ptm_coordinates: pandas.DataFrame

DataFrame containing PTM coordinate information, used to identify PTMs in the flanking regions

chromosome_col: str, optional

Column name indicating chromosome, by default None

strand_col: str, optional

Column name indicating strand, by default None

first_flank_start_col: str, optional

Column name indicating start location of the first flanking region, by default None

first_flank_end_col: str, optional

Column name indicating end location of the first flanking region, by default None

spliced_region_start_col: str, optional

Column name indicating start location of the spliced region, by default None

spliced_region_end_col: str, optional

Column name indicating end location of the spliced region, by default None

second_flank_start_col: str, optional

Column name indicating start location of the second flanking region, by default None

second_flank_end_col: str, optional

Column name indicating end location of the second flanking region, by default None

event_id_col: str, optional

Column name indicating event ID, by default None

gene_col: str, optional

Column name indicating gene name, by default None

extra_cols: list, optional

List of additional columns to include in the output DataFrame, by default None

flank_size: int, optional

Number of amino acids to include on either side of the PTM, by default 5

coordinate_type: str, optional

Coordinate system used for the regions, by default ‘hg38’. The other option is ‘hg19’.

lowercase_mod: bool, optional

Whether to lowercase the amino acid associated with the PTM in the returned flanking sequences, by default True

start_coordinate_system: str, optional

Coordinate system used for the start locations of the regions, by default ‘1-based’. The other option is ‘0-based’.

end_coordinate_system: str, optional

Coordinate system used for the end locations of the regions, by default ‘1-based’. The other option is ‘0-based’.

**kwargs: keyword arguments, optional

Additional keyword arguments to pass to the find_ptms_in_many_regions function, which will be fed into the filter_ptms() function from the helper module. These are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.

Returns:
list

List containing DataFrames with the PTMs associated with the flanking regions and the amino acid sequences of the flanking regions in the inclusion and exclusion cases

ptm_pose.flanking_sequences.get_flanking_changes_from_splicegraph(psi_data, splicegraph, ptm_coordinates=None, dPSI_col=None, sig_col=None, event_id_col=None, extra_cols=None, gene_col='symbol', flank_size=5, coordinate_type='hg19', **kwargs)

Given a DataFrame containing information about splice events obtained from SpliceSeq and the corresponding splicegraph, extract the flanking sequences of PTMs that are near a splice boundary (where the flanking sequence has the potential to be altered). Coordinate information for individual exons should be found in the splicegraph. You can also provide columns with specific PSI or significance information; extra columns not in these categories can be provided with the extra_cols parameter.

Parameters:
psi_data: pandas.DataFrame

DataFrame containing information about splice events obtained from SpliceSeq

splicegraph: pandas.DataFrame

DataFrame containing information about individual exons and their coordinates

ptm_coordinates: pandas.DataFrame

DataFrame containing PTM coordinate information, used to identify PTMs in the flanking regions

dPSI_col: str, optional

Column name indicating delta PSI value, by default None

sig_col: str, optional

Column name indicating significance of the event, by default None

event_id_col: str, optional

Column name indicating event ID, by default None

extra_cols: list, optional

List of column names for additional information to add to the results, by default None

gene_col: str, optional

Column name indicating gene symbol of spliced gene, by default ‘symbol’

flank_size: int, optional

Number of amino acids to include on either side of the PTM, by default 5

coordinate_type: str, optional

Coordinate system used for the regions, by default ‘hg19’. The other option is ‘hg38’.

**kwargs: additional keyword arguments

Additional keyword arguments, which will be fed into the filter_ptms() function from the helper module. These are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.

Returns:
altered_flanks: pandas.DataFrame

DataFrame containing the PTMs associated with the flanking regions that are altered, and the flanking sequences that arise depending on whether the spliced region is included or not

Annotating PTMs

ptm_pose.annotate.add_ELM_interactions(ptms, file=None, report_success=True)

Given a spliced PTMs or altered flanks dataframe from the project module, add ELM interaction data to the dataframe.

Parameters:
ptms: pandas.DataFrame

Contains the PTMs identified across the different splice events, either differentially included events, or altered flanking sequences

file: str

Path to the ELM data file. If not provided, the data will be downloaded directly from the ELM website

report_success: bool

If True, will print out the number of PTMs identified in the dataset that have ELM interaction information

Returns:
ptms: pandas.DataFrame

Contains the PTMs identified across the different splice events with additional columns for ELM interaction data

ptm_pose.annotate.add_ELM_matched_motifs(ptms, flank_size=7, file=None, report_success=True)

Given spliced PTMs or altered flanks dataframes, compare the canonical flanking sequence of each PTM to motifs recorded in the ELM database. If a match is found, the ELM motif will be recorded in the ‘ELM:Motif Matches’ column.

Parameters:
ptms: pandas.DataFrame

Contains the PTMs identified across the different splice events, either differentially included events, or altered flanking sequences

flank_size: int

Number of residues to include on either side of the PTM for the motif search. Default is 7

file: str

Path to the ELM data file. If not provided, the data will be downloaded directly from the ELM website

report_success: bool

If True, will print out the number of PTMs identified in the dataset that have ELM motif data
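ELM defines each motif class as a regular expression, so motif matching amounts to searching a flanking sequence with those patterns. The sketch below is an illustration, not the library's code: the motif names and patterns are hypothetical, and matching case-insensitively (so the lowercase modified residue still matches) is an assumption.

```python
import re
from typing import Dict, List


def match_motifs(flank: str, motifs: Dict[str, str]) -> List[str]:
    """Return names of motifs whose regular expression matches the flanking sequence.

    Matching is case-insensitive so a lowercase modified residue still matches.
    """
    return [name for name, pattern in motifs.items()
            if re.search(pattern, flank, flags=re.IGNORECASE)]


# Hypothetical motif classes written as ELM-style regular expressions
motifs = {
    "EXAMPLE_basophilic": r"R..[ST]",
    "EXAMPLE_proline_directed": r"[ST]P",
}
print(match_motifs("AKQRRLsPEDGHW", motifs))
# ['EXAMPLE_basophilic', 'EXAMPLE_proline_directed']
```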

ptm_pose.annotate.add_custom_annotation(ptms, annotation_data, source_name, annotation_type, annotation_col, accession_col='UniProtKB Accession', residue_col='Residue', position_col='PTM Position in Isoform')

Add custom annotation data to ptms or altered flanking sequence dataframes

Parameters:
annotation_data: pandas.DataFrame

Dataframe containing the annotation data to be added to the ptms dataframe. Must contain columns for UniProtKB Accession, Residue, PTM Position in Isoform, and the annotation data to be added

source_name: str

Name of the source of the annotation data, will be used to label the columns in the ptms dataframe

annotation_type: str

Type of annotation data being added, will be used to label the columns in the ptms dataframe

annotation_col: str

Column name in the annotation data that contains the annotation data to be added to the ptms dataframe

accession_col: str

Column name in the annotation data that contains the UniProtKB Accession information. Default is ‘UniProtKB Accession’

residue_col: str

Column name in the annotation data that contains the residue information. Default is ‘Residue’

position_col: str

Column name in the annotation data that contains the PTM position information. Default is ‘PTM Position in Isoform’

Returns:
ptms: pandas.DataFrame

Contains the PTMs identified across the different splice events with an additional column for the custom annotation data
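Conceptually, the custom annotation is a lookup keyed on (accession, residue, position). The following plain-Python sketch illustrates that join; add_custom_annotation_sketch is hypothetical, and the ‘source:type’ column label and the ‘;’ separator for multiple values are assumptions rather than confirmed library behavior.

```python
from typing import Dict, List, Tuple

KEY = ("UniProtKB Accession", "Residue", "PTM Position in Isoform")


def add_custom_annotation_sketch(ptms: List[Dict], annotation_data: List[Dict],
                                 source_name: str, annotation_type: str,
                                 annotation_col: str) -> List[Dict]:
    """Attach annotations keyed on (accession, residue, position) to each PTM."""
    lookup: Dict[Tuple, List[str]] = {}
    for row in annotation_data:
        lookup.setdefault(tuple(row[k] for k in KEY), []).append(row[annotation_col])
    new_col = f"{source_name}:{annotation_type}"  # assumed naming scheme
    out = []
    for ptm in ptms:
        values = lookup.get(tuple(ptm[k] for k in KEY))
        # Join multiple annotations with ';' and use None when nothing matches
        out.append({**ptm, new_col: ";".join(values) if values else None})
    return out


ptms = [{"UniProtKB Accession": "P12345", "Residue": "S", "PTM Position in Isoform": 10}]
annots = [
    {"UniProtKB Accession": "P12345", "Residue": "S", "PTM Position in Isoform": 10, "Effect": "activating"},
    {"UniProtKB Accession": "P12345", "Residue": "T", "PTM Position in Isoform": 25, "Effect": "inhibiting"},
]
result = add_custom_annotation_sketch(ptms, annots, "MyLab", "Function", "Effect")
print(result[0]["MyLab:Function"])  # activating
```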

ptm_pose.annotate.add_omnipath_data(ptms, min_sources=1, min_references=1, convert_to_gene_name=True, replace_old_annotations=True, report_success=True)

Given a spliced PTMs or altered flanks dataframe, append enzyme-substrate interactions recorded in the OmniPath database. These will be split between ‘Writer’ enzymes, which add the modification (OmniPath:Writer Enzyme), and ‘Eraser’ enzymes, which remove the modification (OmniPath:Eraser Enzyme). Note that the ‘post translational modification’ and ‘cleavage’ entries are not considered for this purpose.

Parameters:
ptms: pandas.DataFrame

Spliced PTMs or altered flanks dataframe.

min_sources: int

Minimum number of sources (i.e. database) for enzyme-substrate interaction. Default is 1, or all entries.

min_references: int

Minimum number of references (i.e. publications) for enzyme-substrate interaction. Default is 1, or all entries.

convert_to_gene_name: bool

Whether to convert enzyme names from UniProt IDs to gene names using pose_config.uniprot_to_genename. Default is True.

report_success: bool

Whether to report success message. Default is True.
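The min_sources / min_references thresholds and the writer/eraser split can be pictured with the sketch below. The field names (n_sources, n_references, modification) and the role mapping are hypothetical simplifications of OmniPath records; modification types missing from the mapping, such as cleavage, are skipped, mirroring the note above.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from modification type to enzyme role
ROLE = {
    "phosphorylation": "writer", "dephosphorylation": "eraser",
    "acetylation": "writer", "deacetylation": "eraser",
}


def split_enzymes(interactions: List[Dict], min_sources: int = 1,
                  min_references: int = 1) -> Tuple[List[str], List[str]]:
    """Return (writer, eraser) enzyme names that pass both evidence thresholds."""
    writers, erasers = [], []
    for row in interactions:
        if row["n_sources"] < min_sources or row["n_references"] < min_references:
            continue
        role = ROLE.get(row["modification"])  # None for e.g. 'cleavage'
        if role == "writer":
            writers.append(row["enzyme"])
        elif role == "eraser":
            erasers.append(row["enzyme"])
    return writers, erasers


interactions = [
    {"enzyme": "MAPK1", "modification": "phosphorylation", "n_sources": 3, "n_references": 4},
    {"enzyme": "PPP2CA", "modification": "dephosphorylation", "n_sources": 1, "n_references": 1},
    {"enzyme": "CASP3", "modification": "cleavage", "n_sources": 2, "n_references": 2},
]
print(split_enzymes(interactions))                 # (['MAPK1'], ['PPP2CA'])
print(split_enzymes(interactions, min_sources=2))  # (['MAPK1'], [])
```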

ptm_pose.annotate.annotate_ptms(ptms, annot_type='All', phosphositeplus=True, ptmsigdb=True, ptmcode=True, ptmint=True, omnipath=True, regphos=True, depod=True, elm=False, interactions_to_combine='All', enzymes_to_combine='All', combine_similar=True, report_success=True, **kwargs)

Given spliced ptm data, add annotations from various databases. The annotations that can be added are the following:

PhosphoSitePlus: regulatory site data (file must be provided), kinase-substrate data (file must be provided), and disease association data (file must be provided)

ELM: interaction data (can be downloaded automatically or provided as a file), motif matches (ELM class data can be downloaded automatically or provided as a file)

PTMInt: interaction data (will be downloaded automatically)

PTMcode: intraprotein interactions (can be downloaded automatically or provided as a file), interprotein interactions (can be downloaded automatically or provided as a file)

DEPOD: phosphatase-substrate data (will be downloaded automatically)

RegPhos: kinase-substrate data (will be downloaded automatically)

Parameters:
ptms: pd.DataFrame

Spliced PTM data from project module

interactions_to_combine: list or ‘All’

List of databases to combine interaction data from (for example, [‘PTMcode’, ‘PhosphoSitePlus’, ‘RegPhos’, ‘PTMInt’]). Default is ‘All’.

enzymes_to_combine: list or ‘All’

List of databases to combine enzyme-substrate data from (for example, [‘PhosphoSitePlus’, ‘RegPhos’]). Default is ‘All’.

combine_similar: bool

Whether to combine annotations of similar information (kinase, interactions, etc) from multiple databases into another column labeled as ‘Combined’. Default is True

ptm_pose.annotate.append_from_gmt(ptms, database=None, annot_type=None, gmt_df=None, column_name=None, **kwargs)

Given annotations in the GMT file format, add them to the ptms dataframe.

Parameters:
ptms: pd.DataFrame

dataframe containing ptm information, which can be the spliced_ptms or altered_flanks dataframe generated during projection

database: str

Name of the database for the annotation. Used to identify proper annotation if gmt_df not provided

annot_type: str

Type of annotation to append to the ptms dataframe

gmt_df: pd.DataFrame

If using custom gmt file, provide the dataframe loaded from the GMT file. This will override the database and annot_type parameters if provided.

column_name: str or None

Name of the column to use for the annotations in the ptms dataframe. If None, will use a default name based on the database and annot_type. Default is None.

**kwargs: additional keyword arguments

Passes additional keyword arguments to annotation specific functions. For example, you could pass min_sources for the construct_omnipath_gmt() function
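In the GMT format, each line is tab-separated: a set name, a description, then the set members. The sketch below parses GMT text and inverts it to find the annotation terms that list a given PTM. The ‘P12345_S10’ member identifiers and both helper functions are hypothetical; PTM-POSE's own GMT files may identify sites differently.

```python
from typing import Dict, List


def parse_gmt(text: str) -> Dict[str, List[str]]:
    """Parse GMT text: name<TAB>description<TAB>member1<TAB>member2..."""
    sets: Dict[str, List[str]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        sets[fields[0]] = [m for m in fields[2:] if m]
    return sets


def annotations_for_site(sets: Dict[str, List[str]], site: str) -> List[str]:
    # Invert the mapping: which annotation terms list this site as a member?
    return sorted(name for name, members in sets.items() if site in members)


gmt_text = (
    "MAPK1_substrates\texample\tP12345_S10\tQ99999_T5\n"
    "Cell_growth\texample\tP12345_S10\n"
)
sets = parse_gmt(gmt_text)
print(annotations_for_site(sets, "P12345_S10"))  # ['Cell_growth', 'MAPK1_substrates']
```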

ptm_pose.annotate.check_file(fname, expected_extension='.tsv')

Given a file name, check if the file exists and has the expected extension. If the file does not exist or has the wrong extension, raise an error.

Parameters:
fname: str

File name to check

expected_extension: str

Expected file extension. Default is ‘.tsv’
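A minimal standard-library sketch of the documented check. check_file_sketch is hypothetical, and the specific exception types (FileNotFoundError, ValueError) are assumptions; the library may raise different errors.

```python
import os
import tempfile


def check_file_sketch(fname: str, expected_extension: str = ".tsv") -> None:
    """Raise an error if fname is missing or does not end with expected_extension."""
    if not os.path.exists(fname):
        raise FileNotFoundError(f"{fname} does not exist")
    if not fname.endswith(expected_extension):
        raise ValueError(f"{fname} does not have the expected extension {expected_extension}")


# Usage: create a temporary .tsv file, check it, then clean up
with tempfile.NamedTemporaryFile(suffix=".tsv", delete=False) as tmp:
    path = tmp.name
check_file_sketch(path)  # passes silently
try:
    check_file_sketch(path, expected_extension=".gmt")
except ValueError as err:
    print("rejected:", err)
os.remove(path)
```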

ptm_pose.annotate.check_gmt_file(gmt_file, database, annot_type, automatic_download=False, odir=None, **kwargs)

Given a gmt file path, check to make sure it exists. If it doesn’t, either raise error or download and save a gmt file in the provided directory.

Parameters:
gmt_file: str

file path to gmt file

database: str

name of database associated with gmt file

annot_type: str

type of annotation to check for. This is used to provide more specific error messages

automatic_download: bool

whether to automatically download the data and process it into a gmt file if the file does not exist and this is possible. Default is False

odir: str or None

location to save annotations, if automatic_download is True

**kwargs: additional keyword arguments

Passes additional keyword arguments to annotation specific functions. For example, you could pass min_sources for the construct_omnipath_gmt() function

ptm_pose.annotate.combine_enzyme_data(ptms, enzyme_databases=['PhosphoSitePlus', 'RegPhos', 'OmniPath', 'DEPOD'], regphos_conversion={'ABL1(ABL)': 'ABL1', 'CDC2': 'CDK1', 'CK2A1': 'CSNK2A1', 'ERK1(MAPK3)': 'MAPK3', 'ERK2(MAPK1)': 'MAPK1', 'JNK2(MAPK9)': 'MAPK9', 'PKACA': 'PRKACA'})

Given spliced PTM information, combine enzyme-substrate data from multiple databases (currently supports PhosphoSitePlus, RegPhos, OmniPath, DEPOD, and iKiP downloaded from PTMsigDB), assuming that the enzyme data from these resources has already been added to the spliced PTM data. The combined enzyme data will be added as new columns labeled ‘Combined:Writer Enzyme’ and ‘Combined:Eraser Enzyme’.

Parameters:
ptms: pd.DataFrame

Spliced PTM data from project module

enzyme_databases: list

List of databases to combine enzyme data from. Currently supports PhosphoSitePlus, RegPhos, OmniPath, and DEPOD

regphos_conversion: dict

Allows conversion of RegPhos names to matching names in PhosphoSitePlus.

Returns:
ptms: pd.DataFrame

PTM data with combined enzyme data added
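The role of regphos_conversion can be sketched as follows: RegPhos names are translated before the per-database enzyme lists are merged into one de-duplicated entry. combine_enzymes is hypothetical, the conversion dict is a subset of the documented default, and the ‘;’ separator and sorted output are assumptions.

```python
from typing import Dict, Optional

# Subset of the documented default regphos_conversion
REGPHOS_CONVERSION = {"CDC2": "CDK1", "ERK2(MAPK1)": "MAPK1", "PKACA": "PRKACA"}


def combine_enzymes(annotations: Dict[str, Optional[str]],
                    conversion: Dict[str, str] = REGPHOS_CONVERSION) -> Optional[str]:
    """Merge ';'-separated enzyme lists from several databases into one sorted, de-duplicated string."""
    combined = set()
    for db, value in annotations.items():
        if not value:
            continue  # no annotation from this database
        for name in value.split(";"):
            name = name.strip()
            if db == "RegPhos":
                name = conversion.get(name, name)  # harmonize RegPhos naming
            combined.add(name)
    return ";".join(sorted(combined)) if combined else None


row = {"PhosphoSitePlus": "MAPK1;CDK1", "RegPhos": "ERK2(MAPK1);CDC2;PKACA", "OmniPath": None}
print(combine_enzymes(row))  # CDK1;MAPK1;PRKACA
```

Without the conversion step, ‘ERK2(MAPK1)’ and ‘MAPK1’ would be counted as two different kinases.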

ptm_pose.annotate.combine_interaction_data(ptms, interaction_databases=['PhosphoSitePlus', 'PTMcode', 'PTMInt', 'RegPhos', 'DEPOD'], include_enzyme_interactions=True)

Given annotated spliced ptm data, extract interaction data from various databases and combine it into a single dataframe. This will include the interacting protein, the type of interaction, and the source of the interaction data.

Parameters:
ptms: pd.DataFrame

Dataframe containing PTM data and associated interaction annotations from various databases

interaction_databases: list

List of databases to extract interaction data from. Options include ‘PhosphoSitePlus’, ‘PTMcode’, ‘PTMInt’, ‘RegPhos’, ‘DEPOD’. These should already have annotation columns in the ptms dataframe, otherwise they will be ignored. For kinase-substrate interactions, if the combined enzyme column is present, it will be used instead of the individual databases.

include_enzyme_interactions: bool

If True, will include kinase-substrate and phosphatase interactions in the output dataframe

Returns:
interact_data: list

List of dataframes containing PTMs and their interacting proteins, the type of influence the PTM has on the interaction (DISRUPTS, INDUCES, or REGULATES), and the source of the interaction data

ptm_pose.annotate.construct_DEPOD_gmt_file(odir=None, overwrite=False, max_retries=5, delay=10)

Download and process DEPOD data to create a GMT file for PTM-POSE. DEPOD contains information on phosphatases, their substrates, and the corresponding dephosphorylation sites.

Parameters:
odir: str, optional

Output directory for the GMT file. If not provided, it will default to the ‘Resource_Files/Annotations/DEPOD’ directory within the PTM-POSE package directory.

overwrite: bool, optional

If True, will overwrite any existing GMT file in the output directory. If False and the GMT file already exists, the function will skip processing and print a message.

max_retries: int, optional

Number of times to try downloading data from DEPOD. Default is 5.

delay: int, optional

Delay in seconds between download attempts. Default is 10 seconds.

ptm_pose.annotate.construct_PTMInt_gmt_file(file=None, odir=None, overwrite=False, max_retries=5, delay=10)

Download and process PTMInt interaction data to create gmt files for PTM-POSE.

Parameters:
file: str, optional

Path to the PTMInt data file. If not provided, the data will be downloaded directly from the PTMInt website. Default is None.

odir: str, optional

Output directory for the gmt file. If not provided, will default to the PTM-POSE resource directory for annotations. Default is None.

overwrite: bool, optional

If True, will overwrite any existing gmt files in the output directory. If False, will skip the creation of the gmt file if it already exists. Default is False.

max_retries: int, optional

Number of times to retry downloading the PTMInt data if the download fails. Default is 5.

delay: int, optional

Amount of time to wait (in seconds) before retrying the download if it fails. Default is 10 seconds.

ptm_pose.annotate.construct_PTMcode_interprotein_gmt_file(file=None, odir=None, overwrite=False, max_retries=5, delay=10)

Given the PTMcode interprotein interaction data, convert it to a format readily usable with PTM-POSE (gmt file format).

Parameters:
file: str

Path to the PTMcode interprotein interaction data file. If not provided, the data will be downloaded directly from the PTMcode website.

odir: str

Output directory for the gmt file. If not provided, will default to the PTM-POSE resource directory for annotations.

overwrite: bool, optional

If True, will overwrite any existing gmt files in the output directory. If False, will skip the creation of the gmt file if it already exists. Default is False.

max_retries: int, optional

Number of times to retry downloading the PTMcode data if the initial attempt fails. Default is 5.

delay: int, optional

Number of seconds to wait between retries if the download fails. Default is 10 seconds.

ptm_pose.annotate.construct_PTMsigDB_gmt_files(file, odir=None, overwrite=False, process_PSP_data=True)

Given the PTMsigDB xlsx file, convert it to a format readily usable with PTM-POSE (gmt file format). This will also process the PhosphoSitePlus data in PTMsigDB if requested.

Parameters:
file: str

PTMsigDB excel file path. This file can be downloaded from the PTMsigDB website.

odir: str

Output directory for the gmt files. If None, will default to the PTM-POSE resource directory.

overwrite: bool, optional

If True, will overwrite any existing gmt files in the output directory. Default is False.

process_PSP_data: bool, optional

If True, will process the PhosphoSitePlus data included in the PTMsigDB file, but only if not already found in odir. Default is True.

ptm_pose.annotate.construct_PhosphoSitePlus_gmt_files(regulatory_site_file=None, kinase_substrate_file=None, disease_association_file=None, odir=None, overwrite=False)

Given three PhosphoSitePlus annotation files, convert them to a format readily usable with PTM-POSE (gmt file format).

Parameters:
regulatory_site_file: str or None

Path to the PhosphoSitePlus regulatory site file (gzipped). If None, will skip creating function annotations.

kinase_substrate_file: str or None

Path to the PhosphoSitePlus kinase-substrate file (gzipped). If None, will skip creating kinase-substrate annotations.

disease_association_file: str or None

Path to the PhosphoSitePlus disease association file (gzipped). If None, will skip creating disease association annotations.

odir: str or None

Path to the output directory where the GMT files will be saved. If None, will save to the default resource directory for PhosphoSitePlus annotations.

overwrite: bool

If True, will overwrite existing GMT files if they already exist. If False, will skip creating the GMT files if they already exist. Default is False.

ptm_pose.annotate.construct_RegPhos_gmt_file(file=None, odir=None, overwrite=False)
Parameters:
file: str

RegPhos text file path. This file can be downloaded from the RegPhos website. If None, the function will raise an error.

odir: str

Output directory for the gmt files. If None, will default to the PTM-POSE resource directory.

overwrite: bool, optional

If True, will overwrite any existing gmt files in the output directory. Default is False.

ptm_pose.annotate.construct_annotation_dict_from_gmt(gmt_df, key_type='annotation')

Given a dataframe in gmt annotation format, construct a dictionary that maps either each annotation to its associated PTMs (key_type=‘annotation’, the default) or each PTM to its annotations.
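As a rough illustration of the two key orientations, the sketch below inverts a list of gmt-style rows (annotation, description, PTM identifiers). The row layout, the ‘P28482_T185’-style PTM identifiers, and the function name gmt_to_dict are hypothetical; any key_type other than ‘annotation’ is treated here as PTM-keyed.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (annotation, description, PTM identifiers), a hypothetical layout
GmtRow = Tuple[str, str, List[str]]


def gmt_to_dict(rows: List[GmtRow], key_type: str = "annotation") -> Dict[str, Set[str]]:
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for annotation, _description, ptms in rows:
        for ptm in ptms:
            if key_type == "annotation":
                mapping[annotation].add(ptm)  # annotation -> PTMs
            else:
                mapping[ptm].add(annotation)  # PTM -> annotations
    return dict(mapping)


rows = [
    ("cell growth, induced", "", ["P28482_T185", "P28482_Y187"]),
    ("apoptosis, inhibited", "", ["P28482_T185"]),
]
by_ptm = gmt_to_dict(rows, key_type="ptm")
```

The annotation-keyed form is convenient for counting PTMs per term, while the PTM-keyed form is convenient for appending annotations to a PTM table.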

ptm_pose.annotate.construct_custom_gmt_file(annotation_df, database, annot_type, annot_col, accession_col='UniProtKB Accession', residue_col='Residue', position_col='PTM Position in Isofrom', odir=None, **kwargs)

Function for constructing a gmt file for annotations not currently provided by PTM-POSE. Ideally, these annotations should be partially processed to have the same format as PTM-POSE annotations. For example, they should have columns for UniProtKB Accession, Residue, PTM Position in Isoform, and the annotation data to be added.

Parameters:
annotation_df: pandas.DataFrame

Dataframe containing the annotation data to be added to the ptms dataframe. Must contain columns for UniProtKB Accession, Residue, PTM Position in Isoform, and the annotation data to be added.

database: str

Name of the database for the annotation. This will be used to create the output directory and file name.

annot_type: str

Type of annotation data being added. This will be used to create the output file name and description.

annot_col: str

Column name in the annotation data that contains the annotation data to be added to the ptms dataframe. This will be used as the annotation column in the output GMT file.

accession_col: str

Column name in the annotation data that contains the UniProtKB Accession information. Default is ‘UniProtKB Accession’.

residue_col: str

Column name in the annotation data that contains the residue information. Default is ‘Residue’.

position_col: str

Column name in the annotation data that contains the PTM position information. Default is ‘PTM Position in Isoform’.

odir: str or None

Path to the output directory where the GMT file will be saved. If None, will save to the default resource directory for annotations. Default is None.

kwargs: additional keyword arguments

Additional keywords to pass to the construct_gmt_df function. This can include parameters such as annotation_separator, description, and compressed.

ptm_pose.annotate.construct_gmt_df(df, annotation_col, description=nan, annotation_separator=None, odir=None, fname=None, compressed=True)

Given annotation data, construct a dataframe in the gmt file format. The dataframe is saved to file if odir is provided.

Parameters:
df: pd.DataFrame

Dataframe containing the annotation data to be converted to GMT format. Must contain columns for UniProtKB Accession, Residue, PTM Position in Isoform, and the annotation data to be added.

annotation_col: str

Column name in the dataframe that contains the annotation data to be added to the GMT file. This will be used as the annotation column in the output GMT file.

description: str or np.nan

Description to add to the description column. Default is np.nan.

annotation_separator: str or None

Separator used to split annotations in the annotation_col. If None, will not split annotations. Default is None.

odir: str or None

File path to the output directory where the GMT file will be saved. If None, will not save. Default is None.

fname: str or None

Name of the output file. If None, will use the annotation_col as the file name. Default is None.

compressed: bool

Whether to save the gmt file in gzip format. Default is True.
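The sketch below shows the general idea of turning per-PTM annotation records into gmt lines (annotation name, description, then member PTMs, tab-separated, as in the standard GMT layout), optionally splitting on an annotation separator and writing a gzip file when compressed is True. It is an assumption-laden illustration: it returns plain strings rather than a DataFrame, and the ‘accession_residue+position’ PTM identifier and the function names are hypothetical, not necessarily what PTM-POSE uses.

```python
import gzip
from collections import defaultdict
from typing import Dict, List, Optional


def build_gmt_lines(
    records: List[Dict[str, str]],
    annotation_col: str,
    description: str = "",
    annotation_separator: Optional[str] = None,
) -> List[str]:
    members = defaultdict(list)  # annotation -> ordered, unique PTM IDs
    for rec in records:
        raw = rec.get(annotation_col)
        if not raw:
            continue
        parts = raw.split(annotation_separator) if annotation_separator else [raw]
        # Hypothetical PTM identifier, e.g. "P28482_T185"
        ptm_id = f"{rec['UniProtKB Accession']}_{rec['Residue']}{rec['PTM Position in Isoform']}"
        for annot in (p.strip() for p in parts):
            if annot and ptm_id not in members[annot]:
                members[annot].append(ptm_id)
    # GMT layout: name <TAB> description <TAB> member1 <TAB> member2 ...
    return ["\t".join([annot, description, *ptms]) for annot, ptms in members.items()]


def write_gmt(lines: List[str], path: str, compressed: bool = True) -> None:
    # gzip.open and open both accept "wt" for text output
    opener = gzip.open if compressed else open
    with opener(path, "wt") as handle:
        handle.write("\n".join(lines) + "\n")
```

For example, a record whose kinase column reads "MAP2K1;MAP2K2" contributes its PTM to two gmt lines when annotation_separator is ";".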

ptm_pose.annotate.construct_omnipath_gmt_file(min_sources=1, min_references=1, convert_to_gene_name=True, odir=None)

Download enzyme-substrate interactions from the OmniPath database. The data will be filtered based on the specified minimum number of sources and references. The resulting data will be split into two categories: ‘Writer’ enzymes, which add the modification, and ‘Eraser’ enzymes, which remove the modification. The output will be saved as GMT files in the resource files directory.

ptm_pose.annotate.convert_PSP_label_to_UniProt(label)

Given a label for an interacting protein from PhosphoSitePlus, convert it to a UniProtKB accession. This is required because PhosphoSitePlus records interactions in various ways that are not necessarily consistent with other databases (e.g., not always by gene name).

Parameters:
label: str

Label for interacting protein from PhosphoSitePlus

ptm_pose.annotate.extract_ids_PTMcode(df, col='## Protein1')

Many different ID forms are used in PTMcode, but we rely on UniProt IDs. This function is used to convert Ensembl Gene IDs to UniProt IDs.

ptm_pose.annotate.extract_interaction_details(interaction, column='PhosphoSitePlus:Interactions')

Given an interaction string from a specific database, extract the type of interaction and the interacting protein. This is required as different databases format their interaction strings differently.

ptm_pose.annotate.extract_positions_from_DEPOD(x)

Given a string consisting of multiple modifications in the form ‘Residue-Position’ separated by ‘, ’, extract the residue and position of each. Any excess details in the string are ignored.

Parameters:
x: str

dephosphosite entry from DEPOD data

Returns:
new_x: str

PTM residue and position in a format that PTM-POSE recognizes
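A minimal parsing sketch for entries of this form, using a regular expression to pull each ‘Residue-Position’ pair out of its comma-separated chunk. Several details are assumptions, not taken from DEPOD or PTM-POSE: that residues may be written as three-letter codes (mapped to one-letter codes here), that the output looks like ‘Y705’, and that multiple sites are joined with ‘;’. The function name parse_depod_sites is hypothetical.

```python
import re
from typing import Optional

# Assumed residue spellings; one-letter codes pass through unchanged
THREE_TO_ONE = {"Ser": "S", "Thr": "T", "Tyr": "Y", "His": "H"}
SITE_PATTERN = re.compile(r"([A-Za-z]{1,3})-(\d+)")


def parse_depod_sites(entry: Optional[str], sep: str = ";") -> Optional[str]:
    if not entry:
        return None
    sites = []
    for chunk in entry.split(", "):
        match = SITE_PATTERN.search(chunk)
        if match is None:
            continue  # chunk carries no Residue-Position pair
        residue, position = match.groups()
        residue = THREE_TO_ONE.get(residue.capitalize(), residue.upper())
        sites.append(f"{residue}{position}")  # extra text in the chunk is dropped
    return sep.join(sites) if sites else None
```

For instance, "Tyr-705, Ser-727 (in vitro)" would become "Y705;S727" under these assumptions, with the parenthetical note ignored.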

ptm_pose.annotate.get_available_gmt_annotations(format='dict')

Get the annotations available in resource files in GMT format. These can be output as either a dictionary or a pandas DataFrame.

Parameters:
format: str

Format to output the available annotations. Options are ‘dict’ (default) or ‘dataframe’.

ptm_pose.annotate.load_gmt_file(gmt_file)

Load a GMT file into a pandas DataFrame

ptm_pose.annotate.process_database_annotations(database='PhosphoSitePlus', annot_type='Function', key_type='annotation', collapsed=False, resource_dir=None, automatic_download=False, **kwargs)

Given a database and annotation type, find and process the annotations into a dictionary mapping each PTM to its annotations, or vice versa

Parameters:
database: str

source of annotation

annot_type: str

type of annotation to retrieve

key_type: str

whether the annotation or ptm should be the key of the output dictionary. Default is ‘annotation’.

collapsed: bool

whether to combine annotations for similar types into a single annotation. For example, ‘cell growth, induced’ and ‘cell growth, inhibited’ would be simplified to ‘cell growth’. Default is False.

resource_dir: str or None

location of annotations. By default, this will look for annotations in the PTM-POSE resource directory.

automatic_download: bool

Whether to automatically download annotations that are not yet present in the resource files directory. Default is False.

kwargs: additional keyword arguments

Passes additional keyword arguments to annotation-specific functions. For example, you could pass min_sources to the construct_omnipath_gmt_file() function.

ptm_pose.annotate.simplify_annotation(annotation, sep=',')

Given an annotation, remove additional information such as whether or not a function is increasing or decreasing. For example, ‘cell growth, induced’ would be simplified to ‘cell growth’

Parameters:
annotation: str

Annotation to simplify

sep: str

Separator that splits the core annotation from additional detail. Default is ‘,’. Assumes the first element is the core annotation.

Returns:
annotation: str

Simplified annotation
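The behaviour described above (keep the first element before sep) can be sketched in one line; the name simplify_annotation_sketch is hypothetical and this is not the library source. The usage lines also show how simplifying each annotation and de-duplicating gives the collapsed categories used elsewhere in this module.

```python
def simplify_annotation_sketch(annotation: str, sep: str = ",") -> str:
    # Keep the core annotation, i.e. the text before the first separator
    return annotation.split(sep)[0].strip()


annotations = ["cell growth, induced", "cell growth, inhibited", "apoptosis"]
collapsed = sorted({simplify_annotation_sketch(a) for a in annotations})
print(collapsed)  # ['apoptosis', 'cell growth']
```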

ptm_pose.annotate.unify_interaction_data(ptms, interaction_col, name_dict={})

Given spliced ptm data and a column containing interaction data, extract the interacting protein and the type of interaction, and convert the interacting protein to its UniProtKB accession. This will be added as a new column labeled ‘Interacting ID’.

Parameters:
ptms: pd.DataFrame

Dataframe containing PTM data

interaction_col: str

column containing interaction information from a specific database

name_dict: dict

dictionary to convert names within the given database to UniProt IDs, for cases when a name is not necessarily one of the gene names listed in UniProt

Returns:
interact: pd.DataFrame

Contains PTMs and their interacting proteins, the type of influence the PTM has on the interaction (DISRUPTS, INDUCES, or REGULATES)

Analyze Modules

Summaries

ptm_pose.analyze.summarize.combine_outputs(spliced_ptms, altered_flanks, report_removed_annotations=True, include_stop_codon_introduction=False, remove_conflicting=True, **kwargs)

Given the spliced_ptms (differentially included) and altered_flanks (altered flanking sequences) dataframes obtained from the project and flanking_sequences modules, combine the two into a single dataframe that categorizes each PTM by the impact of splicing on the PTM site.

Parameters:
spliced_ptms: pd.DataFrame

Dataframe with PTMs projected onto splicing events and with annotations appended from various databases

altered_flanks: pd.DataFrame

Dataframe with PTMs associated with altered flanking sequences and with annotations appended from various databases

include_stop_codon_introduction: bool

Whether to include PTMs whose flanking sequences are altered by the introduction of a stop codon. Default is False.

remove_conflicting: bool

Whether to remove PTMs that are both included and excluded across different splicing events. Default is True.

kwargs: dict

Additional keyword arguments, which will be passed to helpers.filter_ptms if filtering is desired. If not provided, insignificant events will be filtered out automatically.
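To make the categorization and the remove_conflicting option concrete, here is a small sketch over plain dictionaries. The impact labels ‘Included’, ‘Excluded’, and ‘Altered Flank’ come from get_ptm_annotations below; everything else is an assumption: the "PTM" and "dPSI" keys, the convention that positive dPSI means the PTM-containing region is included, and the set-of-labels output. The function name categorize_ptms is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set


def categorize_ptms(
    spliced: List[Dict],
    altered: List[Dict],
    remove_conflicting: bool = True,
) -> Dict[str, Set[str]]:
    impacts: Dict[str, Set[str]] = defaultdict(set)
    for rec in spliced:
        if rec["dPSI"] == 0:
            continue  # no direction of change
        # Assumption: positive dPSI means the PTM-containing region is included more
        impacts[rec["PTM"]].add("Included" if rec["dPSI"] > 0 else "Excluded")
    for rec in altered:
        impacts[rec["PTM"]].add("Altered Flank")
    if remove_conflicting:
        # Drop PTMs that are included by one event but excluded by another
        impacts = {p: i for p, i in impacts.items() if not {"Included", "Excluded"} <= i}
    return dict(impacts)


spliced = [
    {"PTM": "A_S1", "dPSI": 0.3},
    {"PTM": "B_T2", "dPSI": -0.2},
    {"PTM": "C_Y3", "dPSI": 0.4},
    {"PTM": "C_Y3", "dPSI": -0.5},
]
altered = [{"PTM": "A_S1"}]
```

Here C_Y3 is included by one event and excluded by another, so it is dropped when remove_conflicting is True.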

ptm_pose.analyze.summarize.get_modification_class_data(ptms, mod_class)

Given a ptm dataframe and a specific modification class, return a dataframe with only the PTMs of that class.

Parameters:
ptms: pd.DataFrame

Dataframe with ptm information, such as the spliced_ptms or altered_flanks dataframe obtained during projection

mod_class: str

The modification class to filter by, e.g. ‘Phosphorylation’, ‘Acetylation’, etc.

ptm_pose.analyze.summarize.get_modification_counts(ptms, **kwargs)

Given PTM data (either spliced ptms, altered flanks, or combined data), return the counts of each modification class

Parameters:
ptms: pd.DataFrame

Dataframe with PTMs projected onto splicing events or with altered flanking sequences

Returns:
modification_counts: pd.Series

Series with the counts of each modification class
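Counting and subsetting by modification class can be sketched with collections.Counter. The ‘Modification Class’ key matches the column named in plot_filter_impact below; the assumption that a site carrying several classes lists them separated by ‘;’, and both function names, are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def count_modification_classes(ptms: List[Dict[str, str]], sep: str = ";") -> Counter:
    counts: Counter = Counter()
    for rec in ptms:
        # Assumption: sites with several classes list them sep-separated
        for mod_class in rec["Modification Class"].split(sep):
            counts[mod_class.strip()] += 1
    return counts


def subset_modification_class(
    ptms: List[Dict[str, str]], mod_class: str, sep: str = ";"
) -> List[Dict[str, str]]:
    return [
        rec for rec in ptms
        if mod_class in (m.strip() for m in rec["Modification Class"].split(sep))
    ]


ptms = [
    {"Modification Class": "Phosphorylation"},
    {"Modification Class": "Phosphorylation"},
    {"Modification Class": "Acetylation;Ubiquitination"},
]
```

Under this assumption a multi-class site is counted once for each of its classes, so the class counts can sum to more than the number of sites.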

ptm_pose.analyze.summarize.plot_modification_breakdown(spliced_ptms=None, altered_flanks=None, colors=[(0.00392156862745098, 0.45098039215686275, 0.6980392156862745), (0.8705882352941177, 0.5607843137254902, 0.0196078431372549), (0.00784313725490196, 0.6196078431372549, 0.45098039215686275), (0.8352941176470589, 0.3686274509803922, 0.0), (0.8, 0.47058823529411764, 0.7372549019607844), (0.792156862745098, 0.5686274509803921, 0.3803921568627451), (0.984313725490196, 0.6862745098039216, 0.8941176470588236), (0.5803921568627451, 0.5803921568627451, 0.5803921568627451), (0.9254901960784314, 0.8823529411764706, 0.2), (0.33725490196078434, 0.7058823529411765, 0.9137254901960784)], ax=None, **kwargs)

Plot the number of PTMs that are differentially included or have altered flanking sequences, separated by PTM type

Parameters:
spliced_ptms: pd.DataFrame

Dataframe with PTMs that are differentially included

altered_flanks: pd.DataFrame

Dataframe with PTMs that have altered flanking sequences

colors: list

List of colors to use for the bar plot (first two will be used). Default is seaborn colorblind palette.

ax: matplotlib.Axes

Axis to plot on. If None, will create new figure. Default is None.

kwargs: dict

Additional keyword arguments, which will be passed to helpers.filter_ptms if filtering is desired. If not provided, insignificant events will automatically be filtered out by min_dpsi and significance, provided the corresponding columns are present.

Filtering PTMs and Events

ptm_pose.analyze.filter.plot_filter_impact(ptms, output_type='count', topn=10, ax=None, **kwargs)

Given a dataframe of PTMs and a set of filter arguments to be passed to helpers.filter_ptms, this function will plot the number or fraction of PTMs that are retained after filtering for each modification type

Parameters:
ptms: pd.DataFrame

Dataframe containing PTM data with a column ‘Modification Class’ that contains the type of modification (e.g. phosphorylation, acetylation, etc.)

output_type: str, optional

Type of output to plot, either ‘count’ or ‘fraction’. The default is ‘count’.

topn: int, optional

The number of top modification classes to plot. The default is 10.

ax: matplotlib.axes.Axes, optional

The axes to plot on. If None, a new figure and axes will be created. The default is None.

**kwargs: keyword arguments

Additional keyword arguments to be passed to the filter_ptms function (e.g. min_studies, min_compendia, etc.). These will be extracted and checked for validity.
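The quantity being plotted can be sketched without plotting: tally PTMs per modification class before and after a filter, then report either retained counts or retained fractions for the topn largest classes. In this hypothetical sketch a simple keep predicate stands in for the filter_ptms keyword arguments, and the "Studies" key in the example is an assumed evidence column.

```python
from collections import Counter
from typing import Callable, Dict, List


def filter_impact(
    ptms: List[Dict],
    keep: Callable[[Dict], bool],
    output_type: str = "count",
    topn: int = 10,
) -> Dict[str, float]:
    if output_type not in ("count", "fraction"):
        raise ValueError("output_type must be 'count' or 'fraction'")
    before = Counter(rec["Modification Class"] for rec in ptms)
    after = Counter(rec["Modification Class"] for rec in ptms if keep(rec))
    result = {}
    # Rank classes by their size before filtering and keep the topn largest
    for mod_class, total in before.most_common(topn):
        retained = after.get(mod_class, 0)
        result[mod_class] = retained if output_type == "count" else retained / total
    return result


ptms = [
    {"Modification Class": "Phosphorylation", "Studies": 3},
    {"Modification Class": "Phosphorylation", "Studies": 1},
    {"Modification Class": "Acetylation", "Studies": 2},
]
```

The fraction view makes it easier to see whether a filter disproportionately removes less-studied modification classes.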

Annotations

ptm_pose.analyze.annotations.annotation_enrichment(ptms, database='PhosphoSitePlus', annot_type='Function', background_type='all', collapse_on_similar=False, mod_class=None, alpha=0.05, min_dpsi=0.1, **kwargs)

Given spliced ptm information (differential inclusion, altered flanking sequences, or both), calculate the enrichment of specific annotations in the dataset using a hypergeometric test. Background data can be provided/constructed in two ways:

  1. Use annotations from the entire phosphoproteome (background_type = ‘all’)

  2. Use the alpha and min_dpsi parameters to construct a foreground that only includes significantly spliced PTMs, and use the entire provided spliced_ptms dataframe as the background. This allows you to compare the enrichment of specific annotations in the significantly spliced PTMs against the entire dataset. This is done automatically if alpha or min_dpsi is provided.

Parameters:
ptms: pd.DataFrame

Dataframe with PTMs projected onto splicing events and with annotations appended from various databases

database: str

database from which PTMs are pulled. Options include ‘PhosphoSitePlus’, ‘ELM’, ‘PTMInt’, ‘PTMcode’, ‘DEPOD’, ‘RegPhos’, ‘PTMsigDB’. Default is ‘PhosphoSitePlus’.

annot_type: str

Type of annotation to pull from spliced_ptms dataframe. Available information depends on the selected database. Default is ‘Function’.

background_type: str

how to construct the background data. Options include ‘all’ (default) and ‘significance’. If ‘significance’ is selected, the alpha and min_dpsi parameters must be provided. Otherwise, the whole proteome in the ptm_coordinates dataframe will be used as the background.

collapse_on_similar: bool

Whether to collapse similar annotations (for example, increasing and decreasing functions) into a single category. Default is False.

mod_class: str

modification class to subset, if any

alpha: float

significance threshold to use to subset foreground PTMs. Default is 0.05.

min_dpsi: float

minimum delta PSI value to use to subset foreground PTMs. Default is 0.1.

kwargs: additional keyword arguments

Additional keyword arguments to pass to the filter_ptms() function from the helper module. These will be used to filter ptms with lower evidence. For example, if you want to filter PTMs based on the number of MS observations, you can add ‘min_MS_observations = 2’ to the kwargs. This will filter out any PTMs that have fewer than 2 MS observations. See the filter_ptms() function for more options.
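The hypergeometric test mentioned above asks how surprising it is to see at least k annotated PTMs in a foreground of size n, when K of the N background PTMs carry the annotation. A standard-library sketch of that upper-tail p-value follows; it assumes the foreground is drawn from the background, and the function name is hypothetical rather than the PTM-POSE implementation.

```python
from math import comb


def hypergeom_enrichment_pvalue(k: int, n: int, K: int, N: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws).

    k: foreground PTMs with the annotation
    n: foreground PTMs in total
    K: background PTMs with the annotation
    N: background PTMs in total
    """
    if not (0 <= k <= n <= N and 0 <= K <= N):
        raise ValueError("counts must satisfy 0 <= k <= n <= N and 0 <= K <= N")
    # math.comb returns 0 when the lower argument exceeds the upper one,
    # so impossible terms contribute nothing to the sum
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# 3 of 3 foreground PTMs annotated, with 3 of 10 annotated in the background
p = hypergeom_enrichment_pvalue(k=3, n=3, K=3, N=10)  # 1/120
```

The same value is given by SciPy as scipy.stats.hypergeom.sf(k - 1, N, K, n), where SciPy's parameters are (total population, successes in population, number of draws).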

ptm_pose.analyze.annotations.draw_pie(dist, xpos, ypos, size, colors, edgecolor=None, type='donut', ax=None)

Draws pies individually, as if they were points on a scatter plot. This function was taken from this Stack Overflow post: https://stackoverflow.com/questions/56337732/how-to-plot-scatter-pie-chart-using-matplotlib

Parameters:
dist: list

list of values to be represented as pie slices for a single point

xpos: float

x position of pie chart in the scatter plot

ypos: float

y position of pie chart in the scatter plot

size: float

size of pie chart

colors: list

list of colors to use for pie slices

ax: matplotlib.Axes

axis to plot on, if None, will create new figure
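The scatter-pie technique builds each slice as a custom marker: a polygon starting at the centre (0, 0) and tracing the unit circle between the slice's cumulative start and end fractions. The sketch below only computes those vertex lists (no plotting); the function name and the points resolution parameter are hypothetical.

```python
import math
from typing import List, Tuple


def pie_slice_markers(dist: List[float], points: int = 30) -> List[List[Tuple[float, float]]]:
    total = float(sum(dist))
    if total <= 0:
        raise ValueError("dist must sum to a positive value")
    markers = []
    start = 0.0
    for value in dist:
        end = start + value / total
        # More vertices for larger slices, at least two per arc
        n = max(2, int(points * (end - start)) + 1)
        angles = [2 * math.pi * (start + (end - start) * i / (n - 1)) for i in range(n)]
        # Each slice is a wedge polygon anchored at the centre (0, 0)
        markers.append([(0.0, 0.0)] + [(math.cos(a), math.sin(a)) for a in angles])
        start = end
    return markers


markers = pie_slice_markers([1, 1])  # two half-circle wedges
```

With matplotlib, each vertex list can be passed as the marker of a scatter call, e.g. ax.scatter(xpos, ypos, marker=verts, s=size, color=c), one call per slice; matplotlib normalizes vertex-list markers around (0, 0).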

ptm_pose.analyze.annotations.gene_set_enrichment(spliced_ptms=None, altered_flanks=None, sig_col='Significance', dpsi_col='dPSI', alpha=0.05, min_dpsi=None, gene_sets=['GO_Biological_Process_2023', 'Reactome_2022'], background=None, return_sig_only=True, max_retries=5, delay=10, **kwargs)

Given spliced_ptms and/or altered_flanks dataframes (or the dataframes combined from combine_outputs()), perform gene set enrichment analysis using the Enrichr API.

Parameters:
spliced_ptms: pd.DataFrame

Dataframe with differentially included PTMs projected onto splicing events and with annotations appended from various databases. Default is None (will not be considered in analysis). If combined dataframe is provided, this dataframe will be ignored.

altered_flanks: pd.DataFrame

Dataframe with PTMs associated with altered flanking sequences and with annotations appended from various databases. Default is None (will not be considered). If combined dataframe is provided, this dataframe will be ignored.

combined: pd.DataFrame

Combined dataframe with spliced_ptms and altered_flanks dataframes. Default is None. If provided, spliced_ptms and altered_flanks dataframes will be ignored.

gene_sets: list

List of gene sets to use in enrichment analysis. Default is [‘GO_Biological_Process_2023’, ‘Reactome_2022’]. See the gseapy and Enrichr documentation for other available gene sets.

background: list

List of genes to use as background in enrichment analysis. Default is None (all genes in the gene set database will be used).

return_sig_only: bool

Whether to return only significantly enriched gene sets. Default is True.

max_retries: int

Number of times to retry retrieving gene set enrichment results from the Enrichr API. Default is 5.

delay: int

Number of seconds to wait between retries. Default is 10.

**kwargs: additional keyword arguments

Additional keyword arguments to pass to the combine_outputs() function from the summarize module. These will be used to filter the spliced_ptms and altered_flanks dataframes before performing gene set enrichment analysis. For example, if you want to filter PTMs based on the number of MS observations, you can add ‘min_MS_observations = 2’ to the kwargs. This will filter out any PTMs that have fewer than 2 MS observations. See the combine_outputs() function for more options.

Returns:
results: pd.DataFrame

Dataframe with gene set enrichment results from enrichr API

ptm_pose.analyze.annotations.get_available_annotations(ptms)

Given a PTM dataframe, indicate the annotations that are available for analysis and whether they have already been appended to the PTM dataset.

Parameters:
ptms: pd.DataFrame

contains PTM information and may have annotations already appended, such as spliced_ptms and altered_flanks dataframes generated during projection

Returns:
available_annots: pd.DataFrame

DataFrame indicating the available annotation types and their sources, as well as whether they have been appended to the PTM data.

ptm_pose.analyze.annotations.get_background_annotation_counts(database='PhosphoSitePlus', annot_type='Function', **kwargs)

Given a database and annotation type, retrieve the counts of PTMs associated with the requested annotation type across all PTMs in the ptm_coordinates dataframe used for projection

Parameters:
database: str

Source of annotation. Default is ‘PhosphoSitePlus’.

annot_type: str

Type of annotation that can be found in indicated database. Default is ‘Function’. Other options include ‘Process’, ‘Disease’, ‘Enzyme’, ‘Interactions’, etc.

kwargs: additional keyword arguments

Additional keyword arguments to pass to the filter_ptms() function from the helper module. These will be used to filter ptms with lower evidence. For example, if you want to filter PTMs based on the number of MS observations, you can add ‘min_MS_observations = 2’ to the kwargs. This will filter out any PTMs that have fewer than 2 MS observations. See the filter_ptms() function for more options.

ptm_pose.analyze.annotations.get_enrichment_inputs(ptms, annot_type='Function', database='PhosphoSitePlus', background_type='all', collapse_on_similar=False, mod_class=None, alpha=0.05, min_dpsi=0.1, **kwargs)

Given the spliced_ptms, altered_flanks, or combined PTMs dataframe, identify the number of PTMs corresponding to specific annotations in the foreground (PTMs impacted by splicing) and the background (all PTMs in the proteome, or all PTMs in the dataset not impacted by splicing). This information can be used to calculate the enrichment of specific annotations among PTMs impacted by splicing. Two options are provided for constructing the background data: ‘all’ (based on the entire proteome in the ptm_coordinates dataframe) or ‘significance’ (foreground PTMs are extracted from the provided spliced PTMs based on significance and minimum delta PSI).

Parameters:
ptms: pd.DataFrame

Dataframe with PTMs projected onto splicing events and with annotations appended from various databases. This can be either the spliced_ptms, altered_flanks, or combined dataframe.

annot_type: str

Type of annotation to retrieve. Default is ‘Function’.

database: str

Source of annotations. Default is ‘PhosphoSitePlus’.

background_type: str

Type of background to construct. Options are either ‘all’ (all PTMs in proteome) or ‘significance’ (only PTMs in dataset). Default is ‘all’. Note that the significance option assumes that PTMs have not already been filtered for significance.

collapse_on_similar: bool

Whether to collapse similar annotations (for example, “cell growth, increased” and “cell growth, decreased”) into a single category. Default is False.

mod_class: str

Type of modification to perform enrichment for.

min_dpsi: float

Minimum change in PSI required to return a PTM as associated with the annotation. Default is 0.1. This can be used to filter out PTMs that are not significantly spliced.

alpha: float

Significance threshold to use to filter PTMs based on their significance. Default is 0.05. This can be used to filter out PTMs that are not significantly spliced.

kwargs: additional keyword arguments

Additional keyword arguments to pass to the filter_ptms() function from the helper module. These will be used to filter ptms with lower evidence. For example, if you want to filter PTMs based on the number of MS observations, you can add ‘min_MS_observations = 2’ to the kwargs. This will filter out any PTMs that have fewer than 2 MS observations. See the filter_ptms() function for more options.
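A sketch of the ‘significance’ option as described for annotation_enrichment() (foreground: significantly spliced PTMs; background: every PTM in the provided dataset). The "Significance" and "dPSI" keys mirror the sig_col and dpsi_col defaults of gene_set_enrichment(); the "Annotations" list, the inclusive thresholds (<= alpha, >= min_dpsi), and the function name are assumptions.

```python
from collections import Counter
from typing import Dict, List, Tuple


def significance_enrichment_inputs(
    ptms: List[Dict],
    alpha: float = 0.05,
    min_dpsi: float = 0.1,
) -> Tuple[Counter, int, Counter, int]:
    """Return (foreground counts, foreground size, background counts, background size)."""

    def is_significant(rec: Dict) -> bool:
        return rec["Significance"] <= alpha and abs(rec["dPSI"]) >= min_dpsi

    foreground = [rec for rec in ptms if is_significant(rec)]
    # set() so a PTM counts at most once per annotation
    fg_counts = Counter(a for rec in foreground for a in set(rec["Annotations"]))
    bg_counts = Counter(a for rec in ptms for a in set(rec["Annotations"]))
    return fg_counts, len(foreground), bg_counts, len(ptms)


ptms = [
    {"Significance": 0.01, "dPSI": 0.3, "Annotations": ["cell growth"]},
    {"Significance": 0.2, "dPSI": 0.5, "Annotations": ["cell growth", "apoptosis"]},
    {"Significance": 0.01, "dPSI": -0.05, "Annotations": ["apoptosis"]},
    {"Significance": 0.03, "dPSI": -0.4, "Annotations": []},
]
fg, n_fg, bg, n_bg = significance_enrichment_inputs(ptms)
```

For ‘cell growth’ this yields k = 1, n = 2, K = 2, N = 4, which are exactly the four counts a hypergeometric test needs.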

ptm_pose.analyze.annotations.get_ptm_annotations(ptms, annot_type='Function', database='PhosphoSitePlus', collapse_on_similar=False, min_dpsi=0.1, alpha=0.05, **kwargs)

Given spliced ptm information obtained from the project and annotate modules, extract the PTMs in the spliced ptms that are associated with the requested annotation type.

Parameters:
ptms: pd.DataFrame

PTMs projected onto splicing events and with annotations appended from various databases

annot_type: str

Type of annotation to pull from spliced_ptms dataframe. Available information depends on the selected database. Default is ‘Function’.

database: str

database from which PTMs are pulled. Options include ‘PhosphoSitePlus’, ‘ELM’, or ‘PTMInt’. ELM and PTMInt data will automatically be downloaded, but due to download restrictions, PhosphoSitePlus data must be manually downloaded and annotated in the spliced_ptms data using functions from the annotate module. Default is ‘PhosphoSitePlus’.

collapse_on_similar: bool

Whether to collapse similar annotations (for example, “cell growth, increased” and “cell growth, decreased”) into a single category. Default is False.

min_dpsi: float

Minimum change in PSI required to return a PTM as associated with the annotation. Default is 0.1. This can be used to filter out PTMs that are not significantly spliced.

alpha: float

Significance threshold to use to filter PTMs based on their significance. Default is 0.05. This can be used to filter out PTMs that are not significantly spliced.

kwargs: additional keyword arguments

Additional keyword arguments to pass to the filter_ptms() function from the helper module. These will be used to filter ptms with lower evidence. For example, if you want to filter PTMs based on the number of MS observations, you can add ‘min_MS_observations = 2’ to the kwargs. This will filter out any PTMs that have fewer than 2 MS observations. See the filter_ptms() function for more options.

Returns:
annotations: pd.DataFrame

Individual PTM information of PTMs that have been associated with the requested annotation type.

annotation_counts: pd.Series or pd.DataFrame

Number of PTMs associated with each annotation of the requested annotation type. If dPSI col is provided or impact col is present, will output annotation counts for each type of impact (‘Included’, ‘Excluded’, ‘Altered Flank’) separately.

ptm_pose.analyze.annotations.plot_EnrichR_pies(enrichr_results, top_terms=None, terms_to_plot=None, colors=None, edgecolor=None, row_height=0.3, type='circle', ax=None)

Given PTM-specific EnrichR results, plot the EnrichR score for the provided terms, with each data point represented as a pie chart indicating the fraction of genes in the group with PTMs.

Parameters:
enrichr_results: pd.DataFrame

DataFrame containing PTM-specific results from EnrichR analysis

top_terms: int

number of terms to plot. If None, will plot all terms. Ignored if specific terms are provided in the terms_to_plot list.

terms_to_plot: list

list of terms to plot

colors: list

list of colors to use for pie slices. Default is None, which will use seaborn colorblind palette

edgecolor: str

color to use for edge of pie slices. Default is None, which will use the same color as the pie slice

row_height: float

height of each row in the plot. Default is 0.3.

type: str

type of pie chart to plot. Default is ‘circle’. Options include ‘circle’ and ‘donut’ (hole in center).

ax: matplotlib.Axes

axis to plot on, if None, will create new figure

ptm_pose.analyze.annotations.plot_annotation_counts(spliced_ptms=None, altered_flanks=None, database='PhosphoSitePlus', annot_type='Function', collapse_on_similar=True, colors=None, fontsize=10, top_terms=5, legend=False, legend_loc=(1.05, 0.5), title=None, title_type='database', ax=None, **kwargs)

Given a dataframe with PTM annotations added, plot the top annotations associated with the PTMs

Parameters:
spliced_ptms: pd.DataFrame

Dataframe with differentially included PTMs

altered_flanks: pd.DataFrame

Dataframe with PTMs associated with altered flanking sequences

database: str

Database to use for annotations. Default is ‘PhosphoSitePlus’.

annot_type: str

Type of annotation to plot. Default is ‘Function’.

collapse_on_similar: bool

Whether to collapse similar annotations into a single category. Default is True.

colors: list

List of colors to use for the bar plot. Default is None.

top_terms: int

Number of top terms to plot. Default is 5.

legend: bool

Whether to show the legend. Default is False.

legend_loc: tuple

Location of the legend. Default is (1.05, 0.5), which places the legend to the right of the plot.

ax: matplotlib.Axes

Axis to plot on. If None, will create new figure. Default is None.

title_type: str

Type of title to use for the plot. Default is ‘database’. Options include ‘database’ and ‘detailed’.

title: str

Title to use for the plot. Default is None.

fontsize: int

Font size for the plot. Default is 10.

**kwargs: additional keyword arguments

Additional keyword arguments to pass to the filter_ptms() function from the helper module, which are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.

ptm_pose.analyze.annotations.plot_available_annotations(ptms, only_annotations_in_data=False, show_all_ptm_count=False, ax=None, **kwargs)

Given a dataframe with ptm annotations added, show the number of PTMs associated with each annotation type

Parameters:
ptms: pd.DataFrame

Dataframe with PTMs and annotations added

only_annotations_in_data: bool

Only plot annotations that are already appended to the dataset. Default is False.

show_all_ptm_count: bool

Whether to show the total number of PTMs in the dataset. Default is False.

ax: matplotlib.Axes

Axis to plot on. If None, will create new figure. Default is None.

Protein Interactions

ptm_pose.analyze.interactions.get_edge_colors(interaction_graph, network_data, defaultedgecolor='gray', color_edges_by='Database', database_color_dict={'Multiple': 'purple', 'PSP/RegPhos': 'red', 'PTMInt': 'gold', 'PTMcode': 'blue', 'PhosphoSitePlus': 'green'})

Get the edge colors to use for a provided networkx graph, either plotting all edges the same color or coloring them based on the database they are from.

Parameters:
interaction_graph: nx.Graph

networkx graph containing the interaction network

network_data: pd.DataFrame

specific network edge data that contains information on which database the interaction is from and any other relevant information (such as regulation change)

defaultedgecolor: str

Default color to use for edges if no specific database color is found. Default is ‘gray’.

color_edges_by: str

How to color the edges. Options are ‘Database’ to color by the database they are from, or ‘Same’ to color all edges the same color. Default is ‘Database’.

database_color_dict: dict

Colors to use for specific databases
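A minimal sketch of the edge-coloring logic in plain Python. It assumes a hypothetical mapping from each edge to the list of databases reporting it, and assumes (based on the ‘Multiple’ key in the default dictionary) that edges reported by more than one database receive the ‘Multiple’ color. Plain tuples stand in for the networkx graph so the sketch runs on its own.

```python
DATABASE_COLORS = {'Multiple': 'purple', 'PSP/RegPhos': 'red', 'PTMInt': 'gold',
                   'PTMcode': 'blue', 'PhosphoSitePlus': 'green'}


def edge_colors(edges, edge_sources, color_edges_by="Database",
                default="gray", color_dict=DATABASE_COLORS):
    """Return one color per edge, in the same order as `edges` (illustrative sketch)."""
    if color_edges_by == "Same":
        return [default] * len(edges)
    colors = []
    for u, v in edges:
        # The network is undirected, so look the edge up in either orientation
        sources = edge_sources.get((u, v)) or edge_sources.get((v, u)) or []
        if len(sources) > 1:
            label = "Multiple"  # assumption: several databases -> 'Multiple'
        elif sources:
            label = sources[0]
        else:
            label = None
        # Databases without an entry in color_dict fall back to the default color
        colors.append(color_dict.get(label, default))
    return colors


edges = [("A", "B"), ("B", "C"), ("C", "D")]
sources = {("A", "B"): ["PTMcode"], ("C", "B"): ["PhosphoSitePlus", "PTMInt"]}
print(edge_colors(edges, sources))  # ['blue', 'purple', 'gray']
```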

ptm_pose.analyze.interactions.get_interaction_stats(interaction_graph)

Given the networkx interaction graph, calculate various network centrality measures to identify the most relevant PTMs or genes in the network
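For reference, degree centrality as normalized by networkx (`networkx.degree_centrality`) is a node's degree divided by n - 1, where n is the number of nodes. The standard-library sketch below computes it from a plain edge list and ranks the top N nodes, similar to what plot_network_centrality() displays. Betweenness and closeness centrality are available in networkx as `betweenness_centrality` and `closeness_centrality`; whether get_interaction_stats() uses exactly these functions is an assumption.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Degree / (n - 1) for each node of an undirected edge list (assumes no self-loops)."""
    neighbors = defaultdict(set)
    for u, v in edges:
        # Sets collapse duplicate edges between the same pair of nodes
        neighbors[u].add(v)
        neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbors.items()}


def top_n(centrality, n=10):
    """Highest-centrality nodes first; ties broken alphabetically."""
    return sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


c = degree_centrality([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
print(top_n(c, 2))  # [('A', 1.0), ('B', 0.6666666666666666)]
```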

ptm_pose.analyze.interactions.plot_interaction_network(interaction_graph, network_data, network_stats=None, modified_color='red', modified_node_size=10, interacting_color='lightblue', interacting_node_size=1, defaultedgecolor='gray', color_edges_by='Same', seed=200, legend_fontsize=8, ax=None, proteins_to_label=None, labelcolor='black', legend=True)

Given the interaction graph and network data output from analyze.protein_interactions, plot the interaction network, indicating which proteins or PTMs are altered by splicing and the specific regulation change that occurs. By default, only proteins are labeled.

Parameters:
interaction_graph: nx.Graph

NetworkX graph object representing the interaction network, created from analyze.get_interaction_network

network_data: pd.DataFrame

Dataframe containing details about specific protein interactions (including which protein contains the spliced PTMs)

network_stats: pd.DataFrame

Dataframe containing network statistics for each protein in the interaction network, obtained from analyze.get_interaction_stats(). Default is None, which will not label any proteins in the network.

modified_color: str

Color to use for proteins that are spliced. Default is ‘red’.

modified_node_size: int

Size of nodes that are spliced. Default is 10.

interacting_color: str

Color to use for proteins that are not spliced. Default is ‘lightblue’.

interacting_node_size: int

Size of nodes that are not spliced. Default is 1.

defaultedgecolor: str

Color to use for edges in the network. Default is ‘gray’.

seed: int

Seed to use for spring layout of network. Default is 200.

ax: matplotlib.Axes

Axis to plot on. If None, will create new figure. Default is None.

proteins_to_label: list, int, or str

Specific proteins to label in the network. If list, will label all proteins in the list. If int, will label the top N proteins by degree centrality. If str, will label the specific protein. Default is None, which will not label any proteins in the network.

labelcolor: str

Color to use for labels. Default is ‘black’.
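The list/int/str handling of proteins_to_label described above can be sketched as follows. The helper name `resolve_labels` and its degree-dictionary input are hypothetical, and ties in degree are broken alphabetically here, which may differ from the package.

```python
def resolve_labels(proteins_to_label, degree):
    """Turn a list, int, or str into the list of node names to label (hypothetical helper)."""
    if proteins_to_label is None:
        return []  # default: label nothing
    if isinstance(proteins_to_label, int):
        # int: top N nodes by degree, ties broken alphabetically
        ranked = sorted(degree, key=lambda node: (-degree[node], node))
        return ranked[:proteins_to_label]
    if isinstance(proteins_to_label, str):
        return [proteins_to_label]  # str: a single protein
    return list(proteins_to_label)  # list: label exactly these proteins


degree = {"A": 3, "B": 2, "C": 2, "D": 1}
print(resolve_labels(2, degree))  # ['A', 'B']
```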

ptm_pose.analyze.interactions.plot_network_centrality(network_stats, network_data=None, centrality_measure='Degree', top_N=10, modified_color='red', interacting_color='black', ax=None)

Given the network statistics data obtained from analyze.get_interaction_stats(), plot the top N proteins in the protein interaction network based on centrality measure (Degree, Betweenness, or Closeness)

Parameters:
network_stats: pd.DataFrame

Dataframe containing network statistics for each protein in the interaction network, obtained from analyze.get_interaction_stats()

network_data: pd.DataFrame

Dataframe containing information on which proteins are spliced and how they are altered. Default is None, which will plot all proteins the same color (interacting_color)

centrality_measure: str

Centrality measure to use for plotting. Default is ‘Degree’. Options include ‘Degree’, ‘Degree Centrality’, ‘Betweenness Centrality’, ‘Closeness Centrality’.

top_N: int

Number of top proteins to plot. Default is 10.

modified_color: str

Color to use for proteins that are spliced. Default is ‘red’.

interacting_color: str

Color to use for proteins that are not spliced. Default is ‘black’.

ax: matplotlib.Axes

Axis to plot on. If None, will create new figure. Default is None.

class ptm_pose.analyze.interactions.protein_interactions(spliced_ptms, include_enzyme_interactions=True, interaction_databases=['PhosphoSitePlus', 'PTMcode', 'PTMInt', 'RegPhos', 'DEPOD', 'OmniPath'], **kwargs)

Class to assess interactions facilitated by PTMs in splicing network

Parameters:
spliced_ptms: pd.DataFrame

Dataframe with PTMs projected onto splicing events and with annotations appended from various databases

include_enzyme_interactions: bool

Whether to include interactions with enzymes in the network, such as kinase-substrate interactions. Default is True

interaction_databases: list

List of databases to draw interactions from when building the network. Default is [‘PhosphoSitePlus’, ‘PTMcode’, ‘PTMInt’, ‘RegPhos’, ‘DEPOD’, ‘OmniPath’]

**kwargs: additional keyword arguments

Additional keyword arguments to pass to the filter_ptms() function from the helper module, which are used to filter out PTMs with less supporting evidence. For example, to filter PTMs based on the number of MS observations, add ‘min_MS_observations = 2’ to the kwargs; this removes any PTMs with fewer than 2 MS observations. See the filter_ptms() function for more options.

Attributes:
interaction_graph: nx.Graph

NetworkX graph object representing the interaction network, created from analyze.get_interaction_network

network_data: pd.DataFrame

Dataframe containing details about specific protein interactions (including which protein contains the spliced PTMs)

network_stats: pd.DataFrame

Dataframe containing network statistics for each protein in the interaction network, obtained from analyze.get_interaction_stats().

Methods

compare_to_nease(nease_edges)

Compare the network edges generated by NEASE to the network edges generated from PTM-driven interactions

get_interaction_network([node_type])

Given the spliced PTM data, extract interaction information and construct a dataframe containing all possible interactions driven by PTMs, either centered around specific PTMs or specific genes.

get_interaction_stats()

Given the networkx interaction graph, calculate various network centrality measures to identify the most relevant PTMs or genes in the network

get_protein_interaction_network(protein)

Given a specific protein, return the network data for that protein

plot_interaction_network([modified_color, ...])

Given the interaction graph and network data output from analyze.get_interaction_network, plot the interaction network, indicating which proteins or PTMs are altered by splicing and the specific regulation change that occurs.

plot_nease_comparison([ax, nease_edges])

Given the comparison data generated by compare_to_nease, plot the number of edges identified by NEASE and PTM-POSE

plot_network_centrality([...])

Plot the centrality measure for the top N proteins in the interaction network based on a specified centrality measure.

summarize_protein_network(protein)

Given a protein of interest, summarize the network data for that protein

compare_to_nease(nease_edges)

Compare the network edges generated by NEASE to the network edges generated from PTM-driven interactions

Parameters:
nease_edges: pd.DataFrame

Interactions found by NEASE, which is the output of nease.get_edges() function after running NEASE (see nease_runner module for example)

Returns:
nease_comparison: pd.DataFrame

Dataframe containing all edges found by NEASE and PTM-POSE. This includes edges that are unique to NEASE, unique to PTM-POSE, and common to both.
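The three-way comparison can be sketched with set operations on undirected edges. This assumes both edge sets are already available as (gene, gene) pairs; the actual NEASE output is a DataFrame, so the pairs would first need to be extracted from its columns.

```python
def compare_edges(nease_edges, pose_edges):
    """Split undirected edges into NEASE-only, PTM-POSE-only, and shared sets (sketch)."""
    # frozenset makes (A, B) and (B, A) the same undirected edge
    nease = {frozenset(e) for e in nease_edges}
    pose = {frozenset(e) for e in pose_edges}
    return {
        "NEASE only": nease - pose,
        "PTM-POSE only": pose - nease,
        "Both": nease & pose,
    }


result = compare_edges([("A", "B"), ("C", "D")], [("B", "A"), ("E", "F")])
print({k: [sorted(e) for e in v] for k, v in result.items()})
```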

get_interaction_network(node_type='Gene')

Given the spliced PTM data, extract interaction information and construct a dataframe containing all possible interactions driven by PTMs, either centered around specific PTMs or specific genes.

Parameters:
node_type: str

How to define nodes in the network. Either ‘PTM’, which treats each PTM as a separate node, or ‘Gene’, which aggregates information across all PTMs of a single gene into a single node. Default is ‘Gene’
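Aggregating PTM-level nodes into gene-level nodes (node_type=‘Gene’) can be sketched as follows. The sketch assumes a hypothetical ‘GENE_SITE’ naming scheme for PTM nodes, such as ‘GENE1_S10’; the package's internal node labels may differ.

```python
def collapse_to_genes(ptm_edges, sep="_"):
    """Merge PTM-level edges (e.g. 'GENE1_S10' -> 'GENE2') into gene-level edges (sketch)."""
    gene_edges = {}
    for ptm, partner in ptm_edges:
        gene = ptm.rsplit(sep, 1)[0]  # drop the hypothetical site suffix
        # Record which PTMs support each gene-level edge
        gene_edges.setdefault((gene, partner), set()).add(ptm)
    return gene_edges


ptm_edges = [("GENE1_S10", "GENE2"), ("GENE1_T15", "GENE2"), ("GENE3_Y7", "GENE2")]
gene_edges = collapse_to_genes(ptm_edges)
print(gene_edges)
```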

get_interaction_stats()

Given the networkx interaction graph, calculate various network centrality measures to identify the most relevant PTMs or genes in the network

get_protein_interaction_network(protein)

Given a specific protein, return the network data for that protein

Parameters:
protein: str

Gene name of the protein of interest

Returns:
protein_network: pd.DataFrame

Dataframe containing network data for the protein of interest

plot_interaction_network(modified_color='red', modified_node_size=10, interacting_color='lightblue', interacting_node_size=5, defaultedgecolor='gray', color_edges_by='Same', seed=200, ax=None, proteins_to_label=None, labelcolor='black', legend=True, include_nease_comparison=False)

Given the interaction graph and network data output from analyze.get_interaction_network, plot the interaction network, indicating which proteins or PTMs are altered by splicing and the specific regulation change that occurs. By default, only proteins are labeled.

Parameters:
modified_color: str

Color to use for nodes that are modified by splicing. Default is ‘red’

modified_node_size: int

Size of nodes that are modified by splicing. Default is 10

interacting_color: str

Color to use for nodes that interact with modified nodes. Default is ‘lightblue’

interacting_node_size: int

Size of nodes that interact with modified nodes. Default is 5

defaultedgecolor: str

Color to use for edges in the network. Default is ‘gray’. Can choose to color by database by providing a dictionary with database names as keys and colors as values or by specifying ‘database’ as color

color_edges_by: str

How to color edges in the network. Default is ‘Same’, which will color all edges the same color. Can also specify ‘Database’ to color edges based on the database they are from. If using ‘Database’, please provide a dictionary with database names as keys and colors as values in defaultedgecolor parameter.

seed: int

Seed to use for random number generator. Default is 200

ax: matplotlib.pyplot.Axes

Axes object to plot the network. Default is None

plot_nease_comparison(ax=None, nease_edges=None)

Given the comparison data generated by compare_to_nease, plot the number of edges identified by NEASE and PTM-POSE

Parameters:
ax: matplotlib.pyplot.Axes

axes to plot on

nease_edges: pd.DataFrame, optional

Interactions found by NEASE, which is the output of nease.get_edges() function after running NEASE (see nease_runner module for example). Only needed if you have not run compare_to_nease() previously

plot_network_centrality(centrality_measure='Degree', top_N=10, modified_color='red', interacting_color='black', ax=None)

Plot the centrality measure for the top N proteins in the interaction network based on a specified centrality measure. This will help identify the most relevant PTMs/genes in the network.

Parameters:
centrality_measure: str

How to calculate centrality. Options include ‘Degree’, ‘Degree Centrality’, ‘Betweenness Centrality’, ‘Closeness Centrality’, and ‘Eigenvector Centrality’. Default is ‘Degree’.

top_N: int

Number of top proteins to plot based on the centrality measure. Default is 10.

modified_color: str

Color to use for proteins that are spliced. Default is ‘red’.

interacting_color: str

Color to use for proteins that are not spliced. Default is ‘black’.

ax: matplotlib.pyplot.Axes

Axes object to plot the centrality bar plot. If None, a new figure will be created. Default is None.

summarize_protein_network(protein)

Given a protein of interest, summarize the network data for that protein

Parameters:
protein: str

Gene name of the protein of interest

Enzyme Regulation

ptm_pose.analyze.enzyme.compare_KL_for_sequence(inclusion_seq, exclusion_seq, dpsi=None, comparison_type='percentile')

Given two sequences, compare the kinase library scores, percentiles, or ranks for each sequence. Optionally, provide a dPSI value to calculate the relative change in preference for each kinase.

Parameters:
inclusion_seq: str

sequence to score for inclusion preference, with modification lowercased

exclusion_seq: str

sequence to score for exclusion preference, with modification lowercased

dpsi: float

dPSI value for the PTM event, which will be used to calculate the relative change in preference for each kinase (score difference * dPSI). Default is None.

comparison_type: str

type of comparison to perform. Can be ‘percentile’, ‘score’, or ‘rank’. Default is ‘percentile’.
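The dPSI weighting mentioned above (score difference * dPSI) can be sketched with plain dictionaries. The kinase names and percentile values are hypothetical, and taking the difference as inclusion minus exclusion is an assumption about the sign convention.

```python
def relative_preference_change(inclusion_scores, exclusion_scores, dpsi=None):
    """Per-kinase (inclusion - exclusion) difference, optionally weighted by dPSI (sketch)."""
    changes = {}
    # Only compare kinases scored for both sequences
    for kinase in inclusion_scores.keys() & exclusion_scores.keys():
        diff = inclusion_scores[kinase] - exclusion_scores[kinase]
        changes[kinase] = diff * dpsi if dpsi is not None else diff
    return changes


inc = {"KIN_A": 90.0, "KIN_B": 40.0}  # hypothetical percentiles, inclusion sequence
exc = {"KIN_A": 60.0, "KIN_B": 55.0}  # hypothetical percentiles, exclusion sequence
changes = relative_preference_change(inc, exc, dpsi=0.5)
print(changes)
```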

ptm_pose.analyze.enzyme.get_all_KL_scores(seq_data, seq_col, kin_type=['ser_thr', 'tyrosine'], score_type='percentiles')

Given a dataset with flanking sequences, score each flanking sequence with KinaseLibrary

Parameters:
seq_data: pandas.DataFrame

processed dataframe containing flanking sequences to score

seq_col: str

column in seq_data containing the flanking sequences

kin_type: list

list of kinase types to score. Can be ‘ser_thr’ or ‘ST’ for serine/threonine kinases, or ‘tyrosine’ or ‘Y’ for tyrosine kinases. Default is [‘ser_thr’, ‘tyrosine’].

score_type: str

type of score to calculate. Can be ‘percentile’, ‘score’, or ‘rank’. Default is ‘percentiles’.

Returns:
merged_data: dict

dictionary containing the merged dataframes for each kinase type with KinaseLibrary scores

Flanking Sequences

ptm_pose.analyze.flanking_sequences.compare_inclusion_motifs(flanking_sequences, elm_classes=None)

Given a DataFrame containing flanking sequences with changes and a DataFrame containing ELM class information, identify motifs found in the inclusion and exclusion events, including motifs unique to each case. This does not take into account the position of the motif in the sequence or additional information that might validate a potential interaction (e.g. structural information indicating whether the motif is accessible). ELM class information can be downloaded from the ELM download page (http://elm.eu.org/elms/elms_index.tsv).

Parameters:
flanking_sequences: pandas.DataFrame

DataFrame containing flanking sequences with changes, obtained from get_flanking_changes_from_splice_data()

elm_classes: pandas.DataFrame

DataFrame containing ELM class information (ELMIdentifier, Regex, etc.), downloaded directly from ELM (http://elm.eu.org/elms/elms_index.tsv). Downloading this file and providing it manually is recommended; if it is not provided, it will be downloaded from ELM.

Returns:
flanking_sequences: pandas.DataFrame

DataFrame containing flanking sequences with changes and motifs found in the inclusion and exclusion events
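The inclusion/exclusion comparison reduces to set differences over the motifs matched in each sequence. The sketch below uses a hypothetical dictionary of ELM-style identifiers and regular expressions in place of the ELM DataFrame, and uppercases each sequence first (an assumption) so that the lowercased modified residue can still match.

```python
import re

# Hypothetical ELM-style classes: identifier -> regular expression
ELM_LIKE = {"MOTIF_A": r"R..[ST]", "MOTIF_B": r"[ST]P"}


def motifs_in(seq, classes):
    """Identifiers of all classes whose regex matches anywhere in seq."""
    seq = seq.upper()  # the modified residue is lowercased in flanking sequences
    return {name for name, pattern in classes.items() if re.search(pattern, seq)}


def compare_motifs(inclusion_seq, exclusion_seq, classes):
    inc = motifs_in(inclusion_seq, classes)
    exc = motifs_in(exclusion_seq, classes)
    return {"inclusion only": inc - exc, "exclusion only": exc - inc, "shared": inc & exc}


result = compare_motifs("AARGAsPLLKE", "AAKGAsPLLKE", ELM_LIKE)
print(result)
```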

ptm_pose.analyze.flanking_sequences.findAlteredPositions(seq1, seq2, flank_size=5)

Given two sequences, identify the positions that have changed

Parameters:
seq1, seq2: str

sequences to compare (order does not matter)

flank_size: int

size of the flanking sequences (default is 5). This is used to make sure the provided sequences are the correct length

Returns:
altered_positions: list

list of positions that have changed

residue_change: list

list of residues that have changed associated with that position

flank_side: str

indicates which side of the flanking sequence the change has occurred (N-term, C-term, or Both)
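A minimal sketch of the position comparison, assuming both sequences are 2 * flank_size + 1 residues long with the PTM at the center (position 0), consistent with the -5 to +5 positions used elsewhere in this module. The ‘X->Y’ format for residue changes is a hypothetical choice, and in this sketch a change at the central residue is reported as ‘Both’.

```python
def find_altered_positions(seq1, seq2, flank_size=5):
    """Return (positions relative to the PTM, residue changes, altered side) (sketch)."""
    expected = 2 * flank_size + 1
    if len(seq1) != expected or len(seq2) != expected:
        raise ValueError(f"both sequences must be {expected} residues long")
    positions, changes = [], []
    for i, (a, b) in enumerate(zip(seq1, seq2)):
        if a != b:
            positions.append(i - flank_size)  # PTM site sits at position 0
            changes.append(f"{a}->{b}")
    if not positions:
        side = None
    elif all(p < 0 for p in positions):
        side = "N-term"
    elif all(p > 0 for p in positions):
        side = "C-term"
    else:
        side = "Both"  # changes on both sides (or at the center itself)
    return positions, changes, side


print(find_altered_positions("AAAAAsAAAAA", "AAGAAsAAAAK"))
# ([-3, 5], ['A->G', 'A->K'], 'Both')
```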

ptm_pose.analyze.flanking_sequences.find_motifs(seq, elm_classes)

Given a sequence and a dataframe containing ELM class information, identify motifs that can be found in the provided sequence using the regular expressions provided by ELM (PTMs not considered). This does not take into account the position of the motif in the sequence or additional information that might validate a potential interaction (e.g. structural information indicating whether the motif is accessible). ELM class information can be downloaded from the ELM download page (http://elm.eu.org/elms/elms_index.tsv).

Parameters:
seq: str

sequence to search for motifs

elm_classes: pandas.DataFrame

DataFrame containing ELM class information (ELMIdentifier, Regex, etc.), downloaded directly from ELM (http://elm.eu.org/elms/elms_index.tsv)
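Regex-based motif search can be sketched with Python's `re` module. In place of the ELM DataFrame, this assumes a hypothetical dictionary of identifiers to patterns; with the real file, the pairs could come from zipping its ELMIdentifier and Regex columns. Note that `re.finditer` reports non-overlapping matches only.

```python
import re


def find_motif_matches(seq, elm_classes):
    """Return (identifier, start, matched_text) for every non-overlapping regex hit (sketch)."""
    hits = []
    for identifier, pattern in elm_classes.items():
        # Uppercase so a lowercased modified residue is treated like any other residue
        for m in re.finditer(pattern, seq.upper()):
            hits.append((identifier, m.start(), m.group()))
    return hits


print(find_motif_matches("SPAASPKR", {"PRO_DIR": "[ST]P"}))
# [('PRO_DIR', 0, 'SP'), ('PRO_DIR', 4, 'SP')]
```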

ptm_pose.analyze.flanking_sequences.getSequenceIdentity(seq1, seq2)

Given two flanking sequences, calculate the sequence identity between them using Biopython and parameters defined by Pillman et al. BMC Bioinformatics 2011

Parameters:
seq1, seq2: str

flanking sequences to compare

Returns:
normalized_score: float

normalized score of sequence similarity between flanking sequences (calculated similarity/max possible similarity)

ptm_pose.analyze.flanking_sequences.plot_alterations_matrix(altered_flanks, modification_class=None, residue=None, title='', ax=None)

Given the altered flanking sequences dataframe, plot a matrix showing the positions of altered residues for specific proteins, as well as the specific change

Parameters:
altered_flanks: pd.DataFrame

Dataframe with altered flanking sequences, and annotated information added with analyze.compare_flanking_sequences

modification_class: str

Specific modification class to plot. Default is None, which will plot all modification classes.

residue: str

Specific residue to plot. Default is None, which will plot all residues.

title: str

Title of the plot. Default is ‘’ (no title).

ax: matplotlib.Axes

Axis to plot on. If None, will create new figure. Default is None.

ptm_pose.analyze.flanking_sequences.plot_location_of_altered_flanking_residues(altered_flanks, figsize=(4, 3), modification_class=None, residue=None)

Plot the number of PTMs with altered residues at specific positions relative to the PTM site. This includes the specific position of the residue (-5 to +5 from the PTM site) and the side of the PTM site that is altered (N-term or C-term)

Parameters:
altered_flanks: pd.DataFrame

Dataframe with altered flanking sequences, and annotated information added with analyze.compare_flanking_sequences

figsize: tuple

Size of the figure. Default is (4,3).

modification_class: str

Specific modification class to plot. Default is None, which will plot all modification classes.

residue: str

Specific residue to plot. Default is None, which will plot all residues.

ptm_pose.analyze.flanking_sequences.plot_sequence_differences(inclusion_seq, exclusion_seq, dpsi=None, flank_size=5, figsize=(3, 1))

Given the flanking sequences for a PTM resulting from a specific splice event, plot the differences between the two sequences, coloring the changed residues in red. If dPSI is also provided, an arrow is added to the plot indicating the direction of change

Parameters:
inclusion_seq: str

Sequence of the inclusion isoform (with spliced region included)

exclusion_seq: str

Sequence of the exclusion isoform (with spliced region excluded)

dpsi: float

Change in PSI for the specific splice event. Default is None, which will not add an arrow to the plot.

flank_size: int

Size of flanking region to plot. Default is 5. This must be less than half the length of the shortest sequence.

figsize: tuple

Size of the figure. Default is (3,1).