Probing where and how PTM flanking sequences are altered#
In order to understand how PTMs may be altered due to splicing events, it is useful to identify the flanking sequences of the PTMs and how they may be altered due to nearby splice events (as identified by flanking sequence module). Once we have, this information we can analyze and visualize where the alterations in the flanking sequences occur. First, we need to compare the flanking sequences of PTMs based on whether an exonic region is included or excluded using the
compare_flanking_sequences
function in PTM-POSE.
[1]:
from ptm_pose import analyze
import pandas as pd
# Load altered flank data
altered_flanks = pd.read_csv('altered_flanks.csv')
altered_flanks = analyze.compare_flanking_sequences(altered_flanks)
print('Comparison of flanking sequences:')
altered_flanks[['UniProtKB Accession', 'Residue', 'PTM Position in Isoform', 'Modification Class', 'Inclusion Flanking Sequence', 'Exclusion Flanking Sequence', 'Sequence Identity', 'Altered Positions', 'Residue Change', 'Altered Flank Side']].head()
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Cell In[1], line 5
2 import pandas as pd
4 # Load altered flank data
----> 5 altered_flanks = pd.read_csv('altered_flanks.csv')
7 altered_flanks = analyze.compare_flanking_sequences(altered_flanks)
8 print('Comparison of flanking sequences:')
File ~/miniconda3/envs/pose_doc/lib/python3.12/site-packages/pandas/io/parsers/readers.py:1026, in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
1013 kwds_defaults = _refine_defaults_read(
1014 dialect,
1015 delimiter,
(...)
1022 dtype_backend=dtype_backend,
1023 )
1024 kwds.update(kwds_defaults)
-> 1026 return _read(filepath_or_buffer, kwds)
File ~/miniconda3/envs/pose_doc/lib/python3.12/site-packages/pandas/io/parsers/readers.py:620, in _read(filepath_or_buffer, kwds)
617 _validate_names(kwds.get("names", None))
619 # Create the parser.
--> 620 parser = TextFileReader(filepath_or_buffer, **kwds)
622 if chunksize or iterator:
623 return parser
File ~/miniconda3/envs/pose_doc/lib/python3.12/site-packages/pandas/io/parsers/readers.py:1620, in TextFileReader.__init__(self, f, engine, **kwds)
1617 self.options["has_index_names"] = kwds["has_index_names"]
1619 self.handles: IOHandles | None = None
-> 1620 self._engine = self._make_engine(f, self.engine)
File ~/miniconda3/envs/pose_doc/lib/python3.12/site-packages/pandas/io/parsers/readers.py:1880, in TextFileReader._make_engine(self, f, engine)
1878 if "b" not in mode:
1879 mode += "b"
-> 1880 self.handles = get_handle(
1881 f,
1882 mode,
1883 encoding=self.options.get("encoding", None),
1884 compression=self.options.get("compression", None),
1885 memory_map=self.options.get("memory_map", False),
1886 is_text=is_text,
1887 errors=self.options.get("encoding_errors", "strict"),
1888 storage_options=self.options.get("storage_options", None),
1889 )
1890 assert self.handles is not None
1891 f = self.handles.handle
File ~/miniconda3/envs/pose_doc/lib/python3.12/site-packages/pandas/io/common.py:873, in get_handle(path_or_buf, mode, encoding, compression, memory_map, is_text, errors, storage_options)
868 elif isinstance(handle, str):
869 # Check whether the filename is to be opened in binary mode.
870 # Binary mode does not support 'encoding' and 'newline'.
871 if ioargs.encoding and "b" not in ioargs.mode:
872 # Encoding
--> 873 handle = open(
874 handle,
875 ioargs.mode,
876 encoding=ioargs.encoding,
877 errors=errors,
878 newline="",
879 )
880 else:
881 # Binary mode
882 handle = open(handle, ioargs.mode)
FileNotFoundError: [Errno 2] No such file or directory: 'altered_flanks.csv'
Note, we only calculate these metrics for cases where altered flanking sequences do not cause a stop codon to be introduced, as this is harder to interpret (such as for the first PTM in the list). The above table will indicate the positions in the flanking sequence that are altered, how similar the altered flanking sequence is to the original flanking sequence, and the specific residue change that takes place. We can also plot some of this information to get a better sense of the distribution of altered flanking sequences:
[2]:
from ptm_pose import plots as pose_plots
pose_plots.location_of_altered_flanking_residues(altered_flanks)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[2], line 3
1 from ptm_pose import plots as pose_plots
----> 3 pose_plots.location_of_altered_flanking_residues(altered_flanks)
NameError: name 'altered_flanks' is not defined
We can even create the same plot for specific modification types or residues, as well as label the specific residue changes that occur:
[3]:
pose_plots.location_of_altered_flanking_residues(altered_flanks, modification_class='Acetylation')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[3], line 1
----> 1 pose_plots.location_of_altered_flanking_residues(altered_flanks, modification_class='Acetylation')
NameError: name 'altered_flanks' is not defined
If we want to dig deeper, we can look at the specific changes that occurring, although this is only recommended with a selected subset of PTMs, such as those that may have a functional impact:
[4]:
pose_plots.alterations_matrix(altered_flanks.head(10))
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[4], line 1
----> 1 pose_plots.alterations_matrix(altered_flanks.head(10))
NameError: name 'altered_flanks' is not defined