KSTAR
The “Config” Module
- kstar.config.create_network_pickles(phosphoTypes=['Y', 'ST'], network_directory='./NETWORKS/NetworKIN')
Given the network files declared in globals, create pickles of the kstar object that can then be loaded quickly during analysis. Assumes that the network structure has two folders, Y and ST, under the NETWORK_DIR global variable, and that all .csv files in those directories should be loaded into a network pickle.
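The folder layout described above (one Y and one ST subfolder, every .csv inside loaded into a pickle) can be sketched with the standard library. This is a minimal illustration, not KSTAR's implementation. The function name build_network_pickles, the output file name network_<type>.p, and keying each network by its CSV file name are all assumptions.

```python
import csv
import pickle
from pathlib import Path


def build_network_pickles(network_directory, phospho_types=("Y", "ST")):
    """Hypothetical sketch: pickle every .csv under <network_directory>/<type>/.

    Returns {phospho_type: path_of_pickle}. The output name network_<type>.p
    is an assumption, not KSTAR's actual file layout.
    """
    root = Path(network_directory)
    written = {}
    for ptype in phospho_types:
        networks = {}
        # Assumption: one network per .csv file, keyed by its file name stem
        for csv_path in sorted((root / ptype).glob("*.csv")):
            with open(csv_path, newline="") as handle:
                networks[csv_path.stem] = list(csv.DictReader(handle))
        out_path = root / f"network_{ptype}.p"
        with open(out_path, "wb") as handle:
            pickle.dump(networks, handle)
        written[ptype] = out_path
    return written
```

Reading a result back with pickle.load yields a dict of networks per phospho type. Real KSTAR networks are pandas DataFrames rather than lists of row dicts.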
- kstar.config.install_resource_files()
Retrieves the RESOURCE_FILES that accompany this version release from FigShare and unzips them into the correct directory for resource files.
- kstar.config.update_network_directory(directory, create_pickles=True, KSTAR_DIR='/home/srcrowl/miniconda3/envs/documentation/lib/python3.10/site-packages', NETWORK_DIR='./NETWORKS/NetworKIN')
Update the location of the network files, and verify that all necessary files are located in directory.
- Parameters
- directory: string
path to where network files are located
The “Prune” Module
The “Pruner” Class
- class kstar.prune.Pruner(network, logger, phospho_type='Y')
Pruning Algorithm used for KSTAR.
- Parameters
- network: pandas df
kinase-site prediction network where there is an accession, site, kinase, and score column
- logger
logger used for pruning
- phospho_type: str
phospho_type(s) to use when building pruned networks
- columns: dict
relevant columns in network
Methods
build_multiple_compendia_networks(...[, ...]): Builds multiple compendia-limited networks
build_pruned_network(network, kinase_size, ...): Builds a heuristic pruned network where each kinase has a specified number of connected sites and each site has an upper limit to the number of kinases it can connect to
calculate_compendia_sizes(kinase_size): Calculates the number of sites per compendia size that a kinase should connect to, using the same ratios of compendia sizes as found in the compendia
compendia_pruned_network(compendia_sizes, ...): Builds a compendia-pruned network that takes into account compendia size limits per kinase
- build_multiple_compendia_networks(kinase_size, site_limit, num_networks, network_id, odir, PROCESSES=1)
Builds multiple compendia-limited networks
- Parameters
- kinase_size: int
number of sites each kinase should connect to
- site_limit: int
upper limit of number of kinases a site can connect to
- num_networks: int
number of networks to build
- network_id: str
id to use for each network in dictionary
- Returns
- pruned_networks: dict
key: <network_id>_<i>; value: pruned network
- build_pruned_network(network, kinase_size, site_limit)
Builds a heuristic pruned network where each kinase has a specified number of connected sites and each site has an upper limit to the number of kinases it can connect to.
- Parameters
- network: pandas DataFrame
network to build pruned network on
- kinase_size: int
number of sites each kinase should connect to
- site_limit: int
upper limit of number of kinases a site can connect to
- Returns
- pruned_network: pandas DataFrame
subset of network that has been pruned
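The two constraints (kinase_size sites per kinase, at most site_limit kinases per site) can be made concrete with a stdlib sketch that works on (kinase, site, score) tuples instead of a pandas DataFrame. The round-robin order and the score-weighted random draw are assumptions chosen for illustration. KSTAR's own heuristic may select sites differently, and prune_network is a hypothetical name.

```python
import random
from collections import defaultdict


def prune_network(edges, kinase_size, site_limit, seed=0):
    """Hypothetical sketch of heuristic pruning under two degree constraints.

    edges: iterable of (kinase, site, score) tuples with score > 0.
    Returns a list of (kinase, site) pairs where each kinase keeps at most
    kinase_size sites and each site is used by at most site_limit kinases.
    """
    rng = random.Random(seed)
    candidates = defaultdict(dict)      # kinase -> {site: score}
    for kinase, site, score in edges:
        candidates[kinase][site] = score
    site_degree = defaultdict(int)      # site -> kinases already attached
    selected = {kinase: [] for kinase in candidates}
    kinases = sorted(candidates)
    # Round-robin: each kinase draws one site per round, so no kinase claims
    # all of the shared high-scoring sites before the others get a turn.
    for _ in range(kinase_size):
        rng.shuffle(kinases)
        for kinase in kinases:
            chosen = set(selected[kinase])
            pool = [(s, w) for s, w in candidates[kinase].items()
                    if site_degree[s] < site_limit and s not in chosen]
            if not pool:
                continue  # this kinase cannot reach kinase_size under site_limit
            sites, weights = zip(*pool)
            site = rng.choices(sites, weights=weights, k=1)[0]
            selected[kinase].append(site)
            site_degree[site] += 1
    return [(k, s) for k in sorted(selected) for s in selected[k]]
```

Drawing one site per kinase per round spreads heavily shared sites across kinases before site_limit is reached. A greedy top-score pass would instead favor whichever kinase is processed first.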
- calculate_compendia_sizes(kinase_size)
Calculates the number of sites per compendia size that a kinase should connect to, using the same ratios of compendia sizes as found in the compendia
- Parameters
- kinase_size: int
number of sites each kinase should connect to
- Returns
- sizes: dict
key: compendia size; value: number of sites each kinase should pull from the given compendia size
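The ratio-preserving split described for calculate_compendia_sizes can be illustrated with largest-remainder rounding, which keeps the per-size counts summing exactly to kinase_size. The rounding rule and the name allocate_by_compendia are assumptions, and KSTAR may round differently.

```python
import math


def allocate_by_compendia(kinase_size, site_counts):
    """Hypothetical sketch: split kinase_size across compendia sizes in the
    same proportions as site_counts ({compendia_size: number_of_sites}).

    Largest-remainder rounding guarantees the parts sum to kinase_size.
    """
    total = sum(site_counts.values())
    exact = {size: kinase_size * n / total for size, n in site_counts.items()}
    sizes = {size: math.floor(x) for size, x in exact.items()}
    leftover = kinase_size - sum(sizes.values())
    # Give the remaining sites to the sizes with the largest fractional parts
    by_remainder = sorted(exact, key=lambda s: exact[s] - sizes[s], reverse=True)
    for size in by_remainder[:leftover]:
        sizes[size] += 1
    return sizes
```

For example, a network whose sites split 50/30/20 across compendia sizes 0, 1, and 2 gives a kinase_size of 10 as {0: 5, 1: 3, 2: 2}.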
- compendia_pruned_network(compendia_sizes, site_limit, odir)
Builds a compendia-pruned network that takes into account compendia size limits per kinase
- Parameters
- compendia_sizes: dict
key: compendia size; value: number of sites to connect to kinase
- site_limit: int
upper limit of number of kinases a site can connect to
- Returns
- pruned_network: pandas DataFrame
subset of network that has been pruned according to compendia ratios
Functions to Perform Pruning
- kstar.prune.run_pruning(network, log, use_compendia, phospho_type, kinase_size, site_limit, num_networks, network_id, odir, PROCESSES=1)
Generate pruned networks from a weighted kinase-substrate graph and log run information
- Parameters
- network: pandas dataframe
kinase substrate network matrix, with values indicating weight of kinase-substrate relationship
- log: logger
logger to document the pruning process from start to finish
- use_compendia: string
whether to use compendia ratios to build network
- phospho_type: string
phospho type (‘Y’, ‘ST’, …)
- kinase_size: int
number of sites a kinase connects to
- site_limit: int
upper limit on the number of kinases a site can connect to
- num_networks: int
number of networks to generate
- network_id: string
name of network to use in building dictionary
- odir: string
output directory for results
- Returns
- pruner: Pruner object
Pruner object that contains the number of pruned networks indicated by the num_networks parameter
- kstar.prune.save_pruning(phospho_type, network_id, kinase_size, site_limit, use_compendia, odir, log)
Save the pruned networks generated by the run_pruning function as a pickle to be loaded by KSTAR
- Parameters
- phospho_type: string
type of phosphomodification the networks were generated for (either ‘Y’ or ‘ST’)
- network_id: string
name of network used to build dictionary
- kinase_size: int
number of sites a kinase connects to
- site_limit: int
upper limit on the number of kinases a site can connect to
- use_compendia: string
whether compendia was used for ratios to build networks
- odir: string
output directory for results
- log: logger
logger to document pruning process from start to finish
- Returns
- Nothing
- kstar.prune.save_run_information(results, use_compendia, pruner)
Save information about the generation of networks during run_pruning, including the parameters used for generation. Primarily used when running bash script.
- Parameters
- results:
object that stores all parameters used in the pruning process
- use_compendia: string
whether compendia was used for ratios to build network
- pruner: Pruner object
output of the run_pruning() function
- Returns
- Nothing
The “ExperimentMapper” class
- class kstar.mapping.ExperimentMapper(experiment, columns, logger, sequences=None, compendia=None, window=7, data_columns=None)
Given an experiment object and reference sequences, map the phosphorylation sites to the common reference.
- Parameters
- name: str
Name of experiment. Used for logging
- experiment: pandas dataframe
Pandas dataframe of an experiment that has a reference accession, a peptide column and/or a site column. The peptide column should be upper case, with lower case indicating the site of phosphorylation (this is preferred). The site column should be in the format S/T/Y<pos>, e.g. Y15 or S345.
- columns: dict
Dictionary with mappings of the experiment dataframe column names for the required names ‘accession_id’, ‘peptide’, or ‘site’. One of ‘peptide’ or ‘site’ is required.
- logger: Logger object
used for logging when peptides cannot be matched and when a site location changes
- sequences: dict
Dictionary of sequences. Key : accession. Value : protein sequence. Default is imported from kstar.config
- compendia: pd.DataFrame
Human phosphoproteome compendia, mapped to KinPred and annotated with number of compendia. Default is imported from kstar.config
- window: int
The number of amino acids on each of the N- and C-terminal sides of the central phosphorylated residue to map a site to. Default is 7.
- data_columns: list, or empty
The list of data columns to use. If this is empty, all columns whose names start with ‘data:’ are used. Default is None.
- Attributes
- experiment: pandas dataframe
mapped experiment, which now contains, for each peptide, the mapped accession, site, peptide, number of compendia, and compendia type
- sequences: dict
Dictionary of sequences passed into the class
- compendia: pandas dataframe
compendia dataframe passed into the class
- data_columns: list
indicates which columns will be used as data
Methods
align_sites([window]): Map the peptide/sites to the common sequence reference and remove and report errors for sites that do not align as expected.
Return the mapped experiment dataframe
get_sequence(accession): Gets the sequence that matches the given accession
set_data_columns(data_columns): Identifies which columns in the experiment should be used as data columns.
- align_sites(window=7)
Map the peptide/sites to the common sequence reference and remove and report errors for sites that do not align as expected. Operates on the experiment dataframe of the class, e.g. expMapper.align_sites(window=7).
- Parameters
- window: int
The number of amino acids on each of the N- and C-terminal sides of the central phosphorylated residue to map a site to.
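The core of aligning a peptide (upper case, with the phosphorylated residue in lower case) to a reference sequence can be sketched as follows. The sketch derives the site label in the S/T/Y<pos> format and the surrounding ±window flank. locate_site is a hypothetical name, padding the termini with "_" is an assumption, and only the first occurrence of the peptide in the sequence is used.

```python
def locate_site(sequence, peptide, window=7):
    """Hypothetical sketch: align one peptide to a reference protein sequence.

    peptide is upper case except for the phosphorylated residue, e.g. "AyIAK".
    Returns (site, flank): site such as "Y5" (1-based position) and flank, the
    `window` residues on each side of the site, padded with "_" at the termini.
    Only the first occurrence of the peptide is used.
    """
    lower = [i for i, aa in enumerate(peptide) if aa.islower()]
    if len(lower) != 1:
        raise ValueError("expected exactly one lower-case phosphorylated residue")
    start = sequence.find(peptide.upper())
    if start == -1:
        raise ValueError("peptide does not align to the reference sequence")
    pos = start + lower[0]                    # 0-based index of the phosphosite
    site = f"{sequence[pos]}{pos + 1}"
    padded = "_" * window + sequence + "_" * window
    flank = padded[pos:pos + 2 * window + 1]  # the site sits at padded[pos + window]
    return site, flank
```

A peptide that cannot be found raises ValueError. This mirrors the idea of reporting sites that do not align rather than silently dropping them.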
- set_data_columns(data_columns)
Identifies which columns in the experiment should be used as data columns. If data_columns is provided, ‘data:’ is added to the front of each name and the experiment dataframe is renamed accordingly. Otherwise, the function looks for columns that start with ‘data:’ and assigns these to the data_columns attribute.
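The ‘data:’ convention can be sketched on plain column-name lists. resolve_data_columns is a hypothetical helper. Leaving names that already carry the prefix unchanged, rather than prefixing them twice, is an assumption.

```python
def resolve_data_columns(columns, data_columns=None, prefix="data:"):
    """Hypothetical sketch of the 'data:' naming convention.

    With data_columns: return (rename_map, new_names), adding the prefix to
    names that lack it. Without: return ({}, columns that already start with it).
    """
    if data_columns:
        rename = {c: c if c.startswith(prefix) else prefix + c for c in data_columns}
        return rename, list(rename.values())
    return {}, [c for c in columns if c.startswith(prefix)]
```

The returned rename map has the {old: new} shape that pandas DataFrame.rename(columns=...) accepts.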
The “KinaseActivity” class
- class kstar.calculate.KinaseActivity(evidence, logger, data_columns=None, phospho_type='Y')
KinaseActivity calculates the estimated activity of kinases given an experiment, using the hypergeometric distribution. The hypergeometric test compares the number of protein sites found to be active in the evidence with the number of protein sites attributed to a kinase in a provided network.
- Parameters
- evidence: pandas df
a dataframe that contains (at minimum, but can have more) data columns as evidence to use in analysis, plus KSTAR_ACCESSION and KSTAR_SITE
- data_columns: list
list of the columns containing the abundance values, which will be used to determine which sites will be used as evidence for activity prediction in each sample
- logger: Logger object
keeps track of kstar analysis, including any errors that occur
- phospho_type: string, either ‘Y’ or ‘ST’
indicates the phospho-modification of interest
- Attributes
- Upon Initialization:
- evidence: pandas dataframe
input evidence dataframe
- data_columns: list
list of columns containing abundance values, which will be used to determine which sites will be used as evidence. If the data_columns parameter was None, this list includes every column in evidence prefixed by ‘data:’
- logger: Logger object
keeps track of kstar analysis, including any errors that occur
- phospho_type: string
indicates the phospho-modification of interest
- network_directory: string
directory where kinase substrate networks can be downloaded, as indicated in config.py
- normalized: bool
indicates whether normalization analysis has been performed
- aggregate: string
the type of aggregation to use when determining binary evidence, either ‘count’ or ‘mean’. Default is ‘count’.
- threshold: float
cutoff to use when determining what sites to use for each experiment
- greater: bool
indicates whether sites with greater or lower abundances than the threshold will be used
- run_data: string
indicates the date that kinase activity object was initialized
- After Hypergeometric Calculations:
- activities_list: pandas dataframe
p-values obtained for all pruned networks indicating statistical enrichment of a kinase’s substrates for each network, based on hypergeometric tests
- activities: pandas dataframe
median p-values obtained from the activities_list object for each experiment/kinase
- agg_activities: pandas dataframe
- After Random Enrichment Calculation:
- random_experiments: pandas dataframe
contains information about the sites randomly sampled for each random experiment
- random_kinact: KinaseActivity object
KinaseActivity object containing random activities predicted from each of the random experiments
- After Mann Whitney Analysis:
- activities_mann_whitney: pandas dataframe
p-values obtained from comparing the real distribution of p-values to the distribution of p-values from random datasets, based on the Mann-Whitney U test
- fpr_mann_whitney: pandas dataframe
false positive rates for predicted kinase activities
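The hypergeometric comparison in the class description can be sketched for a single kinase in a single network. In this sketch the population is every site in the network, the successes are the sites attributed to the kinase, and the draws are the evidence sites that fall in the network. The test asks how likely an overlap at least this large would be by chance. Restricting the evidence to the network is an assumption, and kinase_enrichment and hypergeom_sf are hypothetical names.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K successes, n draws)."""
    upper = min(K, n)
    total = comb(N, n)
    # Sum exact integer counts first, then divide once
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def kinase_enrichment(network_sites, kinase_sites, evidence_sites):
    """Hypothetical sketch: one-sided enrichment p-value for one kinase/network.

    network_sites  -> population (all sites in the pruned network)
    kinase_sites   -> successes (sites the network assigns to the kinase)
    evidence_sites -> draws (sites seen in one data column, limited to the network)
    """
    population = set(network_sites)
    successes = set(kinase_sites) & population
    draws = set(evidence_sites) & population
    overlap = len(successes & draws)
    return hypergeom_sf(overlap, len(population), len(successes), len(draws))
```

If SciPy is available, scipy.stats.hypergeom.sf(k - 1, M, n, N) computes the same upper tail using SciPy's parameter names: M is the population size, n the number of successes, and N the number of draws.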
Methods
calculate_Mann_Whitney_activities_sig(log[, ...]): For a kinact_dict, where random generation and activity has already been run for the phospho_types of interest, this will calculate the Mann-Whitney U test for comparing the array of p-values for real data to those of random data, across the number of networks used.
calculate_kinase_activities([agg, ...]): Calculates the combined activity of experiments, using a threshold value to determine whether an experiment sees a site. To aggregate by value, use 'mean' as agg (mean aggregation drops NA values from consideration); to aggregate by count, use 'count' as agg (a site is present if its value is not NA).
calculate_random_activities(logger[, ...]): Generate random experiments and calculate the kinase activities for these random experiments
check_data_columns(): Checks data columns to make sure column is in evidence and that evidence filtered on that data column has at least one point of evidence.
create_binary_evidence([agg, threshold, greater]): Returns a binary evidence dataframe according to the parameters passed in, which set the method for aggregating duplicates and whether a site is included as evidence.
Return the date that kinase activities were run.
set_data_columns([data_columns]): Sets the data columns to use in the kinase activity calculation. If data_columns is None or an empty list, data_columns is set to all columns that start with ‘data:’.
- calculate_Mann_Whitney_activities_sig(log, number_sig_trials=100, PROCESSES=1)
For a kinact_dict, where random generation and activity has already been run for the phospho_types of interest, this will calculate the Mann-Whitney U test for comparing the array of p-values for real data to those of random data, across the number of networks used. It will also calculate the false positive rate for a pvalue, given observations of a random bootstrapping analysis
- Parameters
- kinact_dict: dictionary
A dictionary of kinact objects, with keys ‘Y’ and/or ‘ST’
- log: logger
Logger for logging activity messages
- phospho_types: {[‘Y’, ‘ST’], [‘Y’], [‘ST’]}
Which substrate/kinase type to run activity for: both [‘Y’, ‘ST’] (default), tyrosine [‘Y’], or serine/threonine [‘ST’]
- number_sig_trials: int
Maximum number of significant trials to run
- Returns
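The Mann-Whitney comparison between the per-network p-values from real data and those from random data can be sketched with the standard library. The sketch uses the one-sided alternative that real-data p-values tend to be smaller, which is an assumption about the direction KSTAR tests. It applies the normal approximation with tie and continuity corrections and does not compute exact small-sample p-values. In practice scipy.stats.mannwhitneyu(x, y, alternative='less') is the usual tool. mann_whitney_less is a hypothetical name.

```python
import math


def mann_whitney_less(x, y):
    """One-sided Mann-Whitney U test, H1: values in x tend to be smaller than in y.

    Uses the normal approximation with tie and continuity corrections (no exact
    small-sample p-values). Needs at least one value in each sample.
    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    n = n1 + n2
    # Average (mid) rank for each distinct value; ties share the mean of their ranks
    mid_rank, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        mid_rank[pooled[i]] = (i + 1 + j) / 2     # mean of 1-based ranks i+1 .. j
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u_x = sum(mid_rank[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0                           # all values tied: no evidence either way
    z = (u_x - mean_u + 0.5) / math.sqrt(var_u)   # +0.5 is the continuity correction
    p_value = 0.5 * math.erfc(-z / math.sqrt(2))  # P(Z <= z) for a standard normal
    return u_x, p_value
```

Here x would hold one kinase's p-values across the pruned networks for the real data, and y the corresponding p-values from a random experiment.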
- calculate_kinase_activities(agg='mean', threshold=1.0, greater=True, PROCESSES=1)
Calculates the combined activity of experiments, using a threshold value to determine whether an experiment sees a site. To aggregate by value, use ‘mean’ as agg; mean aggregation drops NA values from consideration.
To aggregate by count, use ‘count’ as agg; a site is present if its value is not NA.
- Parameters
- data_columns: list
columns that represent experimental results. If None, takes the columns that start with ‘data:’ in the experiment. Pass this value in as a list if seeking to calculate on fewer than all available data columns
- threshold: float
threshold value used to filter rows
- agg: {‘count’, ‘mean’}
method to use when aggregating duplicate substrate-sites. ‘count’ combines multiple representations and adds if values are non-NaN; ‘mean’ uses the mean value of numerical data from multiple representations of the same peptide.
NA values are dropped from consideration.
- greater: Boolean
whether to keep sites that have a numerical value >=threshold (TRUE, default) or <=threshold (FALSE)
- Returns
- activities: dict
key: experiment; value: pd DataFrame
network: network name, from networks key; kinase: kinase examined; frequency: number of times kinase was seen in subgraph of evidence and network; kinase_activity: hypergeometric kinase activity
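The activities attribute is described as the median p-value across pruned networks for each experiment/kinase pair. The sketch below shows that reduction with the standard library. The records are hypothetical (data_column, kinase, network, p_value) tuples standing in for KSTAR's DataFrames, and median_activities is a hypothetical name.

```python
from collections import defaultdict
from statistics import median


def median_activities(records):
    """Hypothetical sketch: median p-value across networks per (data column, kinase).

    records: iterable of (data_column, kinase, network, p_value) tuples.
    """
    grouped = defaultdict(list)
    for column, kinase, _network, p_value in records:
        grouped[(column, kinase)].append(p_value)
    return {key: median(ps) for key, ps in grouped.items()}
```

Taking the median rather than the mean keeps one unusually favorable or unfavorable pruned network from dominating a kinase's score.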
- calculate_random_activities(logger, num_random_experiments=150, PROCESSES=1)
Generate random experiments and calculate the kinase activities for these random experiments
- check_data_columns()
Checks data columns to make sure each column is in the evidence and that the evidence filtered on that data column has at least one point of evidence. Removes all columns that do not meet these criteria.
- create_binary_evidence(agg='mean', threshold=1.0, greater=True)
Returns a binary evidence dataframe according to the parameters passed in, which set the method for aggregating duplicates and whether a site is included as evidence.
- Parameters
- threshold: float
threshold value used to filter rows
- agg: {‘count’, ‘mean’}
method to use when aggregating duplicate substrate-sites. ‘count’ combines multiple representations and adds if values are non-NaN; ‘mean’ uses the mean value of numerical data from multiple representations of the same peptide.
NA values are dropped from consideration.
- greater: Boolean
whether to keep sites that have a numerical value >=threshold (TRUE, default) or <=threshold (FALSE)
- Returns
- evidence_binary: pd.DataFrame
Matches the evidence dataframe of the kinact object, but with 0 or 1 indicating whether a site is included. Duplicate sites are collapsed into a single row, and rows that are never used are removed.
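The interaction of agg, threshold, and greater can be sketched for a single data column on plain tuples. binary_evidence is a hypothetical name. Treating ‘count’ as "number of non-missing measurements compared to threshold" is an interpretation of the description above, not confirmed behavior.

```python
import math
from collections import defaultdict
from statistics import mean


def binary_evidence(rows, agg="mean", threshold=1.0, greater=True):
    """Hypothetical sketch: collapse duplicate site measurements into 0/1 evidence.

    rows: iterable of (accession, site, value); value may be None or NaN.
    agg='mean'  -> average of the non-missing values is compared to threshold.
    agg='count' -> number of non-missing values is compared to threshold.
    Sites with no non-missing value are dropped. Returns {(accession, site): 0 or 1}.
    """
    if agg not in ("mean", "count"):
        raise ValueError("agg must be 'mean' or 'count'")
    observed = defaultdict(list)
    for accession, site, value in rows:
        if value is not None and not math.isnan(value):
            observed[(accession, site)].append(value)
    binary = {}
    for key, values in observed.items():
        score = mean(values) if agg == "mean" else len(values)
        keep = score >= threshold if greater else score <= threshold
        binary[key] = int(keep)
    return binary
```

With the defaults (agg='mean', threshold=1.0, greater=True), a site measured twice as 2.0 and 0.0 averages to 1.0 and is kept as evidence.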
The “DotPlot” class
- class kstar.plot.DotPlot(values, fpr, alpha=0.05, inclusive_alpha=True, binary_sig=True, dotsize=5, colormap={0: '#6b838f', 1: '#FF3300'}, facecolor='white', labelmap=None, legend_title='p-value', size_number=5, size_color='gray', color_title='Significant', markersize=10, legend_distance=1.0, figsize=(20, 4), title=None, xlabel=True, ylabel=True, x_label_dict=None, kinase_dict=None)
The DotPlot class is used for plotting dotplots, with the option to add clustering and context plots. The size of each dot is based on the values dataframe: the dot area is the value multiplied by dotsize.
- Parameters
- values: pandas DataFrame instance
values to plot
- fpr: pandas DataFrame instance
false positive rates associated with values being plotted
- alpha: float, optional
FPR value that defines the significance cutoff to use when plotting. Default: 0.05
- inclusive_alpha: boolean
whether to include the alpha (significance <= alpha), or not (significance < alpha). Default: True
- binary_sig: boolean, optional
indicates whether to plot FPR as binary significance or as a change in color hue. Default: True
- dotsize: float, optional
multiplier to use for scaling size of dots
- colormap: dict, optional
maps color values to the actual color to use in plotting. Default: {0: ‘#6b838f’, 1: ‘#FF3300’}
- labelmap: dict, optional
maps labels of colors; the default is to indicate the FPR cutoff in the legend. Default: None
- facecolor: color, optional
Background color of dotplot. Default: ‘white’
- legend_title: str, optional
Legend title for dot sizes. Default: ‘p-value’
- size_number: int, optional
Number of dots to attempt to generate for dot size legend
- size_color: color, optional
Size legend color to use
- color_title: str, optional
Legend title for the color legend
- markersize: int, optional
Size of dots for color legend
- legend_distance: float, optional
relative distance to place legends
- figsize: tuple, optional
size of dotplot figure
- title: str, optional
Title of dotplot
- xlabel: bool, optional
Show xlabel on graph if True
- ylabel: bool, optional
Show ylabel on graph if True
- x_label_dict: dict, optional
Mapping dictionary of labels as they appear in values dataframe (keys) to how they should appear on plot (values)
- kinase_dict: dict, optional
Mapping dictionary of kinase names as they appear in values dataframe (keys) to how they should appear on plot (values)
- Attributes
- values: pandas dataframe
a copy of the original values dataframe
- fpr: pandas dataframe
a copy of the original fpr dataframe
- alpha: float
cutoff used for significance, default 0.05
- inclusive_alpha: boolean
whether to include the alpha (significance <= alpha), or not (significance < alpha)
- significance: pandas dataframe
indicates whether a particular kinase's activity is significant, where fpr <= alpha is significant; otherwise it is insignificant
- colors: pandas dataframe
dataframe indicating the color to use when plotting: either a copy of the fpr or significance dataframe
- binary_sig: boolean
indicates whether coloring will be done based on binary significance or fpr values. Default True
- labelmap: dict
indicates how to label each significance color
- figsize: tuple
size of the outputted figure, which is overridden if axes is provided for dotplot
- title: string
title of the dotplot
- xlabel: boolean
indicates whether to plot x-axis labels
- ylabel: boolean
indicates whether to plot y-axis labels
- colormap: dict
colors to be used when plotting
- facecolor: string
background color of dotplot
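How each dotplot cell combines values, fpr, alpha, inclusive_alpha, dotsize, and colormap can be sketched without matplotlib. dot_layout is a hypothetical name, and the dict-of-cells input stands in for the values and fpr DataFrames. Only the binary_sig=True coloring is modeled.

```python
def dot_layout(values, fpr, alpha=0.05, inclusive_alpha=True, dotsize=5,
               colormap=None):
    """Hypothetical sketch of how each DotPlot cell could be encoded.

    values, fpr: {(kinase, column): number}. Returns {(kinase, column):
    (marker_area, significant, color)}, where marker_area = value * dotsize and
    significant is 1 when fpr <= alpha (fpr < alpha if inclusive_alpha is False).
    """
    if colormap is None:
        colormap = {0: "#6b838f", 1: "#FF3300"}   # documented DotPlot default
    layout = {}
    for key, value in values.items():
        rate = fpr[key]
        significant = int(rate <= alpha if inclusive_alpha else rate < alpha)
        layout[key] = (value * dotsize, significant, colormap[significant])
    return layout
```

The marker area maps naturally onto the s argument of matplotlib's scatter, which is specified in points squared. This is consistent with dot size being the area of value * dotsize.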
Methods
cluster(ax[, method, metric, orientation, ...]): Performs hierarchical clustering on data and plots result to provided Axes.
context(ax, info, id_column, context_columns): Context plot is generated and returned.
dotplot([ax, orientation, size_legend, ...]): Generates the dotplot plot, where size is determined by the values dataframe and color is determined by the significance dataframe
drop_kinases(kinase_list): Given a list of kinases, drop these from the dot.values dataframe in all future plotting of this object.
Drop kinases from the values dataframe (inplace) when plotting if they are never observed as significant
- cluster(ax, method='single', metric='euclidean', orientation='top', color_threshold=-inf)
Performs hierarchical clustering on data and plots the result to the provided Axes. The result and significance dataframes are ordered according to the clustering.
- Parameters
- ax: matplotlib Axes instance
Axes to plot dendrogram to
- method: str, optional
The linkage algorithm to use.
- metric: str or function, optional
The distance metric to use in the case that y is a collection of observation vectors; ignored otherwise. See the pdist function for a list of valid distance metrics. A custom distance function can also be used.
- orientation: str, optional
The direction to plot the dendrogram, which can be any of the following strings: ‘top’ plots the root at the top, with descendant links going downwards (default); ‘bottom’ plots the root at the bottom, with descendant links going upwards; ‘left’ plots the root at the left, with descendant links going right; ‘right’ plots the root at the right, with descendant links going left.
- context(ax, info, id_column, context_columns, dotsize=200, markersize=20, orientation='left', color_palette='colorblind', margin=0.2, make_legend=True)
Generates and returns the context plot, which contains the categorical data used for describing the data.
- Parameters
- ax: matplotlib axis
where to map subtype information to
- info: pandas df
Dataframe where context information is pulled from
- id_column: str
Column used to map the subtype information to
- context_columns: list
list of columns to pull context information from
- dotsize: int, optional
size of context dots
- markersize: int, optional
size of legend markers
- orientation: str, optional
orientation to plot context plots to; determines where legends are placed. Options: left, right, top, bottom
- color_palette: str, optional
seaborn color palette to use
- margin: float, optional
margin
- make_legend: bool, optional
whether to create legend for context colors
- dotplot(ax=None, orientation='left', size_legend=True, color_legend=True, max_size=None)
Generates the dotplot plot, where size is determined by the values dataframe and color is determined by the significance dataframe
- Parameters
- ax: matplotlib Axes instance, optional
axes dotplot will be plotted on. If None then new plot generated
Supporting Functions
Master Functions for Running KSTAR Pipeline
- kstar.calculate.Mann_Whitney_analysis(kinact_dict, log, number_sig_trials=100, PROCESSES=1)
For a kinact_dict, where random generation and activity has already been run for the phospho_types of interest, this will calculate the Mann-Whitney U test for comparing the array of p-values for real data to those of random data, across the number of networks used. It will also calculate the false positive rate for a pvalue, given observations of a random bootstrapping analysis
- Parameters
- kinact_dict: dictionary
A dictionary of kinact objects, with keys ‘Y’ and/or ‘ST’
- log: logger
Logger for logging activity messages
- number_sig_trials: int
Maximum number of significant trials to run
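The docstring does not specify how the false positive rate is derived from the random bootstrapping analysis. One common empirical definition, given here only as an assumption, is the fraction of p-values obtained from random data that are at or below the observed p-value. empirical_fpr is a hypothetical name.

```python
def empirical_fpr(real_p, random_ps):
    """Hypothetical sketch: share of null (random-data) p-values at or below real_p."""
    if not random_ps:
        raise ValueError("need at least one random p-value")
    return sum(p <= real_p for p in random_ps) / len(random_ps)
```

Under this definition, a kinase whose real p-value is beaten by random data in 2 of 4 trials gets an FPR of 0.5.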
- kstar.calculate.enrichment_analysis(experiment, log, networks, phospho_types=['Y', 'ST'], data_columns=None, agg='mean', threshold=1.0, greater=True, PROCESSES=1)
Establishes a kstar KinaseActivity object from an experiment using an activity log, adds the networks, and calculates, aggregates, and summarizes the hypergeometric enrichment into a final activity object. Should be followed by randomized_analysis, then Mann_Whitney_analysis.
- Parameters
- experiment: pandas df
experiment dataframe that has been mapped, includes KSTAR_SITE, KSTAR_ACCESSION, etc.
- log: logger object
Log to write activity log error and update to
- networks: dictionary of dictionaries
Outer dictionary keys are ‘Y’ and ‘ST’. Establish a network by loading a pickle of the desired networks (see the helpers and config file for this). If downloaded from FigShare, the GLOBAL network pickles in the config file can be loaded, for example: networks['Y'] = pickle.load(open(config.NETWORK_Y_PICKLE, 'rb'))
- phospho_types: {[‘Y’, ‘ST’], [‘Y’], [‘ST’]}
Which substrate/kinase type to run activity for: both [‘Y’, ‘ST’] (default), tyrosine [‘Y’], or serine/threonine [‘ST’]
- data_columns: list
columns that represent experimental results. If None, takes the columns that start with ‘data:’ in the experiment. Pass this value in as a list if seeking to calculate on fewer than all available data columns
- agg: {‘count’, ‘mean’}
method to use when aggregating duplicate substrate-sites. ‘count’ combines multiple representations and adds if values are non-NaN; ‘mean’ uses the mean value of numerical data from multiple representations of the same peptide.
NA values are dropped from consideration.
- threshold: float
- greater: Boolean
whether to keep sites that have a numerical value >=threshold (TRUE, default) or <=threshold (FALSE)
- Returns
- kinactDict: dictionary of Kinase Activity Objects
Outer keys are the phosphoTypes run (‘Y’ and ‘ST’). Each object includes the activities dictionary (see calculate_kinase_activities), the aggregation of activities across networks (see aggregate activities), and the activity summary (see summarize_activities).
- kstar.calculate.randomized_analysis(kinact_dict, log, num_random_experiments=150, PROCESSES=1)
Creates num_random_experiments random experiments for each data column, drawn from the human phosphoproteome so that they follow the distribution of compendia counts of the sites in that data column. Kinase activity calculation is then run on every random experiment.
- Parameters
- kinact_dict: KinaseActivities dictionary
Has keys [‘Y’] and/or [‘ST’] and values that are KinaseActivity objects. These objects are modified to add normalization
- log: logger
Logger for logging activity messages
- num_random_experiments: int
Number of random experiments, for each data column, to create and run activity from
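Matching the distribution of compendia counts amounts to stratified sampling: for each compendia count present in the evidence, draw the same number of sites with that count from the background phosphoproteome. random_experiment is a hypothetical name, and sampling without replacement within each stratum is an assumption.

```python
import random
from collections import Counter, defaultdict


def random_experiment(evidence_compendia, background, seed=None):
    """Hypothetical sketch: draw random sites matching a compendia-count profile.

    evidence_compendia: compendia count of each evidence site in one data column.
    background: {site: compendia_count} for the phosphoproteome to sample from.
    Sampling is without replacement within each compendia stratum.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for site, n_compendia in background.items():
        strata[n_compendia].append(site)
    drawn = []
    for n_compendia, needed in sorted(Counter(evidence_compendia).items()):
        # sorted() gives a reproducible population order for a fixed seed;
        # rng.sample raises ValueError if a stratum has too few sites
        drawn.extend(rng.sample(sorted(strata[n_compendia]), needed))
    return drawn
```

Matching on compendia count controls for how frequently a site is observed across studies. Otherwise random experiments would be biased toward rarely seen sites relative to real data.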
Functions for Saving and Loading KSTAR results
- kstar.calculate.from_kstar_nextflow(name, odir, log=None)
Given the name and output directory of a saved kstar analysis from the nextflow pipeline, load the results into a new kinact object with the minimum dataframes required for analysis (binary experiment, hypergeometric activities, normalized activities, Mann-Whitney activities).
- Parameters
- name: string
The name used when saving activities and mapped data
- odir: string
Output directory of saved files
- log: logger
logger used when loading nextflow data into kinase activity object. If not provided, new logger will be created.
- kstar.calculate.from_kstar_slim(name, odir, log)
Given the name and output directory of a saved kstar analysis, load the parameters and minimum dataframes needed for reinstantiating a kinact object. This minimum set allows you to repeat normalization or Mann-Whitney analysis at a different false positive rate threshold and to plot results.
- Parameters
- name: string
The name used when saving activities and mapped data
- odir: string
Output directory of saved files and parameter pickle
- log: logger
Logger for logging activity messages
- kstar.calculate.save_kstar(kinact_dict, name, odir, PICKLE=True)
Having performed the kinase activity analysis (run_kstar_analysis), save each of the important dataframes to files, along with the final pickle. Saves activities, aggregated_activities, and summarized_activities as tab-separated files, and saves a pickle file of the dictionary.
- Parameters
- kinact_dict: dictionary of Kinase Activity Objects
Outer keys are the phosphoTypes run (‘Y’ and ‘ST’). Each object includes the activities dictionary (see calculate_kinase_activities), the aggregation of activities across networks (see aggregate activities), and the activity summary (see summarize_activities).
- name: string
The name to use when saving activities
- odir: string
Output directory to save files and pickle to
- PICKLE: boolean
Whether to save the entire pickle file
- Returns
- Nothing
- kstar.calculate.save_kstar_slim(kinact_dict, name, odir)
Having performed the kinase activity analysis (run_kstar_analysis), save each of the important dataframes, minimizing the storage needed to rebuild a version for plotting results and analysis. For each phospho_type in the kinact_dict, this saves three .tsv files for every activities analysis run, two additional files if random analysis was run, and two more if Mann-Whitney-based analysis was run. It also creates a readme file of the parameter values used.
- Parameters
- kinact_dict: dictionary of Kinase Activity Objects
Outer keys are the phosphoTypes run (‘Y’ and ‘ST’). Each object includes the activities dictionary (see calculate_kinase_activities), the aggregation of activities across networks (see aggregate activities), and the activity summary (see summarize_activities).
- name: string
The name to use when saving activities
- odir: string
Output directory to save files and pickle to
- Returns
- Nothing
Other Helper Functions
- kstar.helpers.convert_acc_to_uniprot(df, acc_col_name, acc_col_type, acc_uni_name)
Given an experimental dataframe (df) with an accession column (acc_col_name) that is not UniProt, use UniProt to append an accession column of UniProt IDs
- Parameters
- df: pandas.DataFrame
Dataframe with at least one column of accessions of interest
- acc_col_name: string
name of column to convert FROM
- acc_col_type: string
Uniprot string designation of the accession type to convert FROM, see https://www.uniprot.org/help/api_idmapping
- acc_uni_name:
name of new column
- Returns
- appended_df: pandas.DataFrame
Input dataframe with an appended acc column of uniprot IDs
- kstar.helpers.get_logger(name, filename)
Finds and returns the logger if it exists. Creates a new logger if the log file does not exist.
- Parameters
- name: str
log name
- filename: str
location to store log file
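A get-or-create logger can be sketched with Python's standard logging module, because logging.getLogger returns the same object for the same name. This sketch decides whether to attach a file handler by checking for existing handlers rather than whether the log file exists, which differs from the description above. The name get_or_create_logger and the log format are assumptions.

```python
import logging


def get_or_create_logger(name, filename):
    """Sketch: return the named logger, attaching a file handler only once."""
    logger = logging.getLogger(name)   # same object for the same name
    if not logger.handlers:
        handler = logging.FileHandler(filename)
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger
```

Checking logger.handlers avoids attaching a second handler on repeated calls, which would otherwise write every message to the log twice.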