DANSy
This is the base DANSy class that contains all the methods necessary to analyze a collection of proteins.
- class dansy.dansy(protsOI=None, ref=None, n=10, interproIDs=None, **kwargs)
A domain n-gram network built from either a list of proteins of interest or a reference file generated by CoDIAC. If InterPro IDs are provided, the class extracts n-grams that contain only those IDs. The default values generate a 10-gram network model.
- Parameters:
- protsOI: list
List of UniProt IDs whose n-grams are used to generate the n-gram network.
- ref: pandas DataFrame (Recommended)
DataFrame generated by CoDIAC that contains both InterPro and UniProt information.
- n: int (Optional)
Maximum length of the n-grams to be extracted (Default: 10).
- interproIDs: list (Optional)
List of InterPro IDs whose n-grams are to be extracted. If omitted, all n-grams are extracted.
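To make the role of n concrete, the following is a minimal sketch of extracting contiguous domain n-grams from a single domain architecture. It is not DANSy's implementation: the function name `extract_ngrams`, the tuple representation, and the placeholder IDs (`IPR_A`, `IPR_B`, `IPR_C`) are assumptions made only for illustration.

```python
from typing import List, Tuple


def extract_ngrams(architecture: List[str], n: int = 10) -> List[Tuple[str, ...]]:
    """Return every contiguous domain n-gram of length 1 up to n.

    `architecture` is assumed to be the ordered list of InterPro IDs
    of one protein (hypothetical representation, not DANSy's own).
    """
    ngrams = []
    max_len = min(n, len(architecture))
    for length in range(1, max_len + 1):
        # Slide a window of the current length across the architecture.
        for start in range(len(architecture) - length + 1):
            ngrams.append(tuple(architecture[start:start + length]))
    return ngrams


# Placeholder IDs standing in for three domains in sequence order.
example_arch = ["IPR_A", "IPR_B", "IPR_C"]
example_ngrams = extract_ngrams(example_arch, n=2)
```

With `n=2`, the sketch yields the three single domains followed by the two adjacent pairs; raising n to the default of 10 would also include the full three-domain architecture.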
- Attributes:
- G: networkx Graph
The network graph representation of the DANSy n-gram network
- ref: pandas DataFrame
The reference file information for the proteins within the dataset
- n: int
The maximum length of n-grams being extracted
- interproIDs: list
A list of all protein domain InterPro IDs that were found within the dataset
- protsOI: list
The UniProt IDs for the proteins found within the dataset
- ngrams: list
The extracted domain n-grams
- collapsed_ngrams: list
The domain n-grams that were collapsed into other n-grams representing the same set of proteins
- adj: pandas DataFrame
The adjacency matrix for the n-gram network for the DANSy analysis
- interpro2uniprot: dict
A mapping from each InterPro ID (key) to the list of UniProt IDs that contain that InterPro ID (value)
- min_arch: int (Default: 1)
The minimum number of domain architectures for an n-gram to be retained.
- max_node_len: int
The maximum n-gram length that will be retained during the collapsing step to represent n-grams sharing the same set of proteins. This cannot be larger than n (Default: 10).
- collapse: bool
Whether the n-grams were collapsed
- readable_flag: bool
Whether the n-grams are human-readable
- verbose: bool
Whether progress statements are to be printed during calculations
- network_params: dict
Key-value pairs of acceptable networkx drawing parameters
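The collapsing step described for `collapsed_ngrams` and `max_node_len` can be pictured as grouping n-grams by the set of proteins they occur in and keeping one representative per group. The sketch below is an assumption-laden illustration, not DANSy's algorithm: the function `collapse_ngrams`, the input mapping, and in particular the rule for choosing the representative (the longest n-gram no longer than `max_node_len`) are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Ngram = Tuple[str, ...]


def collapse_ngrams(
    ngram_to_proteins: Dict[Ngram, Set[str]], max_node_len: int = 10
) -> Tuple[List[Ngram], List[Ngram]]:
    """Group n-grams by protein set and keep one representative per group."""
    groups = defaultdict(list)
    for ngram, proteins in ngram_to_proteins.items():
        groups[frozenset(proteins)].append(ngram)

    kept, collapsed = [], []
    for members in groups.values():
        # Assumption: the representative is the longest n-gram that does
        # not exceed max_node_len; fall back to the shortest member if
        # every n-gram in the group is too long.
        eligible = [g for g in members if len(g) <= max_node_len]
        representative = max(eligible, key=len) if eligible else min(members, key=len)
        kept.append(representative)
        collapsed.extend(g for g in members if g != representative)
    return kept, collapsed


# Placeholder n-grams and protein IDs for illustration only.
toy = {
    ("IPR_A",): {"P1", "P2"},
    ("IPR_A", "IPR_B"): {"P1", "P2"},
    ("IPR_C",): {"P3"},
}
kept, collapsed = collapse_ngrams(toy, max_node_len=2)
```

Here `("IPR_A",)` and `("IPR_A", "IPR_B")` occur in the same two proteins, so only one of them is kept as a node and the other is recorded as collapsed.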
Methods
- draw_network()
Draws a basic version of the Graph using the networkx spring layout implementation. For full control over the drawing, it is recommended to use the networkx package directly.
- ngram_protein_count(ngram)
Returns the number of proteins in which an individual n-gram was found.
- retrieve_protein_info(prot=None, ngram=None)
Retrieves the reference information for the proteins of interest. If an InterPro ID or n-gram is provided instead, the proteins containing that ID or n-gram are looked up and their reference information is returned.
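The idea behind counting or looking up proteins by n-gram can be sketched in plain Python. This is only an illustration under stated assumptions, not how DANSy implements `ngram_protein_count` or `retrieve_protein_info`: the helper names, the dictionary of architectures, and the choice to match n-grams as contiguous runs of domains are all hypothetical.

```python
from typing import Dict, List, Sequence


def contains_ngram(architecture: Sequence[str], ngram: Sequence[str]) -> bool:
    """True if `ngram` appears as a contiguous run within `architecture`."""
    k = len(ngram)
    if k == 0:
        return False
    target = tuple(ngram)
    return any(
        tuple(architecture[i:i + k]) == target
        for i in range(len(architecture) - k + 1)
    )


def proteins_with_ngram(
    architectures: Dict[str, List[str]], ngram: Sequence[str]
) -> List[str]:
    """Return the protein IDs whose architecture contains the n-gram."""
    return [p for p, arch in architectures.items() if contains_ngram(arch, ngram)]


def count_proteins(architectures: Dict[str, List[str]], ngram: Sequence[str]) -> int:
    """Number of proteins in which the n-gram was found."""
    return len(proteins_with_ngram(architectures, ngram))


# Placeholder protein IDs and domain architectures for illustration only.
toy_archs = {
    "P1": ["IPR_A", "IPR_B", "IPR_C"],
    "P2": ["IPR_B", "IPR_A"],
    "P3": ["IPR_A", "IPR_B"],
}
hits = proteins_with_ngram(toy_archs, ("IPR_A", "IPR_B"))
```

In this toy data the n-gram `("IPR_A", "IPR_B")` is found in `P1` and `P3` but not in `P2`, where the two domains appear in the opposite order.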
DANSy Supporting Functions
These functions are integrated into DANSy. They do not need to be called directly, but they can aid in the initial setup of a DANSy object.