DANSy

This is the base DANSy class that contains all the methods necessary to analyze a collection of proteins.

class dansy.dansy(protsOI=None, ref=None, n=10, interproIDs=None, **kwargs)

A domain n-gram network built from either a list of proteins of interest or a reference file generated by CoDIAC. If InterPro IDs are provided, n-grams that contain only those IDs are extracted. The default values generate a 10-gram network model.

Parameters:
protsOI: list

List of UniProt IDs whose n-grams are desired to generate the n-gram network.

ref: pandas DataFrame (Recommended)

DataFrame generated by CoDIAC that contains both InterPro and UniProt information.

n: int (Optional)

Maximum length of the n-grams to be extracted (default: 10)

interproIDs: list (Optional)

List of InterPro IDs for which to extract n-grams. If omitted, all n-grams will be extracted.
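To make the roles of n and interproIDs concrete, here is a minimal sketch of contiguous domain n-gram extraction from a single domain architecture. It is not the DANSy implementation: the function name extract_ngrams, the list-of-strings architecture representation, and the placeholder InterPro IDs are all assumptions made for illustration. It also assumes that the interproIDs filter keeps any n-gram containing at least one of the requested IDs, which is only one possible reading of the description above.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple


def extract_ngrams(
    architecture: Sequence[str],
    n: int = 10,
    interpro_ids: Optional[Iterable[str]] = None,
) -> Set[Tuple[str, ...]]:
    """Return every contiguous n-gram of length 1..n from one architecture.

    Hypothetical helper for illustration only; DANSy's own extraction
    may differ in representation and filtering rules.
    """
    keep = set(interpro_ids) if interpro_ids is not None else None
    ngrams: Set[Tuple[str, ...]] = set()
    # An n-gram cannot be longer than the architecture itself.
    for length in range(1, min(n, len(architecture)) + 1):
        for start in range(len(architecture) - length + 1):
            gram = tuple(architecture[start:start + length])
            # Assumed filter: keep n-grams containing at least one requested ID.
            if keep is None or keep.intersection(gram):
                ngrams.add(gram)
    return ngrams


# Placeholder three-domain architecture (IDs chosen only as examples).
arch = ["IPR001452", "IPR000980", "IPR000719"]
all_ngrams = extract_ngrams(arch, n=2)
filtered_ngrams = extract_ngrams(arch, n=2, interpro_ids=["IPR000980"])
```

With n=2 the sketch yields the three single domains plus the two adjacent pairs; the filtered call keeps only the n-grams that include the requested ID.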

Attributes:
G: networkx Graph

The network graph representation of the DANSy n-gram network

ref: pandas DataFrame

The reference file information for the proteins within the dataset

n: int

The maximum length of n-grams being extracted

interproIDs: list

A list of all protein domain InterPro IDs that were found within the dataset

protsOI: list

The UniProt IDs for the proteins found within the dataset

ngrams: list

The extracted domain n-grams

collapsed_ngrams: list

The domain n-grams that were collapsed into other n-grams representing the same set of proteins

adj: pandas DataFrame

The adjacency matrix for the n-gram network for the DANSy analysis

interpro2uniprot: dict

A mapping from each InterPro ID to the list of UniProt IDs of the proteins that contain it

min_arch: int (Default: 1)

The minimum number of domain architectures for an n-gram to be retained.

max_node_len: int

The maximum n-gram length retained during the collapsing step, in which a single n-gram represents all n-grams that share the same set of proteins. This value cannot exceed n (default: 10).

collapse: bool

Whether the n-grams were collapsed

readable_flag: bool

Whether the n-grams are human-legible

verbose: bool

Whether progress statements are to be printed during calculations

network_params: dict

Key-value pairs of acceptable networkx drawing parameters
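The collapsing step referred to by collapsed_ngrams and max_node_len can be pictured with the sketch below. It is an assumption-laden illustration, not DANSy's algorithm: the helper collapse_ngrams, the dictionary input, and the rule of keeping the longest n-gram (no longer than max_node_len) as the representative of each protein set are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Ngram = Tuple[str, ...]


def collapse_ngrams(
    ngram_to_proteins: Dict[Ngram, Set[str]],
    max_node_len: int = 10,
) -> Tuple[List[Ngram], List[Ngram]]:
    """Group n-grams by identical protein sets and keep one per group.

    Hypothetical rule: the representative is the longest n-gram whose
    length does not exceed max_node_len; ties are broken alphabetically.
    """
    groups: Dict[frozenset, List[Ngram]] = defaultdict(list)
    for ngram, proteins in ngram_to_proteins.items():
        groups[frozenset(proteins)].append(ngram)

    retained: List[Ngram] = []
    collapsed: List[Ngram] = []
    for members in groups.values():
        # Fall back to all members if none satisfies the length limit.
        candidates = [g for g in members if len(g) <= max_node_len] or members
        representative = max(candidates, key=lambda g: (len(g), g))
        retained.append(representative)
        collapsed.extend(g for g in members if g != representative)
    return sorted(retained), sorted(collapsed)


# Toy data: domains A and B always co-occur in proteins P1 and P2.
toy = {
    ("A",): {"P1", "P2"},
    ("B",): {"P1", "P2"},
    ("A", "B"): {"P1", "P2"},
    ("C",): {"P3"},
}
retained, collapsed = collapse_ngrams(toy)
```

In this toy case the 2-gram ("A", "B") stands in for its two constituent 1-grams because all three are found in exactly the same proteins.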

Methods

draw_network()

Draws a basic version of the Graph using the networkx spring layout implementation. For full control over the drawing, it is recommended to use the networkx package directly.

ngram_protein_count(ngram)

Returns the number of proteins an individual n-gram was found within.
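As an illustration of what such a count means, the sketch below tallies the proteins whose domain architecture contains a given n-gram as a contiguous run. The function names and the dictionary of architectures keyed by protein ID are hypothetical; the actual method operates on the data held by the DANSy object and may be implemented differently.

```python
from typing import Dict, Sequence, Tuple


def contains_ngram(architecture: Sequence[str], ngram: Tuple[str, ...]) -> bool:
    """True if ngram appears as a contiguous run within architecture."""
    k = len(ngram)
    return any(
        tuple(architecture[i:i + k]) == ngram
        for i in range(len(architecture) - k + 1)
    )


def count_proteins_with_ngram(
    architectures: Dict[str, Sequence[str]], ngram: Tuple[str, ...]
) -> int:
    """Count proteins (hypothetical input format) containing the n-gram."""
    return sum(contains_ngram(arch, ngram) for arch in architectures.values())


# Toy architectures keyed by made-up protein identifiers.
toy_architectures = {
    "P1": ["A", "B", "C"],
    "P2": ["B", "C"],
    "P3": ["C", "B"],
}
bc_count = count_proteins_with_ngram(toy_architectures, ("B", "C"))
c_count = count_proteins_with_ngram(toy_architectures, ("C",))
```

Domain order matters here: P3 contains both B and C but not the ordered n-gram ("B", "C"), so it is not counted.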

static retrieve_longest_arch(ref)

Retrieves the longest domain architecture length.

retrieve_protein_info(prot=None, ngram=None)

Retrieves the reference information for proteins of interest. If an InterPro ID or n-gram is provided instead, searches for the proteins that contain it and returns their reference information.

retrieve_random_ids(num, iters=50, seed=882)

Generator of random UniProt IDs from the base network.
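The sketch below shows one way a seeded generator of random ID sets could behave. It assumes, without confirmation from the documentation above, that num is the number of IDs per draw and iters is the number of draws. The function random_id_sets and the placeholder IDs are hypothetical and only mirror the parameter names of the method.

```python
import random
from typing import Iterable, Iterator, List


def random_id_sets(
    protein_ids: Iterable[str], num: int, iters: int = 50, seed: int = 882
) -> Iterator[List[str]]:
    """Yield `iters` random samples of `num` distinct IDs each.

    Hypothetical stand-in: the meaning of num and iters is assumed.
    """
    rng = random.Random(seed)   # Local RNG so the global state is untouched.
    pool = sorted(protein_ids)  # Sort first so results do not depend on input order.
    for _ in range(iters):
        yield rng.sample(pool, num)


# Placeholder identifiers, not real UniProt accessions.
ids = [f"PROT{i:03d}" for i in range(20)]
draws = list(random_id_sets(ids, num=3, iters=5))
```

Because the generator owns a seeded random.Random instance, repeating the call with the same seed reproduces the same sequence of draws.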

summary(detailed=False)

Output a summary of key features that are represented within the n-gram network.

DANSy Supporting Functions

These functions are integrated into DANSy. They do not need to be called directly, but they can aid in the initial setup of a DANSy object.

dansy.helper.import_proteome_files(ref_file_dir='./DANSY_DATA/', ref_file_suffix='20250512.csv')

Imports the files that are used for the generation of the reference dataframe of the complete canonical proteome.

Note: Need to adjust this so it looks in only one folder from here on out.
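The two parameters suggest that reference files are located by directory and file-name suffix. The sketch below illustrates that lookup pattern only; it does not read or merge any files, and the helper find_reference_files, along with the file names in the demonstration, are hypothetical rather than part of DANSy.

```python
import tempfile
from pathlib import Path
from typing import List


def find_reference_files(
    ref_file_dir: str = "./DANSY_DATA/", ref_file_suffix: str = "20250512.csv"
) -> List[Path]:
    """List files in one directory whose names end with the given suffix.

    Hypothetical helper: shows suffix-based lookup, nothing more.
    """
    directory = Path(ref_file_dir)
    if not directory.is_dir():
        return []
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name.endswith(ref_file_suffix)
    )


# Demonstration in a temporary directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("proteome_20250512.csv", "interpro_20250512.csv", "notes.txt"):
        (Path(tmp) / name).write_text("placeholder\n")
    matched_names = [p.name for p in find_reference_files(tmp)]
```

Restricting the search to a single directory, as the note above intends, keeps the result predictable when several dated reference sets are stored side by side.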