Integrate the Structure and Reference files

CoDIAC.IntegrateStructure_Reference.add_reference_info_to_struct_file(struct_file, ref_file, out_file, INTERPRO=True, verbose=False)

Given a PDB meta structure file and a UniProt reference file, integrate the two by adding information from the reference to the structure file.

Parameters:
struct_file: str

Name of structure reference file

ref_file: str

Name of reference file

out_file: str

Name of the output file to write

INTERPRO: boolean

If True, uses Interpro domains; otherwise appends the Uniprot domains from the reference file. Using Interpro is recommended: it is more inclusive of domain boundaries, has better naming conventions, and preserves the ability to use the Interpro ID for filtering structures that contain domains of interest.

verbose: boolean

Print information about processing. Default is False.

Returns:
out_struct: pandas dataframe

the appended dataframe of the structure (also written to out_file)
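
A call to this function might look like the following minimal sketch. The wrapper name, its use_interpro argument, and the file names in the comment are hypothetical; only the CoDIAC function name and its parameters come from the signature documented above.

```python
def integrate_structure_and_reference(struct_file, ref_file, out_file, use_interpro=True):
    """Hypothetical thin wrapper around the documented CoDIAC call."""
    # Imported lazily so this sketch can be loaded where CoDIAC is not installed.
    from CoDIAC import IntegrateStructure_Reference

    # Per the documentation above, this returns the appended pandas dataframe
    # and also writes the same table to out_file.
    return IntegrateStructure_Reference.add_reference_info_to_struct_file(
        struct_file, ref_file, out_file, INTERPRO=use_interpro, verbose=False
    )


# Example call (hypothetical file names):
# out_struct = integrate_structure_and_reference(
#     "structures.csv", "uniprot_reference.csv", "structures_annotated.csv"
# )
```

Leaving INTERPRO at its default of True follows the recommendation given for that parameter.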

CoDIAC.IntegrateStructure_Reference.filter_structure_file(appended_structure_file, Interpro_ID, filtered_structure_file)

Given an annotated structure file, keep only structures that have at least one chain that contains the Interpro_ID of interest.

Writes the filtered structure file to filtered_structure_file.

Parameters:
appended_structure_file: str

Location of the annotated structure file, that is, a structure file to which reference information has been appended, including the Interpro domains added to the UniProt reference file by InterPro.appendRefFile.

Interpro_ID: str

Interpro ID (controlled Interpro ID database identifier)

filtered_structure_file: str

Location to write the output file to. The output has the same column fields as the input, with the rows reduced to only those that contain the Interpro_ID of interest.
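
The filtering rule described above can be illustrated with a small standard-library sketch. This is not CoDIAC's implementation: the column names (pdb_id, chain, domains), the sample rows, and the substring match on the domain field are all assumptions made for illustration, and it takes one reading of the rule, in which every chain of a matching structure is kept.

```python
import csv
import io


def filter_structures(rows, interpro_id, pdb_col="pdb_id", domain_col="domains"):
    """Keep all rows of any structure with at least one chain annotated with interpro_id."""
    rows = list(rows)
    # Structures (grouped by PDB identifier) with at least one matching chain.
    keep = {row[pdb_col] for row in rows if interpro_id in row[domain_col]}
    # Retain every chain of those structures; columns are left unchanged.
    return [row for row in rows if row[pdb_col] in keep]


# Made-up annotated structure table, one row per chain (hypothetical layout).
sample = io.StringIO(
    "pdb_id,chain,domains\n"
    "1ABC,A,SH2:IPR000980\n"
    "1ABC,B,\n"
    "2XYZ,A,Kinase:IPR000719\n"
)
rows = list(csv.DictReader(sample))
kept = filter_structures(rows, "IPR000980")
for row in kept:
    print(row["pdb_id"], row["chain"])
```

In this sketch, both chains of the made-up structure 1ABC are kept because chain A carries the requested identifier, while 2XYZ is dropped entirely.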