Integrate the Structure and Reference files
- CoDIAC.IntegrateStructure_Reference.add_reference_info_to_struct_file(struct_file, ref_file, out_file, INTERPRO=True, verbose=False)
Given a PDB structure metadata file and a UniProt reference file, integrate the two by adding information from the reference to the structure file.
- Parameters:
- struct_file: str
Name of the PDB structure file
- ref_file: str
Name of the UniProt reference file
- out_file: str
Name of the output file to write
- INTERPRO: boolean
If True, uses the InterPro domains; otherwise appends the UniProt domains from the reference file. Default is True. Using InterPro is recommended: it is more inclusive of domain boundaries, has better naming conventions, and preserves the ability to use the InterPro ID to filter for structures containing domains of interest.
- verbose: boolean
Print information about processing. Default is False.
- Returns:
- out_struct: pandas dataframe
The structure dataframe with the reference information appended (also written to out_file)
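A minimal usage sketch of the call documented above. The wrapper name `integrate`, its `use_interpro` argument, and the file names in the comments are hypothetical; only the function path and keyword arguments come from the signature shown here. The input-file check is an added convenience and is not described as part of CoDIAC.

```python
from pathlib import Path


def integrate(struct_file, ref_file, out_file, use_interpro=True, verbose=False):
    """Hypothetical wrapper around add_reference_info_to_struct_file."""
    # Fail early with a clear error if either input file is missing.
    for path in (struct_file, ref_file):
        if not Path(path).is_file():
            raise FileNotFoundError(f"Input file not found: {path}")

    # Imported lazily so this sketch can be loaded without CoDIAC installed.
    from CoDIAC import IntegrateStructure_Reference

    # Returns the appended dataframe; the same data is written to out_file.
    return IntegrateStructure_Reference.add_reference_info_to_struct_file(
        str(struct_file),
        str(ref_file),
        str(out_file),
        INTERPRO=use_interpro,
        verbose=verbose,
    )


# Example call (placeholder file names):
# out_struct = integrate("structures.csv", "reference.csv", "annotated.csv")
```

Leaving `INTERPRO=True` follows the recommendation in the parameter description and keeps the output usable with `filter_structure_file`.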
- CoDIAC.IntegrateStructure_Reference.filter_structure_file(appended_structure_file, Interpro_ID, filtered_structure_file)
Given an annotated structure file, keep only structures that have at least one chain that contains the Interpro_ID of interest.
Writes the filtered structure file to filtered_structure_file.
- Parameters:
- appended_structure_file: str
Location of the annotated structure file, i.e. a structure file to which reference information, including InterPro domains, has been appended.
- Interpro_ID: str
Interpro ID (controlled Interpro ID database identifier)
- filtered_structure_file: str
Location to write the output file to. The output has the same column fields as the input, with the rows reduced to only those that contain the Interpro_ID of interest.
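The filtering rule can be illustrated with a small pure-Python sketch. This is not CoDIAC's implementation: the column names `structure_id` and `domains`, the domain string layout, and the structure identifiers are all hypothetical, and the sketch assumes one row per chain, with every row of a structure kept when any of its chains matches.

```python
def filter_rows_by_interpro(rows, interpro_id,
                            id_key="structure_id", domain_key="domains"):
    """Keep all rows of any structure with at least one matching chain.

    `rows` is a list of dicts, one per chain. The key names are
    hypothetical stand-ins for the real column names.
    """
    # Simple substring match on the domain annotation string (an assumption
    # about how the ID appears in the annotated file).
    matching_structures = {
        row[id_key] for row in rows if interpro_id in row[domain_key]
    }
    # Preserve the original row order and all columns.
    return [row for row in rows if row[id_key] in matching_structures]


# Made-up example data: two structures, three chains.
rows = [
    {"structure_id": "S001", "chain": "A", "domains": "SH2:IPR000980"},
    {"structure_id": "S001", "chain": "B", "domains": ""},
    {"structure_id": "S002", "chain": "A", "domains": "Kinase:IPR000719"},
]

filtered = filter_rows_by_interpro(rows, "IPR000980")
# Both chains of S001 are kept; S002 is dropped.
```

With the library itself, the equivalent step is a single call to `filter_structure_file` with the annotated file, the InterPro ID, and the output path.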