chemicalchecker.database.uniprotkb.UniprotKB

class UniprotKB(version, host=None, user=None, pwd=None, port=None)

Bases: object

This class provides an interface for querying the internal UniProtKB database.

Methods

close_conn

get_protein

Returns the specified protein record.

get_proteins

Returns the records for all specified proteins.

get_reference_proteome

Returns the set of UniProt ACs belonging to the reference proteome of the organism identified by tax_id.

map_names_to_uniprot_acs

Maps names (for example, ORF names) to UniProt ACs.

map_protein_to_uniref100_representative

Returns the UniProt ACs representing the UniRef100 cluster to which uniprot_ac belongs.

map_secondary_to_primary

Maps a secondary UniProt AC to its primary AC.

map_xrefs_to_uniprot_acs

Maps external IDs to UniProt ACs.

pick_reference

Among a set of ambiguously mapped UniProt ACs that supposedly refer to the same entity, picks the “best” one, ranking candidates by review status, proteome membership, and length.

Attributes

DEFAULT_HOST

DEFAULT_PWD

DEFAULT_USER

SRC_DB_EMBL

SRC_DB_ENSEMBL

SRC_DB_FLYBASE

SRC_DB_GENEID

SRC_DB_GI

SRC_DB_HGNC

SRC_DB_REFSEQ

SRC_DB_SGD

SRC_DB_WORMBASE

UNIPROTKB_TABLES

get_protein(uniprot_ac, limit_to_fields=None)

Returns the specified protein record.

get_proteins(uniprot_acs, limit_to_fields=None)

Returns the records for all specified proteins.
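The limit_to_fields parameter suggests a field projection on the returned records. The sketch below illustrates that reading with a hypothetical in-memory table; the table, its field names, the placeholder ACs, and the choice to skip unknown ACs are all assumptions, not the real UniprotKB schema or behavior.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical in-memory stand-in for the protein table. The field names
# and ACs are illustrative assumptions, not the real UniprotKB schema.
PROTEINS: Dict[str, Dict[str, object]] = {
    "HUMAN_1": {"uniprot_ac": "HUMAN_1", "tax_id": 9606, "reviewed": True, "length": 393},
    "HUMAN_2": {"uniprot_ac": "HUMAN_2", "tax_id": 9606, "reviewed": False, "length": 120},
}


def get_proteins_sketch(
    uniprot_acs: Iterable[str], limit_to_fields: Optional[List[str]] = None
) -> List[Dict[str, object]]:
    """Return one record per known AC, optionally keeping only some fields."""
    records = []
    for ac in uniprot_acs:
        record = PROTEINS.get(ac)
        if record is None:
            continue  # assumption: unknown ACs are skipped, not reported
        if limit_to_fields is None:
            records.append(dict(record))  # copy so callers cannot mutate the table
        else:
            records.append({field: record[field] for field in limit_to_fields})
    return records


def get_protein_sketch(
    uniprot_ac: str, limit_to_fields: Optional[List[str]] = None
) -> Optional[Dict[str, object]]:
    """Single-AC variant: the matching record, or None if the AC is unknown."""
    found = get_proteins_sketch([uniprot_ac], limit_to_fields)
    return found[0] if found else None
```

For example, `get_proteins_sketch(["HUMAN_1", "UNKNOWN"], ["uniprot_ac", "length"])` returns `[{"uniprot_ac": "HUMAN_1", "length": 393}]`.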

get_reference_proteome(tax_id, only_reviewed=False)

Returns the set of UniProt ACs belonging to the reference proteome of the organism identified by tax_id.
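A minimal sketch of the filtering this method appears to perform, assuming each protein carries a tax ID, a reference-proteome flag, and a reviewed flag. The record layout, the placeholder ACs, and the name get_reference_proteome_sketch are hypothetical; the real method queries the database instead.

```python
from typing import Dict, List, Set

# Hypothetical records; the "reference" and "reviewed" flags stand in for
# whatever the real table stores about proteome membership and review status.
RECORDS: List[Dict[str, object]] = [
    {"uniprot_ac": "HUMAN_1", "tax_id": 9606, "reference": True, "reviewed": True},
    {"uniprot_ac": "HUMAN_2", "tax_id": 9606, "reference": True, "reviewed": False},
    {"uniprot_ac": "HUMAN_3", "tax_id": 9606, "reference": False, "reviewed": True},
    {"uniprot_ac": "MOUSE_1", "tax_id": 10090, "reference": True, "reviewed": True},
]


def get_reference_proteome_sketch(tax_id: int, only_reviewed: bool = False) -> Set[str]:
    """ACs of tax_id in the reference proteome, optionally reviewed ones only."""
    return {
        str(r["uniprot_ac"])
        for r in RECORDS
        if r["tax_id"] == tax_id
        and r["reference"]
        # when only_reviewed is False, the reviewed flag is ignored
        and (r["reviewed"] or not only_reviewed)
    }
```

Here `get_reference_proteome_sketch(9606)` yields both human reference-proteome entries, while passing `only_reviewed=True` keeps only `HUMAN_1`.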

map_names_to_uniprot_acs(names, filter_sources=None, filter_taxids=None)

Maps names (for example, ORF names) to UniProt ACs.

map_protein_to_uniref100_representative(uniprot_ac)

Returns the UniProt ACs representing the UniRef100 cluster to which uniprot_ac belongs.

map_secondary_to_primary(uniprot_ac)

Maps a secondary UniProt AC to its primary AC.

map_xrefs_to_uniprot_acs(ids, filter_dbs=None)

Maps external IDs to UniProt ACs.
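The filter_dbs parameter presumably restricts the lookup to particular source databases, such as those named by the SRC_DB_* attributes. The sketch below shows that idea over a hypothetical cross-reference table. The constant values, the external IDs, the ACs, and the dict-of-sets return type are all placeholder assumptions; the real attribute values and return shape are not documented on this page.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set

# Hypothetical values: the real SRC_DB_* class attributes exist, but their
# values are not documented here, so these strings are placeholders.
SRC_DB_HGNC = "hgnc"
SRC_DB_REFSEQ = "refseq"

# Hypothetical cross-reference rows: (source database, external ID, UniProt AC).
XREF_ROWS = (
    (SRC_DB_HGNC, "HGNC:0001", "HUMAN_1"),
    (SRC_DB_REFSEQ, "NP_0001", "HUMAN_1"),
    (SRC_DB_REFSEQ, "NP_0002", "HUMAN_2"),
    (SRC_DB_REFSEQ, "NP_0002", "HUMAN_3"),  # one ID, two ACs: an ambiguous mapping
)


def map_xrefs_sketch(
    ids: Iterable[str], filter_dbs: Optional[Iterable[str]] = None
) -> Dict[str, Set[str]]:
    """Map each external ID to the set of ACs it points to, optionally per source DB."""
    wanted = set(ids)
    allowed = None if filter_dbs is None else set(filter_dbs)
    mapping: Dict[str, Set[str]] = defaultdict(set)
    for db, ext_id, ac in XREF_ROWS:
        if ext_id in wanted and (allowed is None or db in allowed):
            mapping[ext_id].add(ac)
    return dict(mapping)
```

Returning a set per ID keeps ambiguous mappings such as `NP_0002` visible to the caller; resolving them to a single AC is the job pick_reference is described as doing.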

pick_reference(uniprot_acs, organism_tax_ids=None, only_one=True)

Among a set of ambiguously mapped UniProt ACs that supposedly refer to the same entity, this function picks the “best” one, defined as the AC, among those corresponding to one of the organism tax IDs, that is:

  • the longest among the ones that are reviewed and assigned to the Complete and Reference proteome

  • if not, the longest among the ones that are reviewed and assigned to the Reference proteome

  • if not, the longest among the ones that are reviewed and assigned to the Complete proteome

  • if not, the longest among the ones that are assigned to the Complete and Reference proteome

  • if not, the longest among the ones that are assigned to the Reference proteome

  • if not, the longest among the ones that are assigned to the Complete proteome

  • if not, the longest among the ones that are reviewed

  • if not, the longest

If only_one is False, then instead of returning the longest protein, the method returns the entire list of proteins satisfying the applicable condition.
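The tiered selection described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: it works on in-memory records instead of looking ACs up in the database, the field names (tax_id, reviewed, reference, complete, length) are hypothetical, and the return values when no candidate matches the given tax IDs (None, or an empty list) are guesses.

```python
from typing import Dict, Iterable, List, Optional, Union

# Hypothetical record layout; field names are assumptions.
Record = Dict[str, object]

FLAGS = ("reviewed", "reference", "complete")

# Tiers in the documented priority order. Each tuple says which of FLAGS
# must be true (True) or is ignored (False) at that tier.
TIERS = (
    (True, True, True),     # reviewed, Complete and Reference proteome
    (True, True, False),    # reviewed, Reference proteome
    (True, False, True),    # reviewed, Complete proteome
    (False, True, True),    # Complete and Reference proteome
    (False, True, False),   # Reference proteome
    (False, False, True),   # Complete proteome
    (True, False, False),   # reviewed
    (False, False, False),  # anything
)


def pick_reference_sketch(
    records: Iterable[Record],
    organism_tax_ids: Optional[Iterable[int]] = None,
    only_one: bool = True,
) -> Union[Optional[str], List[str]]:
    # Keep only candidates from the requested organisms (all if none given).
    taxa = None if organism_tax_ids is None else set(organism_tax_ids)
    candidates = [r for r in records if taxa is None or r["tax_id"] in taxa]
    for required in TIERS:
        matching = [
            r for r in candidates
            if all(r[flag] for flag, needed in zip(FLAGS, required) if needed)
        ]
        if matching:  # the first non-empty tier decides
            if only_one:
                return str(max(matching, key=lambda r: r["length"])["uniprot_ac"])
            return [str(r["uniprot_ac"]) for r in matching]
    return None if only_one else []
```

Because length is compared only within the first non-empty tier, a shorter reviewed protein in the Complete proteome wins over a longer unreviewed one, even if the latter belongs to both the Complete and Reference proteomes.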