Wang Lu
Individual genes can be annotated with a variety of R and Python packages. Some common choices are:
1. biomaRt (R package) - retrieves gene annotation data from databases that expose a BioMart interface (e.g. Ensembl).
2. pyensembl (Python package) - a Pythonic interface to Ensembl reference data, including gene, transcript, and exon annotation.
3. AnnotationDbi (R package) - offers a consistent interface to Bioconductor annotation resources, such as Gene Ontology terms and KEGG pathway mappings.
4. Biopython (Python package) - a collection of bioinformatics modules, including parsers for GenBank records, from which gene annotation can be extracted.
5. GOrilla (web tool) - a tool for identifying and visualizing enriched Gene Ontology (GO) terms in ranked gene lists, which can be used to relate genes to GO terms. It is a web application, not an R package.
Several of these tools can also return disease-related information when the underlying data source provides it, such as disease-gene associations from the Online Mendelian Inheritance in Man (OMIM) database or GeneCards.
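As an illustration of the pyensembl entry in the list above, here is a minimal sketch of looking up basic annotation for a few gene symbols. The helper names `annotate_symbols` and `load_genome` are hypothetical and chosen only for this example. The release number 77 is an arbitrary choice, and the sketch assumes pyensembl raises `ValueError` for an unknown gene name.

```python
from typing import Any, Dict, Iterable, List


def annotate_symbols(genome: Any, symbols: Iterable[str]) -> List[Dict[str, Any]]:
    """Return one annotation row per gene matching each symbol.

    `genome` is expected to behave like a pyensembl.EnsemblRelease object,
    i.e. to provide genes_by_name(symbol) returning Gene objects.
    """
    rows: List[Dict[str, Any]] = []
    for symbol in symbols:
        try:
            genes = genome.genes_by_name(symbol)
        except ValueError:
            # Assumption: an unknown gene name is reported as ValueError.
            genes = []
        if not genes:
            rows.append({"symbol": symbol, "found": False})
            continue
        for gene in genes:
            rows.append({
                "symbol": symbol,
                "found": True,
                "gene_id": gene.gene_id,
                "biotype": gene.biotype,
                "contig": gene.contig,
                "start": gene.start,
                "end": gene.end,
                "strand": gene.strand,
            })
    return rows


def load_genome(release: int = 77):
    """Open a locally installed Ensembl release (pyensembl is imported lazily)."""
    from pyensembl import EnsemblRelease
    return EnsemblRelease(release)
```

Once the reference data has been downloaded (for example with `pyensembl install --release 77`), calling `annotate_symbols(load_genome(), ["TP53", "BRCA2"])` should return one dictionary per matching gene. Passing the genome object in as an argument keeps the lookup logic independent of how the data was loaded.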
Biopython (Python): a set of tools for biological computation, including a module for working with genetic sequences and annotation.
BioMart (R, Perl, Python): a data management system that allows users to access and query large biological datasets, including gene annotation data.
AnnotationDbi (R): a package for accessing and manipulating annotation data in a variety of formats, including gene annotation data.
GEOquery (R): a package for downloading and parsing gene expression data from the Gene Expression Omnibus (GEO) database, including the platform annotation that maps measurements to individual genes.
HGNC (Python): a Python package that retrieves information from the HGNC (HUGO Gene Nomenclature Committee) database, such as approved gene symbols, names, and references.
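For the HGNC entry, the same kind of information can also be fetched without a dedicated package. The sketch below uses only the Python standard library and assumes the public HGNC REST service at rest.genenames.org, with a `fetch/symbol/<symbol>` endpoint and a JSON body whose records sit under `response.docs`. The function names are hypothetical, and the endpoint layout should be checked against the current HGNC documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Dict, List

# Assumed base URL of the HGNC REST service.
HGNC_REST = "https://rest.genenames.org"


def build_symbol_request(symbol: str) -> urllib.request.Request:
    """Build a request for one gene symbol, asking for a JSON response."""
    url = f"{HGNC_REST}/fetch/symbol/{urllib.parse.quote(symbol)}"
    return urllib.request.Request(url, headers={"Accept": "application/json"})


def summarize_docs(payload: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Keep only the ID, symbol, and name of each returned record."""
    docs = payload.get("response", {}).get("docs", [])
    return [
        {
            "hgnc_id": doc.get("hgnc_id"),
            "symbol": doc.get("symbol"),
            "name": doc.get("name"),
        }
        for doc in docs
    ]


def fetch_symbol(symbol: str, timeout: float = 10.0) -> List[Dict[str, Any]]:
    """Query the service over the network and summarize the result."""
    with urllib.request.urlopen(build_symbol_request(symbol), timeout=timeout) as resp:
        return summarize_docs(json.load(resp))
```

With network access, `fetch_symbol("BRAF")` should return a list of small dictionaries, one per matching record. Keeping request building and response parsing in separate functions means both can be checked without contacting the service.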