I want to evaluate the neighborhood of a particular gene across different bacterial lineages. The main objective is to develop a gene-context metric that measures whether the physical association of a neighboring gene with a gene of interest is biologically meaningful (or, more simply, that estimates the probability of a given gene being found next to the gene of interest).

Do you know of any software that can do this?

If not, I was wondering whether I could use the following algorithm, based on (https://www.pnas.org/content/115/23/E5307.short):

* Identify the gene of interest in each genome;
* Cut out a "gene island" containing the gene of interest and the neighboring sequences (10 kb upstream and downstream);
* Group the neighboring genes by similarity (bidirectional BLAST hits or CD-HIT);
* Create a "mock" genome without the gene island;
* BLAST the consensus of each homolog group (or a representative sequence) against each database, "gene islands" and "mock genomes";
* Count the hits in each database and calculate the metric;
* Neighboring_metric = (hits_gene_islands_database - hits_mock_genomes) / (hits_gene_islands_database + hits_mock_genomes)
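To make the last step concrete, here is a minimal Python sketch of the proposed metric. The function name, the homolog group labels, and the hit counts are all hypothetical and only illustrate the arithmetic; how a group with no hits in either database should be handled is my assumption, not part of the original idea.

```python
def neighboring_metric(hits_gene_islands: int, hits_mock_genomes: int) -> float:
    """Proposed metric: (islands - mock) / (islands + mock).

    Returns a value between -1 (hits only in mock genomes)
    and +1 (hits only in gene islands).
    """
    total = hits_gene_islands + hits_mock_genomes
    if total == 0:
        # Assumption: a group with no hits at all has no defined score.
        raise ValueError("no hits in either database; metric is undefined")
    return (hits_gene_islands - hits_mock_genomes) / total


# Hypothetical hit counts per homolog group: (gene islands, mock genomes).
counts = {
    "group_A": (18, 2),  # mostly found next to the gene of interest
    "group_B": (5, 5),   # found equally in both databases
    "group_C": (1, 9),   # mostly found elsewhere in the genome
}

scores = {group: neighboring_metric(*hits) for group, hits in counts.items()}

for group, score in scores.items():
    print(f"{group}: {score:+.2f}")
```

With these made-up counts, a score near +1 would mean the homolog group is found almost only inside the gene islands, a score near 0 that it is found about equally in both databases, and a score near -1 that it is found mostly outside the islands.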

Does it sound reasonable to you?
