I am an undergraduate neuroscience and bioinformatics research assistant, and my personal project has been to explore the gut microbiota of an EAE mouse model. There are three treatment groups: untreated control, Complete Freund's Adjuvant (CFA) only, and MOG+CFA. Samples were taken from 6 control, 6 CFA, and 5 MOG+CFA mice over 5 time points (1 before and 4 after starting the EAE experiment).

I have since been focusing on analyzing a network created from the abundance data. Counts were normalized with DESeq2 and split by treatment group. Each group's count matrix was used to calculate a Spearman's rank correlation coefficient matrix. Significance was assessed by permutation testing: 100 randomized matrices were generated, giving a null distribution for each coefficient. In the original matrix, every coefficient that was not significant at the 5% level against its corresponding distribution was set to 0, and every significant coefficient was set to 1. In addition, any coefficient below 0.5 in magnitude was set to 0. This binary matrix was used to create an unweighted network for each group.
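To make the network construction concrete, here is a minimal standard-library Python sketch of how I understand the procedure. The function names (`spearman`, `binary_network`), the toy input layout (a dict of taxon to per-sample abundances), and the choice of a two-sided empirical p-value obtained by shuffling one taxon's samples are all assumptions made for illustration; my actual pipeline may differ in these details.

```python
import math
import random
from itertools import combinations

def ranks(values):
    """Average ranks, so tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: a constant taxon is treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman's rho is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def binary_network(counts, n_perm=100, alpha=0.05, min_rho=0.5, seed=0):
    """Return the set of edges (taxon_a, taxon_b) kept in the binary network.

    counts: dict mapping taxon -> list of normalized abundances per sample.
    """
    rng = random.Random(seed)
    edges = set()
    for a, b in combinations(sorted(counts), 2):
        rho = spearman(counts[a], counts[b])
        shuffled = list(counts[b])
        hits = 0
        for _ in range(n_perm):
            rng.shuffle(shuffled)  # break the pairing between samples
            if abs(spearman(counts[a], shuffled)) >= abs(rho):
                hits += 1
        p_value = (hits + 1) / (n_perm + 1)  # empirical two-sided p-value
        # Keep the edge only if it passes both filters described above.
        if p_value < alpha and abs(rho) >= min_rho:
            edges.add((a, b))
    return edges
```

Note that with 100 permutations the smallest attainable p-value under this formula is 1/101, so the 5% cutoff is reachable, but any multiple-testing correction across all taxon pairs would need many more permutations.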

I am now applying over-representation testing to various features of the networks. Of key interest is how the networks are divided into communities/modules. I am using the fast greedy algorithm from igraph because of time and computing constraints, but that could change based on suggestions.

Currently, I am testing whether a given taxon-taxon interaction is over-represented in one module versus the rest of the network. I am using Fisher's exact test, where the 2x2 table is split by edge location (within the module vs. in the rest of the network) and by interaction type (the taxon-taxon interaction of interest vs. every other interaction). Each cell holds the number of edges in the network that meet both criteria for that cell.
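As a concrete illustration of the test, here is a minimal standard-library Python sketch. The names `edge_table` and `fisher_greater`, the input dictionaries, and the rule that an edge counts as inside a module only when both endpoints belong to it are assumptions for the example, not necessarily what my code does. The p-value is the one-sided (over-representation) tail of the hypergeometric distribution.

```python
from math import comb

def edge_table(edges, module_of, taxon_of, module, interaction):
    """Count edges into the 2x2 table (a, b, c, d).

    a: in module, interaction of interest   b: in module, other interaction
    c: elsewhere, interaction of interest   d: elsewhere, other interaction
    Assumption: an edge is 'in' the module only if both endpoints are in it.
    """
    target = frozenset(interaction)  # unordered pair of taxon labels
    a = b = c = d = 0
    for u, v in edges:
        in_module = module_of[u] == module and module_of[v] == module
        is_target = frozenset((taxon_of[u], taxon_of[v])) == target
        if in_module and is_target:
            a += 1
        elif in_module:
            b += 1
        elif is_target:
            c += 1
        else:
            d += 1
    return a, b, c, d

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test: P(X >= a) with all margins fixed."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(col1, k) * comb(n - col1, row1 - k)
               for k in range(a, upper + 1))
    return tail / comb(n, row1)

# Toy usage with hypothetical OTU ids, taxon labels, and module assignments.
taxon_of = {"o1": "Lactobacillus", "o2": "Lactobacillus", "o3": "Bacteroides",
            "o4": "Bacteroides", "o5": "Clostridium", "o6": "Clostridium"}
module_of = {"o1": 1, "o2": 1, "o3": 1, "o4": 2, "o5": 2, "o6": 2}
edges = [("o1", "o3"), ("o2", "o3"), ("o1", "o2"), ("o4", "o5"),
         ("o5", "o6"), ("o3", "o4"), ("o2", "o4")]
table = edge_table(edges, module_of, taxon_of, 1,
                   ("Lactobacillus", "Bacteroides"))
p_value = fisher_greater(*table)
```

Running this once per interaction and per module gives one p-value for each row and column combination, so the full set of tests is large and the p-values would usually need a multiple-testing adjustment before interpretation.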

The result is a matrix with one taxon-taxon interaction per row and one module per column. The values are the p-values from the hypothesis tests. There are 3 such matrices, one for each network/treatment group. I also have a matrix of the edge counts for each interaction in each module.

My question is: how can I better use these results to derive biological insight? I have looked into dividing the bacteria into functional classes and into potential machine learning applications, but I do not know of any standout programs that could readily take this data. The goals of this project are to better understand the structural changes in the gut microbiota during EAE and, possibly, to discover specific features such as keystone taxa or co-occurrence groups that are gained or lost in the MOG+CFA group. Of particular interest are any OTUs related to the Lactobacillus/Bacilli lineage.
