Can Shannon diversity index be used to identify the diversity among TnSeq libraries cultured under two different growth conditions?

12 July 2022

Hello all,

I have sequencing results from a TnSeq library of Agrobacterium tumefaciens grown under two different growth conditions. My overall goal is to see which genes are important/beneficial for survival under each of the two conditions. To identify them, I want to look at the under- or over-representation of the transposon insertion number or transposon insertion density per gene, and compare the diversity of transposon insertions per gene between the two conditions. The genes with the lowest transposon insertion density would be considered beneficial for that condition (assuming that a transposon insertion disrupted a gene function that was important for survival in that condition).
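The per-gene comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a TnSeq analysis pipeline: the `GeneInsertions` record, its field names and the gene names are hypothetical, and density is taken as distinct insertion sites per kilobase of gene. It only ranks raw differences; a real comparison also needs replicate libraries, normalization for sequencing depth and a statistical test.

```python
from dataclasses import dataclass


@dataclass
class GeneInsertions:
    """Hypothetical per-gene record; field names are illustrative only."""
    name: str
    length_bp: int
    sites_a: set[int]  # distinct insertion positions seen in condition A
    sites_b: set[int]  # distinct insertion positions seen in condition B


def density_per_kb(n_sites: int, length_bp: int) -> float:
    """Distinct insertion sites per kilobase of gene."""
    return 1000.0 * n_sites / length_bp


def rank_by_density_drop(genes: list[GeneInsertions]) -> list[tuple[str, float, float]]:
    """Return (gene, density in A, density in B), ordered so that genes whose
    insertion density falls most from A to B come first. No replicates,
    normalization or statistics are applied here."""
    rows = [
        (g.name,
         density_per_kb(len(g.sites_a), g.length_bp),
         density_per_kb(len(g.sites_b), g.length_bp))
        for g in genes
    ]
    return sorted(rows, key=lambda row: row[1] - row[2], reverse=True)


# Made-up example: geneX loses most of its insertions in condition B.
genes = [
    GeneInsertions("geneX", 1000, {10, 200, 450, 800}, {10}),
    GeneInsertions("geneY", 2000, {5, 900}, {5, 900, 1500}),
]
for row in rank_by_density_drop(genes):
    print(row)
```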

I have both the transposon insertion data (insertion density per gene) and the frequency of each particular insertion within a gene (read count) in an Excel file.

First, I am planning to use the Shannon diversity index to see whether there is any difference in transposon diversity between the conditions. I chose this index because it considers both richness (the number of different insertions) and evenness (the frequency of each particular insertion) per community. However, I am not sure whether it tells us specifically where the dissimilarities are. Also, is the Bray-Curtis similarity index helpful in this kind of situation?
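A minimal sketch of what the Shannon index summarizes, assuming each distinct insertion site is treated as a "type" and its read count as that type's abundance (the function name is hypothetical). It returns richness S, Shannon's H' (natural log) and Pielou's evenness J' = H'/ln S, which separates the two components mentioned in the question.

```python
import math


def shannon_components(read_counts: list[int]) -> dict[str, float]:
    """Richness S, Shannon H' (natural log) and Pielou evenness J' = H'/ln S.
    Each distinct insertion site is treated as a 'type' and its read count as
    that type's abundance; sites with zero reads are ignored."""
    counts = [c for c in read_counts if c > 0]
    total = sum(counts)
    richness = len(counts)
    h = -sum((c / total) * math.log(c / total) for c in counts)
    evenness = h / math.log(richness) if richness > 1 else float("nan")
    return {"S": richness, "H": h, "J": evenness}


even = shannon_components([25, 25, 25, 25])  # four sites, equal reads
uneven = shannon_components([97, 1, 1, 1])   # four sites, one dominant
print(even)
print(uneven)
```

Because the function only ever sees a list of counts, two libraries with insertions at entirely different positions but the same count distribution receive identical values. That is why H' alone cannot say where two libraries differ.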

Once I have calculated the diversity index, I want to test the statistical significance of the difference in diversity. I have heard about ANOSIM, ADONIS and PERMANOVA, but I am not sure whether these methods would be helpful in this case. Could anyone please clarify this?
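A hedged sketch of what a one-way PERMANOVA computes, written from its textbook definition rather than taken from any package (ADONIS refers to the `adonis`/`adonis2` functions in the R package vegan, which implement PERMANOVA). It works on a dissimilarity matrix among libraries, not on a single diversity value per condition: SS_total = (1/N) Σ d_ij² over all pairs, SS_within adds (1/n_g) Σ d_ij² over the pairs inside each group g, the pseudo-F is (SS_among/(a - 1)) / (SS_within/(N - a)), and the p-value comes from shuffling the group labels. The distances and labels are made-up toy data; with only one library per condition there is nothing to permute and no meaningful p-value.

```python
import itertools
import random


def permanova(dist: list[list[float]], groups: list[str],
              permutations: int = 999, seed: int = 0) -> tuple[float, float]:
    """One-way PERMANOVA: pseudo-F from a symmetric dissimilarity matrix,
    with a p-value from randomly permuting the group labels."""
    n = len(groups)
    levels = sorted(set(groups))
    a = len(levels)
    # The total sum of squares does not depend on the labels.
    ss_total = sum(dist[i][j] ** 2 for i, j in itertools.combinations(range(n), 2)) / n

    def pseudo_f(labels: list[str]) -> float:
        ss_within = 0.0
        for level in levels:
            members = [i for i in range(n) if labels[i] == level]
            ss_within += sum(dist[i][j] ** 2
                             for i, j in itertools.combinations(members, 2)) / len(members)
        ss_among = ss_total - ss_within
        return (ss_among / (a - 1)) / (ss_within / (n - a))

    observed = pseudo_f(groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        rng.shuffle(labels)
        if pseudo_f(labels) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy data: six libraries on a single axis, so the dissimilarity is |x_i - x_j|.
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
dist = [[abs(p - q) for q in positions] for p in positions]
groups = ["A", "A", "A", "B", "B", "B"]
f_stat, p_value = permanova(dist, groups)
print(f_stat, p_value)
```

For Euclidean distances on a single axis, as in this toy case, the pseudo-F equals the ordinary one-way ANOVA F.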

I hope my questions are clear enough, but if not please let me know!

Thanks in advance for your help.

Trevor John Kenchington

I have some understanding of the Shannon index, primarily as it is (mis)applied in ecological work, but none concerning the system you are considering applying it to. From that perspective:

The Shannon index was developed as a measure of the information content of codes and only subsequently adopted as an index of species diversity. It can be (and has been) used with many kinds of data and I see no reason why it should not be applied to yours.

I would caution, however, that Shannon's equation can only correctly be applied to an infinite sample. The common practice of applying it to finite ones is wrong, though the magnitude of the resulting errors might not matter to you. There is a parallel index (H) that can correctly be applied to finite samples, Shannon's H' then being found as the asymptotic value of H as sample size increases, though that is very rarely done.
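The post does not name the finite-sample index. The sketch below assumes it is Brillouin's index, H = (ln N! - Σ ln n_i!)/N, and shows it approaching the plug-in Shannon H' as the same proportions are observed in ever larger samples. Function names and counts are illustrative; `math.lgamma(x + 1)` is used for ln(x!) so large read counts do not overflow.

```python
import math


def shannon_h_prime(counts: list[int]) -> float:
    """Plug-in Shannon H' (natural log), computed from sample proportions."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def brillouin_h(counts: list[int]) -> float:
    """Brillouin's H = (ln N! - sum ln n_i!) / N for a finite sample."""
    n = sum(counts)
    return (math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)) / n


base = [5, 3, 1, 1]
for scale in (1, 10, 100, 10_000):
    sample = [c * scale for c in base]  # same proportions, larger sample
    print(scale, round(brillouin_h(sample), 4), round(shannon_h_prime(sample), 4))
```

H' stays the same at every scale because the proportions do not change, while H rises towards it.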

Next: "Exponential Shannon" (the exponential of H', i.e. e raised to the power H', which is also the first-order Hill Number) is a more meaningful measure than H' itself. A data set with double the diversity of some other data set will have twice the value of Exponential Shannon, whereas H' does not double.
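The doubling property can be checked numerically. In this sketch (function names and counts are illustrative), a library with 20 equally common insertion sites is compared with one that has 10: exp(H') goes from 10 to 20, while H' only goes from ln 10 ≈ 2.30 to ln 20 ≈ 3.00.

```python
import math


def shannon_h_prime(counts: list[int]) -> float:
    """Plug-in Shannon H' (natural log)."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def exp_shannon(counts: list[int]) -> float:
    """Exponential Shannon, exp(H'): the Hill number of order 1, read as an
    'effective number' of equally common types."""
    return math.exp(shannon_h_prime(counts))


ten_even = [50] * 10     # 10 equally common insertion sites
twenty_even = [50] * 20  # twice as many, equally common

print(shannon_h_prime(ten_even), shannon_h_prime(twenty_even))  # ln 10 vs ln 20
print(exp_shannon(ten_even), exp_shannon(twenty_even))          # 10 vs 20, up to rounding
```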

However ...

I have to ask why you would bother with all that work. Quantitative ecologists stopped using the Shannon Index (or other indices of species diversity) half a century ago, when multivariate statistical analyses became generally available. H' (like all other indices of species diversity) treats every species (or, in your application, every transposon) as interchangeable, which they are clearly not. In contrast, a Bray-Curtis Similarity Matrix treats each as unique and different. You can then analyze such a Matrix in many ways: clustering, MDS, ANOSIM and so on. Perhaps you have reason to consider diversity for itself (as Shannon did when considering codes). If not, application of any diversity index involves throwing away much of the information content of your data and that is rarely a sensible thing to do.
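A minimal sketch of a Bray-Curtis similarity matrix, assuming every library has been tabulated against the same ordered list of named types (here, insertion sites). Dissimilarity is Σ|x_i - y_i| / Σ(x_i + y_i); similarity is reported on a 0-100 scale as is common in ecological work. Sample names and counts are hypothetical, and no data transformation is applied, though in practice one is often considered before computing the matrix.

```python
def bray_curtis_dissimilarity(x: list[float], y: list[float]) -> float:
    """sum|x_i - y_i| / sum(x_i + y_i): 0 = identical, 1 = nothing in common.
    Position i must refer to the same named type (species, insertion site) in both."""
    if len(x) != len(y):
        raise ValueError("samples must be aligned on the same list of types")
    denominator = sum(a + b for a, b in zip(x, y))
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(a - b) for a, b in zip(x, y)) / denominator


def similarity_matrix(samples: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """All pairwise Bray-Curtis similarities on a 0-100 scale (100 = identical)."""
    return {
        (p, q): 100.0 * (1.0 - bray_curtis_dissimilarity(samples[p], samples[q]))
        for p in samples
        for q in samples
    }


# Hypothetical read counts at the same four named insertion sites in three libraries.
samples = {
    "cond1_rep1": [10, 0, 5, 3],
    "cond1_rep2": [8, 1, 6, 2],
    "cond2_rep1": [0, 12, 1, 7],
}
matrix = similarity_matrix(samples)
print(round(matrix[("cond1_rep1", "cond1_rep2")], 1), round(matrix[("cond1_rep1", "cond2_rep1")], 1))
```

The resulting matrix is the input that clustering, MDS, ANOSIM and similar methods work from.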

If you do have reason to focus on diversity (for itself, rather than as an over-simplified summary of data), then you should probably look at other orders of Hill Numbers, not just Exponential Shannon.
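Hill numbers of order q are D_q = (Σ p_i^q)^(1/(1 - q)), with order 0 giving richness, order 1 (as a limit) giving Exponential Shannon and order 2 giving the inverse Simpson index. A minimal sketch of a diversity profile across several orders follows; the function names and the chosen orders are illustrative assumptions.

```python
import math


def hill_number(counts: list[float], q: float) -> float:
    """Hill number of order q: (sum p_i**q) ** (1 / (1 - q)).
    q = 0 gives richness, q = 1 (taken as a limit) gives exp(H'),
    q = 2 gives the inverse Simpson index."""
    total = sum(counts)
    p = [c / total for c in counts if c > 0]
    if math.isclose(q, 1.0):
        return math.exp(-sum(pi * math.log(pi) for pi in p))
    return sum(pi ** q for pi in p) ** (1.0 / (1.0 - q))


def hill_profile(counts: list[float], orders=(0, 0.5, 1, 2, 3)) -> dict[float, float]:
    """Diversity profile: Hill numbers across several orders q."""
    return {q: hill_number(counts, q) for q in orders}


print(hill_profile([25, 25, 25, 25]))  # perfectly even: the same value at every order
print(hill_profile([97, 1, 1, 1]))     # uneven: the profile falls as q increases
```

Higher orders give progressively less weight to rare insertion sites, so comparing whole profiles between conditions says more than any single index.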

I hope that is more helpful than confusing!

Trevor Kenchington

Gayatri Sharma

Thank you for your response, Dr. Trevor John Kenchington. To clarify the point about diversity: my main purpose is to see whether there is any variation in transposon density per gene between the two conditions. Based on my understanding, Bray-Curtis gives the compositional dissimilarity between two different sites, taking into account the species the two sites have in common as well as the total species. However, unlike ecological data, transposon data do not tell us whether any transposon insertion is shared unless you look at the positions of the transposon inserts. In that case, would Bray-Curtis be an appropriate index to move forward with? Also, regarding the statistical significance of those index values, do methods such as clustering, MDS and ANOSIM, as mentioned above, give significance values (such as p-values)?

Since I am new to this, I am trying to figure out whether what I am planning to do is the correct approach. I would again be grateful if you could clarify this.

Trevor John Kenchington

With species-abundance data, the diversity indices (such as Shannon) treat the species from two sites as AA, AB, AC etc. from the first site and BA, BB, BC etc. from the second. Those indices ignore any connections, such as AA being the same species as BC. In contrast, the Bray-Curtis metric looks at particular, named species. The abundance of Homo sapiens at one site is compared to the abundance of H. sapiens at other sites, never interchanged with the abundance of Canis lupus, for example. I do not know enough about transposons to be sure, but I suspect that you could only use Bray-Curtis if your data list them by insert position.
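To make "listing them by insert position" concrete, here is a minimal sketch assuming each library is held as a hypothetical mapping from (gene, insertion position) to read count. The two libraries are put on a common, named axis (sites missing from one library get zero) before Bray-Curtis is computed, so entry k always refers to the same site in both vectors. Gene names, positions and counts are made up.

```python
from collections.abc import Mapping

Key = tuple[str, int]  # hypothetical key: (gene name, insertion position)


def align_by_position(a: Mapping[Key, int], b: Mapping[Key, int]) -> tuple[list[Key], list[int], list[int]]:
    """Put two libraries on a common axis of named insertion sites, filling zeros
    for sites seen in only one library, so entry k is the same site in both."""
    keys = sorted(set(a) | set(b))
    return keys, [a.get(k, 0) for k in keys], [b.get(k, 0) for k in keys]


def bray_curtis(x: list[int], y: list[int]) -> float:
    """Bray-Curtis dissimilarity between two aligned count vectors."""
    return sum(abs(p - q) for p, q in zip(x, y)) / sum(p + q for p, q in zip(x, y))


cond_a = {("geneX", 120): 40, ("geneX", 388): 12, ("geneY", 57): 9}
cond_b = {("geneX", 120): 3, ("geneY", 57): 11, ("geneY", 940): 6}
keys, x, y = align_by_position(cond_a, cond_b)
print(keys)
print(x, y, round(bray_curtis(x, y), 3))
```

Shuffling the entries of just one of the aligned vectors would leave its Shannon value unchanged but alter the Bray-Curtis value, which is the distinction drawn above between anonymous and named types.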

But can you use even the diversity indices without that information? In species-abundance work, you need to know how many individuals of species AA are in the sample, how many of AB etc., so you need species identifications for each individual. The analytical approach then throws away the information on which species name corresponds to "AA" but that is only discarded after the specimens have been identified to species. I suspect that you would need to list your transposons by their positions in order to calculate a Shannon value and, once you have those data, it is just as easy to prepare a Bray-Curtis matrix.

Once you do have such a matrix:

Clustering leads to cluster diagrams. They are not used in formal hypothesis testing and there is no associated "significance". Likewise, MDS ordinates the data. It does not compare groups, nor does it yield probabilities of the groups being different.

SIMPER and ANOSIM (as a pair) are used in comparing and contrasting groups. They do generate probability values. However, they were designed for analyzing ecological data. You should think carefully before applying them to your data. The nature of the uncertainties (the "error term") in your data may be quite different to what SIMPER and ANOSIM were intended for!
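To show where an ANOSIM probability value comes from, here is a minimal sketch of the one-way test written from its standard definition (it is not the PRIMER or vegan implementation). All pairwise dissimilarities are ranked, R compares the mean rank between groups with the mean rank within groups, and the p-value is the share of label permutations giving an R at least as large. The permutation step treats libraries as exchangeable under the null hypothesis, which is exactly the kind of assumption about the error term that should be checked for TnSeq data. The toy matrix is invented.

```python
import itertools
import random


def anosim(dist: list[list[float]], groups: list[str],
           permutations: int = 999, seed: int = 0) -> tuple[float, float]:
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M = n(n - 1)/2 pairwise dissimilarities, plus a permutation p-value.
    Every group needs at least two members."""
    n = len(groups)
    pairs = list(itertools.combinations(range(n), 2))
    values = [dist[i][j] for i, j in pairs]
    # Rank all pairwise dissimilarities (1 = smallest), averaging tied ranks.
    first: dict[float, int] = {}
    last: dict[float, int] = {}
    for rank, v in enumerate(sorted(values), start=1):
        first.setdefault(v, rank)
        last[v] = rank
    ranks = [(first[v] + last[v]) / 2 for v in values]
    m = len(pairs)

    def r_stat(labels: list[str]) -> float:
        within = [r for r, (i, j) in zip(ranks, pairs) if labels[i] == labels[j]]
        between = [r for r, (i, j) in zip(ranks, pairs) if labels[i] != labels[j]]
        return (sum(between) / len(between) - sum(within) / len(within)) / (m / 2)

    observed = r_stat(groups)
    rng = random.Random(seed)
    labels = list(groups)
    hits = 1  # the observed labelling counts as one permutation
    for _ in range(permutations):
        rng.shuffle(labels)
        if r_stat(labels) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Toy data: three replicate libraries per condition; small dissimilarity
# within a condition, large between conditions.
groups = ["A", "A", "A", "B", "B", "B"]
dist = [[0.0 if i == j else (0.1 if groups[i] == groups[j] else 0.9)
         for j in range(6)] for i in range(6)]
r_value, p_value = anosim(dist, groups)
print(r_value, p_value)
```

With three libraries per condition, only 2 of the 20 possible splits reproduce the observed grouping, so even a perfect separation (R = 1) cannot give a p-value much below 0.1. The number of replicate libraries constrains the attainable significance more than the choice of test does.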

Trevor

Gayatri Sharma

Thank you, Dr. Trevor John Kenchington. I really appreciate your help. I will look into the suggested index and matrix and move forward with the appropriate one.
