How to address the inflated degrees of freedom in "dissimilarity modeling"?

13 March 2024

I was asked to analyze a "community" response, and the suggested response is a dissimilarity matrix. Ecologists take samples at different locations (i) and want to know how community composition changes as a function of the distances between the IVs among the locations (i).

The DV: the samples contain information on the species present and their abundance at each location. To obtain a "community" response, the Bray-Curtis dissimilarity is calculated from the species abundances for every pair of sites. The response therefore lies in [0, 1] (possibly, but not always, including 0 and 1).

The IVs: positive real numbers on (0, Inf). For each pair of locations, the distance between their IV values is calculated as the absolute difference (the Euclidean distance for a single variable).

The response now consists of a matrix of i rows and j columns, where each row and each column represents a location and each ij cell holds a Bray-Curtis dissimilarity. The diagonal and the part above it are removed, and the matrix is converted to long format, which contains only the unique ij comparisons. For a square matrix with n = 40, location i = 1 is compared with every other location j ≠ i, i.e., with n - 1 = 39 locations rather than all n. For each pair of locations i and j, the absolute distance of the predictor was calculated and merged with the corresponding response.
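The pairwise construction described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the R code from the appendix: `bray_curtis`, `pairwise_long` and the toy abundance and predictor values are hypothetical. Bray-Curtis is computed as the sum of absolute abundance differences divided by the summed abundances of both sites, and each unordered pair is kept once, which is equivalent to keeping only the lower triangle of the symmetric matrix.

```python
from itertools import combinations


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two species-abundance vectors."""
    num = sum(abs(x - y) for x, y in zip(a, b))
    den = sum(x + y for x, y in zip(a, b))
    return num / den  # undefined (ZeroDivisionError) if both sites are empty


def pairwise_long(abund, env):
    """Long-format table of unique site pairs (i, j) with i < j:
    (i, j, Bray-Curtis response, absolute difference of the predictor)."""
    rows = []
    for i, j in combinations(range(len(abund)), 2):
        # For a single predictor, the Euclidean distance is |z_i - z_j|
        rows.append((i, j, bray_curtis(abund[i], abund[j]), abs(env[i] - env[j])))
    return rows


# Hypothetical toy data: 4 sites x 3 species, one positive predictor per site
abund = [[10, 0, 5], [8, 2, 5], [0, 9, 1], [3, 3, 3]]
env = [1.2, 1.5, 4.0, 2.2]

for row in pairwise_long(abund, env):
    print(row)
```

Each printed tuple is one record of the long-format table; 4 sites already give 4*3/2 = 6 records.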

This matrix comparison is problematic because for n = 40 we end up with 40*(40-1)/2 = 780 comparisons, although only 40 locations were sampled. Treating these pairs as independent observations greatly inflates the degrees of freedom (df). One could model the variance of each set of comparisons i to j = {1, ..., n} separately. But this does not resolve the non-independence, because every location still appears in n - 1 pairs, so pairs that share a location are more closely related to each other than to other pairs. More pressing, the df remain inflated.
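The arithmetic behind the inflation, and the reuse of each site, can be checked directly. In this minimal Python sketch (`n_pairs` is a hypothetical helper name), 40 sites give 780 unique pairs and every site enters 39 of them, which is why the pairs cannot be treated as 780 independent observations.

```python
from collections import Counter
from itertools import combinations


def n_pairs(n):
    """Number of unique site pairs: n(n - 1) / 2."""
    return n * (n - 1) // 2


n_sites = 40
pairs = list(combinations(range(n_sites), 2))
print(n_pairs(n_sites), len(pairs))  # 780 780

# How often each site is reused across all pairs
uses = Counter(site for pair in pairs for site in pair)
print(set(uses.values()))  # {39}
```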

I have provided an example in R code in the appendix, although I feel uncomfortable with this method; I was asked to use it. I have left out all inferential statistics and report only the point estimate, and I have laid out in detail the argument for doing so. However, the reviewer suggested this article: https://doi.org/10.1111/geb.13459. The authors of the suggested article mention that “subsampling site-pairs would limit the degree to which this assumption of the underlying GLM methodology is being violated”. Yet, this is similar to modeling the variance of each pairwise combination.
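One way to subsample site-pairs so that no site is reused is to pair sites at random without replacement. The sketch below is an assumption for illustration only; it is not necessarily the subsampling scheme used in the cited article, and `disjoint_pairs` is a hypothetical name. With 40 sites it yields 20 pairs, each built from two sites that appear in no other pair.

```python
import random


def disjoint_pairs(n_sites, seed=None):
    """Randomly pair sites so that each site appears in at most one pair."""
    rng = random.Random(seed)
    sites = list(range(n_sites))
    rng.shuffle(sites)
    # Consecutive shuffled sites form a pair; with an odd count, the last site is dropped
    return [tuple(sorted(sites[k:k + 2])) for k in range(0, n_sites - 1, 2)]


sub = disjoint_pairs(40, seed=1)
print(len(sub))  # 20
```

The cost is that only n/2 of the n(n - 1)/2 pairs are used per subsample. Repeating the draw with different seeds and refitting would give a spread of point estimates rather than a single one.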

Thus, fitting a beta regression and modeling each ij comparison would provide similar results. Every option therefore seems a bad decision in this case. Nevertheless, the method is surprisingly common in ecology.
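One practical point if a beta regression is used: the beta distribution is defined only on the open interval (0, 1), whereas the Bray-Curtis response here may include 0 or 1. A commonly used compression (Smithson and Verkuilen, 2006) is y' = (y(N - 1) + 0.5)/N, with N the sample size. The minimal Python sketch below applies it; `squeeze` is a hypothetical name, and taking N as the number of site pairs is an assumption that inherits the same inflation discussed above.

```python
def squeeze(y, n):
    """Compress values in [0, 1] into the open interval (0, 1):
    y' = (y * (n - 1) + 0.5) / n.
    n is the sample size; here it is assumed to be the number of site pairs,
    which are not independent observations."""
    return [(v * (n - 1) + 0.5) / n for v in y]


# Hypothetical Bray-Curtis values including both bounds, with n = 780 pairs
print(squeeze([0.0, 0.25, 1.0], n=780))
```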

Any kind words of advice?

Best,

More of Wim Kaijser's questions

Cost-effectiveness (or ICER) of ICU admission for COVID patients?

Performing a cost-effectiveness analysis (CEA) on a cohort of admissions during a crisis is only possible along the way or in retrospect. We included a series of ICU-admitted patients from a...

24 October 2023

Why is it P(|T|>=t)?

The p-value from the t-test is denoted as P(|T|>=t). Yet, based on what I know, observing a t-statistic as extreme as or more extreme than T under H0 does not fit with this notation. Often a small...

05 September 2023

Details of Japanese sanctions on Russia?

I am looking for details on the sanctions on Russia by Japan. Some material is published by METI in English, but most is in Japanese. I need a good overview of them or a recent comparison with UK, US end...

02 February 2023

Do we estimate parameters, statistics or both: least ambiguity of wording?

A parameter is defined as a value of the population, whereas a statistic is a value of the data; e.g., the mean can be a statistic or a parameter. However, it can be quite ambiguous and the...

14 November 2022

Minor "statistics" and credibility intervals?

For some smaller and less-known "statistics", often no option to calculate the error or confidence intervals is given. However, these might be obtained by bootstrapping. In addition, both McElrath...

12 September 2022

Is it possible to calculate confidence intervals for CLES via Fisher's Z transformation?

Determining intervals for the common language effect size (CLES), probability of superiority (PS), Area Under the Curve (AUC) or Exceedance Probability (EP) is possible via multiple methods Ruscia...

10 May 2022

What is the effect of browsing animals on crown formation in trees?

I am looking for an article on the influence of grazing animals (cattle and game) on branch sagging. I suspect that when the buds of the lowest branches are eaten, the hormones in the branch...

23 March 2022

Is a weighted quasibinomial GLM reasonable?

I am exploring some data and the possibilities of a quasibinomial GLM. The data are less than perfect. Nonetheless, the target variable can range from 0 to 1, and from my knowledge it seems okay (is...

14 September 2021

Is correction for multiple comparisons needed?

I have been wondering about this for a while, so forgive my ignorance. Suppose we loathe the NHST approach but value the information the p-value gives. Consider that we have a "perfect"...

18 August 2021

Are there any macrophyte experts willing to share data?

I have multiple models (hobby Sisyphean app: https://snwikaij.shinyapps.io/RF_macrophyte_models/) predicting (at least trying) presence/absence of riverine macrophytes. I would like to include...

02 June 2021
