What is the most appropriate test to use given the data I have and the research question I am to address?

28 September 2021

Hello,

I need some help in figuring out the best way to analyse a set of data for my research project.

The project aims to investigate the consistency of distance travelled in repeat missing children. My IV is the number of missing incidents in a series (e.g. 2 incidents, 3 incidents, 5 incidents and so on). I have this both as intervals (e.g. 4-10 incidents, 11-20 incidents and so on) and as a string variable holding the actual number of missing incidents rather than an interval. My DVs are six distance intervals for which I have computed similarity scores using Jaccard's coefficient. The question I am looking at is whether consistency in distance travelled increases or decreases as the number of incidents in the series increases. One way to think about it: is someone with 3 missing incidents more or less likely to travel within the same distance interval than someone with 15 missing incidents? Or: is there a difference between the mean distance similarity score of someone with 3 missing incidents and that of someone with 15 missing incidents?

For each case in a series I have coded the distance travelled as 1 (yes) or 0 (no) for each distance interval (e.g. if during an incident an individual travelled between 0-5 miles, that interval is coded 1 and all other intervals are coded 0). I then calculated the Jaccard's coefficient for that case for each of the distance intervals, as well as a mean Jaccard across all distance intervals. I have also calculated the mean Jaccard's coefficient for all cases in a series for each of the different distance intervals.
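One possible reading of the coding described above, as a minimal Python sketch (not the SPSS procedure actually used): each incident is a one-hot vector over the six distance intervals, and a series' consistency is the mean Jaccard coefficient over all pairs of incidents in that series. The function names and the example data are hypothetical.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard coefficient between two equal-length 0/1 vectors."""
    both = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    either = sum(1 for x, y in zip(a, b) if x == 1 or y == 1)
    return both / either if either else 0.0

def mean_pairwise_jaccard(incidents):
    """Mean Jaccard coefficient over all pairs of incidents in one series."""
    if len(incidents) < 2:
        raise ValueError("a series needs at least two incidents")
    pairs = list(combinations(incidents, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical series of three incidents over six distance intervals:
# two incidents in the first interval, one in the second.
series = [
    [1, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
]
score = mean_pairwise_jaccard(series)
print(score)
```

Under this reading, two incidents score 1 when they fall in the same interval and 0 otherwise, so the series score is simply the proportion of incident pairs that share a distance interval.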

One option for analysis is MANOVA. However, about 4.6% of my data (700 series of missing incidents) are outliers (as shown by the Mahalanobis distance test). I am aware that MANOVA is sensitive to outliers, but I would not want to exclude them altogether from the analysis. Should I run the analysis both with and without the outliers, and report both? Or what is the best way to work around this? Is it absolutely necessary that I delete the outliers?

Also, my data seem to best fit an exponential distribution. How might this affect the results of the MANOVA?

Do people have any suggestions about other tests that may be appropriate for investigating this relationship between number of missing incidents and consistency in distance travelled?

Thanks in advance to anyone who takes the time to read this.

Emmanuel Curis

One short piece of advice: never, never, never delete « outliers » just because the statistical model cannot handle them.

I'm not quite clear about your model and question, but it seems strange to use MANOVA for 0-1 coded variables or class variables...

Lewis Maccarter

Maybe this will get you into the literature for a nonparametric approach:

Article Nonparametric MANOVA approaches for non-normal multivariate ...

The information loss (and hence power loss) from using a nonparametric approach is not very large if the data are normal, and there is often a gain when the data are not normal. Likewise, outliers are not a big problem for nonparametric approaches because these rely on ordinal position (rank) rather than on differences in magnitude (size alone).
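As a univariate illustration of the rank-based idea (not the multivariate methods in the article linked above), here is a minimal Python sketch of the Kruskal-Wallis H statistic, computed without the tie correction. The group labels and similarity scores are made up for illustration.

```python
from collections import defaultdict

def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """H statistic (no tie correction) for a dict mapping group -> scores."""
    labels, values = [], []
    for label, scores in groups.items():
        labels.extend([label] * len(scores))
        values.extend(scores)
    n = len(values)
    rank_sums = defaultdict(float)
    for label, rank in zip(labels, average_ranks(values)):
        rank_sums[label] += rank
    weighted = sum(rank_sums[g] ** 2 / len(groups[g]) for g in groups)
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)

# Hypothetical mean Jaccard scores grouped by number of missing incidents.
groups = {
    "2-3": [0.20, 0.30, 0.25],
    "4-10": [0.50, 0.55, 0.60],
    "11-20": [0.80, 0.90, 0.85],
}
h = kruskal_wallis_h(groups)
print(h)
```

Because only ranks enter the statistic, an extreme score moves H no more than any other score holding the same rank. H is usually compared against a chi-square distribution with (number of groups - 1) degrees of freedom.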

David Eugene Booth

First, MANOVA is rarely the method you want to use in modern times. Second, what do you mean when you say your data best fit an exponential distribution? MANOVA's requirements are about multivariate normality, so why MANOVA? To be honest, your question doesn't make much sense as written. Perhaps you should reconsider and try to write another, clearer question for submission. Best wishes, David Booth

Iulia Moisi

Thank you for all your replies! I don't think I made it clear that my dependent variables are not the actual distance intervals, but rather the Jaccard's similarity coefficients for each of those intervals, but these have been useful and prompted me to do more focused reading for my study.

Iulia Moisi

I think the simplest way to put it is that I wish to see

1. If there is a significant difference in similarity scores (measured using Jaccard's coefficient) between groups with differing numbers of missing incidents

2. If the number of incidents (IV) can predict consistency in distance travelled (DV - there are 6 different similarity scores for each distance interval included in the analysis and an overall similarity score across all intervals)

My data are not normally distributed, and I am not sure what test best fits the purpose of the analysis (I am using SPSS).

Paul Somerfield

Another problem you have is that dissimilarities are not independent of each other. They represent differences/distances among objects in multivariate space, and if you 'move' an object (change its relationship to another object by altering the value) you also change its relationship with other objects. As mentioned above, there are good non-parametric multivariate methods that are widely used in other fields, like ecology, and could easily be employed here. The main ones I would use to answer your questions are ANOSIM (fully non-parametric) and PERMANOVA (semi-parametric). This paper mentions most of the relevant literature: Article Analysis of similarities (ANOSIM) for 2‐way layouts using a ...
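To make the PERMANOVA idea concrete, here is a minimal one-way sketch in plain Python. It assumes Euclidean distances between each series' vector of similarity scores and a simple unrestricted permutation of group labels. The data, the function names and the choice of distance are hypothetical, and dedicated software should be preferred for a real analysis.

```python
import math
import random
from itertools import combinations

def euclidean(u, v):
    """Euclidean distance between two equal-length score vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def pseudo_f(dist, labels):
    """One-way pseudo-F computed from a distance matrix and group labels."""
    n = len(labels)
    groups = sorted(set(labels))
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    ss_within = 0.0
    for g in groups:
        members = [i for i in range(n) if labels[i] == g]
        ss_within += sum(
            dist[i][j] ** 2 for i, j in combinations(members, 2)
        ) / len(members)
    ss_among = ss_total - ss_within
    return (ss_among / (len(groups) - 1)) / (ss_within / (n - len(groups)))

def permanova(points, labels, n_perm=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    dist = [[euclidean(u, v) for v in points] for u in points]
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign group labels at random
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical data: six series, each with two similarity scores,
# split into a "few incidents" and a "many incidents" group.
points = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
    [0.80, 0.90], [0.90, 0.80], [0.85, 0.75],
]
labels = ["few", "few", "few", "many", "many", "many"]
f_obs, p_value = permanova(points, labels)
print(f_obs, p_value)
```

The p-value comes from reshuffling whole objects between groups rather than from treating individual distances as independent observations, which is how this approach sidesteps the dependence problem described above.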
