How can I identify control & test groups for hypothesis testing in a large dataset?

More of Serban Anghel's questions

Do you think there could be any uranium-bearing rocks in the eastern part of Iran and the western part of Afghanistan?

I want to know more about uranium ore deposits around the world.

11 August 2024

Do you think there could be any diamond-bearing rocks in the eastern part of Iran and the western part of Afghanistan?

I want to know more about diamond ore deposits around the world.

11 August 2024

What is the difference between mathematical R^4 space and physical 4D unit space?

We assume that the difference is huge and that it is not possible to compare the two spaces. The R^4 mathematical space considers time as an external controller and the space itself is immobile in...

10 August 2024

If banks do not provide credit facilities, what options are available to FPOs, and what is the impact on producers’ income?

10 August 2024

Controlling for pupil light reflex when analyzing pupil size time course?

I used eye tracking to examine how participants from two different populations (A and B) react to an image. Participants in population A exhibit larger pupil sizes over time, but they also have...

10 August 2024

What is a “Farmers Producer Organization” (FPO), and what are its essential features?

10 August 2024

Struggling with m6A dot blot: any suggestions?

I have been doing the m6A dot blot for a while with no improvement. I am extracting the RNA, and I can see the dots, although the three biological replicates give different readings on the memberan...

10 August 2024

Do interactions between biosphere, carbon cycle, & water cycle impact global warming & interaction between atmosphere & hydrosphere?

How do interactions between the biosphere, the carbon cycle, and the water cycle impact global warming and interaction between the atmosphere and the hydrosphere?

09 August 2024

How can I get moment output in Abaqus/Standard?

I have applied a moment load in the Abaqus Load module; I put the moment load on the node surface (using a reference point). I have defined the moment in the history output and made a set for the moment too. But the...

08 August 2024

How is energy cycled through the Earth's climate system and how do matter cycle and energy flow through the rock cycle?

08 August 2024

How can I learn more about SPSS and its applications?

I would like to learn more about SPSS and its applications, especially with regard to data analysis. Please suggest how I can learn more about it. Thank you so much.

11 August 2024

Has anyone applied Python in the field of textile engineering for data analysis, automation, or smart textiles?

I'm currently exploring the application of Python in textile engineering, specifically in areas like data analysis, process automation, and the development of smart textiles. I'm interested in...

10 August 2024

Which Scopus Journal provides the most affordable fees?

"PUBLISHING IN A SCOPUS JOURNAL" Researchers are now at a crossroads. The need to publish in a Scopus- or ISI-indexed journal is more critical than ever. Journal publication fees must be paid....

10 August 2024

1. If I can quantize the atom using this hyperbolic spiral and classical physics, could nature do the same?

If we map as a continuous motion an ionising electron (beginning its journey at n=1) in an H atom, a specific hyperbolic spiral appears (see animation). When we solve this spiral formula, we find...

07 August 2024

Could you please suggest methods to compare the binding properties of free and immobilized protein?

I have an antibody-binding generic protein and I need to compare its activity in free and immobilized forms. I understand that there are a number of methods to determine the Kd value of a free...

05 August 2024

What research exists on student nurses’ satisfaction with their hospital attachment?

I am doing a study on nursing students’ satisfaction with the hospital (clinical) attachment, i.e., whether or not they are satisfied with it. I need more research studies on this.

03 August 2024

Where can I find microgrid datasets for system identification?

Hi, I am working on a data-driven model of a microgrid, for which I need reliable datasets to identify the MG data-driven model. Thanks.

02 August 2024

Where can I find free research instruments for Nursing?

For example, the transition shock scale.

01 August 2024

Why do open and free science in a world where science is not open and free?

I ask because I have noticed that the world is increasingly moving toward open and free science, with a growing trend toward free databases, free tools, and open-access platforms.

01 August 2024

Post hoc test lettering in JAMOVI?

Does anyone know of a module for the JAMOVI software that is capable of generating mean separations using the classic letters based on post hoc results (e.g., Tukey test)? If, as I believe, such...

31 July 2024

David Morse

Hello Serban,

As rated wifi quality is likely to be an ordinal score, strictly speaking I don't believe you'll be able to talk about an x% improvement in that variable. I'm also not sure I understand why you would need control and test groups for an existing data set. If the focus is on the explanatory power of the wifi service rating after all other relevant variables/features have been accounted for, then I think you'd likely want to compare (a) log-likelihoods, (b) change in satisfaction status, or (c) adjusted odds ratios of satisfaction between two models:

1. An explanatory model that has all relevant variables _except_ wifi rating;

2. Model 1 with wifi rating added.

That will come closest to giving you the "marginal" or value-added contribution of wifi rating as regards satisfaction status.
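A minimal sketch of the two-model comparison above, in Python with NumPy. Several assumptions are not in the thread: satisfaction is treated as a binary outcome fitted by logistic regression, the wifi rating enters as a single numeric score (a simplification, given the ordinal-scale caveat), one hypothetical "seat comfort" rating stands in for "all other relevant variables", and the data are synthetic. It reports option (a) as a likelihood-ratio test on the two log-likelihoods, and option (c) as the adjusted odds ratio for wifi.

```python
import math

import numpy as np


def fit_logit(X, y, max_iter=50, tol=1e-8):
    """Logistic regression by Newton-Raphson; X must include an intercept column.

    Returns (coefficients, maximized log-likelihood).
    """
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (y - p)                         # score vector
        hess = X.T @ (X * (p * (1.0 - p))[:, None])  # information matrix X'WX
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    loglik = float(np.sum(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
    return beta, loglik


def lr_test_1df(ll_reduced, ll_full):
    """Likelihood-ratio test for nested models that differ by ONE parameter.

    If wifi were entered as dummy variables instead, df = levels - 1 and this
    closed form would no longer apply.
    """
    stat = 2.0 * (ll_full - ll_reduced)
    # Chi-square(1) survival function: P(X > s) = erfc(sqrt(s / 2))
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value


# --- Synthetic data for illustration only (NOT real passenger data) ---
rng = np.random.default_rng(0)
n = 2000
comfort = rng.integers(1, 6, size=n)  # hypothetical "seat comfort" rating 1..5
wifi = rng.integers(0, 6, size=n)     # hypothetical wifi rating 0..5
true_logit = -3.0 + 0.4 * comfort + 0.5 * wifi
satisfied = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

ones = np.ones(n)
X_reduced = np.column_stack([ones, comfort])     # Model 1: everything except wifi
X_full = np.column_stack([ones, comfort, wifi])  # Model 2: Model 1 + wifi

_, ll_reduced = fit_logit(X_reduced, satisfied)
beta_full, ll_full = fit_logit(X_full, satisfied)
stat, p_value = lr_test_1df(ll_reduced, ll_full)
adjusted_or_wifi = math.exp(beta_full[2])  # odds multiplier per one-point wifi increase

print(f"LR stat = {stat:.1f}, p = {p_value:.3g}, adjusted OR (wifi) = {adjusted_or_wifi:.2f}")
```

With real data, Model 1 would contain every covariate the decision tree used, and a library such as statsmodels could replace the hand-written fitter.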

Apologies if I've misunderstood your query.

Good luck with your work.

Serban Anghel

Hello David,

Thank you for your input!

Regarding your doubts, let me explain the rationale behind the control & test groups a little better.

After applying the decision tree and extracting the feature importances (i.e., which service best separates the satisfied passengers from the non-satisfied ones), I received feedback from some of the professors on the committee. They asked about the level of confidence we can have in these results. Investigating further (statistics is not one of my strong points, as I’m an aerospace structural engineer by training), I understood that this level of confidence comes from hypothesis testing.

That’s why I am trying to find a way to generate these control & test groups from the dataset I already have, in order to test the hypothesis that improving the perceived quality of the wifi service does indeed result in a larger number of satisfied passengers.

I hope it’s clearer now. If you think there are other methods by which I can obtain a confidence level for my results, please let me know!

Again, many thanks!
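One simple way to attach a confidence statement to the wifi hypothesis using only the existing data is a one-sided two-proportion z-test between a low-rating group and a high-rating group. A minimal sketch in standard-library Python; the rating cut-off (0-2 vs 3-5) and the counts are invented for illustration. Because these groups are observed rather than randomized, a significant result supports an association between wifi rating and satisfaction, not a causal effect of improving wifi.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_ztest(sat_low, n_low, sat_high, n_high):
    """Pooled z-test of H0: p_high = p_low against H1: p_high > p_low.

    sat_* are counts of satisfied passengers; n_* are group sizes.
    Returns (z statistic, one-sided p-value).
    """
    p_low = sat_low / n_low
    p_high = sat_high / n_high
    pooled = (sat_low + sat_high) / (n_low + n_high)  # common rate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n_low + 1 / n_high))
    z = (p_high - p_low) / se
    p_value = NormalDist().cdf(-z)  # upper-tail probability P(Z >= z)
    return z, p_value


# Hypothetical counts (NOT from the thread): "low" = wifi rating 0-2, "high" = 3-5
z, p = two_proportion_ztest(sat_low=400, n_low=1000, sat_high=520, n_high=1000)
print(f"z = {z:.2f}, one-sided p = {p:.2e}")
```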

Babak Jamshidi

If I understand your question correctly, there is no need to define control and test groups; it is enough to separate the cases into disjoint groups.
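Separating cases into disjoint groups can be as simple as keying each case on its wifi rating, so every case lands in exactly one group, and then comparing satisfaction rates across groups. A minimal sketch, assuming each case is a hypothetical `(wifi_rating, satisfied)` pair with `satisfied` coded 1/0; the records are toy values.

```python
from collections import defaultdict


def partition_by_rating(records):
    """Split (wifi_rating, satisfied) records into disjoint groups keyed by rating."""
    groups = defaultdict(list)
    for rating, satisfied in records:
        groups[rating].append(satisfied)  # each record goes to exactly one group
    return dict(groups)


def satisfaction_rates(groups):
    """Share of satisfied cases (1 = satisfied) in each group, ordered by rating."""
    return {rating: sum(flags) / len(flags) for rating, flags in sorted(groups.items())}


# Toy records (hypothetical): (wifi_rating, 1 = satisfied / 0 = not satisfied)
records = [(1, 0), (1, 0), (1, 1), (3, 0), (3, 1), (5, 1), (5, 1), (5, 0), (5, 1)]
groups = partition_by_rating(records)
rates = satisfaction_rates(groups)
print(rates)
```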

Hello again Serban,

OK, so the usual nomenclature is "training" (or model-building) and "test" (or validation) groups. The training data set is used to develop the model; the model's accuracy/efficacy is then ascertained by applying it to the test/validation sample.

With a large enough initial sample, you could randomly select some fraction of it to serve as the training/model-building sample and hold out the remainder as the test/validation sample. There are fancier schemes (so-called k-fold cross-validation) that data-mining folks like to use, but this is the basic framework.
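The random hold-out split and the k-fold scheme can both be sketched in a few lines of standard-library Python. The 70/30 split, k = 5, and the fixed seed are illustrative assumptions, not values recommended in the thread; the functions return row indices so they work with any table format.

```python
import random


def holdout_split(n, train_fraction=0.7, seed=42):
    """Randomly assign row indices 0..n-1 to disjoint training and test sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(train_fraction * n))
    return indices[:cut], indices[cut:]


def k_fold_indices(n, k=5, seed=42):
    """Yield (train, validation) index lists; each index is validated exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k disjoint folds of near-equal size
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


train, test = holdout_split(1000, train_fraction=0.7)
print(len(train), len(test))  # 700 300
```

In k-fold cross-validation the model is refit k times, once per `(train, validation)` pair, and the k validation scores are averaged.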

Choose the initial sample size to be large enough to allow a high degree of precision (and stability: all other things being equal, sample results are less volatile across large samples than across moderate or small ones). It's up to you to declare what degree of precision you'd like for any parameter estimates that come out of model building, and to select the training sample size accordingly.
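As one concrete instance of "declare the precision, then pick the sample size", assume the parameter of interest is a single proportion such as the satisfaction rate. The normal-approximation (Wald) formula is n = z² · p(1 − p) / E², where E is the desired margin of error and p = 0.5 is the conservative worst case. A minimal sketch; the margins shown are illustrative choices, and model coefficients would need a different calculation.

```python
import math
from statistics import NormalDist


def sample_size_for_proportion(margin, confidence=0.95, p=0.5):
    """Smallest n giving a Wald confidence interval half-width <= margin for a proportion.

    p = 0.5 maximizes p * (1 - p), so it is the safe choice when the true rate is unknown.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(sample_size_for_proportion(0.02))  # 2401
print(sample_size_for_proportion(0.01))  # 9604
```

Halving the margin roughly quadruples the required sample, which is why the precision target should be fixed before the training sample is drawn.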