Which statistical method should be used to investigate the relationship between two sets of numerical variables?

Eik Vettorazzi

My suggestion would be: rethink your question, set up some hypotheses about your data and after that ask what tools to use to check these.

Turkay Yildiz

Eik Vettorazzi, thank you for your answer. However, before setting up any hypotheses, I need to see the whole picture of "the relationship" between the variables of the first and second sets. For example, which variables of the second set contribute strongly to some of the variables of the first set?

Christian Michel

To me it sounds like a case where partial least squares regression (PLS) might be well suited. PLS is a projection method that lets you evaluate the relationship between a matrix of "explanatory" variables and a matrix of "response" variables. For the fundamental idea behind it, see this Wikipedia article: http://en.wikipedia.org/wiki/Partial_least_squares_regression. Hope this helps, and good luck!
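As a minimal illustration of the projection idea, the sketch below computes the first pair of PLS weight vectors in the SVD formulation: after centering both blocks, the leading singular vectors of the cross-product matrix X^T Y give the unit-length weights that maximize the covariance between the two block scores. The data and the function name `first_pls_component` are hypothetical; a full PLS regression would extract further components by deflation, which is not shown here.

```python
import numpy as np


def first_pls_component(X, Y):
    """First pair of PLS weight vectors and scores for blocks X (n x p) and Y (n x q)."""
    # Center both blocks column-wise so that Xc.T @ Yc is proportional to the cross-covariance.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Leading singular vectors maximize cov(Xc @ w, Yc @ c) subject to ||w|| = ||c|| = 1.
    U, s, Vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
    w, c = U[:, 0], Vt[0, :]
    t, u = Xc @ w, Yc @ c  # scores (latent variables) for each block
    return w, c, t, u


rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=n)
# Hypothetical data: 4 "explanatory" and 3 "response" variables sharing one latent driver.
X = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(4)])
Y = np.column_stack([latent + rng.normal(scale=0.5, size=n) for _ in range(3)])

w, c, t, u = first_pls_component(X, Y)
print("X weights:", np.round(w, 2))
print("Y weights:", np.round(c, 2))
print("corr(t, u):", round(float(np.corrcoef(t, u)[0, 1]), 3))
```

The overall sign of the weights is arbitrary (an SVD property), but `w` and `c` flip together, so the covariance of the two scores stays non-negative.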

Ehsan Khedive

Dear Turkay, when we have two sets of variables and want to find the relationship between them, the best way is to consider them simultaneously. In other words, if we want to find a relationship between, for example, one of your 6 variables and the other 16, we should also consider the relationship between these 16 and the 5 remaining variables.

I'm sure the best statistical analysis for such data is Canonical Correspondence Analysis (CCA), which considers all the variables simultaneously. You can easily perform this analysis in PC-Ord or PAST.

Regards

Patrick S Malone

Turkay,

I agree that either PLS or canonical correlation is a good choice for the question as framed, though I strongly encourage you to consider Eik's suggestion, even after your caveat.

PLS and CCA will probably give you very similar results in this situation. The biggest difference is that CCA chooses the variate weights so that the first canonical variates from each set are maximally correlated, whereas PLS chooses them to maximize the covariance between the variates, which also rewards variates that account for a lot of variance within their own set; in PLS regression the emphasis is on predictive utility for the other set.

Read this for more information on the difference between PLS and CCA:

http://www.diva-portal.org/smash/get/diva2:288565/FULLTEXT01.pdf
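The contrast between the two criteria can be sketched directly: canonical correlation analysis picks weights that maximize the correlation between the two variates, while PLS (here in its SVD formulation) maximizes their covariance. The sketch below computes the first canonical pair by whitening each set and taking an SVD, then compares it with the first PLS pair on hypothetical data where one X variable has a large variance but only a weak link to Y. Function names and data are illustrative assumptions, not taken from the linked thesis.

```python
import numpy as np


def inv_sqrt(S):
    """Inverse symmetric square root of a positive-definite matrix."""
    vals, vecs = np.linalg.eigh(S)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T


def first_canonical_pair(X, Y):
    """Weights of the first canonical variates and the first canonical correlation."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    n = X.shape[0]
    Sxx, Syy, Sxy = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1), Xc.T @ Yc / (n - 1)
    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    return Kx @ U[:, 0], Ky @ Vt[0, :], s[0]


def first_pls_pair(X, Y):
    """Unit-length weights maximizing the covariance between the two variates."""
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc.T @ Yc)
    return U[:, 0], Vt[0, :]


def corr_of_variates(X, Y, a, b):
    return float(np.corrcoef(X @ a, Y @ b)[0, 1])


rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)
# Hypothetical data: X1 has a large variance but a weak link to Y; X2 is small but strongly linked.
X = np.column_stack([10 * (0.3 * z + rng.normal(size=n)), 0.5 * z + 0.1 * rng.normal(size=n)])
Y = np.column_stack([z + 0.3 * rng.normal(size=n), rng.normal(size=n)])

a, b, cca_r = first_canonical_pair(X, Y)
w, c = first_pls_pair(X, Y)
pls_r = corr_of_variates(X, Y, w, c)
print(f"CCA: first canonical correlation = {cca_r:.3f}")
print(f"PLS: correlation of first variates = {pls_r:.3f}")
```

On data like this, the covariance criterion is pulled toward the high-variance X1, so the PLS variates should be noticeably less correlated than the canonical ones; when all variables are on comparable scales and strongly related, the two methods tend to agree more closely.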

Ronny, that's a good point. I don't think Turkay has said whether his items would be expected to fit a common factor model (by set), but if they do, that's probably going to be a lot more straightforward and informative than what the rest of us have been talking about!

Albert Sesé

Dear Turkay,

If you don't have a clear hypothesis about the relationship between the two sets of variables, you can also use data mining techniques. These models are very useful for finding complex relationships between sets of variables, and some of them need no assumptions about the variables' distributions. Artificial neural networks, or algorithms such as C5.0 or random forests, could help you find a model.

All the best.

João Maroco

Canonical correspondence analysis will get you started in seeing what is going on with your variables' associations!

Thank you all for the constructive answers.

Recently, I tried several methods on my data with my statistics software. Some data mining techniques appear to give more relevant results, although the results are sometimes hard to interpret. As always, statistical methods come with a steep learning curve.

I have tried Multivariate Adaptive Regression Splines (MARSplines), Random Forests, Boosted Tree Regression and Regression Tree Models. In my case, these methods appear to be a good choice for analyzing predictor importance, but they appear to handle a single continuous dependent variable with several independent variables.

Finally, I decided to use Canonical Correlation Analysis (CCA), which I believe is a good choice for investigating the relationship between two sets of numerical variables in my analysis. CCA allows several continuous dependent variables and several independent variables. In the end, I obtained these equations:
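Predictor importance for a single continuous dependent variable is often measured by permutation: shuffle one predictor, refit nothing, and see how much the prediction error grows. The sketch below applies that model-agnostic idea with an ordinary least-squares fit standing in for MARSplines or tree ensembles, which are not implemented here; the data and function names are hypothetical, and statistics packages may report importance differently (for example, impurity-based scores for trees).

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares fit with an intercept; returns a prediction function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xnew: np.column_stack([np.ones(len(Xnew)), Xnew]) @ coef


def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean increase in mean squared error when each predictor column is shuffled."""
    rng = np.random.default_rng(seed)
    base = np.mean((y - predict(X)) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])  # break the link between column j and y
            scores[j] += np.mean((y - predict(Xp)) ** 2) - base
    return scores / n_repeats


rng = np.random.default_rng(2)
X = rng.normal(size=(300, 3))
# Hypothetical single dependent variable: driven mostly by column 0, weakly by column 1, not by column 2.
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=300)
imp = permutation_importance(fit_ols(X, y), X, y)
print("importance per predictor:", np.round(imp, 3))
```

Because the target here is one vector `y`, this kind of importance answers "which predictors matter for this outcome", one outcome at a time, which is the limitation that motivates moving to CCA for two whole sets.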

U1 = 0.37×F1_ES – 0.11×F2_ES + 0.07×F3_ES – 0.03×F4_ES + 0.11×F5_ES + 0.19×F6_ES – 0.11×F7_ES – 0.07×F8_ES – 0.46×F9_ES + 0.60×F10_ES – 0.17×F11_ES + 0.32×F12_ES + 0.10×F13_ES – 0.12×F14_ES + 0.13×F15_ES – 0.44×F16_ES

and

L1 = – 0.81×CUST + 0.16×INFR + 0.15×ITRN + 0.57×LOGS – 0.40×TIME + 0.97×TRAC

Canonical R = 0.93 and p
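Each equation above defines a canonical variate as a weighted sum of the variables in one set, and the canonical R is the Pearson correlation between the U1 and L1 scores across cases. The sketch below shows that mechanic using the weights reported above. It assumes, since the post does not say, that the weights are standardized canonical coefficients (applied to z-scored variables); the data are random and hypothetical, so the printed correlation will not reproduce R = 0.93, and the rounded two-decimal weights would only approximate the original fit anyway.

```python
import numpy as np

# Canonical weights as reported in the post (assumed to be standardized coefficients).
U1_WEIGHTS = {
    "F1_ES": 0.37, "F2_ES": -0.11, "F3_ES": 0.07, "F4_ES": -0.03,
    "F5_ES": 0.11, "F6_ES": 0.19, "F7_ES": -0.11, "F8_ES": -0.07,
    "F9_ES": -0.46, "F10_ES": 0.60, "F11_ES": -0.17, "F12_ES": 0.32,
    "F13_ES": 0.10, "F14_ES": -0.12, "F15_ES": 0.13, "F16_ES": -0.44,
}
L1_WEIGHTS = {"CUST": -0.81, "INFR": 0.16, "ITRN": 0.15, "LOGS": 0.57, "TIME": -0.40, "TRAC": 0.97}


def standardize(M):
    """Z-score each column (sample standard deviation)."""
    return (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)


def variate_scores(data, weights):
    """Weighted sum of standardized columns; data maps variable name -> 1-D array."""
    names = list(weights)
    Z = standardize(np.column_stack([data[k] for k in names]))
    return Z @ np.array([weights[k] for k in names])


rng = np.random.default_rng(3)
n = 150  # hypothetical number of cases
data_x = {k: rng.normal(size=n) for k in U1_WEIGHTS}  # hypothetical values
data_y = {k: rng.normal(size=n) for k in L1_WEIGHTS}  # hypothetical values
u1 = variate_scores(data_x, U1_WEIGHTS)
l1 = variate_scores(data_y, L1_WEIGHTS)
print("corr(U1, L1):", round(float(np.corrcoef(u1, l1)[0, 1]), 3))
```

On the original data, this correlation is what the reported canonical R summarizes, and the size and sign of each weight indicate how strongly each variable contributes to its variate (structure correlations are often inspected as well).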

Garumma Tolu Feyissa

Do the variables within each set measure the same construct? If so, you can create factor scores for them and then use one of the statistical procedures for continuous outcomes.
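A rough sketch of the "one score per set, then a continuous-outcome procedure" route follows. As a plainly named stand-in for factor-analysis factor scores, it uses scores on the first principal component of each standardized set, then fits a simple linear regression of one composite on the other. The data and function name are hypothetical; a proper common factor model would estimate loadings and uniquenesses differently.

```python
import numpy as np


def first_pc_scores(M):
    """Scores on the first principal component of the standardized columns of M."""
    Z = (M - M.mean(axis=0)) / M.std(axis=0, ddof=1)
    vals, vecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    v = vecs[:, -1]  # eigh sorts eigenvalues ascending, so the last column is the largest
    if v.sum() < 0:  # fix the arbitrary sign so that higher scores mean "more" of the construct
        v = -v
    return Z @ v


rng = np.random.default_rng(4)
n = 200
g = rng.normal(size=n)
# Hypothetical sets: each set's items share one construct, and the two constructs are related.
set1 = np.column_stack([g + rng.normal(scale=0.6, size=n) for _ in range(5)])
set2 = np.column_stack([0.7 * g + rng.normal(scale=0.8, size=n) for _ in range(4)])

s1, s2 = first_pc_scores(set1), first_pc_scores(set2)
slope, intercept = np.polyfit(s1, s2, 1)  # simple regression of one composite on the other
print(f"corr = {np.corrcoef(s1, s2)[0, 1]:.3f}, slope = {slope:.3f}")
```

This only makes sense if each set really is unidimensional; if a set measures several constructs, a single composite will hide them.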

Sajjad Ali

First of all, you need to check the interrelationships among these variables within each set. After that, you will be able to check the relationship between the two types of variables.
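One simple way to start checking interrelationships within a set is to scan its correlation matrix for near-duplicate variables. The sketch below lists pairs whose absolute Pearson correlation exceeds a threshold; the threshold of 0.8, the variable names, and the data are all illustrative assumptions rather than a standard cut-off.

```python
from itertools import combinations

import numpy as np


def high_correlation_pairs(M, names, threshold=0.8):
    """List variable pairs within one set whose |Pearson r| exceeds the threshold."""
    R = np.corrcoef(M, rowvar=False)  # columns are variables
    return [(names[i], names[j], round(float(R[i, j]), 2))
            for i, j in combinations(range(len(names)), 2)
            if abs(R[i, j]) > threshold]


rng = np.random.default_rng(6)
n = 120
a = rng.normal(size=n)
# Hypothetical set: B is almost a copy of A, C is unrelated.
M = np.column_stack([a, a + 0.1 * rng.normal(size=n), rng.normal(size=n)])
names = ["A", "B", "C"]  # hypothetical variable names
pairs = high_correlation_pairs(M, names)
print(pairs)
```

Highly redundant pairs inflate or destabilize multivariate weights, so they are worth combining or dropping before relating the two sets.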

William H. Fisher

This should depend on your research questions. If it's a "fishing expedition", you're bound to turn up significant correlations simply by chance. I think factor analysis or some similar data reduction technique would be useful, but remember that these are based on correlation and covariance structures and typically assume linear relationships. I would select some variables from each set and explore their bivariate relationships graphically before rushing to a multivariate analysis.
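The "significant by chance" point can be made concrete with a quick simulation: correlate every variable of one set with every variable of another when the two sets are completely unrelated, and count how many correlations pass p < 0.05. The sketch below uses 16 and 6 variables (matching the set sizes discussed in this thread) and 100 hypothetical cases, with approximate p-values from Fisher's z transform (normal approximation), and compares the raw count with a Bonferroni-adjusted threshold. The scatter plots recommended above are not shown here.

```python
import math

import numpy as np


def fisher_p_value(r, n):
    """Approximate two-sided p-value for H0: rho = 0 using Fisher's z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


rng = np.random.default_rng(5)
n, p, q = 100, 16, 6
X = rng.normal(size=(n, p))  # hypothetical: two completely unrelated sets
Y = rng.normal(size=(n, q))
R = np.corrcoef(X, Y, rowvar=False)[:p, p:]  # the p x q block of cross-set correlations
pvals = np.array([[fisher_p_value(R[i, j], n) for j in range(q)] for i in range(p)])

alpha = 0.05
print("tests:", p * q)
print("'significant' at 0.05:", int((pvals < alpha).sum()), "(about", p * q * alpha, "expected by chance)")
print("significant after Bonferroni:", int((pvals < alpha / (p * q)).sum()))
```

With 96 tests and no real relationship at all, roughly five correlations are expected to look significant at the 0.05 level.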

Issam Ashqer

A t-test can be used to obtain a p-value; the results can be accepted as significant if the p-value is less than 0.01.
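When the question is whether two numerical variables are related, the usual t-test is the one for a Pearson correlation: t = r·√(n − 2)/√(1 − r²) with n − 2 degrees of freedom. The sketch below computes that statistic and its two-sided p-value with only the standard library, obtaining the p-value by numerically integrating the Student t density (Simpson's rule), and applies the 0.01 threshold from the answer. The values r = 0.5 and n = 27 are hypothetical. In practice, a library routine such as SciPy's `scipy.stats.pearsonr` reports the same kind of test. Note that this tests one pair of variables; with two whole sets, the multiple-testing issue raised in the previous answer applies.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c) * (1 + x * x / df) ** (-(df + 1) / 2)


def two_sided_t_p(t, df, steps=4000):
    """Two-sided p-value: 1 - 2 * integral of the density from 0 to |t| (Simpson's rule)."""
    a, b = 0.0, abs(t)
    h = (b - a) / steps  # steps must be even for Simpson's rule
    s = t_pdf(a, df) + t_pdf(b, df)
    for k in range(1, steps):
        s += (4 if k % 2 else 2) * t_pdf(a + k * h, df)
    return max(0.0, 1.0 - 2.0 * s * h / 3)


def correlation_t_test(r, n):
    """t statistic and two-sided p-value for H0: rho = 0, given Pearson r from n pairs."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r * r)
    return t, two_sided_t_p(t, df)


t, p = correlation_t_test(0.5, 27)  # hypothetical r and sample size
print(f"t = {t:.3f}, p = {p:.4f}, significant at 0.01: {p < 0.01}")
```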