I want to analyze the correlation between two variables using 15,000 hourly observations measured simultaneously. On the raw data, the correlation is weak (R² = 0.1). However, after binning the data using Sturges' rule (the "Sturges grouping method") and running the correlation analysis on the grouped data, the correlation becomes very strong (R² = 0.95).
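To make the effect concrete, here is a minimal sketch on synthetic data, not the 15,000 observations described above. It assumes that "grouping" means binning x into the number of bins given by Sturges' rule (via NumPy's `np.histogram_bin_edges(..., bins="sturges")`) and then correlating the per-bin means of x and y. The data-generating model, the seed, and the helper names `r_squared` and `binned_means` are hypothetical, chosen only for illustration.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation coefficient of x and y."""
    return np.corrcoef(x, y)[0, 1] ** 2


def binned_means(x, y):
    """Bin x with Sturges' rule; return per-bin means of x and y (empty bins skipped)."""
    edges = np.histogram_bin_edges(x, bins="sturges")
    # Using only the interior edges, digitize yields bin indices 0 .. len(edges) - 2,
    # so the minimum of x falls in the first bin and the maximum in the last.
    idx = np.digitize(x, edges[1:-1])
    x_means, y_means = [], []
    for b in range(len(edges) - 1):
        in_bin = idx == b
        if in_bin.any():
            x_means.append(x[in_bin].mean())
            y_means.append(y[in_bin].mean())
    return np.array(x_means), np.array(y_means)


# Hypothetical synthetic data: a linear trend (slope 1) buried in heavy noise.
rng = np.random.default_rng(42)
n = 15_000
x = rng.uniform(0.0, 10.0, size=n)
y = x + rng.normal(scale=9.0, size=n)

xb, yb = binned_means(x, y)
raw_r2 = r_squared(x, y)
binned_r2 = r_squared(xb, yb)
raw_slope = np.polyfit(x, y, 1)[0]       # OLS slope on all points
binned_slope = np.polyfit(xb, yb, 1)[0]  # OLS slope on the bin means

print(f"bins: {len(xb)}")
print(f"raw:    R^2 = {raw_r2:.3f}, slope = {raw_slope:.2f}")
print(f"binned: R^2 = {binned_r2:.3f}, slope = {binned_slope:.2f}")
```

In this synthetic setup, each of the roughly 15 Sturges bins holds on the order of 1,000 points, so averaging shrinks the noise variance of each bin mean by roughly that factor. R² on the bin means therefore approaches 1, while the fitted slope stays close to the raw slope.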
1. Which method is more appropriate for correlation analysis in this case?
2. Could binning the data lead to overfitting or misinterpretation of the correlation?
3. What is the best approach to determine the true relationship between the variables?
4. Do you think binning the data is a good way to visualise the type of relationship between two variables?