Given a data stream with an embedded signal that is exceedingly faint but statistically significant (that is, a small effect size, but a real effect), how would one go about scaling up the signal of the phenomenon so it can be more readily detected?

Would it be enough to have concurrent data streams, all attempting to measure the same source from different angles? Or would that have no effect, or perhaps only scale the ability to detect the effect linearly?

As a completely contrived example (sorry), consider a dice manufacturer who has ever so slightly weighted its dice so that the 1-side comes up more often than it should by a tiny margin, if and only if there is a gust of wind while the die is in the air.

The effect only shows up when there is wind. This is the event you are trying to detect: the signal in the data.

You can throw one die per trial. Now imagine I gave you a choice: you can throw your die three times as fast, or you can throw three dice per trial, but you can't do both. Which option would allow you to detect the presence of wind more accurately?

It seems to me that by increasing the number of dice thrown per trial you exponentially increase your chances of detecting the wind when it is there, because observations can corroborate each other, even though the effect size is so small that any individual pair of observations has a high chance of not doing so. And if you scaled that up to 1 million dice thrown at once, you'd usually have a distinguishable amount of corroborating evidence and you'd almost always be right, even though the effect size is tiny, right?
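
To make this concrete, here is a minimal simulation sketch of the dice setup (Python with NumPy/SciPy). The effect size of +0.005 on the probability of rolling a 1, the significance level, and the trial counts are all invented for illustration, and the sketch assumes the wind is blowing for every throw and that all throws are independent:

```python
import numpy as np
from scipy import stats

# All numbers here are made up for illustration: a fair die shows a 1 with
# probability 1/6; the (hypothetical) weighted die in the wind shows a 1
# with probability 1/6 + 0.005.
P_FAIR = 1 / 6
P_WINDY = 1 / 6 + 0.005     # assumed tiny effect size
ALPHA = 0.05                # significance threshold for "wind detected"
N_EXPERIMENTS = 2000        # Monte Carlo repetitions per configuration
rng = np.random.default_rng(0)

def detection_rate(n_trials, dice_per_trial):
    """Fraction of simulated windy experiments in which a one-sided binomial
    test on the total count of 1s rejects the fair-die hypothesis."""
    n_throws = n_trials * dice_per_trial
    detections = 0
    for _ in range(N_EXPERIMENTS):
        ones = rng.binomial(n_throws, P_WINDY)
        p_value = stats.binomtest(ones, n_throws, P_FAIR,
                                  alternative="greater").pvalue
        detections += p_value < ALPHA
    return detections / N_EXPERIMENTS

# Option A: throw three times as many single-die trials.
# Option B: keep the trial count but throw three dice per trial.
# Plus the "1 million dice at once" case from the question.
print("3000 trials x 1 die:   ", detection_rate(3000, 1))
print("1000 trials x 3 dice:  ", detection_rate(1000, 3))
print("10 trials x 100k dice: ", detection_rate(10, 100_000))
```

Running it compares the detection rate when the same total number of throws is split into more trials versus more dice per trial, alongside the million-throw case.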

Is this intuition correct? Does concurrency give you better signal detection?

Thanks.
