# Hypothesis testing for not identically distributed random variables conditioned on the...

Discussion in 'Mathematics' started by ECR, Aug 1, 2020 at 8:03 PM.

1. ### ECR (Guest)

I encountered the following problem (more details are given at the end of the post) and I am trying to figure out the best way to perform a null hypothesis test. I looked for similar questions, but none of them exactly fits my problem.

I have a random vector \$X = (X_1,...,X_N)\$ of \$N\$ binary random variables, which are not necessarily independent and not identically distributed. These \$N\$ variables are divided into two subsets: \$A\$ with \$N_A\$ random variables and \$B\$ with \$N_B\$ random variables (\$N_A + N_B = N\$), so I can also write \$X = (X_A,X_B)\$. I know the marginal distribution of each of the binary variables, as well as the first and second moments of the random vector.

Now I consider another random vector \$Y = (Y_1,...,Y_N) = (Y_A, Y_B)\$, from which I can only sample in two steps: first sample \$Y_A\$ (obtaining some string \$(a_1,...,a_{N_A})\$), and then sample \$(Y_B | Y_A = (a_1,...,a_{N_A}))\$. The null hypothesis is that \$Y\$ follows the same distribution as \$X\$.
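To make the two-step sampling concrete, here is a minimal Python sketch. It assumes a toy case where the full joint distribution is small enough to write down as a dictionary, which is not true for my real problem. The names `sample_two_step` and `toy_joint` are made up for illustration, and the toy distribution (three perfectly correlated bits, with \$N_A = 1\$) is only an example.

```python
import random
from collections import defaultdict

def sample_two_step(joint, n_a, rng):
    """Draw (a, b): first a from the marginal of A, then b from B | A = a.

    joint: dict mapping each outcome tuple (x_1, ..., x_N) to its probability.
    n_a:   number of variables in subset A (the first n_a entries).
    """
    # Step 1: marginal distribution of A, obtained by summing over B.
    p_a = defaultdict(float)
    for outcome, p in joint.items():
        p_a[outcome[:n_a]] += p
    a_values = list(p_a)
    a = rng.choices(a_values, weights=[p_a[v] for v in a_values])[0]

    # Step 2: conditional distribution of B given A = a.
    # rng.choices normalises the weights, so dividing by p_a[a] is not needed.
    b_values = [o[n_a:] for o in joint if o[:n_a] == a]
    b_weights = [joint[a + b] for b in b_values]
    b = rng.choices(b_values, weights=b_weights)[0]
    return a, b

# Hypothetical toy distribution: three perfectly correlated bits.
toy_joint = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

a, b = sample_two_step(toy_joint, n_a=1, rng=random.Random(0))
print(a, b)
```

In this toy case the outcome of \$A\$ fully determines \$B\$, which is the extreme version of the conditioning described above.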

The problem is that the set of possible outcomes for \$A\$ is too large, so the probability of obtaining the same \$a = (a_1, ..., a_{N_A})\$ twice is negligible. Thus, the conditional distribution of the random variables in subset \$B\$ changes in each iteration. Since I cannot repeat the sampling under identical conditions, I cannot use the usual central limit theorem to approximate the experimental mean by a Gaussian and perform typical Gaussian hypothesis tests.
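As a rough order-of-magnitude illustration of why repeats are negligible, the sketch below uses the standard birthday-problem approximation. It assumes, purely for illustration, that the outcomes of \$A\$ are uniform over all \$2^{N_A}\$ strings, which is not the case in my problem. The function name `prob_repeat` and the numbers used (\$N_A = 50\$, \$10^4\$ shots) are hypothetical.

```python
import math

def prob_repeat(n_a, shots):
    """Approximate probability that at least two of `shots` samples of A coincide.

    Assumes (for illustration only) that A is uniform over 2**n_a strings and
    uses the birthday approximation 1 - exp(-shots*(shots-1) / (2 * 2**n_a)).
    """
    pairs = shots * (shots - 1) / 2
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    return -math.expm1(-pairs / 2.0 ** n_a)

# Hypothetical numbers: 50 spins in A and 10,000 measurement shots.
print(prob_repeat(50, 10_000))
```

Under this assumption the chance of ever seeing the same \$a\$ twice is tiny unless the number of shots is comparable to \$2^{N_A/2}\$.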

You can imagine this as having \$N\$ biased coins, each bias being different, and the coins may not be independent. First I throw \$N_A\$ of the coins, which conditions the possible outcomes of the second set \$B\$.

How can I test my null hypothesis under these restrictions?

More details of the problem: I am dealing with a problem in quantum mechanics, with a state of \$N\$ spins that might be entangled (thus the variables are not independent). The data correspond to measuring part of the system first (subsystem \$A\$), which collapses the whole state and conditions the possible outcomes of the rest of the system (subsystem \$B\$). Because the set of possible outcomes for subsystem \$A\$ is very large, and because measuring destroys the state, sampling subsystem \$A\$ twice and obtaining the same result is highly unlikely.

Thank you very much in advance! Any idea or suggestion is highly appreciated!