
Assume one can perform measurements of an unknown quantity $\theta$ as $$y = \theta + \epsilon(t),$$ where $\epsilon(t) \sim \mathcal{N}(0,1/t)$ is the measurement error when a time $t$ was spent to collect the observation.

Now, assume that we can actually repeat this several times, but that the observations might be correlated. The goal is to optimally split $T=1$ hour of computation time between observations.

The thing I have in mind is that $\epsilon(t)$ should in fact be deterministic, but we try to model it by a (non-stationary) Gaussian process: $$\epsilon(t) = \frac{1}{\sqrt{t}} Z(t),$$ where $Z(t)$ is a GP with unit variance. For simplicity, let us assume $$\operatorname{cov}\big(Z(t_1),Z(t_2)\big) = e^{-(t_1-t_2)^2}.$$
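Since $\epsilon(t)$ is just a scaled draw from a squared-exponential GP, this model is easy to simulate. A minimal sketch (Python/NumPy; the helper name `sample_errors` is mine) that checks the marginal variance $1/t$ empirically:

```python
import numpy as np

# Sketch of the error model: epsilon(t) = Z(t)/sqrt(t), where Z is a
# zero-mean GP with cov(Z(t1), Z(t2)) = exp(-(t1 - t2)^2).
def sample_errors(times, n_draws=1, seed=None):
    rng = np.random.default_rng(seed)
    t = np.asarray(times, dtype=float)
    K = np.exp(-np.subtract.outer(t, t) ** 2)      # unit-variance SE kernel of Z
    z = rng.multivariate_normal(np.zeros(len(t)), K, size=n_draws)
    return z / np.sqrt(t)                          # non-stationary 1/sqrt(t) scaling

# Empirically, var(epsilon(t)) should be close to 1/t:
draws = sample_errors([0.1, 0.5, 0.9], n_draws=200_000, seed=0)
print(draws.var(axis=0))   # roughly [10, 2, 1.11]
```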

  • If two observations are allowed, with a time $t$ spent on the first observation and $1-t$ on the second, then we have $$ \left[\begin{array}{c}y_1\\y_2\end{array}\right]\sim \mathcal{N}\left( \mathbf{1}\theta,\, \Sigma(t) \right), \ \mathrm{where}\ \Sigma(t)= \left[\begin{array}{cc}\frac{1}{t} & \frac{1}{\sqrt{t(1-t)}} e^{-(1-2t)^2}\\ \frac{1}{\sqrt{t(1-t)}} e^{-(1-2t)^2} & \frac{1}{1-t}\end{array}\right] $$ and $\mathbf{1}$ is the vector of all ones. The variance of the BLUE for $\theta$ is $\big(\mathbf{1}^T \Sigma(t)^{-1} \mathbf{1}\big)^{-1}$, so we want to select $t\in[0,1]$ that maximizes the quantity $\rho(t):= \mathbf{1}^T \Sigma(t)^{-1} \mathbf{1}$. Curiously, the supremum is attained as $t\to 0^+$ (or as $t \to 1^-$), with $\rho(t)\to 1/(1-e^{-2})\simeq 1.156$. This indicates a discontinuity at $t=0$ and $t=1$, where there is a single observation of unit variance, so we would define $\rho(0)=\rho(1)=1$.

  • Even worse: assume we can observe $y_1$ during a time $t=\epsilon$, and $y_2$ during a time $t=2\epsilon$ (for a small $\epsilon>0$). Then both $y_1$ and $y_2$ have a huge variance, but they are extremely correlated (the Pearson correlation coefficient is $e^{-\epsilon^2}$). The variance of the BLUE is equal to $$ \frac{-\sqrt{2}(e^{-2\epsilon^2}-1)}{\epsilon(3\sqrt{2}-4e^{-\epsilon^2})},$$ which tends to $0$ as $\epsilon\to 0$, so $\theta$ can be recovered with arbitrary precision, and moreover for a vanishing computational effort! For reference, in this situation the BLUE for $\theta$ is $$\hat{\theta} = \frac{e^{-\epsilon^2}\sqrt{2}-1}{2e^{-\epsilon^2}\sqrt{2}-3}\, y_1 + \frac{e^{-\epsilon^2}\sqrt{2}-2}{2e^{-\epsilon^2}\sqrt{2}-3}\, y_2 \simeq -2.4142\, y_1 + 3.4142\, y_2.$$
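The first bullet's computation can be checked numerically. For a $2\times 2$ matrix $\Sigma = \big[\begin{smallmatrix} a & c\\ c & b\end{smallmatrix}\big]$, the standard inverse gives $\mathbf{1}^T \Sigma^{-1} \mathbf{1} = (a+b-2c)/(ab-c^2)$, which simplifies to a closed form for $\rho(t)$. A small sketch:

```python
import numpy as np

def rho(t):
    """Precision 1' Sigma(t)^{-1} 1 of the BLUE for two observations with
    times t and 1 - t.  For Sigma = [[a, c], [c, b]] the 2x2 inverse gives
    (a + b - 2c)/(ab - c^2), which simplifies to the expression below."""
    r = np.exp(-(1.0 - 2.0 * t) ** 2)              # corr(Z(t), Z(1 - t))
    return (1.0 - 2.0 * r * np.sqrt(t * (1.0 - t))) / (1.0 - r ** 2)

print(rho(0.25))     # an interior split is worse than a single observation
print(rho(1e-8))     # approaches 1/(1 - e^{-2}) ~ 1.156 as t -> 0+
```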

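The second bullet does not even need the closed form: one can build $\Sigma$ for times $\epsilon$ and $2\epsilon$ and solve for the BLUE weights and variance directly. A sketch (the helper name `blue_eps` is mine):

```python
import numpy as np

def blue_eps(eps):
    """BLUE weights and variance for observations at times eps and 2*eps."""
    t = np.array([eps, 2.0 * eps])
    r = np.exp(-(t[0] - t[1]) ** 2)                 # corr(Z(eps), Z(2*eps))
    c = r / np.sqrt(t[0] * t[1])
    Sigma = np.array([[1.0 / t[0], c], [c, 1.0 / t[1]]])
    s = np.linalg.solve(Sigma, np.ones(2))          # Sigma^{-1} 1
    return s / s.sum(), 1.0 / s.sum()               # weights, BLUE variance

w, v = blue_eps(1e-4)
print(w)    # close to the limiting weights (-2.4142, 3.4142)
print(v)    # close to 2*eps/(3 - 2*sqrt(2)), i.e. it vanishes with eps
```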
So, based on these observations:

  1. Do you have an intuitive explanation for this phenomenon? How could two experiments with times $t_1=\epsilon$ and $t_2=1-\epsilon$ be much better than a single experiment with time $t=1$? My guess is that assuming that the covariance kernel of $Z(t)$ is known is an extremely strong assumption, but I find it counter-intuitive that information on the correlation between two observations can give arbitrarily good precision, even though both observations have a huge variance!

  2. Can you think of a better model, with more realistic assumptions, that allows us to optimally split one unit of computation time?

  • I don't follow your model. I suspect that you are accidentally assuming something strange, and that is why you are getting strange results. If you are sure you haven't made a mistake, please elaborate on how you get a better estimate of $\theta$ when $\epsilon = 10^{-10}$, say. What do you do with the observations $y_1=50{,}000$, $y_2=3$? Aug 29, 2016 at 14:09
  • Yes, I agree this is a strange model, but I wonder what a better model could be. The BLUE for $\theta$ is obtained by WLS: $\hat{\theta} = (\mathbf{1}^T \Sigma^{-1} \mathbf{1})^{-1} \mathbf{1}^T \Sigma^{-1} \mathbf{y}$. For small $\epsilon$, an expansion up to order $\sqrt{\epsilon}$ gives $\hat{\theta} \simeq y_2 - e^{-1} (y_1-y_2) \sqrt{\epsilon}$.
    – guigux
    Aug 31, 2016 at 9:24
  • So for your values, instead of having $\hat{\theta}=3$, we would obtain $\hat{\theta}\simeq 2.816$.
    – guigux
    Aug 31, 2016 at 9:26
  • The (too) strong assumption is that we know that the Pearson correlation between $y_1$ and $y_2$ is $\rho \simeq e^{-1}$. The conditional law of $y_2 \mid y_1$ is normal with variance $\simeq 1-\rho^2 < 1$.
    – guigux
    Aug 31, 2016 at 9:36
  • To me, that confirms your model is wrong, and you should move on if you are actually interested in understanding real problems that could be described as splitting your observation time up with errors that might be correlated. It's as if you were modelling the temperature of a cup of tea and your first guess accidentally produced a vertical asymptote instead of a smooth decay. Do you study when nuclear fusion might be happening in the cup of tea, or do you correct your model? Sep 2, 2016 at 9:55
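For what it's worth, the numbers exchanged in the comments can be reproduced directly. A sketch assuming the setting of the first bullet, i.e. times $t_1=\epsilon$ and $t_2=1-\epsilon$, with the commenter's values $\epsilon=10^{-10}$, $y_1=50{,}000$, $y_2=3$:

```python
import numpy as np

# Commenter's example: eps = 1e-10, observations y1 = 50000, y2 = 3,
# with times t1 = eps and t2 = 1 - eps (the two-observation setting above).
eps, y = 1e-10, np.array([50_000.0, 3.0])
t = np.array([eps, 1.0 - eps])
r = np.exp(-(t[0] - t[1]) ** 2)                     # corr(Z(t1), Z(t2)) ~ e^{-1}
c = r / np.sqrt(t[0] * t[1])
Sigma = np.array([[1.0 / t[0], c], [c, 1.0 / t[1]]])
s = np.linalg.solve(Sigma, np.ones(2))              # Sigma^{-1} 1
theta_hat = s @ y / s.sum()                         # exact BLUE (WLS) estimate
theta_approx = y[1] - np.exp(-1.0) * (y[0] - y[1]) * np.sqrt(eps)  # expansion
print(theta_hat, theta_approx)    # both close to 2.816
```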
