Assume one can perform measurements of an unknown quantity $\theta$ as $$y = \theta + \epsilon(t),$$ where $\epsilon(t) \sim \mathcal{N}(0,1/t)$ is the measurement error when a time $t$ is spent collecting the observation.
Now, assume that we can actually repeat this several times, but that the observations might be correlated. The goal is to optimally split $T=1$ hour of computation time between observations.
The thing I have in mind is that $\epsilon(t)$ should in fact be deterministic, but we try to model it by a (non-stationary) Gaussian process: $$\epsilon(t) = \frac{1}{\sqrt{t}} Z(t),$$ where $Z(t)$ is a GP with unit variance. For simplicity, let us assume $$\operatorname{cov}\big(Z(t_1),Z(t_2)\big) = e^{-(t_1-t_2)^2}.$$
If two observations are allowed, with a time $t$ spent on the first observation and $1-t$ on the second one, then we have $$ \left[\begin{array}{c}y_1\\y_2\end{array}\right]\sim \mathcal{N}\left( \mathbf{1}\theta,\, \Sigma(t) \right), \ \mathrm{where}\ \Sigma(t)= \left[\begin{array}{cc}\frac{1}{t} & \frac{1}{\sqrt{t(1-t)}} e^{-(1-2t)^2}\\ \frac{1}{\sqrt{t(1-t)}} e^{-(1-2t)^2} & \frac{1}{1-t}\end{array}\right] $$ and $\mathbf{1}$ is the vector of all ones. The variance of the BLUE for $\theta$ is $\big(\mathbf{1}^T \Sigma(t)^{-1} \mathbf{1}\big)^{-1}$, so we want to select $t\in(0,1)$ that maximizes the quantity $\rho(t):= \mathbf{1}^T \Sigma(t)^{-1} \mathbf{1}$. Curiously, the supremum is approached as $t\to 0^+$ (or as $t \to 1^-$), with $\rho(t)\to 1/(1-e^{-2})\simeq 1.156$. This indicates a discontinuity at $t=0$ and $t=1$, where there is a single observation of unit variance, so we would naturally define $\rho(0)=\rho(1)=1$.
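For what it's worth, here is a quick numerical check (a minimal NumPy sketch; the function name `rho` and the grid are my own choices) that reproduces this boundary behaviour:

```python
import numpy as np

def rho(t):
    """BLUE precision 1^T Sigma(t)^{-1} 1 when times t and 1-t are allocated."""
    c = np.exp(-(1.0 - 2.0 * t) ** 2) / np.sqrt(t * (1.0 - t))  # cov(y1, y2)
    Sigma = np.array([[1.0 / t, c], [c, 1.0 / (1.0 - t)]])
    ones = np.ones(2)
    return ones @ np.linalg.solve(Sigma, ones)

# rho(t) = rho(1 - t), and Sigma(1/2) is singular, so it suffices to scan (0, 1/2).
ts = np.linspace(1e-6, 0.5 - 1e-6, 100_000)
vals = np.array([rho(t) for t in ts])
print(vals.max(), ts[vals.argmax()])  # ~1.156, attained at the smallest t in the grid
print(1.0 / (1.0 - np.exp(-2.0)))     # 1.1565..., the claimed limit as t -> 0+
```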
Even worse: assume we can observe $y_1$ during a time $t=\epsilon$, and $y_2$ during a time $t=2\epsilon$ (for a small $\epsilon>0$). Then both $y_1$ and $y_2$ have a huge variance, but they are extremely correlated (their Pearson correlation coefficient is $e^{-\epsilon^2}$). The variance of the BLUE is equal to $$ \frac{\sqrt{2}\,(1-e^{-2\epsilon^2})}{\epsilon\,(3\sqrt{2}-4e^{-\epsilon^2})},$$ which tends to $0$ when $\epsilon\to 0$, so $\theta$ can be recovered with arbitrarily good precision, and moreover with a vanishing computational effort! For reference, in this situation the BLUE for $\theta$ is $$\hat{\theta} = \frac{e^{-\epsilon^2}\sqrt{2}-1}{2e^{-\epsilon^2}\sqrt{2}-3}\, y_1 + \frac{e^{-\epsilon^2}\sqrt{2}-2}{2e^{-\epsilon^2}\sqrt{2}-3}\, y_2 \simeq -2.4142\, y_1 + 3.4142\, y_2.$$
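A similar sanity check (again a minimal NumPy sketch, with my own helper `blue`) shows the BLUE weights converging to $(-2.4142,\, 3.4142)$ and the variance vanishing roughly linearly in $\epsilon$:

```python
import numpy as np

def blue(eps):
    """BLUE weights and variance when times eps and 2*eps are spent on y1 and y2."""
    c = np.exp(-eps ** 2) / (eps * np.sqrt(2.0))   # cov(y1, y2) = e^{-eps^2} / sqrt(eps * 2*eps)
    Sigma = np.array([[1.0 / eps, c], [c, 1.0 / (2.0 * eps)]])
    s = np.linalg.solve(Sigma, np.ones(2))         # Sigma^{-1} 1
    return s / s.sum(), 1.0 / s.sum()              # BLUE weights, BLUE variance

for eps in [1e-1, 1e-2, 1e-3, 1e-4]:
    w, var = blue(eps)
    print(f"eps={eps:.0e}  weights={w}  variance={var:.4e}")
# weights -> (-2.4142, 3.4142), and the variance shrinks like 2*eps/(3 - 2*sqrt(2)) ~ 11.66*eps
```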
So, based on these observations:
Do you have an intuitive explanation for this phenomenon? How can two experiments with weights $t_1=\epsilon$ and $t_2=1-\epsilon$ be much better than a single experiment with weight $t=1$? My guess is that assuming the covariance kernel of $Z(t)$ is known is an extremely strong assumption, but I still find it counter-intuitive that information about the correlation between two observations can yield arbitrarily good precision, even though both observations have a huge variance!
Can you think of a better model, with more realistic assumptions, that allows us to optimally split one unit of computation time?