I know that the weak lower semi-continuity of the KL divergence was proved in [1]. If I remember correctly, the same property holds for any $f$-divergence (under suitable assumptions on the probability space). I am looking for a reference for this.
[1] E. C. Posner, Random Coding Strategies for Minimum Entropy, IEEE Transactions on Information Theory, 1975.
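To be explicit (this is my reading of what weak lower semi-continuity means here): if $P_n\to P$ and $Q_n\to Q$ weakly, then
$$D_f(P\|Q)\le\liminf_{n\to\infty}D_f(P_n\|Q_n)\,.$$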
Edit.
Here is what I believe is a standard definition of $f$-divergences, one which covers the case of measures that are not absolutely continuous with respect to each other. This definition is taken from http://people.lids.mit.edu/yp/homepage/data/LN_fdiv.pdf
Definition 7.1. Let $f:(0,\infty)\to\mathbb R$ be a convex function with $f(1)=0$. Let $P$ and $Q$ be two probability distributions on a measurable space $(\mathcal X, \mathcal F)$. If $P\ll Q$, then the $f$-divergence is defined as $$D_f(P\|Q)=\mathbb E_Q[f(dP/dQ)]\,,$$ where $dP/dQ$ is the Radon–Nikodym derivative and we set $f(0)=f(0+)$. More generally, let $f'(\infty)=\lim_{x\to 0^+}xf(1/x)$. Let $R$ be any measure such that $P\ll R$ and $Q\ll R$ (such an $R$ always exists; for instance, take $R=\frac{1}{2}(P+Q)$). Then we have $$D_f(P\|Q) = \int_{dQ/dR>0}\frac{dQ}{dR}\,f\!\left(\frac{dP/dR}{dQ/dR}\right)dR + f'(\infty)\,P(dQ/dR=0)\,,$$ with the agreement that if $P(dQ/dR=0)=0$ the last term is taken to be zero regardless of the value of $f'(\infty)$ (which may be infinite).
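As a sanity check on this definition (the following computations are mine, not taken from the notes): for $f(x)=x\log x$, which gives the KL divergence, one has $$f'(\infty)=\lim_{x\to 0^+}x\cdot\tfrac1x\log\tfrac1x=\lim_{x\to 0^+}\log\tfrac1x=+\infty\,,$$ so $D_f(P\|Q)=+\infty$ whenever $P(dQ/dR=0)>0$, i.e. whenever $P$ is not absolutely continuous with respect to $Q$; this recovers the usual convention for the KL divergence. By contrast, for $f(x)=\frac12|x-1|$ one gets $f'(\infty)=\lim_{x\to 0^+}\frac12|1-x|=\frac12$, so the part of $P$ singular with respect to $Q$ contributes the finite amount $\frac12 P(dQ/dR=0)$, and $D_f$ is the total variation distance, which is finite for every pair $(P,Q)$.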