At first, the answer might seem obvious: weather monitoring stations all over the country take measurements throughout the day. Surely these tell us what the atmosphere is doing at the moment? Unfortunately, these observations alone do not give us enough information to make a good forecast.
In my previous Snack, I mentioned that weather forecasting models have underlying grids. The gridpoints are spaced a certain distance apart. In the Met Office’s national model, the horizontal spacing is 1.5 km, and there are 70 vertical layers. Roughly speaking, each gridpoint is assigned a single value for temperature, pressure, wind speed, wind direction, and so on. However, as the figure above makes clear, weather stations are far more spread out than this, and we certainly don’t have stacks of weather stations extending into the stratosphere!
Even if we make use of satellite data, there are still far more gridpoints than reliable observations. We somehow need to ‘fill in’ the rest of the data to make the initial state. “No problem,” you might say, “we can just interpolate between the observations that we do have.” Unfortunately, this wouldn’t represent the current state very well. Many weather events, including precipitation and storms, arise from small-scale features in the atmosphere. Simply interpolating between the sparse observations would give an initial state with no small-scale features whatsoever — it would be far too ‘smooth’ — and this would lead to a poor forecast.
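To see how interpolation smooths things out, here is a minimal sketch in Python. Everything in it is made up for illustration: the one-dimensional "atmosphere", the station positions, and the `roughness` measure are assumptions, not anything taken from a real forecasting system.

```python
import numpy as np

# Hypothetical 1-D "atmosphere": 200 gridpoints along a 300 km line.
x_grid = np.linspace(0.0, 300.0, 200)

def true_field(x):
    """Synthetic temperature (°C): a broad trend plus small-scale wiggles."""
    large_scale = 15.0 + 3.0 * np.sin(2 * np.pi * x / 300.0)
    small_scale = 1.0 * np.sin(2 * np.pi * x / 10.0)
    return large_scale + small_scale

truth = true_field(x_grid)

# Only a handful of stations observe the field (positions are made up).
x_stations = np.array([0.0, 60.0, 120.0, 180.0, 240.0, 300.0])
observations = true_field(x_stations)

# Fill in every gridpoint by linear interpolation between the stations.
interpolated = np.interp(x_grid, x_stations, observations)

def roughness(field):
    """Mean squared difference between neighbouring gridpoints."""
    return float(np.mean(np.diff(field) ** 2))

rough_truth = roughness(truth)
rough_interp = roughness(interpolated)
print(f"roughness of truth:        {rough_truth:.4f}")
print(f"roughness of interpolated: {rough_interp:.4f}")
```

The interpolated field comes out far less rough than the synthetic truth, because the 10 km wiggles fall entirely between the stations and are simply lost.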
How else might we generate the initial state to use in a forecast? If we think our short-term forecasts are already quite accurate, we could just use the output of the previous forecast. But if we just feed the output of one forecast straight into the next, the observations are never taken into account at all. The forecasts will just become more and more wrong!
So we’ve established that we can’t use the observations on their own, and we can’t use the previous forecast output on its own. What about a combination of the two? As a first attempt, consider the following: to generate the initial state for a weather forecast, start with the model output from a previous forecast. However, at any gridpoint that we have an observation for, overwrite the data at that gridpoint with the observed data.
At first, this sounds reasonable. But on closer inspection, there are still big problems with this. For example, suppose that the previous forecast predicted that the temperature in Birmingham would be 16°C. Let’s suppose that the three weather stations actually measure 18°C. How should we interpret this?
It’s possible that the temperature over Birmingham is indeed 16°C, apart from three warm patches that just happen to be around the weather stations. But it seems far more likely that the temperature is actually 18°C in the whole area. However, with this first idea, we would have overwritten the temperature at just three gridpoints and the rest of Birmingham would be unchanged. This doesn’t sound very realistic, so let’s try a better approach.
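The weakness of this overwrite-only idea is easy to see in a small Python sketch. The 10 × 10 grid and the station coordinates are hypothetical; only the 16°C forecast and the 18°C observations come from the example above.

```python
import numpy as np

# Hypothetical 10 x 10 patch of gridpoints covering the city.
# The previous forecast predicted 16 °C everywhere.
forecast = np.full((10, 10), 16.0)

# Three made-up station locations (row, column), each observing 18 °C.
station_points = [(2, 3), (5, 5), (7, 1)]
observed_temp = 18.0

# Overwrite the forecast only at gridpoints that have an observation.
initial_state = forecast.copy()
for row, col in station_points:
    initial_state[row, col] = observed_temp

changed = int(np.sum(initial_state != forecast))
print(f"gridpoints changed: {changed} of {initial_state.size}")
print(f"mean temperature:   {initial_state.mean():.2f} °C")
```

Only three of the hundred gridpoints move, so the area average barely shifts from 16°C even though every observation says 18°C.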
What if we could rerun our previous forecast, but somehow ‘nudge’ it so that the forecast matches the intermediate observations a bit more closely? This is essentially how real forecasting systems work, and this process is known as data assimilation. I’ll describe one way that this can be done.
Firstly, we need to represent the previous forecast’s “error” (i.e. any differences between the forecast and the actual observations) as a single number. We’ll call this number the Forecast Score.
We can rerun the previous forecast with small changes to its initial state. This will produce a different forecast (covering the last few hours), and we hope the Forecast Score will improve. We can write this, more formally, as an optimisation problem:
Find the initial state for the previous forecast which produces the best Forecast Score, once the generated forecast is compared against subsequent observations.
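One way to write this down, as a sketch only, is shown here. The symbols are assumptions introduced for illustration: x_0 is the initial state of the previous forecast, M_k(x_0) is the model state it produces at the k-th of K observation times, H converts a model state into the quantities that were actually observed, and y_k is the set of observations at that time. Here J plays the role of the Forecast Score, with a lower value meaning a better match. Operational systems use more elaborate scores than this simple sum of squares.

```latex
J(x_0) = \sum_{k=1}^{K} \bigl\lVert y_k - H\bigl(M_k(x_0)\bigr) \bigr\rVert^2,
\qquad
x_0^{\ast} = \arg\min_{x_0} J(x_0)
```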
If that was too abstract, the following analogy may help: suppose an asteroid has been spotted that might hit the Earth. At the moment, we only have a small number of imperfect observations, so there is a lot of uncertainty in the asteroid’s current position, speed and direction. Nevertheless, we can still use the laws of gravity to make a ‘best guess’ at simulating (forecasting) the asteroid’s future path.
As time passes and more observations come in, we’ll probably find that the observed trajectory deviates from the simulated trajectory. But remember that the asteroid simply follows the laws of gravity! The only way to improve our simulation is to change the asteroid’s initial state — its initial position, speed and direction within our simulation. We shall make small improvements to these, in accordance with the observations. We can then use these improved estimates to make a more accurate forecast of the asteroid’s future path.
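The analogy can be turned into a toy Python sketch. This is a deliberate simplification and not how real orbits or forecasts are computed: the body moves in one dimension under a constant acceleration standing in for gravity, and the "true" state, the noise level, and the first guess are all invented. Because this toy model is linear in its initial state, a single least-squares correction is enough; a real system would need repeated small adjustments.

```python
import numpy as np

GRAVITY = -9.81  # constant acceleration (m/s^2), a stand-in for gravity

def simulate(initial_state, times):
    """Forecast positions at the given times from [position, velocity]."""
    position, velocity = initial_state
    return position + velocity * times + 0.5 * GRAVITY * times**2

def forecast_score(initial_state, times, observed):
    """Sum of squared differences between simulation and observations."""
    return float(np.sum((simulate(initial_state, times) - observed) ** 2))

# Made-up "truth" and noisy observations of it.
rng = np.random.default_rng(0)
times = np.linspace(0.0, 4.0, 9)
true_state = np.array([100.0, 20.0])
observed = simulate(true_state, times) + rng.normal(0.0, 0.5, size=times.size)

# Our imperfect first guess at the initial state.
first_guess = np.array([90.0, 15.0])
score_before = forecast_score(first_guess, times, observed)

# Find the change to [position, velocity] that best explains the mismatch.
mismatch = observed - simulate(first_guess, times)
design = np.column_stack([np.ones_like(times), times])
correction, *_ = np.linalg.lstsq(design, mismatch, rcond=None)

improved_state = first_guess + correction
score_after = forecast_score(improved_state, times, observed)

print("improved initial state:", improved_state)
print("score before:", score_before, "score after:", score_after)
```

The laws of motion in `simulate` never change; only the initial state does, and the improved state can then be simulated forward to later times.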
The method behind this tweaking and nudging of the initial state is beyond the scope of this Snack, but it mostly relies on two mathematical devices: firstly, the Newton-Raphson method for finding roots (zeros) of a function and, secondly, some wizardry known as the adjoint method. The previous forecast is then run again, but with this ‘optimised’ initial state, and the end result is used as the initial state for the current forecast.
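For readers who have not met the first of these devices, here is a minimal Python sketch of the Newton-Raphson method on a simple one-variable function. The function name `newton_raphson` and its parameters are illustrative. Nothing here represents the far larger problem that a forecasting system solves, where the function whose zero is sought would presumably be the slope of the Forecast Score.

```python
def newton_raphson(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of f, given its derivative df and a starting guess x0."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)   # how far the tangent line says we are from a root
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Example: the positive root of x^2 - 2 = 0 is the square root of 2.
root = newton_raphson(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Each pass replaces the current guess with the point where the tangent line crosses zero, which is the same "make a small improvement and try again" pattern described above.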
To generate the initial state for the current forecast, we need to run the previous forecast several times. In fact, most of the computation time is spent on determining the initial state — the state of the atmosphere right now — rather than on forecasting the future. There’s little point forecasting the future if we can’t forecast the present!
Unlike weather forecasts, climate simulation models are not usually initialised with the current state of the planet. We know that we can’t usefully forecast the day-by-day weather years ahead. If we tried, anything beyond a few weeks into the future would be a “representative” weather sequence rather than something we would trust on a daily timescale. Climate simulation models care more about the statistics of these representative weather sequences — their averages and extremes — and how these statistics change when other parameters (especially the atmospheric CO2 concentration) are varied. To generate a “representative” weather sequence, any reasonable initial state will do — it doesn’t have to be the state observed in the real world. Climate models are often initialised in the year 1850 with some very simple initial state. They then generate a representative weather sequence up to the present and on into the future.
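The idea that the statistics matter more than the starting point can be illustrated with a toy chaotic system. The Python sketch below uses the logistic map purely as a stand-in; it is not a climate model, and the parameter value and starting points are arbitrary choices.

```python
def logistic_sequence(x0, steps, r=3.9):
    """Iterate the logistic map x -> r * x * (1 - x), a simple chaotic system."""
    values = []
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        values.append(x)
    return values

# Two runs that differ only in their (arbitrary) initial state.
run_a = logistic_sequence(0.2, 50_000)
run_b = logistic_sequence(0.7, 50_000)

# Step by step, the two "weather sequences" disagree...
largest_gap = max(abs(a - b) for a, b in zip(run_a, run_b))

# ...but their long-run averages are very similar.
mean_a = sum(run_a) / len(run_a)
mean_b = sum(run_b) / len(run_b)

print(f"largest step-by-step gap: {largest_gap:.3f}")
print(f"long-run means: {mean_a:.4f} vs {mean_b:.4f}")
```

The two runs never track each other step by step, yet their averages nearly coincide. Changing `r` shifts those averages, loosely mirroring how a climate model studies the effect of varying a parameter.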
A recurring topic in the press has been the slowdown, or so-called “pause”, in the increase of surface temperatures. One criticism is that climate models have generally not reproduced this. However, this is totally unsurprising if the models are not initialised with the current state! In addition, the IPCC Technical Summary (Stocker et al., 2013) contains the following:
Unlike the CMIP5 historical simulations… some CMIP5 predictions were initialized from the observed climate state during the late 1990s and the early 21st century. There is medium evidence that these initialized predictions show a GMST [Global Mean Surface Temperature] lower by about 0.05°C to 0.1°C compared to the historical (uninitialized) simulations and maintain this lower GMST during the first few years of the simulation…
Overall, there is medium confidence that initialization leads to simulations of GMST during 1998–2012 that are more consistent with the observed trend hiatus than are the uninitialized CMIP5 historical simulations, and that the hiatus is in part a consequence of internal variability that is predictable on the multi-year time scale.
Maybe we should be initialising climate models properly, after all?
Bell, S., D. Cornford and L. Bastin, 2013: The state of automated amateur weather observations. Weather, 68(2), 36–41.
 Stocker, T.F., D. Qin, G.-K. Plattner, L.V. Alexander, S.K. Allen, N.L. Bindoff, F.-M. Bréon, J.A. Church, U. Cubasch, S. Emori, P. Forster, P. Friedlingstein, N. Gillett, J.M. Gregory, D.L. Hartmann, E. Jansen, B. Kirtman, R. Knutti, K. Krishna Kumar, P. Lemke, J. Marotzke, V. Masson-Delmotte, G.A. Meehl, I.I. Mokhov, S. Piao, V. Ramaswamy, D. Randall, M. Rhein, M. Rojas, C. Sabine, D. Shindell, L.D. Talley, D.G. Vaughan and S.-P. Xie, 2013: Technical Summary. In: Climate Change 2013: The Physical Science Basis. Contribution of Working Group I to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Stocker, T.F., D. Qin, G.-K. Plattner, M. Tignor, S.K. Allen, J. Boschung, A. Nauels, Y. Xia, V. Bex and P.M. Midgley (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA.