J.D. Mooney1, E. Holmes2, P. Christie1
1 Scottish Centre for Infection & Environmental Health, Glasgow, United Kingdom 

Seasonal outbreaks of influenza exert a considerable burden on health services, and are notorious for their variability from year to year. Using historical data collected by the Scottish sentinel surveillance scheme since 1972, a candidate predictive model has been derived based on simple linear regression. It was applied with a measure of success in the 1999–2000 winter season.
Introduction
Influenza outbreaks are notoriously difficult to predict, even when a seasonal outbreak is underway, both in their likely time course and in their severity at an individual and a population level (1). Even two successive annual outbreaks caused by an identical strain of the virus can differ greatly in both the timing and the levels of resulting illness in the population.

The major real-time indicator of influenza activity in Scotland comes from the sentinel network of volunteer general practices (2). This spotter scheme currently involves 90 practices in 12 health board areas, covering a total of 10% of the Scottish population. Participating practices submit weekly totals for the approximate number of consultations for 'flu-like illness', from which a consultation rate per 100 000 can be derived, based on population projections from the sample reporting. Although it is essentially a voluntary arrangement in which not all health boards are represented, the flu spotter network has proven to be a consistently reliable early indicator of the onset of seasonal influenza illness since the scheme's inception in 1972.

As well as illustrating the wide between-season variability of influenza outbreaks in both timing and magnitude, cumulative plots of spotter data from past seasons reveal classic sigmoid curves (see Figure 1). It was postulated that the rate of increase at the midpoint of the outbreak, where the rise in reporting for flu-like illness is greatest, may be used to predict the likely total number of cases for that season, as estimated by the cumulative flu spotter totals. From the cumulative plot, the best measurable approximation to the midpoint of a seasonal outbreak is the maximum rate of increase between any two consecutive weeks.

Methods
Data on consultations for influenza-like illness were available from Scottish GP spotter practices for the years 1972 to 1999.
Estimates for the total numbers of cases seen in each week were derived during the flu spotter season (week 40 to week 20 of the following year) by multiplying the overall Scottish rate per 100 000 by 51.2 (population 5.12 million). The differences between the numbers of cases in one week and the next were calculated for each week of the season, and the maximum increase for each year was noted. The dataset was log-transformed and a linear regression model was fitted to the total number of cases versus the maximum increase seen for each season between any two consecutive weeks. A 95% prediction interval was calculated for the expected total number of cases, dependent on the maximum increase. The resulting model was then used to provide weekly estimates of the total number of cases expected by week 20 during the ongoing flu seasons of 1999–2000 and 2000–2001.

Results
A simple linear regression of the total estimated cumulative cases for each season on the maximum increase (d), which corresponds to the sharpest rise in the rate (both log-transformed), gives a significant positive correlation (p < 0.005, R² = 72%). The relationship can be described as follows, with 95% prediction interval* (Figure 2):

$$\log(\text{expected total}) = 7.5134 + 0.4693 \times \log(\text{max. increase})$$

giving

$$\text{expected total} = \exp(7.5134) \times (\text{max. increase})^{0.4693}$$

$$\text{upper / lower PI} = \exp(7.5134 \pm 1.96 \times 0.1998) \times (\text{max. increase})^{0.4693}$$

[*95% PI based on the residual standard deviation about the fitted line.]

Application of the model
The utility of the model was then investigated for the winter flu season of 1999–2000. The sharpest increase in the GP spotter rates occurred between week 52 and week 53 and gave rise to an expected total of 169 057 consultations for the whole season (95% prediction interval 114 277 to 250 096). At the end of the season, the actual estimate based on cumulative figures from week 40 to week 20 was 175 787, a difference of less than 5% from the predicted total.
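The calculation described in the Methods and Results can be sketched in a few lines of Python. This is a minimal illustration, assuming natural logarithms (as implied by the use of exp) and the approximate prediction interval based on the residual standard deviation; the function names and the weekly rates in the example are hypothetical and are not spotter data.

```python
import math

# Coefficients of the fitted log-log regression as quoted in the Results.
INTERCEPT = 7.5134
SLOPE = 0.4693
RESIDUAL_SD = 0.1998   # residual standard deviation about the fitted line
RATE_TO_CASES = 51.2   # population of 5.12 million divided by 100 000


def weekly_cases(rates_per_100k):
    """Convert weekly consultation rates per 100 000 into estimated cases."""
    return [rate * RATE_TO_CASES for rate in rates_per_100k]


def max_weekly_increase(cases):
    """Largest rise in estimated cases between two consecutive weeks."""
    return max(later - earlier for earlier, later in zip(cases, cases[1:]))


def predict_total(max_increase):
    """Expected season total with an approximate 95% prediction interval.

    max_increase must be positive, because the model works on its logarithm.
    Returns (expected, lower, upper).
    """
    log_total = INTERCEPT + SLOPE * math.log(max_increase)
    half_width = 1.96 * RESIDUAL_SD
    return (math.exp(log_total),
            math.exp(log_total - half_width),
            math.exp(log_total + half_width))


# Hypothetical weekly rates per 100 000, for illustration only.
example_rates = [20.0, 35.0, 80.0, 210.0, 390.0, 450.0]
cases = weekly_cases(example_rates)
d = max_weekly_increase(cases)
expected, lower, upper = predict_total(d)
print(f"max increase = {d:.0f} cases")
print(f"expected total = {expected:.0f} (95% PI {lower:.0f} to {upper:.0f})")
```

Because the interval is symmetric on the log scale, the upper and lower limits are the expected total multiplied and divided by the same factor, exp(1.96 × 0.1998), which matches the ratio between the limits and the expected total quoted for 1999–2000.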
Since there is no way of knowing in advance what the maximum change will be, the estimate of total likely consultations was revised weekly throughout the season, based on the extent of change over the previous week (see Annex 3). By week 53, the continuously revised estimate made it possible to expect, with a probability of 84% (based on the standard deviation of the prediction interval), that 1999–2000 would be more severe than the flu season of 1998–99, for which the total estimated number of cases at the end of the season was 137 336.

A revised model incorporating the results of the 1999–2000 season was then applied during the 2000–01 season. The revised expression is:

$$\log(\text{expected total}) = 7.526 + 0.468 \times \log(\text{max. increase})$$

giving

$$\text{expected total} = \exp(7.526) \times (\text{max. increase})^{0.468}$$

The 2000–01 winter saw the lowest flu activity since 1972, with spotter rates that rarely exceeded the baseline threshold level of 50 consultations per 100 000 population (5). Even at this very low level of activity, the final cumulative total for consultations (54 033) was still within the predicted range (predicted total 46 556; 95% PI 7089 to 305 775).

Discussion
Seasonal outbreaks of influenza are difficult to predict for a number of reasons. The continual antigenic changes between seasons, the introduction of new viral strains, the high proportions of subclinical infections and continuing controversy over factors which affect transmission all combine to frustrate attempts to model or define a 'typical' influenza outbreak. However, since even modest influenza outbreaks can exert additional pressures on health services, the benefits for planning and healthcare purposes of a model that is simple to apply and has some capacity to predict the course of an ongoing outbreak are self-evident.
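To illustrate how simply the model can be applied during a season, the weekly revision and the comparison with a previous season described under 'Application of the model' can be sketched as follows. This is an assumption-laden sketch rather than the authors' exact procedure: it treats the log of the season total as normally distributed about the prediction, with the residual standard deviation of 0.1998 as its standard deviation, and the function names are hypothetical.

```python
import math

# Coefficients of the original (pre-revision) model quoted in the Results.
INTERCEPT = 7.5134
SLOPE = 0.4693
RESIDUAL_SD = 0.1998


def running_predictions(weekly_case_estimates):
    """Predicted season total after each week, from the max increase to date.

    Returns one entry per week-to-week change; the entry is None while no
    rise has been observed yet (the logarithm is undefined in that case).
    """
    predictions = []
    max_so_far = None
    for earlier, later in zip(weekly_case_estimates, weekly_case_estimates[1:]):
        increase = later - earlier
        if max_so_far is None or increase > max_so_far:
            max_so_far = increase
        if max_so_far > 0:
            predictions.append(math.exp(INTERCEPT + SLOPE * math.log(max_so_far)))
        else:
            predictions.append(None)
    return predictions


def prob_exceeds(predicted_total, reference_total, sd=RESIDUAL_SD):
    """P(season total > reference), assuming a normal error on the log scale."""
    z = (math.log(predicted_total) - math.log(reference_total)) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Totals quoted in the text: predicted 1999-2000 total and final 1998-99 total.
p = prob_exceeds(169_057, 137_336)
print(f"P(1999-2000 exceeds 1998-99) is approximately {p:.2f}")
```

Under these assumptions the probability comes out at about 0.85, close to the 84% quoted in the text; the small difference may reflect rounding or a slightly different standard deviation in the original calculation.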
The main drawbacks of the above model as a predictive tool are, firstly, the very wide prediction intervals that accompany the estimated eventual size of the outbreak and, secondly, that like all linear regression models it becomes less reliable at the extreme ends of the range of the source data on which it is based (3). Because the scatter of the individual data about the fitted line is more directly relevant to prediction intervals, they are invariably much wider than the equivalent confidence interval for the fitted values (6). In theory it should also be possible to refine the model with each additional season, although the nature of prediction intervals means that any resulting reduction in their width is likely to be small.

The increasing availability of rapid virological testing also makes it possible to identify quickly the underlying virus types that are contributing to an increase in illness presentation (e.g. A alone, B alone or A + B). The well-established differences in severity and population health impact between A and B strains (7) may mean that introducing interaction terms to the regression according to the epidemic type, as suggested by Dab et al., could improve the predictive capability of the model (8). A model that took account of virus type might also begin to address the likely time course of an ongoing outbreak, which is often as important a consideration for health service planning as the overall population attack rate.

Although the limitations of the model prevent its adoption as a definitive predictive tool, its usefulness lies more in its capacity to provide a dynamic, weekly revisable estimate of the likely severity of an ongoing flu outbreak. While the current model does not specifically address the timing of any peak, large increases in consulting rates are likely to be followed by higher workloads in secondary health services.
Additionally, although consulting patterns are by no means the only indicator of influenza activity, they are certainly the timeliest, and sentinel practice networks like that in Scotland are used widely throughout Europe (9). Variations of the presented model may therefore also be of interest to other countries that have a significant historical dataset.

Conclusion
Tillett and Spencer have previously highlighted the potential of cumulative totals of GP consultations, among other indicators, for describing the extent of influenza outbreaks in England and Wales (4). The model presented here demonstrates that it is possible to describe the relationship between cumulative total numbers of consultations and the maximum weekly increase for seasonal outbreaks of influenza using simple linear regression, allowing predictions for the eventual size of an outbreak to be revised as the winter season progresses. The wide-ranging prediction interval seen during the exceptionally mild influenza season of 2000–01, although in keeping with the diminishing applicability of regression models at the extremes of their range, is probably not a serious practical limitation, in that the main use of the model would be to flag up potentially large epidemics as early as possible. The increased availability of rapid virological testing may make it possible to further refine models such as that presented here, on the basis of the type(s) of influenza in circulation in any one season.

Annex 1. Winter season
*99/00 and 00/01 seasons (cumulative consultations from spotter data) also shown.

Annex 2. Data and table: Regression line and constituent values used for model
Annex 3. Season 1999 / 2000 – Predicted total consultations based on weekly changes
*Predicted total based on the maximum increase to date.

References
1. Cliff AD. Statistical modelling of measles and influenza outbreaks. Statistical Methods in Medical Research 1993; 2: 43–73.
2. Christie P, Mooney J. Surveillance Report on Flu Spotters data 1999–2000. SCIEH Weekly Report 2000; 34(36): 218–219.
3. Kirkwood BR. Correlation and Linear Regression. Chapter 9 in Essentials of Medical Statistics. Blackwell Science Ltd. Oxford 1998; p57–64.
4. Tillett HE, Spencer IL. Influenza surveillance in England and Wales using routine statistics. Development of 'cusum' graphs to compare 12 previous winters and to monitor the 1980/81 winter. J Hygiene 1982 Feb; 88(1): 83–94.
5. Christie P, Mooney J, Smith A. Surveillance Report on Flu Spotters data and SERVIS scheme 2000–2001. SCIEH Weekly Report 2001; 35(24): 154.
6. Altman D. Relationship between two continuous variables. Chapter 11 in Practical Statistics for Medical Research. Chapman & Hall. London 1995; p277234.
7. Monto AS. Individual and community impact of influenza. Pharmacoeconomics 1999; 16 Suppl 1: 1–6.
8. Dab W, Quenel P, Cohen JM, Hannon C. A new influenza surveillance system in France: the Ile-de-France "GROG". 2. Validity of indicators (1984–89). Eur J Epidemiol 1991; 7(6): 579–87.
9. Zambon M. Sentinel surveillance of influenza in Europe, 1997/1998. Eurosurveillance 1998; 3: 29–31.