A large-scale multiple surveillance system for infectious disease outbreaks has been in operation in England and Wales since the early 1990s. Changes to the statistical algorithm at the heart of the system were proposed and the purpose of this paper is to compare two new algorithms with the original algorithm. Test data to evaluate performance are created from weekly counts of the number of cases of each of more than 2000 diseases over a twenty-year period. The time series of each disease is separated into one series giving the baseline (background) disease incidence and a second series giving disease outbreaks. One series is shifted forward by twelve months and the two are then recombined, giving a realistic series in which it is known where outbreaks have been added. The metrics used to evaluate performance include a scoring rule that appropriately balances sensitivity against specificity and is sensitive to variation in probabilities near 1. In the context of disease surveillance, a scoring rule can be adapted to reflect the size of outbreaks and this was done. Results indicate that the two new algorithms are comparable to each other and better than the algorithm they were designed to replace.