Assessing the impact of a health intervention via user-generated Internet content

Vasileios Lampos*, Elad Yom-Tov, Richard Pebody, Ingemar J. Cox

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

18 Citations (Scopus)


Assessing the effect of a health-oriented intervention by traditional epidemiological methods is commonly based only on population segments that use healthcare services. Here we introduce a complementary framework for evaluating the impact of a targeted intervention, such as a vaccination campaign against an infectious disease, through a statistical analysis of user-generated content submitted on web platforms. Using supervised learning, we derive a nonlinear regression model for estimating the prevalence of a health event in a population from Internet data. This model is applied to identify control location groups that correlate historically with the areas where a specific intervention campaign has taken place. We then determine the impact of the intervention by inferring a projection of the disease rates that could have emerged in the absence of a campaign. Our case study focuses on the influenza vaccination program that was launched in England during the 2013/14 season, and our observations consist of millions of geo-located search queries to the Bing search engine and posts on Twitter. The impact estimates derived from the application of the proposed statistical framework support conventional assessments of the campaign.
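The control-selection and projection steps summarised above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that weekly disease-rate estimates per area are already available (the abstract's nonlinear regression step from Internet data is not reproduced here), and it swaps in an ordinary least-squares mapping from control areas to the target area. The function names `select_controls` and `estimate_impact`, and the relative impact measure `theta`, are hypothetical choices made for illustration.

```python
import numpy as np

def select_controls(target_pre, candidates_pre, top_k=1):
    """Rank candidate control areas (columns of candidates_pre) by their
    Pearson correlation with the target area over the pre-intervention
    period, and return the indices and correlations of the top_k areas."""
    corrs = np.array(
        [np.corrcoef(target_pre, col)[0, 1] for col in candidates_pre.T]
    )
    order = np.argsort(-corrs)[:top_k]
    return order, corrs[order]

def estimate_impact(target_pre, controls_pre, target_post, controls_post):
    """Fit target ~ controls @ w + b on the pre-intervention period
    (assumption: a linear least-squares mapping), project the target
    rates for the intervention period, and compare with observed rates."""
    X_pre = np.column_stack([controls_pre, np.ones(len(controls_pre))])
    w, *_ = np.linalg.lstsq(X_pre, target_pre, rcond=None)
    X_post = np.column_stack([controls_post, np.ones(len(controls_post))])
    projected = X_post @ w                 # rates expected without a campaign
    delta = target_post.mean() - projected.mean()   # absolute difference
    theta = delta / projected.mean()                # relative difference
    return projected, delta, theta

# Synthetic, purely illustrative data (not from the study).
weeks = np.linspace(0.0, 3.0, 30)
candidates_pre = np.column_stack([
    np.sin(weeks) + 1.5,              # area 0: tracks the target closely
    np.cos(np.linspace(0.0, 40.0, 30)),  # area 1: unrelated pattern
])
target_pre = 2.0 * candidates_pre[:, 0] + 1.0

idx, corrs = select_controls(target_pre, candidates_pre, top_k=1)

controls_post = (np.sin(np.linspace(3.0, 4.0, 10)) + 1.5).reshape(-1, 1)
# Simulate a 20% reduction in the target area during the campaign.
target_post = 0.8 * (2.0 * controls_post[:, 0] + 1.0)

projected, delta, theta = estimate_impact(
    target_pre, candidates_pre[:, idx], target_post, controls_post
)
print(f"selected control: {idx[0]}, relative impact: {theta:.2f}")
```

A negative `theta` indicates that observed rates in the target area were lower than the rates projected from the control areas, which is the direction one would expect if a campaign reduced disease prevalence.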

Original language: English
Pages (from-to): 1434-1457
Number of pages: 24
Journal: Data Mining and Knowledge Discovery
Issue number: 5
Publication status: Published - 22 Sep 2015

Bibliographical note

Funding Information:
This work has been supported by the EPSRC Grant EP/K031953/1 (“Early-Warning Sensing Systems for Infectious Diseases”). The authors would also like to acknowledge the Royal College of General Practitioners in the UK (in particular Simon de Lusignan) and Public Health England for providing ILI surveillance data.

Publisher Copyright:
© 2015, The Author(s).


  • Gaussian Process
  • Infectious diseases
  • Intervention
  • Search query logs
  • Social media
  • Supervised learning
  • User-generated content

