Background. Studies of cancer incidence and early management will increasingly draw on routine electronic patient records. However, these data may be incomplete or inaccurate. We developed a generalisable strategy for investigating presenting symptoms and diagnostic delays, using ovarian cancer as an example.

Methods. The General Practice Research Database was used to investigate the time between the first report of a symptom and diagnosis in 344 women diagnosed with ovarian cancer between 01/06/2002 and 31/05/2008. The effects of possible inaccuracies in dating the diagnosis on the frequency and timing of the most commonly reported symptoms were investigated using four increasingly inclusive definitions of first diagnosis or suspicion: (1) "Definite diagnosis", (2) "Ambiguous diagnosis", (3) "First treatment or complication suggesting pre-existing diagnosis" and (4) "First relevant test or referral".

Results. The most commonly coded symptoms before a definite diagnosis of ovarian cancer were abdominal pain (41%), urogenital problems (25%), abdominal distension (24%) and constipation/change in bowel habits (23%); 70% of cases reported at least one of these. The median times between the first report of each of these symptoms and diagnosis were 13, 21, 9.5 and 8.5 weeks, respectively. Before their definite diagnosis, 19% of women had a code meeting definition 2 or 3, and 73% had a code meeting definition 4. However, the proportions with symptoms and the delays were similar under all definitions except definition 4, for which the median delays were 8, 8, 3, 10 and 0 weeks, respectively.

Conclusion. Symptoms recorded in the General Practice Research Database are similar to those reported in the literature, although their frequency is lower than in studies based on self-report. Generalisable strategies for exploring the impact of recording practice on the date of diagnosis in electronic patient records are recommended, and studies that date diagnoses in GP records need to present sensitivity analyses based on investigation, referral and diagnosis data.
Free-text information may be essential for obtaining accurate estimates of incidence and for dating diagnoses accurately.
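The sensitivity analysis described in the Methods amounts to re-dating each woman's diagnosis under nested definitions and then measuring symptom-to-diagnosis delays. A minimal Python sketch of that calculation follows. It is not the study's code: the `Event` layout, the `level` field (the narrowest definition that admits a code), the helpers `index_date` and `symptom_delays`, and the toy records are all hypothetical. It assumes definition k accepts any code of levels 1 to k, counts only symptom reports dated before the index date, and applies no lookback window.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


# Hypothetical record layout: one row per coded event in a patient's record.
# For diagnosis-related codes, `level` is the narrowest definition that counts it:
#   1 = definite diagnosis, 2 = ambiguous diagnosis,
#   3 = treatment/complication suggesting a pre-existing diagnosis,
#   4 = relevant test or referral.
# Symptom codes have level 0.
@dataclass
class Event:
    patient_id: str
    event_date: date
    label: str
    level: int = 0


def index_date(events, definition):
    """Earliest diagnosis-related event counted under `definition` (1..4).

    Definitions are nested (assumption): definition k accepts levels 1..k.
    """
    dates = [e.event_date for e in events if 1 <= e.level <= definition]
    return min(dates) if dates else None


def symptom_delays(records, symptom, definition):
    """Return (proportion with symptom, median delay in weeks) across patients.

    `records` maps patient_id -> list of Event. Only reports of `symptom`
    dated strictly before the index date count; the delay runs from the
    first such report to the index date.
    """
    delays = []
    n_patients = 0
    for events in records.values():
        idx = index_date(events, definition)
        if idx is None:
            continue  # no qualifying diagnosis code under this definition
        n_patients += 1
        reports = [e.event_date for e in events
                   if e.level == 0 and e.label == symptom and e.event_date < idx]
        if reports:
            delays.append((idx - min(reports)).days / 7)
    if n_patients == 0:
        return 0.0, None
    return len(delays) / n_patients, (median(delays) if delays else None)


# Invented toy records for illustration only (not study data).
records = {
    "p1": [
        Event("p1", date(2005, 1, 3), "abdominal pain"),
        Event("p1", date(2005, 2, 28), "pelvic ultrasound", level=4),
        Event("p1", date(2005, 4, 4), "ovarian cancer", level=1),
    ],
    "p2": [
        Event("p2", date(2006, 3, 6), "abdominal distension"),
        Event("p2", date(2006, 5, 1), "ovarian cancer", level=1),
    ],
}

for d in (1, 2, 3, 4):
    print(d, symptom_delays(records, "abdominal pain", d))
```

In this toy example the delay for abdominal pain is 13 weeks under definitions 1 to 3 but shrinks to 8 weeks under definition 4, because the earlier referral code moves the index date forward. This mirrors, on invented data, the kind of shift the Results report for definition 4.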