James Jackson, Social Statistics, Lancaster University, 2019 Cohort
This October marks the centenary of the publication of The Mysterious Affair at Styles, the debut novel of Dame Agatha Mary Clarissa Christie. Born in 1890, Agatha Christie was a prolific writer of detective stories. She wrote 66 full-length novels – and several other books under the pseudonym of Mary Westmacott – prior to her death in 1976. She is the best-selling fiction writer of all time.
In the last hundred years, Agatha Christie has deceived generations of readers with her clever plots, subtle clues and array of red herrings. Her mysteries – as with most whodunnits from the Golden Age of Detective Fiction – generally follow the same format. The characters are introduced: doctors, vicars, aristocrats, secretaries, butlers, maiden aunts (and so on). The murder occurs: perhaps poison is administered (strychnine, arsenic, cyanide and so on), or maybe that classic whodunnit weapon is used: the blunt instrument. This prompts the arrival of the detective, for example Poirot or Miss Marple, who duly investigates. Then the time comes for the suspects to be gathered together for the denouement. The tension builds! At last, the killer is revealed and, after initially protesting their innocence, attempts an ill-fated getaway before being apprehended by the faithful police sergeant!
I wondered whether certain characters – stereotypes – in Agatha Christie’s novels were more or less likely to be murderers than others. As a statistics PhD student, I thought a statistical analysis of her novels could reveal whether there are, indeed, any factors that raise the likelihood of being the murderer. Characters’ genders and occupations were considered. The article can be read in the October issue of Significance magazine. The conclusion, unlike one of Agatha Christie’s solutions, is not really that surprising!
But how is this related to my PhD work? Admittedly, not all that much, yet there is a slight connection! My PhD project, an ESRC CASE studentship with the Office for National Statistics (ONS), based at Lancaster University, is looking at the development of synthetic data methods for administrative databases. Synthetic data are generated by simulating from statistical models fitted to some original data. The synthetic data can, potentially, be as informative as the original data, but the disclosure risk is much lower as the values are not real. This makes them particularly useful for preserving confidentiality, and for situations where legal obligations such as GDPR mean it is not possible to release the original data.
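To make the idea of "simulating from a model fitted to the original data" a little more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the method used in my PhD project: the records, the variable names and the very simple synthesis model (a proportion for each occupation, plus a normal distribution for age within each occupation) are all invented for this example.

```python
import random
import statistics


def fit_synthesis_model(records):
    """Fit a toy synthesis model: P(group) and a normal for value within each group."""
    by_group = {}
    for group, value in records:
        by_group.setdefault(group, []).append(value)
    n = len(records)
    model = {}
    for group, values in by_group.items():
        model[group] = {
            "prob": len(values) / n,
            "mean": statistics.mean(values),
            # stdev needs at least two observations; fall back to 0 otherwise
            "sd": statistics.stdev(values) if len(values) > 1 else 0.0,
        }
    return model


def synthesise(model, n, rng):
    """Simulate n synthetic records from the fitted model."""
    groups = list(model)
    weights = [model[g]["prob"] for g in groups]
    synthetic = []
    for _ in range(n):
        group = rng.choices(groups, weights=weights, k=1)[0]
        value = rng.gauss(model[group]["mean"], model[group]["sd"])
        synthetic.append((group, round(value, 1)))
    return synthetic


# Hypothetical "original" records (occupation, age), invented for illustration.
original = [
    ("doctor", 45.0), ("doctor", 52.0), ("doctor", 48.0),
    ("vicar", 60.0), ("vicar", 66.0),
    ("secretary", 28.0), ("secretary", 31.0), ("secretary", 25.0),
]

rng = random.Random(1920)  # fixed seed so the sketch is reproducible
model = fit_synthesis_model(original)
synthetic = synthesise(model, n=len(original), rng=rng)

for record in synthetic:
    print(record)
```

None of the printed records is a real individual, yet the synthetic data should roughly preserve the occupation proportions and the age distribution within each occupation. Real synthesis models are considerably richer than this, but the fit-then-simulate structure is the same.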
Generalised linear models (GLMs) are typically chosen as synthesis models in synthetic data generation. It was a type of GLM, a logistic regression model, which was chosen to analyse Agatha Christie’s novels. The data used to fit the model reported in the Significance article represented only a fraction of the total data collected. The whole data set can be accessed on GitHub here. There is definitely scope for further research, especially using information relating to each book, such as the murder weapon used and whether the detective was Poirot or Miss Marple.
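For readers who have not met logistic regression before, the sketch below shows the general idea in plain Python: a binary outcome (murderer or not) is modelled through the log-odds as a linear function of character traits. This is a hedged illustration only. The counts are made up, the two predictors (`is_female`, `is_doctor`) are hypothetical stand-ins for the gender and occupation variables, and the simple gradient-ascent fitting routine is an assumption for the sake of a self-contained example, not the software or the model used for the Significance article.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, n_iter=5000):
    """Fit by gradient ascent on the mean log-likelihood.

    Returns [intercept, coefficient_1, ..., coefficient_p].
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = yi - sigmoid(eta)
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def make_rows(is_female, is_doctor, n_characters, n_murderers):
    """Expand a cell of counts into one row per character."""
    X = [(is_female, is_doctor)] * n_characters
    y = [1] * n_murderers + [0] * (n_characters - n_murderers)
    return X, y


# Invented counts, NOT taken from the real data set:
# (is_female, is_doctor, number of characters, number of murderers)
cells = [(0, 0, 10, 1), (1, 0, 10, 1), (0, 1, 5, 2), (1, 1, 5, 1)]

X, y = [], []
for is_female, is_doctor, n_chars, n_murd in cells:
    rows, labels = make_rows(is_female, is_doctor, n_chars, n_murd)
    X.extend(rows)
    y.extend(labels)

beta = fit_logistic(X, y)
odds_ratios = [math.exp(b) for b in beta[1:]]

print("intercept and coefficients (log-odds scale):", beta)
print("odds ratios for is_female, is_doctor:", odds_ratios)
```

An odds ratio above 1 for a trait means that, in the fitted model, characters with that trait have higher odds of being the murderer, holding the other trait fixed. In practice one would use an established statistics package rather than a hand-written fitting loop, and would also look at standard errors before reading anything into the coefficients.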
To conclude, on 23rd October Death on the Nile will be released in cinemas in the UK. The film will see Kenneth Branagh reprise the role of the Belgian sleuth, Hercule Poirot, after his previous appearance in Murder on the Orient Express. This will provide the perfect opportunity for you to exercise those little grey cells and go head-to-head with Poirot!
Read the full article here