A detailed account of data sources and collection methods
This page documents the complete data collection process for each visualisation in the project, including sources, technical challenges, and workarounds employed.
GCSE and A-level results by institution type (state-funded vs. independent schools) were obtained from Ofqual's official analytics dashboards, covering academic years 2019–2025.
University progression data by institution type was accessed through the Department for Education's statistical service, showing progression rates to high-tariff institutions.
Historical fee data for the Clarendon Schools was compiled from multiple sources. Initial data (1929–2018) came from a University of Strathclyde dataset, which I filtered to include only the nine Clarendon Schools.
The dataset covers the following years: 1929, 1930, 1940, 1950, 1960, 1970, 1980, 1990, 2000, 2010 and 2018.
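The filtering step can be sketched roughly as follows. This is an illustration rather than the script actually used: the column names (`school`, `year`, `annual_fee_gbp`), the way school names are spelled in the dataset, and the sample rows are all assumptions, and the fee values are placeholders rather than real figures.

```python
import csv
import io

# The nine schools examined by the Clarendon Commission.
# Assumption: the dataset spells the names exactly like this.
CLARENDON_SCHOOLS = frozenset({
    "Charterhouse", "Eton", "Harrow", "Merchant Taylors'", "Rugby",
    "Shrewsbury", "St Paul's", "Westminster", "Winchester",
})


def filter_clarendon(rows):
    """Keep only rows whose 'school' value names a Clarendon School."""
    return [row for row in rows if row["school"].strip() in CLARENDON_SCHOOLS]


# Hypothetical sample in the assumed layout; fees are placeholders, not real data.
SAMPLE_CSV = """school,year,annual_fee_gbp
Eton,1929,100
Example Grammar,1929,50
Harrow,1930,110
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
clarendon_rows = filter_clarendon(rows)
print([row["school"] for row in clarendon_rows])
```

In practice the name matching would probably need to tolerate variants such as "Eton College", which this sketch does not attempt.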
All historical fees were adjusted for inflation using Bank of England historical multipliers to convert values to 2025 prices.
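The adjustment itself is a single multiplication per row: nominal fee times the multiplier for that year. A minimal sketch, assuming one multiplier per dataset year; the function name `to_2025_prices`, the field names, and every number below are hypothetical placeholders, not Bank of England figures or real fees.

```python
def to_2025_prices(nominal_fee, multiplier):
    """Convert a nominal fee to 2025 prices using that year's multiplier."""
    return round(nominal_fee * multiplier, 2)


# Placeholder multipliers for illustration only (NOT real Bank of England values).
MULTIPLIERS = {1929: 50.0, 2018: 1.25}

# Placeholder fee records in an assumed layout.
fees = [
    {"school": "School A", "year": 1929, "annual_fee_gbp": 200.0},
    {"school": "School A", "year": 2018, "annual_fee_gbp": 30000.0},
]

# Add a 2025-price column, looking up the multiplier by year.
adjusted = [
    {**fee, "fee_2025_gbp": to_2025_prices(fee["annual_fee_gbp"], MULTIPLIERS[fee["year"]])}
    for fee in fees
]

for row in adjusted:
    print(row["year"], row["annual_fee_gbp"], "->", row["fee_2025_gbp"])
```

Looking the multiplier up by year with a plain dictionary means a missing year raises a `KeyError` instead of silently leaving a fee unadjusted.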
Bursary statistics required manual collection from individual school sources. I first checked each school's website for published statistics; where none were available, I searched annual reports and Charity Commission filings. Annual fee income data came from each school's Charity Commission records.
Individual School Sources:
Independent school locations across England were obtained from the Department for Education's Get Information About Schools (GIAS) service, which provides comprehensive establishment data including coordinates and institutional characteristics.
To visualise the relationship between independent school clustering and area deprivation in London, I needed deprivation scores at LSOA (Lower Layer Super Output Area) level alongside corresponding geographic boundaries.
To examine the relationship between independent school density and state school performance, I needed GCSE attainment data at London borough level.
To control for socioeconomic confounders in the correlation analysis, I collected borough-level data on household income, educational qualifications, house prices, and deprivation indices from multiple London datasets.
Index of Multiple Deprivation scores were sourced from the same London Datastore dataset used for the deprivation heat map.
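One common way to control for a confounder in a correlation analysis is a partial correlation, sketched below for a single control variable. This is an assumption about the general technique, not a description of the exact analysis performed, and the four data points are synthetic values constructed so that the effect is visible; they are not borough data.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic illustration: both x and y are driven by z plus unrelated noise.
school_density = [1, 1, 3, 7]      # x: hypothetical independent schools per area
gcse_attainment = [46, 44, 52, 50]  # y: hypothetical attainment score
median_income = [27, 29, 31, 33]    # z: hypothetical income, the control variable

raw = pearson(school_density, gcse_attainment)
controlled = partial_correlation(school_density, gcse_attainment, median_income)
print(f"raw r = {raw:.3f}, partial r = {controlled:.3f}")
```

In this constructed example the raw correlation is about 0.65, while the partial correlation is effectively zero, because the apparent relationship runs entirely through the control variable. With several controls at once (income, qualifications, house prices, deprivation), the usual extension is to regress both variables on all controls and correlate the residuals, which this sketch does not cover.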