Data Collection Methodology

A detailed account of data sources and collection methods

This page documents the complete data collection process for each visualisation in the project, including sources, technical challenges, and workarounds employed.

Academic Performance Gap

GCSE and A-level results by institution type (state-funded vs. independent schools) were obtained from Ofqual's official analytics dashboards, covering academic years 2019–2025.

University Access Gap

University progression data by institution type was accessed through the Department for Education's statistical service. The data shows progression rates to high-tariff institutions.

The Rising Cost of Elite Education

Historical fee data for the Clarendon Schools was compiled from multiple sources. Initial data (1929–2018) came from a Strathclyde University dataset, which I filtered to include only the nine Clarendon Schools.

Technical Challenge: To maintain consistent 10-year intervals, I needed fee data for 2020 rather than 2018, the last year in the dataset. Finding 2020 fees was easier for some schools than for others. I primarily used the Wayback Machine to access archived versions of each school's website. Where no snapshot of a school's website was available, I used archived copies of MyTopSchools.co.uk to find the listed fees.

All historical fees were adjusted for inflation using Bank of England historical multipliers to convert values to 2025 prices.
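The adjustment itself is a single multiplication per fee. The sketch below illustrates the idea, assuming the multipliers are held in a lookup keyed by year. The function name `to_2025_prices` and the multiplier values are hypothetical placeholders, not the Bank of England figures used in the project.

```python
# Hypothetical multipliers: the value in 2025 pounds of one pound from a given year.
# These numbers are placeholders for illustration, NOT Bank of England figures.
MULTIPLIERS_TO_2025 = {
    1980: 4.0,
    2000: 2.0,
    2020: 1.25,
}


def to_2025_prices(nominal_fee, year, multipliers=MULTIPLIERS_TO_2025):
    """Convert a fee quoted in `year` pounds into 2025 pounds."""
    if year not in multipliers:
        # Fail loudly rather than silently leaving a fee unadjusted.
        raise KeyError(f"No multiplier available for {year}")
    return round(nominal_fee * multipliers[year], 2)


if __name__ == "__main__":
    # A fee of 10,000 pounds in 2000, expressed in 2025 pounds
    # under the placeholder multiplier above.
    print(to_2025_prices(10_000, 2000))
```

Raising an error for a missing year, rather than defaulting to a multiplier of 1, keeps unadjusted values from slipping into a chart of real-terms fees.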

Bursary Provision at Clarendon Schools

Bursary statistics required manual collection from individual school sources. I first checked school websites for published statistics; where these were unavailable, I searched annual reports and Charity Commission filings. Annual fee income data came from each school's Charity Commission records.

Individual School Sources:

Geographic Distribution of Independent Schools

Independent school locations across England were obtained from the Department for Education's Get Information About Schools (GIAS) service, which provides comprehensive establishment data including coordinates and institutional characteristics.

Technical Challenge: The GIAS data provided coordinates in British National Grid format (Eastings and Northings) rather than latitude/longitude. I had to convert all coordinates to WGS84 format for web mapping compatibility using Python's pyproj library.
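A conversion of this kind can be sketched with pyproj's `Transformer`. This is a minimal illustration and not necessarily the exact code used for the project. British National Grid is EPSG:27700 and WGS84 latitude/longitude is EPSG:4326. The helper name `bng_to_wgs84` and the sample point are assumptions made for the example.

```python
from pyproj import Transformer

# EPSG:27700 = British National Grid; EPSG:4326 = WGS84 latitude/longitude.
# always_xy=True fixes the axis order: (easting, northing) in, (lon, lat) out.
_BNG_TO_WGS84 = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)


def bng_to_wgs84(easting, northing):
    """Return (latitude, longitude) in degrees for a BNG point given in metres."""
    lon, lat = _BNG_TO_WGS84.transform(easting, northing)
    return lat, lon


if __name__ == "__main__":
    # Illustrative point in central London, not a row from the GIAS data.
    lat, lon = bng_to_wgs84(530000, 180000)
    print(f"{lat:.5f}, {lon:.5f}")
```

Building the transformer once and reusing it avoids repeating the set-up cost for every school. Setting `always_xy=True` guards against the common mistake of swapping latitude and longitude, since EPSG:4326 is formally defined in latitude-first order.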

London's Concentration and Deprivation

To visualise the relationship between independent school clustering and area deprivation in London, I needed deprivation scores at LSOA (Lower Layer Super Output Area) level alongside corresponding geographic boundaries.

Technical Challenge: I struggled to find an LSOA map of London with the correct geographic boundaries. Fortunately, Josh had already compiled this data, which I was able to use for the visualisation.

Geographic Correlation Analysis

To examine the relationship between independent school density and state school performance, I needed GCSE attainment data at London borough level.

Multiple Linear Regression Analysis

To control for socioeconomic confounders in the correlation analysis, I collected borough-level data on household income, educational qualifications, house prices, and deprivation indices from multiple London datasets.

Index of Multiple Deprivation scores were sourced from the same London Datastore dataset used for the deprivation heat map.
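For readers unfamiliar with the technique, the sketch below shows how a multiple linear regression with an intercept can be fitted by ordinary least squares using NumPy. It is illustrative only: the page does not state which library the analysis used, the function name `fit_ols` is hypothetical, and the rows are synthetic rather than borough data.

```python
import numpy as np


def fit_ols(X, y):
    """Return [intercept, coef_1, ..., coef_k] from an ordinary least squares fit."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(X)), X])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs


if __name__ == "__main__":
    # Synthetic rows (NOT project data): each row is
    # [independent school density, deprivation score].
    X = [[1.0, 20.0], [2.0, 15.0], [3.0, 30.0], [4.0, 10.0], [5.0, 25.0]]
    # Outcome constructed exactly as 50 + 2 * density - 0.5 * deprivation,
    # so the fit should recover those three numbers.
    y = [50 + 2 * density - 0.5 * deprivation for density, deprivation in X]
    print(fit_ols(X, y))
```

With the confounders included as extra columns, the coefficient on school density describes its association with attainment while the other variables are held fixed, which is the sense in which the regression controls for them.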
