Does Private Education Create Advantage or Reflect It?

A data-driven investigation into education and inequality in the UK

Aims

This project investigates whether independent schools in England create educational advantages or simply reflect existing socioeconomic privilege. Using data from over 1,600 independent schools, I examine geographic clustering patterns, historical cost trends, academic performance gaps, and statistical correlations to explore whether the "private school advantage" represents genuine value-added education or geographic sorting of already-privileged families.

The Advantage is Real

Academic Performance Gap

Independent schools consistently outperform state-funded schools at both GCSE and A-level, with gaps of approximately 20–30 percentage points across top grade thresholds. This advantage persists across all years examined.

*Note: Results for 2020 and 2021 reflect centre/teacher-assessed grades during the COVID-19 pandemic when examinations were cancelled.

University Access Gap

Independent school students progress to high-tariff institutions (Russell Group and similar universities) at nearly three times the rate of state school students.

These advantages are undeniable. But access to them is increasingly restricted...

The Cost Barrier

Historical analysis of the nine Clarendon Schools reveals that fees have increased dramatically over 95 years, even after adjusting for inflation.

*Note: Boarding fees exclude Merchant Taylors' (which no longer offers boarding). Day fees exclude Eton College, Harrow, and Winchester College (traditionally boarding-focused schools with limited or no day-fee data).

Furthermore, bursary provision varies dramatically. Some elite schools dedicate substantial resources to financial aid, whilst others offer minimal support.

These schools concentrate not just academically, but geographically...

Geographic Clustering and Inequality

Geographic Distribution

England's 1,613 independent schools are unevenly distributed across the country, with significant clustering in London.

This geographic pattern raises a critical question: are independent schools clustered in high-performing areas because they improve them, or because privilege was already concentrated there?

Testing whether private schools create or reflect advantage

Establishing the Correlation

Initial analysis revealed a moderate positive correlation (r = 0.380, p = 0.032) between independent school density and state school GCSE performance across London boroughs. However, this correlation is consistent with two competing explanations:

Hypothesis A (Creation): Independent schools generate local benefits through competition, spillover effects, or raised standards.

Hypothesis B (Reflection): Independent schools locate in already-privileged areas where affluent, educated families—and consequently strong state schools—are already concentrated.
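
As a minimal sketch of the correlation test described above, the snippet below computes Pearson's r and its two-sided p-value with scipy.stats.pearsonr. The data is synthetic and the variable names (school_density, gcse_score), the sample size, and the units are assumptions for illustration only; none of it is the project's actual borough data.

```python
import numpy as np
from scipy import stats

# Synthetic stand-in data, one value per borough (illustrative, NOT the project's data).
rng = np.random.default_rng(0)
n_boroughs = 32  # hypothetical sample size, not stated in the text

# Hypothetical predictor: independent school density per borough.
school_density = rng.uniform(0, 10, n_boroughs)
# Hypothetical outcome: a state school GCSE measure loosely related to density.
gcse_score = 45 + 0.5 * school_density + rng.normal(0, 4, n_boroughs)

# Pearson's r and the two-sided p-value for the null hypothesis of no linear association.
r, p_value = stats.pearsonr(school_density, gcse_score)
print(f"r = {r:.3f}, p = {p_value:.3f}")
```

A significant r on its own cannot separate Hypothesis A from Hypothesis B, which is why the next step adds socioeconomic controls.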

Testing Through Multiple Regression

To distinguish between these explanations, I controlled for four socioeconomic factors: median household income, adults with degree qualifications, average house prices, and the Index of Multiple Deprivation. Does independent school density still predict state school performance after accounting for existing privilege structures?
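
The comparison between a simple and a controlled model can be sketched as follows. This is an illustration on synthetic data rather than the notebook's code: the confounder (privilege), the column names, and every coefficient are assumptions, and a single control stands in for the four used in the project. In this toy setup density has no direct effect on results, so any coefficient it receives comes from confounding.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
n = 200  # synthetic sample, larger than a real borough count so the pattern is stable

# Hypothetical confounder: area privilege drives both density and state school results.
privilege = rng.normal(0, 1, n)
df = pd.DataFrame({
    "school_density": 0.8 * privilege + rng.normal(0, 0.6, n),
    "median_income": privilege + rng.normal(0, 0.3, n),      # noisy proxy for privilege
    "gcse_score": 2.0 * privilege + rng.normal(0, 1.0, n),   # no direct density effect
})
y = df["gcse_score"]

# Simple model: density only.
X_simple = df[["school_density"]]
simple = LinearRegression().fit(X_simple, y)

# Controlled model: density plus the socioeconomic control.
X_controlled = df[["school_density", "median_income"]]
controlled = LinearRegression().fit(X_controlled, y)

simple_coef, controlled_coef = simple.coef_[0], controlled.coef_[0]
simple_r2 = simple.score(X_simple, y)
controlled_r2 = controlled.score(X_controlled, y)

print(f"Simple:     coef = {simple_coef:.3f}, R^2 = {simple_r2:.3f}")
print(f"Controlled: coef = {controlled_coef:.3f}, R^2 = {controlled_r2:.3f}")
```

The density coefficient shrinks once the control enters but does not reach zero, because the control is only a noisy proxy for the underlying confounder. The same caveat applies to any real set of socioeconomic controls.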

The Results

The multiple regression revealed a dramatic shift:

  • Simple model: Coefficient = 0.309 (R² = 0.145)
  • Controlled model: Coefficient = 0.111 (R² = 0.504)
  • Change: 64% reduction in effect size

The model's explanatory power more than tripled (R² rose from 0.145 to 0.504), with socioeconomic factors, particularly deprivation (coefficient: -0.866) and parental education (coefficient: 0.103), emerging as far stronger predictors than independent school density.
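
The headline figures can be checked with a few lines of arithmetic. The four input values are the ones reported above; the variable names are only for illustration.

```python
# Values reported in the results above.
simple_coef, controlled_coef = 0.309, 0.111
simple_r2, controlled_r2 = 0.145, 0.504

# Relative fall in the independent school density coefficient.
reduction = (simple_coef - controlled_coef) / simple_coef
# How many times larger the controlled model's R² is.
r2_ratio = controlled_r2 / simple_r2

print(f"Coefficient reduction: {reduction:.1%}")  # 64.1%
print(f"R² ratio: {r2_ratio:.2f}x")               # 3.48x
```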

Interpretation

These results support Hypothesis B: independent schools predominantly reflect rather than create geographic advantage. Once we account for household income, parental education, house prices, and area deprivation, independent school density contributes minimally to state school outcomes. Most of the apparent association in the simple model was confounded: both variables are largely products of underlying privilege structures rather than one causing the other.

Conclusions

Private education both creates and reflects advantage. Independent school pupils achieve superior outcomes (20–30 percentage point gaps at top grades, and nearly triple the progression rate to high-tariff universities). At the area level, however, controlling for socioeconomic factors reduced the independent school coefficient by 64% whilst explanatory power more than tripled (R² rising from 0.145 to 0.504). Private schools cluster in already-advantaged areas, and area deprivation and parental education predict state school outcomes far more strongly than independent school density. Future research could examine earnings outcomes, control for prior attainment, or investigate finer spatial scales.

Data

Data was sourced from UK government datasets and historical archives. Independent school locations came from DfE's GIAS service. Academic performance data came from Ofqual and DfE statistics (2019–2025). University progression and socioeconomic variables were accessed through DfE and ONS sources. Historical fee data was manually compiled using Wayback Machine web archives when school websites were unavailable.

Key technical matters included converting coordinates from British National Grid to WGS84 latitude/longitude using pyproj, and locating disparate historical sources. Initial ONS API automation was explored, though direct CSV downloads proved more reliable.
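
A minimal sketch of the coordinate conversion with pyproj is shown below. It assumes the source coordinates are eastings and northings in EPSG:27700 (British National Grid); the helper name bng_to_wgs84 and the sample point are illustrative and not taken from the project's code or data.

```python
from pyproj import Transformer

# EPSG:27700 = British National Grid (easting/northing in metres); EPSG:4326 = WGS84.
# always_xy=True fixes the axis order to (easting, northing) -> (longitude, latitude).
transformer = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)

def bng_to_wgs84(easting, northing):
    """Return (latitude, longitude) in degrees for a British National Grid point."""
    lon, lat = transformer.transform(easting, northing)
    return lat, lon

# Illustrative point in central London (not from the project's dataset).
lat, lon = bng_to_wgs84(530000, 180000)
print(f"lat = {lat:.4f}, lon = {lon:.4f}")
```

Building the Transformer once and reusing it avoids repeating the set-up cost for each of the 1,600+ school locations.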

View detailed data collection methodology and sources →

Tools & Methodology

Data cleaning and analysis were performed using Python (pandas, scipy.stats, sklearn) within Google Colab environments. Geographic coordinates required transformation from British National Grid to WGS84 for web mapping compatibility. Historical fees underwent inflation adjustment to 2025 prices using Bank of England multipliers. Statistical analysis employed Pearson's correlation and multiple linear regression through scipy and sklearn.
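
The inflation adjustment amounts to multiplying each nominal fee by a year-specific multiplier. The sketch below shows the mechanics in pandas; the fees and multipliers are placeholder values chosen for easy arithmetic, not real school fees or Bank of England figures, and the column names are assumptions.

```python
import pandas as pd

# PLACEHOLDER nominal fees (not real data).
fees = pd.DataFrame({
    "year": [1930, 1980, 2025],
    "nominal_fee_gbp": [200.0, 3000.0, 50000.0],
})

# PLACEHOLDER multipliers converting each year's pounds into 2025 pounds.
multiplier_to_2025 = {1930: 60.0, 1980: 4.5, 2025: 1.0}

# Look up each year's multiplier, then scale the nominal fee into 2025 prices.
fees["multiplier"] = fees["year"].map(multiplier_to_2025)
fees["real_fee_2025_gbp"] = fees["nominal_fee_gbp"] * fees["multiplier"]
print(fees)
```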

Visualisations were built using Vega-Lite specifications, chosen for their interactive capabilities. Each chart incorporates interactive elements to enhance data exploration. The website is hosted via GitHub Pages, with all code and datasets publicly accessible for full reproducibility.

Key technical challenges included missing historical data, data type mismatches (GSS vs numeric codes), and map rendering with 1,600+ points. These were addressed through strategic filtering, explicit type conversions, and layered visualisation approaches.
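
One way the code-type mismatch can arise, and be resolved with an explicit conversion, is sketched below. The tables, column names, and scores are illustrative assumptions: one table stores a numeric local authority code as an integer, whilst a lookup table stores the same code as text next to the GSS code.

```python
import pandas as pd

# Illustrative tables (hypothetical values): the key is an int in one, a string in the other.
performance = pd.DataFrame({"la_code": [201, 202], "gcse_score": [50.1, 52.3]})
lookup = pd.DataFrame({"la_code": ["201", "202"],
                       "gss_code": ["E09000001", "E09000007"]})

# Explicit conversion so both key columns share one dtype before merging.
performance["la_code"] = performance["la_code"].astype(str)
lookup["la_code"] = lookup["la_code"].astype(str)

# validate= raises an error if a key is unexpectedly duplicated on either side.
merged = performance.merge(lookup, on="la_code", how="left", validate="one_to_one")
print(merged)
```

Checking for missing gss_code values after a left merge is a quick way to confirm that every row found its match.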

Open Data & Code

All data, code, and visualisations are publicly accessible for reproducibility and further research.

View GitHub Repository
View Regression Notebook