Live Research Data
The system is learning.
This page updates automatically as LoadLens ingests new data, generates forecasts, and grades them against reality. Every number is computed from real PJM grid demand via the EIA Open Data API. Nothing is synthetic. Nothing is cached.
Last forecast: 5 hours ago · Last grading: 1 month ago
Hourly Readings
9,961
413.2 days of data
Forecasts Generated
14,593
Forecasts Graded
720
Ensemble MAPE
19.88%
Lower is better
Current Adaptive Weights
Recomputed before every forecast cycle. Models that predict better earn more weight.
Daily Forecast Accuracy (MAPE)
Pre-Registered Evaluation · Top-Line
From the most recent run of php artisan loadlens:eval against v1 of the protocol on 2,733 hours of held-out PJM Region demand. Full per-regime breakdown lives on the methodology page.
Best Baseline
7.84%
B_PERSIST_168
Advanced Ensemble
14.63%
MAPE
[q10, q90] Coverage
10.1%
Target 80% (off-target — see methodology)
Pre-registered protocol v1 · commit f9d6200 · 6 days ago.
Naive baselines currently outperform our point forecasts on transmission-scale data; that gap and the under-coverage finding above are the v2 work, surfaced openly on /methodology.
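For readers unfamiliar with the coverage metric: empirical coverage of a [q10, q90] interval is the share of graded hours whose actual load falls inside the interval. The Python sketch below is illustrative only. The function name, argument layout, and toy numbers are assumptions, not taken from the LoadLens codebase.

```python
def interval_coverage(actuals, lower, upper):
    """Fraction of actuals that fall inside [lower, upper], endpoints included."""
    if not (len(actuals) == len(lower) == len(upper)):
        raise ValueError("inputs must have equal length")
    if not actuals:
        raise ValueError("need at least one graded hour")
    hits = sum(1 for a, lo, hi in zip(actuals, lower, upper) if lo <= a <= hi)
    return hits / len(actuals)


# Toy values in GW, for illustration only (not real PJM data).
actual = [90.0, 95.0, 100.0, 105.0, 110.0]
q10 = [88.0, 96.0, 97.0, 99.0, 111.0]
q90 = [92.0, 99.0, 103.0, 104.0, 115.0]
coverage = interval_coverage(actual, q10, q90)  # 2 of 5 hours covered
```

A calibrated q10-to-q90 interval would be expected to cover roughly 80% of actuals, which is the target quoted above. Coverage far below that generally means the intervals are too narrow or centered in the wrong place.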
Observations From the Data
Observation 1: On real PJM hourly demand data (9,961 readings over 413.2 days), the trend/seasonality model consistently outperforms weather-response and momentum models in baseline operating conditions. The adaptive ensemble correctly identified this and upweighted trend from 40% to 84%.
Observation 2: The weather-response model shows higher error (31.79% MAPE) because it currently uses a simplified temperature-driven model. Future iterations will integrate NOAA forecast data as a feature, expected to significantly reduce weather-model error during temperature-driven regime shifts.
Observation 3: Cross-grid validation on ERCOT (Texas) data shows nearly identical ensemble MAPE (19.75%, versus 19.88% on PJM), which suggests the architecture generalizes without domain-specific tuning.
Open question: Will the adaptive mechanism reverse weights during a genuine regime shift (e.g., a summer heat wave)? The current backtest window does not include one. Longer-running validation will answer this.
AI Research Journal
Generated by Claude · Analyzing model performance · Not making predictions
Trend Model Dominance Masks Late-Night Forecasting Blind Spot
Both grids show extreme forecast errors clustered around midnight hours despite strong trend model performance, suggesting systematic mishandling of overnight load transitions.
The adaptive weighting system has converged on dramatically different strategies across the two grids. PJM places 84% weight on the trend model (MAPE 5.27%) while largely ignoring weather and momentum components. ERCO distributes weights more evenly at 40/35/25, despite trend performing well there too at 6.71% MAPE.
Both grids exhibit a troubling pattern: their worst forecast errors are concentrated in late-night/early morning hours (00:34-01:58), with errors ranging from -48% to -52%. This suggests the ensemble struggles with overnight load transitions regardless of the weighting strategy employed. The actual loads during these failures span a wide range (57-113 GW), indicating the issue isn't simply low-load sensitivity.
The weather and momentum models show consistently poor individual performance (31-34% MAPE) across both grids, yet ERCO maintains significant weight allocation to these components while achieving similar ensemble performance to PJM's trend-dominated approach. This raises questions about whether ERCO's more distributed weighting provides resilience benefits not captured in the MAPE metric.
Open research question: Why do both weighting strategies fail catastrophically during late-night hours, and could the overnight load transition represent a systematic gap in all three underlying models that ensemble weighting cannot compensate for?
Generated 4 minutes ago
Trend Model Dominance Despite Similar Ensemble Performance Across Grids
Both PJM and ERCO grids achieve nearly identical ensemble MAPE (~19.8%) but employ drastically different weighting strategies, with the trend model showing superior individual performance on each.
The adaptive ensemble exhibits remarkably consistent performance across disparate grid scales, with PJM (110 GW peak loads) and ERCO (60 GW peak loads) both achieving ensemble MAPE values within 0.13 percentage points of each other (19.88% vs 19.75%). This convergence occurs despite the grids operating at vastly different scales and presumably with different load characteristics.
The weighting adaptation reveals fascinating behavioral differences between grids. PJM heavily favors the trend model (83.96% weight) while relegating weather and momentum models to minimal influence (~8% each). ERCO employs a more balanced approach with trend at 40%, weather at 35%, and momentum at 25%. This suggests the adaptive system has identified distinct predictive signal strengths across different grid environments.
Individual model performance shows trend consistently outperforming weather and momentum models across both grids (5.27% vs 31.79%/31.93% for PJM; 6.71% vs 34.11%/31.86% for ERCO). The weather and momentum models show consistently poor individual performance, raising questions about their value beyond ensemble diversification effects.
Worst errors cluster in early morning hours (00:xx and 01:xx timestamps) with systematic under-prediction patterns (-48% to -51% errors). The temporal clustering suggests a systematic bias during overnight load transitions that the ensemble struggles to capture. Research question: What specific overnight load dynamics are causing this consistent failure mode across different grid systems?
Generated 6 hours ago
Adaptive Weighting Reveals Stark Model Performance Differences in Grid Forecasting
The trend model dramatically outperforms weather and momentum models across both grids, with the ensemble adapter weighting it 40-84% despite all models showing systematic underestimation errors during late-night hours.
The data reveals a striking performance hierarchy among our three constituent models. The trend model achieves MAPE values of 5.27% (PJM) and 6.71% (ERCO), while weather and momentum models struggle with MAPEs exceeding 31% on both grids. This 5-6x performance gap suggests fundamentally different model capabilities in capturing load patterns.
The adaptive weighting system responds intelligently to these performance differences. On PJM, where trend model superiority is most pronounced, the system allocates 84% weight to trend and minimizes weather/momentum contributions to ~8% each. On ERCO, the adapter takes a more conservative approach with 40% trend weighting, possibly indicating less consistent trend model performance or different grid characteristics requiring more diverse model input.
All worst forecast errors cluster around midnight to 2 AM timeframes in late March and early April 2026, with systematic underestimation (all negative errors ranging from -48% to -52%). The actual loads during these failures span 57-113 GW, suggesting the errors aren't simply low-load artifacts. This temporal clustering points to a systematic blind spot rather than random forecast failures.
Open research question: Why do all three models simultaneously fail during these specific late-night periods in spring 2026? The consistent timing suggests an external factor—possibly seasonal transitions, daylight saving time effects, or economic patterns—that none of our current models adequately capture.
Generated 12 hours ago
Methodology & Data Provenance
Data source: EIA Open Data API v2, dataset electricity/rto/region-data, respondent PJM, type D (demand), hourly frequency.
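As an illustration of the query described above, the sketch below builds a request URL for that dataset without sending it. The parameter layout (bracketed `facets[...]`, `data[0]`, `sort[0][...]`, the `YYYY-MM-DDTHH` timestamp format) reflects our understanding of EIA v2 conventions and should be checked against the EIA documentation. The function name and defaults are hypothetical, not LoadLens code.

```python
from urllib.parse import urlencode

# Route for the dataset named above; "/data/" is the row-returning endpoint.
EIA_BASE = "https://api.eia.gov/v2/electricity/rto/region-data/data/"


def build_eia_url(api_key, respondent="PJM", start=None, end=None, length=5000):
    """Build (but do not send) an hourly demand query for one respondent."""
    params = [
        ("api_key", api_key),
        ("frequency", "hourly"),
        ("data[0]", "value"),
        ("facets[respondent][]", respondent),
        ("facets[type][]", "D"),            # D = demand
        ("sort[0][column]", "period"),
        ("sort[0][direction]", "asc"),
        ("offset", 0),
        ("length", length),                  # page size (assumed default)
    ]
    if start:
        params.append(("start", start))     # e.g. "2025-04-12T00"
    if end:
        params.append(("end", end))
    return EIA_BASE + "?" + urlencode(params)


url = build_eia_url("YOUR_API_KEY", start="2025-04-12T00")
```

Keeping URL construction separate from the HTTP call makes the query easy to log and to test offline.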
Date range: Apr 12, 2025 to May 30, 2026.
Backtest methodology: Walk-forward validation. At each hour, the model sees only prior data. No future leakage.
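A minimal sketch of walk-forward evaluation, assuming an in-memory list of hourly values. The forecaster shown is a 168-hour (one-week) persistence rule, which is our reading of what a baseline named B_PERSIST_168 would do. That interpretation, the function names, and the toy series are all assumptions.

```python
def walk_forward(series, forecaster, start):
    """Forecast each hour t >= start using only observations before t."""
    results = []
    for t in range(start, len(series)):
        history = series[:t]              # strictly prior data: no future leakage
        results.append((forecaster(history), series[t]))
    return results


def persist_168(history):
    """Seasonal-naive baseline: repeat the value from 168 hours earlier."""
    return history[-168]


# Toy series with an exact weekly cycle, for illustration only.
series = [100.0 + (h % 168) for h in range(168 * 3)]
pairs = walk_forward(series, persist_168, start=168)  # list of (forecast, actual)
```

Because the slice `series[:t]` excludes hour `t` itself, the forecaster cannot see the value it is being graded on. On this perfectly periodic toy series the persistence forecast matches every actual, which real demand data would not do.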
MAPE calculation: Mean Absolute Percentage Error = mean(|forecast - actual| / actual × 100) across all graded hours.
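The formula above can be written as a short Python function. This is a sketch with hypothetical names, assuming strictly positive actuals (true for grid demand) so the division is well defined.

```python
def mape(forecasts, actuals):
    """Mean absolute percentage error, in percent."""
    if len(forecasts) != len(actuals) or not actuals:
        raise ValueError("need two non-empty sequences of equal length")
    if any(a <= 0 for a in actuals):
        raise ValueError("actuals must be strictly positive")
    total = sum(abs(f - a) / a for f, a in zip(forecasts, actuals))
    return total / len(actuals) * 100


# Toy example: errors of 5% and 10% average to a MAPE of about 7.5%.
score = mape([95.0, 110.0], [100.0, 100.0])
```

Because MAPE takes absolute values, it hides the sign of the error. The signed errors quoted in the journal entries (for example -48%) are a separate, per-forecast quantity.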
Adaptive weighting: Inverse-error weighting on the most recent 48 graded forecasts per model.
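One common way to implement inverse-error weighting is sketched below: each model's weight is proportional to one over its mean absolute percentage error in the recent window, then normalized to sum to 1. The exact form LoadLens uses (for example, whether errors are squared or floored) is not stated here, so treat the function, its `eps` guard, and the toy inputs as assumptions.

```python
def inverse_error_weights(recent_errors, window=48, eps=1e-9):
    """Map model name -> weight from each model's recent absolute % errors.

    recent_errors: dict of model name -> list of errors, oldest first.
    """
    inverse = {}
    for model, errors in recent_errors.items():
        tail = errors[-window:]                  # most recent `window` graded forecasts
        if not tail:
            raise ValueError(f"no graded forecasts for {model}")
        mean_error = sum(tail) / len(tail)
        inverse[model] = 1.0 / (mean_error + eps)  # eps avoids division by zero
    total = sum(inverse.values())
    return {model: value / total for model, value in inverse.items()}


# Toy errors (percent), for illustration only.
weights = inverse_error_weights({
    "trend": [5.0] * 48,
    "weather": [20.0] * 48,
    "momentum": [20.0] * 48,
})
```

In this toy case the trend model's error is a quarter of the others', so it receives about two thirds of the weight and the other two models about one sixth each.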
Update cadence: Every 2 hours via cron. This page reflects the most recent pipeline run.