| dc.description.abstract | As we plan tomorrow’s electricity system, we face fundamental questions: where should new power plants go, which technologies deserve investment, and how much transmission is enough? These decisions are the domain of Capacity Expansion Planning (CEP), a class of optimization models that guide long-term infrastructure investments in power systems. To be realistic, CEP models must capture fine-grained spatial and temporal variation: demand varies by city and climate, while wind and solar output depend on weather patterns that shift hour by hour and location by location. But representing the system with thousands of time steps and hundreds of nodes makes the optimization problem computationally intractable.
This thesis addresses a core question: how can spatial and temporal aggregation in CEP models be designed to preserve the planning-relevant patterns that drive investment decisions? Existing approaches often treat aggregation as a neutral preprocessing step, relying on heuristics such as political boundaries or geographic proximity. In contrast, we propose a task-aware pipeline that treats aggregation as an integral modeling decision, explicitly aligned with planning objectives.
The approach builds a composite similarity metric that blends diverse planning-relevant signals (including, but not limited to, duration curves, ramping behavior, and spatial correlation) and applies k-medoids clustering under this metric to define spatial zones. Temporal aggregation is then applied to daily system-wide profiles, selecting representative days that preserve cross-zonal interactions. The result is a reduced spatio-temporal dataset that is fed into a CEP model. The resulting investment decisions are then re-evaluated at full resolution to assess their feasibility and true cost.
Experiments on a New England case study show that the pipeline consistently outperforms common baselines such as political boundaries, geographic proximity, and capacity-factor statistics. Across 50 tested feature weightings, the best design reduces system cost by 13% relative to the heuristic baselines. Correlation-based features drive the best results, while raw amplitude and geographic location often degrade performance when used alone. | |