Action Set Based Policy Optimization for Safe Power Grid Management

by Bo Zhou, et al.

Maintaining the stability of the modern power grid is becoming increasingly difficult due to fluctuating power consumption, unstable power supply from renewable energy sources, and unpredictable accidents such as man-made and natural disasters. Because any operation on the power grid must account for its impact on future stability, reinforcement learning (RL) has been employed to provide sequential decision-making in power grid management. However, existing methods have not considered the environmental constraints. As a result, the learned policy risks selecting actions that violate the constraints in emergencies, which worsens the overloading of power lines and can lead to large-scale blackouts. In this work, we propose a novel method for this problem that builds on a search-based planning algorithm. At the planning stage, the search space is limited to the action set produced by the policy. The selected action strictly follows the constraints, because its outcome is tested with the simulation function provided by the system. At the learning stage, gradients cannot be propagated through the search to the policy, so we introduce Evolutionary Strategies (ES), a black-box optimization method, to improve the policy directly and maximize long-term returns. In the NeurIPS 2020 Learning to Run a Power Network (L2RPN) competition, our solution safely managed the power grid and ranked first in both tracks.
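The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear scoring policy, the toy `simulate` function, the `LIMIT` threshold, the `effects` table of per-action load changes, and the normalized-return ES update are all assumptions made for illustration. The planner searches only within the top-k action set proposed by the policy and prefers actions whose simulated outcome stays under the limit, and the ES step updates the policy parameters from episode returns alone, since no gradient passes through the search.

```python
import numpy as np

LIMIT = 1.0  # hypothetical constraint: every line's load ratio must stay below this


def simulate(state, action, effects):
    """Toy stand-in for the system's simulation function: predicted next line loads."""
    return state + effects[action]


def action_set(theta, state, k):
    """Policy output: indices of the k highest-scoring actions (linear scores, an assumption)."""
    scores = theta @ state
    return np.argsort(scores)[-k:]


def plan(theta, state, effects, k):
    """Search only inside the policy's action set, preferring simulated-safe actions."""
    candidates = action_set(theta, state, k)
    peak = np.array([simulate(state, a, effects).max() for a in candidates])
    safe = peak < LIMIT
    # Fall back to the least-bad candidate if no action in the set is safe.
    pool = np.where(safe)[0] if safe.any() else np.arange(len(candidates))
    return int(candidates[pool[np.argmin(peak[pool])]])


def episode_return(theta, effects, k, rng, steps=20):
    """Roll out the planner; reward is 1 per step survived without an overload."""
    state = rng.uniform(0.3, 0.7, size=effects.shape[1])
    total = 0.0
    for _ in range(steps):
        state = simulate(state, plan(theta, state, effects, k), effects)
        state = np.clip(state + rng.normal(0.0, 0.05, size=state.shape), 0.0, None)
        if state.max() >= LIMIT:
            break
        total += 1.0
    return total


def es_step(theta, effects, k, rng, pop=16, sigma=0.1, lr=0.05):
    """One black-box ES update: perturb parameters, score by return, move along the estimate."""
    noise = rng.normal(size=(pop,) + theta.shape)
    seed = int(rng.integers(2**31))  # same episode randomness for every perturbation
    returns = np.array([
        episode_return(theta + sigma * eps, effects, k, np.random.default_rng(seed))
        for eps in noise
    ])
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = np.tensordot(adv, noise, axes=1) / (pop * sigma)
    return theta + lr * grad


rng = np.random.default_rng(0)
effects = rng.normal(0.0, 0.1, size=(8, 5))  # 8 hypothetical actions acting on 5 lines
theta = np.zeros((8, 5))
for _ in range(5):
    theta = es_step(theta, effects, k=3, rng=rng)
```

Restricting the search to k candidates keeps the number of simulator calls per step fixed, which is why the policy's job in this sketch is to rank actions rather than to pick a single one.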




Reinforcement Learning for Resilient Power Grids

Traditional power grid systems have become obsolete under more frequent ...

Hierarchical Decision Making In Electricity Grid Management

The power grid is a complex and vital system that necessitates careful r...

PowRL: A Reinforcement Learning Framework for Robust Management of Power Networks

Power grids, across the world, play an important societal and economical...

Winning the CityLearn Challenge: Adaptive Optimization with Evolutionary Search under Trajectory-based Guidance

Modern power systems will have to face difficult challenges in the years...

A Prescriptive Dirichlet Power Allocation Policy with Deep Reinforcement Learning

Prescribing optimal operation based on the condition of the system and, ...

Optimizing Empty Container Repositioning and Fleet Deployment via Configurable Semi-POMDPs

With the continuous growth of the global economy and markets, resource i...

Reinforcement Learning for Electricity Network Operation

This paper presents the background material required for the Learning to...
