Show simple item record

dc.contributor.author: Seurin, Paul
dc.contributor.author: Shirvan, Koroush
dc.date.accessioned: 2025-06-24T15:28:25Z
dc.date.available: 2025-06-24T15:28:25Z
dc.date.issued: 2024-01-29
dc.identifier.uri: https://hdl.handle.net/1721.1/159669
dc.description.abstract: The nuclear fuel loading pattern optimization problem belongs to the class of large-scale combinatorial optimization problems. It is also characterized by multiple objectives and constraints, which makes it impossible to solve explicitly. Stochastic optimization methodologies, including Genetic Algorithms and Simulated Annealing, are used by different nuclear utilities and vendors, but hand-designed solutions remain the prevalent method in the industry. To improve on the state of the art, Deep Reinforcement Learning (RL), in particular Proximal Policy Optimization, is leveraged. This work presents a first-of-a-kind approach that uses deep RL to solve the loading pattern problem and could be leveraged for any engineering design optimization. This paper is also, to our knowledge, the first to propose a study of the behavior of several hyper-parameters that influence the RL algorithm. The algorithm is highly dependent on multiple factors, such as the shape of the objective function derived for the core design, which acts as a fudge factor affecting the stability of learning, as well as an exploration/exploitation trade-off that manifests through several parameters: the number of loading patterns seen by the agents per episode, the number of samples collected before a policy update, and an entropy factor that increases the randomness of the policy during training. We found that RL must be applied similarly to a Gaussian Process in which the acquisition function is replaced by a parametrized policy. Then, once an initial set of hyper-parameters is found, reducing these exploration parameters until no more learning is observed yields the highest sample efficiency, robustly and stably. This resulted in an economic benefit of $535,000 to $642,000 per year per plant. Future work must extend this research to multi-objective settings and compare it against state-of-the-art implementations of stochastic optimization methods. [en_US]
dc.publisher: Springer US [en_US]
dc.relation.isversionof: https://doi.org/10.1007/s10489-023-05013-5 [en_US]
dc.rights: Creative Commons Attribution-Noncommercial-ShareAlike [en_US]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-sa/4.0/ [en_US]
dc.source: Springer US [en_US]
dc.title: Assessment of reinforcement learning algorithms for nuclear power plant fuel optimization [en_US]
dc.type: Article [en_US]
dc.identifier.citation: Seurin, P., Shirvan, K. Assessment of reinforcement learning algorithms for nuclear power plant fuel optimization. Appl Intell 54, 2100–2135 (2024). [en_US]
dc.contributor.department: Massachusetts Institute of Technology. Department of Nuclear Science and Engineering [en_US]
dc.relation.journal: Applied Intelligence [en_US]
dc.eprint.version: Author's final manuscript [en_US]
dc.type.uri: http://purl.org/eprint/type/JournalArticle [en_US]
eprint.status: http://purl.org/eprint/status/PeerReviewed [en_US]
dc.date.updated: 2025-03-27T13:48:26Z
dc.language.rfc3066: en
dc.rights.holder: The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature
dspace.embargo.terms: Y
dspace.date.submission: 2025-03-27T13:48:26Z
mit.journal.volume: 54 [en_US]
mit.license: OPEN_ACCESS_POLICY
mit.metadata.status: Authority Work and Publication Information Needed [en_US]
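
As a rough illustration of the hyper-parameter recipe described in the abstract above, the sketch below maps the quantities it discusses (samples collected before a policy update, entropy factor) onto a stable-baselines3-style PPO configuration. This is a minimal sketch under stated assumptions, not the paper's implementation: `CartPole-v1` is only a stand-in for the loading-pattern environment, and the starting values and halving schedule are illustrative.

```python
# Illustrative sketch of the "reduce exploration parameters until no more learning
# is observed" recipe from the abstract, using stable-baselines3 PPO.
# Assumptions: CartPole-v1 stands in for the loading-pattern environment; the
# initial values and halving schedule are hypothetical, not from the paper.
import gymnasium as gym
from stable_baselines3 import PPO
from stable_baselines3.common.evaluation import evaluate_policy

train_env = gym.make("CartPole-v1")   # stand-in for a core loading-pattern environment
eval_env = gym.make("CartPole-v1")    # separate environment for evaluation

n_steps = 2048     # samples collected before each policy update
ent_coef = 0.01    # entropy factor controlling policy randomness (exploration)
best_mean = float("-inf")

# Shrink the exploration-related hyper-parameters until the evaluated return
# stops improving, i.e. until "no more learning is observed".
while n_steps >= 256:
    model = PPO("MlpPolicy", train_env, n_steps=n_steps, ent_coef=ent_coef, verbose=0)
    model.learn(total_timesteps=20_000)
    mean_reward, _ = evaluate_policy(model, eval_env, n_eval_episodes=10)
    if mean_reward <= best_mean:
        break                      # no further improvement: keep the previous setting
    best_mean = mean_reward
    n_steps //= 2                  # fewer samples per policy update
    ent_coef /= 2.0                # less forced exploration
```

In this reading, the outer loop plays a role loosely analogous to tuning an acquisition function in a Gaussian Process, with the parametrized PPO policy taking its place, as the abstract suggests.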

