Author ORCID Identifier
https://orcid.org/0009-0009-1224-8939
Defense Date
2026
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Computer Science
First Advisor
Changqing Luo
Second Advisor
Kemal Akkaya
Abstract
Deep reinforcement learning (DRL), which combines reinforcement learning with high-capacity function approximators such as deep neural networks (DNNs), is a powerful approach to solving complex sequential decision-making problems. However, due to the complex solution spaces of sequential decision-making problems and the inefficient design of DRL algorithms, DRL algorithms usually require a prohibitively large number of data samples to train effective strategies. Consequently, it is difficult to apply these algorithms to complex real-world problems in which collecting a large volume of data samples is costly. This dissertation proposes new mechanisms to address this sample-inefficiency issue, realizing sample-efficient DRL algorithms. Specifically, this dissertation first presents a FeedbAck-based Decision-mAking mechanism (FADA) that utilizes feedback from the critic for decision calibration to improve the sample efficiency of off-policy actor-critic DRL algorithms. It then presents a Quality-Aware Experience Exploitation scheme (QA2E), which selectively exploits simulated experiences based on their varying quality, to enhance the sample efficiency of model-based policy learning. Finally, it presents a Value-guided Search-to-Imitation framework (VSI), which performs value-guided, imitation-based policy improvement to enhance the sample efficiency of off-policy actor-critic DRL. Extensive experiments have been conducted to evaluate the proposed mechanisms on a set of continuous control tasks in the DeepMind Control Suite, and the experimental results demonstrate their effectiveness in improving the sample efficiency of DRL algorithms.
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
5-6-2026