Bayesian Conservative Reinforcement Learning
For UC Berkeley's Deep Reinforcement Learning class (CS 285), a few friends and I combined two recent RL approaches to create a conservative single-agent RL algorithm. It maximizes the conditional value at risk (CVaR): the expected return over a specified lower percentile of outcomes. Cool project.
Why Conservatism Matters
Sometimes optimizing for the worst-case scenario is valuable. For example, road hazards are rare while driving. An agent that optimizes for the average case may not account for them, whereas a more risk-averse agent would be better prepared.
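
To make the average-case versus worst-case distinction concrete, here is a minimal sketch that compares the mean return with an empirical CVaR estimate (the mean of the worst alpha fraction of episode returns). The function name `empirical_cvar`, the two policies, and all return values are made up for illustration. This is not the algorithm from our project, just the objective it targets.

```python
import numpy as np

def empirical_cvar(returns, alpha):
    """Mean of the worst alpha-fraction of returns (lower tail)."""
    sorted_returns = np.sort(np.asarray(returns, dtype=float))
    # Number of episodes in the lower tail, at least one.
    k = max(1, int(np.ceil(alpha * len(sorted_returns))))
    return sorted_returns[:k].mean()

# Hypothetical per-episode returns (illustrative numbers only).
# "fast" usually does well but occasionally hits a road hazard.
fast_returns = [10] * 19 + [-50]
# "cautious" gives up a little return every episode but never crashes.
cautious_returns = [6] * 20

alpha = 0.25  # look at the worst 25% of episodes

fast_mean = float(np.mean(fast_returns))
cautious_mean = float(np.mean(cautious_returns))
fast_cvar = float(empirical_cvar(fast_returns, alpha))
cautious_cvar = float(empirical_cvar(cautious_returns, alpha))

print(f"fast:     mean={fast_mean:.1f}, CVaR={fast_cvar:.1f}")
print(f"cautious: mean={cautious_mean:.1f}, CVaR={cautious_cvar:.1f}")
```

With these made-up numbers, the fast policy has the higher mean, but the cautious policy has the higher CVaR. An average-case agent would prefer the former, and a risk-averse agent the latter.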