BRANDON MORRIS

Updated 81 days ago

ID: 40900229/57

Up to this point, we've only described the reinforcement learning problem: given an MDP, we want to figure out good actions that will maximize the sum of our rewards (i.e. the return). The process of deciding an action from a state is known as a policy, so in other words, we want to learn the best policy for a given task. Several different algorithms do this, but one of the most straightforward, and the one we'll look at here, is known as Q-learning... Note the action selection process. Initially, our agent has no idea what good actions are. As such, we want it to explore very broadly, so that it gets a diverse range of experience to build on. The method determines an exploration rate based on how far into training we are. As training progresses, the agent takes random actions (i.e. explores) less often and takes the best available action more often. This is known as an "epsilon-greedy" policy. When we're done training, or evaluating our model, we..
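
The epsilon-greedy selection described in the excerpt can be sketched roughly as follows. This is a minimal illustration, not the original author's code: the function names `exploration_rate` and `epsilon_greedy`, the linear decay schedule, and the default values `eps_start=1.0` and `eps_end=0.05` are all assumptions. The excerpt only says that the exploration rate depends on how far into training we are.

```python
import random


def exploration_rate(step, total_steps, eps_start=1.0, eps_end=0.05):
    """Hypothetical schedule: linearly anneal epsilon over training.

    The linear shape and the default endpoints are assumptions made
    for illustration; other schedules (e.g. exponential) are common too.
    """
    # Clamp progress to [0, 1] so epsilon stays within its endpoints.
    fraction = min(max(step / total_steps, 0.0), 1.0)
    return eps_start + fraction * (eps_end - eps_start)


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one.

    q_values is a sequence of estimated action values for the current
    state, indexed by action.
    """
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(q_values))
    # Exploit: choose the action with the highest estimated value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


# Early in training the agent mostly explores; late in training it
# mostly exploits its current value estimates.
q = [0.1, 0.9, 0.3]
early_action = epsilon_greedy(q, exploration_rate(0, 1000))
late_action = epsilon_greedy(q, exploration_rate(1000, 1000))
```

Setting `epsilon` to 0 makes the choice fully greedy, which is one plausible way to act once training is over.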

Interest Score

HIT Score

0.73

General

ma..@brandonmorris.dev

Domain

brandonmorris.dev

Actual

brandonmorris.dev

104.21.16.1, 104.21.32.1, 104.21.48.1, 104.21.64.1, 104.21.80.1, 104.21.96.1, 104.21.112.1

Status

Category

Company