A Markov decision process, MDP for short, is defined by the tuple $(\mathcal{S}, \mathcal{A}, P, R)$ where
- $\mathcal{S}$ is the set of all possible states, also called the state space
- $\mathcal{A}$ is the set of all possible actions, also known as the action space
- $P : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ is a transition probability function such that $P(s' \mid s, a)$ is the probability of ending up in state $s'$ upon taking action $a$ in state $s$. For every state-action pair $(s, a)$, the probabilities over successor states form a distribution: $\sum_{s' \in \mathcal{S}} P(s' \mid s, a) = 1$.
- $R : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$ is a reward function such that $R(s, a, s')$ is the reward received after taking action $a$ in state $s$ and ending up in state $s'$.
Note that in a Markov decision process the next state depends only on the current state and the action taken; the states that came before are not taken into account. Formally, $P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t, a_t)$, which is known as the Markov property.
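
To make the definition concrete, here is a minimal Python sketch of a toy MDP. The states, actions, and probabilities are made up for illustration; the dictionary `P` plays the role of the transition function and `R` the reward function. Note that `step` only ever consults the current state and action, never the history, which is exactly the Markov property.

```python
import random

# Toy two-state MDP (states and actions are invented for illustration).
# P[(s, a)] maps each successor state s' to the probability P(s' | s, a).
P = {
    ("sunny", "go_out"):  {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "stay_in"): {"sunny": 0.9, "rainy": 0.1},
    ("rainy", "go_out"):  {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "stay_in"): {"sunny": 0.4, "rainy": 0.6},
}

def R(s, a, s_next):
    """Example reward R(s, a, s'): +1 for ending up sunny, -1 otherwise."""
    return 1.0 if s_next == "sunny" else -1.0

def step(s, a):
    """Sample one transition. Only the current state s and action a are
    used, never the earlier states: this is the Markov property."""
    successors = P[(s, a)]
    s_next = random.choices(list(successors), weights=list(successors.values()))[0]
    return s_next, R(s, a, s_next)

# Roll out a short trajectory under a fixed action choice.
s = "sunny"
for _ in range(5):
    s, r = step(s, "go_out")
    print(s, r)
```

Each row of `P` sums to one, matching the requirement that $P(\cdot \mid s, a)$ is a probability distribution over successor states.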