A Markov decision process, MDP for short, is defined by the tuple $(\mathcal{S}, \mathcal{A}, P, R)$ where
- $\mathcal{S}$ is the set of all possible states, also called the state space
- $\mathcal{A}$ is the set of all possible actions, also known as the action space
- $P : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to [0, 1]$ is a transition probability function such that $P(s' \mid s, a)$ is the probability of ending up in state $s'$ upon taking action $a$ in state $s$. For every state-action pair $(s, a)$, the probabilities over successor states form a distribution: $\sum_{s' \in \mathcal{S}} P(s' \mid s, a) = 1$.
- $R : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$ is a reward function such that $R(s, a, s')$ is the reward received after taking action $a$ in state $s$ and ending up in state $s'$.
Note that in a Markov decision process the next state depends only on the current state and the action taken; the states that came before are not taken into account. Formally, $P(s_{t+1} \mid s_t, a_t, s_{t-1}, a_{t-1}, \dots, s_0) = P(s_{t+1} \mid s_t, a_t)$, which is known as the Markov property.
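
To make the definition concrete, here is a minimal Python sketch of a toy MDP. The states, actions, and probabilities are made up for illustration; the dictionary `P` plays the role of the transition function and `R` the reward function. Note that `step` only ever consults the current state and action, never the history, which is exactly the Markov property.

```python
import random

# Toy two-state MDP (states and actions are invented for illustration).
# P[(s, a)] maps each successor state s' to the probability P(s' | s, a).
P = {
    ("sunny", "go_out"):  {"sunny": 0.8, "rainy": 0.2},
    ("sunny", "stay_in"): {"sunny": 0.9, "rainy": 0.1},
    ("rainy", "go_out"):  {"sunny": 0.3, "rainy": 0.7},
    ("rainy", "stay_in"): {"sunny": 0.4, "rainy": 0.6},
}

def R(s, a, s_next):
    """Example reward R(s, a, s'): +1 for ending up sunny, -1 otherwise."""
    return 1.0 if s_next == "sunny" else -1.0

def step(s, a):
    """Sample one transition. Only the current state s and action a are
    used, never the earlier states: this is the Markov property."""
    successors = P[(s, a)]
    s_next = random.choices(list(successors), weights=list(successors.values()))[0]
    return s_next, R(s, a, s_next)

# Roll out a short trajectory under a fixed action choice.
s = "sunny"
for _ in range(5):
    s, r = step(s, "go_out")
    print(s, r)
```

Each row of `P` sums to one, matching the requirement that $P(\cdot \mid s, a)$ is a probability distribution over successor states.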