A Markov decision process (MDP) is defined by the tuple $(\mathcal{S}, \mathcal{A}, P, R)$, where

  • $\mathcal{S}$ is the set of all possible states, also called the state space
  • $\mathcal{A}$ is the set of all possible actions, also known as the action space
  • $P$ is a probability function on $\mathcal{S} \times \mathcal{A} \times \mathcal{S}$ such that $P(s' \mid s, a)$ is the probability of ending up in state $s'$ upon taking action $a$ in state $s$. For a fixed $s$ and $a$, these probabilities sum to one: $\sum_{s' \in \mathcal{S}} P(s' \mid s, a) = 1$.
  • $R : \mathcal{S} \times \mathcal{A} \times \mathcal{S} \to \mathbb{R}$ is a reward function such that $R(s, a, s')$ represents the reward received after taking action $a$ in state $s$ and ending up in state $s'$.

Note that in a Markov decision process the next state depends only on the current state (and the action taken there); the states that came before it are not taken into account. This is known as the Markov property.
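
To make the tuple concrete, here is a minimal Python sketch of a toy MDP. The two states ("high", "low"), two actions ("search", "wait"), and all transition probabilities and rewards are hypothetical values chosen purely for illustration, not taken from the text above. The `step` function samples the next state from $P(\cdot \mid s, a)$, which, by the Markov property, depends only on the current state and action.

```python
import random

# P[(s, a)] maps each possible next state s' to P(s' | s, a).
# For every (s, a) pair the probabilities sum to 1.
P = {
    ("high", "search"): {"high": 0.7, "low": 0.3},
    ("high", "wait"):   {"high": 1.0},
    ("low", "search"):  {"high": 0.2, "low": 0.8},
    ("low", "wait"):    {"low": 1.0},
}

# R[(s, a, s')] is the reward received for the transition s --a--> s'.
# All values here are made up for the example.
R = {
    ("high", "search", "high"): 2.0,
    ("high", "search", "low"):  2.0,
    ("high", "wait", "high"):   0.5,
    ("low", "search", "high"): -1.0,
    ("low", "search", "low"):   2.0,
    ("low", "wait", "low"):     0.5,
}

def step(state, action):
    """Sample a next state from P(. | state, action) and return (next_state, reward).

    The sampled distribution depends only on the current state and action,
    not on any earlier history -- this is the Markov property in action.
    """
    next_states = list(P[(state, action)].keys())
    probs = list(P[(state, action)].values())
    next_state = random.choices(next_states, weights=probs)[0]
    return next_state, R[(state, action, next_state)]

# Roll out a short trajectory with a fixed (arbitrary) action choice.
state = "high"
for _ in range(5):
    action = "search"
    state, reward = step(state, action)
    print(f"action={action!r} -> state={state!r}, reward={reward}")
```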