📅  Last modified: 2022-03-11 14:57:57.843000             🧑  Author: Mango
A deterministic policy π is a function that takes a state s as input and returns an action a.
That is: π(s) → a
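As a minimal sketch, a deterministic policy over a small discrete state space can be stored as a lookup table and wrapped as a function, so that calling pi(s) returns exactly one action. The state names ("start", "middle", "near_goal"), the action names, and the helper make_deterministic_policy are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict

# Hypothetical toy setting: states and actions are plain strings.
State = str
Action = str


def make_deterministic_policy(table: Dict[State, Action]) -> Callable[[State], Action]:
    """Return pi such that pi(s) -> a, looked up from a fixed table (hypothetical helper)."""
    def pi(s: State) -> Action:
        # Exactly one action per state: no randomness involved.
        return table[s]
    return pi


# Hypothetical grid-world states and actions, for illustration only.
pi = make_deterministic_policy({"start": "right", "middle": "right", "near_goal": "up"})
print(pi("start"))  # right
```

Calling pi twice on the same state always yields the same action, which is what distinguishes this from the probabilistic form of a policy.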
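More generally, a stochastic policy π is a probability distribution over actions given states: π(a | s) = P(A_t = a | S_t = s), the probability of taking action a when in state s.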
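A minimal sketch of a stochastic policy, assuming finite state and action sets: each state maps to a table of action probabilities that sums to 1, pi(a | s) is a lookup, and acting means sampling from that distribution. The names PolicyTable, action_prob, sample_action and the example states and probabilities are hypothetical, not taken from any particular library.

```python
import random
from typing import Dict, Optional

# Hypothetical toy setting: states and actions are plain strings.
State = str
Action = str

# pi(a | s): for each state, a probability distribution over actions.
PolicyTable = Dict[State, Dict[Action, float]]


def action_prob(policy: PolicyTable, s: State, a: Action) -> float:
    """Return pi(a | s); actions not listed for s get probability 0."""
    return policy[s].get(a, 0.0)


def sample_action(policy: PolicyTable, s: State, rng: Optional[random.Random] = None) -> Action:
    """Draw one action a ~ pi(. | s) using the listed probabilities as weights."""
    rng = rng or random.Random()
    actions = list(policy[s].keys())
    weights = list(policy[s].values())
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical example policy; each row must sum to 1.
policy: PolicyTable = {
    "start": {"left": 0.1, "right": 0.9},
    "near_goal": {"up": 1.0},
}

# Sanity check: every pi(. | s) is a valid distribution.
for s, dist in policy.items():
    assert abs(sum(dist.values()) - 1.0) < 1e-9, s

rng = random.Random(0)
print(action_prob(policy, "start", "right"))  # 0.9
print(sample_action(policy, "start", rng))
```

A deterministic policy is the special case in which each row puts all of its probability on a single action, as the "near_goal" row does here.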