📜  Policy in reinforcement learning - Whatever code example

📅  Last modified: 2022-03-11 14:57:57.843000             🧑  Author: Mango

Code example 2
A deterministic policy π is a function that takes a state s as input and returns an action a.
That is: π(s) → a
More generally, a stochastic policy π is a probability distribution over actions given a state, written π(a | s); an action is then sampled from that distribution.
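
The two definitions above can be illustrated with a minimal Python sketch. The state names ("s0", "s1"), the action names, the probabilities, and the function names are all hypothetical, chosen only for illustration; they do not come from the original snippet.

```python
import random

# Hypothetical toy problem: states and actions are plain strings.
ACTIONS = ["left", "right"]

# Deterministic policy pi(s) -> a, stored as a lookup table.
ACTION_TABLE = {"s0": "right", "s1": "left"}

def deterministic_policy(state):
    """Return the single action this policy assigns to the given state."""
    return ACTION_TABLE[state]

# Stochastic policy pi(a | s): one probability per action for each state.
# Each row is ordered like ACTIONS and sums to 1.
ACTION_PROBS = {
    "s0": [0.2, 0.8],
    "s1": [0.9, 0.1],
}

def stochastic_policy(state, rng=random):
    """Sample an action from the distribution pi(. | state)."""
    return rng.choices(ACTIONS, weights=ACTION_PROBS[state], k=1)[0]
```

Calling deterministic_policy("s0") always gives the same action, while repeated calls to stochastic_policy("s0") can return different actions, with "right" being the more likely one under the assumed probabilities.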