github RL: DP
These are my notes on the RL exercises on GitHub:
https://github.com/dennybritz/reinforcement-learning/tree/master/DP
Implement Policy Evaluation in Python (Gridworld)
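First, look at how env.P is structured in the OpenAI env:
env: OpenAI env. env.P represents the transition probabilities of the environment.
    env.P[s][a] is a list of transition tuples (prob, next_state, reward, done).
    env.nS is the number of states in the environment.
    env.nA is the number of actions in the environment.
Recall the iterative update for policy evaluation: each sweep replaces v(s) with the expected immediate reward plus the discounted value of the successor state, averaged over the policy's action probabilities and the environment's transition probabilities.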
The computation can be carried out with vectors.
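For reference, the update can be written out as follows. The source does not show the formula itself, so the notation here is an assumption (standard textbook symbols: $\pi$ for the policy, $p$ for the transition probabilities in env.P, $\gamma$ for discount_factor, $k$ for the sweep index). The first line is the per-state form; the rest stack it into the vector form that the names R_pi and P_pi in the code correspond to.

```latex
\begin{aligned}
v_{k+1}(s) &= \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,\bigl[\, r(s, a, s') + \gamma\, v_k(s') \,\bigr] \\[4pt]
R_\pi(s) &= \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\, r(s, a, s') \\
P_\pi(s, s') &= \sum_{a} \pi(a \mid s)\, p(s' \mid s, a) \\[4pt]
v_{k+1} &= R_\pi + \gamma\, P_\pi\, v_k
\end{aligned}
```

Iteration stops once every component of $v_{k+1} - v_k$ has magnitude at most theta.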
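import numpy as np

# policy, env, discount_factor and theta are the arguments of the enclosing function.
R_pi = np.zeros(shape=(env.nS))          # expected one-step reward under the policy
P_pi = np.zeros(shape=(env.nS, env.nS))  # state-to-state transition matrix under the policy
v_pi = np.zeros(shape=(env.nS))
for s, s_item in env.P.items():
    for a, a_item in s_item.items():
        for dis in a_item:
            prob, next_state, reward, _ = dis
            # Weight the reward by prob so stochastic transitions are handled too
            R_pi[s] += policy[s, a] * prob * reward
            P_pi[s, next_state] += policy[s, a] * prob
v_change = np.ones(shape=(env.nS))
while (np.abs(v_change) > theta).any():
    v_change = R_pi + discount_factor * np.dot(P_pi, v_pi) - v_pi
    v_pi += v_change

First, env.P is unrolled to compute R and P; the update is then iterated until convergence.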
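To check the vectorized approach without installing the Gridworld environment, here is a minimal, self-contained sketch. The names `build_model`, `policy_eval_vectorized`, and the three-state `toy_P` dictionary are hypothetical and not part of the exercise repository; `toy_P` only mimics the env.P layout of (prob, next_state, reward, done) tuples. With a discount factor below 1 the fixed point can also be obtained by solving the linear system (I - gamma * P_pi) v = R_pi, which gives an independent result to compare against.

```python
import numpy as np


def build_model(P, n_states, policy):
    """Collapse an env.P-style dict into R_pi (vector) and P_pi (matrix)."""
    R_pi = np.zeros(n_states)
    P_pi = np.zeros((n_states, n_states))
    for s, actions in P.items():
        for a, transitions in actions.items():
            for prob, next_state, reward, _done in transitions:
                R_pi[s] += policy[s, a] * prob * reward
                P_pi[s, next_state] += policy[s, a] * prob
    return R_pi, P_pi


def policy_eval_vectorized(P, n_states, policy, discount_factor=0.9, theta=1e-10):
    """Iterate v <- R_pi + gamma * P_pi @ v until the largest change is below theta."""
    R_pi, P_pi = build_model(P, n_states, policy)
    v = np.zeros(n_states)
    while True:
        v_new = R_pi + discount_factor * (P_pi @ v)
        if np.max(np.abs(v_new - v)) < theta:
            return v_new
        v = v_new


# Hypothetical 3-state chain (an assumption for illustration only).
# State 2 is terminal: it loops to itself with reward 0.
# Action 0 moves left, action 1 moves right; moving right from state 1
# succeeds with probability 0.8 and otherwise stays put.
toy_P = {
    0: {0: [(1.0, 0, -1.0, False)], 1: [(1.0, 1, -1.0, False)]},
    1: {0: [(1.0, 0, -1.0, False)],
        1: [(0.8, 2, -1.0, True), (0.2, 1, -1.0, False)]},
    2: {0: [(1.0, 2, 0.0, True)], 1: [(1.0, 2, 0.0, True)]},
}
n_states, n_actions = 3, 2
uniform_policy = np.ones((n_states, n_actions)) / n_actions
gamma = 0.9

v_iter = policy_eval_vectorized(toy_P, n_states, uniform_policy, gamma)

# Closed-form check: solve (I - gamma * P_pi) v = R_pi directly.
R_pi, P_pi = build_model(toy_P, n_states, uniform_policy)
v_exact = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)

print("iterative:", v_iter)
print("direct   :", v_exact)
```

The direct solve only applies when the matrix I - gamma * P_pi is invertible, which holds for any discount factor strictly below 1; with a discount factor of exactly 1 and absorbing terminal states, the iterative form starting from zeros is the safer choice.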
Posted on 2018-07-31 12:47 by pine73. Reposted from: https://www.cnblogs.com/esoteric/p/9395261.html