1 Answer
I think your problem is that when the simulation terminates and the scene needs to be reset, the agent no longer appears in a decision_step but in a terminal_step instead. This happens because the agent has dropped the ball, and the reward returned in the terminal_step will be -1.0. I took your code and made a few changes, and it now runs fine (although you will probably want to change it further so that the environment is not reset every time a single agent drops its ball).
import numpy as np
import mlagents
from mlagents_envs.environment import UnityEnvironment

# -----------------
# This code is used to close an env that might not have been closed before
try:
    env.close()
except:
    pass
# -----------------

env = UnityEnvironment(file_name=None)
env.reset()

for i in range(1000):
    behavior_names = env.behavior_specs
    # Go through all existing behaviors
    for behavior_name in behavior_names:
        decision_steps, terminal_steps = env.get_steps(behavior_name)

        for agent_id_terminated in terminal_steps:
            print("Agent " + behavior_name + " has terminated, resetting environment.")
            # This is probably not the desired behaviour, as the other agents are still active.
            env.reset()

        actions = []
        for agent_id_decisions in decision_steps:
            actions.append(np.random.uniform(-1, 1, 2))
            # print(decision_steps[0].reward)
            # print(decision_steps[0].action_mask)

        if len(actions) > 0:
            env.set_actions(behavior_name, np.array(actions))

    try:
        env.step()
    except:
        print("Something happened when taking a step in the environment.")
        print("The communicator has probably terminated, stopping simulation early.")
        break

env.close()
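Note that newer mlagents_envs releases expect an ActionTuple in set_actions rather than a raw NumPy array, and you may prefer to read the terminal reward instead of resetting the whole environment. Below is a rough sketch of that variant; the ActionTuple usage comes from the newer low-level API, the continuous action size of 2 is taken from your snippet, and exact version behaviour may differ:

import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple  # newer low-level API

env = UnityEnvironment(file_name=None)
env.reset()

for i in range(1000):
    for behavior_name in env.behavior_specs:
        decision_steps, terminal_steps = env.get_steps(behavior_name)

        # Inspect the terminal reward instead of resetting right away.
        for agent_id in terminal_steps:
            step = terminal_steps[agent_id]
            print(f"Agent {agent_id} terminated with reward {step.reward}")

        # Random continuous actions, one row per agent still requesting a decision.
        if len(decision_steps) > 0:
            actions = np.random.uniform(-1, 1, size=(len(decision_steps), 2))
            env.set_actions(behavior_name, ActionTuple(continuous=actions))

    env.step()

env.close()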