Running Code
The code uses the Hydra framework for composing configs and running experiments. Say we want to train multi-task SAC on MT10 (from MetaWorld). The command looks like:
```shell
PYTHONPATH=. python3 -u main.py \
setup=metaworld \
agent=state_sac \
env=metaworld-mt10 \
agent.multitask.num_envs=10 \
agent.multitask.should_use_disentangled_alpha=True
```
Let us break this command down piece by piece.
`setup=metaworld`
says that we want to use a specific setup called `metaworld`. In multi-task RL, different works/environments care about different setups. For example, multi-task RL environments commonly use episodic reward as the metric to optimize, while MetaWorld [YQH+20] uses `success` as the key metric. Some multi-task RL setups evaluate the agent on the same set of environments that it was trained on, while HiPBMDP [ZSKP20] evaluates on three sets of unseen environments. We abstract away these details via the `setup` parameter. Supported values are listed here. When a setup is selected, the corresponding metrics config is also loaded. By default, we also load optimizer and agent components based on the `setup` value, but this can easily be overridden (as described in the next step). We can easily add a new setup by defining a new config or updating the existing configs. For example, to add the `hipbmdp` setup, we added a metrics config and new optimizer configs, assuming those values should differ for the new setup. We did not change the agent configs, as the agent implementation does not have to change with the setup, though the user is free to update the agent configs as well.

`agent=state_sac`
says that we want to train SAC using state observations. Other supported agents are listed as top-level yaml files here. Update the config files to change the agent's hyper-parameters, or add a new config file to support a new agent.
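For illustration, a new setup like `hipbmdp` could be registered with a small config that points at its own metrics and optimizer configs. The file path and keys below are hypothetical, a rough sketch of a Hydra defaults list, not the repository's actual schema:

```yaml
# config/setup/hipbmdp.yaml -- hypothetical layout, for illustration only
defaults:
  - /metrics: hipbmdp      # metrics config specific to this setup
  - /optimizers: hipbmdp   # optimizer configs assumed to differ for this setup
name: hipbmdp
```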
Note that we use the `setup` value in the names of the component configs and optimizer configs. This is completely optional; the same effect can be achieved via command-line overrides. We opt for multiple configs to reduce the overhead of remembering which values to override when running the code.
env=metaworld-mt10
says that we want to train on the MT10 environment from MetaWorld. Other supported environments are listed here. New environments can be added by creating a new config file in that directory.
agent.multitask.num_envs=10
sets the number of tasks to 10.

`agent.multitask.should_use_disentangled_alpha=True`
says that we want to learn a separate entropy coefficient (alpha) for each task.
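The dotted flags above follow Hydra's nested-key override convention: `agent.multitask.num_envs=10` updates the `num_envs` leaf inside the `multitask` node of the `agent` config. As a rough sketch of that semantics in plain Python (not MTRL's or Hydra's actual code):

```python
def apply_overrides(config, overrides):
    """Apply dotted "a.b.c=value" overrides to a nested dict in place."""
    for override in overrides:
        dotted_key, _, raw = override.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = config
        for key in parents:
            node = node.setdefault(key, {})
        # Crude literal parsing: booleans and non-negative ints,
        # everything else stays a string.
        if raw in ("True", "False"):
            value = raw == "True"
        elif raw.isdigit():
            value = int(raw)
        else:
            value = raw
        node[leaf] = value
    return config

config = {"agent": {"multitask": {"num_envs": 1}}}
apply_overrides(config, [
    "agent.multitask.num_envs=10",
    "agent.multitask.should_use_disentangled_alpha=True",
])
print(config["agent"]["multitask"])
# → {'num_envs': 10, 'should_use_disentangled_alpha': True}
```

The real parsing (type conversion, list syntax, interpolation) is done by Hydra/OmegaConf; this sketch only illustrates how a dotted key walks the nested config.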
We can update the previous command to train a multi-task, multi-headed SAC agent by adding the argument `agent.multitask.should_use_multi_head_policy=True`, as follows:
```shell
PYTHONPATH=. python3 -u main.py \
setup=metaworld \
agent=state_sac \
env=metaworld-mt10 \
agent.multitask.num_envs=10 \
agent.multitask.should_use_disentangled_alpha=True \
agent.multitask.should_use_multi_head_policy=True
```
We can control more aspects of training (like the seed, number of training steps, batch size, etc.) by adding further arguments to the previous command, as follows:
```shell
PYTHONPATH=. python3 -u main.py \
setup=metaworld \
env=metaworld-mt10 \
agent=state_sac \
agent.multitask.num_envs=10 \
agent.multitask.should_use_disentangled_alpha=True \
agent.multitask.should_use_multi_head_policy=True \
experiment.num_train_steps=2000000 \
setup.seed=1 \
replay_buffer.batch_size=1280
```
`experiment.num_train_steps=2000000`
says that the agent should be trained for 2 million steps.

`setup.seed=1`
sets the seed to 1.

`replay_buffer.batch_size=1280`
says that batches sampled from the replay buffer will contain 1280 transitions.
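Since the seed is just another override, running several seeds is a simple shell loop. The sketch below is a dry run that only prints one command per seed; drop the `echo "$cmd"` in favor of running `$cmd` directly to actually launch the runs:

```shell
# Dry run: print one training command per seed instead of launching it.
for seed in 1 2 3; do
  cmd="PYTHONPATH=. python3 -u main.py setup=metaworld agent=state_sac env=metaworld-mt10 agent.multitask.num_envs=10 setup.seed=$seed"
  echo "$cmd"
done
```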