mtrl.agent.components package
Submodules
mtrl.agent.components.actor module
Actor component for the agent.
-
class mtrl.agent.components.actor.Actor(env_obs_shape: List[int], action_shape: List[int], hidden_dim: int, num_layers: int, log_std_bounds: Tuple[float, float], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.actor.BaseActor
Actor component for the agent.
- Parameters
env_obs_shape (List[int]) – shape of the environment observation that the actor gets.
action_shape (List[int]) – shape of the action vector that the actor produces.
hidden_dim (int) – hidden dimensionality of the actor.
num_layers (int) – number of layers in the actor.
log_std_bounds (Tuple[float, float]) – bounds to clip log of standard deviation.
encoder_cfg (ConfigType) – config for the encoder.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
-
encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
encoding of the observation.
- Return type
TensorType
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
Compute the predictions from the actor.
- Parameters
mtobs (MTObs) – multi-task observation.
detach_encoder (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
tuple of (mean of the Gaussian, sample from the Gaussian, log-probability of the sample, log of the standard deviation of the Gaussian).
- Return type
Tuple[TensorType, TensorType, TensorType, TensorType]
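The four values returned by forward match the usual outputs of a Soft Actor-Critic (SAC) style policy. The following is a minimal sketch, assuming a tanh-squashed diagonal Gaussian and assuming that log_std_bounds is enforced by tanh rescaling rather than hard clipping; squashed_gaussian_sample is a hypothetical helper, not part of mtrl.

```python
import math
from typing import Tuple

import torch


def squashed_gaussian_sample(
    mu: torch.Tensor,
    log_std: torch.Tensor,
    log_std_bounds: Tuple[float, float],
) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]:
    """Hypothetical helper returning (mean, sample, log_prob, log_std)."""
    log_std_min, log_std_max = log_std_bounds
    # Assumption: log_std is bounded by tanh rescaling into [log_std_min, log_std_max].
    log_std = torch.tanh(log_std)
    log_std = log_std_min + 0.5 * (log_std_max - log_std_min) * (log_std + 1)
    std = log_std.exp()

    # Reparameterized sample from N(mu, std^2).
    noise = torch.randn_like(mu)
    pre_tanh = mu + noise * std

    # Gaussian log-density of the pre-squash sample, summed over action dimensions.
    log_prob = (-0.5 * noise.pow(2) - log_std - 0.5 * math.log(2 * math.pi)).sum(
        dim=-1, keepdim=True
    )

    # Squash into [-1, 1] and apply the tanh change-of-variables correction.
    mean = torch.tanh(mu)
    sample = torch.tanh(pre_tanh)
    log_prob = log_prob - torch.log(1 - sample.pow(2) + 1e-6).sum(dim=-1, keepdim=True)
    return mean, sample, log_prob, log_std


torch.manual_seed(0)
mu, raw_log_std = torch.zeros(4, 3), torch.randn(4, 3)
mean, sample, log_prob, log_std = squashed_gaussian_sample(mu, raw_log_std, (-10.0, 2.0))
```

The tanh rescaling keeps log_std inside the bounds while staying differentiable everywhere, which a hard clamp would not.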
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
make_model(action_shape: List[int], hidden_dim: int, num_layers: int, encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig) → torch.nn.modules.module.Module
Make the model for the actor.
- Parameters
action_shape (List[int]) – shape of the action vector that the actor produces.
hidden_dim (int) – hidden dimensionality of the actor.
num_layers (int) – number of layers in the actor.
encoder_cfg (ConfigType) – config for the encoder.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
- Returns
model for the actor.
- Return type
ModelType
-
training: bool
-
class mtrl.agent.components.actor.BaseActor(env_obs_shape: List[int], action_shape: List[int], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig, *args, **kwargs)
Bases: mtrl.agent.components.base.Component
Interface for the actor component for the agent.
- Parameters
env_obs_shape (List[int]) – shape of the environment observation that the actor gets.
action_shape (List[int]) – shape of the action vector that the actor produces.
encoder_cfg (ConfigType) – config for the encoder.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
-
encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Raises
NotImplementedError – if a subclass does not implement this method.
- Returns
encoding of the observation.
- Return type
TensorType
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor]
Compute the predictions from the actor.
- Parameters
mtobs (MTObs) – multi-task observation.
detach_encoder (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Raises
NotImplementedError – if a subclass does not implement this method.
- Returns
tuple of (mean of the Gaussian, sample from the Gaussian, log-probability of the sample, log of the standard deviation of the Gaussian).
- Return type
Tuple[TensorType, TensorType, TensorType, TensorType]
-
training: bool
mtrl.agent.components.base module
Interface for the agent components.
-
class mtrl.agent.components.base.Component
Bases: torch.nn.modules.module.Module
Basic component (for building the agent) that every other component should extend.
It inherits torch.nn.Module.
Initializes internal Module state, shared by both nn.Module and ScriptModule.
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
training: bool
mtrl.agent.components.critic module
Critic component for the agent.
-
class mtrl.agent.components.critic.Critic(env_obs_shape: List[int], action_shape: List[int], hidden_dim: int, num_layers: int, encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.base.Component
Critic component for the agent.
- Parameters
env_obs_shape (List[int]) – shape of the environment observation that the critic gets.
action_shape (List[int]) – shape of the action vector that the critic takes as input.
hidden_dim (int) – hidden dimensionality of the critic.
num_layers (int) – number of layers in the critic.
encoder_cfg (ConfigType) – config for the encoder.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
-
encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
encoding of the observation.
- Return type
TensorType
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, action: torch.Tensor, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor]
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
training: bool
-
class mtrl.agent.components.critic.QFunction(obs_dim: int, action_dim: int, hidden_dim: int, num_layers: int, multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.base.Component
Q-function implemented as an MLP.
- Parameters
obs_dim (int) – size of the observation.
action_dim (int) – size of the action vector.
hidden_dim (int) – size of the hidden layer of the model.
num_layers (int) – number of layers in the model.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
-
build_model(obs_dim: int, action_dim: int, hidden_dim: int, num_layers: int, multitask_cfg: omegaconf.dictconfig.DictConfig) → torch.nn.modules.module.Module
Build the Q-function.
- Parameters
obs_dim (int) – size of the observation.
action_dim (int) – size of the action vector.
hidden_dim (int) – size of the hidden layer of the trunk.
num_layers (int) – number of layers in the model.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
- Returns
model for the Q-function.
- Return type
ModelType
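QFunction is described as an MLP over the observation and the action. A minimal sketch of what build_model could produce, assuming the observation and action are concatenated and the network ends in a single Q-value; build_q_mlp and the meaning given to num_layers are assumptions, and the sketch ignores multitask_cfg entirely.

```python
import torch
from torch import nn


def build_q_mlp(obs_dim: int, action_dim: int, hidden_dim: int, num_layers: int) -> nn.Module:
    """Hypothetical builder: MLP from concat(observation, action) to one Q-value."""
    # Assumption: num_layers is the number of hidden Linear+ReLU blocks (>= 1).
    layers = [nn.Linear(obs_dim + action_dim, hidden_dim), nn.ReLU()]
    for _ in range(num_layers - 1):
        layers += [nn.Linear(hidden_dim, hidden_dim), nn.ReLU()]
    layers.append(nn.Linear(hidden_dim, 1))  # scalar Q-value head
    return nn.Sequential(*layers)


q_net = build_q_mlp(obs_dim=10, action_dim=4, hidden_dim=32, num_layers=2)
obs, action = torch.randn(8, 10), torch.randn(8, 4)
q_value = q_net(torch.cat([obs, action], dim=-1))  # shape: (8, 1)
```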
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
training: bool
mtrl.agent.components.decoder module
Decoder component for the agent.
-
class mtrl.agent.components.decoder.PixelDecoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int = 2, num_filters: int = 32)
Bases: mtrl.agent.components.base.Component
Convolutional decoder for pixel observations.
- Parameters
env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int, optional) – number of layers. Defaults to 2.
num_filters (int, optional) – number of conv filters per layer. Defaults to 32.
-
forward(h: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
training: bool
mtrl.agent.components.encoder module
Encoder component for the agent.
-
class mtrl.agent.components.encoder.Encoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, *args, **kwargs)
Bases: mtrl.agent.components.base.Component
Interface for the encoder component of the agent.
- Parameters
env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
-
copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder) → None
Copy convolutional weights from the source encoder.
The base implementation is a no-op; only encoders that use convolutional layers need to override it.
- Parameters
source (Encoder) – encoder to copy weights from.
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Raises
NotImplementedError – if a subclass does not implement this method.
- Returns
encoding of the observation.
- Return type
TensorType
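The effect of the detach flag can be shown with a toy encoder. This sketch takes a plain observation tensor instead of an MTObs, and TinyStateEncoder is a hypothetical name. A common use in SAC-style agents is to let only the critic loss update a shared encoder while the actor reads a detached encoding.

```python
import torch
from torch import nn


class TinyStateEncoder(nn.Module):
    """Hypothetical state encoder illustrating the `detach` flag."""

    def __init__(self, obs_dim: int, feature_dim: int):
        super().__init__()
        self.trunk = nn.Linear(obs_dim, feature_dim)

    def forward(self, env_obs: torch.Tensor, detach: bool = False) -> torch.Tensor:
        h = torch.relu(self.trunk(env_obs))
        # Detaching cuts the graph here, so losses computed on the returned
        # encoding cannot send gradients back into self.trunk.
        return h.detach() if detach else h


encoder = TinyStateEncoder(obs_dim=5, feature_dim=3)
obs = torch.randn(2, 5)
attached = encoder(obs)
detached = encoder(obs, detach=True)
```

Both calls return the same values; only the attached result participates in backpropagation.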
-
training: bool
-
class mtrl.agent.components.encoder.FeedForwardEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int, hidden_dim: int, should_tie_encoders: bool)
Bases: mtrl.agent.components.encoder.Encoder
Feedforward encoder for state observations.
- Parameters
env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int) – number of layers.
hidden_dim (int) – dimensionality of the hidden layers.
should_tie_encoders (bool) – should the feed-forward layers be tied.
-
copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder)
Copy convolutional weights from the source encoder.
The base implementation is a no-op; only encoders that use convolutional layers need to override it.
- Parameters
source (Encoder) – encoder to copy weights from.
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
encoding of the observation.
- Return type
TensorType
-
training: bool
-
class mtrl.agent.components.encoder.FiLM(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int, hidden_dim: int, should_tie_encoders: bool)
Bases: mtrl.agent.components.encoder.FeedForwardEncoder
Feedforward encoder for state observations, with FiLM (feature-wise linear modulation) layers.
- Parameters
env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int) – number of layers.
hidden_dim (int) – dimensionality of the hidden layers.
should_tie_encoders (bool) – should the feed-forward layers be tied.
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
encoding of the observation.
- Return type
TensorType
-
training: bool
-
class mtrl.agent.components.encoder.IdentityEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int)
Bases: mtrl.agent.components.encoder.Encoder
Identity encoder that does not perform any operations.
- Parameters
env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
encoding of the observation.
- Return type
TensorType
-
training: bool
-
class mtrl.agent.components.encoder.MixtureofExpertsEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, encoder_cfg: omegaconf.dictconfig.DictConfig, task_id_to_encoder_id_cfg: omegaconf.dictconfig.DictConfig, num_experts: int)
Bases: mtrl.agent.components.encoder.Encoder
Mixture of Experts based encoder.
- Parameters
env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
encoder_cfg (ConfigType) – config for the experts in the mixture.
task_id_to_encoder_id_cfg (ConfigType) – mapping between the tasks and the encoders.
num_experts (int) – number of experts.
-
copy_conv_weights_from(source)
Copy convolutional weights from the source encoder.
The base implementation is a no-op; only encoders that use convolutional layers need to override it.
- Parameters
source (Encoder) – encoder to copy weights from.
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
encoding of the observation.
- Return type
TensorType
-
training: bool
-
class mtrl.agent.components.encoder.PixelEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int = 2, num_filters: int = 32)
Bases: mtrl.agent.components.encoder.Encoder
Convolutional encoder for pixel observations.
- Parameters
env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int, optional) – number of layers. Defaults to 2.
num_filters (int, optional) – number of conv filters per layer. Defaults to 32.
-
copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder)
Copy convolutional weights from the source encoder.
The base implementation is a no-op; only encoders that use convolutional layers need to override it.
- Parameters
source (Encoder) – encoder to copy weights from.
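The method name says "copy", but a common pattern in SAC-AE-style agents is to tie the parameters so that the actor's encoder reuses the critic's convolutional weights. This sketch assumes tying; TinyConvEncoder and its layer sizes are hypothetical, and mtrl's PixelEncoder may differ in details.

```python
import torch
from torch import nn


class TinyConvEncoder(nn.Module):
    """Hypothetical pixel encoder with a small stack of conv layers."""

    def __init__(self, in_channels: int = 3, num_filters: int = 8, num_layers: int = 2):
        super().__init__()
        self.convs = nn.ModuleList([nn.Conv2d(in_channels, num_filters, 3, stride=2)])
        for _ in range(num_layers - 1):
            self.convs.append(nn.Conv2d(num_filters, num_filters, 3, stride=1))

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        h = obs
        for conv in self.convs:
            h = torch.relu(conv(h))
        return h.flatten(start_dim=1)

    def copy_conv_weights_from(self, source: "TinyConvEncoder") -> None:
        # Tie rather than copy: afterwards both encoders hold the same Parameter
        # objects, so an optimizer step through either one updates both.
        for own, src in zip(self.convs, source.convs):
            own.weight = src.weight
            own.bias = src.bias


critic_encoder = TinyConvEncoder()
actor_encoder = TinyConvEncoder()
actor_encoder.copy_conv_weights_from(critic_encoder)
```

A plain copy (`own.weight.data.copy_(src.weight.data)`) would instead produce two independent sets of weights that merely start out equal.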
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)
Encode the input observation.
- Parameters
mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.
- Returns
encoding of the observation.
- Return type
TensorType
-
forward_conv(env_obs: torch.Tensor) → torch.Tensor
Encode the environment observation using the convolutional layers.
- Parameters
env_obs (TensorType) – observation from the environment.
- Returns
encoding of the observation.
- Return type
TensorType
-
reparameterize(mu: torch.Tensor, logstd: torch.Tensor) → torch.Tensor
Sample from the Gaussian using the reparameterization trick.
- Parameters
mu (TensorType) – mean of the gaussian.
logstd (TensorType) – log of standard deviation of the gaussian.
- Returns
sample from the gaussian.
- Return type
TensorType
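The reparameterization trick writes a Gaussian sample as a deterministic function of the parameters plus independent noise, z = mu + exp(logstd) * eps with eps ~ N(0, I), so the sample is differentiable with respect to mu and logstd. A minimal standalone version (an illustration, not the mtrl method itself):

```python
import torch


def reparameterize(mu: torch.Tensor, logstd: torch.Tensor) -> torch.Tensor:
    """Draw z ~ N(mu, exp(logstd)^2) as a differentiable function of mu and logstd."""
    std = torch.exp(logstd)
    eps = torch.randn_like(std)  # noise independent of the parameters
    return mu + eps * std


mu = torch.zeros(3, requires_grad=True)
logstd = torch.zeros(3, requires_grad=True)
z = reparameterize(mu, logstd)
z.sum().backward()  # gradients reach both mu and logstd through the sample
```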
-
training: bool
mtrl.agent.components.hipbmdp_theta module
Implementation of the theta component described in “Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP” Link: https://arxiv.org/abs/2007.07206
-
class mtrl.agent.components.hipbmdp_theta.ThetaModel(dim: int, output_dim: int, num_envs: int, train_env_id: List[str])
Bases: mtrl.agent.components.base.Component
Implementation of the theta component described in “Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP” Link: https://arxiv.org/abs/2007.07206
- Parameters
dim (int) – input dimension.
output_dim (int) – output dimension.
num_envs (int) – number of environments.
train_env_id (List[str]) – indices of the environments corresponding to training tasks. Some strategies (for sampling theta) need this information.
-
forward(env_index: torch.Tensor, theta_sampling_strategy: str, modes: List[str]) → torch.Tensor
Sample theta.
The following strategies are supported:
- embedding - use an embedding layer and index into it using the task index. This is the default strategy and is used during training and when testing on in-distribution environments.
- zero - set theta to a tensor of zeros.
- mean - use an embedding layer and set theta to the mean of all the embeddings.
- mean_train - use an embedding layer and set theta to the mean of all the embeddings that were trained.
- Parameters
env_index (TensorType) – index of the environment (task) for each input.
theta_sampling_strategy (str) – strategy to sample theta.
modes (List[str]) – List of train/eval/… modes.
- Returns
sampled theta.
- Return type
TensorType
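The four strategies can be sketched over a plain nn.Embedding table. This is an illustration under assumptions, not ThetaModel itself: sample_theta and Strategy are hypothetical (Strategy mirrors the values of ThetaSamplingStrategy), env_index is taken to be a 1-D index tensor, the training environments are passed as integer indices rather than the string ids that train_env_id holds, and the modes argument is ignored.

```python
from enum import Enum
from typing import List

import torch
from torch import nn


class Strategy(Enum):
    EMBEDDING = "embedding"
    ZERO = "zero"
    MEAN = "mean"
    MEAN_TRAIN = "mean_train"


def sample_theta(
    embedding: nn.Embedding,
    env_index: torch.Tensor,
    strategy: str,
    train_env_index: List[int],
) -> torch.Tensor:
    """Hypothetical helper; env_index is assumed to be a 1-D LongTensor."""
    kind = Strategy(strategy)  # raises ValueError for unknown strategy names
    batch_size = env_index.shape[0]
    if kind is Strategy.EMBEDDING:
        # One learned vector per environment index.
        return embedding(env_index)
    if kind is Strategy.ZERO:
        return embedding.weight.new_zeros(batch_size, embedding.embedding_dim)
    if kind is Strategy.MEAN:
        # Average over every row, including environments never trained on.
        theta = embedding.weight.mean(dim=0)
    else:
        # Average only over rows that belong to training environments.
        theta = embedding.weight[train_env_index].mean(dim=0)
    return theta.unsqueeze(0).expand(batch_size, -1)


emb = nn.Embedding(num_embeddings=4, embedding_dim=2)
theta = sample_theta(emb, torch.tensor([0, 3]), "mean_train", train_env_index=[0, 1, 2])  # shape: (2, 2)
```

The mean strategies give every input the same theta, which is what makes them usable for environments whose own embedding row was never trained.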
-
training: bool
-
class mtrl.agent.components.hipbmdp_theta.ThetaSamplingStrategy(value)
Bases: enum.Enum
Different strategies for sampling theta values.
- embedding - use an embedding layer and index into it using the task index.
- zero - set theta to a tensor of zeros.
- mean - use an embedding layer and set theta to the mean of all the embeddings.
- mean_train - use an embedding layer and set theta to the mean of all the embeddings that were trained.
-
EMBEDDING = 'embedding'
-
MEAN = 'mean'
-
MEAN_TRAIN = 'mean_train'
-
ZERO = 'zero'
mtrl.agent.components.moe_layer module
Layers for parallelizing computation with mixture of experts.
A mixture of experts (models) can easily be simulated by maintaining a list of models and iterating over them. However, this can be slow in practice. We provide some additional modules that make it easier to create a mixture of experts without slowing down training or inference.
-
class mtrl.agent.components.moe_layer.AttentionBasedExperts(num_tasks: int, num_experts: int, embedding_dim: int, hidden_dim: int, num_layers: int, temperature: bool, should_use_soft_attention: bool, task_encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig, topk: Optional[int] = None)
Bases: mtrl.agent.components.moe_layer.MixtureOfExperts
Class for interfacing with a mixture of experts.
- Parameters
multitask_cfg (ConfigType) – config for multitask training.
-
forward(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training: bool
-
class mtrl.agent.components.moe_layer.ClusterOfExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig, env_name: str, task_description: Dict[str, str], ordered_task_list: List[str], mapping_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.moe_layer.MixtureOfExperts
Map the ith task to a subset (cluster) of experts.
- Parameters
num_tasks (int) – number of tasks.
num_experts (int) – number of experts in the mixture of experts.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
multitask_cfg (ConfigType) – config for multitask training.
env_name (str) – name of the environment. This is used with the mapping configuration.
task_description (Dict[str, str]) – dictionary mapping task names to descriptions.
ordered_task_list (List[str]) – ordered list of tasks. This is needed because the task description is not always ordered.
mapping_cfg (ConfigType) – config for mapping the tasks to subset of experts.
-
training: bool
-
class mtrl.agent.components.moe_layer.EnsembleOfExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.moe_layer.MixtureOfExperts
Ensemble of all the experts.
- Parameters
num_tasks (int) – number of tasks.
num_experts (int) – number of experts in the mixture of experts.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
multitask_cfg (ConfigType) – config for multitask training.
-
training: bool
-
class mtrl.agent.components.moe_layer.FeedForward(num_experts: int, in_features: int, out_features: int, num_layers: int, hidden_features: int, bias: bool = True)
Bases: torch.nn.modules.module.Module
A feedforward model built from mixture-of-experts layers.
- Parameters
num_experts (int) – number of experts in the mixture.
in_features (int) – size of each input sample for one expert.
out_features (int) – size of each output sample for one expert.
num_layers (int) – number of layers in the feedforward network.
hidden_features (int) – dimensionality of hidden layer in the feedforward network.
bias (bool, optional) – if set to False, the layer will not learn an additive bias. Defaults to True.
-
forward(x: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training: bool
-
class mtrl.agent.components.moe_layer.Linear(num_experts: int, in_features: int, out_features: int, bias: bool = True)
Bases: torch.nn.modules.module.Module
torch.nn.Linear layer extended for use as a mixture of experts.
- Parameters
num_experts (int) – number of experts in the mixture.
in_features (int) – size of each input sample for one expert.
out_features (int) – size of each output sample for one expert.
bias (bool, optional) – if set to False, the layer will not learn an additive bias. Defaults to True.
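The speed-up described at the start of this module comes from storing every expert's weights in one tensor and replacing the Python loop over experts with a single batched matrix multiply. A minimal sketch, assuming inputs are laid out as (num_experts, batch, in_features); BatchedExpertsLinear is a hypothetical name, and the scaled-normal initialization is an arbitrary choice rather than what mtrl uses.

```python
import torch
from torch import nn


class BatchedExpertsLinear(nn.Module):
    """Hypothetical sketch: num_experts independent linear layers run as one bmm."""

    def __init__(self, num_experts: int, in_features: int, out_features: int, bias: bool = True):
        super().__init__()
        # One (in_features, out_features) weight matrix per expert, stacked.
        self.weight = nn.Parameter(
            torch.randn(num_experts, in_features, out_features) / in_features ** 0.5
        )
        if bias:
            self.bias = nn.Parameter(torch.zeros(num_experts, 1, out_features))
        else:
            self.register_parameter("bias", None)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (num_experts, batch, in_features) -> (num_experts, batch, out_features)
        out = torch.bmm(x, self.weight)
        if self.bias is not None:
            out = out + self.bias  # broadcast over the batch dimension
        return out


layer = BatchedExpertsLinear(num_experts=3, in_features=4, out_features=5)
x = torch.randn(3, 8, 4)  # 3 experts, batch of 8
out = layer(x)  # shape: (3, 8, 5)
```

Each slice out[e] equals what a separate nn.Linear-style layer for expert e would compute on x[e], but all experts run in one kernel call.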
-
extra_repr() → str
Set the extra representation of the module.
To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.
-
forward(x: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training: bool
-
class mtrl.agent.components.moe_layer.MaskCache(num_tasks: int, num_eval_episodes: int, batch_size: int, task_index_to_mask: torch.Tensor)
Bases: object
In multitask learning with a mixture of models, different tasks can be mapped to different combinations of models. This utility class caches these mappings so that they do not have to be re-evaluated.
For example, when the model is training over 10 tasks, and the tasks are always ordered, the mapping of task index to encoder indices will be the same and need not be recomputed. We take a very simple approach here: cache using the number of tasks, since in our case, the task ordering during training and evaluation does not change. In more complex cases, a mode (train/eval..) based key could be used.
This gets a little trickier during evaluation. We assume that we are running multiple evaluation episodes (per task) at once. So during evaluation, the agent is inferring over num_tasks*num_eval_episodes at once.
We must be careful not to cache the mapping during updates, because neither the task distribution nor the task ordering is predetermined during an update. So we explicitly exclude batch_size from the keys that are cached.
- Parameters
num_tasks (int) – number of tasks.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
task_index_to_mask (TensorType) – mapping of task index to mask.
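The caching rule described above can be sketched as a dictionary keyed on the number of rows in the batch. SimpleMaskCache is hypothetical and takes a tensor of environment indices instead of a TaskInfo. Like the approach described above, it assumes that a given cacheable batch size always comes with the same task ordering, so it returns the first mask computed for that size.

```python
from typing import Dict

import torch


class SimpleMaskCache:
    """Hypothetical cache keyed on the number of rows in a batch."""

    def __init__(self, num_tasks: int, num_eval_episodes: int, batch_size: int,
                 task_index_to_mask: torch.Tensor):
        self.task_index_to_mask = task_index_to_mask  # (num_tasks, num_experts)
        # num_tasks rows when acting during training, num_tasks * num_eval_episodes
        # during evaluation; batch_size is removed so update batches are never cached.
        self.cacheable_sizes = {num_tasks, num_tasks * num_eval_episodes} - {batch_size}
        self._cache: Dict[int, torch.Tensor] = {}

    def get_mask(self, env_index: torch.Tensor) -> torch.Tensor:
        size = env_index.shape[0]
        if size not in self.cacheable_sizes:
            # Update batches: task mix and order vary, so always recompute.
            return self.task_index_to_mask[env_index]
        if size not in self._cache:
            # First time this size is seen: compute once and reuse afterwards.
            self._cache[size] = self.task_index_to_mask[env_index]
        return self._cache[size]


cache = SimpleMaskCache(num_tasks=3, num_eval_episodes=2, batch_size=128,
                        task_index_to_mask=torch.eye(3))
mask = cache.get_mask(torch.arange(3))  # computed once, then served from the cache
```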
-
get_mask(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor
Get the mask corresponding to a given task info.
- Parameters
task_info (TaskInfo) – information about the tasks in the current batch.
- Returns
encoder mask.
- Return type
TensorType
-
class mtrl.agent.components.moe_layer.MixtureOfExperts(multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: torch.nn.modules.module.Module
Class for interfacing with a mixture of experts.
- Parameters
multitask_cfg (ConfigType) – config for multitask training.
-
forward(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training: bool
-
class mtrl.agent.components.moe_layer.OneToOneExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.moe_layer.MixtureOfExperts
Map the output of the ith expert to the ith task.
- Parameters
num_tasks (int) – number of tasks.
num_experts (int) – number of experts in the mixture of experts.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
multitask_cfg (ConfigType) – config for multitask training.
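One way to picture the expert-selection classes is as a task-by-expert 0/1 mask: OneToOneExperts corresponds to an identity matrix, EnsembleOfExperts to all ones, and ClusterOfExperts to a selected subset in each row. The sketch builds these masks and averages the selected experts' outputs per sample; the function names and the averaging rule are illustrative assumptions, not the mtrl API.

```python
from typing import List

import torch


def one_to_one_mask(num_tasks: int) -> torch.Tensor:
    # Task i uses only expert i (one expert per task).
    return torch.eye(num_tasks)


def ensemble_mask(num_tasks: int, num_experts: int) -> torch.Tensor:
    # Every task uses every expert.
    return torch.ones(num_tasks, num_experts)


def cluster_mask(clusters: List[List[int]], num_experts: int) -> torch.Tensor:
    # clusters[i] lists the expert indices assigned to task i.
    mask = torch.zeros(len(clusters), num_experts)
    for task, experts in enumerate(clusters):
        mask[task, experts] = 1.0
    return mask


def combine(expert_out: torch.Tensor, mask: torch.Tensor, env_index: torch.Tensor) -> torch.Tensor:
    # expert_out: (num_experts, batch, dim); mask: (num_tasks, num_experts).
    weights = mask[env_index].t().unsqueeze(-1)  # (num_experts, batch, 1)
    # Assumption: average the outputs of the experts selected for each sample's task.
    return (expert_out * weights).sum(dim=0) / weights.sum(dim=0)
```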
-
mask_cache: mtrl.agent.components.moe_layer.MaskCache
-
training: bool
mtrl.agent.components.reward_decoder module
Reward decoder component for the agent.
-
class mtrl.agent.components.reward_decoder.RewardDecoder(feature_dim: int)
Bases: mtrl.agent.components.base.Component
Predict the reward from the observation features.
- Parameters
feature_dim (int) – dimension of the feature used to predict the reward.
-
forward(x: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
training: bool
mtrl.agent.components.scripted_soft_modularization module
mtrl.agent.components.soft_modularization module
Implementation of the soft routing network and MLP described in “Multi-Task Reinforcement Learning with Soft Modularization” Link: https://arxiv.org/abs/2003.13661
-
class mtrl.agent.components.soft_modularization.RoutingNetwork(in_features: int, hidden_features: int, num_experts_per_layer: int, num_layers: int)
Bases: mtrl.agent.components.base.Component
Routing network from the ‘Multi-Task Reinforcement Learning with Soft Modularization’ paper.
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training: bool
-
class mtrl.agent.components.soft_modularization.SoftModularizedMLP(num_experts: int, in_features: int, out_features: int, num_layers: int, hidden_features: int, bias: bool = True)
Bases: mtrl.agent.components.base.Component
Actor/critic network from the ‘Multi-Task Reinforcement Learning with Soft Modularization’ paper. It is similar to layers.FeedForward but allows selecting the experts at each layer.
-
forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training: bool
mtrl.agent.components.task_encoder module
Component to encode the task.
-
class mtrl.agent.components.task_encoder.TaskEncoder(pretrained_embedding_cfg: omegaconf.dictconfig.DictConfig, num_embeddings: int, embedding_dim: int, hidden_dim: int, num_layers: int, output_dim: int)
Bases: mtrl.agent.components.base.Component
Encode the task into a vector.
- Parameters
pretrained_embedding_cfg (ConfigType) – config for using pretrained embeddings.
num_embeddings (int) – number of elements in the embedding table. Used only if pretrained embeddings are not used.
embedding_dim (int) – dimension of the embedding. Used only if pretrained embeddings are not used.
hidden_dim (int) – dimension of the hidden layer of the trunk.
num_layers (int) – number of layers in the trunk.
output_dim (int) – output dimension of the task encoder.
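For the case without pretrained embeddings, the parameters above suggest an embedding lookup followed by an MLP trunk. A minimal sketch under that assumption; TinyTaskEncoder and its layer layout are hypothetical, and the pretrained_embedding_cfg branch is omitted.

```python
import torch
from torch import nn


class TinyTaskEncoder(nn.Module):
    """Hypothetical task encoder: learned embedding table followed by an MLP trunk."""

    def __init__(self, num_embeddings: int, embedding_dim: int, hidden_dim: int,
                 num_layers: int, output_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        layers = []
        in_dim = embedding_dim
        # Assumption: num_layers counts the hidden Linear+ReLU blocks in the trunk.
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, output_dim))
        self.trunk = nn.Sequential(*layers)

    def forward(self, env_index: torch.Tensor) -> torch.Tensor:
        # env_index: (batch,) LongTensor of task ids -> (batch, output_dim)
        return self.trunk(self.embedding(env_index))


task_encoder = TinyTaskEncoder(num_embeddings=10, embedding_dim=16, hidden_dim=32,
                               num_layers=2, output_dim=8)
task_vec = task_encoder(torch.tensor([0, 3, 9]))  # shape: (3, 8)
```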
-
forward(env_index: torch.Tensor) → torch.Tensor
Defines the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
-
training: bool
mtrl.agent.components.transition_model module
Transition dynamics for the agent.
-
class mtrl.agent.components.transition_model.DeterministicTransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.transition_model.TransitionModel
Deterministic model for predicting the transition dynamics.
- Parameters
encoder_feature_dim (int) – size of the input feature.
action_shape (List[int]) – size of the action vector.
layer_width (int) – width for each layer.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
-
forward(x: torch.Tensor) → Tuple[torch.Tensor, Optional[torch.Tensor]]
Return the mean and standard deviation of the Gaussian distribution that the model predicts for the next state.
- Parameters
x (TensorType) – input.
- Returns
tuple of (mean of the Gaussian distribution, sigma of the Gaussian distribution); sigma may be None.
- Return type
Tuple[TensorType, Optional[TensorType]]
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
sample_prediction(x: torch.Tensor) → torch.Tensor
Sample a possible value of the next state from the model.
- Parameters
x (TensorType) – input.
- Returns
predicted next state.
- Return type
TensorType
-
training: bool
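The forward signature of DeterministicTransitionModel allows the sigma slot to be None (Optional[torch.Tensor]), which is the natural reading for a deterministic model: there is no spread to report, so a "sample" is just the predicted mean. A plain-Python sketch of that contract follows; the affine "mean head" and the class name ToyDeterministicModel are hypothetical stand-ins for the real network.

```python
from typing import List, Optional, Tuple

Vector = List[float]


class ToyDeterministicModel:
    """Hypothetical stand-in: predicts next-state features with a fixed affine map."""

    def __init__(self, scale: float, shift: float) -> None:
        self.scale = scale
        self.shift = shift

    def forward(self, x: Vector) -> Tuple[Vector, Optional[Vector]]:
        mu = [self.scale * v + self.shift for v in x]
        return mu, None  # no uncertainty estimate, so sigma is None

    def sample_prediction(self, x: Vector) -> Vector:
        mu, _ = self.forward(x)
        return mu  # deterministic: the "sample" is the mean itself


model = ToyDeterministicModel(scale=2.0, shift=1.0)
mean, sigma = model.forward([0.0, 1.0])  # mean == [1.0, 3.0], sigma is None
```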
-
class mtrl.agent.components.transition_model.ProbabilisticTransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig, max_sigma: float = 10.0, min_sigma: float = 0.0001)
Bases: mtrl.agent.components.transition_model.TransitionModel
Probabilistic model for predicting the transition dynamics.
- Parameters
encoder_feature_dim (int) – size of the input feature.
action_shape (List[int]) – size of the action vector.
layer_width (int) – width for each layer.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
max_sigma (float, optional) – maximum value of sigma (the standard deviation of the learned Gaussian distribution). Larger values are clipped to this value. Defaults to 1e1.
min_sigma (float, optional) – minimum value of sigma (the standard deviation of the learned Gaussian distribution). Smaller values are clipped to this value. Defaults to 1e-4.
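Taken together, the two bounds keep each predicted standard deviation inside [min_sigma, max_sigma]; a strictly positive min_sigma also keeps the Gaussian from collapsing to zero width. Following the "clipped" wording above, the sketch below clamps each raw value elementwise in plain Python. The helper name bound_sigma is hypothetical, and the model may enforce the range differently (for example by rescaling a squashed output), so treat clamping as an assumption.

```python
from typing import List, Sequence


def bound_sigma(raw_sigma: Sequence[float], min_sigma: float = 1e-4, max_sigma: float = 10.0) -> List[float]:
    """Clamp each standard deviation into [min_sigma, max_sigma].

    Assumption: hypothetical helper; defaults mirror the documented ones.
    """
    if not 0.0 <= min_sigma <= max_sigma:
        raise ValueError("expected 0 <= min_sigma <= max_sigma")
    return [min(max(s, min_sigma), max_sigma) for s in raw_sigma]


print(bound_sigma([0.0, 0.5, 42.0]))  # [0.0001, 0.5, 10.0]
```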
-
forward(x)
Return the mean and standard deviation of the Gaussian distribution that the model predicts for the next state.
- Parameters
x (TensorType) – input.
- Returns
tuple of (mean of the Gaussian distribution, sigma of the Gaussian distribution)
- Return type
Tuple[TensorType, TensorType]
Get the list of last layers (for different sub-components) that are shared across tasks.
This method should be implemented by subclasses if the component is to be trained with the GradNorm algorithm.
- Returns
list of layers.
- Return type
List[ModelType]
-
sample_prediction(x)
Sample a possible value of the next state from the model.
- Parameters
x (TensorType) – input.
- Returns
predicted next state.
- Return type
TensorType
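For a Gaussian prediction, one standard way to draw a sample is mean + sigma * eps, with eps drawn independently from N(0, 1) for each dimension (the reparameterization form). A plain-Python sketch follows, with an injectable random.Random so results are reproducible. sample_gaussian is a hypothetical helper, and whether ProbabilisticTransitionModel.sample_prediction does exactly this is an assumption.

```python
import random
from typing import List, Optional, Sequence


def sample_gaussian(
    mu: Sequence[float],
    sigma: Sequence[float],
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Draw one sample per dimension: mu_k + sigma_k * eps_k with eps_k ~ N(0, 1).

    Assumption: hypothetical helper illustrating the idea, not the mtrl code.
    """
    if len(mu) != len(sigma):
        raise ValueError("mu and sigma must have the same length")
    rng = rng or random.Random()
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]


# With a tiny sigma the sample stays very close to the mean.
draw = sample_gaussian([0.0, 5.0], [1e-4, 1e-4], random.Random(0))
```

Passing the generator in, instead of using the global random module, keeps the sampling reproducible under a fixed seed without affecting other code.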
-
training: bool
-
class mtrl.agent.components.transition_model.TransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig)
Bases: mtrl.agent.components.base.Component
Model for predicting the transition dynamics.
- Parameters
encoder_feature_dim (int) – size of the input feature.
action_shape (List[int]) – size of the action vector.
layer_width (int) – width for each layer.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
-
forward(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor]
Return the mean and standard deviation of the Gaussian distribution that the model predicts for the next state.
- Parameters
x (TensorType) – input.
- Returns
tuple of (mean of the Gaussian distribution, sigma of the Gaussian distribution)
- Return type
Tuple[TensorType, TensorType]
-
sample_prediction(x: torch.Tensor) → torch.Tensor
Sample a possible value of the next state from the model.
- Parameters
x (TensorType) – input.
- Returns
predicted next state.
- Return type
TensorType
-
training: bool
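TransitionModel is the shared base of DeterministicTransitionModel and ProbabilisticTransitionModel: subclasses supply forward, which returns a (mean, sigma) pair, and sample_prediction, which returns one predicted next state. A sketch of that contract using Python's abc module follows. The class names are hypothetical, and enforcing the methods through abc (rather than, say, raising NotImplementedError) is an assumption; the real class derives from mtrl.agent.components.base.Component, not from abc.ABC.

```python
import abc
from typing import List, Optional, Tuple

Vector = List[float]


class TransitionModelSketch(abc.ABC):
    """Hypothetical interface mirroring the two documented methods."""

    @abc.abstractmethod
    def forward(self, x: Vector) -> Tuple[Vector, Optional[Vector]]:
        """Return (mean, sigma) of the predicted next-state distribution."""

    @abc.abstractmethod
    def sample_prediction(self, x: Vector) -> Vector:
        """Return one predicted next state."""


class EchoModel(TransitionModelSketch):
    """Trivial subclass: predicts that the state features do not change."""

    def forward(self, x: Vector) -> Tuple[Vector, Optional[Vector]]:
        return list(x), None

    def sample_prediction(self, x: Vector) -> Vector:
        mean, _ = self.forward(x)
        return mean
```

Instantiating TransitionModelSketch directly raises TypeError, so a subclass that forgets one of the two methods fails at construction time rather than on first use.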