mtrl.agent.components package¶

Submodules¶

mtrl.agent.components.actor module¶

Actor component for the agent.

class mtrl.agent.components.actor.Actor(env_obs_shape: List[int], action_shape: List[int], hidden_dim: int, num_layers: int, log_std_bounds: Tuple[float, float], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.actor.BaseActor

Actor component for the agent.

Parameters

env_obs_shape (List[int]) – shape of the environment observation that the actor gets.
action_shape (List[int]) – shape of the action vector that the actor produces.
hidden_dim (int) – hidden dimensionality of the actor.
num_layers (int) – number of layers in the actor.
log_std_bounds (Tuple[float, float]) – bounds to clip log of standard deviation.
encoder_cfg (ConfigType) – config for the encoder.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]¶

Compute the predictions from the actor.

Parameters

mtobs (MTObs) – multi-task observation.
detach_encoder (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

tuple of (mean of the gaussian, sample from the gaussian, log-probability of the sample, log of standard deviation of the gaussian).

Return type

Tuple[TensorType, TensorType, TensorType, TensorType]
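The four returned values can be pictured with a framework-free sketch for a one-dimensional Gaussian. This is an illustration under stated assumptions, not the library's implementation: the helper names (clip_log_std, gaussian_outputs) are hypothetical, plain floats stand in for torch tensors, and a hard clip is assumed for log_std_bounds.

```python
import math
import random


def clip_log_std(raw_log_std, bounds):
    """Clip the log standard deviation into [low, high] (assumed hard clip)."""
    low, high = bounds
    return min(max(raw_log_std, low), high)


def gaussian_outputs(mu, raw_log_std, bounds, rng):
    """Return (mean, sample, log-probability of the sample, log_std)."""
    log_std = clip_log_std(raw_log_std, bounds)
    std = math.exp(log_std)
    noise = rng.gauss(0.0, 1.0)  # standard normal noise
    sample = mu + std * noise
    # Log-density of N(mu, std^2) evaluated at the sample.
    log_prob = -0.5 * noise ** 2 - log_std - 0.5 * math.log(2.0 * math.pi)
    return mu, sample, log_prob, log_std


# Hypothetical values: a raw log_std of 5.0 is clipped to the upper bound 2.0.
mu, pi, log_pi, log_std = gaussian_outputs(0.3, 5.0, (-10.0, 2.0), random.Random(0))
```

The tuple order mirrors the documented return value: mean, sample, log-probability, log standard deviation.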

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]

make_model(action_shape: List[int], hidden_dim: int, num_layers: int, encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig) → torch.nn.modules.module.Module[source]¶

Make the model for the actor.

Parameters

action_shape (List[int]) –
hidden_dim (int) –
num_layers (int) –
encoder_cfg (ConfigType) –
multitask_cfg (ConfigType) –

Returns

model for the actor.

Return type

ModelType

training: bool¶

class mtrl.agent.components.actor.BaseActor(env_obs_shape: List[int], action_shape: List[int], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig, *args, **kwargs)[source]¶

Bases: mtrl.agent.components.base.Component

Interface for the actor component for the agent.

Parameters

env_obs_shape (List[int]) – shape of the environment observation that the actor gets.
action_shape (List[int]) – shape of the action vector that the actor produces.
encoder_cfg (ConfigType) – config for the encoder.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]¶

Compute the predictions from the actor.

Parameters

mtobs (MTObs) – multi-task observation.
detach_encoder (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

tuple of (mean of the gaussian, sample from the gaussian, log-probability of the sample, log of standard deviation of the gaussian).

Return type

Tuple[TensorType, TensorType, TensorType, TensorType]

training: bool¶

mtrl.agent.components.actor.check_if_should_use_multi_head_policy(multitask_cfg: omegaconf.dictconfig.DictConfig) → bool[source]¶

mtrl.agent.components.actor.check_if_should_use_task_encoder(multitask_cfg: omegaconf.dictconfig.DictConfig) → bool[source]¶

mtrl.agent.components.base module¶

Interface for the agent components.

class mtrl.agent.components.base.Component[source]¶

Bases: torch.nn.modules.module.Module

Basic component (for building the agent) that every other component should extend.

It inherits torch.nn.Module.

Initializes internal Module state, shared by both nn.Module and ScriptModule.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]

training: bool¶

mtrl.agent.components.critic module¶

Critic component for the agent.

class mtrl.agent.components.critic.Critic(env_obs_shape: List[int], action_shape: List[int], hidden_dim: int, num_layers: int, encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.base.Component

Critic component for the agent.

Parameters

env_obs_shape (List[int]) – shape of the environment observation that the critic gets.
action_shape (List[int]) – shape of the action vector that the actor produces.
hidden_dim (int) – hidden dimensionality of the critic.
num_layers (int) – number of layers in the critic.
encoder_cfg (ConfigType) – config for the encoder.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Returns

encoding of the observation.

Return type

TensorType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, action: torch.Tensor, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor][source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]

training: bool¶

class mtrl.agent.components.critic.QFunction(obs_dim: int, action_dim: int, hidden_dim: int, num_layers: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.base.Component

Q-function implemented as a MLP.

Parameters

obs_dim (int) – size of the observation.
action_dim (int) – size of the action vector.
hidden_dim (int) – size of the hidden layer of the model.
num_layers (int) – number of layers in the model.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

build_model(obs_dim: int, action_dim: int, hidden_dim: int, num_layers: int, multitask_cfg: omegaconf.dictconfig.DictConfig) → torch.nn.modules.module.Module[source]¶

Build the Q-Function.

Parameters

obs_dim (int) – size of the observation.
action_dim (int) – size of the action vector.
hidden_dim (int) – size of the hidden layer of the trunk.
num_layers (int) – number of layers in the model.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

Returns

model for the Q-function.

Return type

ModelType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]

training: bool¶

mtrl.agent.components.decoder module¶

Decoder component for the agent.

class mtrl.agent.components.decoder.PixelDecoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int = 2, num_filters: int = 32)[source]¶

Bases: mtrl.agent.components.base.Component

Convolutional decoder for pixel observations.

Parameters

env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int, optional) – number of layers. Defaults to 2.
num_filters (int, optional) – number of conv filters per layer. Defaults to 32.

forward(h: torch.Tensor) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]

training: bool¶

mtrl.agent.components.decoder.make_decoder(env_obs_shape: List[int], decoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

mtrl.agent.components.encoder module¶

Encoder component for the agent.

class mtrl.agent.components.encoder.Encoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, *args, **kwargs)[source]¶

Bases: mtrl.agent.components.base.Component

Interface for the encoder component of the agent.

Parameters

env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder) → None[source]¶

Copy convolutional weights from the source encoder.

The no-op implementation should be overridden only by encoders that take convnets.

Parameters: source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

training: bool¶

class mtrl.agent.components.encoder.FeedForwardEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int, hidden_dim: int, should_tie_encoders: bool)[source]¶

Bases: mtrl.agent.components.encoder.Encoder

Feedforward encoder for state observations.

Parameters

env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int) – number of layers.
hidden_dim (int) – dimensionality of the hidden layers.
should_tie_encoders (bool) – should the feed-forward layers be tied.

copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder)[source]¶

Copy convolutional weights from the source encoder.

The no-op implementation should be overridden only by encoders that take convnets.

Parameters: source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

training: bool¶

class mtrl.agent.components.encoder.FiLM(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int, hidden_dim: int, should_tie_encoders: bool)[source]¶

Bases: mtrl.agent.components.encoder.FeedForwardEncoder

Feedforward encoder for state observations.

Parameters

env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int) – number of layers.
hidden_dim (int) – dimensionality of the hidden layers.
should_tie_encoders (bool) – should the feed-forward layers be tied.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

training: bool¶

class mtrl.agent.components.encoder.IdentityEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int)[source]¶

Bases: mtrl.agent.components.encoder.Encoder

Identity encoder that does not perform any operations.

Parameters

env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

training: bool¶

class mtrl.agent.components.encoder.MixtureofExpertsEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, encoder_cfg: omegaconf.dictconfig.DictConfig, task_id_to_encoder_id_cfg: omegaconf.dictconfig.DictConfig, num_experts: int)[source]¶

Bases: mtrl.agent.components.encoder.Encoder

Mixture of Experts based encoder.

Parameters

env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
encoder_cfg (ConfigType) – config for the experts in the mixture.
task_id_to_encoder_id_cfg (ConfigType) – mapping between the tasks and the encoders.
num_experts (int) – number of experts.

copy_conv_weights_from(source)[source]¶

Copy convolutional weights from the source encoder.

The no-op implementation should be overridden only by encoders that take convnets.

Parameters: source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

training: bool¶

class mtrl.agent.components.encoder.PixelEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int = 2, num_filters: int = 32)[source]¶

Bases: mtrl.agent.components.encoder.Encoder

Convolutional encoder for pixel observations.

Parameters

env_obs_shape (List[int]) – shape of the observation that the actor gets.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
feature_dim (int) – feature dimension.
num_layers (int, optional) – number of layers. Defaults to 2.
num_filters (int, optional) – number of conv filters per layer. Defaults to 32.

copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder)[source]¶

Copy convolutional weights from the source encoder.

The no-op implementation should be overridden only by encoders that take convnets.

Parameters: source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]¶

Encode the input observation.

Parameters

mtobs (MTObs) – multi-task observation.
detach (bool, optional) – should detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError –

Returns

encoding of the observation.

Return type

TensorType

forward_conv(env_obs: torch.Tensor) → torch.Tensor[source]¶

Encode the environment observation using the convolutional layers.

Parameters: env_obs (TensorType) – observation from the environment.
Returns: encoding of the observation.
Return type: TensorType

reparameterize(mu: torch.Tensor, logstd: torch.Tensor) → torch.Tensor[source]¶

Apply the reparameterization trick.

Parameters

mu (TensorType) – mean of the gaussian.
logstd (TensorType) – log of standard deviation of the gaussian.

Returns

sample from the gaussian.

Return type

TensorType
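As a rough, framework-free illustration of the reparameterization trick: the sample is written as the mean plus the standard deviation times standard normal noise, so all randomness sits in the noise term and the sample stays a differentiable function of mu and logstd. The sketch below uses plain Python lists in place of tensors and adds an rng argument for reproducibility; both are assumptions, not the method's actual signature.

```python
import math
import random


def reparameterize(mu, logstd, rng):
    """Sample from N(mu, exp(logstd)^2) element-wise as mu + std * eps."""
    return [
        m + math.exp(ls) * rng.gauss(0.0, 1.0)  # eps ~ N(0, 1)
        for m, ls in zip(mu, logstd)
    ]


rng = random.Random(0)
mu = [1.0, -2.0]
# With a very small standard deviation the sample collapses onto the mean.
near_deterministic = reparameterize(mu, [-50.0, -50.0], rng)
# With logstd = 0 the standard deviation is 1 and the sample is mu + eps.
noisy = reparameterize(mu, [0.0, 0.0], rng)
```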

training: bool¶

mtrl.agent.components.encoder.make_encoder(env_obs_shape: List[int], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

mtrl.agent.components.encoder.tie_weights(src, trg)[source]¶

mtrl.agent.components.hipbmdp_theta module¶

Implementation of the theta component described in “Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP” Link: https://arxiv.org/abs/2007.07206

class mtrl.agent.components.hipbmdp_theta.ThetaModel(dim: int, output_dim: int, num_envs: int, train_env_id: List[str])[source]¶

Bases: mtrl.agent.components.base.Component

Implementation of the theta component described in: “Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP” Link: https://arxiv.org/abs/2007.07206

Parameters

dim (int) – input dimension.
output_dim (int) – output dimension.
num_envs (int) – number of environments.
train_env_id (List[str]) – index of environments corresponding to training tasks. Some strategies (for sampling theta) need this information.

forward(env_index: torch.Tensor, theta_sampling_strategy: str, modes: List[str]) → torch.Tensor[source]¶

Sample theta.

The following strategies are supported:

embedding – use an embedding layer and index into it using the task index. This is the default strategy and is used during training and testing on in-distribution environments.
zero – set theta to a tensor of zeros.
mean – use an embedding layer and set theta to the mean of all the embeddings.
mean_train – use an embedding layer and set theta to the mean of all the embeddings that were trained.

Parameters

env_index (TensorType) –
theta_sampling_strategy (str) – strategy to sample theta.
modes (List[str]) – List of train/eval/… modes.

Returns

sampled theta.

Return type

TensorType
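The four sampling strategies can be sketched without torch as follows. This is a minimal illustration, not the library's code: the embedding table is a plain list of rows, the Strategy enum is a hypothetical stand-in for ThetaSamplingStrategy, and training environments are identified by integer row indices (the real train_env_id is a list of strings).

```python
from enum import Enum


class Strategy(Enum):  # hypothetical stand-in for ThetaSamplingStrategy
    EMBEDDING = "embedding"
    ZERO = "zero"
    MEAN = "mean"
    MEAN_TRAIN = "mean_train"


def column_mean(rows):
    """Mean of each embedding dimension over the given rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sample_theta(table, env_index, strategy, train_rows):
    """Return one theta vector per entry of env_index."""
    dim = len(table[0])
    if strategy is Strategy.EMBEDDING:
        return [list(table[i]) for i in env_index]  # plain lookup
    if strategy is Strategy.ZERO:
        return [[0.0] * dim for _ in env_index]
    if strategy is Strategy.MEAN:
        mean = column_mean(table)  # mean over all embeddings
    elif strategy is Strategy.MEAN_TRAIN:
        mean = column_mean([table[i] for i in train_rows])  # trained rows only
    else:
        raise ValueError(f"unsupported strategy: {strategy}")
    return [list(mean) for _ in env_index]


# Hypothetical table with three environments and two-dimensional embeddings.
table = [[1.0, 0.0], [3.0, 2.0], [5.0, 10.0]]
theta = sample_theta(table, [2, 0], Strategy.MEAN_TRAIN, train_rows=[0, 1])
```

Only the embedding strategy depends on which environment is queried; the other three return the same vector for every index.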

training: bool¶

class mtrl.agent.components.hipbmdp_theta.ThetaSamplingStrategy(value)[source]¶

Bases: enum.Enum

Different strategies for sampling theta values.

embedding – use an embedding layer and index into it using the task index.
zero – set theta to a tensor of zeros.
mean – use an embedding layer and set theta to the mean of all the embeddings.
mean_train – use an embedding layer and set theta to the mean of all the embeddings that were trained.

EMBEDDING = 'embedding'¶

MEAN = 'mean'¶

MEAN_TRAIN = 'mean_train'¶

ZERO = 'zero'¶

mtrl.agent.components.moe_layer module¶

Layers for parallelizing computation with mixture of experts.

A mixture of experts (models) can be simulated by maintaining a list of models and iterating over them. However, this can be slow in practice. We provide some additional modules that make it easier to create a mixture of experts without slowing down training or inference.

class mtrl.agent.components.moe_layer.AttentionBasedExperts(num_tasks: int, num_experts: int, embedding_dim: int, hidden_dim: int, num_layers: int, temperature: bool, should_use_soft_attention: bool, task_encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig, topk: Optional[int] = None)[source]¶

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Class for interfacing with a mixture of experts.

Parameters: multitask_cfg (ConfigType) – config for multitask training.

forward(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool¶

class mtrl.agent.components.moe_layer.ClusterOfExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig, env_name: str, task_description: Dict[str, str], ordered_task_list: List[str], mapping_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Map the ith task to a subset (cluster) of experts.

Parameters

num_tasks (int) – number of tasks.
num_experts (int) – number of experts in the mixture of experts.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
multitask_cfg (ConfigType) – config for multitask training.
env_name (str) – name of the environment. This is used with the mapping configuration.
task_description (Dict[str, str]) – dictionary mapping task names to descriptions.
ordered_task_list (List[str]) – ordered list of tasks. This is needed because the task description is not always ordered.
mapping_cfg (ConfigType) – config for mapping the tasks to subset of experts.

training: bool¶

class mtrl.agent.components.moe_layer.EnsembleOfExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Ensemble of all the experts.

Parameters

num_tasks (int) – number of tasks.
num_experts (int) – number of experts in the mixture of experts.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
multitask_cfg (ConfigType) – config for multitask training.

training: bool¶

class mtrl.agent.components.moe_layer.FeedForward(num_experts: int, in_features: int, out_features: int, num_layers: int, hidden_features: int, bias: bool = True)[source]¶

Bases: torch.nn.modules.module.Module

A feedforward model of mixture of experts layers.

Parameters

num_experts (int) – number of experts in the mixture.
in_features (int) – size of each input sample for one expert.
out_features (int) – size of each output sample for one expert.
num_layers (int) – number of layers in the feedforward network.
hidden_features (int) – dimensionality of hidden layer in the feedforward network.
bias (bool, optional) – if set to False, the layer will not learn an additive bias. Defaults to True.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool¶

class mtrl.agent.components.moe_layer.Linear(num_experts: int, in_features: int, out_features: int, bias: bool = True)[source]¶

Bases: torch.nn.modules.module.Module

torch.nn.Linear layer extended for use as a mixture of experts.

Parameters

num_experts (int) – number of experts in the mixture.
in_features (int) – size of each input sample for one expert.
out_features (int) – size of each output sample for one expert.
bias (bool, optional) – if set to False, the layer will not learn an additive bias. Defaults to True.
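To make the per-expert shapes concrete, here is a plain-Python sketch of what a linear layer with one weight matrix per expert computes. The layout is an assumption made for illustration (inputs indexed as [expert][batch][in_features], weights as [expert][in_features][out_features], biases as [expert][out_features]), and moe_linear is a hypothetical helper; on tensors the explicit loops would typically collapse into a single batched matrix multiply such as torch.bmm.

```python
def moe_linear(x, weight, bias):
    """Apply expert e's affine map to expert e's slice of the input.

    x:      [num_experts][batch][in_features]
    weight: [num_experts][in_features][out_features]
    bias:   [num_experts][out_features]
    returns [num_experts][batch][out_features]
    """
    out = []
    for x_e, w_e, b_e in zip(x, weight, bias):
        out.append([
            [
                sum(value * w_e[i][j] for i, value in enumerate(row)) + b_e[j]
                for j in range(len(b_e))
            ]
            for row in x_e
        ])
    return out


# Two experts, batch of one, two input features, one output feature.
x = [[[1.0, 2.0]], [[1.0, 2.0]]]
weight = [[[1.0], [1.0]], [[2.0], [0.0]]]
bias = [[0.0], [0.5]]
y = moe_linear(x, weight, bias)
```

Both experts see the same input here but produce different outputs because each has its own weights and bias.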

extra_repr() → str[source]¶

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool¶

class mtrl.agent.components.moe_layer.MaskCache(num_tasks: int, num_eval_episodes: int, batch_size: int, task_index_to_mask: torch.Tensor)[source]¶

Bases: object

In multitask learning with a mixture of models, different tasks can be mapped to different combinations of models. This utility class caches these mappings so that they do not have to be re-evaluated.

For example, when the model is training over 10 tasks, and the tasks are always ordered, the mapping of task index to encoder indices will be the same and need not be recomputed. We take a very simple approach here: cache using the number of tasks, since in our case, the task ordering during training and evaluation does not change. In more complex cases, a mode (train/eval..) based key could be used.

This gets a little trickier during evaluation. We assume that we are running multiple evaluation episodes (per task) at once. So during evaluation, the agent is inferring over num_tasks*num_eval_episodes at once.

We have to be careful about not caching the mapping during update because neither the task distribution, nor the task ordering, is pre-determined during update. So we explicitly exclude the batch_size from the list of keys being cached.

Parameters

num_tasks (int) – number of tasks.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
task_index_to_mask (TensorType) – mapping of task index to mask.

get_mask(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor[source]¶

Get the mask corresponding to a given task info.

Parameters: task_info (TaskInfo) –
Returns: encoder mask.
Return type: TensorType
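The caching policy described above can be sketched as a small dictionary keyed by the number of task entries, with the update batch size deliberately left out of the cacheable keys. SimpleMaskCache is a hypothetical stand-in: it takes a list of task indices instead of a TaskInfo object, uses nested lists instead of tensors, and counts builds only so the behaviour is visible.

```python
class SimpleMaskCache:  # hypothetical illustration, not the library class
    def __init__(self, num_tasks, num_eval_episodes, batch_size, task_index_to_mask):
        self.task_index_to_mask = task_index_to_mask
        # Sizes seen during training rollouts and batched evaluation.
        self.cacheable_sizes = {num_tasks, num_tasks * num_eval_episodes}
        # Task order inside an update batch is not fixed, so never cache it.
        self.cacheable_sizes.discard(batch_size)
        self._cache = {}
        self.num_builds = 0

    def _build(self, task_indices):
        self.num_builds += 1
        return [self.task_index_to_mask[i] for i in task_indices]

    def get_mask(self, task_indices):
        key = len(task_indices)
        if key not in self.cacheable_sizes:
            return self._build(task_indices)  # recompute every time
        if key not in self._cache:
            self._cache[key] = self._build(task_indices)
        return self._cache[key]


masks = [[1, 0], [0, 1]]  # task 0 -> expert 0, task 1 -> expert 1
cache = SimpleMaskCache(num_tasks=2, num_eval_episodes=3, batch_size=4,
                        task_index_to_mask=masks)
first = cache.get_mask([0, 1])
second = cache.get_mask([0, 1])            # served from the cache
update_mask = cache.get_mask([1, 1, 0, 0])  # batch-sized input, rebuilt
```

Keying on size alone is only safe because, as noted above, the task ordering during training and evaluation does not change.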

class mtrl.agent.components.moe_layer.MixtureOfExperts(multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: torch.nn.modules.module.Module

Class for interfacing with a mixture of experts.

Parameters: multitask_cfg (ConfigType) – config for multitask training.

forward(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool¶

class mtrl.agent.components.moe_layer.OneToOneExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Map the output of the ith expert to the ith task.

Parameters

num_tasks (int) – number of tasks.
num_experts (int) – number of experts in the mixture of experts.
num_eval_episodes (int) – number of episodes run during evaluation.
batch_size (int) – batch size for update.
multitask_cfg (ConfigType) – config for multitask training.

mask_cache: mtrl.agent.components.moe_layer.MaskCache¶

training: bool¶
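One way to picture the three fixed routing schemes in this module (ClusterOfExperts, EnsembleOfExperts, OneToOneExperts) is as binary masks over (expert, task) pairs. The helpers below are hypothetical and assume a mask[expert][task] layout with nested lists; they only build the masks and make no claim about how the library combines expert outputs.

```python
def one_to_one_mask(num_tasks, num_experts):
    """Expert i is active only for task i."""
    return [[1.0 if e == t else 0.0 for t in range(num_tasks)]
            for e in range(num_experts)]


def ensemble_mask(num_tasks, num_experts):
    """Every expert is active for every task."""
    return [[1.0] * num_tasks for _ in range(num_experts)]


def cluster_mask(num_tasks, num_experts, task_to_experts):
    """Each task activates only the experts in its cluster."""
    mask = [[0.0] * num_tasks for _ in range(num_experts)]
    for task, experts in task_to_experts.items():
        for expert in experts:
            mask[expert][task] = 1.0
    return mask


# Hypothetical mapping: tasks 0 and 1 share expert 0, task 2 uses expert 1.
clusters = cluster_mask(3, 2, {0: [0], 1: [0], 2: [1]})
```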

mtrl.agent.components.reward_decoder module¶

Reward decoder component for the agent.

class mtrl.agent.components.reward_decoder.RewardDecoder(feature_dim: int)[source]¶

Bases: mtrl.agent.components.base.Component

Predict reward using the observations.

Parameters: feature_dim (int) – dimension of the feature used to predict the reward.

forward(x: torch.Tensor) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]

training: bool¶

mtrl.agent.components.scripted_soft_modularization module¶

mtrl.agent.components.soft_modularization module¶

Implementation of the soft routing network and MLP described in “Multi-Task Reinforcement Learning with Soft Modularization” Link: https://arxiv.org/abs/2003.13661

class mtrl.agent.components.soft_modularization.RoutingNetwork(in_features: int, hidden_features: int, num_experts_per_layer: int, num_layers: int)[source]¶

Bases: mtrl.agent.components.base.Component

Class to implement the routing network in ‘Multi-Task Reinforcement Learning with Soft Modularization’ paper.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool¶

class mtrl.agent.components.soft_modularization.SoftModularizedMLP(num_experts: int, in_features: int, out_features: int, num_layers: int, hidden_features: int, bias: bool = True)[source]¶

Bases: mtrl.agent.components.base.Component

Class to implement the actor/critic in ‘Multi-Task Reinforcement Learning with Soft Modularization’ paper. It is similar to layers.FeedForward but allows selection of expert at each layer.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool¶

mtrl.agent.components.task_encoder module¶

Component to encode the task.

class mtrl.agent.components.task_encoder.TaskEncoder(pretrained_embedding_cfg: omegaconf.dictconfig.DictConfig, num_embeddings: int, embedding_dim: int, hidden_dim: int, num_layers: int, output_dim: int)[source]¶

Bases: mtrl.agent.components.base.Component

Encode the task into a vector.

Parameters

pretrained_embedding_cfg (ConfigType) – config for using pretrained embeddings.
num_embeddings (int) – number of elements in the embedding table. This is used if pretrained embedding is not used.
embedding_dim (int) – dimension for the embedding. This is used if pretrained embedding is not used.
hidden_dim (int) – dimension of the hidden layer of the trunk.
num_layers (int) – number of layers in the trunk.
output_dim (int) – output dimension of the task encoder.
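To make the role of these parameters concrete, here is a minimal sketch of an embedding table followed by an MLP trunk. `ToyTaskEncoder` is a hypothetical stand-in, not the TaskEncoder class itself. It ignores pretrained_embedding_cfg, and it assumes num_layers counts the hidden layers of the trunk.

```python
import torch
from torch import nn


class ToyTaskEncoder(nn.Module):
    """Hypothetical stand-in: embedding lookup followed by an MLP trunk."""

    def __init__(self, num_embeddings: int, embedding_dim: int,
                 hidden_dim: int, num_layers: int, output_dim: int):
        super().__init__()
        self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        layers = []
        in_dim = embedding_dim
        # Assumption: num_layers hidden layers, then one output projection.
        for _ in range(num_layers):
            layers += [nn.Linear(in_dim, hidden_dim), nn.ReLU()]
            in_dim = hidden_dim
        layers.append(nn.Linear(in_dim, output_dim))
        self.trunk = nn.Sequential(*layers)

    def forward(self, env_index: torch.Tensor) -> torch.Tensor:
        return self.trunk(self.embedding(env_index))


encoder = ToyTaskEncoder(num_embeddings=10, embedding_dim=8,
                         hidden_dim=16, num_layers=2, output_dim=4)
env_index = torch.tensor([0, 3, 9])  # one integer task id per sample
task_encoding = encoder(env_index)   # shape: (3, output_dim)
```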

forward(env_index: torch.Tensor) → torch.Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool¶

mtrl.agent.components.transition_model module¶

Transition dynamics for the agent.

class mtrl.agent.components.transition_model.DeterministicTransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.transition_model.TransitionModel

Deterministic model for predicting the transition dynamics.

Parameters

encoder_feature_dim (int) – size of the input feature.
action_shape (List[int]) – size of the action vector.
layer_width (int) – width for each layer.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

forward(x: torch.Tensor) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]¶

Return the mean and standard deviation of the gaussian distribution that the model predicts for the next state.

Parameters: x (TensorType) – input.
Returns: [mean of gaussian distribution, sigma of gaussian distribution]. Per the signature, sigma is optional and may be None.
Return type: Tuple[TensorType, Optional[TensorType]]

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]
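GradNorm-style training needs the gradient norm of each task loss with respect to the weights of a shared layer, which is what the returned layers are for. The following is a minimal sketch of that computation under made-up assumptions: `shared`, `heads`, and the squared-output losses are hypothetical and do not come from mtrl.

```python
import torch
from torch import nn

torch.manual_seed(0)
shared = nn.Linear(4, 4)                      # stands in for a last shared layer
heads = [nn.Linear(4, 1), nn.Linear(4, 1)]    # one hypothetical head per task

x = torch.randn(8, 4)
features = shared(x)
task_losses = [head(features).pow(2).mean() for head in heads]

grad_norms = []
for loss in task_losses:
    # Gradient of this task's loss w.r.t. the shared weights only.
    (grad,) = torch.autograd.grad(loss, shared.weight, retain_graph=True)
    grad_norms.append(grad.norm())
```

Each entry of `grad_norms` is a scalar tensor, one per task, that a GradNorm-style update could compare across tasks.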

sample_prediction(x: torch.Tensor) → torch.Tensor[source]¶

Sample a possible value of next state from the model.

Parameters: x (TensorType) – input.
Returns: predicted next state.
Return type: TensorType

training: bool¶

class mtrl.agent.components.transition_model.ProbabilisticTransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig, max_sigma: float = 10.0, min_sigma: float = 0.0001)[source]¶

Bases: mtrl.agent.components.transition_model.TransitionModel

Probabilistic model for predicting the transition dynamics.

Parameters

encoder_feature_dim (int) – size of the input feature.
action_shape (List[int]) – size of the action vector.
layer_width (int) – width for each layer.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.
max_sigma (float, optional) – maximum value of sigma (of the learned gaussian distribution). Larger values are clipped to this value. Defaults to 10.0.
min_sigma (float, optional) – minimum value of sigma (of the learned gaussian distribution). Smaller values are clipped to this value. Defaults to 1e-4.
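The effect of the two bounds can be sketched with torch.clamp. This shows only the clipping behavior described above, using the documented defaults. How the model actually produces and bounds sigma internally is not shown here, and `raw_sigma` is a made-up input.

```python
import torch

max_sigma, min_sigma = 10.0, 1e-4  # documented defaults

# Hypothetical unbounded sigma values, e.g. from a network head.
raw_sigma = torch.tensor([-1.0, 0.5, 25.0])

# Values below min_sigma or above max_sigma are clipped to the bounds.
sigma = torch.clamp(raw_sigma, min=min_sigma, max=max_sigma)
```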

forward(x)[source]¶

Return the mean and standard deviation of the gaussian distribution that the model predicts for the next state.

Parameters: x (TensorType) – input.
Returns: [mean of gaussian distribution, sigma of gaussian distribution]
Return type: Tuple[TensorType, TensorType]

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]¶

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns: list of layers.
Return type: List[ModelType]

sample_prediction(x)[source]¶

Sample a possible value of next state from the model.

Parameters: x (TensorType) – input.
Returns: predicted next state.
Return type: TensorType
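Sampling a next state from a predicted gaussian is commonly written as the mean plus sigma-scaled standard normal noise. The sketch below assumes that form. `sample_next_state` is a hypothetical helper, not the method above, and the `mu` and `sigma` tensors are made-up values.

```python
import torch


def sample_next_state(mu: torch.Tensor, sigma: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: draw mu + sigma * eps with eps ~ N(0, I)."""
    eps = torch.randn_like(mu)
    return mu + sigma * eps


torch.manual_seed(0)
mu = torch.zeros(4, 3)                 # predicted mean, batch of 4 states
sigma = torch.full((4, 3), 0.1)        # predicted standard deviation
next_state = sample_next_state(mu, sigma)

# With zero sigma the sample collapses to the mean.
collapsed = sample_next_state(mu, torch.zeros_like(mu))
```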

training: bool¶

class mtrl.agent.components.transition_model.TransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Bases: mtrl.agent.components.base.Component

Model for predicting the transition dynamics.

Parameters

encoder_feature_dim (int) – size of the input feature.
action_shape (List[int]) – size of the action vector.
layer_width (int) – width for each layer.
multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

forward(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]¶

Return the mean and standard deviation of the gaussian distribution that the model predicts for the next state.

Parameters: x (TensorType) – input.
Returns: [mean of gaussian distribution, sigma of gaussian distribution]
Return type: Tuple[TensorType, TensorType]

sample_prediction(x: torch.Tensor) → torch.Tensor[source]¶

Sample a possible value of next state from the model.

Parameters: x (TensorType) – input.
Returns: predicted next state.
Return type: TensorType

training: bool¶

mtrl.agent.components.transition_model.make_transition_model(action_shape: List[int], transition_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]¶

Module contents¶