mtrl.agent.components package

Submodules

mtrl.agent.components.actor module

Actor component for the agent.

class mtrl.agent.components.actor.Actor(env_obs_shape: List[int], action_shape: List[int], hidden_dim: int, num_layers: int, log_std_bounds: Tuple[float, float], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.actor.BaseActor

Actor component for the agent.

Parameters
  • env_obs_shape (List[int]) – shape of the environment observation that the actor gets.

  • action_shape (List[int]) – shape of the action vector that the actor produces.

  • hidden_dim (int) – hidden dimensionality of the actor.

  • num_layers (int) – number of layers in the actor.

  • log_std_bounds (Tuple[float, float]) – bounds used to clip the log of the standard deviation.

  • encoder_cfg (ConfigType) – config for the encoder.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Compute the predictions from the actor.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach_encoder (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

tuple of (mean of the Gaussian, sample from the Gaussian, log-probability of the sample, log of the standard deviation of the Gaussian).

Return type

Tuple[TensorType, TensorType, TensorType, TensorType]
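The four return values match the usual Soft Actor-Critic recipe for a tanh-squashed Gaussian policy. The dependency-free sketch below assumes mtrl follows that recipe (this page does not confirm it): it bounds a raw log standard deviation to log_std_bounds (here by tanh rescaling, one common choice), draws a reparameterized sample, squashes it with tanh, and corrects the log-probability for the squashing. The name squashed_gaussian_sample is hypothetical, and plain lists stand in for tensors.

```python
import math
import random
from typing import List, Tuple


def squashed_gaussian_sample(
    mu: List[float],
    raw_log_std: List[float],
    log_std_bounds: Tuple[float, float],
    rng: random.Random,
) -> Tuple[List[float], List[float], float, List[float]]:
    """Hypothetical helper returning (mean, sample, log_prob, log_std) for one action."""
    log_std_min, log_std_max = log_std_bounds
    # Rescale the unconstrained log-std into [log_std_min, log_std_max] via tanh.
    log_std = [
        log_std_min + 0.5 * (log_std_max - log_std_min) * (math.tanh(s) + 1.0)
        for s in raw_log_std
    ]
    # Reparameterization: pre_tanh = mu + std * eps with eps ~ N(0, 1).
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    pre_tanh = [m + math.exp(ls) * e for m, ls, e in zip(mu, log_std, eps)]
    # Log-density of the diagonal Gaussian at pre_tanh ((x - mu) / std == eps).
    log_prob = sum(
        -0.5 * e * e - ls - 0.5 * math.log(2.0 * math.pi)
        for e, ls in zip(eps, log_std)
    )
    # Squash into (-1, 1) and apply the change-of-variables correction.
    sample = [math.tanh(u) for u in pre_tanh]
    log_prob -= sum(math.log(1.0 - a * a + 1e-6) for a in sample)
    mean = [math.tanh(m) for m in mu]
    return mean, sample, log_prob, log_std
```

The small 1e-6 term keeps the logarithm finite when a sample saturates near ±1.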

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]

make_model(action_shape: List[int], hidden_dim: int, num_layers: int, encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig) → torch.nn.modules.module.Module[source]

Make the model for the actor.

Parameters
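  • action_shape (List[int]) – shape of the action vector that the actor produces.

  • hidden_dim (int) – hidden dimensionality of the actor.

  • num_layers (int) – number of layers in the actor.

  • encoder_cfg (ConfigType) – config for the encoder.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.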

Returns

model for the actor.

Return type

ModelType

training: bool
class mtrl.agent.components.actor.BaseActor(env_obs_shape: List[int], action_shape: List[int], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig, *args, **kwargs)[source]

Bases: mtrl.agent.components.base.Component

Interface for the actor component for the agent.

Parameters
  • env_obs_shape (List[int]) – shape of the environment observation that the actor gets.

  • action_shape (List[int]) – shape of the action vector that the actor produces.

  • encoder_cfg (ConfigType) – config for the encoder.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor, torch.Tensor, torch.Tensor][source]

Compute the predictions from the actor.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach_encoder (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

tuple of (mean of the Gaussian, sample from the Gaussian, log-probability of the sample, log of the standard deviation of the Gaussian).

Return type

Tuple[TensorType, TensorType, TensorType, TensorType]

training: bool
mtrl.agent.components.actor.check_if_should_use_multi_head_policy(multitask_cfg: omegaconf.dictconfig.DictConfig) → bool[source]
mtrl.agent.components.actor.check_if_should_use_task_encoder(multitask_cfg: omegaconf.dictconfig.DictConfig) → bool[source]

mtrl.agent.components.base module

Interface for the agent components.

class mtrl.agent.components.base.Component[source]

Bases: torch.nn.modules.module.Module

Basic component (for building the agent) that every other component should extend.

It inherits from torch.nn.Module.


get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]
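GradNorm (Chen et al., 2018) balances tasks by comparing per-task gradient norms measured at the last shared layers, which is why components expose them here. As a plain-Python sketch of GradNorm's target computation only, independent of how mtrl wires it up: each task's target norm is mean(G) * r_i ** alpha, where G_i are the per-task gradient norms and r_i is the task's loss ratio L_i(t) / L_i(0) divided by the mean ratio. The function names and the default alpha are assumptions for illustration.

```python
from typing import List


def gradnorm_targets(
    grad_norms: List[float], loss_ratios: List[float], alpha: float = 1.5
) -> List[float]:
    """Target gradient norm per task (treated as a constant in GradNorm)."""
    mean_norm = sum(grad_norms) / len(grad_norms)
    mean_ratio = sum(loss_ratios) / len(loss_ratios)
    # Relative inverse training rate: tasks whose loss fell less get r_i > 1.
    inverse_rates = [ratio / mean_ratio for ratio in loss_ratios]
    return [mean_norm * r ** alpha for r in inverse_rates]


def gradnorm_loss(grad_norms: List[float], targets: List[float]) -> float:
    """L1 gap between actual and target norms; its gradient updates the task weights."""
    return sum(abs(g - t) for g, t in zip(grad_norms, targets))
```

When all tasks train at the same rate every target equals the mean norm, so only the spread of the gradient norms is penalized.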

training: bool

mtrl.agent.components.critic module

Critic component for the agent.

class mtrl.agent.components.critic.Critic(env_obs_shape: List[int], action_shape: List[int], hidden_dim: int, num_layers: int, encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.base.Component

Critic component for the agent.

Parameters
  • env_obs_shape (List[int]) – shape of the environment observation that the critic gets.

  • action_shape (List[int]) – shape of the action vector that the critic takes as input.

  • hidden_dim (int) – hidden dimensionality of the critic.

  • num_layers (int) – number of layers in the critic.

  • encoder_cfg (ConfigType) – config for the encoder.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

encode(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Returns

encoding of the observation.

Return type

TensorType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, action: torch.Tensor, detach_encoder: bool = False) → Tuple[torch.Tensor, torch.Tensor][source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]

training: bool
class mtrl.agent.components.critic.QFunction(obs_dim: int, action_dim: int, hidden_dim: int, num_layers: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.base.Component

Q-function implemented as an MLP.

Parameters
  • obs_dim (int) – size of the observation.

  • action_dim (int) – size of the action vector.

  • hidden_dim (int) – size of the hidden layer of the model.

  • num_layers (int) – number of layers in the model.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

build_model(obs_dim: int, action_dim: int, hidden_dim: int, num_layers: int, multitask_cfg: omegaconf.dictconfig.DictConfig) → torch.nn.modules.module.Module[source]

Build the Q-Function.

Parameters
  • obs_dim (int) – size of the observation.

  • action_dim (int) – size of the action vector.

  • hidden_dim (int) – size of the hidden layer of the trunk.

  • num_layers (int) – number of layers in the model.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

Returns

the Q-function model.

Return type

ModelType

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]

training: bool

mtrl.agent.components.decoder module

Decoder component for the agent.

class mtrl.agent.components.decoder.PixelDecoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int = 2, num_filters: int = 32)[source]

Bases: mtrl.agent.components.base.Component

Convolutional decoder for pixel observations.

Parameters
  • env_obs_shape (List[int]) – shape of the environment observation.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

  • feature_dim (int) – feature dimension.

  • num_layers (int, optional) – number of layers. Defaults to 2.

  • num_filters (int, optional) – number of conv filters per layer. Defaults to 32.

forward(h: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]

training: bool
mtrl.agent.components.decoder.make_decoder(env_obs_shape: List[int], decoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

mtrl.agent.components.encoder module

Encoder component for the agent.

class mtrl.agent.components.encoder.Encoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, *args, **kwargs)[source]

Bases: mtrl.agent.components.base.Component

Interface for the encoder component of the agent.

Parameters
  • env_obs_shape (List[int]) – shape of the observation that the encoder receives.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder) → None[source]

Copy convolutional weights from the source encoder.

The base-class implementation is a no-op; only encoders that use convolutional layers need to override it.

Parameters

source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False) → torch.Tensor[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

training: bool
class mtrl.agent.components.encoder.FeedForwardEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int, hidden_dim: int, should_tie_encoders: bool)[source]

Bases: mtrl.agent.components.encoder.Encoder

Feedforward encoder for state observations.

Parameters
  • env_obs_shape (List[int]) – shape of the observation that the encoder receives.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

  • feature_dim (int) – feature dimension.

  • num_layers (int) – number of layers.

  • hidden_dim (int) – dimension of the hidden layers.

  • should_tie_encoders (bool) – whether to tie the weights of the feed-forward layers.

copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder)[source]

Copy convolutional weights from the source encoder.

The base-class implementation is a no-op; only encoders that use convolutional layers need to override it.

Parameters

source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

training: bool
class mtrl.agent.components.encoder.FiLM(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int, hidden_dim: int, should_tie_encoders: bool)[source]

Bases: mtrl.agent.components.encoder.FeedForwardEncoder

Feedforward encoder for state observations, conditioned on the task via feature-wise linear modulation (FiLM).

Parameters
  • env_obs_shape (List[int]) – shape of the observation that the encoder receives.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

  • feature_dim (int) – feature dimension.

  • num_layers (int) – number of layers.

  • hidden_dim (int) – dimension of the hidden layers.

  • should_tie_encoders (bool) – whether to tie the weights of the feed-forward layers.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

training: bool
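FiLM (feature-wise linear modulation) conditions a network on side information by scaling and shifting each hidden feature: h' = gamma * h + beta, where gamma and beta are predicted from a conditioning input. The plain-Python sketch below assumes the conditioning input is a task encoding and that gamma and beta come from simple linear maps; both assumptions and all function names are illustrative, not mtrl's implementation.

```python
from typing import List, Tuple


def linear(x: List[float], weight: List[List[float]], bias: List[float]) -> List[float]:
    # weight has shape (out_features, in_features).
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def film(hidden: List[float], gamma: List[float], beta: List[float]) -> List[float]:
    # Feature-wise affine modulation: each hidden unit gets its own scale and shift.
    return [g * h + b for h, g, b in zip(hidden, gamma, beta)]


def film_layer(
    hidden: List[float],
    task_encoding: List[float],
    gamma_params: Tuple[List[List[float]], List[float]],
    beta_params: Tuple[List[List[float]], List[float]],
) -> List[float]:
    """Predict (gamma, beta) from the task encoding, then modulate the hidden features."""
    gamma = linear(task_encoding, *gamma_params)
    beta = linear(task_encoding, *beta_params)
    return film(hidden, gamma, beta)
```

With gamma fixed at 1 and beta at 0 the layer is the identity, so the modulation only adds task-specific behavior on top of the shared feedforward features.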
class mtrl.agent.components.encoder.IdentityEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int)[source]

Bases: mtrl.agent.components.encoder.Encoder

Identity encoder that does not perform any operations.

Parameters
  • env_obs_shape (List[int]) – shape of the observation that the encoder receives.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

  • feature_dim (int) – feature dimension.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

training: bool
class mtrl.agent.components.encoder.MixtureofExpertsEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, encoder_cfg: omegaconf.dictconfig.DictConfig, task_id_to_encoder_id_cfg: omegaconf.dictconfig.DictConfig, num_experts: int)[source]

Bases: mtrl.agent.components.encoder.Encoder

Mixture of Experts based encoder.

Parameters
  • env_obs_shape (List[int]) – shape of the observation that the encoder receives.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

  • encoder_cfg (ConfigType) – config for the experts in the mixture.

  • task_id_to_encoder_id_cfg (ConfigType) – mapping between the tasks and the encoders.

  • num_experts (int) – number of experts.

copy_conv_weights_from(source)[source]

Copy convolutional weights from the source encoder.

The base-class implementation is a no-op; only encoders that use convolutional layers need to override it.

Parameters

source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

training: bool
class mtrl.agent.components.encoder.PixelEncoder(env_obs_shape: List[int], multitask_cfg: omegaconf.dictconfig.DictConfig, feature_dim: int, num_layers: int = 2, num_filters: int = 32)[source]

Bases: mtrl.agent.components.encoder.Encoder

Convolutional encoder for pixel observations.

Parameters
  • env_obs_shape (List[int]) – shape of the observation that the encoder receives.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

  • feature_dim (int) – feature dimension.

  • num_layers (int, optional) – number of layers. Defaults to 2.

  • num_filters (int, optional) – number of conv filters per layer. Defaults to 32.

copy_conv_weights_from(source: mtrl.agent.components.encoder.Encoder)[source]

Copy convolutional weights from the source encoder.

The base-class implementation is a no-op; only encoders that use convolutional layers need to override it.

Parameters

source (Encoder) – encoder to copy weights from.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs, detach: bool = False)[source]

Encode the input observation.

Parameters
  • mtobs (MTObs) – multi-task observation.

  • detach (bool, optional) – whether to detach the observation encoding from the computation graph. Defaults to False.

Raises

NotImplementedError

Returns

encoding of the observation.

Return type

TensorType

forward_conv(env_obs: torch.Tensor) → torch.Tensor[source]

Encode the environment observation using the convolutional layers.

Parameters

env_obs (TensorType) – observation from the environment.

Returns

encoding of the observation.

Return type

TensorType

reparameterize(mu: torch.Tensor, logstd: torch.Tensor) → torch.Tensor[source]

Sample from the Gaussian using the reparameterization trick.

Parameters
  • mu (TensorType) – mean of the gaussian.

  • logstd (TensorType) – log of standard deviation of the gaussian.

Returns

sample from the gaussian.

Return type

TensorType
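A minimal plain-Python sketch of the reparameterization trick, assuming reparameterize uses the standard form z = mu + exp(logstd) * eps with eps ~ N(0, 1); the explicit rng argument is an illustrative addition. Because the noise is drawn independently of mu and logstd, the sample is a deterministic, differentiable function of both, which is what lets gradients flow through the sampling step.

```python
import math
import random
from typing import List


def reparameterize(mu: List[float], logstd: List[float], rng: random.Random) -> List[float]:
    """Draw z = mu + std * eps, with eps ~ N(0, 1) sampled independently of mu and logstd."""
    std = [math.exp(s) for s in logstd]
    eps = [rng.gauss(0.0, 1.0) for _ in mu]
    return [m + s * e for m, s, e in zip(mu, std, eps)]
```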

training: bool
mtrl.agent.components.encoder.make_encoder(env_obs_shape: List[int], encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]
mtrl.agent.components.encoder.tie_weights(src, trg)[source]
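tie_weights is undocumented here. In SAC-AE-style code, tying usually means the target layer reuses the source layer's parameter objects, so both layers read and update the same storage. A plain-Python sketch of that idea, assuming tie_weights follows this pattern; the Layer class is a hypothetical stand-in for a torch layer with weight and bias attributes.

```python
from typing import List


class Layer:
    """Hypothetical stand-in for a layer holding weight and bias parameters."""

    def __init__(self, weight: List[float], bias: List[float]):
        self.weight = weight
        self.bias = bias


def tie_weights(src: Layer, trg: Layer) -> None:
    # Share, rather than copy, the parameters: both layers now hold the same objects.
    assert type(src) is type(trg), "only layers of the same type can be tied"
    trg.weight = src.weight
    trg.bias = src.bias
```

Because the objects are shared, an update made through either layer is immediately visible through the other.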

mtrl.agent.components.hipbmdp_theta module

Implementation of the theta component described in “Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP” Link: https://arxiv.org/abs/2007.07206

class mtrl.agent.components.hipbmdp_theta.ThetaModel(dim: int, output_dim: int, num_envs: int, train_env_id: List[str])[source]

Bases: mtrl.agent.components.base.Component

Implementation of the theta component described in “Multi-Task Reinforcement Learning as a Hidden-Parameter Block MDP” Link: https://arxiv.org/abs/2007.07206

Parameters
  • dim (int) – input dimension.

  • output_dim (int) – output dimension.

  • num_envs (int) – number of environments.

  • train_env_id (List[str]) – identifiers of the environments that correspond to training tasks. Some strategies for sampling theta need this information.

forward(env_index: torch.Tensor, theta_sampling_strategy: str, modes: List[str]) → torch.Tensor[source]

Sample theta.

The following strategies are supported:

  • embedding - use an embedding layer and index into it using the task index. This is the default strategy and is used during training and when testing on in-distribution environments.

  • zero - set theta as a tensor of zeros.

  • mean - use an embedding layer and set theta as the mean of all the embeddings.

  • mean_train - use an embedding layer and set theta as the mean of all the embeddings that were trained.

Parameters
  • env_index (TensorType) – indices of the environments (tasks).

  • theta_sampling_strategy (str) – strategy to sample theta.

  • modes (List[str]) – List of train/eval/… modes.

Returns

sampled theta.

Return type

TensorType

training: bool
class mtrl.agent.components.hipbmdp_theta.ThetaSamplingStrategy(value)[source]

Bases: enum.Enum

Different strategies for sampling theta values.

  • embedding - use an embedding layer and index into it using the task index.

  • zero - set theta as a tensor of zeros.

  • mean - use an embedding layer and set theta as the mean of all the embeddings.

  • mean_train - use an embedding layer and set theta as the mean of all the embeddings that were trained.

EMBEDDING = 'embedding'
MEAN = 'mean'
MEAN_TRAIN = 'mean_train'
ZERO = 'zero'
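The four strategies can be sketched over a plain Python embedding table with one row per environment. In this dependency-free illustration, sample_theta and its train_indices argument are hypothetical, and ThetaStrategy simply mirrors the enum values listed above.

```python
from enum import Enum
from typing import List


class ThetaStrategy(Enum):
    # Mirrors the values of ThetaSamplingStrategy.
    EMBEDDING = "embedding"
    MEAN = "mean"
    MEAN_TRAIN = "mean_train"
    ZERO = "zero"


def _mean_rows(rows: List[List[float]]) -> List[float]:
    return [sum(col) / len(rows) for col in zip(*rows)]


def sample_theta(
    table: List[List[float]],
    env_index: List[int],
    strategy: ThetaStrategy,
    train_indices: List[int],
) -> List[List[float]]:
    """Return one theta vector per entry in env_index (hypothetical helper)."""
    dim = len(table[0])
    if strategy is ThetaStrategy.EMBEDDING:
        # Index into the table with each task index.
        return [list(table[i]) for i in env_index]
    if strategy is ThetaStrategy.ZERO:
        theta = [0.0] * dim
    elif strategy is ThetaStrategy.MEAN:
        # Mean over all embeddings, including environments never trained on.
        theta = _mean_rows(table)
    else:
        # MEAN_TRAIN: mean over only the embeddings of training environments.
        theta = _mean_rows([table[i] for i in train_indices])
    return [list(theta) for _ in env_index]
```

Only embedding depends on the task index; the other three give every environment the same theta, which is what makes them usable for environments without a trained embedding.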

mtrl.agent.components.moe_layer module

Layers for parallelizing computation with mixture of experts.

A mixture of experts (models) can easily be simulated by maintaining a list of models and iterating over them. However, this can be slow in practice. We provide some additional modules that make it easier to create a mixture of experts without slowing down training/inference.

class mtrl.agent.components.moe_layer.AttentionBasedExperts(num_tasks: int, num_experts: int, embedding_dim: int, hidden_dim: int, num_layers: int, temperature: bool, should_use_soft_attention: bool, task_encoder_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig, topk: Optional[int] = None)[source]

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Class for interfacing with a mixture of experts.

Parameters

multitask_cfg (ConfigType) – config for multitask training.

forward(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class mtrl.agent.components.moe_layer.ClusterOfExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig, env_name: str, task_description: Dict[str, str], ordered_task_list: List[str], mapping_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Map the i-th task to a subset (cluster) of experts.

Parameters
  • num_tasks (int) – number of tasks.

  • num_experts (int) – number of experts in the mixture of experts.

  • num_eval_episodes (int) – number of episodes run during evaluation.

  • batch_size (int) – batch size for update.

  • multitask_cfg (ConfigType) – config for multitask training.

  • env_name (str) – name of the environment. This is used with the mapping configuration.

  • task_description (Dict[str, str]) – dictionary mapping task names to descriptions.

  • ordered_task_list (List[str]) – ordered list of tasks. This is needed because the task description is not always ordered.

  • mapping_cfg (ConfigType) – config for mapping the tasks to subset of experts.

training: bool
class mtrl.agent.components.moe_layer.EnsembleOfExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Ensemble of all the experts.

Parameters
  • num_tasks (int) – number of tasks.

  • num_experts (int) – number of experts in the mixture of experts.

  • num_eval_episodes (int) – number of episodes run during evaluation.

  • batch_size (int) – batch size for update.

  • multitask_cfg (ConfigType) – config for multitask training.

training: bool
class mtrl.agent.components.moe_layer.FeedForward(num_experts: int, in_features: int, out_features: int, num_layers: int, hidden_features: int, bias: bool = True)[source]

Bases: torch.nn.modules.module.Module

A feedforward model of mixture of experts layers.

Parameters
  • num_experts (int) – number of experts in the mixture.

  • in_features (int) – size of each input sample for one expert.

  • out_features (int) – size of each output sample for one expert.

  • num_layers (int) – number of layers in the feedforward network.

  • hidden_features (int) – dimensionality of hidden layer in the feedforward network.

  • bias (bool, optional) – if set to False, the layer will not learn an additive bias. Defaults to True.

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class mtrl.agent.components.moe_layer.Linear(num_experts: int, in_features: int, out_features: int, bias: bool = True)[source]

Bases: torch.nn.modules.module.Module

torch.nn.Linear layer extended for use as a mixture of experts.

Parameters
  • num_experts (int) – number of experts in the mixture.

  • in_features (int) – size of each input sample for one expert.

  • out_features (int) – size of each output sample for one expert.

  • bias (bool, optional) – if set to False, the layer will not learn an additive bias. Defaults to True.

extra_repr() → str[source]

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
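One way to extend a linear layer to a mixture of experts is to stack one weight matrix per expert and apply all of them in a single batched matrix multiply, instead of looping over separate nn.Linear modules. The plain-Python sketch below assumes inputs shaped (num_experts, batch, in_features) and weights shaped (num_experts, in_features, out_features); these shapes are an assumption for illustration, not a statement of mtrl's layout. In PyTorch the explicit loops would collapse into one torch.bmm (or matmul) call.

```python
from typing import List

Matrix = List[List[float]]


def moe_linear(x: List[Matrix], weight: List[Matrix], bias: List[List[float]]) -> List[Matrix]:
    """Apply expert e's (in_features x out_features) weight and bias to expert e's batch."""
    assert len(x) == len(weight) == len(bias), "one input batch, weight and bias per expert"
    out = []
    for xe, we, be in zip(x, weight, bias):
        # xe: batch x in_features, we: in_features x out_features, be: out_features
        rows = []
        for row in xe:
            rows.append([
                sum(row[i] * we[i][j] for i in range(len(row))) + be[j]
                for j in range(len(be))
            ])
        out.append(rows)
    return out
```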
class mtrl.agent.components.moe_layer.MaskCache(num_tasks: int, num_eval_episodes: int, batch_size: int, task_index_to_mask: torch.Tensor)[source]

Bases: object

In multitask learning, using a mixture of models, different tasks can be mapped to different combinations of models. This utility class caches these mappings so that they do not have to be re-evaluated.

For example, when the model is training over 10 tasks, and the tasks are always ordered, the mapping of task index to encoder indices will be the same and need not be recomputed. We take a very simple approach here: cache using the number of tasks, since in our case, the task ordering during training and evaluation does not change. In more complex cases, a mode (train/eval..) based key could be used.

This gets a little trickier during evaluation. We assume that we are running multiple evaluation episodes (per task) at once, so during evaluation the agent performs inference over num_tasks * num_eval_episodes observations at once.

We have to be careful not to cache the mapping during updates, because neither the task distribution nor the task ordering is pre-determined during an update. So we explicitly exclude batch_size from the keys that are cached.

Parameters
  • num_tasks (int) – number of tasks.

  • num_eval_episodes (int) – number of episodes run during evaluation.

  • batch_size (int) – batch size for update.

  • task_index_to_mask (TensorType) – mapping of task index to mask.

get_mask(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor[source]

Get the mask corresponding to a given task info.

Parameters

task_info (TaskInfo) – task information for the current batch.

Returns

encoder mask.

Return type

TensorType
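The caching policy described above can be sketched with a dictionary keyed on the number of rows in the incoming batch. In this dependency-free sketch (the class name, the task_indices argument and the num_computations counter are hypothetical), masks for batches of size num_tasks or num_tasks * num_eval_episodes are cached, while batches of size batch_size are always recomputed because their task order is not fixed.

```python
from typing import Dict, List


class SimpleMaskCache:
    """Hypothetical sketch of MaskCache's keying scheme; masks are lists of 0/1 rows."""

    def __init__(self, num_tasks: int, num_eval_episodes: int, batch_size: int,
                 task_index_to_mask: List[List[int]]):
        self.task_index_to_mask = task_index_to_mask
        # Batch sizes whose task ordering is fixed and therefore safe to cache.
        self.cacheable_sizes = {num_tasks, num_tasks * num_eval_episodes} - {batch_size}
        self._cache: Dict[int, List[List[int]]] = {}
        self.num_computations = 0  # counts how often a mask was actually built

    def get_mask(self, task_indices: List[int]) -> List[List[int]]:
        key = len(task_indices)
        if key in self._cache:
            return self._cache[key]
        self.num_computations += 1
        mask = [list(self.task_index_to_mask[i]) for i in task_indices]
        if key in self.cacheable_sizes:
            self._cache[key] = mask
        return mask
```

Keying on batch size alone is only safe under the assumption stated above that task ordering never changes for those sizes.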

class mtrl.agent.components.moe_layer.MixtureOfExperts(multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: torch.nn.modules.module.Module

Class for interfacing with a mixture of experts.

Parameters

multitask_cfg (ConfigType) – config for multitask training.

forward(task_info: mtrl.agent.ds.task_info.TaskInfo) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class mtrl.agent.components.moe_layer.OneToOneExperts(num_tasks: int, num_experts: int, num_eval_episodes: int, batch_size: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.moe_layer.MixtureOfExperts

Map the output of the i-th expert to the i-th task.

Parameters
  • num_tasks (int) – number of tasks.

  • num_experts (int) – number of experts in the mixture of experts.

  • num_eval_episodes (int) – number of episodes run during evaluation.

  • batch_size (int) – batch size for update.

  • multitask_cfg (ConfigType) – config for multitask training.

mask_cache: mtrl.agent.components.moe_layer.MaskCache
training: bool
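The concrete mixtures above differ mainly in which experts each task is routed to. A minimal sketch of the corresponding task-to-expert masks as nested lists: one-to-one pairs task i with expert i (an identity matrix, assuming num_tasks == num_experts), the ensemble uses every expert for every task, and the cluster variant uses a task-specific subset. The function names and the clusters argument (standing in for whatever mapping_cfg encodes) are hypothetical.

```python
from typing import List


def one_to_one_mask(num_tasks: int, num_experts: int) -> List[List[int]]:
    # Task i is served only by expert i.
    assert num_tasks == num_experts, "one-to-one mapping needs one expert per task"
    return [[int(i == j) for j in range(num_experts)] for i in range(num_tasks)]


def ensemble_mask(num_tasks: int, num_experts: int) -> List[List[int]]:
    # Every task uses every expert.
    return [[1] * num_experts for _ in range(num_tasks)]


def cluster_mask(clusters: List[List[int]], num_experts: int) -> List[List[int]]:
    # clusters[i] lists the experts assigned to task i (hypothetical format).
    return [[int(j in experts) for j in range(num_experts)] for experts in clusters]
```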

mtrl.agent.components.reward_decoder module

Reward decoder component for the agent.

class mtrl.agent.components.reward_decoder.RewardDecoder(feature_dim: int)[source]

Bases: mtrl.agent.components.base.Component

Predict reward using the observations.

Parameters

feature_dim (int) – dimension of the feature used to predict the reward.

forward(x: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of last layers (for different sub-components) that are shared across tasks.

This method should be implemented by the subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]

training: bool

mtrl.agent.components.scripted_soft_modularization module

mtrl.agent.components.soft_modularization module

Implementation of the soft routing network and MLP described in “Multi-Task Reinforcement Learning with Soft Modularization” Link: https://arxiv.org/abs/2003.13661

class mtrl.agent.components.soft_modularization.RoutingNetwork(in_features: int, hidden_features: int, num_experts_per_layer: int, num_layers: int)[source]

Bases: mtrl.agent.components.base.Component

Class to implement the routing network in the ‘Multi-Task Reinforcement Learning with Soft Modularization’ paper.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool
class mtrl.agent.components.soft_modularization.SoftModularizedMLP(num_experts: int, in_features: int, out_features: int, num_layers: int, hidden_features: int, bias: bool = True)[source]

Bases: mtrl.agent.components.base.Component

Class to implement the actor/critic in the ‘Multi-Task Reinforcement Learning with Soft Modularization’ paper. It is similar to moe_layer.FeedForward but allows the selection of experts at each layer.

forward(mtobs: mtrl.agent.ds.mt_obs.MTObs) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

mtrl.agent.components.task_encoder module

Component to encode the task.

class mtrl.agent.components.task_encoder.TaskEncoder(pretrained_embedding_cfg: omegaconf.dictconfig.DictConfig, num_embeddings: int, embedding_dim: int, hidden_dim: int, num_layers: int, output_dim: int)[source]

Bases: mtrl.agent.components.base.Component

Encode the task into a vector.

Parameters
  • pretrained_embedding_cfg (ConfigType) – config for using pretrained embeddings.

  • num_embeddings (int) – number of elements in the embedding table. Used only when pretrained embeddings are not used.

  • embedding_dim (int) – dimension of the embedding. Used only when pretrained embeddings are not used.

  • hidden_dim (int) – dimension of the hidden layer of the trunk.

  • num_layers (int) – number of layers in the trunk.

  • output_dim (int) – output dimension of the task encoder.
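A minimal sketch of a task encoder with these parameters: an embedding lookup by task index followed by an MLP trunk. TaskEncoderSketch is hypothetical; in particular, its pretrained path takes a raw tensor instead of pretrained_embedding_cfg, and the layout of the trunk is an assumption.

```python
from typing import Optional

import torch
from torch import nn


class TaskEncoderSketch(nn.Module):
    """Hypothetical task encoder: embedding lookup followed by an MLP trunk."""

    def __init__(self, num_embeddings: int, embedding_dim: int, hidden_dim: int,
                 num_layers: int, output_dim: int,
                 pretrained: Optional[torch.Tensor] = None):
        super().__init__()
        if pretrained is not None:
            # Frozen pretrained table; its width overrides embedding_dim.
            self.embedding = nn.Embedding.from_pretrained(pretrained, freeze=True)
            embedding_dim = pretrained.shape[1]
        else:
            self.embedding = nn.Embedding(num_embeddings, embedding_dim)
        dims = [embedding_dim] + [hidden_dim] * num_layers
        layers = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            layers += [nn.Linear(d_in, d_out), nn.ReLU()]
        layers.append(nn.Linear(dims[-1], output_dim))
        self.trunk = nn.Sequential(*layers)

    def forward(self, env_index: torch.Tensor) -> torch.Tensor:
        # env_index: integer task ids of shape [batch]; returns [batch, output_dim].
        return self.trunk(self.embedding(env_index))


torch.manual_seed(0)
encoder = TaskEncoderSketch(num_embeddings=10, embedding_dim=16, hidden_dim=32, num_layers=2, output_dim=8)
codes = encoder(torch.tensor([0, 3, 3]))
print(codes.shape)  # torch.Size([3, 8])
```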

forward(env_index: torch.Tensor) → torch.Tensor[source]

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

training: bool

mtrl.agent.components.transition_model module

Transition dynamics for the agent.

class mtrl.agent.components.transition_model.DeterministicTransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.transition_model.TransitionModel

Deterministic model for predicting the transition dynamics.

Parameters
  • encoder_feature_dim (int) – size of the input feature.

  • action_shape (List[int]) – size of the action vector.

  • layer_width (int) – width for each layer.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

forward(x: torch.Tensor) → Tuple[torch.Tensor, Optional[torch.Tensor]][source]
Return the mean and standard deviation of the gaussian distribution that the model predicts for the next state.

Parameters

x (TensorType) – input.

Returns

[mean of gaussian distribution, sigma of gaussian distribution]

Return type

Tuple[TensorType, Optional[TensorType]]

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of the last layers (of the different sub-components) that are shared across tasks.

This method should be implemented by subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]

sample_prediction(x: torch.Tensor) → torch.Tensor[source]

Sample a possible value of next state from the model.

Parameters

x (TensorType) – input.

Returns

predicted next state.

Return type

TensorType
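A minimal sketch of a deterministic transition model, assuming the input x is the encoded state concatenated with the action (plausible given the constructor's encoder_feature_dim and action_shape, but not stated on this page). DeterministicTransitionSketch is hypothetical and ignores multitask_cfg; returning None for sigma is one way to honour the Optional in the forward signature.

```python
from typing import List, Optional, Tuple

import torch
from torch import nn


class DeterministicTransitionSketch(nn.Module):
    """Hypothetical deterministic dynamics model over encoded states."""

    def __init__(self, encoder_feature_dim: int, action_shape: List[int], layer_width: int):
        super().__init__()
        # Assumed input layout: [encoded state, action] concatenated along the last dim.
        self.net = nn.Sequential(
            nn.Linear(encoder_feature_dim + action_shape[0], layer_width),
            nn.LayerNorm(layer_width),
            nn.ReLU(),
            nn.Linear(layer_width, encoder_feature_dim),
        )

    def forward(self, x: torch.Tensor) -> Tuple[torch.Tensor, Optional[torch.Tensor]]:
        mu = self.net(x)
        return mu, None  # a deterministic model has no sigma to report

    def sample_prediction(self, x: torch.Tensor) -> torch.Tensor:
        mu, _ = self(x)
        return mu  # "sampling" a point prediction just returns the mean


torch.manual_seed(0)
model = DeterministicTransitionSketch(encoder_feature_dim=50, action_shape=[4], layer_width=64)
state, action = torch.randn(2, 50), torch.randn(2, 4)
x = torch.cat([state, action], dim=-1)
mu, sigma = model(x)
print(mu.shape, sigma)  # torch.Size([2, 50]) None
```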

training: bool

class mtrl.agent.components.transition_model.ProbabilisticTransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig, max_sigma: float = 10.0, min_sigma: float = 0.0001)[source]

Bases: mtrl.agent.components.transition_model.TransitionModel

Probabilistic model for predicting the transition dynamics.

Parameters
  • encoder_feature_dim (int) – size of the input feature.

  • action_shape (List[int]) – size of the action vector.

  • layer_width (int) – width for each layer.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

  • max_sigma (float, optional) – maximum value of sigma (of the learned gaussian distribution). Larger values are clipped to this value. Defaults to 1e1.

  • min_sigma (float, optional) – minimum value of sigma (of the learned gaussian distribution). Smaller values are clipped to this value. Defaults to 1e-4.
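The sigma bounds can be enforced in more than one way. The sketch below assumes a rescaled sigmoid, which keeps sigma inside [min_sigma, max_sigma] while keeping gradients non-zero; a literal torch.clamp would match the word ‘clipped’ more closely but passes zero gradient once a value is out of range. bound_sigma is a hypothetical helper, not part of mtrl.

```python
import torch


def bound_sigma(raw: torch.Tensor, min_sigma: float = 1e-4, max_sigma: float = 10.0) -> torch.Tensor:
    """Squash unconstrained outputs into [min_sigma, max_sigma] with a rescaled sigmoid."""
    assert max_sigma >= min_sigma
    return min_sigma + (max_sigma - min_sigma) * torch.sigmoid(raw)


raw = torch.tensor([-1e4, 0.0, 1e4])  # very negative, neutral and very positive network outputs
sigma = bound_sigma(raw)
print(sigma)  # values close to 1e-4, 5.0 and 10.0
```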

forward(x)[source]
Return the mean and standard deviation of the gaussian distribution that the model predicts for the next state.

Parameters

x (TensorType) – input.

Returns

[mean of gaussian distribution, sigma of gaussian distribution]

Return type

Tuple[TensorType, TensorType]

get_last_shared_layers() → List[torch.nn.modules.module.Module][source]

Get the list of the last layers (of the different sub-components) that are shared across tasks.

This method should be implemented by subclasses if the component is to be trained with the GradNorm algorithm.

Returns

list of layers.

Return type

List[ModelType]

sample_prediction(x)[source]

Sample a possible value of next state from the model.

Parameters

x (TensorType) – input.

Returns

predicted next state.

Return type

TensorType
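Sampling from a predicted gaussian is commonly done with the reparameterization trick, next_state = mu + sigma * eps with eps drawn from N(0, I), so that gradients flow back into mu and sigma. The sketch below is a hypothetical helper (sample_next_state) that also treats a None sigma as a deterministic prediction; it is not claimed to be mtrl's implementation.

```python
from typing import Optional

import torch


def sample_next_state(mu: torch.Tensor, sigma: Optional[torch.Tensor]) -> torch.Tensor:
    """Draw one next-state sample from N(mu, sigma^2) via the reparameterization trick."""
    if sigma is None:
        return mu  # deterministic model: the mean is the prediction
    eps = torch.randn_like(sigma)  # standard normal noise with the same shape as sigma
    return mu + sigma * eps  # differentiable w.r.t. both mu and sigma


torch.manual_seed(0)
mu = torch.full((100_000,), 2.0, requires_grad=True)
sigma = torch.full((100_000,), 0.5)
samples = sample_next_state(mu, sigma)
print(samples.mean().item(), samples.std().item())  # close to 2.0 and 0.5
samples.sum().backward()  # gradients reach mu through the sample
```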

training: bool

class mtrl.agent.components.transition_model.TransitionModel(encoder_feature_dim: int, action_shape: List[int], layer_width: int, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]

Bases: mtrl.agent.components.base.Component

Model for predicting the transition dynamics.

Parameters
  • encoder_feature_dim (int) – size of the input feature.

  • action_shape (List[int]) – size of the action vector.

  • layer_width (int) – width for each layer.

  • multitask_cfg (ConfigType) – config for encoding the multitask knowledge.

forward(x: torch.Tensor) → Tuple[torch.Tensor, torch.Tensor][source]
Return the mean and standard deviation of the gaussian distribution that the model predicts for the next state.

Parameters

x (TensorType) – input.

Returns

[mean of gaussian distribution, sigma of gaussian distribution]

Return type

Tuple[TensorType, TensorType]

sample_prediction(x: torch.Tensor) → torch.Tensor[source]

Sample a possible value of next state from the model.

Parameters

x (TensorType) – input.

Returns

predicted next state.

Return type

TensorType

training: bool
mtrl.agent.components.transition_model.make_transition_model(action_shape: List[int], transition_cfg: omegaconf.dictconfig.DictConfig, multitask_cfg: omegaconf.dictconfig.DictConfig)[source]
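make_transition_model has no description on this page. A common pattern for such a factory is to read a model type from the config and dispatch to the matching class. The sketch below assumes that pattern: the model_type key, the registry and the stub dataclasses are all hypothetical, a plain dict stands in for the omegaconf DictConfig, and multitask_cfg is omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Type


@dataclass
class DeterministicStub:  # hypothetical stand-in for DeterministicTransitionModel
    encoder_feature_dim: int
    action_shape: List[int]
    layer_width: int


@dataclass
class ProbabilisticStub:  # hypothetical stand-in for ProbabilisticTransitionModel
    encoder_feature_dim: int
    action_shape: List[int]
    layer_width: int
    max_sigma: float = 10.0
    min_sigma: float = 1e-4


# Hypothetical mapping from a config string to the class to build.
MODEL_REGISTRY: Dict[str, Type] = {
    "deterministic": DeterministicStub,
    "probabilistic": ProbabilisticStub,
}


def make_transition_model_sketch(action_shape: List[int], transition_cfg: Dict[str, Any]):
    cfg = dict(transition_cfg)  # copy so the caller's config is not mutated
    model_type = cfg.pop("model_type")  # hypothetical key name
    if model_type not in MODEL_REGISTRY:
        raise ValueError(f"unknown transition model type: {model_type!r}")
    return MODEL_REGISTRY[model_type](action_shape=action_shape, **cfg)


model = make_transition_model_sketch(
    action_shape=[4],
    transition_cfg={"model_type": "probabilistic", "encoder_feature_dim": 50, "layer_width": 512},
)
print(model)
```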

Module contents