rlskyjo.models package


rlskyjo.models.action_mask_model module

class rlskyjo.models.action_mask_model.TorchActionMaskModel(obs_space, action_space, num_outputs, model_config, name, **kwargs)

Bases: ray.rllib.models.torch.torch_modelv2.TorchModelV2, torch.nn.modules.module.Module

PyTorch version

Model that handles simple discrete action masking. This assumes the outputs are logits for a single Categorical action dist. Getting this to work with a more complex output (e.g., if the action space is a tuple of several distributions) is also possible but left as an exercise to the reader.
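The masking idea described above can be sketched without any RL framework. The snippet below is a minimal NumPy illustration, not the model's actual implementation: the helper names `mask_logits` and `softmax` are hypothetical, and using the smallest float32 as a stand-in for minus infinity is an assumption made here to keep the masked logits finite.

```python
import numpy as np

# Assumption: the smallest float32 stands in for "minus infinity" so that
# masked logits remain finite numbers.
FLOAT_MIN = np.finfo(np.float32).min


def mask_logits(logits: np.ndarray, action_mask: np.ndarray) -> np.ndarray:
    """Hypothetical helper: push logits of inadmissible actions (mask == 0) down."""
    with np.errstate(divide="ignore"):  # log(0) = -inf is intended here
        inf_mask = np.maximum(np.log(action_mask.astype(np.float64)), FLOAT_MIN)
    # log(1) = 0 leaves admissible logits unchanged.
    return logits + inf_mask


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = np.exp(x - x.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)


logits = np.array([[1.0, 2.0, 0.5, -1.0]])   # shape [BATCH, num_outputs]
action_mask = np.array([[1, 0, 1, 0]])       # 1 = admissible, 0 = masked out
probs = softmax(mask_logits(logits, action_mask))
```

After masking, the Categorical distribution built from these logits assigns zero probability to the masked actions, so sampling can only return admissible ones.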

forward(input_dict, state, seq_lens)

Call the model with the given input tensors and state.

Any complex observations (dicts, tuples, etc.) will be unpacked by __call__ before being passed to forward(). To access the flattened observation tensor, refer to input_dict["obs_flat"].

This method can be called any number of times. In eager execution, each call to forward() will eagerly evaluate the model. In symbolic execution, each call to forward creates a computation graph that operates over the variables of this model (i.e., shares weights).

Custom models should override this instead of __call__.

Parameters:

input_dict (dict): dictionary of input tensors, including "obs", "obs_flat", "prev_action", "prev_reward", "is_training", "eps_id", "agent_id", "infos", and "t".

state (list): list of state tensors with sizes matching those returned by get_initial_state + the batch dimension

seq_lens (Tensor): 1d tensor holding input sequence lengths

Returns:

(outputs, state): The model output tensor of size [BATCH, num_outputs], and the new RNN state.

>>> def forward(self, input_dict, state, seq_lens):
...     model_out, self._value_out = self.base_model(
...         input_dict["obs"])
...     return model_out, state
training: bool

value_function()

Returns the value function output for the most recent forward pass.

Note that a forward call has to be performed first, before this method can return anything. Calling this method therefore does not cause an extra forward pass through the network.

Returns:

value estimate tensor of shape [BATCH].
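The forward-then-value ordering can be illustrated with a small framework-free sketch. `CachedValueModel` is a hypothetical stand-in written for this example, not an RLlib class; it only assumes that the value head is computed inside forward() and cached for a later value_function() call.

```python
import numpy as np


class CachedValueModel:
    """Hypothetical stand-in (not an RLlib class) showing the caching pattern."""

    def __init__(self, obs_dim: int, num_outputs: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        self._w_policy = rng.normal(size=(obs_dim, num_outputs))
        self._w_value = rng.normal(size=(obs_dim, 1))
        self._value_out = None  # filled in by forward()

    def forward(self, obs: np.ndarray) -> np.ndarray:
        # Compute and cache the value head alongside the policy logits.
        self._value_out = obs @ self._w_value      # shape [BATCH, 1]
        return obs @ self._w_policy                # shape [BATCH, num_outputs]

    def value_function(self) -> np.ndarray:
        # No extra forward pass: only the cached result is returned.
        if self._value_out is None:
            raise RuntimeError("call forward() before value_function()")
        return self._value_out.reshape(-1)         # shape [BATCH]


model = CachedValueModel(obs_dim=3, num_outputs=4)
batch = np.ones((2, 3))
logits = model.forward(batch)
values = model.value_function()
```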

rlskyjo.models.random_admissible_policy module

rlskyjo.models.random_admissible_policy.policy_ra(observation: numpy.array, action_mask: numpy.array, rng: Union[None, numpy.random._generator.Generator] = None) → int

For demonstration: picks a random admissible action according to the action mask.


Parameters:

observation (np.array): the current observation
action_mask (np.array): mask marking the admissible actions
rng (Union[None, np.random.Generator], optional): random number generator. Defaults to None.

Returns:

int: the selected action
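One way such a random admissible policy could look is sketched below. This is an assumption about the behaviour, not the package's source: `random_admissible_action` is a hypothetical name, the mask is assumed to be a 1-D array with nonzero entries for admissible actions, and the observation argument is left out because it is not needed for uniform sampling.

```python
from typing import Optional

import numpy as np


def random_admissible_action(
    action_mask: np.ndarray,
    rng: Optional[np.random.Generator] = None,
) -> int:
    """Hypothetical sketch: sample uniformly among actions whose mask entry is nonzero."""
    if rng is None:
        rng = np.random.default_rng()  # fresh, unseeded generator
    admissible = np.flatnonzero(action_mask)  # indices of admissible actions
    if admissible.size == 0:
        raise ValueError("action_mask admits no action")
    return int(rng.choice(admissible))


mask = np.array([0, 1, 0, 1, 1])
action = random_admissible_action(mask, rng=np.random.default_rng(0))
```

Passing a seeded generator makes the choice reproducible, which helps when comparing a trained agent against this baseline.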

rlskyjo.models.train_model_simple_rllib module

rlskyjo.models.train_model_simple_rllib.load_ray(path, ppo_config)

Load a trained RLlib agent from the specified path. Call this before testing a trained agent.

Parameters:

path – Path pointing to the agent’s saved checkpoint (only used for RLlib agents)
ppo_config – dict config


Train the trainer and sample.

rlskyjo.models.train_model_simple_rllib.prepare_train() → Tuple[ray.rllib.agents.trainer_template.PPO, ray.rllib.env.wrappers.pettingzoo_env.PettingZooEnv]
rlskyjo.models.train_model_simple_rllib.sample_trainer(trainer, env)
rlskyjo.models.train_model_simple_rllib.train(trainer, max_steps=2000000.0)
rlskyjo.models.train_model_simple_rllib.train_ray(ppo_config, timesteps_total: int = 10)

Train the trainer and sample.

Module contents