rlskyjo.models package
Submodules
rlskyjo.models.action_mask_model module
- class rlskyjo.models.action_mask_model.TorchActionMaskModel(obs_space, action_space, num_outputs, model_config, name, **kwargs)
Bases: ray.rllib.models.torch.torch_modelv2.TorchModelV2, torch.nn.modules.module.Module
PyTorch version
Model that handles simple discrete action masking. This assumes the outputs are logits for a single Categorical action dist. Getting this to work with a more complex output (e.g., if the action space is a tuple of several distributions) is also possible but left as an exercise to the reader.
- forward(input_dict, state, seq_lens)
Call the model with the given input tensors and state.
Any complex observations (dicts, tuples, etc.) will be unpacked by __call__ before being passed to forward(). To access the flattened observation tensor, refer to input_dict["obs_flat"].
This method can be called any number of times. In eager execution, each call to forward() will eagerly evaluate the model. In symbolic execution, each call to forward creates a computation graph that operates over the variables of this model (i.e., shares weights).
Custom models should override this instead of __call__.
- Args:
- input_dict (dict): dictionary of input tensors, including "obs", "obs_flat", "prev_action", "prev_reward", "is_training", "eps_id", "agent_id", "infos", and "t".
- state (list): list of state tensors with sizes matching those returned by get_initial_state plus the batch dimension.
- seq_lens (Tensor): 1d tensor holding input sequence lengths.
- Returns:
- (outputs, state): The model output tensor of size [BATCH, num_outputs], and the new RNN state.
- Examples:
>>> def forward(self, input_dict, state, seq_lens):
>>>     model_out, self._value_out = self.base_model(
...         input_dict["obs"])
>>>     return model_out, state
- training: bool
- value_function()
Returns the value function output for the most recent forward pass.
Note that a forward call has to be performed first before this method can return anything; calling this method therefore does not cause an extra forward pass through the network.
- Returns:
value estimate tensor of shape [BATCH].
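A minimal usage sketch (the registered model name "action_mask_model" and the config keys shown are illustrative, not this package's defaults): register the custom model with RLlib's ModelCatalog and reference it by name from the PPO config.
>>> from ray.rllib.models import ModelCatalog
>>> from rlskyjo.models.action_mask_model import TorchActionMaskModel
>>> # make the model addressable by name from the trainer config
>>> ModelCatalog.register_custom_model("action_mask_model", TorchActionMaskModel)
>>> ppo_config = {
...     "framework": "torch",
...     "model": {"custom_model": "action_mask_model"},
... }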
rlskyjo.models.random_admissible_policy module
- rlskyjo.models.random_admissible_policy.policy_ra(observation: numpy.array, action_mask: numpy.array, rng: Union[None, numpy.random._generator.Generator] = None) → int
For demonstration purposes: randomly picks an admissible action from the action mask.
- Args:
observation (np.array): the agent's observation.
action_mask (np.array): mask over the action space; non-zero entries mark admissible actions.
rng (Union[None, np.random.Generator], optional): random number generator used to sample the action. Defaults to None.
- Returns:
int: a randomly chosen admissible action.
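A minimal sketch of calling policy_ra on a hand-written action mask (the observation and mask sizes below are illustrative, not the real Skyjo spaces):
>>> import numpy as np
>>> from rlskyjo.models.random_admissible_policy import policy_ra
>>> observation = np.zeros(10, dtype=np.float32)  # dummy observation
>>> action_mask = np.array([0, 1, 0, 1, 1], dtype=np.int8)  # actions 1, 3, 4 admissible
>>> rng = np.random.default_rng(42)
>>> action = policy_ra(observation, action_mask, rng=rng)
>>> assert action_mask[action] == 1  # the chosen action is always admissible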
rlskyjo.models.train_model_simple_rllib module
- rlskyjo.models.train_model_simple_rllib.load_ray(path, ppo_config)
Load a trained RLlib agent from the specified path. Call this before testing a trained agent.
- Parameters
path – path pointing to the agent's saved checkpoint (only used for RLlib agents)
ppo_config – dict config
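A minimal sketch, assuming the checkpoint path below is replaced by a checkpoint produced by a previous training run and that ppo_config matches the configuration the agent was trained with (both values shown here are placeholders):
>>> from rlskyjo.models.train_model_simple_rllib import load_ray
>>> ppo_config = {"framework": "torch"}  # placeholder; use the training config
>>> checkpoint = "path/to/checkpoint/checkpoint-000001"  # placeholder path
>>> agent = load_ray(checkpoint, ppo_config)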
- rlskyjo.models.train_model_simple_rllib.manual_training_loop(timesteps_total=10000)
Train the trainer and then sample from the trained agent.
- rlskyjo.models.train_model_simple_rllib.prepare_train() → Tuple[ray.rllib.agents.trainer_template.PPO, ray.rllib.env.wrappers.pettingzoo_env.PettingZooEnv]
- rlskyjo.models.train_model_simple_rllib.sample_trainer(trainer, env)
- rlskyjo.models.train_model_simple_rllib.train(trainer, max_steps=2000000.0)
- rlskyjo.models.train_model_simple_rllib.train_ray(ppo_config, timesteps_total: int = 10)
- rlskyjo.models.train_model_simple_rllib.tune_training_loop(timesteps_total=10000)
Train the trainer and then sample from the trained agent.
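A minimal end-to-end sketch combining the helpers above (the step count is deliberately tiny for illustration, and whether train() also returns the trainer is not assumed here):
>>> from rlskyjo.models.train_model_simple_rllib import prepare_train, sample_trainer, train
>>> trainer, env = prepare_train()      # PPO trainer plus wrapped PettingZoo env
>>> train(trainer, max_steps=10000)     # short training run
>>> sample_trainer(trainer, env)        # roll out the trained policy in the env
For a single call that covers a similar train-and-sample flow, see manual_training_loop(timesteps_total=10000) above.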