rlskyjo.models package
Submodules
rlskyjo.models.action_mask_model module
- class rlskyjo.models.action_mask_model.TorchActionMaskModel(obs_space, action_space, num_outputs, model_config, name, **kwargs)
Bases: ray.rllib.models.torch.torch_modelv2.TorchModelV2, torch.nn.modules.module.Module
PyTorch version
Model that handles simple discrete action masking. This assumes the outputs are logits for a single Categorical action dist. Getting this to work with a more complex output (e.g., if the action space is a tuple of several distributions) is also possible but left as an exercise to the reader.
- forward(input_dict, state, seq_lens)
Call the model with the given input tensors and state.
Any complex observations (dicts, tuples, etc.) will be unpacked by __call__ before being passed to forward(). To access the flattened observation tensor, refer to input_dict["obs_flat"].
This method can be called any number of times. In eager execution, each call to forward() will eagerly evaluate the model. In symbolic execution, each call to forward creates a computation graph that operates over the variables of this model (i.e., shares weights).
Custom models should override this instead of __call__.
- Args:
- input_dict (dict): dictionary of input tensors, including "obs", "obs_flat", "prev_action", "prev_reward", "is_training", "eps_id", "agent_id", "infos", and "t".
- state (list): list of state tensors with sizes matching those returned by get_initial_state plus the batch dimension.
- seq_lens (Tensor): 1d tensor holding input sequence lengths.
- Returns:
- (outputs, state): The model output tensor of size [BATCH, num_outputs], and the new RNN state.
- Examples:
>>> def forward(self, input_dict, state, seq_lens):
>>>     model_out, self._value_out = self.base_model(
...         input_dict["obs"])
>>>     return model_out, state
- training: bool
- value_function()
Returns the value function output for the most recent forward pass.
Note that a forward call has to be performed first before this method can return anything; calling this method therefore does not cause an extra forward pass through the network.
- Returns:
value estimate tensor of shape [BATCH].
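A minimal usage sketch (the registered model name "action_mask_model" and the config keys shown are illustrative, not this package's defaults): register the custom model with RLlib's ModelCatalog and reference it by name from the PPO config.
>>> from ray.rllib.models import ModelCatalog
>>> from rlskyjo.models.action_mask_model import TorchActionMaskModel
>>> # make the model addressable by name from the trainer config
>>> ModelCatalog.register_custom_model("action_mask_model", TorchActionMaskModel)
>>> ppo_config = {
...     "framework": "torch",
...     "model": {"custom_model": "action_mask_model"},
... }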
rlskyjo.models.random_admissible_policy module
- rlskyjo.models.random_admissible_policy.policy_ra(observation: numpy.array, action_mask: numpy.array, rng: Union[None, numpy.random._generator.Generator] = None) → int
For demonstration purposes: randomly picks an admissible action from the action mask.
- Args:
observation (np.array): the agent's observation.
action_mask (np.array): mask over the action space; non-zero entries mark admissible actions.
rng (Union[None, np.random.Generator], optional): random number generator used to sample the action. Defaults to None.
- Returns:
int: a randomly chosen admissible action.
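A minimal sketch of calling policy_ra on a hand-written action mask (the observation and mask sizes below are illustrative, not the real Skyjo spaces):
>>> import numpy as np
>>> from rlskyjo.models.random_admissible_policy import policy_ra
>>> observation = np.zeros(10, dtype=np.float32)  # dummy observation
>>> action_mask = np.array([0, 1, 0, 1, 1], dtype=np.int8)  # actions 1, 3, 4 admissible
>>> rng = np.random.default_rng(42)
>>> action = policy_ra(observation, action_mask, rng=rng)
>>> assert action_mask[action] == 1  # the chosen action is always admissible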
rlskyjo.models.train_model_simple_rllib module
- rlskyjo.models.train_model_simple_rllib.load_ray(path, ppo_config)
Load a trained RLlib agent from the specified path. Call this before testing a trained agent.
- Parameters
path – path pointing to the agent's saved checkpoint (only used for RLlib agents)
ppo_config – dict config
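A minimal sketch, assuming the checkpoint path below is replaced by a checkpoint produced by a previous training run and that ppo_config matches the configuration the agent was trained with (both values shown here are placeholders):
>>> from rlskyjo.models.train_model_simple_rllib import load_ray
>>> ppo_config = {"framework": "torch"}  # placeholder; use the training config
>>> checkpoint = "path/to/checkpoint/checkpoint-000001"  # placeholder path
>>> agent = load_ray(checkpoint, ppo_config)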
- rlskyjo.models.train_model_simple_rllib.manual_training_loop(timesteps_total=10000)
Train the trainer and then sample from the trained agent.
- rlskyjo.models.train_model_simple_rllib.prepare_train() → Tuple[ray.rllib.agents.trainer_template.PPO, ray.rllib.env.wrappers.pettingzoo_env.PettingZooEnv]
- rlskyjo.models.train_model_simple_rllib.sample_trainer(trainer, env)
- rlskyjo.models.train_model_simple_rllib.train(trainer, max_steps=2000000.0)
- rlskyjo.models.train_model_simple_rllib.train_ray(ppo_config, timesteps_total: int = 10)
- rlskyjo.models.train_model_simple_rllib.tune_training_loop(timesteps_total=10000)
Train the trainer and then sample from the trained agent.
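A minimal end-to-end sketch combining the helpers above (the step count is deliberately tiny for illustration, and whether train() also returns the trainer is not assumed here):
>>> from rlskyjo.models.train_model_simple_rllib import prepare_train, sample_trainer, train
>>> trainer, env = prepare_train()      # PPO trainer plus wrapped PettingZoo env
>>> train(trainer, max_steps=10000)     # short training run
>>> sample_trainer(trainer, env)        # roll out the trained policy in the env
For a single call that covers a similar train-and-sample flow, see manual_training_loop(timesteps_total=10000) above.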