Additional Interfaces

See also

More Optimization Interfaces: the user guide page on this topic.

The interfaces documented here are less commonly used, but may still be useful in certain circumstances. They are all subclasses of Env. Unlike the Core Classes of This Package, they expect you to subclass them rather than merely define the same methods.

>>> from cernml import coi
>>> class MySeparable(coi.SeparableEnv):
...     def compute_observation(self, action, info):
...         print(f"compute_observation({action!r}, {info!r})")
...         return "obs"
...
...     def compute_reward(self, obs, goal, info):
...         print(f"compute_reward({obs!r}, {goal!r}, {info!r})")
...         return 0.0
...
...     def compute_terminated(self, obs, reward, info):
...         print(f"compute_terminated({obs!r}, {reward!r}, {info!r})")
...         return True
...
...     def compute_truncated(self, obs, reward, info):
...         print(f"compute_truncated({obs!r}, {reward!r}, {info!r})")
...         return False
...
>>> env = MySeparable()
>>> env.step("action")
compute_observation('action', {})
compute_reward('obs', None, {})
compute_terminated('obs', 0.0, {'reward': 0.0})
compute_truncated('obs', 0.0, {'reward': 0.0})
('obs', 0.0, True, False, {'reward': 0.0})
class cernml.coi.GoalEnv

Bases: Env[Any, ActType], Generic[ObsType, GoalType, ActType]

This is a vendored copy of gymnasium_robotics.core.GoalEnv. It is only used if the gymnasium-robotics package is not installed. If it is installed, this is automatically an alias to the original class.
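The aliasing can be checked directly. The following sketch relies only on the import path quoted above; nothing else about gymnasium-robotics is assumed:

from cernml import coi

try:
    from gymnasium_robotics.core import GoalEnv as UpstreamGoalEnv
except ImportError:
    # gymnasium-robotics is not installed: coi.GoalEnv is the vendored copy.
    pass
else:
    # gymnasium-robotics is installed: coi.GoalEnv aliases the original class.
    assert coi.GoalEnv is UpstreamGoalEnv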

class cernml.coi.SeparableEnv

Bases: Env[ObsType, ActType]

An environment whose calculations nicely separate.

This interface is superficially similar to GoalEnv, but doesn't impose any requirements on the observation space. (By contrast, GoalEnv requires the observation space to be a dict with the keys "observation", "desired_goal" and "achieved_goal".) The only requirement is that the calculation of the observation, the reward and the end of the episode can be separated into distinct steps.

This makes two things possible:

  • replacing compute_observation() with a function approximator, e.g. a neural network;

  • estimating the goodness of the very first observation of an episode via env.compute_reward(obs, None, {}), where obs is the observation returned by env.reset() (see the sketch below).

Because of these use cases, all state transitions should be restricted to compute_observation(). In particular, it must be possible to call compute_reward(), compute_terminated() and compute_truncated() multiple times without changing the internal state of the environment.
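As a sketch of the second use case, consider the following toy subclass. The class, its spaces and its reward function are made up for illustration and are not part of the package:

from cernml import coi
import numpy as np
from gymnasium.spaces import Box

class ToySeparable(coi.SeparableEnv):
    """Hypothetical toy task: steer a point towards zero."""

    observation_space = Box(-1.0, 1.0, shape=(1,))
    action_space = Box(-0.1, 0.1, shape=(1,))

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._pos = np.array([1.0])
        return self._pos.copy(), {}

    def compute_observation(self, action, info):
        # The only method with side effects: it applies the action.
        self._pos = np.clip(self._pos + action, -1.0, 1.0)
        return self._pos.copy()

    def compute_reward(self, obs, goal, info):
        # A pure function of the observation: negative distance to zero.
        return -float(abs(obs[0]))

    def compute_terminated(self, obs, reward, info):
        return bool(abs(obs[0]) < 0.05)

    def compute_truncated(self, obs, reward, info):
        return False

env = ToySeparable()
obs, _ = env.reset()
# Judge the very first observation without taking a step.
initial_reward = env.compute_reward(obs, None, {})
# The call has no side effects, so repeating it gives the same value.
assert env.compute_reward(obs, None, {}) == initial_reward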

step(
action: ActType,
) tuple[ObsType, SupportsFloat, bool, bool, InfoDict]

Implementation of gymnasium.Env.step().

This calls in turn the four new abstract methods: compute_observation(), compute_reward(), compute_terminated() and compute_truncated().
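Conceptually, this behaves roughly as follows. The sketch is inferred from the doctest at the top of this page (note the "reward" key appearing in the info dict); the actual implementation may differ in details:

def step(self, action):
    # Sketch of the default behaviour, not the actual source.
    info = {}
    obs = self.compute_observation(action, info)
    reward = self.compute_reward(obs, None, info)
    info["reward"] = reward  # visible in the doctest output above
    terminated = self.compute_terminated(obs, reward, info)
    truncated = self.compute_truncated(obs, reward, info)
    return obs, reward, terminated, truncated, info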

abstractmethod compute_observation(
action: ActType,
info: InfoDict,
) ObsType

Apply the given action and return the next observation.

This should encapsulate all state transitions of the environment. This means that after any call to compute_observation(), the other compute methods can be called as often as desired and always give the same results, given the same arguments.

Parameters:
  • action – the action that was passed to step().

  • info – an info dictionary that may be filled with additional information.

Returns:

The next observation to be returned by step().
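The first use case listed above, replacing compute_observation() with a function approximator, could be sketched as a delegating subclass. Everything here (the wrapper class and the model callable) is hypothetical and not part of the package:

from cernml import coi

class SurrogateObservation(coi.SeparableEnv):
    """Hypothetical sketch: predict observations instead of measuring them."""

    def __init__(self, real_env, model):
        # `model` is any callable that maps an action to a predicted
        # observation, e.g. a trained neural network.
        self._real_env = real_env
        self._model = model

    def compute_observation(self, action, info):
        # Ask the surrogate model instead of the real machine.
        return self._model(action)

    def compute_reward(self, obs, goal, info):
        # Reward and end-of-episode logic are delegated unchanged.
        return self._real_env.compute_reward(obs, goal, info)

    def compute_terminated(self, obs, reward, info):
        return self._real_env.compute_terminated(obs, reward, info)

    def compute_truncated(self, obs, reward, info):
        return self._real_env.compute_truncated(obs, reward, info)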

abstractmethod compute_reward(
obs: ObsType,
goal: None,
info: InfoDict,
) SupportsFloat

Calculate the reward for the given observation and current state.

This externalizes the reward function. In this regard, it is similar to GoalEnv.compute_reward(), but it doesn't impose any structure on the observation space.

Note that this function should be free of side effects and must not modify self. In particular, the user may call env.compute_reward(obs, None, {}) multiple times and always expect the same result.

Parameters:
  • obs – The observation calculated by reset() or compute_observation().

  • goal – A dummy parameter kept for compatibility with the GoalEnv API; it is generally None. If you want a multi-goal environment, consider SeparableGoalEnv.

  • info – an info dictionary with additional information. It may or may not have been passed to compute_observation() before.

Returns:

The reward that corresponds to the given observation. This value is returned by step().
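A minimal sketch of such a side-effect-free reward function follows; the class and the objective (negative RMS of the observation) are made up for illustration:

import numpy as np
from cernml import coi

class RmsReward(coi.SeparableEnv):
    """Hypothetical fragment: only the reward calculation is shown."""

    def compute_reward(self, obs, goal, info):
        # A pure function of the observation: nothing on `self` is
        # modified, so repeated calls are safe and give the same result.
        return -float(np.sqrt(np.mean(np.square(obs))))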

abstractmethod compute_terminated(
obs: ObsType,
reward: float,
info: InfoDict,
) bool

Compute whether the episode reaches its terminal state in this step.

This externalizes the decision whether the agent has reached the terminal state of the environment (e.g. winning or losing a game). This function should be free of side-effects or modifications of self. In particular, it must be possible to call env.compute_terminated(obs, reward, {}) multiple times and always get the same result.

If you want to indicate that the episode has ended in a success, consider setting info["success"] = True.

Parameters:
  • obs – The observation calculated by compute_observation().

  • reward – The reward calculated by compute_reward().

  • info – an info dictionary with additional information. It may or may not have been passed to the other compute methods before.

Returns:

True if the episode has reached a terminal state, False otherwise.
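A minimal sketch of the success flag suggested above; the class and the reward threshold are made up:

from cernml import coi

class SuccessFlagging(coi.SeparableEnv):
    """Hypothetical fragment: only the termination check is shown."""

    GOAL_REWARD = -0.01  # made-up threshold

    def compute_terminated(self, obs, reward, info):
        reached = float(reward) >= self.GOAL_REWARD
        if reached:
            # Mark the episode as a success, as suggested above.
            info["success"] = True
        return reached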

abstractmethod compute_truncated(
obs: ObsType,
reward: float,
info: InfoDict,
) bool

Compute whether the episode is truncated in this step.

This externalizes the decision whether a condition outside of the environment has ended the episode (e.g. a time limit). This function should be free of side-effects or modifications of self. In particular, it must be possible to call env.compute_truncated(obs, reward, {}) multiple times and always get the same result.

Parameters:
  • obs – The observation calculated by compute_observation().

  • reward – The reward calculated by compute_reward().

  • info – an info dictionary with additional information. It may or may not have been passed to the other compute methods before.

Returns:

True if the episode has been terminated by outside forces, False otherwise.
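A minimal sketch of a time limit: the step counter is advanced in compute_observation(), because compute_truncated() must stay free of side effects. All names and values are made up:

from cernml import coi

class TimeLimited(coi.SeparableEnv):
    """Hypothetical fragment: a simple step-count time limit."""

    MAX_STEPS = 100  # made-up limit

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self._num_steps = 0
        return 0.0, {}  # placeholder observation for this sketch

    def compute_observation(self, action, info):
        # The counter is advanced here because compute_truncated() must
        # remain free of side effects.
        self._num_steps += 1
        return 0.0  # placeholder observation for this sketch

    def compute_truncated(self, obs, reward, info):
        # Pure read of the counter: repeated calls give the same answer.
        return self._num_steps >= self.MAX_STEPS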

class cernml.coi.SeparableGoalEnv

Bases: GoalEnv, Generic[ObsType, GoalType, ActType]

A multi-goal environment whose calculations nicely separate.

This interface is superficially similar to GoalEnv, but additionally splits out the calculation of the observation and the end-of-episode flags. It differs from SeparableEnv in the meaning of the parameters that are passed to compute_reward().

The split introduced by this class makes two things possible:

  • replacing compute_observation() with a function approximator, e.g. a neural network;

  • estimating the goodness of the very first observation of an episode by calling compute_reward() directly after env.reset().

Because of these use cases, all state transitions should be restricted to compute_observation(). In particular, it must be possible to call compute_reward(), compute_terminated(), and compute_truncated() multiple times without changing the internal state of the environment.

step(
action: ActType,
) tuple[GoalObs, SupportsFloat, bool, bool, InfoDict]

Implementation of gymnasium.Env.step().

This calls in turn the four new abstract methods: compute_observation(), compute_reward(), compute_terminated(), and compute_truncated().

abstractmethod compute_observation(
action: ActType,
info: InfoDict,
) GoalObs[ObsType, GoalType]

Compute the next observation if action is taken.

This should encapsulate all state transitions of the environment. This means that after any call to compute_observation(), the other compute methods can be called as often as desired and always give the same results, given the same arguments.

Parameters:
  • action – the action that was passed to step().

  • info – an info dictionary that may be filled with additional information.

Returns:

The next observation to be returned by step().
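A minimal sketch of such an observation method; the task is made up and the rest of the class is omitted:

import numpy as np
from cernml import coi

class ToyGoalEnv(coi.SeparableGoalEnv):
    """Hypothetical fragment: only the observation logic is shown."""

    def __init__(self):
        self._pos = np.zeros(2)
        self._target = np.array([1.0, 0.0])

    def compute_observation(self, action, info):
        # All state transitions happen here.
        self._pos = self._pos + action
        # The return value is the GoalObs dict documented below.
        return {
            "observation": self._pos.copy(),
            "achieved_goal": self._pos.copy(),
            "desired_goal": self._target.copy(),
        }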

cernml.coi.GoalType: TypeVar

The generic type variable for the achieved_goal and desired_goal of GoalEnv. This is exported for the user’s convenience.

class cernml.coi.GoalObs

Bases: TypedDict, Generic[ObsType, GoalType]

Type annotation for the observation type of GoalEnv.

observation: ObsType

The actual observation of the environment.

desired_goal: GoalType

The goal that the agent has to achieve.

achieved_goal: GoalType

The goal that the agent has currently achieved instead. The objective of the environment is for this value to get close to desired_goal.
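For static typing, GoalObs can be used like any other TypedDict. A small sketch; the choice of NumPy arrays for the observation and goal types is arbitrary:

import numpy as np
from cernml import coi

# Hypothetical annotation: a goal observation made of NumPy arrays.
obs: "coi.GoalObs[np.ndarray, np.ndarray]" = {
    "observation": np.zeros(3),
    "achieved_goal": np.zeros(2),
    "desired_goal": np.ones(2),
}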