Additional Interfaces¶
See also
- More Optimization Interfaces: user guide page on the topic.
The interfaces documented here are less commonly used, but may still be useful in certain circumstances. They are all subclasses of Env. Unlike the Core Classes of This Package, they expect you to subclass them, not just to define the same methods as them.
>>> from cernml import coi
>>> class MySeparable(coi.SeparableEnv):
...     def compute_observation(self, action, info):
...         print(f"compute_observation({action!r}, {info!r})")
...         return "obs"
...
...     def compute_reward(self, obs, goal, info):
...         print(f"compute_reward({obs!r}, {goal!r}, {info!r})")
...         return 0.0
...
...     def compute_terminated(self, obs, reward, info):
...         print(f"compute_terminated({obs!r}, {reward!r}, {info!r})")
...         return True
...
...     def compute_truncated(self, obs, reward, info):
...         print(f"compute_truncated({obs!r}, {reward!r}, {info!r})")
...         return False
...
>>> env = MySeparable()
>>> env.step("action")
compute_observation('action', {})
compute_reward('obs', None, {})
compute_terminated('obs', 0.0, {'reward': 0.0})
compute_truncated('obs', 0.0, {'reward': 0.0})
('obs', 0.0, True, False, {'reward': 0.0})
- class cernml.coi.GoalEnv¶
Bases: Env[Any, ActType], Generic[ObsType, GoalType, ActType]

This is a vendored copy of gymnasium_robotics.core.GoalEnv. It is only used if the gymnasium-robotics package is not installed. If it is installed, this is automatically an alias to the original class.
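The aliasing can be observed at runtime. The following is a hedged sketch, not taken from the package documentation; it merely restates the fallback rule described above:

# Sketch: check which GoalEnv implementation is in use. The import path
# gymnasium_robotics.core.GoalEnv is the upstream class named above.
from cernml import coi

try:
    from gymnasium_robotics.core import GoalEnv as upstream_goal_env
except ImportError:
    upstream_goal_env = None  # not installed: the vendored copy is used

if upstream_goal_env is not None:
    # With gymnasium-robotics installed, the name is an alias to the original.
    assert coi.GoalEnv is upstream_goal_env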
- class cernml.coi.SeparableEnv¶
An environment whose calculations nicely separate.

This interface is superficially similar to GoalEnv, but doesn’t pose any requirements on the observation space. (By contrast, GoalEnv requires that the observation space is a dict with the keys "observation", "desired_goal" and "achieved_goal".) The only requirement is that the calculation of observation, reward and end-of-episode can be separated into distinct steps.

This makes two things possible:

- replacing compute_observation() with a function approximator, e.g. a neural network;
- estimating the goodness of the very initial observation of an episode via env.compute_reward(env.reset(), None, {}) (see the sketch below).

Because of these use cases, all state transitions should be restricted to compute_observation(). In particular, it must be possible to call compute_reward(), compute_terminated() and compute_truncated() multiple times without changing the internal state of the environment.
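To illustrate the second point, here is a hypothetical 1-D environment. ToySeparable, its spaces and its reward function are invented for this sketch and are not part of the package. Note that under the Gymnasium API, reset() returns an (obs, info) pair, which the sketch unpacks before calling compute_reward():

import numpy as np
import gymnasium as gym
from cernml import coi

class ToySeparable(coi.SeparableEnv):
    """Hypothetical 1-D toy environment; not part of cernml-coi."""

    observation_space = gym.spaces.Box(-1.0, 1.0, shape=(1,), dtype=np.float64)
    action_space = gym.spaces.Box(-0.1, 0.1, shape=(1,), dtype=np.float64)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.pos = self.np_random.uniform(-1.0, 1.0, size=(1,))
        return self.pos.copy(), {}

    def compute_observation(self, action, info):
        # The only method that is allowed to change the environment state.
        self.pos = np.clip(self.pos + action, -1.0, 1.0)
        return self.pos.copy()

    def compute_reward(self, obs, goal, info):
        # Pure function of its arguments: no attribute of self is modified.
        return -float(abs(obs[0]))

    def compute_terminated(self, obs, reward, info):
        return bool(abs(obs[0]) < 0.01)

    def compute_truncated(self, obs, reward, info):
        return False

env = ToySeparable()
obs, info = env.reset(seed=42)
# Score the very first observation without taking a step:
initial_reward = env.compute_reward(obs, None, info)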
- step(action: ActType)¶
Implementation of gymnasium.Env.step().

This calls in turn the four new abstract methods: compute_observation(), compute_reward(), compute_terminated() and compute_truncated().
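Based on the doctest at the top of this page, the call order is roughly as follows. This is a hedged reconstruction, not the package’s actual source code:

# Sketch of the body of SeparableEnv.step(), inferred from the doctest output.
def step(self, action):
    info = {}
    obs = self.compute_observation(action, info)
    reward = self.compute_reward(obs, None, info)
    info["reward"] = reward
    terminated = self.compute_terminated(obs, reward, info)
    truncated = self.compute_truncated(obs, reward, info)
    return obs, reward, terminated, truncated, info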
- abstractmethod compute_observation(action, info) → ObsType¶
Apply the given action and return the next observation.

This should encapsulate all state transitions of the environment. This means that after any call to compute_observation(), the other compute methods can be called as often as desired and always give the same results, given the same arguments.
- abstractmethod compute_reward(obs, goal, info) → SupportsFloat¶
Calculate the reward for the given observation and current state.

This externalizes the reward function. In this regard, it is similar to GoalEnv.compute_reward(), but it doesn’t impose any structure on the observation space.

Note that this function should be free of side-effects or modifications of self. In particular, the user is allowed to make multiple calls to env.compute_reward(obs, None, {}) and always expect the same result.

- Parameters:
  - obs – The observation calculated by reset() or compute_observation().
  - goal – A dummy parameter to stay compatible with the GoalEnv API. This parameter generally is None. If you want a multi-goal environment, consider SeparableGoalEnv.
  - info – An info dictionary with additional information. It may or may not have been passed to compute_observation() before.
- Returns:
  The reward that corresponds to the given observation. This value is returned by step().
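Continuing the hypothetical ToySeparable sketch from above, the purity requirement can be exercised like this:

# Repeated calls with identical arguments must yield identical rewards and
# must not advance the environment (ToySeparable is the sketch defined above).
env = ToySeparable()
obs, info = env.reset(seed=0)
assert env.compute_reward(obs, None, {}) == env.compute_reward(obs, None, {})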
- abstractmethod compute_terminated(obs, reward, info) → bool¶
Compute whether the episode ends in this step.

This externalizes the decision whether the agent has reached the terminal state of the environment (e.g. winning or losing a game). This function should be free of side-effects or modifications of self. In particular, it must be possible to call env.compute_terminated(obs, reward, {}) multiple times and always get the same result.

If you want to indicate that the episode has ended in a success, consider setting info["success"] = True.

- Parameters:
  - obs – The observation calculated by reset() or compute_observation().
  - reward – The return value of compute_reward().
  - info – An info dictionary with additional information. It may or may not have been passed to compute_reward() before. The step() method adds a key "reward" that contains the result of compute_reward().
- Returns:
  True if the episode has reached a terminal state, False otherwise.
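As a hypothetical variant of the ToySeparable sketch above, an override could mark successful episodes as suggested in the text:

class ToySeparableWithSuccess(ToySeparable):
    def compute_terminated(self, obs, reward, info):
        success = bool(abs(obs[0]) < 0.01)
        if success:
            # Surfaced in the info dict that step() returns.
            info["success"] = True
        return success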
- abstractmethod compute_truncated(obs, reward, info) → bool¶
Compute whether the episode ends in this step.

This externalizes the decision whether a condition outside of the environment has ended the episode (e.g. a time limit). This function should be free of side-effects or modifications of self. In particular, it must be possible to call env.compute_truncated(obs, reward, {}) multiple times and always get the same result.

- Parameters:
  - obs – The observation calculated by reset() or compute_observation().
  - reward – The return value of compute_reward().
  - info – An info dictionary with additional information. It may or may not have been passed to compute_reward() before. The step() method adds a key "reward" that contains the result of compute_reward().
- Returns:
  True if the episode has been ended by outside forces (truncated), False otherwise.
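A hypothetical time-limit variant of the ToySeparable sketch: the step counter is advanced only in compute_observation(), so compute_truncated() itself stays side-effect free and can be called repeatedly:

class ToySeparableWithLimit(ToySeparable):
    max_steps = 100

    def reset(self, *, seed=None, options=None):
        self.steps_taken = 0
        return super().reset(seed=seed, options=options)

    def compute_observation(self, action, info):
        # The counter changes here, not in compute_truncated().
        self.steps_taken += 1
        return super().compute_observation(action, info)

    def compute_truncated(self, obs, reward, info):
        return self.steps_taken >= self.max_steps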
- class cernml.coi.SeparableGoalEnv¶
Bases: GoalEnv, Generic[ObsType, GoalType, ActType]

A multi-goal environment whose calculations nicely separate.

This interface is superficially similar to GoalEnv, but additionally also splits out the calculation of the observation and the end-of-episode flag. This class differs from SeparableEnv in the meaning of the parameters that are passed to compute_reward().

The split introduced by this class makes two things possible:

- replacing compute_observation() with a function approximator, e.g. a neural network;
- estimating the goodness of the very initial observation of an episode via compute_reward().

Because of these use cases, all state transitions should be restricted to compute_observation(). In particular, it must be possible to call compute_reward(), compute_terminated(), and compute_truncated() multiple times without changing the internal state of the environment.
- step(action: ActType)¶
Implementation of gymnasium.Env.step().

This calls in turn the four new abstract methods: compute_observation(), compute_reward(), compute_terminated(), and compute_truncated().
- abstractmethod compute_observation(action, info) → GoalObs[ObsType, GoalType]¶
Compute the next observation if action is taken.

This should encapsulate all state transitions of the environment. This means that after any call to compute_observation(), the other compute methods can be called as often as desired and always give the same results, given the same arguments.
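A hypothetical override might look as follows. This sketch assumes that GoalObs is the usual goal-environment mapping with "observation", "achieved_goal" and "desired_goal" entries (the keys GoalEnv requires of its observation space); self.pos and self.goal are invented attributes:

from cernml import coi

class MyGoalSeparable(coi.SeparableGoalEnv):
    # The other abstract methods are omitted from this sketch.

    def compute_observation(self, action, info):
        self.pos = self.pos + action  # the only state transition
        return {
            "observation": self.pos,
            "achieved_goal": self.pos,
            "desired_goal": self.goal,
        }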
- cernml.coi.GoalType: TypeVar¶
The generic type variable for the achieved_goal and desired_goal of GoalEnv. This is exported for the user’s convenience.
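As a hedged example of how the exported type variables can be used, GoalType fills the middle slot of the Generic[ObsType, GoalType, ActType] parameter list shown above; VectorGoalEnv is an invented name:

import numpy as np
from cernml import coi

class VectorGoalEnv(coi.SeparableGoalEnv[np.ndarray, np.ndarray, np.ndarray]):
    """Hypothetical subclass: observations, goals and actions are all arrays."""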