## Summary

When using `make_ppo_networks()` with the default `distribution_type='tanh_normal'`, the `init_noise_std` parameter is accepted without any warning but has no effect.
## Root Cause

In `make_policy_network()` (networks.py:393-399), the `tanh_normal` branch creates a plain `MLP`:

```python
if distribution_type == 'tanh_normal':
  policy_module = MLP(
      layer_sizes=list(hidden_layer_sizes) + [param_size],
      activation=activation,
      kernel_init=kernel_init,
      layer_norm=layer_norm,
  )
```
`init_noise_std`, `noise_std_type`, and `state_dependent_std` are all accepted by the function signature but never passed to this branch. The std is entirely determined by the network's output (the second half of the `2*action_size` parameters, passed through `softplus` in `NormalTanhDistribution.create_dist()`), so its initial value depends only on random weight initialization.
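To illustrate why this makes `init_noise_std` dead under `tanh_normal`: with freshly initialized weights the std logits are near zero, so the initial std is approximately `softplus(0)`, whatever the user passed in. A minimal sketch of that parameterization (not Brax's actual code, which may also apply a small std floor):

```python
import math

def softplus(x):
    # softplus(x) = log(1 + e^x); the tanh_normal path derives std
    # from the network's raw std logits through this function.
    return math.log1p(math.exp(x))

init_noise_std = 0.8  # hypothetical user setting; unused below
raw_std_logit = 0.0   # typical output of a freshly initialized MLP
print(softplus(raw_std_logit))  # ~0.693, independent of init_noise_std
```

No matter what value `init_noise_std` takes, the initial std stays near `log(2) ≈ 0.693` because the parameter never reaches the computation.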
Only the `normal` branch creates `PolicyModuleWithStd`, where `init_noise_std` actually initializes the learnable `LogParam`/`Param`.
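By contrast, the `normal` branch's behavior can be sketched as a state-independent, learnable log-std seeded from `init_noise_std`. The class below is a hypothetical stand-in for `PolicyModuleWithStd`, not its real code:

```python
import math

class PolicyWithLearnableStd:
    """Sketch of a state-independent std initialized from init_noise_std."""

    def __init__(self, action_size, init_noise_std=1.0):
        # Store log(std) so the learned std stays positive during training;
        # this mirrors the LogParam variant described in the report.
        self.log_std = [math.log(init_noise_std)] * action_size

    def std(self):
        return [math.exp(s) for s in self.log_std]

policy = PolicyWithLearnableStd(action_size=2, init_noise_std=0.5)
print(policy.std())  # starts at init_noise_std, i.e. ~[0.5, 0.5]
```

Here the parameter visibly takes effect, which is exactly what a sweep over `init_noise_std` assumes.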
## Impact

- Users who set `init_noise_std` while using the default `tanh_normal` get zero feedback that the parameter is being ignored
- Any hyperparameter sweep over `init_noise_std` under `tanh_normal` produces identical results, wasting compute
- learner.py default flags combine `tanh_normal` with `init_noise_std=1.0` (dead code)
- train_test.py tests `tanh_normal` + `init_noise_std=0.8` without verifying it has any effect
We discovered this while training dexterous manipulation policies with MJX. We had been tuning `init_noise_std` under the default `tanh_normal` for some time before realizing it had zero effect.
## Suggested Fix (any of)

- Raise a warning when `init_noise_std` is explicitly set with `tanh_normal`
- Document that `init_noise_std` only applies to `distribution_type='normal'`
- Consider changing the PPO default to `distribution_type='normal'`: most PPO implementations in the community (IsaacLab, rl_games, rsl_rl, CleanRL, Stable-Baselines3) use a state-independent std without tanh squashing, consistent with Schulman's original PPO
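The first option could be as small as a guard at the top of the factory function. This is a hypothetical sketch (the abbreviated signature here is for illustration, not the real one in networks.py):

```python
import warnings

def make_policy_network(distribution_type='tanh_normal',
                        init_noise_std=None, **kwargs):
    # Hypothetical guard: surface the silent no-op described in this report.
    if distribution_type == 'tanh_normal' and init_noise_std is not None:
        warnings.warn(
            "init_noise_std has no effect with "
            "distribution_type='tanh_normal'; use "
            "distribution_type='normal' for a configurable initial std.",
            UserWarning,
            stacklevel=2,
        )
    # ... build the policy network as before ...

make_policy_network(distribution_type='tanh_normal', init_noise_std=0.8)
```

Defaulting `init_noise_std` to `None` (rather than a number) is what lets the guard distinguish "explicitly set" from "left at the default".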