Did you check the docs?
Is your feature request related to a problem? Please describe.
The model-based jailbreak detection rail depends on a trained Random Forest classifier (snowflake.pkl) that is not included in the repository, and there appear to be no public download or training instructions for it. This makes the rail effectively unavailable to open-source users.
Describe the solution you'd like
If this rail is to be kept as is, perhaps either:
- ship a pre-trained classifier in the repo, or provide a training script and dataset reference so the community can reproduce it
- or explicitly document that the classifier is not included
Describe alternatives you've considered
Re-architecting the rail so that it does not require a separate embedding model + classifier could also be considered. I'd be happy to take this on, as it might align with the HF classifier rail that was suggested by the RH team.
cc @Pouyanpi
Additional context
No response