nav2_ros_common: add lifecycle-managed Subscription wrapper#5834
nav2_ros_common: add lifecycle-managed Subscription wrapper#5834Lotusymt wants to merge 26 commits intoros-navigation:mainfrom
Conversation
Thanks for checking! This was just a draft attempt, and I haven’t addressed the CI issues yet. I’ll follow up after discussing the proposed approach for this issue. |
4ae2015 to
1be146b
Compare
|
I think this is ready to be applied to all subscriptions now! I still have to verify that below is correct, but otherwise LGTM. Can you clarify where you found this + the |
|
Hi @SteveMacenski, this is following ROS 2 core (rclcpp) behavior/pattern:
|
|
OK sounds good! I think the remaining bit is to update for all the servers :-) |
|
@Lotusymt any update? I would love to get this moving |
@SteveMacenski sorry for the slow update, I misinterpreted “apply to all servers” as “move on to the remaining server interfaces. I believe the “apply everywhere” step is already done, as remaining |
|
Don't we need to activate all the subscriptions in |
Yes, for now they(all interfaces) are activated/deactivated at call sites. Per the #5298 issue description:
So my understanding is:
|
|
Hi @SteveMacenski, just wanted to gently follow up here in case this got buried. Please let me know if I missed anything. |
|
Sorry, I've been really slammed and haven't been able to clear all the github comments I need to respond to each day. Sorry that this was one that was temporarily deprioritized. I was traveling and this one was hard to review on my small laptop screen. My bad that it was just Q&A, I thought it was going to be another full review of many files :-) Ooops.
Eventually, to start, lets manually do it like we do for publishers and action servers. The ordering here is really important for bringup stability or segfaults, so I want that to be thoughtfully done
I would want to change to use the auto activate part once we have pub/sub/service/service clients/action clients all (maybe action server; maybe not yet since that's a special child). |
|
Hi Steve , just want to let you know I am actively working on it and here's a little update. I manage to pass all tests expect for some in nav2_system_test. Seems like some activation deadlock issue? Hopefully I can solve it soon : ) |
|
Ready for me now? :-) |
| void on_activate() override | ||
| { | ||
| rclcpp_lifecycle::SimpleManagedEntity::on_activate(); | ||
| } | ||
|
|
||
| void on_deactivate() override | ||
| { | ||
| rclcpp_lifecycle::SimpleManagedEntity::on_deactivate(); | ||
| } |
There was a problem hiding this comment.
These don't need to be overrided if just calling the base class, no? :-)
Not yet, unfortunately qwq. I ran into a few more issues while testing this. For now I’m trying a workaround: for any Transient Local topic, I avoid having the subscription exist in an “inactive” state at all. Concretely, I activate the managed-entity wrapper right before creating the underlying rclcpp::Subscription, so the subscription is effectively “active” for its entire lifespan and we don’t risk losing the latched/transient sample while the wrapper is inactive. My reasoning:
Even with this change, I’m still seeing failures in some system tests. That makes me suspect there may be other subscriptions that also need to be processed earlier, but I’m still tracking down exactly which ones and why. Does this workaround sound reasonable to you? Any suggestions would be really appreciated!! |
|
@fujitatomoya a question for you based on the last comment - you don't need to read the whole thread: Is there a way for the subscription callback to reject a message for a later delivery or create a subscriber object on the network but not add it to be serviced by the executor yet? We're trying to implement lifecycle versions of the subscriptions/services/clients since they don't exist in rclcpp_lifecycle and we're having a problem with how to indicate "No! I'm not active, don't even try it" for services and subscriptions. Services will just happily accept a response of a default constructed result if we exit the callback when the node is not active, and we have problems with subscriptions that are transient local since the delivery will happen only once. Streaming data being dropped (like a sensor) is fine, but this is a case where its not fine. We have long been able to do this with our Simple Action Server since there's an explicit "reject goal" API for actions so the client knows that its request was thrown out. I don't think that is sensible for services/subscriptions, but I feel like there should be a way to construct them and connect them on bringup without servicing them until active. This seems like a gap in the available API that after some digging I couldn't see how to address. But you're more in these details so I'm curious about your thoughts :-) As always, I don't really want to maintain my own version of things that have general purpose; happy to have these all donated to rclcpp_lifecycle (and simple action server to rclcpp_actions) :-) |
|
@Lotusymt all things should be processed in active state, with the sole exception of TF which is required to be running beforehand so we can check TF transformations as existing as part of the activation phase.
That is strange and should generally not be true. The nav2 lifecycle manager should bring all nodes into configure before all nodes into active. So by the time something is being activated, they're all configured but should not be processing data (except those already activated beforehand). There should not be dependency on anything processing messages in configure -- and if there are that's a bug we need to fix (again with the exception of TF). Could be ordering of bringup -- but also could be some interaction we should address. Do you know what node(s) or what topic(s) are the offenders? What are the failures? |
|
1st thing's 1st to answer your question.
AFAIK, no... unfortunately, there is no such things yet... probably something you guys need is something like ros2/rclcpp#2715? IMO LifecycleEntity makes sense for some use cases, there is a few primary states that we need to consider by design.
i think that this is doable... but i am not sure if DDS or zenoh have these managed states internally to be mapped to lifecycle entities of ROS 2. even if they dont, maybe we can land somewhere on "skip to take out the data in inactive state". i may be missing some things here, we obviously need to take some time to consider more details and design before implementation. |
|
Thanks for the correction! I was initially suspicious that AMCL’s map_sub can’t be handled in the node's on_activate. But I looked into it again, it should be fine. Maybe I missed something, and the timeout error happened when I added buffering logic for transient-local in the subscription wrapper at that time. I’m currently not able to reliably reproduce the exact same timeout anymore. So I think the next step would be to try again when
is available. |
This suffers from the same issues, no? :( It creates the subscription on creation -- so if you send a message when the node is in the inactive state, it'll be received by the callback (even though its rejected for processing, if its transient local it will not be reattempted later resulting in meaningful loss). Though @Lotusymt note the use of I think you pointed out the exact thing we need in This doesn't just impact subscriptions, but also services. Clients and publishers are easy because they can just come up and then just check a flag that if someone asks them to do something to simply say "no". subscriptions and services need something to say "no" before we try to execute a callback. I think as such there should be something to create an interface, have it discoverable, but not serviceable until said its ready.
Either of these (and I imagine more) would work. |
|
IMO (i believe everybody agrees...) this feature should be implemented in the core base classes like |
|
OK - for now @Lotusymt I think we should create it on the active state to get past the issue (even if not ideal) for now. Do you also mind starting a thread in discourse with this? I could but I want you to get credit for uncovering this gap :-) |
|
Thanks Steve! really appreciate you encouraging me to start the Discourse thread. I submitted the topic in Discourse, but it’s currently pending moderator approval. I’ll post the link here as soon as it’s visible. If there’s anything you’d like me to adjust, please let me know. On the PR side: I also switched the workaround to the |
|
Great - thanks! |
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu> Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt luiseyang36@gmail.com Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Removed unnecessary CMake module path and include statement. Signed-off-by: Mengting Yang <87471734+Lotusymt@users.noreply.github.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com> Signed-off-by: Mengting Yang <87471734+Lotusymt@users.noreply.github.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com> Signed-off-by: Mengting Yang <87471734+Lotusymt@users.noreply.github.com>
Co-authored-by: Steve Macenski <stevenmacenski@gmail.com> Signed-off-by: Mengting Yang <87471734+Lotusymt@users.noreply.github.com>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
Signed-off-by: lotusymt <mengtiy5@uci.edu>
2dca3e7 to
e0a8b96
Compare
Signed-off-by: lotusymt <mengtiy5@uci.edu>
The link is here: https://discourse.openrobotics.org/t/design-discussion-discoverable-not-serviceable-ros-interfaces-for-lifecycle-nodes/52842 |
|
LMK when you want me to review / test :-) |
Signed-off-by: lotusymt <mengtiy5@uci.edu>
| } catch (const std::runtime_error & e) { | ||
| RCLCPP_ERROR(get_logger(), "%s", e.what()); | ||
| on_cleanup(state); | ||
| on_shutdown(state); |
There was a problem hiding this comment.
test failed without it and line 66 above
|
Hi Steve, I think it is ready for a review, although codecov didn't pass. I will add tests if the current direction is ok! |
SteveMacenski
left a comment
There was a problem hiding this comment.
I did a brief look through before going camping for the weekend!
| * Gates message delivery on activation state (on_activate/on_deactivate), so it can be | ||
| * used like nav2::Subscription when using point_cloud_transport for transport hints. | ||
| */ | ||
| class LifecyclePointCloudSubscription : public rclcpp_lifecycle::SimpleManagedEntity |
| if (qos.durability() == rclcpp::DurabilityPolicy::TransientLocal) { | ||
| rclcpp_lifecycle::SimpleManagedEntity::on_activate(); | ||
| } | ||
|
|
There was a problem hiding this comment.
I thought we talked about actually creating the subscription objcet on activation instead for now? I don't think we should be doing something like this, its a hack and we should not be activating anything preemptively.
There was a problem hiding this comment.
I'm also not sure about why this is using the subscription factory now (?)
There was a problem hiding this comment.
or - maybe you can explain this more to me if this somehow gets around all the issues we had!
There was a problem hiding this comment.
Yeah I misunderstood. Having a second thought, I think creating subs in activation is more consistent with our intent to make everything start working only after activation.
One thing I want to make sure before proceeding: Should I create every sub in activation as a work around for now, or only transient local subs?
And subcriptionfactory is used is because we need a custom subscription type that overrides the handle_ methods* (and thus can’t be the default rclcpp::Subscription)
There was a problem hiding this comment.
Should I create every sub in activation as a work around for now, or only transient local subs?
I'd say all. Sorry for the delay, I was traveling
There was a problem hiding this comment.
Got it! Thanks for the reply. I am a bit busy lately, but I should be able to address this in a few days!
There was a problem hiding this comment.
I appreciate your time! I know this has been ongoing to for a long time -- your time here is valuable and thanks!
There was a problem hiding this comment.
Hello Steve. I am really sorry for the delay, I have done this modification but new test failures occurred. I am trying to fix those failures in a more automatic manner, but it didn't go very smoothly. I still want to explore this approach in the next few days, but I would aim to solve it by next Wednesday even if this doesn't work out.
I hope this plan sound ok to you : )
There was a problem hiding this comment.
Please revert changes to these geojson files
|
This pull request is in conflict. Could you fix it @Lotusymt? |
Basic Info
Description of contribution in a few bullet points
nav2::Subscriptionwrapper (similar tonav2::Publisher) usingrclcpp_lifecycle::SimpleManagedEntity, so subscriptions can participate in lifecycle transitions.is_activated()and emit a one-time warning when messages are received before activation (messages are dropped until activated).rclcpp::Node(no lifecycle manager to trigger activation), the wrapper auto-activates so behavior matches existing non-lifecycle usage.test_actionsslightly to make the callback type/signature match the new subscription wrapper expectations (no new tests added).Description of documentation updates required from your changes
Description of how this change was tested
nav2_ros_commonand ran its test suite viacolcon test(and verified results withcolcon test-result).Future work that may be required in bullet points
LifecycleNode::activateInterfaces()ordering work (exporters before consumers; deactivate in reverse), and then remove the manual activation boilerplate in task servers.nav2::Publisher).For Maintainers:
backport-*.