Feature flags, controllable by configuration or an "admin tool", are an excellent way to deploy new features. When going live, you can turn a feature on for your Production test user only and run some smoke tests before turning it on for the wider world.
Any issues found later can then be mitigated easily by "switching off" the broken feature for live users while leaving it on for the test users, so that the fault can be investigated on Production.
Provided the release contains no breaking changes or regressions outside the flagged features, this will help to avoid rollbacks.
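As a rough illustration of the idea above, here is a minimal in-memory sketch of a flag that can be on for everyone or only for named test users. The names `FeatureFlags`, `configure`, `is_enabled`, `new_checkout` and `prod-test-user` are all hypothetical; a real system would load this state from configuration or an admin tool rather than hold it in memory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureFlag:
    """One flag: on for everyone, or only for an allow-list of test users."""
    enabled_for_all: bool = False
    test_users: frozenset = frozenset()


class FeatureFlags:
    """Hypothetical in-memory flag store (stand-in for config or an admin tool)."""

    def __init__(self):
        self._flags = {}

    def configure(self, name, *, enabled_for_all=False, test_users=()):
        # Replaces any previous state for this flag
        self._flags[name] = FeatureFlag(enabled_for_all, frozenset(test_users))

    def is_enabled(self, name, user_id):
        flag = self._flags.get(name)
        if flag is None:
            return False  # unknown flags default to off
        return flag.enabled_for_all or user_id in flag.test_users


flags = FeatureFlags()

# Go-live: only the Production test user sees the feature
flags.configure("new_checkout", test_users={"prod-test-user"})
```

Once the smoke tests pass, calling `configure` again with `enabled_for_all=True` opens the feature to everyone. If a fault appears later, reverting to the test-user-only configuration hides the feature from live users while keeping it available for investigation.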
Every significant action should be monitored and logged. Obviously this includes calls to external services, but you should also strive to implement logging based on items in the acceptance criteria. Log the results of actions so that you can see whether they match expectations. This allows for granular diagnosis: you can see exactly what isn't working and where.
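One way to log results against expectations is a small wrapper around each significant action. The sketch below uses Python's standard `logging` module; the helper `call_and_log`, the logger name `orders` and the `action=... status=...` message layout are assumptions made for illustration, not a prescribed format.

```python
import logging

logger = logging.getLogger("orders")  # hypothetical logger name


def call_and_log(action, func, *args, expected=None):
    """Run func, log the outcome, and flag results that miss the expectation.

    `expected` is an optional predicate, for example one derived from an
    acceptance criterion, that returns True when the result looks right.
    """
    try:
        result = func(*args)
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback
        logger.exception("action=%s status=error args=%r", action, args)
        raise
    if expected is not None and not expected(result):
        logger.warning("action=%s status=unexpected result=%r", action, result)
    else:
        logger.info("action=%s status=ok result=%r", action, result)
    return result
```

A call such as `call_and_log("apply_discount", apply_discount, basket, expected=lambda total: total >= 0)` (where `apply_discount` and `basket` are placeholders) then leaves a searchable record of both the action and whether its result was plausible.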
Reduced Functionality Instead of Errors
A good defensive technique is to fall back to the previous functionality when something fails, rather than giving the user an error. Obviously this depends on the scenario and won't always be possible.
However, combined with feature switches, this can allow users to continue using your application while you identify the fault with your Production test users. It is also greatly dependent on your logging, of course, as the fault will not show up in the interface.
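The combination of a switch and a fallback can be sketched as a wrapper that prefers the new code path and quietly degrades to the old one. Everything named here (`with_fallback`, `old_price`, `new_price`, the `pricing` logger) is hypothetical, and the deliberately failing `new_price` only exists to show the fallback in action.

```python
import logging

logger = logging.getLogger("pricing")  # hypothetical logger name


def with_fallback(new_impl, old_impl, *, enabled):
    """Return a callable that prefers new_impl but degrades to old_impl.

    `enabled` is a zero-argument callable, for example a feature-flag check.
    """
    def wrapper(*args, **kwargs):
        if not enabled():
            return old_impl(*args, **kwargs)
        try:
            return new_impl(*args, **kwargs)
        except Exception:
            # The user never sees this failure, so the log is the only evidence
            logger.exception("new implementation failed; falling back")
            return old_impl(*args, **kwargs)
    return wrapper


def old_price(amount_pence):
    """Previous behaviour: add a flat 20% on top."""
    return amount_pence + amount_pence // 5


def new_price(amount_pence):
    """New behaviour, simulated here as a failing external call."""
    raise RuntimeError("tax service unavailable")


price = with_fallback(new_price, old_price, enabled=lambda: True)
```

Calling `price(1000)` returns the old result instead of raising, and the failure is recorded in the log. Catching every `Exception` is a broad choice made for brevity; in practice you would probably narrow it to the failures you know are safe to recover from.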