The following article was posted on the Google Cloud DevOps & SRE blog and offers advice to SRE teams on choosing appropriate SLOs. I especially appreciate two aspects of the post: first, its emphasis on visibility into your system when assessing appropriate SLOs, and second, its use of risk analysis in that process.
Have you been responsible for identifying SLOs for your systems? Did you use a systematic analysis of risk, as this article recommends? Did you consider how your visibility into system performance would be affected if the services you rely on for monitoring and alerting were to fail?
https://cloud.google.com/blog/products/devops-sre/how-sres-analyze-risks-to-evaluate-slos