A service-level agreement (SLA) is a promise made to the users of a service that its availability and reliability will meet a certain level of expectation. SLAs act as a pact between the software provider and the software user or client. Depending on the contract, an SLA may also cover responsiveness to incidents and bugs. If an SLA is broken, some penalty may be incurred, such as a refund or a service subscription credit.
SLAs are an integral part of an IT vendor contract. An SLA pulls together information on all of the contracted services and their agreed-upon expected reliability into a single document. It clearly states metrics, responsibilities, and expectations so that, in the event of issues with the service, neither party can plead ignorance, and it ensures both sides have the same understanding of the requirements. Any significant contract without an associated SLA (reviewed by legal counsel) is open to deliberate or inadvertent misinterpretation. The SLA protects both parties in the agreement.
The types of SLA metrics required will depend on the services that are provided. Many items can be monitored as part of an SLA, but the scheme should be kept as simple as possible to avoid confusion and excessive cost on either side. The availability the service actually delivers should not be much better than the SLO, and the availability SLO in the SLA is normally a looser objective than the internal availability SLO. That might be expressed in availability numbers: for instance, an availability SLO of 99.9% over one month, with an internal availability SLO of 99.95%. Alternatively, the SLA might only specify a subset of the metrics that make up the internal SLO. The goal should be an equitable incorporation of best practices and requirements that maintain service performance and avoid additional costs.
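To make the gap between the two objectives concrete, here is a minimal Python sketch that converts an availability target into the downtime it permits over a window. The function name `allowed_downtime` and the 30-day month are assumptions made for illustration; a real SLA defines its own measurement window and what counts as downtime.

```python
from datetime import timedelta


def allowed_downtime(slo_percent: float, window: timedelta) -> timedelta:
    """Return the downtime an availability target permits over a window.

    Hypothetical helper for illustration only: it treats availability
    as the fraction of the window during which the service is up.
    """
    if not 0 <= slo_percent <= 100:
        raise ValueError("slo_percent must be between 0 and 100")
    return window * (1 - slo_percent / 100)


# Assumption: "one month" is taken to be 30 days.
month = timedelta(days=30)

external = allowed_downtime(99.9, month)   # objective written into the SLA
internal = allowed_downtime(99.95, month)  # stricter internal objective

print(f"SLA objective allows      {external.total_seconds() / 60:.1f} min of downtime")
print(f"Internal objective allows {internal.total_seconds() / 60:.1f} min of downtime")
```

Under these assumptions, the 99.9% objective allows about 43.2 minutes of downtime in the month and the 99.95% internal objective about 21.6 minutes. The difference is the safety margin the provider has before the external agreement is breached.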
This chapter from the book "Google Cloud for DevOps Engineers" explains how SLAs represent an external agreement with customers about the reliability of a service, what the consequences are if the agreement is violated, and how SLIs drive SLOs, which in turn inform SLAs.
Read more at 👉🏻 Defining SLAs
So the SLA includes an SLO. And in order to get the numbers signed with the customer, I need to have an internal SLO to watch it. That’s why you wrote “an availability SLO of 99.9% over one month, with an internal availability SLO of 99.95%”.
Let me know if I understand it correctly.
SLI, SLO, SLA!
Service Level Indicator, Service Level Objective and Service Level Agreement!
@ahmedtariq1 ! And as always, when we talk about these terms, I can’t help but remember the great takeaway post from @YasinQuareshy for the very nice event we had with Alexis Moussine-Pouchkine ( @alexismp) - you can find the takeaway post here: Getting Started with Site Reliability Engineering (SRE) - Key Takeaways
Thanks for posting this! 😎
@ilias you got it right. The SLA refers to SLOs, and a stricter SLO is kept for internal purposes. If an SLA guarantees a service uptime of 99.99%, the business may set an internal target of 99.995%. In other words, to meet that internal target, no more than 50 of every one million requests should fail, as consumers will come to expect this level of performance.
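The arithmetic behind that figure can be checked with a short Python sketch. The helper `max_failed_requests` is a hypothetical name, and it assumes the target is measured as the share of successful requests. `Fraction` is used so the percentages are handled exactly rather than as floating-point values.

```python
from fractions import Fraction


def max_failed_requests(target_percent: str, total_requests: int) -> int:
    """Largest number of failed requests that still meets the target.

    Hypothetical helper: the target is passed as a decimal string
    (e.g. "99.995") so it can be converted to an exact fraction.
    """
    failure_ratio = 1 - Fraction(target_percent) / 100
    # Round down: a request either fails or it does not.
    return int(failure_ratio * total_requests)


requests = 1_000_000
print(max_failed_requests("99.995", requests))  # internal target
print(max_failed_requests("99.99", requests))   # target guaranteed in the SLA
```

With one million requests, the internal 99.995% target tolerates 50 failures, while the 99.99% written into the SLA would tolerate 100.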
Thanks for the clarification