DevOps and SRE
Step into the cultural movements of DevOps and SRE to deliver reliable and healthy software solutions more quickly.
- 42 Topics
- 123 Replies
Let us introduce you to our DevOps and SRE Moderators:
Jennifer Bergstrom
Username: jennworks40
Bio: Jenn Bergstrom is a multi-cloud solution architect with experience building cloud native solutions to meet complex customer requirements. She has over 15 years of experience in the software industry, with a focus over the last 5 years on Cloud, DevSecOps, and Chaos Engineering. Jenn enjoys helping others to grow in their knowledge and believes that mentoring is an imperative. Her curiosity and passion for learning have led to recognition as a Parsons Fellow for her cloud expertise and to opportunities to present at international summits and conferences. When not working to grow the community of cloud engineers, Jenn enjoys spending time enjoying nature’s beauty with her husband and two daughters, reading, and creating art.
Company: Parsons
Job Title: PARSONS X Senior Technical Director
Ahmed Tariq
Username: ahmedtariq1
Bio: Ahmed Tariq is an ambitious DevOps Engineer with a keen inter
What is CI/CD and how can it be improved in your serverless application? In this video, Developer Advocate Martin Omander chats with Developer Relations Engineer Mia Villaseñor about automated builds, Continuous Delivery vs Continuous Deployment, canary releases, and more. Watch to learn three ways you can improve CI/CD in your serverless app. Click on the video below to watch it in detail.
Chapters:
0:00 - Intro
0:41 - What is Continuous Integration?
1:14 - Automatic tests / automatic builds
2:36 - Continuous Delivery vs Continuous Deployment?
3:40 - How do I set up Continuous Delivery or Deployment for my Cloud Run project?
4:30 - What are canary releases?
5:50 - How long should I run the canary release?
6:13 - How do I set up canary releases for my Cloud Run service?
6:28 - Revision-specific URLs
6:51 - Wrap up
Extra Credit:
Tutorial: Deploying to Cloud Run using Cloud Build → https://goo.gle/3H4MWnL
Github Repository: Node.js → https://goo.gle/3weNVNw Go → https://goo.gle/3Qy9hgV Python → http
DORA’s 2022 State of DevOps Report made some observations about the importance of culture in an organization’s ability to successfully implement DevOps practices. The three key components it identified are Work Arrangement Shifts, Employee Churn, and Employee Burnout. DORA observed that in high performing teams, employee churn is low, employees are invested in and motivated by the team’s success, and the company provides flexibility in work arrangements for its teams.
The linked blog describes these key components nicely:
https://cloud.google.com/blog/products/devops-sre/culture-in-the-2022-state-of-devops-report
What have you observed as you’ve worked in successful DevOps organizations? Were these key components present? How about in less successful DevOps organizations? These components resonate with me, and they build on each other. At the root of them all is trust. Does the company trust its employees to try to do what’s right, or do they expect them to deliberately do what’s
GitOps takes DevOps best practices used for application development (such as version control and CI/CD) and applies them to infrastructure automation. In GitOps, the Git repository serves as the source of truth and the CD pipeline is responsible for building, testing, and deploying the application code and the underlying infrastructure. In his new blog post, Mete Atamel, Developer Advocate at Google Cloud, explains how to set up a Git-driven development, testing, and deployment pipeline for Workflows using Cloud Build.
Click the link below to read it in more detail:
https://cloud.google.com/blog/topics/developers-practitioners/gitsops-service-orchestration
From Monolith to Microservice - updated DevOps tech: Architecture article in Cloud Architecture Center
It’s always interesting to revisit the documentation provided in the Cloud Architecture Center. This article about DevOps tech: Architecture was just updated yesterday and covers quite a lot of ground in its text.
One thing I noted: the Strangler Fig Pattern is the most common monolith-to-microservice pattern I’ve seen described, and it is a focus of the linked article as well. Big Bang Rewrite is a high-risk alternative that is generally viewed as a bad idea. This article talks about several migration patterns, including Strangler Fig and Big Bang Rewrite as well as a few others. Have you seen other patterns used successfully? What were they?
Hi everyone!
After all the excitement that comes with watching the sessions at Next this week, I’m hungry to pursue my next Google Cloud Certification.
Who here is certified as a Professional Cloud DevOps Engineer? What was your favorite part of studying and learning for this certification?
Who here is looking to get certified? What are you most excited to learn about as you prepare?
There are lots of great free resources linked in this post https://cloud.google.com/certification/cloud-devops-engineer that I’ll be using as I study. Who’s studying with me?
Hi Community! It has surprised me that, in bringing a Workspace add-on to the market, I have had to employ a number of workarounds to bring good practice to DevOps and add-on development. Anyone else?
I need to use `clasp` and maintain a disconnected git branch to effectively manage versioning, and CI/CD is practically unmanaged. I would be happy to share exactly how we run a Gold and Silver build so that our team can confidently demo code while others work on sprints… but it would be good to hear what everyone else is doing.
Hi, I am a founder of https://brokee.io. We provide hands-on technical assessments for system engineers, which is why I am writing this post in the DevOps and SRE topic. While I’d be grateful for feedback on the product (ping me if you’d like to try our tests), I am coming here with a technical question. I want to automatically evaluate our tests. We provide broken IT infrastructure, the engineer/candidate has to fix the problems, and then we check how the system was fixed. Our infrastructure is built on Kubernetes, but tests can be in the form of an isolated Linux machine, a Kubernetes environment, or a cloud environment. Currently, automated evaluation is done only on endpoints: for example, we check whether some URL is working, DNS is correct, and so on, whatever we can reach via a network request. I can’t check whether some files are present or whether they were changed. I am wondering if there is some software I am not aware of. I will share some ideas I have in mind: 1. I found https://kubeshop.github.io/testkube/test-types, b
The biggest announcement from Google Cloud Next 2022 from a DevOps perspective is the general availability of the Software Delivery Shield security solution. Check out the blog post about it here
https://cloud.google.com/blog/products/devops-sre/introducing-software-delivery-shield-from-google-cloud
and let us know what you think of this suite of tools designed as an end-to-end solution to protect software from security attacks along the entire software supply chain.
What do you think about this toolset?
Hi everyone! I wanted to make sure that everyone here is aware of the Google Cloud Next ‘22 conference happening virtually from October 11 - 13 this year. Registration is free and easy and the sessions should be fantastic. I’ll be dialed in, and I hope to see some folks from this community there as well! You can get more information and register by visiting the Google Cloud Next website https://cloud.withgoogle.com/next/register
You can view the full catalog of sessions here https://cloud.withgoogle.com/next/catalog#view-all once you’ve registered. There really are sessions for everyone!
Check it out and post the session(s) you’re most excited for in this conversation!
Do you want to learn about modern DevOps practices from DORA (DevOps Research and Assessment), the industry's longest running research program?
Check the link below if you want to learn about the four key metrics of high performing software teams, lessons you can build on from our 2021 State of DevOps Report, outcomes you could achieve from high quality DevOps, and how to accelerate software delivery performance using Google Cloud tools.
3 ½ hours on demand
- Keynote: Driving business success with DevOps - 35 minutes
- Building software delivery pipelines - 25 minutes
- Running applications at autoscale - 30 minutes
- Reliable and blameless: Learning from SRE - 30 minutes
- Automating deployment operations - 30 minutes
- DevOps awards show - 60 minutes
Have you seen the blog post just published today about Google Cloud Deploy and the enhancements that are now available? If you haven’t, go check it out! Thanks to the work of the Google Cloud Deploy team, continuous software deployment to GKE is getting easier to manage.
Google Cloud Deploy was only released to general availability in January 2022, and the team is actively enhancing its capabilities. In this update, Google Cloud Deploy has added auto generation of a Skaffold configuration for single manifest applications. This is a nice jump start for teams who aren’t familiar with Skaffold.
Delivery management improvements include the ability to pause a pipeline temporarily, and to abandon a release. Release Inspector is a shiny new difference comparison tool that enables users to more easily review application manifests.
From an enterprise perspective, this update allows you to deploy Google Cloud Deploy delivery pipelines and target resources using Google Cloud Platform’s Terraform pro
Practical approach to implementing Apdex alerts in Prometheus
Why Apdex?
Apdex provides a single number that attempts to quantify a user’s experience of requests to a web service. We first decide what we consider to be an acceptable response time from our service; in Apdex terminology this is called the T value. We then classify requests as follows:
- All error responses are intolerable.
- All responses quicker than T are satisfactory.
- All responses slower than T, but quicker than 4T, are tolerable.
- All responses slower than 4T are intolerable.
The Apdex score for a service, over a given time, is the ratio of satisfactory responses plus half of the tolerable responses to all responses.
apdex = (satisfactory + (tolerable / 2)) / total
High error rates and slow response times will result in a lower Apdex score. By encapsulating errors and high latency in a single metric, Apdex attempts to quantify our users’ experience of our service as a single number. Since a low Apdex score implies either high rates of errors or undesirabl
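The classification rules and formula above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the Prometheus implementation the post goes on to describe. The function name `apdex_score`, the `(duration_seconds, is_error)` tuple shape, and the decision to count a response of exactly T as satisfactory and exactly 4T as tolerable are assumptions made for this example.

```python
from typing import Iterable, Tuple


def apdex_score(requests: Iterable[Tuple[float, bool]], t: float) -> float:
    """Compute an Apdex score for (duration_seconds, is_error) samples.

    Assumption: boundaries are inclusive (duration == t is satisfactory,
    duration == 4 * t is tolerable); the post does not specify this.
    """
    satisfactory = 0
    tolerable = 0
    total = 0
    for duration, is_error in requests:
        total += 1
        if is_error:
            # Error responses are always intolerable, however fast.
            continue
        if duration <= t:
            satisfactory += 1
        elif duration <= 4 * t:
            tolerable += 1
        # Anything slower than 4T is intolerable and adds nothing.
    if total == 0:
        raise ValueError("cannot compute Apdex without any requests")
    return (satisfactory + tolerable / 2) / total


if __name__ == "__main__":
    samples = [(0.1, False), (0.3, False), (0.1, True), (2.0, False)]
    # With T = 0.25s: one satisfactory, one tolerable, two intolerable.
    print(apdex_score(samples, t=0.25))
```

With T set to 0.25 seconds, the four sample requests give one satisfactory and one tolerable response out of four, so the score is (1 + 0.5) / 4.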
Hi all,
I’ve recently created several GKE clusters through some custom Terraform code. However, by default it looks like NAT-ing from the pod network is not enabled, which is not desirable. So I found this article on how to enable it:
https://cloud.google.com/kubernetes-engine/docs/how-to/ip-masquerade-agent#how_ipmasq_works
That did the trick just fine. However, I can’t seem to find a way to enable this during cluster creation. I’d prefer not to have to add the daemonset and configmap after the cluster is created. Is there any way to configure this as part of cluster creation through Terraform? Also, this is a private cluster and I do not have access to the cluster through kubectl from where I’m running Terraform. Thanks!
Abhinav Rau, Principal Architect, and Madhav Sathe, Customer Engineer, both at Google, will lead this session. Kubernetes can be a cornerstone of DevOps and developer productivity, but the legacy ingress and controller architectures are fragmented, with limited portability across different implementations. In this session, we’ll introduce you to the new Kubernetes Gateway API and show you how features like traffic splitting and continuous microservice delivery boost productivity while implementing stronger DevOps practices. To join this session, please check the following link:
https://cloudonair.withgoogle.com/events/innovators-enhance-productivity-devops
Shobhit Gupta, Solutions Architect at Google Cloud, will discuss how modernizing software delivery starts with a strong blueprint. Join us to learn how Google Cloud tools and software can improve developer productivity and CI/CD. We will look at how platform admins can build consistent infrastructure across different environments, app devs can streamline development to focus on coding, and security specialists can monitor and apply security policies. Please check the following link to join this session:
https://cloudonair.withgoogle.com/events/innovators-software-delivery-bp
Continuous Integration in GCP
Continuous Integration forms the CI of the CI/CD process and at its heart is the culture of submitting smaller units of change frequently. The smaller changes minimize the risk, help to resolve issues quickly, increase development velocity, and provide frequent feedback. At its core, it is about getting feedback early and often, which makes it possible to identify and correct problems early in the development process. With CI, you integrate your work frequently, often multiple times a day, instead of waiting for one complex integration at the end of the day. Each integration is first verified with an automated build, which enables you to detect integration issues as quickly as possible and reduce problems downstream.
Some important elements are required to make up the CI process, such as making changes in the code, managing the source code and building the artifacts, and storing the artifacts. Google Cloud has an appropriate service for each of the elements
Cloud Monitoring metrics as Managed Service for Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit. Prometheus collects and stores its metrics as time series data; that is, metrics information is stored with the timestamp at which it was recorded, alongside optional key-value pairs. Prometheus provides a functional query language called PromQL (Prometheus Query Language) which lets the user select and aggregate time series data in real time.
According to a recent CNCF survey, 86% of the cloud-native community reports that they use Prometheus for observability. As Prometheus becomes more of a standard, an increasing number of developers are becoming fluent in PromQL. PromQL is a powerful, flexible, and expressive query language, but it’s only able to query Prometheus time series data. Other sources of telemetry, such as metrics generated from logs, remain isolated in separate products and might require developers to learn new query tools to access them.
Prometheus
Cloud Native Apps and DevOps Services
Cloud-native is an approach for building and running applications that exploits the advantages of the cloud computing delivery model. When companies build and operate applications using a cloud-native architecture, they bring new ideas to market faster and respond sooner to customer demands. A cloud-native application is a program that is designed for a cloud computing architecture. These applications are run and hosted in the cloud and are designed to capitalize on the inherent characteristics of a cloud computing software delivery model. A native app is software that is developed for use on a specific platform or device.
Cloud-native applications use a microservice architecture. This architecture efficiently allocates resources to each service that the application uses, making the application flexible and adaptable to a cloud architecture. These microservices that are part of the cloud-native app architecture are packaged in containers that co
What Do Your SLAs Mean?
A service-level agreement (SLA) is a promise made to a user of a service, to indicate that the availability and reliability of the service should meet a certain level of expectation. SLAs act as a pact between the software provider and the software user or client. SLAs may also include responsiveness to incidents and bugs, depending on the contract. However, if an SLA is broken, then some penalty may be incurred, such as a refund or a service subscription credit.
SLAs are an integral part of an IT vendor contract. An SLA pulls together information on all of the contracted services and their agreed-upon expected reliability into a single document. They clearly state metrics, responsibilities, and expectations so that in the event of issues with the service, neither party can plead ignorance. It ensures both sides have the same understanding of requirements. Any significant contract without an associated SLA (reviewed by legal counsel) is open to deliberate or inadvert
How do you quantify user happiness? It’s not easy to measure directly in our systems, but we can look for signals in the user journey. You may experience an outage or other problem that internally seems relatively small, but your users take to Twitter in droves and express their displeasure. Or, you may have a catastrophic event but receive few or no complaints from end users. It is impossible to get inside your users’ heads and see whether they are happy or not while using your service.
To overcome this problem, we use happiness metrics, also known as Service Level Indicators (SLIs). SLIs specify, measure, and track user journey success. They are quantifiable measures of reliability. SLIs tell you whether you are in or out of compliance with your SLO targets and are therefore in danger of making users unhappy.
Once you choose the services you want to measure, you can then think about the SLIs that you will use to measure users’ common tasks and critical activities. Choosing SLIs tha
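As a rough illustration of how an SLI can tell you whether you are in or out of compliance with an SLO target, here is a minimal Python sketch. It assumes the common "good events divided by valid events" form of an SLI, which the excerpt above does not spell out. The function names `sli_ratio` and `meets_slo` and the sample numbers are hypothetical.

```python
def sli_ratio(good_events: int, valid_events: int) -> float:
    """Return the SLI as the fraction of valid events that were good.

    Assumption: the SLI is expressed as good / valid events; the post
    itself does not define a specific formula.
    """
    if valid_events <= 0:
        raise ValueError("valid_events must be positive")
    if not 0 <= good_events <= valid_events:
        raise ValueError("good_events must be between 0 and valid_events")
    return good_events / valid_events


def meets_slo(sli: float, slo_target: float) -> bool:
    """True when the measured SLI is at or above the SLO target."""
    return sli >= slo_target


if __name__ == "__main__":
    # Hypothetical numbers: 9,990 successful checkouts out of 10,000.
    measured = sli_ratio(good_events=9_990, valid_events=10_000)
    print(measured, meets_slo(measured, slo_target=0.995))
```

The same measured value can be in compliance with one target and out of compliance with a stricter one, which is why the SLO target has to be chosen alongside the SLI.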
Increasing visibility into our systems, code, and repositories is essential for effective DevOps processes. Previously, repository audit logging was available only for Docker repositories, but Google Cloud has expanded the capability to include Maven, npm, and Python repositories as well! This is a boon for visibility and monitoring! You can read about how to configure the monitoring by following the link included in this post.
How have you implemented monitoring for your repositories? How mature is your organization in providing automated alerting and monitoring? Do you use the audit logging capabilities Google Cloud provides?
https://cloud.google.com/artifact-registry/docs/audit-logging