
C2C Connect Live: Cambridge

On June 14, C2C hosted an event in Google’s Cambridge office. We believe in-person connections are invaluable to everyone in our community, especially when our members are able to immediately converse with amazing speakers who are sharing their journeys and business outcomes. The stories from this event—presented on stage by Google Cloud customers, partners, and employees—can all be reviewed below.

Introduction from Google

Yee-chen Tjie (@yeetjie), Google Cloud Life Sciences Head of Customer Engineering, kicked off the program at C2C Connect Live: Cambridge with a few words about how Google is using 10x thinking to make major, unique, and substantial investments in Healthcare and Life Sciences technology. Tjie made a point of mentioning Google’s record of solving problems using AI and ML, particularly with AlphaFold 2, the focus of the presentation Luke Ge of Intel gave later in the afternoon.

After his opening remarks, Tjie hosted a round of Google trivia, inviting everyone in the audience to stand and then sit down every time they answered one of his true-or-false questions incorrectly. After guessing whether Google Suite was initially offered on CD in 2006 (false), the first Google Doodle was about Coachella because the founders were going (false––they were going to Burning Man), and the English translation of Kubernetes is “cargo ship” (false––it’s “pilot”), Tjie handed the lucky winner a free Google hub device.

CISO Healthcare and Life Sciences Reflections

Before beginning his presentation, Taylor Lehmann (@taylorlehmann1), Director of the Office of the CISO at Google Cloud, thanked the hosts for the opportunity to join and speak, noting that he had just had his “mind blown” talking to fellow presenter Jonathan Sheffi before the event. Lehmann went on to discuss some of the core principles of invisible security and his office’s mission to “get to this vision where security is unavoidable.” A big part of this project, he explained, is eliminating the shared responsibility model in favor of what Google calls “shared fate.” Under this model, Google provides blueprints, solutions, and curated patterns to enable customers to manage their own security infrastructures. “If you have a bad day on Google Cloud, it’s a bad day for us too,” he summarized. “If you win on Google Cloud, we win too.”

The History and Future of Human Genomics

Jonathan Sheffi (@sheffi), formerly a Director of Product Strategy at Veeva Systems and Google Cloud, began his presentation by prodding the audience with an enthusiastic “How’s everyone doing?” and then added, “First rule of public speaking: make sure the audience is awake.” The focus of Sheffi’s presentation, the history and future of human genomics, took the audience back to the year 1990, when, in Sheffi’s words, “Nirvana’s Nevermind is a year from coming out, it’s a very exciting time.”

Sheffi went on to cover the advents of next-gen sequencing and of public cloud computing, government and pharmaceutical adoption of genomic sequencing, and recent cost-cutting advancements in genomics. When he opened things up to the audience for questions, Michael Preston of Healthcare Triangle shared his own experience seeking treatment for melanoma to ask how genomic sequencing can be used to predict patient reactions to prescribed medications. Sheffi took the question to heart and acknowledged the need for sequencing and screening processes that take into account data on patient-reported side effects.
End-to-End Optimization of AlphaFold 2 on Intel Architecture

Luke Ge (@Liangwei77ge), an AI Solution Specialist at Intel, opened his presentation by saying, “Yesterday I spent 6 hours on a plane to come to this beautiful city,” prompting a round of applause from the audience. Then he asked, “How many of you are using AlphaFold 2?” A few hands went up. He followed up with, “How many of you have heard of AlphaFold 2?” Many more hands raised.

Ge’s presentation explored how analyzing human tissue from DNA to protein structure requires using AI to process huge volumes of sequence data. The Google product that handles this processing is AlphaFold 2. Ge explained how Intel’s computing hardware supports AlphaFold 2, including by providing deep learning model inference and removing memory bottlenecks in AlphaFold 2’s attention and Evoformer modules. At the end of his presentation, Ge demonstrated a model generated using non-optimized versus optimized AlphaFold 2 code. The difference was clear.

Panel Discussion

Tjie moderated the panel discussion with Sheffi and Ge, beginning by asking each whether he is a Celtics fan or a Warriors fan. Immediately, the tension in the room rose: Sheffi and Ge are from opposite coasts, making Sheffi a Celtics fan and Ge a Warriors fan. The tension was short-lived, however. When Tjie asked Ge what he considers the best way to choose a compute instance, Sheffi followed up to ask Ge whether it’s possible to run multiple sequences on a single instance and maintain performance. Ge said yes.

When Tjie opened questions to the audience, several guests rose to ask Sheffi questions about genomic sequencing, more than one of them focusing on use cases for genomic research for patients and caregivers. After several of these questions in a row, Tjie turned to the crowd and said, “I warned Luke that if he picked the Warriors then he would get less questions from the audience.” After the laughs in the room died down, Tjie asked Ge where he sees HCLS problems being solved with AI. Ge did not have to think long before citing computer vision as a solution for detecting cancerous cells.

Winding Down

Following the presentations, all in attendance broke away to connect during a networking reception. To read more about it, check out the exclusive onsite report linked below in the Extra Credit section.

Extra Credit

Categories: Data Analytics, Industry Solutions, Identity and Security, Google Cloud Partners, Healthcare and Life Sciences, Session Recording

C2C Connect Live: New York City (full video)

On May 12, C2C hosted its first east coast event at Google’s New York office. We believe in-person connections are invaluable to everyone in our community, especially when our members are able to immediately converse with amazing speakers who are sharing their journeys and business outcomes. The stories from this event—presented on stage by Google Cloud customers, partners, and employees—can all be reviewed below.

A Warm Welcome from C2C and Google Cloud

Opening the event was Marco ten Vaanholt (@artmarco), who leads C2C initiatives at Google Cloud. To kick things off, Marco prompted the audience to get to know each other, and all enthusiastically turned to their table neighbors. After Marco covered the history of C2C and our early adventures in hosting face-to-face events, Marcy Young (@Marcy.Young), Director of Partnerships at C2C, followed to reiterate our mission statement: we’re here to connect Google Cloud customers across the globe. Since March of 2021, when the C2C online community first launched, our community has grown, making valuable connections with people like Arsho Toubi (@Arsho Toubi), Customer Engineer, Google Cloud, who followed Young to introduce C2C’s partner speakers.

All three introductory speakers emphasized the excitement of being able to make new connections in person again. As ten Vaanholt put it, peers introducing themselves and initiating new relationships is “the start of community building.” When Toubi announced, “I received some business cards, and that was a fun experience I haven’t had in two years,” the room responded with a knowing laugh. Toubi also asked the Googlers in the room to stand up so others could identify them. “These are my colleagues,” she said. “We’re all here to help you navigate how to use GCP to your best advantage.”

Getting to Know AMD and DoiT

C2C partners and the sponsors for this event, DoiT and @AMD, shared updates on the partnership between the two companies focused on cloud optimization.

Michael Brzezinski (@mike.brzezinski), Global Sales Manager, AMD
Spenser Paul (@spenserpaul), Head of Global Alliances, DoiT

Brzezinski framed the two presentations as a response to a question he received from another attendee he met just before taking the stage, a question about how the two companies work together to enhance performance while reducing cost. One half of the answer is AMD’s compute processors, which Brzezinski introduced one by one. To complete the story of the partnership between the two companies, Spenser Paul of DoiT took the stage with his Labrador Milton. “I’m joining the stage with a dog, which means you won’t hear anything I’m saying from here on,” he said as he took the microphone. “And that’s totally okay.” The key to minimizing cost on AMD’s hardware, Paul explained, is DoiT’s Flexsave offering, which automates compute spend based on identified need within a workload.

A Fireside Chat with DoiT and Current

Spenser Paul, Head of Global Alliances, DoiT
Trevor Marshall (@tmarshall), Chief Technology Officer, Current

Paul invited Marshall to join him onstage, and both took a seat facing the audience, Milton resting down at Paul’s feet. After asking Marshall to give a brief introduction to Current, Paul asked him why Current chose Google Cloud. Marshall did not mince words: Current accepted a $100,000 credit allowance from Google after spending the same amount at AWS. Why did Current stay with Google Cloud? The Google Kubernetes Engine. “I like to say we came for the credits, but stayed for Kubernetes,” Marshall said.
Paul wryly suggested the line be used for a marketing campaign. The conversation continued through Current’s journey to scale and its strategy around cost optimization along the way. When Paul opened questions to the audience, initially none came up. Seeing an opportunity, Paul turned to Marshall and said, “Selfishly, I need to ask you: what’s going to happen with crypto?” Just in time, a guest asked what other functionalities Current will introduce in the future. After an optimistic but tight-lipped response from Marshall, another moment passed. Marshall offered Paul a comforting hand and said, “We’re all going to make it through,” before fielding a few more questions.

Panel Discussion

All our presenters, with the addition of Michael Beal (@MikeBeal), CEO, Data Capital Management, reconvened on stage for a panel discussion. Toubi, who moderated the conversation, began by asking Michael Beal to introduce himself and his company, Data Capital Management, which uses AI to automate the investment process. Beal ran through Data Capital Management’s product development journey, and then, when he recalled the company’s initial approach from Google, playfully swatted Marshall and said, “The credits don’t hurt.” Toubi then guided Beal and Brzezinski through a discussion of different use cases for high-performance computing, particularly on AMD’s processors.

When Toubi turned the panel’s attention to costs, Paul took the lead to explain in practical detail how DoiT’s offerings facilitate the optimization process. “I have an important question,” said Toubi. “Can DoiT do my taxes?” Then she put the guests on the spot to compare Google Cloud to AWS’s Graviton. Brzezinski was ready for the question. The initial cost savings Graviton provides, he explained, don’t translate to better price performance when taking into account the improved overall performance on Google Cloud. Other questions covered financial services use cases for security, additional strategies for optimizing workloads for price performance, and wish-list items for Google Cloud financing options.

Marco ten Vaanholt kicked off the audience Q&A by asking what a Google Cloud customer community can do for the customers on the panel. Marshall said he’s interested in meeting talented developers, and Beal said he’s interested in meeting anyone who can give him ideas. As he put it, “Inspiration is always a very interesting value proposition.” After a couple more questions about estimating cost at peak performance and addressing customer pain points, Toubi asked each panelist to offer one piece of advice for someone considering using Google Cloud who isn’t already. Again, Paul saw a shot and took it. “If you’ve never been to Google before,” he said, “come for the credits, stay for the Kubernetes.”

Winding Down

Following the presentations, all in attendance broke away to connect during a networking reception. To read more about it, check out the exclusive onsite report linked below in the Extra Credit section, and to get involved in the customer-to-customer connections happening in person in the C2C community, follow the link to our live event in Cambridge, MA to register and attend. We look forward to seeing you there!

Extra Credit

Categories: Data Analytics, Google Cloud Strategy, Containers and Kubernetes, Industry Solutions, Google Cloud Partners, Financial Services, Session Recording

The Value of Looker for Startups (full recording)

Looker is a business intelligence platform used for data applications and embedded analytics. Looker helps you easily explore, share, and visualize your company's data so that you can make better business decisions. During this deep dive, Cat Huang and Tema Johnson, Looker customer engineers at Google Cloud, discussed the value of Looker for startup companies, including recommendations for how to choose a data warehouse, complete with a product demo. The recording from this session includes the topics listed below, plus plenty of conversation woven into the presentation from open Q&A with community members present at the live event:

(0:00) Welcome and introduction from C2C and the Google Startups Team
(5:25) Looker (creating a data culture) vs. Data Studio (data visualizations)
(9:00) Using Looker and Data Studio together for a complete, unified platform for self-service and centralized BI
(10:10) Using Looker with a data warehouse like BigQuery
(13:15) Serverless big data analytics vs. traditional data warehouses
(14:10) Integrated AI and ML services for data analytics
(15:30) The power of Looker: in-database architecture, semantic modeling layer, and cloud native
(21:05) Live demo: Looker
(40:00) Closing comments and audience Q&A

Watch the full recording below:

Preview What’s Next

Join the Google Cloud Startups group to stay connected on events like this one, plus others we have coming up:

Categories: Data Analytics, Google Cloud Startups, Session Recording

Clean Clouds, Happy Earth Panel Discussion: Sustainability in EMEA

The centerpiece of C2C’s virtual Earth Day conference, Clean Clouds, Happy Earth, was a panel discussion on sustainability in EMEA featuring C2C and Google Cloud partners HCL and AMD and cosmetics superpower L’Oreal. Moderated by Ian Pattison, EMEA Head of Sustainability Practice at Google Cloud, the conversation lasted the better part of an hour and explored a range of strategies for enabling organizations to build and run sustainable technology on Google Cloud.

According to Sanjay Singh, Executive VP of the Google Cloud Ecosystem Unit at HCL Technologies, when advising customers across the value chain evaluating cloud services, Google Cloud becomes a natural choice because of its focus on sustainable goals. Connecting customers to Google Cloud is a key part of HCL’s broader program for maintaining sustainable business practices at every organizational level. “What you cannot measure, you cannot improve,” says Singh, which is why HCL has created systems to measure every point of emission under its purview for carbon footprint impact. In alignment with Google Cloud’s commitment to run a carbon-free cloud platform by 2030, HCL plans to make its processes carbon neutral in the same timeframe.

Suresh Andani, Senior Director of Cloud Vertical Marketing at AMD, serves on a task force focused on defining the company’s sustainability goals as an enterprise and as a vendor. As a vendor, AMD prioritizes helping customers migrate to the cloud as well as making its compute products (CPUs and GPUs) more energy efficient, which it plans to do by a factor of 30 by 2025. On the enterprise side, Andani says, AMD relies on partners and vendors, so making sure AMD as an organization is sustainable extends to its ecosystem of suppliers. One of the biggest challenges, he says, is measuring partners’ operations. This challenge falls to AMD’s corporate responsibility team.

Health and beauty giant L’Oreal recently partnered with Google Cloud to run its beauty tech data engine. In the words of architect Antoine Castex, a C2C Team Lead in France, sustainability at L’Oreal is all about finding “the right solution for the right use case.” For Castex, this means prioritizing Software as a Service (SaaS) over Platform as a Service (PaaS), and only in the remotest cases using Infrastructure as a Service (IaaS). He is also emphatic about the importance of using serverless architecture and products like App Engine, which only run when in use, rather than running and consuming energy 24/7.

For Hervé Dumas, L’Oreal’s Sustainability IT Director, these solutions are part of what he calls “a strategic ambition,” which must be common across IT staff. Having IT staff dedicated to sustainability, he says, creates additional knowledge and enables necessary transformation of the way the company works. As Castex puts it, this transformation will come about when companies like L’Oreal are able to “change the brain of the people.”

As Castex told C2C in a follow-up conversation after the event, the most encouraging takeaway from the panel for L’Oreal was the confirmation that other companies and tech players have “the same dream and ambition as us.” Watch a full recording of the conversation below, and check back to the C2C website over the next two weeks for more content produced exclusively for this community event. Also, if you’re based in EMEA and want to connect with other Google Cloud customers and partners in the C2C community, join us at one of our upcoming face-to-face events:

Extra Credit:

Categories: Data Analytics, Google Cloud Strategy, Compute, Industry Solutions, Cloud Migration, Google Cloud Partners, Sustainability, Consumer Packaged Goods, Session Recording

Healthcare Case Study: Mayo Clinic's Remote Patient Monitoring Program for COVID-19

People with COVID-19 are typically advised to self-isolate for two weeks, with some patients needing comprehensive home care. Mayo Clinic's Center for Connected Care originally designed its Remote Patient Monitoring Program to be used for patients with chronic conditions. Now it has adapted the model for patients with COVID-19. Quarantined Mayo Clinic patients participating in the Remote Patient Monitoring Program receive medical devices they use to screen and electronically transmit their vital signs. A team of remote nurses regularly monitors the patients’ health assessment data and contacts the patients if their conditions worsen or if they may require support.

How the Remote Patient Monitoring Program Works

Mayo’s Remote Patient Monitoring Program serves two categories of patients:

Patients who are at moderate to high risk for complications are given remote patient monitoring kits with blood pressure cuffs, thermometers, pulse oximeters, and a scale. Two to four times a day, patients use these devices to screen and transmit their vital signs to Mayo Clinic through the tablets they receive with their kits. Mayo’s Patient Monitoring nurses monitor these vital signs and call patients to ask if they are experiencing COVID-19 symptoms such as vomiting, nausea, or diarrhea.

Patients who are at low risk for complications monitor their conditions each day through the Mayo Clinic app. They receive a daily alert reminding them to provide their health assessments to their Mayo Patient Monitoring team.

What Is Remote Monitoring?

Remote patient monitoring allows physicians and healthcare facilities to track outpatient progress in real time. Caregivers also use this technology for geriatric wellness monitoring. Devices used for remote patient monitoring include wearable fitness trackers, smart watches, ECG monitors, blood pressure monitors, and glucose monitors for diabetes. Collected data is electronically transmitted to the patient’s doctors for assessment and recommendations. Benefits of this technology include:

Remote care reduces the burden on healthcare practitioners and healthcare organizations.
Hospitals and clinics save on operational costs by reducing readmissions, staff engagement, and in-person visits.
Remote patient devices enable early detection of deterioration and comorbidities, thereby reducing emergency visits, hospitalizations, and the duration of hospital stays.

According to the Financial Times, remote patient technology could save the U.S. a total of $6 billion per year. A more recent scientific report calculated $361 in savings per patient per day, or around $13,713 in total savings per patient per year.

Results

Mayo Clinic’s Remote Patient Monitoring Program has reduced its caseload from 800 COVID-19 patients to 350 patients with intensive needs. These patients were connected to 1-2 physicians per shift who monitored their symptoms and escalated care as needed. One such patient reported: “[This program] was our lifeline…. It just took some of that fear away, because we knew that there was somebody still there taking care of us with our vital signs. It motivated us to do better on getting well.”

The Impact of Google Cloud

Mayo Clinic uses Google Cloud and Google Health to positively transform patient and clinician experiences, improve diagnostics and patient outcomes, and conduct innovative clinical research. In addition to building its data platform on Google Cloud, Mayo uses Google Health to create machine-learning models for assessing symptoms of serious and complex diseases.

Categories: Data Analytics, Industry Solutions, Healthcare and Life Sciences

Leveraging Data for Consumer Behavior (full video)

This session was led by Quantiphi, a foundational partner of C2C and a Google Partner that uses AI to solve problems at the heart of businesses. Connect with them directly @Quantiphi in the C2C community.

One of the best ways for enterprises across a broad range of business sectors to remain relevant is to use consumer behavior data in ways that help their brands stand out from the competition. Using this data effectively and uniquely can help businesses improve the rate of customer acquisition, increase the ROI from marketing spend, and ensure customer centricity and personalization. But what can we do to improve customer experiences by leveraging customer data, and how? To learn more, C2C sat down with Vijay Mannur, Customer and Marketing Analytics Practice Lead at Quantiphi, to discuss how to enhance consumer engagement and conversion using behavioral data.

The recording from this Deep Dive includes:

(1:55) Agenda overview and introduction to speakers
(8:05) Marketing analytics: how and why Quantiphi built a dedicated marketing and analytics team, and options for marketing analytics from Google Cloud
(14:50) Consumer data: third-party vs. first-party cookies and rich data quality, Consumer Data Platform (CDP) vs. traditional Customer Relationship Management (CRM), how to build and upskill teams to use CDP effectively, and using BigQuery and other Google Cloud analytics tools
(32:25) Examples of customer stories using CDP: how a French retailer centrally connected its consumer databases with custom pipelines from BigQuery, and how a bank optimized consumer segmentation and profiling using Vertex AI
(39:00) Future of analytics: the future of consumer data and trends nearing the end of their lifecycle, addressing privacy concerns using Google Cloud data warehousing and analytics solutions, and the ethical use of machine learning for consumer behavior

Speakers featured in this Deep Dive

Vijay Mannur, Practice Head, Customer and Marketing Analytics, Quantiphi
Vijay Mannur is a Practice Head at Quantiphi with 12+ years of experience in the fields of performance marketing, sales, and analytics. He leads the Customer and Marketing Analytics practice at Quantiphi, a leading digital transformation and AI solutions company. He has grown the practice at Quantiphi to encompass engineering teams building cutting-edge solutions, delivery teams, and sales teams. He has delivered multiple large-scale digital transformation solutions to marketing teams of large retail and FSI clients. Prior to Quantiphi, Vijay worked for companies like Media.net, Idea Cellular, and NEC Corporation.

Daniel Lees, Staff Partner Engineer, Google Cloud
A cloud architect at Google, Daniel Lees was a Principal Architect in Financial Services Select, helping Google’s most valued clients build on Google Cloud Platform before joining the Partner Engineering team in support of Google’s most important partners. He has extensive expertise in defining best practices, blueprints, security and compliance standards, and evangelizing reusable assets for cloud deployment in CI/CD pipelines with IaC, working on both cloud-native and hybrid application modernization. Before Google, he had 20 years of experience at HSBC Bank, where he was the Chief Technical Architect for AWS Cloud globally, leading a small team of SME cloud architects.

Other Resources

Redefine customer and marketing analytics
Google Cloud Marketing analytics & AI solutions
Responsible AI practices

Categories: Data Analytics, Industry Solutions, Google Cloud Partners, Retail, Session Recording

Monitoring and Observability Drive Conversation at C2C Connect: France Session on January 11

On January 11, 2022, C2C members @antoine.castex and @guillaume blaquiere hosted a powerful session for France and beyond in the cloud space. C2C Connect: France sessions intend to bring together a community of cloud experts and customers to connect, learn, and shape the future of cloud.

60 Minutes Summed Up in 60 Seconds

Yuri Grinshteyn, Customer SRE at Google Cloud, was the guest of the session. Also known as “Stack Doctor” on YouTube, Grinshteyn advocates for the best ways to monitor, observe, and follow SRE best practices as learned by Google’s own service SRE teams. Grinshteyn explained the difference between monitoring and observability: monitoring is “only” the data about a service or a resource, while observability is the behavior of the service metrics through time. To observe data, you need different data sources: metrics, of course, but also logs and traces. There are several tools available, but the purpose of each is observability: FluentD, OpenCensus, Prometheus, Grafana, etc. All are open source, portable, and compatible with Cloud Operations. The overhead of instrumented code is practically invisible, and the metrics it provides are far more valuable than the few CPU cycles lost to it (see the instrumentation sketch at the end of this recap). Both microservices and monoliths should use trace instrumentation. Even a monolith never works alone: it uses Google Cloud services, APIs, databases, etc. Trace allows us to understand north-south and east-west traffic.

Get in on the Monitoring and Observability Conversation!

Despite its 30-minute time limit, this conversation didn’t stop. Monitoring and observability is a hot topic, and it certainly kept everyone’s attention. The group spent time on monitoring, logging, error budgets, SRE, and other topics such as Cloud Operations, Managed Service for Prometheus, and Cloud Monitoring. Members also shared likes and dislikes. For example, one guest, Mehdi, “found it unfortunate not to have out of the box metrics on GKE to monitor golden signals,” and said “it’s difficult to convince ops to install Istio just for observability.”

Preview What's Next

Two upcoming sessions will cover topics that came up but didn’t make it to the discussion floor. If either of these events interests you, be sure to sign up to get in touch with the group!

Extra Credit

Looking for more Google Cloud products news and resources? We got you. The following links were shared with attendees and are now available to you!

Video of the session
Cloud Monitoring
Managed Service for Prometheus
sre.google website
SRE books
Stack Doctor YouTube playlist
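As a companion to the metrics discussion above, here is a minimal, hypothetical sketch of what instrumenting a small Python service with the open-source prometheus_client library could look like. The metric names, labels, and port are illustrative assumptions, not anything shown during the session.

```python
# Minimal sketch: expose request metrics from a Python service so Prometheus
# (or a managed Prometheus collector) can scrape them.
# Metric names, labels, and the port below are illustrative assumptions.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter("app_requests_total", "Total requests handled", ["endpoint"])
LATENCY = Histogram("app_request_latency_seconds", "Request latency in seconds", ["endpoint"])

def handle_request(endpoint: str) -> None:
    """Simulate a request and record its count and latency."""
    with LATENCY.labels(endpoint=endpoint).time():
        time.sleep(random.uniform(0.01, 0.1))  # stand-in for real work
    REQUESTS.labels(endpoint=endpoint).inc()

if __name__ == "__main__":
    start_http_server(8000)  # metrics served at http://localhost:8000/metrics
    while True:
        handle_request("/checkout")
```

A Prometheus server or managed collector scraping that /metrics endpoint could then feed dashboards in Grafana or Cloud Monitoring, which is the kind of pipeline the session described.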

Categories: Data Analytics, DevOps and SRE, Cloud Operations

Cloud Technologies: Boon for Sustainable Future (a Fireside Chat with SpringML)

The effort to combat climate change is such a major undertaking that no metaphor does it justice. It will take more than “all hands on deck.” We need to be more than “on board.” Every one of us has a crucial role to play. That’s why the data we have must be available to the entire public, not just governments and corporations.

In October 2021, Google Cloud established partnerships with five companies engaged in environmental data collection efforts: CARTO, Climate Engine, Geotab, Egis, and Planet Labs. These companies are working with Google to make their datasets available globally on Google Cloud. As a 2020 Google Cloud Partner of the Year and a company with a stated commitment to sustainability, C2C foundational partner SpringML is excited to raise awareness of this initiative.

In this fireside chat, Lizna Bandeali and SpringML’s Director of Google Cloud Services Masaf Dawood explore the background and the implications of this recent effort. Key points discussed include ease, transparency, and accessibility of data, and a focus on actionable insights. With the datasets available and Google Cloud Platform tools like BigQuery, organizations and individuals working in environmental science, agriculture, food production, and related fields can make informed predictions about everything from weather patterns to soil quality. These organizations and individuals can use these predictions to plan future resource use around vital sustainability guidelines. Watch the full video below:

Are you an individual or a decision-maker at an organization pursuing sustainability? What are you doing to take up this effort? Contact us on our platform and tell us your story!

Categories: Data Analytics, Databases, Sustainability

To Collate or To Analyze: Cloud Bigtable vs. BigQuery

The Google Cloud Platform hosts all kinds of tools for data storage and management, but two of the most versatile and popular are Bigtable and BigQuery. While each service is a database, the key difference between the two lies in their names. Bigtable (BT) is literally a “big table” that scales to terabytes if not petabytes for storing and collecting your data. BigQuery (BQ), on the other hand, conducts a “big query” into your massive troves of data. Each database has other unique attributes that define when and how to use it. These topics, along with use cases, case studies, and costs associated with each product, are covered in the following sections.

Bigtable

Bigtable, Google Cloud’s fully managed database for hefty analytical and operational workloads, powers major Google products like Google Search, Google Maps, and Gmail. The database supports high read/write throughput, processes reads and writes at ultra-low latency, and scales to billions of rows and thousands of columns for massive troves of data. Bigtable pairs well with Cloud data processing and analytics products such as BigQuery, Dataflow, and Dataproc, and it integrates well with big data tools such as Hadoop, Beam, and Apache HBase.

Bigtable Use Cases

Bigtable is best used for instances with lots of data, such as the following:

Time-series data, e.g., CPU usage over time for multiple servers.
Financial data, e.g., currency exchange rates.
Marketing data, like customers’ purchase histories and preferences.
Internet of Things data, such as usage reports from home appliances.
Fraud detection, i.e., detecting fraud in real time on ongoing transactions.
Product recommendation engines that handle thousands of personalized recommendations.

BigQuery

BigQuery is Google Cloud’s serverless, fully managed service that helps you ingest, stream, and analyze massive troves of information in seconds. In contrast to Bigtable, BigQuery is a query engine that helps you import and then analyze your data. Since BigQuery uses SQL (Structured Query Language), it is comparable to Amazon Redshift, which also uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes.

BigQuery Use Cases

BigQuery is commonly used for instances that include:

Real-time fraud detection; BQ ingests and analyzes massive amounts of data in real time to identify or prevent unauthorized financial activity.
Real-time analytics; BQ is immensely useful for businesses or organizations that need to analyze their latest business data.
Log analysis; BQ reviews, interprets, and helps you understand computer-generated log files.
Complex data pipeline processing; BQ manages and interprets the steps of one or more complex data pipelines generated by source systems or applications.

Similarities Between Bigtable and BigQuery

Each database boasts ultra-low latency on the order of single-digit milliseconds, high performance and speed on the order of 10,000 rows per second, and powerful scalability that enables you to scale (or descale) for additional storage capacity. Both are end-to-end managed and thoroughly secure, as they encrypt data at rest and in transit.

Differences Between Bigtable and BigQuery

While Bigtable collates and manages your data, BigQuery collates and analyzes those troves of data. Bigtable resembles an Online Transaction Processing (OLTP) tool, where you can execute a number of transactions occurring concurrently—such as online banking, shopping, order entries, or text messages.
BigQuery, in contrast, is ideal for OLAP (Online Analytical Processing), that is, for creating analytical business reports or dashboards. In short, it suits anything related to business analysis, such as scrolling through last year’s logs to see how to improve the business. While Bigtable is NoSQL, which gives it a flexible data model, BigQuery uses SQL, making it ideal for performing complex queries over heavy-duty transaction histories. Don’t expect BigQuery to be used as a regular relational database or for CRUD (to Create, Read, Update, and Delete data). Its storage is effectively immutable, meaning it isn’t designed for editing or removing individual records.

Case Studies

Companies use Bigtable for structuring and managing their massive troves of data, while they use BigQuery for mining insight from those troves of data. Below are a few examples of how businesses have used each in practice:

Bigtable

Digital fraud detection and payment solution company Ravelin uses Bigtable to store and query 1.2 billion transactions of more than 230 million active users.
AdTech provider OpenX uses Bigtable to serve more than 30,000 brands, more than 1,200 websites, and more than 2,000 premium mobile apps, and processes more than 150 billion ad requests per day.
Dow Jones DNA uses Bigtable for fast, robust storage of key events that the company has documented in over 30 years of news content.

BigQuery

UPS uses BigQuery to achieve precise package volume forecasting for the company.
Major League Baseball is expanding its fan base with highly personalized immersive experiences and analyzes its marketing using BigQuery.
The Home Depot uses BigQuery to manage customer service and keep 50,000 items routinely stocked across 2,000 stores.

Costs

When using BigQuery, you pay for storage (based on how much data you store). There are two storage rates: active storage ($0.020 per GB) or long-term storage ($0.010 per GB). With both, the first ten GB are free each month. You also pay for processing queries. Query costs are either on-demand (i.e., charged by the amount of data processed per query) or flat-rate. BigQuery also charges for certain other operations, such as streaming results and the use of its Storage API. Loading and exporting data is free. For details, see BigQuery pricing.

Using Bigtable, you pay for storage and bandwidth. Here’s all you need to know on Bigtable pricing across countries. If you’re ready to start using or testing either product for a current or upcoming project, you can create a Bigtable instance using Cloud Console’s project selector page or the Cloud Bigtable Admin API. BigQuery is accessible via the Google Cloud Console, the BigQuery REST API, or an external tool such as a Jupyter notebook or business intelligence platform.

Extra Credit:
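To make the OLTP/OLAP split above concrete, here is a minimal, hypothetical sketch using the Python client libraries: a single transactional row write to Bigtable and an analytical SQL query in BigQuery. The project, instance, table, and column-family names are placeholders, not references to any real deployment; only the BigQuery public dataset named in the query is real.

```python
# Minimal sketch of the OLTP vs. OLAP split described above.
# Placeholder project/instance/table names; assumes the "tx" column family exists.
from google.cloud import bigquery
from google.cloud import bigtable

# Bigtable: record one transaction (fast, key-based read/write workload).
bt_client = bigtable.Client(project="my-project")
table = bt_client.instance("my-instance").table("transactions")
row = table.direct_row(b"customer42#2024-01-01T12:00:00")
row.set_cell("tx", "amount_usd", "19.99")
row.commit()

# BigQuery: ask an analytical question over a large public dataset.
bq_client = bigquery.Client(project="my-project")
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for result in bq_client.query(query).result():
    print(result.name, result.total)
```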

Categories: Data Analytics, Databases

C2C Community Members Get in the ML Mindset

Machine Learning (ML) is a major solution business and technical leaders can use to drive innovation and meet operational challenges. For managers pursuing specific organizational goals, ML is not just a tool: it’s a mindset. C2C’s community members and partners are dynamic thinkers; choosing the right products for their major projects requires balancing concrete goals with the flexibility to ask questions and adapt. With these considerations in mind, C2C recently invited Google Cloud Customer Engineer KC Ayyagari to host a C2C Deep Dive on The ML Mindset for Managers.

Ayyagari started the session by asking attendees to switch on their cameras, and then ran a sentiment analysis of their faces in the Vision API. After giving some background on basic linguistic principles of ML, Ayyagari demonstrated an AI trained to play Atari Breakout via neural networks and deep reinforcement learning. To demonstrate how mapping applications can use ML to rank locations according to customer priority, Ayyagari asked the attendees for considerations they might take into account when deciding between multiple nearby coffee shops to visit.

As a lead-in to his talking points about the ML mindset for managers, Ayyagari asked attendees for reasons they would choose to invest in a hypothetical startup he founded versus one founded by Google’s Madison Jenkins. He used the responses as a segue into framing the ML mindset in the terms of the scientific method. Startup management should start with a research goal, he explained, and ML products and functions should be means of testing that hypothesis and generating insights to confirm it.

Before outlining a case study of using ML to predict weather patterns, Ayyagari asked attendees what kinds of data would be necessary to use ML to chart flight paths based on safe weather. Guest Jan Strzeiecki offered an anecdote about the flight planning modus operandi of different airports. Ayyagari provided a unique answer: analyzing cloud types based on those associated with dangerous weather events.

The theme of Ayyagari’s presentation was thinking actively about ML: in every segment, he brought attendees out of their comfort zones to get them to brainstorm, just like an ML engineer will prompt its machines to synthesize new data and learn new lessons. ML is a mindset for this simple reason: machines learn just like we do, so in order to use them to meet our goals, we have to think and learn along with them.

Are you a manager at an organization building or training new ML models? Do any of the best practices Ayyagari brought up resonate with you? Drop us a line and let us know!

Extra Credit:
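For readers curious what a face-sentiment exercise like the one described above could look like in code, here is a minimal, hypothetical sketch using the Cloud Vision API’s face detection, which returns likelihood scores for emotions such as joy and sorrow. The image filename is a placeholder, and this is not the exact demo Ayyagari ran during the session.

```python
# Minimal sketch: detect faces in a local image with the Cloud Vision API and
# print the emotion likelihoods the API returns for each face.
# "attendees.jpg" is a placeholder; this is not the session's actual demo.
from google.cloud import vision

def face_sentiment(path: str) -> None:
    client = vision.ImageAnnotatorClient()
    with open(path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.face_detection(image=image)
    for i, face in enumerate(response.face_annotations, start=1):
        print(
            f"Face {i}: joy={face.joy_likelihood.name}, "
            f"sorrow={face.sorrow_likelihood.name}, "
            f"anger={face.anger_likelihood.name}, "
            f"surprise={face.surprise_likelihood.name}"
        )

if __name__ == "__main__":
    face_sentiment("attendees.jpg")
```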

Categories: AI and Machine Learning, Data Analytics, API Management, Session Recording

Bringing More Insights to the Table with Cloud Bigtable

Cloud Bigtable powers major Google products like Search and Maps. You can use this incredibly scalable database for analyzing large workloads, such as your customers’ purchase histories and preferences, or currency exchange rates. Bigtable is cheap, scalable, fast, and reliable. This article outlines Bigtable’s attributes, uses, strengths, and weaknesses so you can evaluate whether it’s the right tool for you in any context.

What is Bigtable?

Bigtable is Google Cloud’s fully managed, NoSQL database for large analytical and operational workloads. This innovative database:

Supports high read/write throughput per second.
Processes these reads/writes at ultra-low latency, on the order of single-digit milliseconds.
Scales to billions of rows and thousands of columns, adapting itself to terabytes, if not petabytes, of data.

Bigtable pairs well with Cloud data processing and analytics products such as BigQuery, Dataflow, and Dataproc. You can use Cloud Bigtable in various ways, such as for storing marketing data, financial data, and Internet of Things data (e.g., usage reports from energy meters and home appliances). You can also use it for storing time-series data (e.g., CPU usage over time for multiple servers) and graph data (e.g., hospital patients’ dosage regimens over a period of years).

What does Bigtable bring to the table?

Bigtable is a dynamic product with many identifiable assets. The following are the three that most set it apart from the other products in its field:

Speed: The database processes reads and writes on the order of 10,000 rows per second.
Scalability: You can stretch the table by adding or removing nodes. Each node - the compute resource Bigtable uses to manage your data - gives you additional storage capacity.
Reliability: Bigtable gives you key-level performance, stability, and tools for debugging that usually take far longer on a self-hosted data store.

How does Bigtable work?

Cloud Bigtable is superbly simple. The following four functions will allow you to execute almost any project you’re using Bigtable to support (see the sketch at the end of this article for what this looks like in code):

Scale or descale the table by adding or removing nodes.
Replicate your data by adding clusters; replication starts automatically. Clusters describe where your data is stored and how many nodes are used for your data.
Group columns that relate to each other into “column families” for organizational purposes.
Incorporate timestamps by creating rows for each new event or measurement instead of adding cells in existing rows. (This makes Bigtable great for time-series analysis.)

Bigtable integrates well with big data tools such as Hadoop, Dataflow, Beam, and Apache HBase, making it a cinch for users to get started.

Case Histories

Some of the world’s most recognizable companies and institutions have used Bigtable for projects managing massive amounts of data. A small but representative sample of these projects follows below.

Dow Jones

Dow Jones, one of the world’s largest news organizations, used Bigtable to structure its Knowledge Graph. The tool compressed key global events from 1.3 billion documents over a 30-year period into Bigtable, for users to mine for insights. Users could also customize the Graph to suit their needs.
“With the help of Cloud Bigtable,” a spokesperson from Dow Jones partner Quantiphi said, “we can easily store a huge corpus of data that needs to be processed, and BigQuery allows data manipulations in split seconds, helping to curate the data very easily.”

Ravelin

Ravelin, a digital fraud detection and payment solution company for online retailers, uses Bigtable to effortlessly and seamlessly store and query over 1.2 billion transactions from the more than 230 million active users of its clients. Ravelin also benefits from Bigtable’s encrypted security mechanisms. According to Jono MacDougall, Principal Software Engineer at Ravelin: “We like Cloud Bigtable because it can quickly and securely ingest and process a high volume of data.”

AdTech

AdTech provider OpenX serves more than 30,000 brands, more than 1,200 websites, and more than 2,000 premium mobile apps. It also processes more than 150 billion ad requests per day and about 1 million such requests per second, so it needed a highly scalable, extremely fast, fully managed database to fit its needs. Bigtable provided the perfect solution.

How do I know if Bigtable is right for me?

As powerful as Bigtable is, it’s not a good choice for every situation. In certain contexts, you’ll want to keep other options in mind. For example:

Choose SQL-structured Spanner if you need ultra-strong consistency.
Use NoSQL Cloud Firestore if you want a flexible data model with strong consistency.
Opt for SQL-based BigQuery if you need an enterprise data warehouse that gives you insights into your massive amounts of business data.

Ready to set up Bigtable?

You can create a Bigtable instance using Cloud Console’s project selector page or the Cloud Bigtable Admin API. However, Bigtable isn’t free. Users pay by type of instance and number of nodes, how much storage a table uses, and how much bandwidth Bigtable uses overall. (Here’s all you need to know on Bigtable pricing across countries.)

Next time you’re looking to analyze large workloads, take a minute to check out Bigtable. It could help you crunch all that information in a matter of minutes. Have you ever used Bigtable? For what kinds of projects? How did it work for you? Start a conversation in one of our community groups and share your story!

Extra Credit
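To ground the column-family and time-series points from the “How does Bigtable work?” section, here is a minimal, hypothetical sketch using the google-cloud-bigtable Python client: it creates a table with one column family and writes one measurement per row, with the server and timestamp encoded in the row key. The project, instance, table, and family names are placeholders.

```python
# Minimal sketch: store CPU time-series samples in Bigtable, one row per
# server per timestamp, inside a "metrics" column family.
# Project, instance, table, and column-family names are placeholders.
import datetime

from google.cloud import bigtable
from google.cloud.bigtable import column_family

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("cpu_usage")

# Create the table with a single column family if it does not exist yet.
if not table.exists():
    table.create(column_families={"metrics": column_family.MaxVersionsGCRule(1)})

def write_sample(server_id: str, cpu_percent: float) -> None:
    """Write one measurement; the row key encodes server and timestamp."""
    now = datetime.datetime.now(datetime.timezone.utc)
    row_key = f"{server_id}#{now:%Y%m%d%H%M%S}".encode()
    row = table.direct_row(row_key)
    row.set_cell("metrics", "cpu_percent", str(cpu_percent))
    row.commit()

write_sample("server-001", 42.5)
```

Writing a new row per measurement, rather than appending cells to an existing row, is the pattern the article recommends for time-series analysis.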

Categories: Data Analytics, Google Cloud Product Updates, Databases

Ingest, Store, Query, and More: What BigQuery Can Do for You

If you’re a web developer, a software engineer, or anyone else working with small batches of data, you know how to use a spreadsheet. The problem arises when you have massive amounts of data that need to be stored, ingested, analyzed, and visualized rapidly. More often than not, the product you need to solve this problem is Google Cloud’s serverless, fully managed service, BigQuery. BigQuery deals with megabytes, terabytes, and petabytes of information, helping you store, ingest, stream, and analyze those massive troves of information in seconds.

Small stores can use Excel to classify, analyze, and visualize their data. But what if your organization is a busy multinational corporation with branches across cities and regions? You need a magical warehouse database you can use to store, sort, and analyze streams of incoming information. That’s where BigQuery comes in.

What is BigQuery?

BigQuery is Google Cloud’s enterprise cloud data warehouse built to process read-only data. It’s fully managed, which means you don’t need to set up or install anything, nor do you need a database administrator. All you need to do is import and analyze your data. To communicate with BigQuery, you need to know SQL (Structured Query Language), the standard language for relational databases, used for tasks such as updating, editing, or retrieving data from a database.

BigQuery in Action

BigQuery executes three primary actions (a minimal loading-and-querying sketch appears at the end of this article):

Ingestion: uploading data by ingesting it from Cloud Storage or by streaming it live from other Google Cloud services, such as Bigtable, Cloud SQL, and Google Drive, enabling real-time insights
Storage: storing data in a structured table, using SQL for easy query and data analysis
Querying: answering questions about data in BigQuery with SQL

Getting BigQuery up and running is fairly simple. Just follow these steps:

Find BigQuery on the left-side menu of the Google Cloud Platform Console, under “Resources.”
Choose one or more of these three options: load your own data into BigQuery to analyze (and convert that data batch into a common format such as CSV, Parquet, ORC, Avro, or JSON); use any of the free public datasets hosted by Google Cloud (e.g., the Coronavirus Data in the European Union Open Data Portal); or import your data from an external data source.

BigQuery ML

You can also use BigQuery for your machine learning models. You can train and execute your models on BigQuery data without needing to move the data around. To get started using BigQuery ML, see Getting started with BigQuery ML using the Cloud Console. Where can you find BigQuery (and BigQuery ML)? Both BigQuery and BigQuery ML are accessible via:

Google Cloud Console
The BigQuery command-line tool
The BigQuery REST API
An external tool such as a Jupyter notebook or a business intelligence platform

BigQuery Data Visualization

When the time comes to visualize your data, BigQuery can integrate with several business intelligence tools such as Looker, Tableau, and Data Studio to help you turn complex data into compelling stories.

BigQuery in Practice

Depending on your company’s needs, you will want to take advantage of different capabilities of BigQuery for different purposes. Use cases for BigQuery include the following:

Real-time fraud detection: BigQuery ingests and analyzes massive amounts of data in real time to identify or prevent unauthorized financial activity.
Real-time analytics: BigQuery is immensely useful for businesses or organizations that need to analyze their latest business data as they compile it.
Log analysis: BigQuery reviews, interprets, and helps you understand computer-generated log files.
Complex data pipeline processing: BigQuery manages and interprets the steps of one or multiple complex data pipelines generated by source systems or applications.

Best BigQuery Features

BigQuery has a lot to offer. Here are some of the tools BigQuery’s platform includes:

Real-time analytics that analyzes data on the spot.
Logical data warehouses wherein you can process data from external sources, either in BigQuery itself or in Google Drive.
Data transfer services you can use to import data from external sources, including Google Marketing Platform, Google Ads, YouTube, partner SaaS applications, Teradata, and Amazon S3.
Storage-compute separation, an option that allows you to choose the storage and processing solution that’s best for your project.
Automatic backup and easy restore, so you don’t lose your information. BigQuery also keeps a seven-day history of changes.

BigQuery Pros

It’s fast. BigQuery processes billions of data rows in seconds.
It’s easy to set up and simple to use; all you need to do is load your data. BigQuery also integrates easily with other data management solutions like Data Studio and Google Analytics.
As a data warehouse, BigQuery is built to handle huge amounts of data.
BigQuery gives you real-time feedback that could thwart potential business problems.
With BigQuery, you can avoid the data silo complications that arise when individual teams within your company have their own data marts.

BigQuery Cons

It falls short when used for constantly changing information.
It only works on Google Cloud.
It can become costly as data storage and query costs accumulate. PCMag suggests you go for flat-rate pricing to reduce costs.
You need to know SQL and its particular technical habits to use BigQuery.
BigQuery ML can only be used in the US, Asia, and Europe.

When should you use BigQuery?

BigQuery is best used ad hoc for massive amounts of data, in queries that run for longer than five seconds, when you want the results analyzed in real time. The more complex the query, the more you’ll benefit from BigQuery. At the same time, don’t expect the tool to be used as a regular relational database or for CRUD, i.e., to Create, Read, Update, and Delete data.

BigQuery Costs

Multiple costs come with using BigQuery. Here is a breakdown of what you will pay for when you use it:

Storage (based on how much data you store): There are two storage rates: active storage ($0.020 per GB) or long-term storage ($0.010 per GB). With both, the first ten GB are free each month.
Processing queries: Query costs are either on-demand (i.e., by the amount of data processed per query) or flat-rate.

BigQuery also charges for certain other operations, such as streaming results and use of the BigQuery Storage API. Loading and exporting data is free. For details, see Data ingestion pricing. This Coupler Guide to BigQuery Cost is also extremely helpful.

TL;DR: With BigQuery, you can assign read or write permissions to specific users, groups, or projects, collaborating across teams, and it is thoroughly secure, since it automatically encrypts data at rest and in transit. If you’re a data scientist or web developer running ML or data mining operations, BigQuery may be your best solution for those spiky, massive workloads. It is also useful for anyone handling bloated data batches, within reason. Just be wary of those costs.

Have you ever used BigQuery? How do you use it? Reach out and tell us about your experience!

Extra Credit:
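To ground the ingestion and querying steps described in this article, here is a minimal, hypothetical sketch using the google-cloud-bigquery Python client: it loads a CSV from Cloud Storage into a table with schema autodetection and then aggregates the result with standard SQL. The project, dataset, table, bucket, and column names (region, revenue) are placeholders, not a real deployment.

```python
# Minimal sketch: load a CSV from Cloud Storage into a BigQuery table with
# schema autodetection, then query it with standard SQL.
# Project, dataset, table, bucket, and column names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-project")
table_id = "my-project.analytics.daily_sales"

# Ingestion: load a CSV batch into a table.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    "gs://my-bucket/exports/daily_sales.csv", table_id, job_config=job_config
)
load_job.result()  # wait for the load job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows.")

# Querying: aggregate the freshly loaded data.
query = f"""
    SELECT region, SUM(revenue) AS total_revenue
    FROM `{table_id}`
    GROUP BY region
    ORDER BY total_revenue DESC
"""
for row in client.query(query).result():
    print(row.region, row.total_revenue)
```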

Categories: Data Analytics, Storage and Data Transfer, Google Cloud Product Updates

Get to Know the Google Cloud Data Engineer Certification

Personal development and professional development are among the hottest topics within our community. At C2C, we’re passionate about helping Google Cloud users grow in their careers. This article is part of a larger collection of Google Cloud certification path resources.

The Google Cloud Professional Data Engineer certification covers highly technical knowledge concerning how to build scalable, reliable data pipelines and applications. Anyone who intends to take this exam should also be comfortable selecting, monitoring, and troubleshooting machine learning models.

In 2021, the Professional Data Engineer rose to number one on the top-paying cloud certifications list, surpassing the Professional Cloud Architect, which had held that spot for the two years prior. According to the Dice 2020 Tech Job Report, data engineering is one of the fastest-growing IT professions, and even with an influx of people chasing the role, the supply can’t meet the demand. More than ever, businesses are driven to take advantage of advanced analytics; data engineers design and operationalize the infrastructure to make that possible.

Before you sit at a test facility for the real deal, we highly recommend that you practice with the example questions (provided by Google Cloud) with Google Cloud’s documentation handy. All the questions are scenario-based and incredibly nuanced, so lean into honing your reading comprehension skills and verifying your options using the documentation.

We’ve linked out to plenty of external resources for when you decide to commit and study, but let’s start just below with questions like:

What experience should I have before taking this exam?
What roles and job titles does the Google Cloud Professional Data Engineer certification best prepare me for?
Which topics do I need to brush up on before taking the exam?
Where can I find resources and study guides for the Google Cloud Professional Data Engineer certification?
Where can I connect with fellow community members to get my questions answered?

View image as a full-scale PDF here.

Looking for information about a different Google Cloud certification? Check out the directory in the Google Cloud Certifications Overview.

Extra Credit

Google Cloud’s certification page: Professional Data Engineer
Example questions
Exam guide
Coursera: Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certification
Pluralsight: Preparing for the Google Cloud Professional Data Engineer Exam
AwesomeGCP Associate Cloud Engineer Playlist
Global Knowledge IT Skills and Salary Report 2020
Global Knowledge 2021 Top-Paying IT Certifications

Have more questions? We’re sure you do! Career growth is a hot topic within our community and we have quite a few members who meet regularly in our C2C Connect: Certifications chat. Sign up below to stay in the loop.

https://community.c2cglobal.com/events/c2c-connect-google-cloud-certifications-72

Categories: Data Analytics, Careers in Cloud, Storage and Data Transfer, Google Cloud Certifications, Databases, Infographic

People Analytics With Chris Hood, Natalie Piucco, and Mary Kate Stimmler

That Digital Show’s Digital Master Class series continues this week with a breakdown of the “People Analytics” approach to employee data collection. Mary Kate Stimmler of the Google Cloud People Analytics team joins hosts Chris Hood and Natalie Piucco to describe how Google’s HR department uses People Analytics to collect dynamic and actionable data on employee experiences. What is People Analytics? According to Stimmler, it’s a combination of social science methodologies, statistics, data, and organizational theory that HR departments can use to inform decisions that transform company culture.

As Piucco points out early in the episode, collecting data on company employees is common practice for HR departments. Google’s People Analytics team solicits feedback about every aspect of the employee lifecycle, from the interview process to new hire onboarding to reflections upon departure. The main tool Google uses to collect all this data is Googlegeist, the company’s annual employee survey, a massive data collection instrument designed to gather insights on all of the above and more.

What has the People Analytics approach revealed about Google’s employee experience? The main driver of positive team sentiment, Stimmler says, is psychological safety. When managers give their teams space to experiment and make mistakes by offering non-punitive feedback, employees are more likely to keep and succeed in their jobs. Fostering psychological safety falls primarily to managers. As Stimmler puts it, “People don’t leave companies, people leave managers.” When managers practice active and visible leadership, these insights are put to productive use.

This year, the People Analytics team has been using Googlegeist to guide decisions about returning to work. The survey collected data on 25 different aspects of the employee experience, including ideal working scenarios for heads-down work, collaboration, and brainstorming. Stimmler was baffled to find that responses were “very neutral.” Workplace preferences, she learned, comprise a very grey area. Despite the diffuse nature of these responses, Googlegeist’s rate of response is very high, which Stimmler attributes to a stated commitment to acting on gathered data.

At the end of the show, Hood asked Stimmler for three main points listeners should take away from the conversation. The three she stressed were relationships, feedback, and data.

How does your organization take feedback from employees? Does your company share the same priorities? Do you think they should? Come to our follow-up event with Chris Hood on Thursday, November 4 to share your thoughts and questions. Sign up below!

Categories: Data Analytics

All About the New Google Analytics 4: Better Features, Reports, & Data Visualization

In addition to the core update that occurred between June and July earlier this summer, Google has also given its App + Web property a refresh with the rollout of Google Analytics 4. The Universal Analytics upgrade and expansion came with a lot of welcome improvements to reporting and user data visualization, but that's just the tip of the iceberg.
Because the web landscape is changing, Google introduced upgrades that track users across devices and added artificial intelligence (AI) to its insight reporting. The new reporting features also integrate better with Google Ads, so organizations can have better control and understanding of the ways online ads convert viewers into customers. Find out more about all of the Google Analytics 4 features and changes that have so many seasoned Google Analytics users excited to make the switch to the latest version.
What is Google Analytics 4?
Google Analytics 4 is an expansion of the Universal Analytics and App + Web property rebranding that took place last year. With new reporting capabilities, greater data visualization, and richer audience metrics, marketers and insights teams will be able to build more comprehensive user stories and integrated reports with all of the new and improved Google Analytics 4 features. The Analytics reporting tool has long been a part of Google's reporting platform, but the new version adds several features that reflect the growing landscape of web traffic. Users often browse the web on multiple devices, and Google Analytics 4 targets these users so that organizations can track their experience.
Google Analytics 4 Features
The underlying theme of GA4's recent update is better reporting: more control, greater flexibility, integrations with more marketing platforms, more comprehensive customer mapping, and so on. Google is simply giving marketers more options for integrating and interpreting their customers' data. Here are some of the new Google Analytics 4 features getting the most attention from marketers.
Exploring Google Analytics 4 Reporting & Data Analysis Features
Among the biggest takeaways from the Universal Analytics upgrade are more granular control of data and the ability to create more cohesive, integrated reports. In addition to greater data collection and retention, marketers now have greater control over data usage as well. The new Google Analytics 4 allows marketers to leverage certain data for ad optimization while also deciding which data remains a tool for reporting only. While this might not seem revolutionary in the world of Google, Google Analytics 4 reporting is markedly better than what was previously offered in the App + Web property because of its ability to integrate with Google Ads. What's more, the Universal Analytics upgrade offers marketers a peek into a future in which customer journey mapping is no longer reliant on cookies.
More Seamless Integration with Google Ads
In addition to greater reporting capabilities, Google Analytics 4 can now integrate with a wider range of Google's marketing platforms for more data-driven ad optimization. Marketing professionals track the success of online ads to determine whether the cost is worth the return on investment (ROI), and underperforming ads can be tweaked and analyzed to find the right ROI. Google Analytics 4 integrates with Google Ads across devices to better determine whether a user became a paying customer through a specific marketing campaign.
It also better tracks YouTube views and clicks, so organizations can link video with user engagement.
Machine Learning-Powered Insights
As part of the new Google Analytics 4 features, marketers can now also leverage AI and ML to set up custom alerts and notifications around conversions and user behavior trends. Called "Analytics Intelligence," this feature uses advanced data modeling techniques to better understand an organization's unique user experience and customer journey. AI is also integrated with support options, so marketing staff can ask questions and get meaningful answers about their reports. The new AI insights can also make meaningful predictions about purchase behavior, giving marketers valuable insight into future outcomes. Get information about potential anomalies and identify trends quickly so that you can build better marketing campaigns around user behavior patterns. Marketing teams can customize their insight reporting based on industry, trends, services, and common issues.
Better Audience Metrics & Lifecycle Reporting
Getting the full picture of a customer journey has always been a fragmented affair in Google Analytics. With the new analytics, however, reporting is framed along the customer lifecycle, giving marketers insight into the channels that drive the greatest customer acquisition and retention, plus engagement reports that capture user behavior once people enter a site. Instead of piecing together one user across devices and web properties, organizations get a better view of a customer's journey across devices, domains, and applications. Google warns that session counts might seem lower with the new Google Analytics 4, but this is because the platform does a better job of tracking users as they move from web to mobile and purchase services on multiple business domains.
How to Switch to GA4
If all of this information about Google Analytics 4 has you feeling compelled to make the switch, it's actually very simple. In the Admin section of your Analytics property, click the "Upgrade to GA4" link. After the upgrade completes, you must add streams to your reports. For example, if you have a web application that sells products, you add the domain and input information about the tracking method (e.g., tag manager or global site tag). During setup, you can also use "connected site tags" if you use Google Tag Manager across your web properties.
It's important to note, however, that GA4 is still missing some of the features of the old Google Analytics, and it might not make sense for every organization to switch if doing so could result in the loss of valuable reporting information. To get a sense of GA4's reporting and data collection capabilities, many experts recommend simply creating a Google Analytics 4 account and allowing it to collect data while leaving legacy reporting intact. That way, marketers and insights teams can wade through the insights after a few months of data collection and determine whether or not the new Google Analytics 4 reporting features are right for their data collection needs.

Categories: Data Analytics

CRM and Marketing Automation: The Path to Better Lead Management

Automating business processes comes with many benefits: it can free up bandwidth, digitize manual work, and promote better lead management. For digital marketers and creative teams, manually managing every new lead that finds your work online can be overwhelming and detract from the vital work of fueling creative processes and strategies for your clients. One way to unlock the benefits of automation is to use CRM and marketing automation systems to improve internal processes, freeing up time for creativity and client management. In addition, marketing automation best practices can help drive business growth, but it isn't always intuitive to know what is best for your team.
What Is CRM in Marketing?
CRM stands for customer relationship management, and this software can help businesses automate a variety of internal processes, from time management to gathering better data. In marketing, CRM systems are critical for tracking relationships, improving revenue, optimizing site elements for better engagement, and obtaining statistics that point to future marketing opportunities.
CRM Systems and Practices
Investing in the right CRM system can help businesses of any size make the most of data processing, time management, and lead nurturing. Sales teams, specifically, can use CRM to track leads through the marketing funnel and use this data to improve conversion rates and the speed with which qualified leads are contacted. Skillful integration of CRM systems and practices not only helps client relations teams generate and nurture more leads, it also helps marketers: with insight into client relations successes and failures through lead scoring, stage mapping, and lead progression, marketers can make better decisions around campaigns, communications, and more. One of the most beneficial uses of CRM systems and practices in digital marketing is setting up triggers to target and influence qualified leads. For example, it's common practice to use system triggers to build automated reminders for sales personnel to follow up with leads once they've reached a particular stage in their lead progression.
What Is Marketing Automation?
Marketing automation can vary across organizations, but in essence, marketing automation systems include any combination of CRM software used to automate marketing processes, such as user analytics gathering, email drip campaigns, newsletter automation, lead mapping, etc. For example, suppose you want to send a marketing email to new signups. The new signup information is added to the CRM database, and with the right CRM tools you can automatically send emails to the new signup on your own configured cadence (e.g., daily, monthly, weekly). CRM automation isn't just for new customers, either. You can use automation to keep in contact with existing customers, nourish the relationship, and keep them coming back to your site; for example, CRM automation could send discounts and newsletters to existing customers using a coupon code supplied in the email message.
Marketing Automation Integration
There are key considerations teams must weigh before an automation software integration of any scale. For example, poorly managed data can lead to poor outcomes that cost more than the potential revenue from each lead. To avoid potential issues, you can follow some best practices.
Marketing Automation Best Practices
Every team is different, but to employ the right solution for your lead-generation or data-gathering needs, you can follow a few marketing automation best practices to ensure your integration is set up with the correct triggers and ROI goals.
Setting Up Triggers
Triggers are the events that cause the CRM to take action. Your chosen tool should let you set up triggers so that you can perform marketing actions based on automated input. For example, once a lead becomes a customer (after buying products), that event can send a welcome email thanking the customer for their business.
Lead Stage Mapping
A lead goes through stages as it moves from initial contact to being a qualified customer; you can even categorize a new reader on your site as a lead in the anonymous phase. There are several phases for leads, but they generally include engaged, qualified lead, sales lead, won (e.g., the customer buys products), lost (e.g., the customer is not interested), recycled (e.g., sales attempts contact again), and disqualified (e.g., the contact information is false).
Lead Scoring and Qualification
Scoring leads helps determine the necessary marketing and sales efforts. A highly scored lead could mean that the contact is interested in company products and that sales should continue their attempts. A factor in scoring is lead qualification: a lead is highly qualified if its contact information is verified. Qualified leads have a higher chance of being turned into customers, so salespeople should prioritize them.
Running Targeted Campaigns
Targeted email campaigns increase the chances of a sale. Instead of sending a general email, use targeted emails based on the collected contact information. For example, if you have a lead interested in insurance, use automated CRM tools to send messages, discounts, and newsletters specific to the type of insurance the lead is interested in.
How Can Marketing Automation Improve CRM and Lead Nurturing?
Through marketing automation best practices and thoughtful integration with existing systems, the right CRM solution can significantly impact internal processes and future ROI goals. Nurturing leads is a big factor in marketing success, but qualifying leads, keeping data clean, and using triggers for communication are also valuable advantages as your business competes online. By automating emails, you can ensure that contact continues, whether the lead reaches the customer phase or goes stale. Even if you don't make the sale immediately, nurturing leads can turn a potential customer into a sale.
Extra Credit
https://www.agilecrm.com/blog/marketing-automation-best-practices/
https://www.act.com/blog/en/5-ways-marketing-automation-can-help-your-business-boost-revenues-and-grow/
https://keap.com/product/what-is-crm

Categories: Data Analytics, Industry Solutions

How to Auto-clean Your GCP Resources

A sub-dollar concierge for your Google Cloud environment
A cook has to clean their kitchen at some point, right? It's not just to keep the health inspector happy, but also to keep things running as smoothly and hygienically as possible. In the world of software engineering, this is no different: you'll want to make sure that when you start your day, your pots and pans are clean. In this tutorial, we'll craft a low-cost, cloud-native tool to keep your Google Cloud projects shiny. And what's more, after completing this, you'll be able to automate many more tasks using the same toolset! You can find a ready-to-go version of this setup, called ZUNA (aka Zap Unused and Non-permanent Assets), on GitHub.
Motivation
When using cloud services, you can create new assets in a breeze. Even when your project is fully terraformed, you may still encounter some dirty footprints in your environment. Maybe it was that one time you quickly had to verify something by creating a Cloud SQL instance, or those cleanup scripts that occasionally fail when the integration tests go crazy. Indeed, a system can fail at any step: What if the instance running the tests breaks down? What if an unexpected exception occurs? What if the network is down? Any such failure can lead to resources not being cleaned up. In the end, all these dangling resources will cost you: either in direct resource cost, or in the form of toil¹. I do recognize that resources not being cleaned up might be the last thing on your mind when a production setup fails. Nevertheless, it's still an essential aspect of maintaining a healthy environment, whether for development or production purposes. But don't let this keep you from building auto-healing production setups!
Deployment Overview
We will create a system responsible for the automatic cleanup of specific resources in a GCP project. We can translate this into the following task: check periodically for labeled resources, and remove them. Ideally, the system is quick to set up, flexible, and low-cost. By the end of this post, our setup will look like the overview diagram of the ZUNA setup; in this post, we'll focus on Pub/Sub subscriptions. We will use the following GCP services to achieve this:
- Cloud Scheduler: takes care of automation and provides that literal press-of-the-button for manually triggered cleanups.
- Cloud Functions: a serverless Python 3 script to find and delete the GCP resources we're interested in. You can easily extend such a script to include new resource types.
- Labels: many resources in GCP can be labeled; we'll use this to mark temporary resources that should be removed periodically.
- IAM: we'll make sure our system adheres to the least-privilege principle by using a Service Account with only the required permissions.
Using these services will quickly get you up and running while allowing more resource types to be added later on. Moreover, as you'll see later in this tutorial, this entire solution costs less than $1 per month.
Prerequisites
- A GCP project to test this code
- Permissions to use the services mentioned above
- Bash with the Google SDK (gcloud command) installed (you can also use Cloud Shell)
Building ZUNA
We'll chop this up into multiple steps:
1. create some resources and attach labels to them
2. make sure we have permissions to list and remove the resources
3. craft a script that detects and removes the resources
4. make the script executable in GCP
5. trigger the script periodically
6. optional cleanup
Step 1: Create Resources
First, we create a topic and a subscription so we have something to clean up. We'll attach the label autodelete: true, so our script can automatically detect which resources are up for removal:

#!/usr/bin/env bash
GCP_PROJECT_ID=$(gcloud config get-value project)
LOCATION=EU

# Create Pub/Sub topics
gcloud pubsub topics create \
  test-zuna-topic \
  --labels=department=engineering

# Create Pub/Sub subscription that should be removed
gcloud pubsub subscriptions create \
  test-zuna-subscription \
  --topic=test-zuna-topic \
  --labels=department=engineering,autodelete=true

# Create Pub/Sub subscription that should NOT be removed
gcloud pubsub subscriptions create \
  test-zuna-subscription-dontremove \
  --topic=test-zuna-topic \
  --labels=department=engineering

When you list the resources, you should see the labels:

$ gcloud pubsub subscriptions describe test-zuna-subscription
ackDeadlineSeconds: 10
expirationPolicy:
  ttl: 2678400s
labels:
  autodelete: 'true'
  department: engineering
messageRetentionDuration: 604800s
name: projects/jonnys-project-304716/subscriptions/test-zuna-subscription
pushConfig: {}
topic: projects/jonnys-project-304716/topics/test-zuna-topic

When you go to the Cloud Console, you should also see the label appear on your newly created Pub/Sub subscription. Alright, we now have a resource that is up for deletion! When working with real resources, you can either label them manually or let your resource provisioning script take care of this. Next up: making sure we have permissions to delete these resources.
Step 2: Get Permission
To facilitate development later on, it's best to work with a Service Account from the get-go. This account will be bound to your script when it executes and will provide it with the correct permissions to manage (in our case, delete) the resources.

#!/usr/bin/env bash
GCP_PROJECT_ID=$(gcloud config get-value project)

gcloud iam service-accounts create sa-zuna

gcloud iam service-accounts keys create sa-key.json \
  --iam-account=sa-zuna@${GCP_PROJECT_ID}.iam.gserviceaccount.com

These commands create a service account that lives in your project (identified by sa-zuna@<your-project-id>.iam.gserviceaccount.com) and then craft a public-private key pair, of which the private part is downloaded into the file sa-key.json. This file can now be used to authenticate your script, as we will see in the next section. First, let's make sure that we have the correct permissions to list and remove subscriptions.
Create the following file, called zuna-role-definition.yaml:

title: "ZUNA"
description: "Permissions for ZUNA."
stage: "ALPHA"
includedPermissions:
- pubsub.subscriptions.list
- pubsub.subscriptions.delete

Next, execute the following script:

#!/usr/bin/env bash
GCP_PROJECT_ID=$(gcloud config get-value project)

# Create a role specifically for ZUNA inside the project
gcloud iam roles create zuna --project=${GCP_PROJECT_ID} \
  --file=zuna-role-definition.yaml

# Bind the role to the ZUNA SA on a project level
gcloud projects add-iam-policy-binding ${GCP_PROJECT_ID} \
  --member="serviceAccount:sa-zuna@${GCP_PROJECT_ID}.iam.gserviceaccount.com" \
  --role="projects/${GCP_PROJECT_ID}/roles/zuna"

The script creates a new role, specifically for our application ZUNA, with the two permissions we need. The role definition lives inside our project (this is important when referencing the role). Next, the role is assigned to the service account on a project level. This means that the permissions apply to all the subscription resources that live inside our project.
Note: It's also possible to assign pre-defined roles to the service account. Still, we opt for a specific custom role, as the pre-defined ones would grant unnecessary permissions, e.g., consuming from a subscription. This way of working is in line with the principle of least privilege.
Step 3: Delete Resources in Python
It is time to remove our resource using a Python script! You can quickly set up a Python 3 virtual environment as follows:

#!/usr/bin/env bash
virtualenv -p python3 venv
source venv/bin/activate
pip install google-cloud-pubsub==2.4.0

Now you can create a Python file clean_subscriptions.py with the following contents:

import os

from google.cloud import pubsub_v1


def clean_pubsub_subscriptions(project_id, delete=False):
    # Subscriptions
    # see: https://github.com/GoogleCloudPlatform/python-docs-samples/blob/efe5e78451c59415a7dcaaf72db77b13085cfa51/pubsub/cloud-client/subscriber.py#L43
    client = pubsub_v1.SubscriberClient()
    project_path = f"projects/{project_id}"
    to_delete = []

    # Go over ALL subscriptions in the project
    for subscription in client.list_subscriptions(request={"project": project_path}):
        # Collect those with the correct label
        if subscription.labels['autodelete'] == 'true':
            print(f'Subscription {subscription.name} is up for removal')
            to_delete.append(subscription.name)
        else:
            print(f'Skipping subscription {subscription.name} (not tagged for removal)')

    # Remove subscriptions if needed
    if delete:
        for subscription_name in to_delete:
            print(f'Removing {subscription_name}...')
            client.delete_subscription(subscription=subscription_name)
        print(f'Removed {len(to_delete)} subscriptions')
    else:
        print(f'Skipping removal of {len(to_delete)} subscriptions')

    client.close()


if __name__ == "__main__":
    project_id = os.environ['GCP_PROJECT_ID']
    clean_pubsub_subscriptions(project_id, False)

Conceptually, the following happens:
- the script uses your project id to fetch all the subscriptions in the project
- it keeps only the subscriptions carrying the label autodelete: true
- it attempts to remove all these subscriptions
Note that the actual removal is still disabled for safety reasons.
You can enable it by setting the last line to clean_pubsub_subscriptions(project_id, True). You can run the script as follows:

#!/usr/bin/env bash
source venv/bin/activate
export GCP_PROJECT_ID=$(gcloud config get-value project)
export GOOGLE_APPLICATION_CREDENTIALS=sa-key.json
python clean_subscriptions.py

Because we make use of Google's Python client library, we can pass in our service account using the GOOGLE_APPLICATION_CREDENTIALS environment variable. The script will then automatically inherit the roles/permissions we assigned to the service account. The output of the script should resemble the following:

Skipping subscription projects/jonnys-project-304716/subscriptions/test-zuna-subscription-dontremove (not tagged for removal)
Subscription projects/jonnys-project-304716/subscriptions/test-zuna-subscription is up for removal
Skipping removal of 1 subscriptions

That's correct: only one of our two subscriptions is up for removal. Now let's move this to GCP!
Step 4: Wrap in a Cloud Function
We can easily wrap the previous section's script in a Cloud Function. A Cloud Function is a piece of code that can be triggered using an HTTP endpoint or a Pub/Sub message. We'll choose the latter, as Cloud Scheduler can directly post messages to Pub/Sub: an ideal combination!

import base64
import os

from clean_subscriptions import clean_pubsub_subscriptions


# see: https://cloud.google.com/functions/docs/tutorials/pubsub#functions-prepare-environment-python
def app_zuna(event, context):
    """Background Cloud Function to be triggered by Pub/Sub.
    Args:
        event (dict): The dictionary with data specific to this type of event.
                      The `data` field contains the PubsubMessage message.
                      The `attributes` field will contain custom attributes if there are any.
        context (google.cloud.functions.Context): The Cloud Functions event metadata.
                      The `event_id` field contains the Pub/Sub message ID.
                      The `timestamp` field contains the publish time.
    """
    print("""This Function was triggered by messageId {} published at {}
    """.format(context.event_id, context.timestamp))

    if 'data' in event:
        payload = base64.b64decode(event['data']).decode('utf-8')
    else:
        payload = 'N/A'
    print('ZUNA started with payload "{}"!'.format(payload))
    run_cleanup_steps()


def run_cleanup_steps():
    project_id = os.environ['GCP_PROJECT_ID']
    print("ZUNA project:", project_id)

    print("Cleaning Pub/Sub Subscriptions...")
    clean_pubsub_subscriptions(project_id=project_id, delete=True)
    print("Pub/Sub Subscriptions cleaned!")

    # TODO Clean-up Pub/Sub Topics
    # TODO Clean-up BigQuery Datasets


if __name__ == "__main__":
    run_cleanup_steps()

This code should be placed in main.py and is a simple wrapper for our function. You can test it locally by running python main.py. You'll notice from the output that our function from the previous step is executed; we also reserved some space for future resources (the # TODO lines). The additional function app_zuna will be the Cloud Function's entry point. Currently, it just prints the payload it receives from Pub/Sub and subsequently calls the cleanup function.
This makes it behave similarly to the local execution. Deploying can be done with the following script:

#!/usr/bin/env bash
GCP_PROJECT_ID=$(gcloud config get-value project)
source venv/bin/activate

# Deploy the function
gcloud functions deploy \
  app-zuna \
  --entry-point=app_zuna \
  --region=europe-west1 \
  --runtime python38 \
  --service-account=sa-zuna@${GCP_PROJECT_ID}.iam.gserviceaccount.com \
  --trigger-topic app-zuna-cloudscheduler \
  --set-env-vars GCP_PROJECT_ID=${GCP_PROJECT_ID} \
  --timeout=540s \
  --quiet

Note: You might want to change the region to something more suitable for your situation.
Several important highlights:
- we refer to the Python function app_zuna to make sure this function is called when the Cloud Function is hit
- the service account we created earlier is used to execute the Cloud Function; this means that when our code runs in the cloud, it inherits the permissions assigned to the service account!
- the trigger topic refers to a Pub/Sub topic that the Cloud Function will "listen" to; whenever a message appears on there, the function will process it; the topic is created automatically
- the environment variable for the project is included so both local and remote (cloud) versions can operate identically
- the timeout is set to 9 minutes, which is the maximum at the time of writing; we set it this high as removal might take some time in the current non-parallel setup
When you run this script, gcloud will package up your local resources and send them to the cloud. Note that you can exclude specific resources using the .gcloudignore file, which is created when you run the command the first time. When the upload completes, a Cloud Function instance is created that will run your code for every message that appears in the Pub/Sub topic.
Note: In case you get an error that resembles "Cloud Functions API has not been used in project ... before or it is disabled.", you still need to enable the Cloud Functions API in your project. This can easily be done with the following commands or using the Cloud Console (see this documentation):

gcloud services enable cloudfunctions.googleapis.com
gcloud services enable cloudbuild.googleapis.com

You can easily test the Cloud Function by sending a message to the newly created Pub/Sub topic:

#!/usr/bin/env bash
gcloud pubsub topics publish app-zuna-cloudscheduler \
  --message="Hello ZUNA!"

Or do it in the Cloud Console, using the "Publish Message" option directly on the topic.
You can view the logs using gcloud:

$ gcloud functions logs read app-zuna --region=europe-west1
D  app-zuna  3brcusf1xgve  2021-02-28 20:37:12.444  Function execution started
   app-zuna  3brcusf1xgve  2021-02-28 20:37:12.458  This Function was triggered by messageId 2041279672282325 published at 2021-02-28T20:37:10.501Z
   app-zuna  3brcusf1xgve  2021-02-28 20:37:12.458
   app-zuna  3brcusf1xgve  2021-02-28 20:37:12.458  ZUNA started with payload "Hello ZUNA"!
   app-zuna  3brcusf1xgve  2021-02-28 20:37:12.458  ZUNA project: jonnys-project-304716
   app-zuna  3brcusf1xgve  2021-02-28 20:37:12.458  Cleaning PubSub Subscriptions...
   app-zuna  3brcusf1xgve  2021-02-28 20:37:13.139  Skipping subscription projects/jonnys-project-304716/subscriptions/test-zuna-subscription-dontremove (not tagged for removal)
   app-zuna  3brcusf1xgve  2021-02-28 20:37:13.139  Subscription projects/jonnys-project-304716/subscriptions/test-zuna-subscription is up for removal
   app-zuna  3brcusf1xgve  2021-02-28 20:37:13.139  Skipping subscription projects/jonnys-project-304716/subscriptions/gcf-app-zuna-europe-west1-app-zuna-cloudscheduler (not tagged for removal)
   app-zuna  3brcusf1xgve  2021-02-28 20:37:13.139  Removing projects/jonnys-project-304716/subscriptions/test-zuna-subscription...
   app-zuna  3brcusf1xgve  2021-02-28 20:37:17.869  Removed 1 subscriptions
   app-zuna  3brcusf1xgve  2021-02-28 20:37:17.945  PubSub Subscriptions cleaned!
D  app-zuna  3brcusf1xgve  2021-02-28 20:37:17.946  Function execution took 5505 ms, finished with status: 'ok'
D  app-zuna  9cbqexeyaie8  2021-02-28 20:38:49.068  Function execution started
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.147  [2021-02-28 20:38:50 +0000] [1] [INFO] Starting gunicorn 20.0.4
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.148  [2021-02-28 20:38:50 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.148  [2021-02-28 20:38:50 +0000] [1] [INFO] Using worker: threads
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.202  [2021-02-28 20:38:50 +0000] [6] [INFO] Booting worker with pid: 6
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.232  This Function was triggered by messageId 2041279609298300 published at 2021-02-28T20:38:48.348Z
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.233
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.233  ZUNA started with payload "Test ZUNA using Cloud Console"!
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.233  ZUNA project: jonnys-project-304716
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.233  Cleaning PubSub Subscriptions...
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.851  Skipping subscription projects/jonnys-project-304716/subscriptions/test-zuna-subscription-dontremove (not tagged for removal)
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.851  Skipping subscription projects/jonnys-project-304716/subscriptions/gcf-app-zuna-europe-west1-app-zuna-cloudscheduler (not tagged for removal)
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.851  Removed 0 subscriptions
   app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.852  PubSub Subscriptions cleaned!
D  app-zuna  9cbqexeyaie8  2021-02-28 20:38:50.854  Function execution took 1787 ms, finished with status: 'ok'

You can also view these logs in the Cloud Console. Notice how our print statements appear in the output: the Pub/Sub message payload is logged, as well as the informational messages about which subscriptions have been deleted.
Step 5: Automate the Process
We now have a fully functioning cleanup system; the only missing piece is automation. For this, we employ Cloud Scheduler, which is a managed cron service, for those of you who are familiar with cron.

#!/usr/bin/env bash
gcloud scheduler jobs create pubsub \
  zuna-weekly \
  --time-zone="Europe/Brussels" \
  --schedule="0 22 * * 5" \
  --topic=app-zuna-cloudscheduler \
  --message-body "Go Zuna! (source: Cloud Scheduler job: zuna-weekly)"

In this script, we create a new scheduled "job" that will publish the specified message to the Pub/Sub topic that our Cloud Function is listening to. Note that the timezone is set specifically for my use case. Omitting this would make it default to Etc/UTC. Hence, you might want to change this to accommodate your needs.
The TZ database names listed on this page should be used. When creating the job, you might get a message that your project does not have an App Engine app yet. You should create one before continuing², but make sure you choose the correct region. The output of the Cloud Scheduler job creation should look like this:

name: projects/jonnys-project-304716/locations/europe-west1/jobs/zuna-weekly
pubsubTarget:
  data: R28gWnVuYSEgKHNvdXJjZTogQ2xvdWQgU2NoZWR1bGVyIGpvYjogenVuYS13ZWVrbHkp
  topicName: projects/jonnys-project-304716/topics/app-zuna-cloudscheduler
retryConfig:
  maxBackoffDuration: 3600s
  maxDoublings: 16
  maxRetryDuration: 0s
  minBackoffDuration: 5s
schedule: 0 22 * * 5
state: ENABLED
timeZone: Europe/Brussels
userUpdateTime: '2021-03-07T12:46:38Z'

Every Friday, this scheduler will trigger. But we also get the additional benefit of manual triggering. Option 1 is the following gcloud command:

#!/usr/bin/env bash
gcloud scheduler jobs run zuna-weekly

Option 2 is via the UI, where the Cloud Scheduler job comes with a nice RUN NOW button. Both options are great when you'd like to perform that occasional manual cleanup. After execution, you should see the new output in your Cloud Function's logs.
Warning: Triggering a cleanup during a test run (e.g., integration tests) of your system might fail your tests unexpectedly. Moreover, Friday evening might not make sense for your setup if it can break things. You don't want to get a weekly alert when you're enjoying your evening. Be careful!
Step 6: Cleanup
Well, when you're done testing this, you should clean up, right? The following script contains the gcloud commands to clean up the resources that were created above:

#!/usr/bin/env bash
GCP_PROJECT_ID=$(gcloud config get-value project)

# Remove virtual environment
rm -rf venv

# Remove Pub/Sub subscriptions
gcloud pubsub subscriptions delete test-zuna-subscription
gcloud pubsub subscriptions delete test-zuna-subscription-dontremove

# Remove Pub/Sub topics
gcloud pubsub topics delete test-zuna-topic

# Cloud Scheduler
gcloud scheduler jobs delete zuna-weekly --quiet

# Cloud Function
gcloud functions delete app-zuna --region=europe-west1 --quiet

# Roles
gcloud iam roles delete zuna --project=${GCP_PROJECT_ID}

# Service Account
gcloud iam service-accounts delete sa-zuna@${GCP_PROJECT_ID}.iam.gserviceaccount.com

Warning: Some of the resources, such as service accounts or custom roles, might cause issues when you re-create them soon after deletion, as their internal counterpart is not immediately deleted.
That's it. You're all done now! Go and enjoy the time you gained, or continue reading to find out how much this setup will cost you.
Pricing
At the beginning of this tutorial, we stated that we chose these specific services to help keep the costs low. Let's investigate the cost model to verify this is the case.
- Cloud Scheduler has three free jobs per month per billing account ($0.10 per additional job). [source]
- Cloud Functions has 2M free invocations per billing account ($0.40 per 1M additional invocations, plus resource cost). We would have around four function calls per month for our use case, which is negligible. Note that at the time of writing, the default memory of a Cloud Function is set to 256MB, which we can tune down to 128MB using the deployment flag --memory=128. This adjustment will make every invocation even cheaper. [source]
- The first 10 gigabytes for Cloud Pub/Sub are free ($40 per additional TiB).
  Even though each message is counted as a minimum of 1000 bytes, we are still in the ballpark of a few kilobytes per month. So again, a negligible cost. [source]
Hence, even for a setup where the free tiers no longer apply, we don't expect a cost higher than $0.20 per month.
Next Steps
Adding more resource types would definitely be useful. Check out the ZUNA repository for hooks to clean up Dataflow jobs, Pub/Sub topics, BigQuery datasets and tables, etc. You could also check when certain resources were created and build in an expiration time; this would significantly reduce the risk of interfering with test runs, and it's good to know that some services have expiration built in!³ A rough sketch of that idea follows below.
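As a rough illustration of that idea (not part of ZUNA itself), the cleanup script could honor a hypothetical expires-on label holding a date in YYYY-MM-DD format; the label name and date format here are assumptions, not an existing convention:

from datetime import datetime, timezone


def is_expired(labels):
    """Return True when a resource's hypothetical 'expires-on' label lies in the past."""
    expires_on = labels.get("expires-on", "")
    if not expires_on:
        return False  # no expiration label: never treat the resource as expired
    try:
        expiry = datetime.strptime(expires_on, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    except ValueError:
        return False  # malformed label value: play it safe and keep the resource
    return datetime.now(timezone.utc) > expiry


# Inside clean_pubsub_subscriptions, the selection condition could then become:
#     if subscription.labels['autodelete'] == 'true' and is_expired(subscription.labels):

This keeps freshly created test resources out of reach of a cleanup run that happens to coincide with an integration test.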
Terraforming this setup is also a good idea. In that case, it could automatically be part of your deployment pipeline.
Conclusion
We've set up a Cloud Function that scans your current project for resources labeled with autodelete: true and removes them. The Cloud Function only has limited permissions and is triggered periodically using Cloud Scheduler and a Pub/Sub topic. We succeeded in building an automatic system that we can also trigger manually. It's flexible, as we can easily add code to clean up other types of resources. It was quick to set up, especially since we kept to plain bash scripts (although terraforming it would be nice). The cost is low, as the service usage is minimal and all services use a pay-as-you-go model; using this setup will probably keep you in the free tier. Finally, since the components we used are generic, the resulting setup translates directly into a helpful blueprint for countless other automation use cases.
¹ Eliminating Toil — Site Reliability Engineering — https://sre.google/sre-book/eliminating-toil/
² It seems Cloud Scheduler is still tied to App Engine, hence the requirement to have an App Engine app.
³ Indeed, check out Pub/Sub Subscription expiration and BigQuery table expiration!
Originally published at https://connectingdots.xyz on March 28, 2021.
Categories: Data Analytics, DevOps and SRE, Serverless

Tips for Building Better Machine Learning Models

Photo by Ivan Aleksic on Unsplash
Hello, developers! If you have worked on building deep neural networks, you might know that building neural nets can involve a lot of experimentation. In this article, I will share some tips and guidelines that I feel are pretty useful and that you can use to build better deep learning models, making it a lot more efficient for you to stumble upon a good network. You may need to choose which of these tips are helpful in your scenario, but everything mentioned in this article could straight up improve your models' performance.
A High-Level Approach for Hyperparameter Tuning
One of the painful things about training deep neural networks is the many hyperparameters you have to deal with constantly. These could be your learning rate α, the discounting factor ρ, and epsilon ε if you are using the RMSprop optimizer (Hinton et al.), or the exponential decay rates β₁ and β₂ if you are using the Adam optimizer (Kingma et al.). You also need to choose the number of layers in the network or the number of hidden units for the layers; you might be using learning rate schedulers and want to configure those, and a lot more! We need ways to organize our hyperparameter tuning process better.
A common algorithm I use to organize my hyperparameter search is random search. Though there exist improvements to this algorithm, I typically end up using plain random search. Let's say, for this example, you want to tune two hyperparameters, and you suspect that the optimal values for both would be somewhere between one and five. The idea, based on the lecture notes of Andrew Ng, is that instead of systematically picking 25 value pairs to try out [like (1, 1), (1, 2), etc.], it is more effective to select 25 points at random. Here is a simple example with TensorFlow that uses random search on the Fashion-MNIST dataset for the learning rate and the number of units.
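The snippet originally embedded here is not reproduced in this version of the article, so what follows is a minimal sketch of the same idea under a few assumptions: a small dense Keras classifier, a handful of trials, and an illustrative search space. The build_model helper is mine, not from the original post.

import random

import tensorflow as tf

# Load Fashion-MNIST and scale pixel values to [0, 1].
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.fashion_mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0


def build_model(units, learning_rate):
    # A deliberately small classifier; the point is the search loop, not the model.
    model = tf.keras.Sequential([
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(units, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Randomly sample hyperparameter combinations instead of scanning a grid.
best = None
for trial in range(10):
    units = random.choice([32, 64, 128, 256, 512])
    learning_rate = 10 ** random.uniform(-4, -2)  # sample the learning rate on a log scale
    model = build_model(units, learning_rate)
    model.fit(x_train, y_train, epochs=3, validation_split=0.1, verbose=0)
    _, test_acc = model.evaluate(x_test, y_test, verbose=0)
    print(f"trial={trial} units={units} lr={learning_rate:.5f} acc={test_acc:.4f}")
    if best is None or test_acc > best[0]:
        best = (test_acc, units, learning_rate)

print(f"Best: accuracy={best[0]:.4f} with units={best[1]}, learning_rate={best[2]:.5f}")

Sampling on a log scale for the learning rate mirrors the random-search intuition above: values spread across orders of magnitude are explored instead of a rigid grid.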
I won't go into the intuition behind random search in this article, but you can read about it in an earlier article I wrote.
Use Mixed Precision Training for Large Networks
Growing the size of a neural network usually results in improved accuracy, but as model sizes grow, the memory and compute requirements for training also increase. With mixed-precision training, introduced by Paulius Micikevicius and colleagues, the idea is to train deep neural networks using half-precision floating-point numbers, so that large networks train faster with no or negligible decrease in performance. I would like to point out, however, that this technique should be used only for large models with more than 100 million parameters.
While mixed precision will run on most hardware, it will only speed up models on recent NVIDIA GPUs (for example, the Tesla V100 and Tesla T4) and Cloud TPUs. To give you an idea of the performance gains: when I trained a ResNet model on my Google Cloud Platform Notebook instance (with a Tesla V100), I saw an almost 3x speedup in training time, and almost 1.5x on a Cloud TPU instance, with close to no difference in accuracy. To further increase your training throughput, you could also consider using a larger batch size (since we are using float16 tensors, you should not run out of memory). It is also rather easy to implement mixed precision with TensorFlow; the code to measure the above speed-ups was taken from this example. If you are looking for more inspiration to use mixed-precision training, Google Cloud has published a chart demonstrating the speedups for multiple models on a Cloud TPU.
Use Grad Check for Backpropagation
In multiple scenarios, I have had to implement a neural network from scratch, and implementing backpropagation is usually the part most prone to mistakes; it is also difficult to debug. With an incorrect backpropagation, your model may even learn something that looks reasonable, which makes the bug harder to spot. So, how cool would it be if we could implement something that allows us to debug our neural nets easily?
I often use gradient checks when implementing backpropagation to help me debug it. The idea is to approximate the gradients using a numerical approach; if these approximations are close to the gradients calculated by the backpropagation algorithm, you can be more confident that the backpropagation was implemented correctly. The standard way to do this is the two-sided difference: for each parameter θᵢ, compute
dθ[approx]ᵢ = (J(θ₁, …, θᵢ + ε, …) − J(θ₁, …, θᵢ − ε, …)) / (2ε)
for a small ε, which gives a vector we will call dθ[approx]. In case you are looking for the intuition behind this, you can find more about it in another article of mine. So, now we have two vectors, dθ[approx] and dθ (calculated by backprop), and these should be almost equal to each other. You can simply compute the Euclidean distance between these two vectors and compare it against a reference table of thresholds to help you debug your nets.
Cache Datasets
Caching datasets is a simple idea, but one I have not seen used much. The idea is to go over the dataset in its entirety and cache it either in a file or in memory (if it is a small dataset). This saves expensive operations like file opening and data reading from being executed during every single epoch. It also means that your first epoch will take comparatively more time, since that is when the files are opened, the data is read, and the cache is built; the subsequent epochs should be a lot faster because they use the cached data. This is a particularly simple idea to implement. Here is an example with TensorFlow showing how easily one can cache datasets, and the speedup that comes with it.
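The original embedded example is likewise not reproduced here; below is a minimal, self-contained sketch of the idea, using a synthetic tf.data pipeline in which expensive_parse merely simulates slow per-example work. The dataset size and the 1 ms delay are illustrative.

import time

import tensorflow as tf


def expensive_parse(x):
    # Stand-in for costly per-example work (file opening, decoding, ...).
    def slow_identity(v):
        time.sleep(0.001)  # simulate 1 ms of I/O or parsing per example
        return v
    return tf.py_function(slow_identity, [x], tf.int64)


base = tf.data.Dataset.range(2000).map(expensive_parse)
cached = base.cache()  # cache in memory; pass a filename to cache on disk instead


def time_epochs(dataset, label, epochs=3):
    for epoch in range(epochs):
        start = time.time()
        for _ in dataset:  # iterate the full dataset, as a training loop would
            pass
        print(f"{label} - epoch {epoch + 1}: {time.time() - start:.2f}s")


time_epochs(base, "no cache")      # every epoch pays the parsing cost
time_epochs(cached, "with cache")  # only the first epoch pays it

With the cached pipeline, the first epoch fills the cache and later epochs skip the simulated parsing entirely, which is exactly the speedup pattern described above.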
Common Approaches to Tackle Overfitting
If you have worked on building neural networks, overfitting or underfitting is arguably one of the most common problems you will face. This section talks about some common approaches I use when tackling these problems. You probably know this, but high bias will cause the model to miss relations between features and labels (underfitting), while high variance will cause it to capture noise and overfit the training data.
I believe the most promising way to address overfitting is to get more data, though you can also augment the data you have; a benefit of deep neural networks is that their performance continues to improve as they are fed larger and larger datasets. However, in a lot of situations it might be too expensive, or simply infeasible, to get more data, so let's talk about a couple of methods you can use to tackle overfitting.
Broadly, this can be done in two ways: by changing the architecture of the network, or by applying modifications to the network's weights. A simple way to change the architecture so that it doesn't overfit is to use random search to stumble upon a good architecture, or to try pruning nodes from your model. We already talked about random search, and in case you want to see an example of pruning, you can take a look at the TensorFlow Model Optimization Pruning Guide.
Some common regularization methods I tend to try out are listed below; a short sketch combining them follows the list.
- Dropout: randomly remove x% of the inputs to a layer.
- L2 regularization: force weights to be small, reducing the possibility of overfitting.
- Early stopping: stop training when performance on the validation set starts to degrade.
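As a rough illustration (not taken from the original article), here is a minimal Keras sketch that combines the three techniques; the layer sizes, dropout rate, L2 penalty, and patience are illustrative defaults, not tuned values.

import tensorflow as tf

# A small classifier combining the three techniques above: an L2 weight penalty,
# a dropout layer, and early stopping on the validation loss.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(
        128, activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-4)),  # keep weights small
    tf.keras.layers.Dropout(0.3),  # randomly drop 30% of the activations during training
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",          # watch validation performance
    patience=3,                  # stop after 3 epochs without improvement
    restore_best_weights=True)   # roll back to the best epoch seen

(x_train, y_train), _ = tf.keras.datasets.fashion_mnist.load_data()
x_train = x_train / 255.0

model.fit(x_train, y_train,
          validation_split=0.1,
          epochs=50,
          callbacks=[early_stopping])

Early stopping typically ends the run well before the 50-epoch budget, which is the point: the validation set decides when further training only adds overfitting.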
Thank You
Thank you for sticking with me till the end. I hope you benefit from this article and incorporate these tips into your own experiments; I am excited to see if they help improve the performance of your neural nets, too. If you have any feedback or suggestions for me, please feel free to send them over! @rishit_dagli
Categories: AI and Machine Learning, Data Analytics, DevOps and SRE