Learn | C2C Community

Key Takeaways from 2Gather Los Angeles: The Future is Now, Security and AI

Lytics, Wpromote, Google Cloud. 2Gather Los Angeles, June 6th, 2023. Buzz Hays, Global Lead for Entertainment Industry Solutions, and Iman Ghanizada (@iman), Global Head of Autonomic Security at Google Cloud, opened the event by discussing how the purpose of AI is to improve what people are already doing: whether for writers or animators in a given industry, AI aims to enhance the artist's paintbrush. In trying to provide businesses with better tools, many questions surrounding security and data arose. One major question concerned how to collect data effective enough to support AI projects. The entertainment industry served as a primary example during the event: many AI applications in this industry require a sufficient amount of customer data to be developed. Identifying ad breaks and suggesting content on streaming platforms are two such use cases. Jascha Kaykas-Wolff of Lytics noted that mature organizations can adapt to data pipelines, and that working across departments makes decision-making much easier because it demonstrates how data is useful to specific parts of the organization. Paul Dumois, the CTO of Wpromote, added that businesses need to focus on specific problems to solve and retrieve the data that will help provide solutions to those issues. Overall, the discussions between the panel and the audience highlighted that AI has many moving parts and trends. An organization should focus on a specific area and start with a single project to learn about the challenges and dynamics of working with AI in real time. Additionally, analyzing the core metrics of a business and securing top-down support can help marshal resources when setting up projects or tasks associated with AI.

Categories: AI and Machine Learning, Data Analytics, Identity and Security, Databases

2Gather Core Concepts: Sunnyvale Developer's Event

C2C's first event for developers took place on April 26th, 2023 in Sunnyvale, CA. The event focused on data analytics and how organizations can optimize their data. Below are some data buzzwords and their definitions, an overview of Dataplex, a product that was demonstrated at the event, and a summary of the key topics discussed.

Data warehouse: A system used for reporting and data analysis. A data warehouse is a large store of data accumulated from a range of sources that helps businesses with decision-making.
Data lake: A centralized repository designed to store and process large amounts of data. A data lake can store data in its original form and process it in any variety. Data lakes are scalable platforms that allow organizations to ingest data from any source at multiple speeds.
Data lakehouse: A modern data platform that combines a data warehouse and a data lake.
BigQuery: A serverless data warehouse that works across clouds while scaling with your data. BigQuery lets users pick the right feature set for workload demands and can match those needs in real time. It can also analyze data across multiple clouds and securely exchange data sets internally or across businesses, making it a platform for scalable analytics.
BigLake: A storage engine that unifies data warehouses and lakes, with access through BigQuery.

Dataplex
Dataplex is a lake administration and data governance tool. It enables organizations to discover, manage, and evaluate their data across data lakes and data warehouses. Dataplex also offers a variety of features that let organizations choose specific items to manage data easily. For example, the tag management feature ensures that specific users have access to the right data by pairing policy templates and tags with different sets of data. Dataplex also has automated data quality management features.
For example, if a report quotes incorrect numbers, the data can be corrected with automated tools rather than manually.

Data and Real-Time Analytics
A major point raised at the developers' event was that data is rooted in event-driven architecture. For instance, customers who work in finance become highly interested in real-time data during specific periods; this interest is event-based, usually peaking as the industry reaches a quarter close. Moving data around can be a difficult task; however, certain cloud features, such as Dataplex, can solve this issue. The main concern surrounding organizing data is access control and governance: customers want to know that steps have been taken to ensure that unauthorized users do not gain access to private data. Visibility and transparency are also core tenets when discussing access to data and its governance tools.
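The automated data-quality checks described above can be approximated in a few lines. The sketch below is a hypothetical, minimal rule engine, not the Dataplex API: each rule flags rows that fail a validation, so bad numbers can be caught before they reach a report.

```python
# Minimal data-quality rule check: a hypothetical sketch of the idea
# behind automated quality management, not the Dataplex API.

def run_checks(rows, rules):
    """Return a list of (row_index, rule_name) pairs for every failure."""
    failures = []
    for i, row in enumerate(rows):
        for name, check in rules.items():
            if not check(row):
                failures.append((i, name))
    return failures

# Example rules: revenue figures must be present and non-negative.
rules = {
    "revenue_present": lambda r: r.get("revenue") is not None,
    "revenue_non_negative": lambda r: (r.get("revenue") or 0) >= 0,
}

rows = [
    {"region": "EMEA", "revenue": 1200.0},
    {"region": "APAC", "revenue": -50.0},   # bad figure: would skew a report
    {"region": "AMER", "revenue": None},    # missing figure
]

print(run_checks(rows, rules))  # flags rows 1 and 2
```

In a real pipeline, the flagged rows would be quarantined or corrected automatically instead of printed.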

Categories: Data Analytics, Storage and Data Transfer, Google Cloud Product Updates, Databases

2Gather Sunnyvale: Strategies Surrounding Data Optimization in Cloud Technology

An engaged audience eagerly listens as Sanjay Chaudhary, Vice President of Product Management at Exabeam, explains how hackers use MFA bombing to compromise employee emails and gain confidential company information. This was one of many topics surrounding data optimization discussed at the 2Gather event in Sunnyvale, California on February 3rd. "Not coming from a technical background, I wasn't sure what to expect at my first event. However, the panel's rich and engaging narrative made data security into an amazing story to listen to!" said June Lee, Senior Program Manager at Workspot. The first C2C event of the year embodied the essence of forming meaningful connections. At the beginning of the event, all attendees were asked to introduce themselves to two other individuals they had not yet spoken to, creating a strong sense of openness and encouraging guests to go beyond their comfort zones to spark personable interactions. Through peer-to-peer conversation, guests connected on driving advocacy and feedback around using Google Cloud for data analytics. The event featured a diverse panel of Google partners including NetApp, Exabeam, and Lytics, as well as Cisco Systems. "Everything starts with a customer," stated Bruno Aziza (@BrunoAziza), the Head of Data and Analytics at Google. This approach is the driving force behind Google building close relationships with its customers and understanding their journeys and the challenges that can arise, one of these being extracting value from collected data. "A large number of organizations are struggling to turn data into value; money is being spent on data systems, yet companies are not always benefiting from it," says Bruno. Organizations now have access to large sets of data; however, critical pieces of data are not typically within their internal environment. A step in the right direction is to create data products that help tackle this issue.
One of the major keynote speakers, Vishnudas Cheruvally, Cloud Solution Architect at NetApp, provided insight into solutions the organization is working on. "One of the main goals of NetApp is to build an environment that is rooted in trust and to create an infrastructure where users do not have to worry about basic tasks associated with optimizing data," says Vishnudas. Through billing APIs and resizing data volumes with Google Cloud services, customers have accessible tools that allow them to make informed decisions, including creating a customized dashboard to observe what is happening within their environment. Along with data optimization, emerging global trends and their impact on data sovereignty were also a recurring topic that captivated the audience. "Data sovereignty and upcoming global trends within data security were key topics discussed at the event and are also motivating factors of solutions developed by NetApp," stated Vishnudas. "An emerging trend is using excessive resources through multiple clouds and essentially creating a wasteland," says Jascha Kaykas-Wolff (@kaykas), President of Lytics. This comment sparked a conversation about global trends, data sovereignty, and cloud strategy. With high amounts of data being stored by organizations, questions begin to arise regarding ownership. "Data has to live in a specific area and there has to be control or sovereignty over it," says Jascha. The panel then discussed shifting global trends and how they impact customers. Sanjay Chaudhary brought in a product management perspective, which is rooted in solving customer problems. "With more regulations being created, data cataloging is essential in order for customers to understand what is critical in terms of their data and security threats.
The core principle of data is the same; the most important thing is being able to detect a problem with the data and how fast it can be addressed," says Sanjay. From ownership to data security, the discussion highlighted a variety of fresh perspectives. What stood out among guests was the diversity of a panel that brought in differing views. "The event had extremely thought-provoking insights stemming from the issues of modern-day data analytics and how it impacts a customer base, as well as a panel that discussed their personal experiences with data," said Dylan Steeg (@Dylan_Steeg), VP of Business Development at Aible. Both speakers and guests then attended a networking session following the event. Over refreshments and drinks, guests were able to mingle with one another and further expand the conversation. Most importantly, they were able to create meaningful connections: connections that may lead to future collaborative efforts as well as solutions that can take data optimization to new heights. You and your organization can also build these connections. To start, join C2C as a member today. We'll see you at our next 2Gather event! Extra Credit:

Categories: Data Analytics, Cloud Operations, Storage and Data Transfer, Databases, Google Cloud Partners

Building for Scalability, Block by Block: An Interview with Carrefour Links CTO Mehdi Labassi

When I ask Mehdi Labassi (@Mehdi_Labassi), CTO of Carrefour Links, what he does outside of work, the first thing he mentions is his family. Mehdi spends a lot of his free time playing with his kids. Sometimes they play video games on Nintendo Switch, but they also enjoy hands-on activities like building with Legos. Lego is a popular interest among tech practitioners building products on Google Cloud; after all, the four letters in "Lego" can also be used to spell "Google." This connection turns out to be a fitting point of departure for an examination of Mehdi's journey to a decision-making role on the technical team at Carrefour Links. Mehdi began his career as a software engineer, working first in air travel and then moving on to Orange, "the one major telco in France." At Orange, Mehdi led the company's Google Cloud skills center and took part in a major migration to Google Cloud from a historically on-premises infrastructure. "We had a really strong on-prem culture, so we had our own data centers, our own Hadoop clusters with thousands of machines, and the shift to cloud-based services was not something natural," he says. "There was a lot of resistance, and we needed to really show that this gives us something." Proving the value of the cloud to a historically on-prem organization required zeroing in on a specific technical limitation of the existing infrastructure: "As I was driving the big data platforms and the recommendations, I do remember we had a lot of issues in terms of scalability." Google Cloud turned out to be the perfect solution to this problem. "Then we tried the Cloud, and we found that instant scalability," says Mehdi.
"That's another level compared to what we had on prem, so this is really the proof by experimentation." When Carrefour introduced Carrefour Links, its cloud-hosted retail media and performance platform, in spring of 2021, Mehdi was immediately interested in getting involved. He reached out directly to the executive team and joined as CTO three months after the company announced the platform. "I joined when the thing just got in production, the first version, the V1. That was kind of a proof of concept," Mehdi says. In the time since, only a little over a year, the venture has grown considerably: "We have a lot more data from different verticals, everything that's related to transactions, to the supply chain ecosystem, to finance, a lot more insights, and we are exploring machine learning, AI use cases… so we are scaling even in terms of use cases." Even a fast-growing platform run on Google Cloud, however, will encounter challenges as it continues to scale. "The first thing is the ability to scale while keeping FinOps under control," Mehdi says. As he sees it, this is a matter of "internal optimization," something he believes Carrefour Links handles particularly well. "The second thing is how to provide what I call a premium data experience for our customers, because we are dealing with petabyte-scale pipelines on a daily basis, and however the end user connects to our data solutions, we want him to have instantaneous insights," he adds. "We leverage some assets and technologies that are provided by Google Cloud to do this." These are challenges any technical professional managing products or resources on the cloud is likely to face. Overcoming them is also what makes new solutions on the cloud possible. What competencies do IT professionals need to overcome these challenges and pursue these solutions?
According to Mehdi, “a good engineer working on the cloud, with this plethora of tools, he needs to be good at Lego.” Mindstorms, Lego’s line of programmable robots, he explains, require a lot of the same skills to build as machines and systems hosted on the cloud. “You assemble and program the thing, and then you need to understand how each brick works,” he says. “I really find a lot of similarities between these activities and what we are doing in our day job.” Extra Credit:  

Categories: Infrastructure, Industry Solutions, Databases, Retail

Cloud Technologies: Boon for Sustainable Future (a Fireside Chat with SpringML)

The effort to combat climate change is such a major undertaking that no metaphor does it justice. It will take more than "all hands on deck." We need to be more than "on board." Every one of us has a crucial role to play. That's why the data we have must be available to the entire public, not just governments and corporations. In October 2021, Google Cloud established partnerships with five companies engaged in environmental data collection efforts: CARTO, Climate Engine, Geotab, Egis, and Planet Labs. These companies are working with Google to make their datasets available globally on Google Cloud. As a 2020 Google Cloud Partner of the Year and a company with a stated commitment to sustainability, C2C foundational partner SpringML is excited to raise awareness of this initiative. In this fireside chat, Lizna Bandeali and SpringML's Director of Google Cloud Services Masaf Dawood explore the background and implications of this recent effort. Key points discussed include the ease, transparency, and accessibility of data, and a focus on actionable insights. With these datasets and Google Cloud Platform tools like BigQuery, organizations and individuals working in environmental science, agriculture, food production, and related fields can make informed predictions about everything from weather patterns to soil quality, and can use those predictions to plan future resource use around vital sustainability guidelines. Watch the full video below. Are you an individual or a decision-maker at an organization pursuing sustainability? What are you doing to take up this effort? Contact us on our platform and tell us your story!

Categories: Data Analytics, Databases, Sustainability

To Collate or To Analyze: Cloud Bigtable vs. BigQuery

The Google Cloud Platform hosts all kinds of tools for data storage and management, but two of the most versatile and popular are Bigtable and BigQuery. While each service is a database, the key difference between the two lies in their names. Bigtable (BT) is literally a "big table" that scales to terabytes if not petabytes for storing and collecting your data. BigQuery (BQ), on the other hand, conducts a "big query" into your massive troves of data. Each database has other unique attributes that define when and how to use it. These topics, along with use cases, case studies, and costs associated with each product, are covered in the following sections.

Bigtable
Bigtable, Google Cloud's fully managed database for hefty analytical and operational workloads, powers major Google products like Google Search, Google Maps, and Gmail. The database supports high read/write throughput, processes reads and writes at ultra-low latency, and scales to billions of rows and thousands of columns for massive troves of data. Bigtable pairs well with Google Cloud data processing products such as BigQuery, Dataflow, and Dataproc, and integrates with big data tools such as Hadoop, Beam, and Apache HBase.

Bigtable Use Cases
Bigtable is best used for instances with lots of data, such as the following:

Time series data, e.g., CPU usage over time for multiple servers.
Financial data, e.g., currency exchange rates.
Marketing data, like customers' purchase histories and preferences.
Internet of Things data, such as usage reports from home appliances.
Fraud detection, i.e., detecting fraud in real time on ongoing transactions.
Product recommendation engines that handle thousands of personalized recommendations.

BigQuery
BigQuery is Google Cloud's serverless, fully managed service that helps you ingest, stream, and analyze massive troves of information in seconds.
In contrast to Bigtable, BigQuery is a query engine that helps you import and then analyze your data. Since BigQuery uses SQL (Structured Query Language), it is comparable to Amazon Redshift, which also uses SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes.

BigQuery Use Cases
BigQuery is commonly used for instances that include:

Real-time fraud detection; BQ ingests and analyzes massive amounts of data in real time to identify or prevent unauthorized financial activity.
Real-time analytics; BQ is immensely useful for businesses or organizations that need to analyze their latest business data.
Log analysis; BQ reviews, interprets, and helps you understand computer-generated log files.
Complex data pipeline processing; BQ manages and interprets the steps of one or more complex data pipelines generated by source systems or applications.

Similarities Between Bigtable and BigQuery
Each database boasts low latency (for Bigtable, on the order of single-digit milliseconds), high performance and speed on the order of 10,000 rows per second, and powerful scalability that enables you to scale (or descale) for additional storage capacity. Both are end-to-end managed and thoroughly secure, encrypting data at rest and in transit.

Differences Between Bigtable and BigQuery
While Bigtable collates and manages your data, BigQuery analyzes those troves of data. Bigtable resembles an Online Transaction Processing (OLTP) tool, where you can execute a number of transactions occurring concurrently, such as online banking, shopping, order entries, or text messages. BigQuery, in contrast, is ideal for Online Analytical Processing (OLAP): creating analytical business reports or dashboards, or, in short, anything related to business analysis, such as scrolling through last year's logs to see how to improve the business.
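The OLTP/OLAP split above can be made concrete with a toy example in plain Python (standing in for both systems, not either product's API): an OLTP-style operation touches one record at a time, while an OLAP-style query scans the whole dataset to aggregate it.

```python
# Toy illustration of OLTP vs. OLAP access patterns (plain Python,
# not the Bigtable or BigQuery API).

orders = [
    {"id": 1, "customer": "ana", "total": 30.0},
    {"id": 2, "customer": "ben", "total": 45.0},
    {"id": 3, "customer": "ana", "total": 25.0},
]

# OLTP-style: look up and update a single row (e.g., an order entry system).
def update_order_total(order_id, new_total):
    for order in orders:
        if order["id"] == order_id:
            order["total"] = new_total
            return order
    raise KeyError(order_id)

# OLAP-style: scan everything to build an analytical summary
# (e.g., revenue per customer for a dashboard).
def revenue_by_customer():
    summary = {}
    for order in orders:
        summary[order["customer"]] = summary.get(order["customer"], 0) + order["total"]
    return summary

update_order_total(2, 50.0)   # a Bigtable-shaped workload: one keyed row
print(revenue_by_customer())  # a BigQuery-shaped workload: a full scan
```

At scale, the first pattern rewards a keyed store like Bigtable; the second rewards a columnar scan engine like BigQuery.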
While Bigtable is NoSQL, giving it a flexible data model, BigQuery uses SQL, making it ideal for performing complex analytical queries over heavy-duty workloads. Don't expect to use BigQuery as a regular relational database for CRUD operations (Create, Read, Update, and Delete): its storage is designed to be effectively immutable, optimized for appending and analyzing data rather than frequently editing or removing individual records.

Case Studies
Companies use Bigtable for structuring and managing their massive troves of data, while they use BigQuery for mining insight from those troves. Below are a few examples of how businesses have used each in practice:

Bigtable
Digital fraud detection and payment solution company Ravelin uses Bigtable to store and query 1.2 billion transactions of more than 230 million active users.
AdTech provider OpenX uses Bigtable to serve more than 30,000 brands, more than 1,200 websites, and more than 2,000 premium mobile apps, processing more than 150 billion ad requests per day.
Dow Jones DNA uses Bigtable for fast, robust storage of key events that the company has documented in over 30 years of news content.

BigQuery
UPS uses BigQuery to achieve precise package volume forecasting for the company.
Major League Baseball is expanding its fan base with highly personalized, immersive experiences, and analyzes its marketing using BigQuery.
The Home Depot uses BigQuery to manage customer service and keep 50,000 items routinely stocked across 2,000 stores.

Costs
When using BigQuery, you pay for storage (based on how much data you store). There are two storage rates: active storage ($0.020 per GB) and long-term storage ($0.010 per GB). With both, the first 10 GB are free each month. You also pay for processing queries; query costs are either on-demand (i.e., charged by the amount of data processed per query) or flat-rate. BigQuery also charges for certain other operations, such as streaming results and the use of its Storage API. Loading and exporting data is free.
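Using the storage rates quoted above ($0.020/GB active, $0.010/GB long-term, first 10 GB free each month), a back-of-the-envelope monthly storage estimate looks like this. The free-tier handling is a simplifying assumption, and rates change over time, so treat the numbers as illustrative:

```python
# Back-of-the-envelope BigQuery storage cost, using the per-GB rates
# quoted in the article. Pricing changes; check the current pricing page.

ACTIVE_RATE = 0.020      # USD per GB per month (active storage)
LONG_TERM_RATE = 0.010   # USD per GB per month (long-term storage)
FREE_GB = 10             # first 10 GB free each month

def monthly_storage_cost(active_gb, long_term_gb):
    # Simplifying assumption: the free tier is applied to active
    # storage first, then to long-term storage.
    free = FREE_GB
    billable_active = max(active_gb - free, 0)
    free = max(free - active_gb, 0)
    billable_long_term = max(long_term_gb - free, 0)
    return billable_active * ACTIVE_RATE + billable_long_term * LONG_TERM_RATE

# 500 GB of active storage plus 2 TB of long-term storage:
print(round(monthly_storage_cost(500, 2000), 2))  # -> 29.8 (USD per month)
```

Query costs would come on top of this, billed either on-demand per byte processed or at a flat rate.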
For details, see BigQuery pricing. Using Bigtable, you pay for storage and bandwidth; here's all you need to know about Bigtable pricing across countries. If you're ready to start using or testing either product for a current or upcoming project, you can create a Bigtable instance using the Cloud Console's project selector page or the Cloud Bigtable Admin API. BigQuery is accessible via the Google Cloud Console, the BigQuery REST API, or an external tool such as a Jupyter notebook or business intelligence platform. Extra Credit:

Categories: Data Analytics, Databases

Bringing More Insights to the Table with Cloud Bigtable

Cloud Bigtable powers major Google products like Search and Maps. You can use this incredibly scalable database for analyzing large workloads, such as your customers' purchase histories and preferences, or currency exchange rates. Bigtable is affordable, scalable, fast, and reliable. This article outlines Bigtable's attributes, uses, strengths, and weaknesses so you can evaluate whether it's the right tool for you in any context.

What is Bigtable?
Bigtable is Google Cloud's fully managed, NoSQL database for large analytical and operational workloads. This innovative database:

Supports high read/write throughput per second.
Processes these reads and writes at ultra-low latency, on the order of single-digit milliseconds.
Scales to billions of rows and thousands of columns, adapting itself to terabytes, if not petabytes, of data.

Bigtable pairs well with Google Cloud data processing products such as BigQuery, Dataflow, and Dataproc. You can use Cloud Bigtable in various ways, such as for storing marketing data, financial data, and Internet of Things data (e.g., usage reports from energy meters and home appliances). You can also use it for storing time-series data (e.g., CPU usage over time for multiple servers) and graph data (e.g., hospital patients' dosage regimens over a period of years).

What does Bigtable bring to the table?
Bigtable is a dynamic product with many identifiable assets. The following three most set it apart from the other products in its field:

Speed: The database processes reads and writes on the order of 10,000 rows per second.
Scalability: You can stretch the table by adding or removing nodes. Each node, a compute resource Bigtable uses to manage your data, gives you additional storage capacity.
Reliability: Bigtable gives you consistent performance, stability, and debugging tools that usually take far longer to achieve on a self-hosted data store.

How does Bigtable work?
Cloud Bigtable is superbly simple.
The following four functions will allow you to execute almost any project you're using Bigtable to support:

Scale or descale the table by adding or removing nodes.
Replicate your data by adding clusters; replication starts automatically. Clusters describe where your data is stored and how many nodes are used for your data.
Group columns that relate to each other into "column families" for organizational purposes.
Incorporate timestamps by creating rows for each new event or measurement instead of adding cells in existing rows. (This makes Bigtable great for time series analysis.)

Bigtable integrates well with big data tools such as Hadoop, Dataflow, Beam, and Apache HBase, making it a cinch for users to get started.

Case Histories
Some of the world's most recognizable companies and institutions have used Bigtable for projects managing massive amounts of data. A small but representative sample of these projects follows below.

Dow Jones
Dow Jones, one of the world's largest news organizations, used Bigtable to structure its Knowledge Graph. The tool compressed key global events from 1.3 billion documents spanning a 30-year period into Bigtable for users to mine for insights. Users could also customize the Graph to suit their needs. "With the help of Cloud Bigtable," a spokesperson from Dow Jones partner Quantiphi said, "we can easily store a huge corpus of data that needs to be processed, and BigQuery allows data manipulations in split seconds, helping to curate the data very easily."

Ravelin
Ravelin, a digital fraud detection and payment solution company for online retailers, uses Bigtable to seamlessly store and query over 1.2 billion transactions of its clients' more than 230 million active users. Ravelin also profits from Bigtable's encrypted security mechanisms.
According to Jono MacDougall, Principal Software Engineer at Ravelin: "We like Cloud Bigtable because it can quickly and securely ingest and process a high volume of data."

AdTech
AdTech provider OpenX serves more than 30,000 brands, more than 1,200 websites, and more than 2,000 premium mobile apps. It also processes more than 150 billion ad requests per day, roughly 1 million requests per second, so it needed a highly scalable, extremely fast, fully managed database to fit its needs. Bigtable provided the perfect solution.

How do I know if Bigtable is right for me?
As powerful as Bigtable is, it's not a good choice for every situation. In certain contexts, you'll want to keep other options in mind. For example:

Choose SQL-structured Spanner if you need ultra-strong consistency.
Use NoSQL Cloud Firestore if you want a flexible data model with strong consistency.
Opt for SQL-based BigQuery if you need an enterprise data warehouse that gives you insights into your massive amounts of business data.

Ready to set up Bigtable? You can create a Bigtable instance using the Cloud Console's project selector page or the Cloud Bigtable Admin API. However, Bigtable isn't free: users pay by instance type and number of nodes, how much storage a table uses, and how much bandwidth Bigtable uses overall. (Here's all you need to know about Bigtable pricing across countries.) Next time you're looking to analyze large workloads, take a minute to check out Bigtable. It could help you crunch all that information in a matter of minutes. Have you ever used Bigtable? For what kinds of projects? How did it work for you? Start a conversation in one of our community groups and share your story! Extra Credit
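To make the column-family and timestamped-cell model described under "How does Bigtable work?" concrete, here is a tiny in-memory analogue. This is a hypothetical sketch of the data model only, not the google-cloud-bigtable client: each row key maps to families, each family to column qualifiers, and each qualifier to a list of (timestamp, value) cells, newest first.

```python
# Tiny in-memory analogue of Bigtable's data model: rows keyed by a
# string, columns grouped into families, and each cell versioned by
# timestamp (newest first). Illustrative only -- not the real client API.

table = {}

def write_cell(row_key, family, qualifier, value, timestamp):
    cells = (table.setdefault(row_key, {})
                  .setdefault(family, {})
                  .setdefault(qualifier, []))
    cells.append((timestamp, value))
    cells.sort(reverse=True)  # keep the newest timestamp first

def read_latest(row_key, family, qualifier):
    return table[row_key][family][qualifier][0][1]

# Time-series-style row keys: entity id plus timestamp, so each new
# measurement gets its own row and rows stay naturally sorted.
write_cell("server-042#2023-06-01T00:00", "metrics", "cpu", 0.41, 1685577600)
write_cell("server-042#2023-06-01T00:01", "metrics", "cpu", 0.57, 1685577660)

print(read_latest("server-042#2023-06-01T00:01", "metrics", "cpu"))  # -> 0.57
```

The row-key convention here (`entity#timestamp`) mirrors the advice above to create a new row per event rather than piling cells into one row.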

Categories: Data Analytics, Google Cloud Product Updates, Databases

Get to Know the Google Cloud Data Engineer Certification

Personal development and professional development are among the hottest topics within our community. At C2C, we're passionate about helping Google Cloud users grow in their careers. This article is part of a larger collection of Google Cloud certification path resources. The Google Cloud Professional Data Engineer certification covers highly technical knowledge concerning how to build scalable, reliable data pipelines and applications. Anyone who intends to take this exam should also be comfortable selecting, monitoring, and troubleshooting machine learning models. In 2021, the Professional Data Engineer rose to number one on the top-paying cloud certifications list, surpassing the Professional Cloud Architect, which had held that spot for the two years prior. According to the Dice 2020 Tech Job Report, data engineering is one of the fastest-growing IT professions, and even with an influx of people chasing the role, supply can't meet demand. More than ever, businesses are driven to take advantage of advanced analytics; data engineers design and operationalize the infrastructure that makes that possible. Before you sit at a test facility for the real deal, we highly recommend that you practice with the example questions (provided by Google Cloud) with Google Cloud's documentation handy. All the questions are scenario-based and incredibly nuanced, so lean into honing your reading comprehension skills and verifying your options using the documentation. We've linked out to plenty of external resources for when you decide to commit and study, but let's start just below with questions like:

What experience should I have before taking this exam?
What roles and job titles does the Google Cloud Professional Data Engineer certification best prepare me for?
Which topics do I need to brush up on before taking the exam?
Where can I find resources and study guides for the Google Cloud Professional Data Engineer certification?
Where can I connect with fellow community members to get my questions answered?

View image as a full-scale PDF here. Looking for information about a different Google Cloud certification? Check out the directory in the Google Cloud Certifications Overview.

Extra Credit
Google Cloud's certification page: Professional Data Engineer
Example questions
Exam guide
Coursera: Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certification
Pluralsight: Preparing for the Google Cloud Professional Data Engineer Exam
AwesomeGCP Associate Cloud Engineer Playlist
Global Knowledge IT Skills and Salary Report 2020
Global Knowledge 2021 Top-Paying IT Certifications

Have more questions? We're sure you do! Career growth is a hot topic within our community and we have quite a few members who meet regularly in our C2C Connect: Certifications chat. Sign up below to stay in the loop.
https://community.c2cglobal.com/events/c2c-connect-google-cloud-certifications-72

Categories: Data Analytics, Careers in Cloud, Storage and Data Transfer, Google Cloud Certifications, Databases, Infographic

Modernizing Data Lakes & Data Warehouses With Google Cloud Platform

Choosing how and where to store unstructured data is a big decision for any enterprise. From compliance questions to calculating the total cost of managing and maintaining a new data storage solution, app modernization can bring up many unknowns, causing organizations to weigh their options between building an on-premises solution and integrating a third party. Luckily, many open-source tools help organizations arrive at a solution and create a cloud environment suited to their needs. Our experts are weighing in on the process of modernizing data lakes and data warehouses with Google Cloud Platform and how enterprises can quell governance, compliance, and bandwidth concerns with a Google Cloud Storage solution for their most extensive sets of unstructured data.

What is a Data Lake?
To make an informed decision about whether your organization could benefit from Google Cloud Platform data ingestion and a Google Cloud Storage data lake, let's first understand what a data lake is. Data lakes are created when a steady stream of data flows into one centralized location. They differ from data warehouses in that the data has not been transformed and is unstructured. A data lake is a general repository of unstructured data that can later be structured or categorized and used in data analytics and reporting. Organizations use data lakes to collect as much data as possible, then move data into categories for application processing and use it for machine learning, data warehousing, reporting, analytics, and other applications.

What is Google Cloud Storage?
Modernizing data lakes and data warehouses requires choosing the right third-party platform or on-premises platform for the job.
And while there are certainly benefits to hosting and storing data in a legacy system or on-site, there are even more advantages to storing big data in a third-party solution like Google Cloud. Google Cloud Storage is a public cloud storage system built for housing large sets of unstructured data. In addition to ease of access, teams that store extensive data in Google Cloud Storage alleviate potential compliance and governance issues while gaining all of the benefits of Google Cloud data privacy and security standards. As a reservoir for data, Google Cloud Storage offers virtually unlimited capacity for any kind of data. Application developers can create “buckets” where data will be stored. Data can be categorized, separated by project, secured, and moved when necessary. Storage in the cloud can be unstructured or structured and used as the back end for public or private applications.

Advantages of Storing Big Data in Google Cloud Platform
One of the key components in creating a data lake is establishing a central storage location. Many open-source tools can act as central storage for your data lake, but Google Cloud Storage offers a secure and cost-effective solution for storing big data. Instead of building in-house infrastructure to support big data, the cloud offers an easier way to scale for additional storage capacity and support for newer technology. Big data is unstructured and requires truly scalable storage resources; Google Cloud Storage scales up or down as necessary to support it. Because it’s the cloud, data is always available. Real-time analytics and reporting require constant storage connectivity, and any outage could affect output and analytic functionality. With Google Cloud Storage, the data is always available, and failover storage can cover the low chance of a cloud disruption. One of the most significant advantages is cost.
Smaller organizations do not have the budget for high-end in-house technology. Google Cloud Platform provides storage and affordable ways to leverage the latest technology that would otherwise be out of reach due to the cost of building and maintaining infrastructure. In addition, security tools are readily available, and organizations can scale ample data storage at a fraction of the cost. Availability is also higher for remote workers, since all data is located in the cloud.

Creating a Google Cloud Platform Data Lake
While the guidance for creating a Google Cloud Platform data warehouse differs slightly from that for a data lake, the tools available within Google Cloud Platform host all of the capabilities necessary for building whatever data repository your organization needs, and can help organizations get over the hurdles of scalability, governance, and analytics management that can make on-premises tools an inhospitable environment for your most important data.

Timeline & Priorities
To create a Google Cloud Storage data lake or move an existing data lake to the cloud, it’s always best to create a timeline and list priorities for the effort. A plan is necessary to ensure that data can be moved to the cloud smoothly. The plan should cover which data will be stored in the cloud, the security that will protect it, and the applications and users that will access it. Timelines depend on the amount of data and the project plan. Data is often the last component to migrate, but sample data is often moved during tests to ensure that applications function once all data is migrated to the cloud. Migrations often happen slowly and during off-peak hours so that productivity is not affected.

Choosing the Right Tool for the Workload
Modernizing data lakes and data warehouses with Google Cloud Platform requires teams to know their workload patterns and profiles.
The type of workload determines the kind of cluster you should run to handle the different layers of your Google Cloud Platform data lake. Google also has several applications to help migrate data, maintain it, secure it, and create archives and backups. For big data, the BigQuery Data Transfer Service can move a few gigabytes of data, or terabytes if necessary. This tool lets organizations move only the data they want, without migrating unused files that waste resources. It can also be scheduled so that information stays synchronized between on-premises infrastructure and the cloud. Smaller volumes of data can be transferred using the gsutil command-line utility. Administrators can use gsutil to move data on the fly, or to copy over scheduled data that failed to transfer. It’s mainly used when fewer than a few terabytes must be migrated to cloud storage manually.

Using Google Cloud Platform’s Separate Storage and Compute
BigQuery uses a serverless model that allows administrators to migrate and manage data without the expense and overhead of virtual machine instances. It can be used to schedule batch jobs so that the organization pays per project, reducing bandwidth and resource costs. The storage used in Google Cloud is separated from the compute power used in BigQuery migrations. As a result, the organization can run multiple data migration projects that move data from one location to cloud storage without affecting applications or individual projects.

Modernize Data Operations
After your data has been prepared for the migration, it’s also essential to optimize deployment operations by pooling clusters and rewriting deployments as code. This will help with platform rendering as you create the Google Cloud Storage data lake and organize big data. In addition, the serverless nature of BigQuery execution and project migration reduces the overhead on computing power.
It gives administrators the ability to write code without worrying about server costs and configurations.

Governing Your Google Cloud Platform Data Lake
Once existing workloads and applications have been migrated to the cloud, teams benefit almost immediately from Google Cloud Platform data ingestion and the suite of analytics and management tools at their disposal for interpreting and synthesizing raw data.

Dataproc
For organizations using Hadoop and Spark, Dataproc will process, query, stream, and output to machine learning applications. Dataproc can dynamically create clusters and automate data migration. Automation takes over synchronization maintenance, freeing administrators to focus on other projects.

Cloud Data Fusion
You need a pipeline to move data to the cloud and work with it within an application. Cloud Data Fusion can build these pipelines using either the Google Cloud console or a UI like Pipeline Studio or Wrangler. Pipelines built with Cloud Data Fusion will transform, clean, and transfer data so it’s ready for integration into your applications.

Smart Analytics
With your big data stored in the cloud, you can now use it for real-time analytics. Google Cloud’s smart analytics offerings give you actionable insights into your data. They can help drive future revenue initiatives and provide direction on new products and services. Integrating directly with BigQuery and your cloud data, they offer insight into how the business is performing and the changes that could make it even more productive.
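As a rough sketch of the staging-and-loading flow described above, here is what the manual path looks like with the standard Cloud SDK tools. All bucket, project, dataset, and file names are hypothetical, and the commands assume an authenticated gcloud environment:

```shell
# Create a regional bucket to act as the data lake's landing zone
# (names here are placeholders, not from the article).
gsutil mb -l us-central1 gs://example-data-lake

# Stage raw files; -m parallelizes the copy, -r recurses into directories.
gsutil -m cp -r ./raw_exports gs://example-data-lake/raw/

# Create a BigQuery dataset and load one staged file into a table,
# letting BigQuery infer the schema.
bq mk --dataset example_project:lake_staging
bq load --source_format=CSV --autodetect \
    example_project:lake_staging.events \
    gs://example-data-lake/raw/raw_exports/events.csv
```

In practice, the BigQuery Data Transfer Service would handle the recurring, scheduled synchronization the article mentions; gsutil and bq are the smaller-volume, on-the-fly path.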


Scaling an Enterprise Software: Autoscaling Applications and Databases (full video)

Michael Pytel (@mpytel), co-founder and CTO at Fulfilld, shares stories from the team’s wins and losses in building out this intelligent managed warehouse solution. The recording from this Deep Dive includes:
(2:20) Introduction to Fulfilld
(4:10) The team’s buildout requirements for a cloud-based application, including language support, responsiveness, and data availability
(9:15) Fulfilld’s Android-based scanner’s capabilities and hardware
(12:25) Creating the digital twin with anchor points
(14:50) Microservice architecture, service consumption, and service data store
(19:35) Data store options using BigQuery, Firestore, and Cloud SQL
(23:35) Service runtime and runtime options using Cloud Functions
(28:55) Example architecture
(30:25) Challenges in deciding between Google Cloud product options
(31:40) Road map for the warehouse digital assistant, document scanning, and 3D bin packing algorithm
(39:00) Open community questions

Community Questions Answered
What does the road map include for security?
Did using Cloud Functions help with the system design and partitioning coding tasks by clearly defining functions and requirements?
Do you give your customers access to their allocated BigQuery instance?
What type of data goes to Firestore versus Cloud SQL?

Other Resources
Google Cloud Platform Architecture Framework
Google Cloud Hands-On Labs on Coursera
Google Cloud Release Notes by Product

Find the rest of the series from Fulfilld below: 

Categories:InfrastructureGoogle Cloud StrategyIndustry SolutionsDatabasesSupply Chain and LogisticsSession Recording

8 Reasons Why IoT Operations (IoTOps) Is the Future of Developer Productivity

The Internet of Things (IoT) touches everything from street lighting, smart parking, air quality, and ITS systems to IP cameras, waste collection, and digital signage. When IoT is managed, monitored, and maintained effectively, it changes everything from our cities to our utilities, but we need to stay on top of its challenges. These include: How do you connect thousands of IoT devices to back-office systems? How do you manage IoT platforms from multiple vendors? How can you install and maintain the various IoT devices, some more complex than others? If you run a team, how can you guide those workers through step-by-step workflows and diagnostics? That’s where IoTOps, short for IoT operations, comes in. In the Google sphere, Fuchsia OS serves the devices themselves; think of an IoTOps platform as a cloud-based SaaS solution built specifically for managing IoT.

IoTOps integrates the data.
IoTOps helps you manage millions of IoT components, such as smart streetlights, traffic signals, power line sensors, and garbage, parking, and air quality sensors, from one pane of glass. It integrates the various connected devices with back-end systems quickly, easily, and efficiently. IoT management also covers the devices and the gateways they are connected to: a humongous enterprise. All step-by-step diagnostics can be managed and monitored from this one pane of glass for a finished product to reach smartphones and tablets.

IoTOps speeds up the process.
By making your workflow configurable, IoTOps helps you manage the entire life cycle of your IoT operations. You have your IoT planning, inventory, installation, maintenance, and work orders in one place, making the process operational and fast.

IoTOps simplifies IoT management.
IoTOps tames your IoT explosion by helping you manage its escalating volume of data from one place. Not every project needs the same level of management or the same degree of care and attention.
IoTOps brings to the forefront the facets that need special attention and helps you design, regulate, and monitor your network performance from one pane of glass.

IoTOps connects the workforce.
The IoT operations platform helps IT and operations work together, much the same way DevOps does. With all data displayed in one crucible, engineers can stream data to diverse IoT applications and update all back-office systems (e.g., GIS, CMS, asset management, network management, CRM, and billing). At the other end, local plant technicians can use the platform to monitor and troubleshoot the industrial network; in the event of a device failure, they can carry out rapid device replacement.

IoTOps gives you actionable results.
IoT operations serve as an Analytics as a Service (AaaS) dashboard, giving you the insight to build on your IoT data for actionable results. Put another way, these platforms provide visibility into your IoT projects’ inner workings and help you analyze the endless volume of data emitted from your connected devices.

IoTOps detects threats.
IoTOps alerts you to anomalies and changes in IoT response time, characteristics, and behavior. It’s like an integration Platform as a Service (iPaaS), which standardizes how applications are integrated across the workflow. When differences are detected, it brings them to your attention promptly, so you can act on these cues instantly and prevent mishaps such as network breaches.

IoTOps saves labor, cuts costs, and protects productivity.
IoTOps reduces downtime by catching mishaps right away. Its timely intervention saves you the expense of fixing or replacing components, and its lean workflow does away with data drift, giving operations staff and data scientists the room and motivation to continue their work. You also don’t need to hire the many other specialists you would otherwise have required for deployment.

IoTOps helps with incident management.
The IoT impact on infrastructure and operations (I&O) can be significant, which is why it's crucial to catch mishaps in their beginning stages. An IoTOps platform helps you see the mass of connected facets and incoming data across environments, making the learning process error-resilient, keeping software from getting lost, and keeping your team on the same page. Very simply, IoT operations platforms help you manage, monitor, and maintain your IoT operations. Their value is immense: they allow you to complete important IoT projects from start to finish, with all components configured precisely according to manufacturing, network, and security specifications, and with faster completion times. There are no more missing items or IoT elements that misbehave once they're configured in the network. Costly downtime and sunk expenses are a thing of the past, since platforms built around systems like Google's Fuchsia OS automate your IoT projects in a streamlined CI/CD process.

Let’s Connect!
Leah Zitter, Ph.D., has a master’s in philosophy, epistemology, and logic and a Ph.D. in research psychology.

Categories:Cloud OperationsDatabases

Key Insights From the Cloud Database Report, Cloud Wars

In the premiere edition of the Cloud Database Report, John Foley lays the foundation for his ongoing coverage and analysis of the cloud database market: the vendors, cloud database platforms, emerging technologies, trends, and business use cases. “The traditional database market is quickly morphing into the cloud database market, and it’s a game-changer not just for the tech industry, but for developers, database managers, data scientists, business users, and the entire data ecosystem,” Foley wrote. Foley, a veteran reporter covering all things tech and cloud, provides in-depth analysis as well as weekly news and insights published on CloudWars.co.

Why cover cloud databases? Easy. They’re on the rise, as Foley points out:
Gartner estimates that 75% of all databases will be deployed or migrated to the cloud by 2022.
Snowflake reported revenue growth of 115%, to $148.5 million, in Q3 of FY2021, compared to a year ago.
AWS’s Aurora database is the fastest-growing service in the history of AWS.

“The pace of change in data management is accelerating, both because more data than ever is being generated and because business and IT decision-makers are keenly aware that ‘big data’ represents tremendous value if they are able to capitalize on it,” Foley wrote. “Their goal is to gain insights and drive innovation and actions—product development, customer acquisition, supply chain execution, sales.” Subscribe to the Cloud Database Report for free here.

Key Insights From the Inaugural Issue
The first issue covered a lot, so we break it down for you here. But don’t forget to subscribe to gain all the insights that Foley tirelessly pens.

Insight 1: The Cloud Database Report creates a new Top 20 list for cloud database providers
The Cloud Database Report has identified the 20 cloud database providers that, in its analysis, are the leaders. They represent a cross-section of the market: the incumbents, the cloud service providers, and the challengers.
They used four criteria in choosing the Cloud Database Top 20:
Enterprise capabilities. Vendors with a complete range of services and support that enterprises may want or need. Fully managed services are a plus.
Platform adaptability. Tools, services, and APIs for data integration/migration and application compatibility are must-haves.
Innovation. A steady pipeline of new, modern, differentiating capabilities.
Demonstrated business value. Customer success is the #1 proof point.

Take a look at the full Top 20 list here; the top five are: AWS, Cloudera, Cockroach Labs, Crunchbase, and Databricks.

Insight 2: It’s not easy to uproot installed databases or vendors.
“Conventional wisdom has it that ‘old guard’ vendors like Oracle, IBM, and Teradata, which have been selling database systems for 30-plus years, are vulnerable to being displaced by newer, cloud-native technologies,” Foley wrote. But it’s not easy to change. As Foley writes, Snowflake is going after AWS, Microsoft, and Google Cloud, and they all want a piece of Oracle’s established customer base. And Oracle, according to Larry Ellison, is in a class of its own and isn’t going down without a fight.

Insight 3: Cloud database startups are driving innovation.
With new products and business models, startups are pushing even the incumbents to move: Teradata, once the epitome of an on-premises data warehouse vendor, recently announced the availability of its cloud data analytics platform, Teradata Vantage. You can find it on Microsoft’s and Google Cloud’s marketplaces; it adds support for more data sources, and Teradata also introduced a free cloud trial of Vantage.

Insight 4: Google Cloud leans in the direction of purpose-built databases.
As Foley writes in his report: “Google’s Cloud Bigtable excels at high-scale analytics and operational workloads, while Firestore is a NoSQL document database for new applications.
“Most developers do not say, ‘I want one database for everything,’” Kelly Stirman, Google Cloud director of product management for databases, told me in a briefing for this report. A particular database may be ideal for one application but not well suited for others, he said. “I don’t think you can engineer one database that serves all of them well.” That said, Google’s Cloud SQL and Cloud Spanner databases are capable of handling a widening array of workloads in the same way multi-model databases do.” As organizations expand their use of cloud service providers, creating an environment with a curated selection of vendors that best meets their overall needs, the trick is to avoid complexity.

Insight 5: Create an adaptable cloud database architecture
Foley wrote that an adaptable architecture is “vitally important to reaping the benefits of the cloud database model without recreating the problems of the past.” He identified four key capabilities:
Hybrid. Few organizations are 100% cloud. The ability to connect existing, on-prem systems with the cloud database is a vital intersection point. This is IBM’s big play in the market with its Red Hat stack. Other vendors, including AWS and Oracle, are expanding their hybrid cloud offerings.
Multicloud. The ability to share data and connect databases across clouds is a practical requirement, and there are strategic benefits as well, such as being able to operate in the cloud of your customer’s choosing. Google with Anthos and MongoDB with multicloud clusters are among those promoting multicloud as a differentiating capability.
Multimodel. Many cloud databases support multiple data types, but they do not all support the same data types. It’s important to assess strengths and weaknesses.
Fully managed. Some cloud databases are self-managed by the user, some are managed by the service provider, and a few, such as Oracle Autonomous Database, are fully automated.

Using those capabilities, customers can evaluate the right architecture for their business.
Foley ends his report with use cases. We recommend checking them out at CloudWars.co.

John Foley is the founding editor of the Cloud Database Report. As a tech journalist at InformationWeek and other publications, he covered databases and enterprise software, open systems, analytics, and the cloud market. John also established and led editorial teams driving strategic communications at Oracle, IBM, and MongoDB.

Categories:Hybrid and MulticloudDatabases

C2C Talks: Using Google Cloud’s BigQuery to Move from a 48-Hour Cycle Time to a Mere 7 Minutes

Author’s Note: C2C Talks are an opportunity for C2C members to engage through shared experiences and lessons learned. Often there is a short presentation followed by an open discussion to determine best practices and key takeaways.

Juan Carlos Escalante (JC) is a pioneering member of C2C and a vital part of the CTO office at Ipsos. Escalante details how he and his team handled data migration powered by Google Cloud and shares his current challenges, which may not be unlike those you’re facing. As a global leader in market research, Ipsos has offices in 90 countries and conducts research in more than 150 countries. So, to say its data architecture is challenging barely covers the complexity JC manages each day. “Our data architecture and data pipeline challenges get complex very quickly, especially for workloads dealing with multiple data sources and what I describe as hyper-fragmented data delivery requirements,” he said in a recent C2C Talks: Data Migration and Modernization on December 10, 2020. So, how do they manage a seamless data flow? And what does JC’s data infrastructure landscape look like? Hear below.

What was the primary challenge?
Even though the design JC described is popular and widely used in the space, it isn’t without its own set of challenges, and siloed data infrastructure rises to the top. “The resilience of siloed data infrastructure platforms that we see scattered across the company translates to longer cycle times and more friction to pivot and react to changing business requirements,” he said. Hear JC explain the full challenge below. What resonates with you? Share it with us!

How did you use Google Cloud as a solution?
By leveraging Google Cloud, JC and his team have unlocked new opportunities to simplify how different groups come into a data infrastructure platform and serve or solve their specific needs. “We all have different products and services that we have available within Google Cloud Platform,” he said.
“Very quickly, we've been able to test and deploy proofs of concept that have moved rapidly towards production.” Some examples of the benefits JC and his team have found by using BigQuery, a Google Cloud Platform product, include:
Reduced cycle or processing time from 48 hours to seven minutes
Data harmony across teams

Hear JC explain how BigQuery helped reach these milestones.

Since it's going so well, what's next?
The goal is to think bigger and determine how JC and his team can transform their end-to-end data platform architecture. “The next step we want to take in our data architecture journey is to bring design patterns that are common and widely used in software development into our data engineering practices,” he said. On that list is version control for data pipelines; hear JC explain why. JC is also working with his team to plan for the future of data architecture and analytics on a global scale, which he says will be a multi-cloud environment. Hear him explain why below.

Questions from the C2C Community
1. Are the business analysts running their daily job through the BigQuery interface? Or do they use a different application that's pulling from BigQuery?
For JC’s organization, some teams got up to speed very quickly, while others need a little more coaching, so they’ll be putting together some custom development with Tableau. Hear JC’s full answer below, including how they use Google Sheets to manage data exported from BigQuery.
2. I have the feeling that my databases are way simpler than yours; it's just a handful of tables, so it's easier for us to monitor them. But how do you monitor triggers?
This question led to a more in-depth discussion, so JC offered to set up a time to discuss it separately, which is just one of the benefits of being part of the C2C community.
Check out what JC said to attack the question with some clarity below. We’ll update you with their progress as it becomes available!
3. What data visualization tools do JC and his team use?
“Basically, the answer is we're using everything under the sun. We do have some Spotfire footprint, we have Tableau, we have Looker, and we have Qlik Sense. We also have custom-developed visualizations,” he said. “My team is gravitating more towards Tableau, but we have to be mindful that whatever data architecture design we come up with, it has to be decoupled, flexible, and it has to be data engine and data visualization agnostic, because we do get requests to support the next visualization,” he warned. Hear about how JC manages the overlap between Looker and Tableau and why he likes them both.

Extra Credit
JC and his team used the two articles from Thoughtworks linked below to inform their decision-making and as a guide for modernizing their data architecture. He recommends checking them out.
How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh by Zhamak Dehghani, Thoughtworks, May 2019
Data Mesh Principles and Logical Architecture by Zhamak Dehghani, Thoughtworks, December 2020

We want to hear from you! There is so much more to discuss, so connect with us and share your Google Cloud story. You might get featured in the next installment! Get in touch with Content Manager Sabina Bhasin at sabina.bhasin@c2cglobal.com if you’re interested. Rather chat with your peers? Join our C2C Connect chat rooms! Reach out to Director of Community Danny Pancratz at danny.pancratz@c2cglobal.com.
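For readers wondering what pulling from BigQuery outside the web interface looks like (the subject of the first community question above), here is a minimal sketch using the bq command-line tool. The project, dataset, table, and column names are hypothetical, and the command assumes an authenticated Cloud SDK environment:

```shell
# Sketch only: a standard-SQL query run from the CLI rather than the
# BigQuery web interface. All names below are placeholders.
bq query --use_legacy_sql=false '
  SELECT survey_id, COUNT(*) AS responses
  FROM `example-project.research.responses`
  GROUP BY survey_id
  ORDER BY responses DESC
  LIMIT 10'
```

Tools like Tableau, Looker, or a Google Sheets export run equivalent queries through BigQuery connectors, which is what keeps the architecture "data visualization agnostic" as JC describes.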

Categories:Data AnalyticsC2C Community SpotlightHybrid and MulticloudCloud MigrationStorage and Data TransferDatabasesSession Recording