Cloud storage is an environment where digital data is stored in logical pools. It allows users to save data in an off-site location that is accessed through the internet or a private connection. Cloud storage helps businesses save data securely and reach it on demand from multiple locations and devices. It can also be used to archive data that must remain accessible but is not used frequently, such as financial records. A cloud storage system can also specialize in storing specific types of data, including photos, audio files, and text documents. The cloud storage environment has multiple models and benefits, which are described below.

Types of Cloud Storage

Public Cloud Storage

A public cloud can be a popular option for businesses looking to store their data efficiently and quickly, because the data can be accessed online by any user the business grants access to. Public cloud storage is hosted by a third-party provider and is similar to a big apartment building: the businesses are the tenants, and the service provider is the landlord. Public cloud storage options are typically used for non-critical tasks, such as file sharing or application testing. This approach meets the collaborative needs of many organizations today, offering scalability and flexibility while reducing management effort, because the cloud provider is responsible for managing the system. Public cloud storage offers scalable capacity and flexible bandwidth, which makes it easier for businesses to scale their storage needs. Public clouds also offer a pay-per-use model and can be used by different customers simultaneously, which makes them a cost-effective approach for many organizations.

Private Cloud Storage

This model operates by installing a data center and is privately hosted within a company’s own infrastructure.
The approach offers an added layer of security and protection, since all the services are accessible only to those in the organization. Private cloud storage is also scalable, with more customization and dedicated resources than public cloud storage, because it is single-tenant. Through company firewalls and internal hosting, private cloud storage options help ensure that data cannot be accessed by third parties. Private cloud environments are especially helpful to industries that have strict compliance policies: companies in healthcare or government can be more confident that their data is stored in a secure environment.

Hybrid Cloud Storage

A hybrid cloud option uses both cloud and on-premises resources, combining the qualities of a private and a public cloud. It is an integrated storage architecture and offers options that benefit businesses of all sizes and budgets. Businesses choose how much data to store in which cloud setting. For example, they may use elements of a private cloud to store confidential data while still working within a public cloud for needs related to branding or marketing. The hybrid cloud storage model offers an organization the best of both worlds and keeps the scalable qualities of both cloud storage options described above.

Community Cloud Storage

Community cloud infrastructure allows multiple organizations to share resources based on common operational requirements. The purpose of the community cloud storage option is to offer organizations a modified form of a private cloud, where the needs of different organizations are taken into consideration when constructing the architecture, and the solutions offered are specific to an industry. It is a shared platform of resources that different businesses use to work toward shared goals.
This model can also be described as an integrated setup that combines the benefits of multiple clouds to address the needs of a particular industry. Each user of the cloud is allocated a fixed amount of data storage and bandwidth. The community cloud environment is a good fit for growing organizations in the healthcare, education, and legal sectors.

Benefits

Security

Cloud storage saves data on redundant servers. This means that if data is lost or a server fails in one location, the copies on the other servers remain available. The data is protected by firewalls and through password access or authorization. Many cloud storage platforms also implement multifactor authentication upon login. These programs verify a user more than once before giving them access to any data that may be confidential, for example by confirming the sign-in on a second device or by sending a code to a phone number linked to the user’s email address. There are also many customization options for added layers of security when working with cloud storage, depending on the goals of a business.

Scalability

With cloud storage, users are able to scale their storage based on the needs of the organization. If a business has a large amount of storage but users are no longer accessing certain pieces of data, the amount of storage can be scaled down. In contrast, if more storage is required, capacity can be added to accommodate this requirement. Additional space that is provided in the environment will have the same capabilities, so there is no need to migrate any data from one place to another.

Accessibility

Files are accessible from any device with an internet connection. For those in a remote work environment, there is no need to be on a work laptop to gain access to a specific file, as it can be accessed through the cloud. Cloud storage also gives remote workers the option to share files with one another in real time.
Cost-Effective

Cloud storage is an affordable approach for organizations, because providers can distribute the costs of their infrastructure and services across many businesses. There is no requirement to purchase separate servers or any other associated network technology. The cost also depends on business needs, so organizations do not need to pay for storage that they aren’t using. Cloud storage also removes the need to pay for hard disks, electricity, and hardware warranty services. Cloud storage environments are equipped with monitoring options and reduce the need for extensive capacity planning.

Data Redundancy

Cloud storage provides built-in capabilities for data replication and redundancy. Cloud storage environments keep multiple copies of data, which helps organizations prevent data loss. Users can also use geographic replication options, which make multiple copies of data across regions. This also helps with disaster recovery when data is lost, as various copies are stored within the cloud.
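To make the idea of geographic replication more concrete, here is a minimal sketch in plain Python. It is not any provider's API: the `Region` class, the region names, and the file name are all hypothetical, and real services replicate asynchronously and at far larger scale. The sketch only shows the principle that a read can fall back to another region when one copy is unavailable.

```python
import hashlib


class Region:
    """A hypothetical storage region that keeps object copies in memory."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.available = True


def put_replicated(regions, key, data):
    """Write the same object to every region and return its checksum."""
    checksum = hashlib.sha256(data).hexdigest()
    for region in regions:
        region.objects[key] = (data, checksum)
    return checksum


def get_with_failover(regions, key):
    """Read from the first available region whose copy matches its checksum."""
    for region in regions:
        if not region.available or key not in region.objects:
            continue
        data, checksum = region.objects[key]
        if hashlib.sha256(data).hexdigest() == checksum:
            return region.name, data
    raise LookupError(f"no healthy copy of {key!r} found")


# Illustrative region names, not real provider identifiers.
regions = [Region("us-west"), Region("us-east"), Region("europe")]
put_replicated(regions, "ledger-2023.csv", b"q1,100\nq2,120\n")

regions[0].available = False  # simulate a regional outage
source, payload = get_with_failover(regions, "ledger-2023.csv")
```

With the first region marked unavailable, the read is served from the next region that still holds a valid copy, which is the behavior disaster recovery relies on.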
C2C’s first event for developers took place on April 26th, 2023 in Sunnyvale, CA. The event focused on data analytics and how organizations can optimize their data. Below are some data buzzwords and their definitions, an overview of Dataplex, a product that was demonstrated at the event, and a summary of the key topics discussed.

Data warehouse: A system that is used for reporting and data analysis. A data warehouse is a large store of data that has been accumulated from a range of sources and helps businesses with decision-making processes.
Data lake: A centralized infrastructure that is designed to store and process large amounts of data. A data lake can store data in its original form and process it in a variety of ways. Data lakes are scalable platforms that allow organizations to ingest data from any source at multiple speeds.
Data lakehouse: A modern data platform that combines a data warehouse and a data lake.
BigQuery: A serverless data warehouse that works across clouds and scales with your data. BigQuery allows users to pick the right feature set for workload demands and can match these needs in real time. It can also analyze data across multiple clouds and securely exchange data sets internally or across businesses, making it a platform with scalable analytics.
BigLake: A storage engine that unifies data warehouses and data lakes and makes their data accessible through BigQuery.

Dataplex

Dataplex is a lake administration and data governance tool. It enables organizations to discover, manage, and evaluate their data across data lakes and data warehouses. Dataplex also has a variety of features that allow organizations to manage specific aspects of their data. For example, the tag management feature ensures that specific users have access to the right data by applying policy templates and tags to different sets of data. Dataplex also has automated data quality management features.
For example, if a report quotes incorrect numbers, the data can be corrected with automated data tools rather than manually.

Data and Real-Time Analytics

A major point raised at the developer event was that data is rooted in an event-driven architecture. For instance, customers who work in finance become highly interested in real-time data during specific periods. This interest is event-based, as it usually occurs when the industry reaches a quarter close. Moving data around can be a difficult task; however, certain cloud products, such as Dataplex, can help with this issue. The main concern surrounding organizing data is access control and governance. Customers want to know that steps have been taken to ensure that unauthorized users do not gain access to private data. Visibility and transparency are also core tenets when discussing access to data and its governance tools.
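As a rough illustration of what automated data quality management means in practice, the sketch below runs a few declarative rules over rows of report data and lists every violation. It is plain Python and does not use the Dataplex API; the `Rule` class, the column names, and the sample rows are assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """A hypothetical quality rule: a named check applied to one column."""
    name: str
    column: str
    check: Callable[[object], bool]


def run_quality_checks(rows, rules):
    """Return a (row_index, rule_name) pair for every failed check."""
    failures = []
    for index, row in enumerate(rows):
        for rule in rules:
            if not rule.check(row.get(rule.column)):
                failures.append((index, rule.name))
    return failures


rules = [
    Rule("revenue_not_null", "revenue", lambda v: v is not None),
    # A missing value is reported by the rule above, so it passes here.
    Rule("revenue_non_negative", "revenue", lambda v: v is None or v >= 0),
    Rule("quarter_is_valid", "quarter", lambda v: v in {"Q1", "Q2", "Q3", "Q4"}),
]

# Illustrative report rows; three of them contain a deliberate error.
rows = [
    {"quarter": "Q1", "revenue": 1200.0},
    {"quarter": "Q5", "revenue": 900.0},
    {"quarter": "Q2", "revenue": -50.0},
    {"quarter": "Q3", "revenue": None},
]

failures = run_quality_checks(rows, rules)
```

Keeping the rules as data rather than scattering checks through report code is one way to let the same checks run automatically every time new data arrives, so bad numbers are flagged before they reach a report.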
This week, Bruno shares lessons in entrepreneurship from Netflix and insights from the latest Gartner research, and brings a special guest to talk about the Data Mesh.

This CarCast covers:
Why the Netflix story is an example of culture and entrepreneurial success, and its lessons in thinking differently and persevering.
Gartner's latest research, which shows that budgets are up and points to three data trends: “from platforms to ecosystems,” “don’t forget the humans,” and “think like a business.”

Finally, if you'd like to connect with Bruno live, you can meet him this Wednesday at the Commonwealth Club at the Everyday AI event. For more, check out Bruno's blog here.

Have a great week!
A company becomes the victim of ransomware every 11 seconds. Despite billions of dollars spent to thwart ransomware attacks, an astonishing 66% of companies fell victim to these attacks in 2021, according to Sophos's State of Ransomware 2022 report. Organizations must take precautions to stop attacks before they happen, because recovering from ransomware takes a minimum of 30 days.

Ransomware numbers are rising everywhere: in attack volume, ransom demands, and average ransom payments. And as threat sophistication increases, virtually every industry is experiencing growing incident rates. No organization is immune. Although attacks may seem inevitable, defensive measures should always be in place, and they're most effective when paired with a strong ransomware recovery plan.

Google, NetApp, and Workspot are working together to help customers create a ransomware recovery plan. By using a proven storage platform, innovative clean cloud, and global cloud PCs, they're able to restore productivity for thousands of users around the globe within minutes. At a recent 2Chat event, speakers from these companies discussed the impact of ransomware on organizations and how you can improve your storage options by:
Creating an isolated project
Preparing regions for capacity
Provisioning cloud PCs globally
Connecting to NetApp CVS for secure access to files and data

Watch a full recording of the conversation here:
This week, Bruno talks about the 3 key attributes of modern Data Products, covers best practices in Data, and points you to leaders you should know and follow so you too can grow!

This CarCast covers:
The Good, The Bad and The Ugly of Data. The data organization is now a value organization (70% of data leaders report to the company's president, CEO, COO or CIO), and that gives you, the data leader, the opportunity to align on business objectives, not just technical ones. Read more on VentureBeat here.
How To Succeed as a Data Leader. Bruno reviews the do's and don'ts of Jaguar and Land Rover's former Data Chief. He talks about accountability and thoughtful planning. Data is more than just tech!
Data products in action. For Bruno, Data Products are about Data, Time and People. Listen in to get the breakdown!

And finally, you all liked Bruno's MAD interview so much, he's created a playlist! Check out the snippets and behind-the-scenes short videos here! (By the way, MAD stands for Machine Learning, Artificial Intelligence & Data Landscape!)
An engaged audience listens eagerly as Sanjay Chaudhary, Vice President of Product Management at Exabeam, explains how hackers use MFA bombing to break into employee email accounts and gain confidential company information. This was one of many topics surrounding data optimization discussed at the 2Gather event in Sunnyvale, California on February 3rd. “Not coming from a technical background, I wasn’t sure what to expect at my first event. However, the panel’s rich and engaging narrative made data security into an amazing story to listen to!” said June Lee, Senior Program Manager at Workspot.

The first C2C event of the year embodied the essence of forming meaningful connections. At the beginning of the event, all attendees were asked to introduce themselves to two other individuals they had not yet spoken to. This created a strong sense of openness and encouraged guests to go beyond their comfort zones and spark personable interactions. Through peer-to-peer conversation, guests connected on driving advocacy and feedback surrounding how to use Google Cloud for data analytics. The event featured a diverse panel of Google partners including NetApp, Exabeam, and Lytics, as well as Cisco Systems.

“Everything starts with a customer,” stated Bruno Aziza (@BrunoAziza), the Head of Data and Analytics at Google. This approach is the driving force behind Google building close relationships with its customers and understanding their journeys and the challenges that can arise, one of which is getting value from the data that has been collected. “A large amount of organizations are struggling to turn data into value and money is being spent on data systems, yet companies are not always benefiting from it,” says Bruno. Organizations now have access to large sets of data; however, critical pieces of data are not typically within their internal environment. A step in the right direction is to create data products that help tackle this issue.
One of the major keynote speakers, Vishnudas Cheruvally, Cloud Solution Architect at NetApp, provided insight on solutions that the organization is working on. “One of the main goals of NetApp is to build an environment that is rooted in trust and to create an infrastructure where users do not have to worry about basic tasks associated with optimizing data,” says Vishnudas. Through billing APIs and the ability to resize data volumes with Google Cloud services, customers have accessible tools that allow them to make informed decisions. This includes creating a customized dashboard to observe what is happening within their environment. Along with data optimization, emerging global trends and their impact on data sovereignty were also a recurring topic that captivated the audience. “Data sovereignty and upcoming global trends within data security were key topics discussed at the event and are also motivating factors of solutions developed by NetApp,” stated Vishnudas.

“An emerging trend is using excessive resources through multiple clouds and essentially creating a wasteland,” says Jascha Kaykas-Wolff (@kaykas), President of Lytics. This conversation sparked the topic of global trends, data sovereignty, and cloud strategy. With high amounts of data being stored by organizations, questions begin to arise in regard to ownership. “Data has to live in a specific area and there has to be control or sovereignty over it,” says Jascha. The panel engaged in a conversation that covered dealing with shifting global trends and how they impact customers. Sanjay Chaudhary brought in a product management perspective, which is rooted in solving customer problems. “With more regulations being created, data cataloging is essential in order for customers to understand what is critical in terms of their data and security threats.
The core principle of data is the same, the most important thing is being able to detect a problem with the data and how fast it can be addressed,” says Sanjay.

From ownership to data security, the discussion highlighted a variety of fresh perspectives. What stood out to guests was the diversity of the panel, which brought in differing views. “The event had extremely thought-provoking insights stemming from the issues of modern day data analytics and how it impacts a customer base as well as a panel that discussed their personal experiences with data,” said Dylan Steeg (@Dylan_Steeg), VP of business development at Aible. Both speakers and guests then attended a networking session following the event. Over refreshments and drinks, guests were able to mingle with one another to further expand the conversation. Most importantly, they were able to create meaningful connections that may lead to future collaborative efforts and to solutions that can take data optimization to new heights.

You and your organization can also build these connections. To start, join C2C as a member today. We’ll see you at our next 2Gather event!
C2C partner NetApp will be appearing at upcoming 2Gather events in Sunnyvale, New York City, and Zurich in February of 2023, and also in Paris in March. Each event will feature a presentation focusing on a specific use case for Cloud Volumes, NetApp’s proprietary solution for data storage on Google Cloud. The events will offer unique opportunities to learn about the product and discuss its capabilities with peers onsite, but for those whose interest is already piqued and who want to learn more, we sat down with Brian Wink, NetApp’s Director of Google Cloud Architects, to talk about the basics of Cloud Volumes and what it can do for your business’s data.

First, give us a little background on you, your role at NetApp, and what you do there.

My name is Brian Wink. I’m currently the Director of our Google Cloud solution architects, which means anyone in the field that’s talking tech, designing systems, doing any of that type of work, those guys roll to me. I’ve been in data storage since 1997. I was employee number 302 with NetApp. I worked here for 13 years. I left for a decade, and I was doing another cloud-backed storage company, so always storage, but that was an entree into cloud and distribution, and then when NetApp really wanted to build our cloud business, a friend of mine said, “Hey you should look at this, consider coming back,” and I did, and I’m having a great time, a lot of fun.

At our upcoming events in Sunnyvale, New York, and Zurich, representatives from NetApp will discuss specific use cases for NetApp’s Cloud Volumes. What is Cloud Volumes, and how does it work on Google Cloud?

Cloud Volumes is simply a container that’s running in the cloud, and its job is to hold bits. NetApp is very famous for NAS, which stands for network-attached storage, and in that there’s two real ways to do storage. There’s what’s called file, and what’s called block. In order to store a file, you speak a protocol.
Your workstation is saying to a network-attached device, “Here’s a file, it contains some bits, please store it for me.” When I want it back, I’m going to ask for that file by name. I now get to decide what’s the best way for me and my software, my hardware, and my environment to actually store it so that I can make sure it’s going to be there when you ask for it, you’re going to get it in the amount of time you want, and if god forbid some disaster happens, I can either mathematically recalculate it, or go fetch it from a secondary or third copy somewhere else.

NetApp’s been doing that for thirty years on prem. Now we’re taking that thirty-year legacy and saying, “How do I present that to you in the cloud?” We have two ways of doing that. We can say, “Listen, I’ll give you the keys to the kingdom, you can run it as a software, you can turn all the knobs and dials.” This is what’s called our Cloud Volumes ONTAP, CVO (we love acronyms). You get to run our software in all of its glory. The other one is called Cloud Volumes Service, CVS. This is where NetApp and our SRE team is running it. We’re operationalizing it for you, we’re making sure it has the right security, we’re making sure all the settings are correct, and we’re offering it to you as a service, so it’s a quick and go. You say, “Hey, I want a volume, I want a container to store some files,” and in about three clicks, you get it.

A lot of companies who are running Cloud Volumes are using it in conjunction with Google Cloud VMware. How does VMware fit into that picture?

Here’s the thing: Storage isn’t always the sexiest thing in the world, but if you think about it, everything we do is either producing or consuming data, and so you have to have good quick access to data. VMware is just an application. It’s going to produce or consume data. There are two key ways that VMs do that. One is called guest mode.
You have your VMware, which looks like a machine to the operating system running on top of it, Linux or Windows, and then whatever you’re doing with that operating system, you’re mounting a volume. It looks just like any other volume that you would have if this wasn’t VMware. There’s nothing overly special that we have to do for that from a protocol or communications standpoint. It’s still very important to make sure that that data is quick and accessible and in the right region and durable and reconstructable, but we’re presenting it as a guest.

The other way is to say, “How do we present it as a data store?” This is where we’re saying that VMware is using the operating system where it’s getting its actual brain from. It’s living on us. That’s called data store mode. We do both. VMware is a really critical use case for us. I think the big advantage there is we do have a tremendous amount of customers in a traditional sense that are running VMware on top of NetApp and on prem, and when those guys want to migrate to the cloud, because we’re also in the cloud, it is the true definition of a lift and shift. I’m going to take it from here and I’m going to run it from there, end of learning curve.

Security is also not the sexiest topic in the world, but it’s still a topic everybody has to think about. What sets Cloud Volumes’ security capabilities apart from everyone else’s?

There are multiple layers of security. First of all, there’s “How do I allow people into what they should see and keep them out from what they shouldn’t see.” That’s access control. We’re going to plug into all the major access control providers. AD is a big deal in Google these days. We’re going to make sure that all the permissions and properties (can you see it, can you view it, can you edit it, can you execute it), all that stuff is there. What’s important is, how are we actually storing that?
Maybe I’m protecting everybody from coming in the front door, but what if I’ve got a back door or side door that people would just run through? This is where we do a couple of things. How are we storing the data? We’re not storing it in terms of files, we’re breaking it up and chunking it up and compressing it and deduplicating it and obfuscating it effectively in our format, but then when we actually lay it down to some kind of media that Google is hosting, we’re encrypting that as well.

Everything’s encrypted both at rest and in flight, and this is part of the security model. We maintain that security posture from the moment we see the bits. We’ve been certified by every possible organization known to man. We’ve got plenty of federal customers that I’m sure somebody would come and kill me if I told you about. We’ve passed all those audits, and we’re applying that all the way to cloud. One thing we implemented for a large financial within the last year was what’s called CMEK: customer-managed encryption keys. They can have a separate repository just for the keys, so we don’t even see the key, and we’re querying that repository to get it. We support things like that as well.

You just gave me a great one, but outside of security, what are some other ways that Cloud Volumes could be used for a FinServ organization?

A lot of the FinServs are really big, and so you get a couple different things. They’re going to run some of their key apps on it, they’re going to do data mining and things like that, because we can now. We can expose it to their AI and ML engine of choice, whatever that might be. The other thing that we’ve seen them do a lot, and the example I use (I’m trying not to accidentally tell you the customer name), what they wanted to do is create their own internal marketplace, so their IT organization evaluated the product, and then they put it on their marketplace.
Now anybody inside their organization who needs storage, they go to the portal and say, “I need storage. I need this much. It needs to be this fast.” Boom. It gets lit up, and they don’t have to go through that evaluation every single time, because they’ve already done it.

Some of the other things that they value are our high availability options and the various things we offer there, again pulling on that thirty-year legacy. I bring that up often because it is very important. So many cloud companies just started last year, and there’s nothing wrong with that. I’ve worked for startups before in the past, and I like that, but when you’re dealing with storage, there’s something to be said to say, “Listen, I know what I’m doing. We haven’t always been perfect, but we’ve had thirty years to figure out how to get to perfect, and we’re leveraging that every single day with our customers.”

We’re a community of cloud users. NetApp is coming from this legacy history, but recently moved into the cloud space. What’s the value for you of getting in front of a room full of people who are all coming from cloud, not just to talk about Cloud Volumes, but to have an actual peer-to-peer conversation?

It kind of goes back to what I said at the beginning: storage isn’t the most sexy thing. A lot of times it’s not thought of first, or even until the very end. Somebody goes out and designs a wonderful application that solves world peace, but if they haven’t considered how to properly use the storage, they could be compromised on any one of many things. It could be security like we’ve already talked about, it could be pricing, it could be performance, it could be efficiency. It’s like saying, “Hey I want to build the house, and then at the very end I want to pour the foundation.” No.
You have to lay the foundation first, and know that the ground is compacted and you’ve got your sewer connections and all the various things that you need, and now you can build a really great house on top of it.

How do I approach the problem? How do we allow you to identify what your data is, how you’re going to use it, and how to use it efficiently? I’ve had customers come to me and they’ve made decisions up front that don’t let us do certain things with the data, like maybe they want to encrypt their data in their application layer. They can do that, but maybe they’re making that decision because they want security of data at rest. If the application doesn’t encrypt it up front, that allows us to do certain things with it. We can compress it, dedupe it, encrypt it, add that layer of efficiency to it, but also allow us to back it up and move it around efficiently. It’s all about that efficiency up front.

Extra Credit:

If you’d like to take part in a larger, in-person discussion about Cloud Volumes and its many capabilities, come to one of these upcoming 2Gather events:
2Gather: Sunnyvale
2Gather: New York City
2Gather: Zurich
2Gather: Paris
Whether you are a data scientist or analyst, understanding BigQuery architecture provides insight on how organizations control costs and analyze data with built-in features. If you want to optimize your data sets through the scalable capabilities of BigQuery, listen to the stories of these four growing startups: Tinyclues, Aible, Connected-Stories, and Snorkel AI.

The speakers discussed the following at this 2Learn event:
How companies are able to scan a high volume of topics among users while also improving the user experience
Understanding how BigQuery allows organizations to leverage data to develop strategies and optimize campaign performance
Accessing knowledge on data-centric AI and its integration within a workflow
Analyzing specific data sets that will provide the most valuable insight into market conditions and consumer behavior

Watch a full recording of this event below:
You have options if you want to reduce the time to value for SAP deployments on GCP. Google Cloud solutions such as BigQuery, Cloud SQL, AutoML, and Spanner, among others, are available to onboard and will accelerate insights on SAP data. Mike Eacrett, a senior product manager at Google Cloud, and Chai Pydimukkala, Google Cloud Head of Product Management, recently joined C2C for a technical session for SAP architects, data integrators, and data engineers to cover important options for SAP deployments on GCP. The session provided an overview of available solutions, technical requirements, and customer use cases.

Watch the video below to see the live presentations, and use the following timestamps to navigate to the segments most relevant to you:
(1:50) Mike Eacrett Introduction and Reference Architecture
(3:20) BigQuery Connector for SAP: SAP Data Integration
(4:25) BigQuery Connector for SAP: Highlights & Value
(7:30) BigQuery Connector for SAP: Solution Overview
(10:30) BigQuery Connector for SAP: How does it work?
(14:35) Data Type Mapping Overview
(17:40) Supported Software Requirements
(19:55) Chai Pydimukkala Introduction and Cloud Data Fusion
(23:15) Cloud Data Fusion Key Capabilities and Personas
(31:25) SAP Table Batch Source
(34:50) SAP SLT Replication Plugin
(36:45) SAP ODP Plugin
(38:45) SAP OData Plugin
In early 2021, Rich Hoyer, Director of Customer FinOps for SADA, published an opinion piece in VentureBeat that refuted the findings of an earlier published article about the cost of hosting workloads in the cloud. In his rebuttal, Hoyer called the article (which was written by representatives of Andreessen Horowitz Capital Management) “dead wrong” with regard to its findings about cloud repatriation and costs.

Hoyer’s expertise and his views on doing business in the cloud make him an ideal participant for a C2C Global panel discussion taking place on January 20, at which he will appear alongside representatives of Twitter and Etsy to talk about whether or not enterprises should consider moving workloads off the cloud and into data centers. Hoyer predicts the panel conversation will lean away from the concept of repatriation and more toward the concept of balancing workloads.

“I don’t think repatriation is the right term,” Hoyer says. “To me, it’s much more a decision of what workloads should be where, so I would phrase it as rebalancing, as more optimally balancing. Repatriation implies that there’s this lifecycle. That’s just not the way it works. How many startups have workloads that are architected from the ground up and not cloud native? You don’t see that. If you’re cloud native, you start using the stuff as cloud native.”

The panel discussion will focus on hybrid workloads, he says, with a specific eye toward what works from a cost standpoint for each individual customer. “We want cloud consumers to be successful, and if they have stuff in the cloud that ought not to be there, they’re going to be unhappy with those workloads,” Hoyer says. “That’s not good for us, it’s not good for Google, it’s not good for anybody.
We want only things in the cloud that are going to be successful because customers know they’re getting value from it, because that’s what’s going to cause them to expand and grow in the cloud.”
From his FinOps viewpoint, Hoyer says he will be advocating for the process of making decisions around managing spend in public cloud, and the disciplines around making decisions in the cloud. “The whole process of trying to get control of this begins with the idea of visibility into what the spend is, and that means you have to have an understanding of how to report against it, how to apply the tooling to do things like anomaly alerting,” he says. “I expect the discussion to be less about whether there should be repatriation, and the more constructive discussion to be about the ways to think about how to keep the balance right.”
The overall goal of the panel is to present a process for analyzing workloads. And according to Hoyer, that’s not a one-time process; it’s iterative. “I’ll encourage anyone who has hybrid scenarios (some in the data center and some in the cloud) to be doing iterated looks at that to see what workloads should still be in the cloud,” Hoyer says. “There should be an iteration: Here’s what’s in the cloud today, here’s what’s in the data center today, and in broad terms, are these the right workloads? And then also, when stuff is in the cloud, are we operating it efficiently? And that’s a constant process, because you’ll have workloads that grow from the size they were in the cloud. And we’ll hear that same evaluation from the technology standpoint: are we using the best products in the cloud, and are there things in the data center that ought not to be there?”
Be sure to join C2C Global, SADA, Twitter, and Etsy for this important conversation and arm your business with the tools needed to make intelligent and informed decisions about running your workloads and scaling your business. Click the link below to register.
If you’re a web developer, a software engineer, or anyone else working with small batches of data, you know how to use a spreadsheet. The problem arises when you have massive amounts of data that need to be stored, ingested, analyzed, and visualized rapidly. More often than not, the product you need to solve this problem is Google Cloud’s serverless, fully managed service, BigQuery. BigQuery deals with megabytes, terabytes, and petabytes of information, helping you store, ingest, stream, and analyze those massive troves of information in seconds.
Small stores can use Excel to classify, analyze, and visualize their data. But what if your organization is a busy multinational corporation with branches across cities and regions? You need a magical warehouse database you can use to store, sort, and analyze streams of incoming information. That’s where BigQuery comes in.
What is BigQuery?
BigQuery is Google Cloud’s enterprise cloud data warehouse, built primarily for reading and analyzing data rather than for constantly updating it. It’s fully managed, which means you don’t need to set up or install anything, nor do you need a database administrator. All you need to do is import and analyze your data.
To communicate with BigQuery, you need to know SQL (Structured Query Language), the standard language for relational databases, used for tasks such as updating, editing, or retrieving data from a database.
BigQuery in Action
BigQuery executes three primary actions:
- Ingestion: uploading data by loading it from sources such as Cloud Storage, Bigtable, Cloud SQL, and Google Drive, or by streaming it live, enabling real-time insights
- Storage: storing data in structured tables, using SQL for easy querying and data analysis
- Querying: answering questions about data in BigQuery with SQL
Getting BigQuery up and running is fairly simple.
Just follow these steps:
1. Find BigQuery on the left-side menu of the Google Cloud Platform Console, under “Resources.”
2. Choose one or more of these three options:
- Load your own data into BigQuery to analyze (and convert that data batch into a common format such as CSV, Parquet, ORC, Avro, or JSON).
- Use any of the free public datasets hosted by Google Cloud (e.g., the Coronavirus Data in the European Union Open Data Portal).
- Import your data from an external data source.
BigQuery ML
You can also use BigQuery for your machine learning models. You can train and execute your models on BigQuery data without needing to move that data around. To get started using BigQuery ML, see Getting started with BigQuery ML using the Cloud Console.
Where can you find BigQuery (and BigQuery ML)?
Both BigQuery and BigQuery ML are accessible via:
- Google Cloud Console
- The BigQuery command-line tool
- The BigQuery REST API
- An external tool such as a Jupyter notebook or a business intelligence platform
BigQuery Data Visualization
When the time comes to visualize your data, BigQuery can integrate with several business intelligence tools such as Looker, Tableau, and Data Studio to help you turn complex data into compelling stories.
BigQuery in Practice
Depending on your company’s needs, you will want to take advantage of different capabilities of BigQuery for different purposes. Use cases for BigQuery include the following:
- Real-time fraud detection: BigQuery ingests and analyzes massive amounts of data in real time to identify or prevent unauthorized financial activity.
- Real-time analytics: BigQuery is immensely useful for businesses or organizations that need to analyze their latest business data as they compile it.
- Log analysis: BigQuery lets you review and interpret computer-generated log files at scale.
- Complex data pipeline processing: BigQuery manages and interprets the steps of one or multiple complex data pipelines generated by source systems or applications.
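To make the log analysis use case a little more concrete, here is a minimal sketch of the kind of SQL aggregation involved. It is not BigQuery code: it uses Python’s built-in sqlite3 module as a local stand-in so it can run anywhere, and BigQuery’s SQL dialect differs in places. The table name, column names, and rows are all hypothetical, invented for illustration.

```python
import sqlite3

# Hypothetical log rows: (service, status_code). Invented for illustration.
LOG_ROWS = [
    ("checkout", 200),
    ("checkout", 500),
    ("search", 200),
    ("search", 200),
    ("checkout", 500),
]


def count_errors_by_service(rows):
    """Return (service, error_count) pairs for status codes of 500 or above."""
    conn = sqlite3.connect(":memory:")  # local in-memory database, not BigQuery
    try:
        conn.execute("CREATE TABLE logs (service TEXT, status_code INTEGER)")
        conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)
        query = """
            SELECT service, COUNT(*) AS error_count
            FROM logs
            WHERE status_code >= 500
            GROUP BY service
            ORDER BY error_count DESC
        """
        return conn.execute(query).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(count_errors_by_service(LOG_ROWS))
```

In a real warehouse the same SELECT ... GROUP BY pattern would run against a far larger table; only the aggregation idea carries over from this sketch.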
Best BigQuery Features
BigQuery has a lot to offer. Here are some of the tools BigQuery’s platform includes:
- Real-time analytics that analyzes data on the spot.
- Logical data warehouses wherein you can process data from external sources, either in BigQuery itself or in Google Drive.
- Data transfer services where you can import data into BigQuery from external sources including Google Marketing Platform, Google Ads, YouTube, partner SaaS applications, Teradata, and Amazon S3.
- Storage-compute separation, an option that allows you to choose the storage and processing solution that’s best for your project.
- Automatic backup and easy restore, so you don’t lose your information. BigQuery also keeps a seven-day history of changes.
BigQuery Pros
- It’s fast. BigQuery processes billions of data rows in seconds.
- It’s easy to set up and simple to use; all you need to do is load your data. BigQuery also integrates easily with other data management solutions like Data Studio and Google Analytics.
- BigQuery is built to handle huge amounts of data.
- BigQuery gives you real-time feedback that could thwart potential business problems.
- With BigQuery, you can avoid data silo complications that arise when you have individual teams within your company that have their own data marts.
BigQuery Cons
- It falls short when used for constantly changing information.
- It only works on Google Cloud.
- It can become costly as data storage and query costs accumulate. PCMag suggests you go for flat-rate pricing to reduce costs.
- You need to know SQL and its particular technical habits to use BigQuery.
- BigQuery ML can only be used in the US, Asia, and Europe.
When should you use BigQuery?
BigQuery is best used for ad hoc analysis of massive amounts of data, with queries that run for longer than five seconds and that you want analyzed in real time. The more complex the query, the more you’ll benefit from BigQuery.
At the same time, don’t expect to use the tool as a regular relational database or for CRUD, i.e., to Create, Read, Update, and Delete data.
BigQuery Costs
Multiple costs come with using BigQuery. Here is a breakdown of what you will pay for when you use it:
- Storage (based on how much data you store): There are two storage rates: active storage ($0.020 per GB per month) and long-term storage ($0.010 per GB per month). With both, the first ten GB are free each month.
- Processing queries: Query costs are either on-demand (i.e., by the amount of data processed per query) or flat-rate.
BigQuery also charges for certain other operations, such as streaming inserts and use of the BigQuery Storage API. Loading and exporting data is free.
For details, see Data ingestion pricing. This Coupler Guide to BigQuery Cost is also extremely helpful.
TL;DR: With BigQuery, you can assign read or write permissions to specific users, groups, or projects to collaborate across teams, and it is thoroughly secure, since it automatically encrypts data at rest and in transit.
If you’re a data scientist or web developer running ML or data mining operations, BigQuery may be your best solution for those spiky, massive workloads. It is also useful for anyone handling bloated data batches, within reason. Be wary of those costs.
Have you ever used BigQuery? How do you use it? Reach out and tell us about your experience!
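The storage rates quoted in the BigQuery Costs section can be turned into a quick back-of-the-envelope estimate. The sketch below is only that: it uses the per-GB figures and the ten-GB free allowance as stated in this article, and it assumes (since the article does not say how the allowance is shared between tiers) that the allowance is applied to a single tier considered on its own. The function and constant names are hypothetical, and current pricing should be checked before relying on the numbers.

```python
# Rates as quoted in this article (USD per GB); verify against current pricing.
ACTIVE_RATE_PER_GB = 0.020
LONG_TERM_RATE_PER_GB = 0.010
FREE_GB_PER_MONTH = 10.0


def monthly_storage_cost(stored_gb: float, rate_per_gb: float) -> float:
    """Estimate one month's storage charge for a single storage tier.

    Assumption: the free allowance is applied to this tier in isolation.
    """
    if stored_gb < 0 or rate_per_gb < 0:
        raise ValueError("stored_gb and rate_per_gb must be non-negative")
    billable_gb = max(stored_gb - FREE_GB_PER_MONTH, 0.0)
    return round(billable_gb * rate_per_gb, 2)


if __name__ == "__main__":
    tiers = (("active", ACTIVE_RATE_PER_GB), ("long-term", LONG_TERM_RATE_PER_GB))
    for label, rate in tiers:
        cost = monthly_storage_cost(500, rate)
        print(f"500 GB of {label} storage: ${cost:.2f} per month")
```

Query charges are deliberately left out, because they depend on how much data each query processes or on a flat-rate commitment, neither of which the article quantifies.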
Personal development and professional development are among the hottest topics within our community. At C2C, we’re passionate about helping Google Cloud users grow in their careers. This article is part of a larger collection of Google Cloud certification path resources.
The Google Cloud Professional Data Engineer certification covers highly technical knowledge concerning how to build scalable, reliable data pipelines and applications. Anyone who intends to take this exam should also be comfortable selecting, monitoring, and troubleshooting machine learning models.
In 2021, the Professional Data Engineer rose to number one on the top-paying cloud certifications list, surpassing the Professional Cloud Architect, which had held that spot for the two years prior. According to the Dice 2020 Tech Job Report, it’s one of the fastest-growing IT professions, and even with an influx of people chasing that role, the supply can’t meet the demand. More than ever, businesses are driven to take advantage of advanced analytics; data engineers design and operationalize the infrastructure to make that possible.
Before you sit at a test facility for the real deal, we highly recommend that you practice with the example questions (provided by Google Cloud) with Google Cloud’s documentation handy. All the questions are scenario-based and incredibly nuanced, so lean in to honing your reading comprehension skills and verifying your options using the documentation.
We’ve linked out to plenty of external resources for when you decide to commit and study, but let’s start just below with questions like:
- What experience should I have before taking this exam?
- What roles and job titles does Google Cloud Professional Data Engineer certification best prepare me for?
- Which topics do I need to brush up on before taking the exam?
- Where can I find resources and study guides for Google Cloud Professional Data Engineer certification?
- Where can I connect with fellow community members to get my questions answered?
View image as a full-scale PDF here.
Looking for information about a different Google Cloud certification? Check out the directory in the Google Cloud Certifications Overview.
Extra Credit
- Google Cloud’s certification page: Professional Data Engineer
- Example questions
- Exam guide
- Coursera: Preparing for Google Cloud Certification: Cloud Data Engineer Professional Certification
- Pluralsight: Preparing for the Google Cloud Professional Data Engineer Exam
- AwesomeGCP Associate Cloud Engineer Playlist
- Global Knowledge IT Skills and Salary Report 2020
- Global Knowledge 2021 Top-Paying IT Certifications
Have more questions? We’re sure you do! Career growth is a hot topic within our community and we have quite a few members who meet regularly in our C2C Connect: Certifications chat. Sign up below to stay in the loop.
https://community.c2cglobal.com/events/c2c-connect-google-cloud-certifications-72
On March 11, C2C sponsored and co-hosted an event with Cloud Study Network and Serverless Toronto featuring this presentation by Dan Sullivan.
This presentation explores the advantages of highly denormalized data models and how you can take advantage of columnar storage, compression, composite data types, repeated fields, partitioning, clustering, and other features of BigQuery to design scalable data warehouses.
Dan Sullivan, Principal Engineer, PEAK6 Technologies
Software architect with extensive experience in data architecture, data science, machine learning, stream processing, and cloud architecture. He is the author of the official Google Cloud study guides for the Professional Architect, Professional Data Engineer, and Associate Cloud Engineer certifications. View his publications here.
In October, C2C hosted a Deep Dive with Data and Analytics Product Managers Chad Jennings and Robert Saxby entitled What's New in BigQuery, Google Cloud's Modern Data Warehouse. Jennings and Saxby took C2C members through the BigQuery product roadmap and provided use cases relating to efficiency and security. Watch the full Deep Dive below, and you can access the slides shown here.
Known as a prominent programmer and entrepreneur in the tech space, Andi Gutmans today serves as the General Manager and VP of engineering for databases at Google Cloud. He is responsible for overseeing a group whose goal is to support customers with their data journeys and with transforming their businesses.
“It’s a three-step journey,” he said. “We take them through migration, modernization, and then transformation. The best part of what we do is being able to innovate on behalf of our customers.”
Innovating is something Gutmans does well. He co-created PHP, the programming language that is the most widely used web language for creating dynamic web pages, and he also co-founded Zend Technologies, which continues to do much of the work in further developing PHP. Gutmans doesn’t shy away from new challenges. He instead thrives on finding solutions for them.
“All customers want to eventually get to transformation,” he said. “But it’s not always easy to make the full leap in one step. I’m excited about the opportunity to partner with them on that journey and to really enable that transformation.”
Watch the whole interview below.
Author’s Note: C2C Talks are an opportunity for C2C members to engage through shared experiences and lessons learned. Often there is a short presentation followed by an open discussion to determine best practices and key takeaways.
Juan Carlos Escalante (JC) is a pioneering member of C2C and a vital part of the CTO office at Ipsos. Escalante details how he and his team handled data migration powered by Google Cloud and shares his current challenges, which may resemble ones you’re facing, too.
As a global leader in market research, Ipsos has offices in 90 countries and conducts research in more than 150 countries. So, to say its data architecture is challenging barely covers the complexity JC manages each day.
“Our data architecture on our data pipeline challenges gets complex very quickly, especially for workloads dealing with multiple data sources, and what I describe as hyper-fragmented data delivery requirements,” he said in a recent C2C Talks session, Data Migration and Modernization, on December 10, 2020.
So, how do they manage a seamless data flow? And how does JC’s data infrastructure landscape look? Hear below.
What was the primary challenge?
Even though the design JC described is popular and widely used in the space, it isn’t without its own set of challenges, and siloed data infrastructure rises to the top.
“The resilience of siloed data infrastructure platforms that we see scattered across the company translates to longer cycle times and more friction to pivot and react to changing business requirements,” he said.
Hear JC explain the full challenge below. What resonates with you? Share it with us!
How did you use Google Cloud as a solution?
By leveraging Google Cloud, JC and his team have unlocked new opportunities to simplify how different groups come into a data infrastructure platform and serve or solve their specific needs.
“We all have different products and services that we have available within Google Cloud Platform,” he said.
“Very quickly, we've been able to test and deploy proofs of concept that have moved rapidly towards production.”
Some examples of the benefits JC and his team have found by using the Google Cloud Platform product BigQuery include:
- Reduced cycle time or processing time from 48 hours to seven minutes
- Data harmony across teams
Hear JC explain how BigQuery helped reach these successful milestones.
Since it's going so well, what's next?
The goal is to think bigger and determine how JC and his team can transform end-to-end data platform architecture.
“The next step we want to take in our data architecture journey is to bring design patterns that are common and are used widely in software development and bringing those patterns into our data engineering practices,” he said.
On that list is version control for data pipelines. Hear JC explain why.
Also, JC is working with his team to plan for the future of data architecture and analytics on a global scale, which he says will be a multi-cloud environment. Hear him explain why below.
Questions from the C2C Community
1. Are the business analysts running their daily job through the BigQuery interface? Or do they use a different application that's pulling from BigQuery?
For JC’s organization, some teams got up to speed very quickly, while others need a little more coaching, so they’ll be putting together some custom development with Tableau. Hear JC’s full answer below.
Hear how they use Google Sheets to manage the data exported from BigQuery.
2. I have the feeling that my databases are way more similar than yours because my database is not talking about those things. It's just a handful of tables. So it's easier for us to monitor a handful of tables. But how do you monitor triggers?
This question led to a more in-depth discussion, so JC offered to set up a time to discuss further separately, which is just one of the beautiful benefits of being a part of the C2C community.
Check out what JC said to attack the question with some clarity below. We’ll update you with their progress as it becomes available!
3. What data visualization tools do JC and his team use?
“Basically, the answer is we're using everything under the sun. We do have some Spotfire footprint, we have Tableau, we have Looker, and we have Qlik Sense. We also have custom development visualization developments,” he said.
“My team is gravitating more towards Tableau, but we have to be mindful that whatever data architecture design we come up with, it has to be decoupled, flexible, and it has to be data engine and data visualization agnostic because we do get a request to support the next visualization,” he warned.
Hear about how JC manages the overlap with Looker and Tableau and why he likes them both.
Extra Credit
JC and his team used the two Thoughtworks articles listed below to inform their decision-making and as a guide for modernizing their data architecture. He recommends checking them out.
- How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh by Zhamak Dehghani, Thoughtworks, May 2019
- Data Mesh Principles and Logical Architecture by Zhamak Dehghani, Thoughtworks, December 2020
We want to hear from you! There is so much more to discuss, so connect with us and share your Google Cloud story. You might get featured in the next installment! Get in touch with Content Manager Sabina Bhasin at firstname.lastname@example.org if you’re interested.
Rather chat with your peers? Join our C2C Connect chat rooms! Reach out to Director of Community Danny Pancratz at email@example.com.
This article was originally published on November 20, 2020.
Hailed as one of the “Founding Fathers” of the internet for co-creating PHP, Andi Gutmans is just getting started. To discuss his new role at Google and the future of data, Gutmans joins C2C for the sixth installment of our thought leadership series, where we don’t hold back on either the fun or the challenging questions. A holder of four citizenships and an engineering powerhouse, Gutmans brings a global perspective to both tech and coffee creation.
“I love making espresso and improving my latte art,” he mused. “I always say, if tech doesn’t work out for me, that’s where you’re going to find me.”
But when he isn’t daydreaming about turning it all in to own a coffee shop and become a barista, he leads the operational database group as the GM and VP of engineering for databases at Google.
“Our goal is building a strategy and vision that is very closely aligned with what our customers need,” he said. “Then, my organization works with customers to define what that road map looks like, deliver that, and then operate the most scalable, reliable, and secure service in the cloud.”
It’s an enormous responsibility, but Gutmans and his team meet the challenge in three steps: migration, modernization, and transformation. They have accomplished this even though they’ve never met in person, since Gutmans started working at Google during the COVID-19 pandemic.
Driven to support customers through their data journeys as they move to the cloud and transform their business, he digs into the how, the why, and more during the conversation (video above), but these are the five points you should know:
Lift, Shift, Transform
The pandemic has changed the way everyone is doing business. For some, the change comes with accelerating the shift to the cloud, but Gutmans said most customers are taking a three-step journey into the cloud.
“We’re seeing customers embrace this journey into the cloud,” he said. “They’re taking a three-step journey into the cloud. Migration, which is trying to lift and shift as quickly as possible, getting out of their data center. Then modernizing their workloads, taking more advantage of some of the cloud capabilities, and then completely transforming their business.”
Migrating to the cloud allows customers to spend less time managing infrastructure and more time innovating and solving business problems. To keep the journey frictionless for customers, he and his team are working on a service called Cloud SQL. For clarity, Cloud SQL is a managed service for MySQL, PostgreSQL, and SQL Server. The team also handles any regulatory requirements customers have in various geographies.
“By handling the heavy lifting for customers, they have more bandwidth for innovation,” he said. “So the focus for us is making sure we’re building the most reliable service, the most secure service, and the most scalable service.”
Gutmans described how Autotrader lifted and shifted into Google’s Cloud SQL service and was able to increase deployment velocity by 140% year-over-year. “So, there is an instant gratification aspect of moving into the cloud,” he said.
Other benefits of the cloud are auto-remediation, backups, and restoration. Still, the challenge is determining what stays at the edge and what goes into the cloud, and, of course, security. Gutmans said he wants to work with customers and understand their pain points and thought processes better.
Modernizing sometimes requires moving customers off proprietary vendors and onto open-source-based databases, but Gutmans’ team has a plan for that. By investing in partners, they can provide customers with assessments of their databases, more flexibility, and a cost reduction.
Finally, when it comes to transformation, the pandemic has redefined the scope.
A virtual-focused world is reshaping how customers are doing business, so that’s where a lot of Google’s cloud-native database investments have come in, such as Cloud Spanner, Cloud, BigQuery, and Firestore.
“It's really exciting to see our customers make that journey,” he said. “Those kinds of transformative examples where we innovate, making scalability seamless, making systems that are reliable, making them globally accessible, we get to help customers, you know, build for [their] future,” he said. “And seeing those events be completely uneventful from an operational perspective is probably the most gratifying piece of innovating.”
Gutmans adds that transformation isn’t limited to customers that have legacy data systems. Cloud-native companies may also need to re-architect, and Google can support those transformations, too.
AI Is Maturing
Gartner stated that by 2022, 75% of all databases would be in the cloud, and that isn’t just because of the pandemic accelerating transformation. Instead, AI is maturing, and it is allowing companies to make intelligent, data-driven decisions.
“It has always been an exciting space, but I think today is more exciting than ever,” Gutmans said. “In every industry right now, we’re seeing leaders emerge that have taken a digital-first approach, so it’s caused the rest of the industries to rethink their businesses.”
Data Is Only Trustworthy if It’s Secure
Data is quickly becoming the most valuable asset organizations have. It can help make better business decisions and help you better understand your customer and what’s happening in your supply chain. Also, analyzing your data and leveraging historicals can help improve forecasting to better target specific audiences.
But with all the tools improving data accessibility and portability, security is always a huge concern.
But Gutmans’ team is also dedicated to keeping security at the fore.
“We put a lot of emphasis on security; we make sure our customers’ data is always encrypted by default,” he said.
Not only is the data encrypted, but there are tools available to decrypt with ease.
“We want to make sure that not only can the data come up, [but] we also want to make it easy for customers to take the data wherever they need it,” Gutmans said.
Even with the support from the tools Gutmans’ team is working to provide, the customer is central, and they have all the control.
“We do everything we can to ensure that only customers can govern their data in the best possible way; we also make sure to give customers tight control,” he said.
As security measures increase, new data applications are emerging, including fraud detection and the convergence of operational data and analytical systems. This intersection creates powerful marketing applications, leading to improved customer experience.
“There are a lot of ways you can use data to create new capabilities in your business that can help drive opportunity and reduce risk,” Gutmans said.
Leverage APIs Without Adding Complexity
There are two kinds of APIs, as Gutmans sees it: administration APIs and APIs for building applications.
On the provisioning side, customers can leverage the DevOps culture and automate their test, staging, and production environments. On the application side, Gutmans suggests using the DevOps trend of automating infrastructure as code.
He points to resources available here and here to provide background on how to do this.
But when it comes to applications, his answer is more concise: “If the API doesn’t reduce complexity, then don’t use them.”
“I don’t subscribe to the philosophy where, like, everything has to be an API, and if not...you’re making a mistake,” he added.
He recommends focusing on where you can gain the most significant agility benefit to help your business get the job done.
Final Words of Wisdom
Gutmans paused and went back to the importance of teamwork and collaboration and offered this piece of advice:
“Don’t treat people the way you want to be treated; treat people the way they want to be treated.”
He also added that the journey is different for each customer. Just remember to “get your data strategy right.”