Configure highly available data and real-time analytics to optimize performance, improve decision-making, and unlock new value.
- 139 Topics
- 131 Replies
BigQuery: Configuring and scheduling queries
Are you having trouble creating and running scheduled jobs in BigQuery, and would you like to implement them in your production and development pipelines? Check out this video, in which a Google Cloud developer advocate discusses BigQuery's scheduled queries feature and the IAM permissions needed to create and configure them. They show how to configure and schedule queries, walk through the available configuration options, and cover some important caveats and how to overcome them when using scheduled queries in BigQuery. Click on the video below to watch it in detail.

Chapters:
0:00 - Intro
0:29 - Scheduling Queries: Prerequisites
1:25 - Demo on creating and configuring scheduled queries
4:30 - Caveat #1 - Scheduled queries run on UTC time
5:10 - Caveat #2 - Running a scheduled query in a different region than the destination table
5:42 - Caveat #3 - Modifying the credentials used to run scheduled queries
6:00 - Further reading

Scheduling queries → https://goo.gle/3uNKxYG
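The UTC caveat trips up many users: schedule times are interpreted in UTC, not your local time zone. As a minimal sketch (plain Python, not tied to any BigQuery API; the schedule string format follows the "every day HH:MM" style the scheduler accepts), here is one way to convert a desired local daily run time into the UTC time you would enter:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

def local_schedule_to_utc(hour: int, minute: int, tz: str) -> str:
    """Convert a desired local daily run time to the UTC 'every day HH:MM'
    schedule string, since BigQuery scheduled queries run on UTC time."""
    # Anchor on an arbitrary date; note that DST can shift the offset by season,
    # so a fixed UTC schedule drifts an hour when your zone changes offset.
    local = datetime(2022, 11, 15, hour, minute, tzinfo=ZoneInfo(tz))
    utc = local.astimezone(ZoneInfo("UTC"))
    return f"every day {utc:%H:%M}"

# e.g. a 6:30 AM run in Seattle (PST is UTC-8 in November)
print(local_schedule_to_utc(6, 30, "America/Los_Angeles"))  # every day 14:30
```

Because the offset is computed for a fixed date, you may want to re-check the schedule after daylight-saving transitions.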
Google Cloud's Bruno Aziza makes sense of data and analytics in our accelerated times
Earlier this month, Bruno Aziza, Google Cloud's head of data and analytics, attended the local Google Cloud Next event in Sydney and joined us on iTWireTV to explain and analyse big data as we know it today - and tomorrow. The world of data has changed significantly, with data today having become highly distributed. It’s stored on-premises, in data warehouses and data marts, and in the cloud - both singular and plural. Data is created by devices and applications that didn’t exist a few years ago, and today it needs to be transformed, augmented, analysed, and acted upon in real time. In his role, Bruno is focused on supporting customers as they navigate these challenges. In the video interview embedded below, he discusses several pertinent topics around data and analytics, including:
- The challenges today’s CIOs face when making sense of the data market
- The biggest shifts in the big data landscape since the pandemic
- Learnings w
Join at Google Data Cloud Live: Seattle
Google Data Cloud Live is coming to Seattle and will feature technology leaders and data professionals who will share insights on how you can capitalize on the next wave of data solutions. You’ll have the opportunity to learn about the many ways that you and your organization can make smarter decisions and solve complex challenges with key innovations in AI, machine learning, and analytics.

Date: November 15, 2022, 1:00 PM–5:30 PM PST
Event type: In-person event
Location: Google Seattle (SLU), 1021 Valley Street, Seattle, WA 98109

Click on the link below to join the event:
https://inthecloud.withgoogle.com/data-cloud-live-seattle-22/register.html
The art of effective factory data visualization
Suchitra Bose, Director of Manufacturing and Industrial Solutions, and Simon Floyd, Director of Manufacturing and Transportation at Google Cloud, discuss new innovations in manufacturing and transportation in Smart Factory Transformers, a new video series from Google Cloud. They tell outstanding and exciting stories of how manufacturers are in the midst of a transformation journey, digitizing production processes and reimagining the customer experience using data and AI. Check out this episode to learn about the benefits of breaking down data silos and using low/no-code data visualization and analytics tools. They also explain how to scale from pilot to program, and describe how to create a new level of transparency and unlock cross-system use cases by connecting disparate systems, from sensors to controllers to SCADA, MES, and ERP. Rapidly generate new insights and make data-driven decisions at all levels by democratizing access to data and self-service tools for operators.
What is Dataflow Prime?
Google Cloud recently announced Dataflow Prime, the next generation of Dataflow: Google Cloud’s truly unified batch and streaming data processing product. Dataflow Prime has two new features, vertical autoscaling and right fitting, which provide automatic fine-tuning of memory and stage-specific resource allocation, plus a new pricing model to simplify your billing. Click on the video below to watch it in detail. Also, don’t forget to check out our C2C team member @ilias's outstanding post about Dataflow Prime.

Chapters:
0:00 - Intro
1:02 - Understanding Dataflow
2:02 - Vertical autoscaling
2:52 - Example of vertical autoscaling
3:44 - Horizontal autoscaling
4:30 - Right fitting
6:00 - Example of right fitting
7:06 - Recap of new features
7:41 - Simplified pricing
9:04 - Wrap up

Extra credit:
What is Dataflow? → https://goo.gle/3RnUEwN
Dataflow Product Page → https://goo.gle/3T2FL3Y
Dataflow Documentation → https://goo.gle/3evJh7W
Switch to Dataflow Prime → https://goo.gle/3MsLP3s
Change A
Dataflow, the backbone of data analytics
Data is generated in real time from websites, mobile apps, IoT devices, and other workloads. But data from these systems is often not in a format that is conducive to analysis or to effective use by downstream systems. That’s where Dataflow comes in! Dataflow is a serverless, fast, and cost-effective service that supports both stream and batch processing. You can read more on the Google Cloud Blog.
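As a toy illustration of the kind of reshaping such a pipeline performs (plain Python standing in for a real Dataflow/Apache Beam job; the event schema and field names here are invented), consider turning a raw nested JSON event into a flat, analysis-ready row:

```python
import json

def to_analysis_row(raw_event: str) -> dict:
    """Parse a raw nested JSON event into a flat, typed row -- the kind of
    element-wise transform a Dataflow pipeline applies before loading data
    into a warehouse for analysis."""
    event = json.loads(raw_event)
    return {
        "device_id": event["device"]["id"],
        "metric": event["payload"]["metric"],
        # Upstream systems often emit numbers as strings; coerce for analysis.
        "value": float(event["payload"]["value"]),
    }

raw = '{"device": {"id": "sensor-7"}, "payload": {"metric": "temp_c", "value": "21.5"}}'
print(to_analysis_row(raw))
```

In a real Dataflow job this function would run inside a pipeline step applied to every element of a stream or batch, rather than being called directly.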
Get started with Looker SSO Embedding
SSO embedding with the Embed SDK allows users secure access to view embedded content without manually logging into Looker. In addition, the Embed SDK manages the iframe HTML element for you! Watch along and learn how the Embed SDK creates a seamless embed user experience. Click on the video below to watch it in detail.

Chapters:
0:00 - Intro
0:52 - Knowledge check
1:13 - What is the SSO embed URL?
2:13 - How to create the SSO embed URL
5:10 - SSO embedding flow overview
7:29 - How the Embed SDK simplifies implementation
9:14 - Wrap up

Extra Credit:
SSO embedding getting started docs → https://goo.gle/3SpYbLK
SSO embedding with Embed SDK getting started docs → https://goo.gle/3StQe8u
Embed SDK Repository → https://goo.gle/3E7Bdoz
Watch more episodes of Embedding Looker → https://goo.gle/EmbeddingLooker
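At the heart of the SSO embed URL is a server-side signature over the embed parameters, computed with your embed secret. Here is a minimal sketch of just the signing step (the real string Looker signs is built from several fields - host, embed path, nonce, time, session length, external user ID, permissions, and more - so consult the docs linked above for the exact format; the secret and string below are placeholders):

```python
import base64
import hashlib
import hmac
import urllib.parse

def sign_embed_string(embed_secret: str, string_to_sign: str) -> str:
    """Produce a base64-encoded, URL-escaped HMAC-SHA1 signature of the
    embed parameter string, the general shape of an SSO embed signature.
    The exact fields and ordering of string_to_sign come from Looker's docs."""
    digest = hmac.new(
        embed_secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        hashlib.sha1,
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest).decode("utf-8"))

# Placeholder inputs -- a real implementation assembles many newline-joined fields.
sig = sign_embed_string("my-embed-secret", "example.looker.com\n/login/embed/...")
print(sig)
```

The key design point is that signing happens on your server, so the embed secret is never exposed to the browser; the Embed SDK then handles building the iframe around the signed URL.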
Troubleshoot streaming job not scaling down in Dataflow
Would you like to learn how to troubleshoot autoscaling issues in Dataflow? In particular, are you observing that your streaming jobs are not scaling down? Would you like to know more about the autotuning features in Cloud Dataflow? Check out the video below to learn about the autotuning features: horizontal autoscaling, vertical autoscaling, and dynamic work rebalancing. Watch this video to learn about the possible causes of streaming jobs not scaling down and how to resolve them. It also covers the streaming engine and its benefits. Click on the video below to watch it in detail.

Chapters:
0:00 - Intro
0:16 - Autotuning features
1:44 - Problem: Streaming job not scaling down
1:59 - Validation of the criteria for autoscaling
2:20 - Limitations of streaming appliance
2:34 - Example of limitations imposed by streaming appliance
3:05 - Cause for streaming job not scaling down
3:20 - Solution for the problem
3:33 - Benefits of using the streaming engine
3:57 - Documentation and conclusion

Extra cr
Lecture on Data Reliability Engineering (DRE) for better Data-driven decision making
Every company wants to be a data-driven organization, but the quality of input data and data ingestion pipelines is too often too low, which puts a CXO's dream of data-driven decision-making in jeopardy. That is why interest in Data Observability (a.k.a. Data Reliability Engineering) is on the rise, and why we invited Egor Gryaznov, co-founder of Bigeye (a leading Data Observability platform), to go deep into DRE topics at the next Serverless Toronto meetup, as we continue to untangle Modern Data Stack “spaghetti”.

Egor will cover:
- DRE historical context: Data Reliability Engineering's roots in Google SRE (Site Reliability Engineering)
- All the Modern Data Stack acronyms and their relationships
- Current DIY and vendor choices for different use cases
- Similarities and differentiation between Data Observability products - something vendors rarely talk about, so you're in for a treat!

Don't be left in the dark; RSVP here for Egor's online lecture on Sep 21 at 6pm EST and bring
Introduction to Datastream for BigQuery
Datastream is a serverless, easy-to-use change data capture and replication service that makes it easy to replicate data from operational databases into BigQuery reliably and with minimal latency. In this video, Gabe Weiss, Developer Advocate at Google, discusses setting up real-time replication from Cloud SQL to BigQuery. Watch along and learn how to get started with Datastream for BigQuery! Click on the video below to watch it in detail.

Chapters:
0:00 - Intro
0:10 - What is Datastream?
0:59 - Demo: getting started with Datastream
4:47 - Wrap up

Datastream → https://goo.gle/3d4MWZN
Data visualization with Data Studio
Here to bring you the latest news in the Startup program by Google Cloud are Lydia López and Joris Fayard! In this video they show you data visualization with Data Studio. Click on the video below to watch it in detail.

Chapters:
0:00 - Intro
0:38 - Agenda
1:00 - Overview of Data Lifecycle
2:30 - What is Data Studio
2:44 - How Data Studio works
6:50 - [Demo] Data Studio Template Gallery
8:18 - [Demo] Creating a Data Source
9:46 - [Demo] Building a Report
14:55 - [Demo] Sharing a Report
15:13 - Customer Success Story
15:45 - Wrap up!
16:08 - Want to find out more?
16:46 - Coming next

Extra Credit:
Learn more about Data Studio → https://goo.gle/3DalbJP
Check Data Studio Help Center → https://goo.gle/3TXRH8f
Join the Data Studio Community → https://goo.gle/3DaSj4d
Community Connector gallery → https://goo.gle/3QvVXZo
Find more about using Data Studio Themes → https://goo.gle/3Bof6sa
Manage Data Freshness → https://goo.gle/3Drt1PJ
Data Explorer & Data Blending in Data Studio → https://goo.gle/3qpl7y
Data Lake Modernisation Fireside chat with ESG industry analyst, Nathan McAfee
Fireside chat with ESG industry analyst Nathan McAfee: getting the most from big data while reducing costs and complexity.

Data is growing at an explosive rate, both in the velocity of new data creation and in the constant addition of new data types and sources. This rapid change in stored data forces organisations to focus on the core fundamentals of data security and cost, too often ignoring the inefficiencies experienced when trying to use the data to enable future revenue. The result: unrealised value stored in secondary data. Many have adopted on-prem Apache Hadoop, along with Apache Spark, to store and process this data. While these solutions provide the base functionality needed, they quickly become limiting. In this video, learn how organisations can increase the value of insights queried in data while lowering costs and complexity by deploying Google Cloud Dataproc. Click on the video below to watch it in detail.
Great feature for #BigQuery Community, kudos #GoogleCloudCommunity.
No pipelines needed: stream data from Pub/Sub directly to BigQuery. Great feature for the #BigQuery community, kudos #GoogleCloudCommunity.
https://cloud.google.com/blog/products/data-analytics/pub-sub-launches-direct-path-to-bigquery-for-streaming-analytics
Google Colab: Issue Updating Data Links In Excel After Python Dataframe Export
Situation
I'm working on a data project integrating Python in Google Colab with Excel 365 on Windows 8.1. My Python code collects new data updates on a regimented schedule and then exports/writes (i.e. overwrites, not appends) the data, like a report, to an Excel spreadsheet. I have no issue getting this to work with a standalone spreadsheet. I know I could potentially do all this in Python and not use Excel at all, but I prefer not to reinvent the wheel and spend hours hardcoding all the formulas and links that already exist in Excel.

Goal
My goal is to use new data from my Colab export to populate/overwrite a data table on Sheet A in an existing Excel workbook. A separate Sheet B in the same workbook performs calculations via pre-existing links to the original data table on Sheet A. I want those links to auto-update each time my Python export updates the data table on the first sheet.

Problem
The issue I am running into is that if I use the df.to_
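One common approach for this kind of problem (a sketch, assuming pandas 1.4+ with the openpyxl engine; the sheet name is illustrative) is to open the existing workbook in append mode and overlay only the data sheet, rather than rewriting the whole file. Deleting and recreating a sheet can break references from other sheets, whereas overlaying writes values into the existing sheet in place:

```python
import pandas as pd

def overwrite_data_sheet(path: str, df: pd.DataFrame, sheet: str = "Sheet A") -> None:
    """Write df over the data sheet of an existing workbook without
    recreating the file, so formulas on other sheets keep their links.
    if_sheet_exists='overlay' writes into the existing sheet starting
    at A1 instead of deleting and re-adding it."""
    with pd.ExcelWriter(path, mode="a", engine="openpyxl",
                        if_sheet_exists="overlay") as writer:
        df.to_excel(writer, sheet_name=sheet, index=False)
```

One caveat: Excel recalculates the linked formulas on Sheet B when the workbook is next opened, not at write time, since pandas/openpyxl only store values and formula text.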
Data transformation in BigQuery
Hello all, I need to build a transformation to load a target table from 80+ source tables (all source tables are staged into BigQuery). Of the 80+ tables, 30+ will be refreshed every hour from their source systems, and the rest once a day. The target table should be refreshed every hour as soon as data is staged. It will have 350+ columns, hourly source data volume will be 40M rows (90% will be changes to existing data and 10% or less will be new data), and the overall table volume will be around 3B records. Any best practices or suggestions on designing the transformation for this scenario? Thanks,
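With 90% of each hourly batch being changes to existing rows, this looks like a classic upsert workload, which BigQuery's MERGE statement handles directly: matched rows are updated, unmatched rows are inserted. As a small sketch (table and column names are hypothetical; a real job would run the resulting SQL via a scheduled query or orchestration tool), here is a Python helper that builds such a statement from a key and a column list:

```python
def build_merge_sql(target: str, staging: str, key: str, cols: list) -> str:
    """Build a BigQuery MERGE that upserts freshly staged rows into the
    target table: rows matching on the key are updated, new rows inserted."""
    set_clause = ", ".join(f"T.{c} = S.{c}" for c in cols)
    col_list = ", ".join([key] + cols)
    val_list = ", ".join(f"S.{c}" for c in [key] + cols)
    return (
        f"MERGE `{target}` T\n"
        f"USING `{staging}` S\n"
        f"ON T.{key} = S.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({col_list}) VALUES ({val_list})"
    )

print(build_merge_sql("proj.ds.target", "proj.ds.staging_hourly",
                      "order_id", ["status", "updated_at"]))
```

At 3B rows, partitioning and clustering the target on the merge key (or a date column) matters a lot for MERGE cost, since it limits how much of the target each hourly run scans.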
Uswitch: Digging deeper with Google Vertex AI
Uswitch uses data science to adapt to a competitive marketplace and is passing on the lessons it's learning to its wider RVU group of brands, empowering people to make better purchasing decisions. Uswitch started its migration to Google Cloud in 2015 after deciding to leave its previous cloud infrastructure. The company had earlier used on-premises Hadoop clusters, but this configuration started to have issues, especially with performance and maintenance. Uswitch had thought that switching to a different cloud provider would alleviate some of these issues, but that turned out to be untrue. The data team was intrigued by BigQuery's capabilities and decided to give it a try. Find out how Uswitch overcame its data problems with Google Vertex AI: "We would love to transform every part of the product selection journey that our customers go through. Whether that’s understanding more about them or building algorithms that work out the best deals. And Google Cloud is enabling us to do just th
BigQuery and Looker: How they help Mercado Libre keep track of shipments minute by minute
Keeping track of shipments minute by minute: how Mercado Libre uses real-time analytics for on-time delivery. Learn how the Shipping Operations team at Mercado Libre used BigQuery and Looker to increase delivery coverage and speed by delivering near real-time data monitoring and analytics for their transportation network, and by enabling data analysts to build, embed, and disseminate relevant insights. Read more: https://cloud.google.com/blog/products/data-analytics/how-mercado-libre-uses-real-time-analytics-for-on-time-delivery
[Article] Query BIG with BigQuery: A cheat sheet
Hey folks! This article, written by Priyanka Vergadia (Developer Advocate), was originally published on the Google Cloud Tech Blog. For more #GCPSketchnote, follow the GitHub repo. For similar cloud content, follow Priyanka on Twitter @pvergadia and keep an eye on thecloudgirl.dev. Organizations rely on data warehouses to aggregate data from disparate sources, process it, and make it available for data analysis in support of strategic decision-making. BigQuery is the Google Cloud enterprise data warehouse designed to help organizations run large-scale analytics with ease and quickly unlock actionable insights. You can ingest data into BigQuery either through batch uploading or by streaming data directly to unlock real-time insights. As a fully managed data warehouse, BigQuery takes care of the infrastructure so you can focus on analyzing your data up to petabyte scale. BigQuery supports SQL (Structured Query Language), which you’re likely already familiar with if you've worked wi
How are you detecting data quality issues?
Hi, we are building an open source data observability tool, dqo.ai, and we are wondering what kinds of data quality issues we should focus on. So far we are able to monitor timeliness (whether the data is fresh), validity (values in columns not meeting requirements), consistency (anomalies in how the number of rows grows over time), and uniqueness. I would appreciate feedback on the typical data quality issues that are worth continuous monitoring. The code is open source on GitHub: https://github.com/dqoai/dqo Best regards, Piotr
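For concreteness, here is a minimal pure-Python sketch of the four check categories named above (timeliness, validity, consistency of row counts, uniqueness) applied to a list of row dicts; the field names, rules, and thresholds are invented for illustration:

```python
from datetime import datetime, timedelta, timezone

def check_quality(rows, now, max_age=timedelta(hours=1),
                  growth_band=(0.5, 2.0), prev_count=None):
    """Return a dict of data-quality flags over a batch of rows:
    timeliness, validity, uniqueness, and row-count consistency."""
    issues = {}
    # Timeliness: the newest row should have arrived recently.
    newest = max(r["ingested_at"] for r in rows)
    issues["stale"] = now - newest > max_age
    # Validity: column values must meet a simple rule (here: non-negative).
    issues["invalid"] = any(r["amount"] < 0 for r in rows)
    # Uniqueness: no duplicate primary keys.
    keys = [r["id"] for r in rows]
    issues["duplicates"] = len(keys) != len(set(keys))
    # Consistency: row count should grow within an expected band vs. last run.
    if prev_count:
        ratio = len(rows) / prev_count
        issues["volume_anomaly"] = not (growth_band[0] <= ratio <= growth_band[1])
    return issues

now = datetime(2022, 9, 21, 18, 0, tzinfo=timezone.utc)
rows = [{"id": 1, "amount": 10.0, "ingested_at": now - timedelta(minutes=5)},
        {"id": 2, "amount": -3.0, "ingested_at": now - timedelta(minutes=2)}]
print(check_quality(rows, now, prev_count=2))
```

A production tool would of course compute these against warehouse metadata and learned baselines rather than fixed thresholds, but the categories map directly to the ones listed in the post.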
👉 Don't miss the Google Cloud Data Engineer Spotlight on July 20!
Join the Data Engineer Spotlight to connect with your fellow data engineers and Google Cloud experts like Priyanka Vergadia, learn best practices for Google Cloud’s data analytics tools, play a hands-on Cloud Hero game (with prizes at stake!), and more. Want to show how proud you are to be in the data cloud community? Reserve your seat now!

Topics include:
- How Google Cloud is helping the data engineer community solve for increasing data complexity
- Manage complex SQL workflows in BigQuery using the Dataform CLI
- Build unified batch and streaming pipelines on popular ML frameworks
- Data warehouse migrations to BigQuery made easy with BigQuery Migration Service
- Manage and govern distributed data with Dataplex
How to Learn Data Science Online
Often called the sexiest career of the twenty-first century and the fastest-growing field in tech right now, data science may seem mysterious and difficult to venture into for those outside the technical field. Let us discover the best sources to learn data science online. To begin, let us first explain what data science is and what the job entails.

Data science explained
Data science is a subject that entails the gathering and analysis of data, both structured and unstructured, in order to generate insights and information that businesses can use to develop effective strategies. By collecting and analyzing data over time, data scientists can use patterns to detect trends and provide recommendations that help stakeholders locate new market opportunities, improve efficiency, lower costs, and gain a competitive advantage in their sector.

Why should you study data science?
A large amount of data is generated every day
⭐️ Google Cloud Data Heroes Series: Meet Antonio, a Data Engineer from Lima, Peru 🇵🇪
Hi C2C community! I’m Grace, an Associate Product Marketing Manager on the Google Cloud Data Analytics team, flying in with the second Google Cloud Data Heroes Series blog post ☁️ In the Google Cloud Data Heroes series, we share stories of the everyday heroes who use our data analytics tools to do incredible things ⭐️ 🤩 In this edition, you'll meet our hero, Antonio, a lead data engineer from Peru 🇵🇪 who built an e-commerce recommendation system on BigQuery, Dataflow, and Cloud Pub/Sub. Outside of work, he is building the next generation of data enthusiasts in Peru through teaching at universities, organizing Data Day conferences, and writing Medium articles. Interested in being the next Google Cloud Data Hero? Fill out this form!