Dataflow and Composer or Compute engine | C2C Community

  • 18 November 2021
  • 7 replies
  • 61 views

Hi guys, I would like to get your perspective on my case. I have many jobs that I need to run with Apache Beam, and I need to schedule them with Airflow. Is it better to use Dataflow and Cloud Composer, or to set up a VM on Compute Engine, run my jobs through Python (Apache Beam) and install Airflow on that VM?

Best answer by deok 19 November 2021, 15:44

Hi @TomNom,

I’m moving your question to the Infrastructure group.

Of the two options you describe, I would choose the first one, but let's see what others suggest.

@antoine.castex @guillaume blaquiere @tom do you have any suggestions?

Hi @TomNom, welcome to the forum. Great question!

@ilias already mentioned an option and tagged some well known members of the community that can help.

I’ll reach out to some Customer Engineers at Google so they can add their opinions.

I’ll keep you posted!

Hi @TomNom, I’m a customer engineer with Google Cloud. You will save time and energy by using serverless (fully managed) solutions like Dataflow and Composer.

A bit of backstory: in January 2016, Google began fully committing to open source by donating the Dataflow SDK and the associated computation model to the Apache Software Foundation as Apache Beam. This means that Dataflow, as a product, is an Apache Beam runner, but it goes further by taking care of the infrastructure for you. Your performance will be better on Dataflow, since the service can scale the number of workers to fit each job, and you won’t have to manage any infrastructure.
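
To make the runner point concrete: in Beam, the execution backend is selected through pipeline options, so the same pipeline code can run with the local `DirectRunner` on a self-managed VM or be submitted to Dataflow with `DataflowRunner`. The sketch below only builds the command-line flags and uses the standard library, so it runs without Beam installed. The flag names (`--runner`, `--project`, `--region`, `--temp_location`) are standard Beam options; the helper `pipeline_flags` and the project, region and bucket values are hypothetical placeholders.

```python
from typing import List, Optional


def pipeline_flags(
    use_dataflow: bool,
    project: Optional[str] = None,
    region: Optional[str] = None,
    temp_location: Optional[str] = None,
) -> List[str]:
    """Build Beam flags for a self-managed run or a Dataflow run.

    Hypothetical helper: the flag names are standard Beam pipeline
    options, but the values callers pass in are placeholders.
    """
    if not use_dataflow:
        # DirectRunner executes the pipeline in the local process, e.g. on a
        # Compute Engine VM that you provision and maintain yourself.
        return ["--runner=DirectRunner"]
    # DataflowRunner submits the same pipeline to the managed service,
    # which provisions the worker VMs for the job.
    return [
        "--runner=DataflowRunner",
        f"--project={project}",
        f"--region={region}",
        f"--temp_location={temp_location}",
    ]


# Hypothetical values for illustration only.
local_flags = pipeline_flags(use_dataflow=False)
cloud_flags = pipeline_flags(
    use_dataflow=True,
    project="my-project",
    region="europe-west1",
    temp_location="gs://my-bucket/tmp",
)
print(local_flags)
print(cloud_flags)
```

With Beam installed, either list would be passed as `PipelineOptions(flags)` to `apache_beam.Pipeline(options=...)`; the transforms themselves stay the same, which is why moving from a VM to Dataflow is mostly a configuration change.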

The same applies to Airflow and Cloud Composer, which is a managed Airflow service. Both Dataflow and Composer let you focus on the value-added part: the data processing.

Follow this doc to learn more about launching Dataflow pipelines with Composer: https://cloud.google.com/composer/docs/how-to/using/using-dataflow-template-operator

@deok Thank you for your answer. But in terms of price, which one is cheaper? I also recently discovered Cloud Scheduler. That leaves me with two options to compare on performance, ease of use and price: Dataflow + Composer or Dataflow + Scheduler?
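
For the Dataflow + Cloud Scheduler option, one common pattern is a Cloud Scheduler HTTP job that calls the Dataflow REST API to launch a pipeline saved as a classic template (the `projects.locations.templates.launch` method, with the template path in the `gcsPath` query parameter). The sketch below only builds the URL and JSON body such a job would POST; the helper name, project, region, bucket, template path and the template's `input` parameter are all hypothetical.

```python
import json
from typing import Dict, Tuple
from urllib.parse import urlencode


def template_launch_request(
    project: str,
    region: str,
    gcs_template_path: str,
    job_name: str,
    parameters: Dict[str, str],
    temp_location: str,
) -> Tuple[str, str]:
    """Return (url, json_body) for Dataflow's templates:launch REST method.

    Hypothetical helper; it does not send anything, it only builds the request.
    """
    # The template location is passed URL-encoded as the gcsPath query parameter.
    url = (
        "https://dataflow.googleapis.com/v1b3/"
        f"projects/{project}/locations/{region}/templates:launch?"
        + urlencode({"gcsPath": gcs_template_path})
    )
    body = {
        "jobName": job_name,
        # Runtime parameters defined by the template being launched.
        "parameters": parameters,
        "environment": {"tempLocation": temp_location},
    }
    return url, json.dumps(body)


# Hypothetical names: replace with your own project, bucket and template.
url, body = template_launch_request(
    project="my-project",
    region="europe-west1",
    gcs_template_path="gs://my-bucket/templates/my-pipeline",
    job_name="nightly-run",
    parameters={"input": "gs://my-bucket/input/*.csv"},
    temp_location="gs://my-bucket/tmp",
)
print("POST", url)
print(body)
```

In Cloud Scheduler, this URL and body would be the target and payload of an HTTP POST job authenticated with an OAuth token for a service account that is allowed to launch Dataflow jobs. Composer, by contrast, lets a DAG express dependencies between several jobs, which a single scheduled HTTP call does not.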

Hi @TomNom,

Have you read this article? It will help you choose between the two.

@ilias Thank you! Based on the article you shared, I think Dataflow + Composer is more suitable for me. Is there a big difference in price between these solutions?

@TomNom You can always get an estimate of your bill using the Google Cloud Pricing Calculator.
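
To see roughly what the calculator adds up for the Dataflow part: batch workers are billed per second for the vCPUs, memory and persistent disk they use, so one job costs about workers times hours times the per-worker resource cost, and a monthly total adds whatever the scheduling layer costs. This is a simplified sketch, not Google's pricing formula: it ignores items such as Dataflow Shuffle, Streaming Engine, GPUs and discounts, the helper names are hypothetical, and every rate is a placeholder to be filled from the Pricing Calculator for your region. The example numbers are dummy values, not real prices.

```python
from dataclasses import dataclass


@dataclass
class DataflowRates:
    """Per-hour unit prices. Placeholders only: take real values for your
    region from the Google Cloud Pricing Calculator."""
    vcpu_hour: float
    memory_gb_hour: float
    disk_gb_hour: float


def batch_job_cost(workers: int, hours: float, vcpus: int, memory_gb: float,
                   disk_gb: float, rates: DataflowRates) -> float:
    """Rough cost of one batch job: per-worker resources times run time.

    Simplified model; extras such as Shuffle or GPUs are not included.
    """
    per_worker_hour = (vcpus * rates.vcpu_hour
                       + memory_gb * rates.memory_gb_hour
                       + disk_gb * rates.disk_gb_hour)
    return workers * hours * per_worker_hour


def monthly_cost(job_cost: float, runs_per_month: int,
                 scheduler_monthly: float) -> float:
    """All job runs in a month plus the scheduler's own monthly cost
    (Composer environment or Cloud Scheduler jobs, also from the calculator)."""
    return job_cost * runs_per_month + scheduler_monthly


# Dummy rates and sizes for illustration only, not real prices.
dummy = DataflowRates(vcpu_hour=1.0, memory_gb_hour=0.5, disk_gb_hour=0.01)
job = batch_job_cost(workers=2, hours=1.5, vcpus=4, memory_gb=15,
                     disk_gb=250, rates=dummy)
total = monthly_cost(job, runs_per_month=30, scheduler_monthly=100.0)
print(job, total)
```

Keeping the scheduler's cost as a separate term makes the Composer versus Cloud Scheduler comparison explicit: the Dataflow term is the same in both options, so the difference comes down to the scheduling layer.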
