Solved

Long-running job on GCP cloud run

  • 10 January 2023
  • 8 replies
  • 94 views

Userlevel 1

Hello,

We are running the backend of a web app on Cloud Run. We have a task that takes a long time (~15-30 minutes) to execute and times out.

Is there any way to run it in Cloud Run?

Other solutions I found on the web:

In a perfect world we could use the same Cloud Run service to cover both of our use cases, the web API and the job, maybe with a custom HTTP header?

Thanks for your help,

Louis Sanna.
 


Best answer by guillaume blaquiere 10 January 2023, 14:37


8 replies

Userlevel 7
Badge +65

@guillaume blaquiere can you help here? Or maybe you @antoine.castex ?

That's a good problem for you to test your skills on, @malamin. What do you think?

Userlevel 6
Badge +15

Hello Louis.

 

Thanks for your question. A web app can serve long requests (up to 15-30 minutes) without any problem if you set the correct timeout parameter on Cloud Run.
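For instance, here is a rough sketch of raising that timeout programmatically, assuming the Cloud Run Admin API v2 Python client (google-cloud-run); the PROJECT/REGION/SERVICE names are placeholders, and you can of course do the same from the console or gcloud:

    # Sketch: raise the request timeout of an existing Cloud Run service
    # via the Cloud Run Admin API v2 client. Placeholders: PROJECT/REGION/SERVICE.
    import datetime
    from google.cloud import run_v2

    client = run_v2.ServicesClient()
    service = client.get_service(
        name="projects/PROJECT/locations/REGION/services/SERVICE"
    )
    service.template.timeout = datetime.timedelta(minutes=30)  # request timeout, max 60 min
    client.update_service(service=service).result()  # waits for the new revision to roll out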

Most of the time, the trigger is the most important question: what will start the processing?

If it's an external HTTP request, there is no problem: the requester will wait. You simply must not send the response before the processing ends.
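As a trivial illustration, assuming a Flask-style backend (do_heavy_work is a placeholder for your 15-30 minute task):

    # Minimal sketch: the handler blocks until the work is done and only
    # then sends the response, so the caller just waits.
    from flask import Flask, jsonify

    app = Flask(__name__)

    def do_heavy_work():
        # placeholder for the real ~15-30 minute processing
        return {"status": "done"}

    @app.route("/long-task", methods=["POST"])
    def long_task():
        result = do_heavy_work()  # runs to completion first
        return jsonify(result)    # response is sent only at the end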

BUT, be careful: if you use an HTTPS load balancer, an HTTP/1 session can't exceed 30 minutes. Prefer HTTP/2 or a streaming/WebSocket solution to avoid the issue.
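One way to stay within those limits is to stream progress instead of sending a single response at the very end; a rough Flask sketch (the chunked loop stands in for your real processing):

    # Rough sketch: stream progress lines while the work runs, so bytes keep
    # flowing on the connection and the final result arrives as the last chunk.
    import json
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/long-task-stream", methods=["POST"])
    def long_task_stream():
        def generate():
            for step in range(10):                 # placeholder: one slice of work per step
                yield f"progress {step + 1}/10\n"  # progress line sent to the client
            yield json.dumps({"status": "done"}) + "\n"
        return Response(generate(), mimetype="text/plain")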

If it's a Pub/Sub push subscription (or Eventarc), you are limited to 10 minutes before Pub/Sub considers the message timed out. The process will still continue on Cloud Run, but you won't be able to handle failures and retries correctly.
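If you do take the Pub/Sub push route anyway, the acknowledgement deadline can be raised at most to its 600-second ceiling; a sketch with the google-cloud-pubsub client (project, topic, subscription and endpoint names are placeholders):

    # Sketch: push subscription pointing at the Cloud Run endpoint with the
    # maximum acknowledgement deadline Pub/Sub allows (600 seconds).
    from google.cloud import pubsub_v1

    subscriber = pubsub_v1.SubscriberClient()
    subscriber.create_subscription(
        request={
            "name": subscriber.subscription_path("PROJECT", "LONG-TASK-SUB"),
            "topic": subscriber.topic_path("PROJECT", "LONG-TASK-TOPIC"),
            "push_config": {"push_endpoint": "https://SERVICE-URL/long-task"},
            "ack_deadline_seconds": 600,  # Pub/Sub maximum; the container keeps working anyway
        }
    )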

The same goes for Cloud Tasks, but there the maximum timeout is 30 minutes, which could fit your requirements.
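For Cloud Tasks, that 30-minute limit is the per-task dispatch deadline; a sketch with the google-cloud-tasks client (queue, region and URL are placeholders):

    # Sketch: enqueue an HTTP task that calls the Cloud Run endpoint, with the
    # dispatch deadline pushed to the Cloud Tasks maximum of 30 minutes.
    from google.cloud import tasks_v2
    from google.protobuf import duration_pb2

    client = tasks_v2.CloudTasksClient()
    client.create_task(
        request={
            "parent": client.queue_path("PROJECT", "REGION", "LONG-TASK-QUEUE"),
            "task": {
                "http_request": {
                    "http_method": tasks_v2.HttpMethod.POST,
                    "url": "https://SERVICE-URL/long-task",
                },
                "dispatch_deadline": duration_pb2.Duration(seconds=30 * 60),
            },
        }
    )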
 

If you need a longer timeout, you can imagine more complex solutions (I can detail them if you want), or you can use Cloud Batch or Cloud Run jobs.
But before that, did you think about reducing the processing time? Parallelizing the processing? Adding more CPUs?
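For completeness, if the task is moved into a Cloud Run job, the web API can still be the trigger; a sketch using the google-cloud-run client (the job name is a placeholder):

    # Sketch: the API endpoint starts an execution of a separate Cloud Run job
    # and returns immediately; the job then runs independently of the request.
    from google.cloud import run_v2

    def trigger_long_task_job():
        client = run_v2.JobsClient()
        operation = client.run_job(
            name="projects/PROJECT/locations/REGION/jobs/LONG-TASK-JOB"
        )
        return operation.metadata  # don't block the API request on completion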

Userlevel 7
Badge +35

Hello @Louis Sanna ,

Thank you for the question. 
It depends on the maximum concurrent requests per instance, the request timeout setting of the service, and how you design the solution, assuming the rest of the configuration is OK.

The timeout is set by default to 5 minutes and can be extended up to 60 minutes.

For a timeout longer than 15 minutes, Google recommends implementing retries and making sure the service is tolerant to clients re-connecting in case the connection is lost (either by ensuring requests are idempotent, or by designing request handlers in such a way that they can resume from the point where they left off). The longer the timeout is, the more likely the connection can be lost due to failures on the client side or the Cloud Run side. When a client re-connects, a new request is initiated and the client isn't guaranteed to connect to the same container instance of the service.
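To make that concrete, one common pattern is to key the work on a client-supplied task ID and checkpoint progress, so a retried or re-connected request resumes instead of starting over; a rough Flask sketch (the in-memory dict stands in for a durable store such as Firestore or Cloud SQL):

    # Rough sketch of an idempotent, resumable handler: a retried request with
    # the same task_id picks up from the last saved checkpoint.
    from dataclasses import dataclass
    from flask import Flask, jsonify, request

    app = Flask(__name__)
    _checkpoints = {}  # stand-in for a durable store shared across instances

    @dataclass
    class State:
        step: int = 0
        done: bool = False

    def process_chunk(state: State) -> State:
        # placeholder for one resumable slice of the real work
        state.step += 1
        state.done = state.step >= 10
        return state

    @app.route("/long-task", methods=["POST"])
    def long_task():
        task_id = request.json["task_id"]           # stable ID chosen by the client
        state = _checkpoints.get(task_id, State())  # resume from the last checkpoint
        while not state.done:
            state = process_chunk(state)
            _checkpoints[task_id] = state           # persist progress after each slice
        return jsonify({"task_id": task_id, "steps": state.step, "status": "done"})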

As @guillaume blaquiere said well, you can also use Cloud Batch or Cloud Run jobs, and before that, consider reducing the processing time, parallelizing the processing, or adding more CPUs.

 

Also, you can check @guillaume blaquiere's answer on Stack Overflow, which covers the same use case. Hopefully it helps you debug.

https://stackoverflow.com/questions/68619012/what-is-the-current-maximum-timeout-for-a-gcp-cloud-run-app-invoked-by-cloud-sch

 

Also, don't forget to review your configuration, execution environment, and use case for the implementation.

Userlevel 7
Badge +65

Hey @Louis Sanna, have you checked both replies? I think you have your answer!

Thank you @guillaume blaquiere and @malamin! 👏

Userlevel 7
Badge +35

You’re welcome, @ilias .

Userlevel 1


Yes they are great, thanks @guillaume blaquiere and @malamin !

We had another concern that I forgot to mention in my initial question: during the job's execution, HTTP requests sent to the busy container may suffer higher latency, so we considered giving the job its own dedicated container. But we have never seen that happen so far, so all is good.

In the end it looks like optimising the SQL queries used in the job is the best way to solve our issue, with no change needed on our infra. 

Userlevel 7
Badge +35

You’re welcome, @Louis Sanna.

Hello, I am a new member. I'd like to learn more.

Reply