We have a legacy process, built on Windows, that I’m trying to migrate to a GCP service. I’m not sure which GCP service would be best, and I’m hoping the community here can point me in the right direction.
It essentially calls an API which generates a CSV report, then downloads it.
The process connects to a WCF-based service over HTTPS. Currently it does so via a simple app created in C# with .NET and Visual Studio. The app is called from a batch file, with parameters passed in to:
- indicate the endpoint/report on the remote server, and
- indicate the folder on the Windows server where the CSV file should be saved
My questions are:
- How can I replicate this process in GCP?
- Can the CSV files be loaded into a database with additional attributes (e.g. the date of report generation), or do the CSV files need to be stored in a storage bucket?
I’m probably missing something but don’t know what I don’t know. Thanks in advance for any help!
Best answer by GabeWeiss
Forgive my lack of technical description here, but my concept is to use some GCP service/product that will run the basic process of retrieving these CSV files but can take different input parameters. The original CSV files are stored in a storage bucket for retention/backup and the data in the CSV file is loaded into a database so it can be accessed by our BI tool.
No worries, that description is perfect.
So, the answer really is “It depends”. :)
If you wanted to do a full-on transformation of the process, there are a number of serverless options that could replace a couple of pieces of the pipeline and give you a more Cloud-native approach to the problem. If/when you want to go that way, that’s probably a much longer conversation.
There’s also the option of just running a virtual machine in Google Compute Engine, and shifting everything you’re running locally into a Cloud-managed VM instead. Then you really do have basically exactly what you have now, but just running in the Cloud.
Then there’s what I think I’d suggest, which is a bit of a middle ground:
The easiest way I think is going to be to run the application in Cloud Run, which uses containers.
If it were me, step one would be to generalize the process a bit so that it can run in a Linux container (Cloud Run doesn’t currently support Windows containers). You could use Kubernetes instead, since Google Kubernetes Engine DOES support Windows containers, but the cliff to learn k8s is a lot higher than Cloud Run’s, so it just depends on your timetable and desire to learn new tech.
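As a concrete starting point, the container image could look something like this. Treat it purely as a sketch: it assumes the app can be retargeted to modern, cross-platform .NET so it runs on Linux, and the image tags, paths, and the `ReportFetcher` project name are placeholders, not anything from the original setup.

```dockerfile
# Hypothetical multi-stage build: compile with the SDK image, run on the
# slimmer runtime image. Assumes the app has been ported to cross-platform .NET.
FROM mcr.microsoft.com/dotnet/sdk:8.0 AS build
WORKDIR /src
COPY . .
RUN dotnet publish -c Release -o /app

FROM mcr.microsoft.com/dotnet/runtime:8.0
WORKDIR /app
COPY --from=build /app .
# "ReportFetcher.dll" is a placeholder for your published assembly name.
ENTRYPOINT ["dotnet", "ReportFetcher.dll"]
```

The endpoint/report and output-location parameters that the batch file used to pass can become container arguments or environment variables on the Cloud Run job.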
Then you can deploy that container to Cloud Run (it’s serverless and scales to zero, so you aren’t charged while it’s not running). Cloud Scheduler then replaces Windows Task Scheduler: it does the same thing, calling “a thing” on a schedule (it’s basically a managed cron process).
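For reference, the deploy-and-schedule steps could be sketched with the gcloud CLI along these lines. The project, job names, region, schedule, and service account below are all placeholders, and you’d want to check the current docs for your gcloud version:

```shell
# Deploy the container as a Cloud Run job (runs to completion, scales to zero).
gcloud run jobs create report-fetcher \
  --image=gcr.io/MY_PROJECT/report-fetcher \
  --region=us-central1

# Trigger that job nightly via Cloud Scheduler (managed cron).
gcloud scheduler jobs create http report-fetcher-nightly \
  --location=us-central1 \
  --schedule="0 2 * * *" \
  --uri="https://us-central1-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/MY_PROJECT/jobs/report-fetcher:run" \
  --http-method=POST \
  --oauth-service-account-email=scheduler-sa@MY_PROJECT.iam.gserviceaccount.com
```

The Scheduler job just POSTs to the Cloud Run job’s `:run` endpoint with a service account that has permission to invoke it.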
Last step is to get the Cloud Run job to write the CSV out to Google Cloud Storage. Then your BI tool can be configured to pull it down from GCS and read it.
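That last step could look roughly like this in Python, assuming the google-cloud-storage client library. The bucket layout and names here are my own illustration, not anything prescribed; dating the object path also gives you the retention/history you mentioned for free.

```python
# Sketch of the write-to-GCS step. Assumes the google-cloud-storage package
# and default credentials (which Cloud Run supplies via its service account).
from datetime import date


def object_name(report: str, run_date: date) -> str:
    """Build a dated object path so each run's CSV is retained separately."""
    return f"reports/{report}/{run_date.isoformat()}.csv"


def upload_csv(bucket_name: str, report: str, csv_bytes: bytes, run_date: date) -> None:
    """Upload one report's CSV to the given bucket under a dated path."""
    from google.cloud import storage  # deferred so the helper above is testable offline

    blob = storage.Client().bucket(bucket_name).blob(object_name(report, run_date))
    blob.upload_from_string(csv_bytes, content_type="text/csv")
```

Your BI tool can then point at the bucket, and the newest object for a report is always the latest run.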
There are lots of other ways to do something like this as well, like eliminating the CSV entirely: have the Cloud Run job write to a database, and hook the BI tool up to the database instead. That has the advantage of history, where if you wanted you could start to do long-term analytics on the CSV data if that’s valuable. If it’s in Cloud Storage it’s also not lost, so you could totally decide at a later date to mass-import it into a database (e.g. Cloud SQL, which is managed MySQL/PostgreSQL/SQL Server) to be used later.
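If you go the database route, the “additional attributes” part of the original question can be handled while parsing the CSV, before you insert the rows with whatever driver matches your Cloud SQL engine. A rough Python sketch, with the column name `generated_at` being my own invention:

```python
# Parse a report CSV and tag every row with its generation timestamp, so the
# extra attribute lands in the database next to the original columns.
import csv
import io
from datetime import datetime, timezone


def rows_with_metadata(csv_text: str, generated_at: datetime) -> list[dict]:
    """Return the CSV rows as dicts, each with an added 'generated_at' field."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{**row, "generated_at": generated_at.isoformat()} for row in reader]
```

From there, a batch insert of those dicts into a Cloud SQL table (via the Postgres/MySQL/SQL Server driver of your choice) replaces the CSV download entirely.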
Thank you @GabeWeiss for the detailed response!
The current process is already running on a Compute Engine VM, so I’m going to investigate Cloud Run and Cloud Scheduler further. I agree that output to a database makes more sense.
I really appreciate the help and direction you’ve provided. Thanks again!