Google Colab: Issue Updating Data Links In Excel After Python Dataframe Export | C2C Community
Solved

Google Colab: Issue Updating Data Links In Excel After Python Dataframe Export

  • 16 August 2022
  • 3 replies
  • 387 views

Situation

I'm working on a data project integrating python in Google Colab and Excel 365 on Win 8.1. My python code collects new data updates on a regimented schedule and then exports/writes (e.g. overwrites, not appends the data) like to a report on an Excel spreadsheet. 

I have no issue getting this to work going to a standalone spreadsheet.

I know I could potentially do all this in Python and not use Excel at all, but I prefer not to reinvent the wheel and not spend hours hardcoding all the formulas and links already existing in Excel. 

Goal

My goal is to:

  1. Use new data from my Colab export to populate/overwrite a data table on Sheet A in an existing Excel workbook.

  2. Then I have a separate Sheet B in the same Excel workbook performing calculations via pre-existing links connecting to the original data table on Sheet A. I then want the links to auto update each time my python export updates the data table on the first sheet.

Problem

The issues I am running into are that if I use the df.to_excel function to export the data and even if I use the spreadsheet name parameter, the export overrides the data table and names the tab okay, but wipes out any other pre-existing sheets within the same workbook.

 

So I attempted a work around by exporting to an external workbook and then trying to update the links in the second workbook automatically. Problem is the links don't appear to update without the source data file and the second workbook with the links both being manually opened and then the updated file manually saved.

 

Assistance

Does anybody know a way to use python in Google Colab to:

  1. Overwrite a specific sheet within an Excel workbook without wiping out the other existing sheets? And then have the links on another sheet automatically update which are connected to the new data?

Or

  1. Auto update external links between separate Excel workbooks while the files are unopened? 

Or

  1. Control an instance of excel that can open both files to allow the links to auto update and then save and close the files automatically? 

 

I found a post from some years ago that identified a win32 package hat appeared to be able to control instances of excel. When I try doing a pip install in Colab I got an error that the package was unrecognized or doesn't exist. 

Ideally, I would prefer not to use VB if at all possible to solve this. 

Any solutions are much appreciated.

Thanks in advance.

icon

Best answer by malamin 16 August 2022, 03:51

View original

3 replies

Userlevel 7
Badge +21

Hi @Osos ,

First of all, I want you a warm welcome in C2C. Then, Thank you for the question. C2C is a google customer community platform. It would be ideal if you introduce yourself to the following C2C template. It would be helpful for the community members to know about you.

 

 

 

 

What about your question:


As so far I understood your questions scenario:

You want to create a new excel sheet while new data push from python code. But the difficulty is your data is pulled sucessfully append into excel sheet but old data is wiped from the sheet.


Basically, I don’t know about your development environment setup and dependence. So,  I will just focus on the programming pseudocode scenario and give you some hints so that you will be able to debug your code and write the appropriate solution on your require python xls package.

Also, you didn’t mention here which python package or library are you using in colab to read and write xls sheet.

I try doing a pip install in Colab I got an error that the package was unrecognized or doesn't exist. 

# Example Importing a library that is not in Colaboratory
To import a library that's not in Colaboratory by default, you can use `!pip install` or `!apt-get install`.
# xlrd is needed to load excel files
!pip install xlrd

However, there are other package libraries that you may use.  you should follow the instructions on which one is best supported in the collab platform.

I'm describing in pseudocode since class and method or function names will change depending on the Python Xls package and library.

 

Auto update external links between separate Excel workbooks while the files are unopened? 

  • You need to add data into the worksheet directly as save data. if you want a new generate sheet will open automatically when you need to define the file open automatically method while generating a new sheet.
  • If you wish to continually add data to an existing sheet, you must add append data or a data function with a determined acceptable range of columns and rows.
  • If you want to generate a new sheet every time and have it open automatically, you should use an array loop with a dynamic variable in the sheet name.
    However, you must ensure that the file name and sheets are distinct.

Also you need to consider following function issue. but i mention here based on pandas.

  1. Add Bulk Data to an Excel Sheet. import pandas as pd. …
  2. Append Data at the End of an Excel Sheet. This code will append data at the end of an excel. ...
  3. Pandas, appending to next available line in excel.
  4. Add rows without overwrite (Python)
  5. Insert a panda's dataframe into existant excel.
  6. Write DataFrame in .xlsx without overwrite the file.

Example function:

  • Pandas 1.2.3
  • Openpyxl 3.0.5
import os
from openpyxl import load_workbook


def append_df_to_excel(filename, df, sheet_name='Sheet1', startrow=None,
truncate_sheet=False,
**to_excel_kwargs):
"""
Append a DataFrame [df] to existing Excel file [filename]
into [sheet_name] Sheet.
If [filename] doesn't exist, then this function will create it.

@param filename: File path or existing ExcelWriter
(Example: '/path/to/file.xlsx')
@param df: DataFrame to save to workbook
@param sheet_name: Name of sheet which will contain DataFrame.
(default: 'Sheet1')
@param startrow: upper left cell row to dump data frame.
Per default (startrow=None) calculate the last row
in the existing DF and write to the next row...
@param truncate_sheet: truncate (remove and recreate) [sheet_name]
before writing DataFrame to Excel file
@param to_excel_kwargs: arguments which will be passed to `DataFrame.to_excel()`
[can be a dictionary]
@return: None

Usage examples:

>>> append_df_to_excel('d:/temp/test.xlsx', df)

>>> append_df_to_excel('d:/temp/test.xlsx', df, header=None, index=False)

>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
index=False)

>>> append_df_to_excel('d:/temp/test.xlsx', df, sheet_name='Sheet2',
index=False, startrow=25)

(c) [MaxU](https://stackoverflow.com/users/5741205/maxu?tab=profile)
"""
# Excel file doesn't exist - saving and exiting
if not os.path.isfile(filename):
df.to_excel(
filename,
sheet_name=sheet_name,
startrow=startrow if startrow is not None else 0,
**to_excel_kwargs)
return

# ignore [engine] parameter if it was passed
if 'engine' in to_excel_kwargs:
to_excel_kwargs.pop('engine')

writer = pd.ExcelWriter(filename, engine='openpyxl', mode='a')

# try to open an existing workbook
writer.book = load_workbook(filename)

# get the last row in the existing Excel sheet
# if it was not specified explicitly
if startrow is None and sheet_name in writer.book.sheetnames:
startrow = writer.book[sheet_name].max_row

# truncate sheet
if truncate_sheet and sheet_name in writer.book.sheetnames:
# index of [sheet_name] sheet
idx = writer.book.sheetnames.index(sheet_name)
# remove [sheet_name]
writer.book.remove(writer.book.worksheets[idx])
# create an empty sheet [sheet_name] using old index
writer.book.create_sheet(sheet_name, idx)

# copy existing sheets
writer.sheets = {ws.title:ws for ws in writer.book.worksheets}

if startrow is None:
startrow = 0

# write out the new sheet
df.to_excel(writer, sheet_name, startrow=startrow, **to_excel_kwargs)

# save the workbook
writer.save()

 

Click on following links that might be help you to overcome your issue.:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_excel.html

https://stackoverflow.com/questions/20219254/how-to-write-to-an-existing-excel-file-without-overwriting-data-using-pandas

 

Otherwise, You need to clarify which python class and library are you using in your case.

 

 

 

Good evening @malamin.

Thank you very much for the warm welcome to the community. That is very kind of you. I am glad to find the community. 

Also, thank you for the links. There are some very thoughtful questions in the introduction template that I will need to complete to introduce myself to the community.

Meanwhile, I really appreciate the prompt response and the excellent in-depth information. I will need to carefully review the information in your kind response and determine which would be a good route for my particular project. 

====

To answer your questions, the package I was trying to pip install was “Win32”. This is the package that was generating the error in Colab.

 

To answer your second question, the main challenge I am trying to figure out is how to automatically update links in one Excel Sheet that connect to data in another Excel Sheet without having to manually open and save the sheets.

For example, Sheet A contains a data table that is frequently updated by a python data push.

Excel Sheet B contains links and performs calculations based on data in Sheet A.

I am trying to figure out how to use python to automatically update the links in Sheet B (after each data push) and make them immediately effective without having to manually open and re-save the file.

Tomorrow I will review your recommendations in detail and see if I can get one of them to work.

Will let you know if I encounter further issues.

 

Thank you again for the warm welcome, prompt response and excellent potential solution(s).

With Much Respect,

 

 

Userlevel 7
Badge +16

Hello, @Osos ! I wanted to welcome you to the community and to get to ask you to go to the C2C Lounge, to present yourself using the classic template and everything, but it looks like @malamin was superfast again! 😅

 

In any case, please let us know if malamin’s answer helped you, when you get around it tomorrow, as you said. And it would be great if we could get to read more about you in the Lounge! 😉

Reply