I have some old xls files that I am parsing. For some of them, python-calamine returns wrong values in the first two columns.
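from python_calamine import CalamineWorkbook

# `path` points to one of the affected .xls files
workbook = CalamineWorkbook.from_path(str(path))
for sheet_name in workbook.sheet_names:
    rows_2d = workbook.get_sheet_by_name(sheet_name).to_python(
        skip_empty_area=True,
    )
    # first three cells of the first row
    print(rows_2d[0][0:3])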
outputs
[0.001, 22.48, '']
This is not the right content. Looking at all cells shows that the values in the first two columns are always wrong: they should be mixed types, but calamine returns floats that look like they come from elsewhere in the file.
In comparison with xlrd:
import xlrd

book = xlrd.open_workbook(str(path))
for sh in book.sheets():
    for rx in range(sh.nrows):
        print(sh.row(rx)[0:3])
        break  # only the first row of each sheet
outputs
[text:'<correct text>', empty:'', empty:'']
which is correct (I replaced the string).
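To see which cells differ across a whole sheet rather than only in the first row, a cell-by-cell comparison along these lines might help. This is only a sketch: the helper names (`load_with_calamine`, `load_with_xlrd`, `diff_grids`) are made up for illustration, and it assumes that `skip_empty_area=False` keeps calamine's rows and columns aligned with xlrd's absolute indices. Cells that the two libraries represent differently (dates, for example) may show up as mismatches too.

```python
from typing import Any, List, Tuple

Grid = List[List[Any]]


def load_with_calamine(path: str, sheet_name: str) -> Grid:
    # Imported lazily so diff_grids can be used without the library installed.
    from python_calamine import CalamineWorkbook

    workbook = CalamineWorkbook.from_path(path)
    # Assumption: keeping the empty area preserves absolute cell coordinates.
    return workbook.get_sheet_by_name(sheet_name).to_python(skip_empty_area=False)


def load_with_xlrd(path: str, sheet_name: str) -> Grid:
    import xlrd

    sheet = xlrd.open_workbook(path).sheet_by_name(sheet_name)
    return [sheet.row_values(rx) for rx in range(sheet.nrows)]


def diff_grids(left: Grid, right: Grid) -> List[Tuple[int, int, Any, Any]]:
    """Return (row, col, left_value, right_value) for every differing cell.

    Cells missing on one side are treated as empty strings.
    """
    diffs = []
    for r in range(max(len(left), len(right))):
        lrow = left[r] if r < len(left) else []
        rrow = right[r] if r < len(right) else []
        for c in range(max(len(lrow), len(rrow))):
            lval = lrow[c] if c < len(lrow) else ""
            rval = rrow[c] if c < len(rrow) else ""
            if lval != rval:
                diffs.append((r, c, lval, rval))
    return diffs
```

Calling `diff_grids(load_with_calamine(p, name), load_with_xlrd(p, name))` for each sheet would list the coordinates where the two readers disagree, which could show whether the problem really is confined to the first two columns.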
When I open the xls in Excel and save it again, this behavior goes away (the files also become smaller for some reason), so I cannot produce a censored copy that still shows the problem. These are company files, so I cannot upload one here uncensored.
Is there something we can do about this?