-
Notifications
You must be signed in to change notification settings - Fork 179
Initial TDSX support. #44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
9a8f0d1
bd6d3c9
8385b5c
cbcda8f
813e079
be4c9c8
a885f15
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| import contextlib | ||
| import os | ||
| import shutil | ||
| import tempfile | ||
| import zipfile | ||
|
|
||
| import xml.etree.ElementTree as ET | ||
|
|
||
|
|
||
| @contextlib.contextmanager | ||
| def temporary_directory(*args, **kwargs): | ||
| d = tempfile.mkdtemp(*args, **kwargs) | ||
| try: | ||
| yield d | ||
| finally: | ||
| shutil.rmtree(d) | ||
|
|
||
|
|
||
| def find_file_in_zip(zip, ext): | ||
| for filename in zip.namelist(): | ||
| if os.path.splitext(filename)[-1].lower() == ext[:-1]: | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fairly hacky way to determine the right content type (TWB or TDS) we're looking for.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Isn't there a way to get the file stream directly from the zipfile without having to extract it? It might be better to inspect the first tag of the file rather than rely on file extension. Maybe I'll just open an issue for investigating that as an option. There is! zipfile.open allows you to open the file without extracting the zip file.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good call, I'll look into it, along with your later suggestion of when to extract vs open directly. Originally my brain didn't want to wrap itself around how to deal with the read-only stream and writing to files, but now that I've had coffee...
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I came up with this, and it works, should protect against a random XML file and bad file extensions:
Not sure how much better than checking the ext it is though
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I prefer this because it's based on the actual content of the file rather than the filename/extension which I always find to be better by some definition of better. |
||
| return filename | ||
|
|
||
|
|
||
| def get_xml_from_archive(filename): | ||
| file_type = os.path.splitext(filename)[-1].lower() | ||
| with temporary_directory() as temp: | ||
| with zipfile.ZipFile(filename) as zf: | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since we reextract this when saving the changes, why not move this to use zipfile.open on the archive file directly, avoiding having multiple tempdirs and not needing to extract the extract (oi) unless the user is saving the file. |
||
| zf.extractall(temp) | ||
| xml_file = find_file_in_zip(zf, file_type) | ||
| xml_tree = ET.parse(os.path.join(temp, xml_file)) | ||
|
|
||
| return xml_tree | ||
|
|
||
|
|
||
| def build_archive_file(archive_contents, zip): | ||
| for root_dir, _, files in os.walk(archive_contents): | ||
| relative_dir = os.path.relpath(root_dir, archive_contents) | ||
| for f in files: | ||
| temp_file_full_path = os.path.join( | ||
| archive_contents, relative_dir, f) | ||
| zipname = os.path.join(relative_dir, f) | ||
| zip.write(temp_file_full_path, arcname=zipname) | ||
|
|
||
|
|
||
| def save_into_archive(xml_tree, filename, new_filename=None): | ||
| # Saving a archive means extracting the contents into a temp folder, | ||
| # saving the changes over the twb in that folder, and then | ||
| # packaging it back up into a specifically formatted zip with the correct | ||
| # relative file paths | ||
|
|
||
| if new_filename is None: | ||
| new_filename = filename | ||
|
|
||
| # Extract to temp directory | ||
| with temporary_directory() as temp_path: | ||
| file_type = os.path.splitext(filename)[-1].lower() | ||
| with zipfile.ZipFile(filename) as zf: | ||
| twb_file = find_file_in_zip(zf, file_type) | ||
|
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. twb_file should be "tableau_file" or "xml_file" or something. |
||
| zf.extractall(temp_path) | ||
| # Write the new version of the twb to the temp directory | ||
| xml_tree.write(os.path.join( | ||
| temp_path, twb_file), encoding="utf-8", xml_declaration=True) | ||
|
|
||
| # Write the new archive with the contents of the temp folder | ||
| with zipfile.ZipFile(new_filename, "w", compression=zipfile.ZIP_DEFLATED) as new_archive: | ||
| build_archive_file(temp_path, new_archive) | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -3,49 +3,11 @@ | |
| # Datasource - A class for writing datasources to Tableau files | ||
| # | ||
| ############################################################################### | ||
| import contextlib | ||
| import os | ||
| import shutil | ||
| import tempfile | ||
| import zipfile | ||
|
|
||
| import xml.etree.ElementTree as ET | ||
| from tableaudocumentapi import Connection | ||
|
|
||
|
|
||
| @contextlib.contextmanager | ||
| def temporary_directory(*args, **kwargs): | ||
| d = tempfile.mkdtemp(*args, **kwargs) | ||
| try: | ||
| yield d | ||
| finally: | ||
| shutil.rmtree(d) | ||
|
|
||
|
|
||
| def find_tds_in_zip(zip): | ||
| for filename in zip.namelist(): | ||
| if os.path.splitext(filename)[-1].lower() == '.tds': | ||
| return filename | ||
|
|
||
|
|
||
| def get_tds_xml_from_tdsx(filename): | ||
| with temporary_directory() as temp: | ||
| with zipfile.ZipFile(filename) as zf: | ||
| zf.extractall(temp) | ||
| tds_file = find_tds_in_zip(zf) | ||
| tds_xml = ET.parse(os.path.join(temp, tds_file)) | ||
|
|
||
| return tds_xml | ||
|
|
||
|
|
||
| def build_tdsx_file(tdsx_contents, zip): | ||
| for root_dir, _, files in os.walk(tdsx_contents): | ||
| relative_dir = os.path.relpath(root_dir, tdsx_contents) | ||
| for f in files: | ||
| temp_file_full_path = os.path.join( | ||
| tdsx_contents, relative_dir, f) | ||
| zipname = os.path.join(relative_dir, f) | ||
| zip.write(temp_file_full_path, arcname=zipname) | ||
| from tableaudocumentapi import Connection, archivefile | ||
|
|
||
|
|
||
| class ConnectionParser(object): | ||
|
|
@@ -99,34 +61,11 @@ def from_file(cls, filename): | |
| "Initialize datasource from file (.tds)" | ||
|
|
||
| if zipfile.is_zipfile(filename): | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is this pattern repeated? Seems like this could be a good candidate for a helper function. |
||
| dsxml = get_tds_xml_from_tdsx(filename).getroot() | ||
| dsxml = archivefile.get_xml_from_archive(filename).getroot() | ||
| else: | ||
| dsxml = ET.parse(filename).getroot() | ||
| return cls(dsxml, filename) | ||
|
|
||
| def _save_into_tdsx(self, filename=None): | ||
| # Save reuses existing filename, 'save as' takes a new one | ||
| if filename is None: | ||
| filename = self._filename | ||
|
|
||
| # Saving a tdsx means extracting the contents into a temp folder, | ||
| # saving the changes over the tds in that folder, and then | ||
| # packaging it back up into a specifically formatted zip with the correct | ||
| # relative file paths | ||
|
|
||
| # Extract to temp directory | ||
| with temporary_directory() as temp_path: | ||
| with zipfile.ZipFile(self._filename) as zf: | ||
| tds_file = find_tds_in_zip(zf) | ||
| zf.extractall(temp_path) | ||
| # Write the new version of the tds to the temp directory | ||
| self._datasourceTree.write(os.path.join( | ||
| temp_path, tds_file), encoding="utf-8", xml_declaration=True) | ||
|
|
||
| # Write the new tdsx with the contents of the temp folder | ||
| with zipfile.ZipFile(filename, "w", compression=zipfile.ZIP_DEFLATED) as new_tdsx: | ||
| build_tdsx_file(temp_path, new_tdsx) | ||
|
|
||
| def save(self): | ||
| """ | ||
| Call finalization code and save file. | ||
|
|
@@ -142,7 +81,7 @@ def save(self): | |
| # save the file | ||
|
|
||
| if zipfile.is_zipfile(self._filename): | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. ditto, this sees like it would be ripe for a helper function. |
||
| self._save_into_tdsx(self._filename) | ||
| archivefile.save_into_archive(self._datasourceTree, self._filename) | ||
| else: | ||
| self._datasourceTree.write( | ||
| self._filename, encoding="utf-8", xml_declaration=True) | ||
|
|
@@ -159,7 +98,8 @@ def save_as(self, new_filename): | |
|
|
||
| """ | ||
| if zipfile.is_zipfile(self._filename): | ||
| self._save_into_tdsx(new_filename) | ||
| archivefile.save_into_archive( | ||
| self._datasourceTree, self._filename, new_filename) | ||
| else: | ||
| self._datasourceTree.write( | ||
| new_filename, encoding="utf-8", xml_declaration=True) | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
'archivefile' is obviously a terrible name
Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hah, containerfile? It's only a zip archive because of implementation details. Ultimately, it's a container file format.
(also, I'm horrible at naming things)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
containerfile works for me.
<Insert joke about naming things is hard here / >
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what exactly are we trying to convey here? just that there is some kind of container (i.e. TDSX or TWBX) vs. a single file (TDS or TWB)?
if these containers will always end in "x" then let's call this xfile.py :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh damn, @benlower with the winning name! we should go with xfiles