@@ -15,37 +15,22 @@ This module is designed to run as simply as possible. Just provide the file loc
 
 ```
 from TextSpitter import TexSpitter as TS
-import sqlite3
-
-
 folder_loc = 'foo/bar/'
 
-# doc_file = folder_loc + 'file_thing.doc'
 docx_file = folder_loc + 'file_thing.docx'
 pdf_file = folder_loc + 'file_thing.pdf'
 text_file = folder_loc + 'file_thing.txt'
 
 doc_tup = (docx_file, pdf_file, text_file)
-# doc_tup = (doc_file, docx_file, pdf_file, text_file)
-
-# SQL code to write to database
-conn = sqlite3.connect('example_db')
-c= conn.cursor()
-
-STMNT = 'INSERT INTO doc_contents VALUE %s'
 
-# For Loop code to insert doc content into db
-for ele in doc_tup:
-    text = TS(ele)
-    c.executemany(STMNT, text)
-    print('Done! Wrote the following to db: %s', (text[:25]))
+raw_text_payload = [TS(ele) for ele in doc_tup]
+text = '\n'.join(raw_text_payload)
+return text
 ```
 
 ## TO DOs ##
-* [x] push to github
-* [x] Remove .doc support due to legacy format's extensive proprietary reqs
-* [ ] spruce up documentation
-* [ ] solicit feedback
+* [x] spruce up documentation
+* [X] Add stream functionality for s3-based file reading
 * [ ] expand functionality to other file types
 * [ ] TDB
 
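A note on the added README example: `return text` sits at module level, and Python rejects `return` outside a function with a `SyntaxError`, so the snippet will not run if pasted as-is. One possible fix is to wrap the new lines in a function. The sketch below is only an illustration: `combine_texts` and `fake_extract` are hypothetical names that do not appear in the README, and it assumes `TS(path)` returns the extracted text as a string, which the diff implies but does not confirm.

```python
from typing import Callable, Iterable


def combine_texts(paths: Iterable[str], extract: Callable[[str], str]) -> str:
    """Extract text from each path and join the pieces with newlines.

    `extract` stands in for the README's `TS`; it is assumed to take a file
    path and return that file's text as a string.
    """
    return '\n'.join(extract(path) for path in paths)


def fake_extract(path: str) -> str:
    # Hypothetical stand-in so the sketch runs without TextSpitter installed.
    return f'contents of {path}'


folder_loc = 'foo/bar/'
doc_tup = (
    folder_loc + 'file_thing.docx',
    folder_loc + 'file_thing.pdf',
    folder_loc + 'file_thing.txt',
)

# The return statement now lives inside combine_texts, so this runs cleanly.
text = combine_texts(doc_tup, fake_extract)
print(text)
```

If `TS` behaves as assumed, passing it in place of `fake_extract` should give the joined text of all three documents.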