Commit 2168f2d

Merge pull request #5 from shahid017/master
patch for extracting text from text files; typo fix in README.
2 parents 4d665d7 + e368ce8 commit 2168f2d
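
Reviewer-style note on the text-file patch: reading with a bare `open(...).read()` leaves closing the handle to the garbage collector and uses the platform's default encoding. A minimal sketch of a more defensive read is shown here, under stated assumptions: `read_text_file` is a hypothetical standalone helper (it is not part of TextSpitter), and UTF-8 is an assumed default that may not match the files this module is meant to handle.

```python
import tempfile
from pathlib import Path


def read_text_file(path, encoding="utf-8"):
    """Return the full contents of a text file as a string.

    Hypothetical helper for illustration; not part of TextSpitter.
    The UTF-8 default is an assumption, pass another encoding if needed.
    """
    # The context manager closes the handle even if read() raises.
    with open(path, encoding=encoding) as handle:
        return handle.read()


if __name__ == "__main__":
    # Write a throwaway file, then read it back through the helper.
    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.txt"
        sample.write_text("first line\nsecond line\n", encoding="utf-8")
        print(read_text_file(sample))
```

Inside the class, the same pattern would presumably be applied to `self.file`, assuming that attribute holds a filesystem path as the patch implies.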

2 files changed

Lines changed: 5 additions & 4 deletions

README.md

Lines changed: 2 additions & 2 deletions
@@ -14,7 +14,7 @@ This is my first python module, so I hope I did this well!
 This module is designed to run as simply as possible. Just provide the file location string data into the argument, and get your text returned to you.
 
 ```
-from TextSpitter import TexSpitter as TS
+from TextSpitter import TextSpitter as TS
 folder_loc = 'foo/bar/'
 
 docx_file = folder_loc + 'file_thing.docx'
@@ -39,4 +39,4 @@ _*OH MY GOD, PLEASE DO.*_
 
 Just make a pull request and add whatever you want (or fix whatever you want). I'll review and approve if everything seems good.
 
-Thanks, everyone!
+Thanks, everyone!

TextSpitter/core.py

Lines changed: 3 additions & 2 deletions
@@ -52,7 +52,7 @@ def PdfFileRead(self):
             import fitz
 
             pdf_file = fitz.Document(stream=contents, filetype="pdf")
-            raw_text = [ele.getText("text") for ele in pdf_file]
+            raw_text = [ele.get_text("text") for ele in pdf_file]
             text = "".join(raw_text)
             # else:
         except Exception:
@@ -72,4 +72,5 @@ def DocxFileRead(self):
         return text
 
     def TextFileRead(self):
-        return self.get_contents()
+        text = open(self.file).read()
+        return text
