TextSpitter/core.py
15 additions and 9 deletions
@@ -15,7 +15,9 @@ def __init__(
         filename: str or None = None,
     ):
         """
-        The extractor wrapper will initialize by assinging the filename to the object's file property; if a file-like object is provided instead of a name, then a file_ext arg will be required.
+        The extractor wrapper will initialize by assinging the filename to the
+        object's file property; if a file-like object is provided instead of a
+        name, then a file_ext arg will be required.
         """
         if filename:
             self.file = FileIO(filename)
@@ -26,7 +28,9 @@ def __init__(
             self.file_ext = file_obj.name.split(".")[-1]
         else:
             raise Exception(
-                "Your file object does not contain a name attribute. Please add a name attribute with a file extension, and try again. Need the file ext. data for mime-typing."
+                "Your file object does not contain a name attribute. Please"
+                " add a name attribute with a file extension, and try "
+                "again. Need the file ext. data for mime-typing."
             )
 
     @staticmethod
@@ -40,11 +44,13 @@ def get_contents(self):
             return f.read()
 
     def PdfFileRead(self):
-        """This current code provides a workaround in case MuPDF (a dependency for
-        PyMuPDF) is not usable in the development environment. For such instances,
-        the module relies on PyPDF2 to extract text data. However, because of the
-        likelihood of white spaces being rampant in the extracted string data,
-        those characters get filtered out."""
+        """
+        This current code provides a workaround in case MuPDF (a dependency
+        for PyMuPDF) is not usable in the development environment. For such
+        instances, the module relies on PyPDF2 to extract text data. However,
+        because of the likelihood of white spaces being rampant in the
+        extracted string data, those characters get filtered out.
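The second hunk takes the extension from `file_obj.name.split(".")[-1]`, which returns the whole name when there is no dot (for example `"README"` yields `"README"`). Below is a minimal sketch of a stricter fallback. It assumes a hypothetical helper named `resolve_extension` and swaps in `os.path.splitext` for the `split` call; neither is part of this commit. Raising `ValueError` rather than a bare `Exception` is also only a suggestion.

```python
import io
import os


def resolve_extension(filename=None, file_obj=None, file_ext=None):
    """Hypothetical helper: return a lowercase extension without the dot."""
    # An explicit file_ext argument always wins.
    if file_ext:
        return file_ext.lstrip(".").lower()
    # Prefer the explicit filename, then the file object's name attribute.
    name = filename or getattr(file_obj, "name", None)
    # Some file objects (opened from a descriptor) carry an int name.
    ext = os.path.splitext(name)[1] if isinstance(name, str) else ""
    if not ext:
        raise ValueError(
            "Cannot determine the file extension: pass file_ext or a "
            "file object whose name includes one."
        )
    return ext[1:].lower()


if __name__ == "__main__":
    buf = io.BytesIO(b"%PDF-1.4")
    buf.name = "upload.pdf"  # in-memory objects have no name by default
    print(resolve_extension(file_obj=buf))
    print(resolve_extension(file_obj=io.BytesIO(b""), file_ext=".docx"))
```

With this shape, a name without an extension fails loudly instead of being passed along as a bogus extension for MIME typing.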