CCExtractor version: 0.94
Necessary information
- Is this a regression (i.e. did it work before)? {NO}
- What platform did you use? {Linux}
- What were the used arguments?
{}
Video links
channel5-2018-02-12.ts from the TV Samples page
Additional information
ccextractor tries to load the Tesseract traineddata from the wrong location, then blames TESSDATA_PREFIX. Here is the output it produces:
Opening file: /home/ibrahim/Downloads/channel5-2018-02-12.ts
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
Error opening data file /usr/share/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Failed TessBaseAPIInit4 -1
I checked the logic in ocr.c and confirmed that probe_tessdata_location works correctly by tracing the syscalls it makes for each candidate tessdata location. Running strace -e trace=openat ./ccextractor ~/Downloads/channel5-2018-02-12.ts gives the following:
Opening file: /home/ibrahim/Downloads/channel5-2018-02-12.ts
openat(AT_FDCWD, "/home/ibrahim/Downloads/channel5-2018-02-12.ts", O_RDONLY) = 3
File seems to be a transport stream, enabling TS mode
Analyzing data in general mode
openat(AT_FDCWD, "./tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/tessdata/", O_RDONLY|O_NONBLOCK|O_CLOEXEC|O_DIRECTORY) = 4
openat(AT_FDCWD, "/usr/share/eng.traineddata", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/eng.traineddata", O_RDONLY) = -1 ENOENT (No such file or directory)
Error opening data file /usr/share/eng.traineddata
Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory.
Failed loading language 'eng'
Tesseract couldn't load any languages!
Failed TessBaseAPIInit4 -1
It checks the paths correctly and stops when it finds the directory at /usr/share/tessdata/. The traineddata file is then opened from /usr/share/ instead, so I suspect the problem is in the TessBaseAPIInit4 call.
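One reading of the trace above: the directory probe succeeds on /usr/share/tessdata/, yet the file is opened as /usr/share/eng.traineddata, which would be consistent with the parent directory /usr/share/ being handed to Tesseract as the data path. Assuming Tesseract 5.x expects the data path to be the tessdata directory itself (my understanding, not verified against the ocr.c source), a minimal C sketch of building and checking that path could look like the following. Both helper names are hypothetical and are not CCExtractor functions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper, not CCExtractor code: returns 1 if
 * <dir>/<lang>.traineddata exists as a regular file, 0 otherwise. */
static int has_traineddata(const char *dir, const char *lang)
{
    char path[1024];
    struct stat st;
    int n = snprintf(path, sizeof(path), "%s/%s.traineddata", dir, lang);

    if (n < 0 || (size_t)n >= sizeof(path))
        return 0; /* path would be truncated, treat as not found */
    return stat(path, &st) == 0 && S_ISREG(st.st_mode);
}

/* Hypothetical helper: writes "<prefix>/tessdata" into out, adding a
 * separator only if prefix does not already end in '/'.
 * Returns 0 on success, -1 if out is too small. */
static int tessdata_dir_for_prefix(const char *prefix, char *out, size_t out_len)
{
    assert(prefix != NULL && out != NULL);

    size_t len = strlen(prefix);
    const char *sep = (len > 0 && prefix[len - 1] == '/') ? "" : "/";
    int n = snprintf(out, out_len, "%s%stessdata", prefix, sep);

    return (n < 0 || (size_t)n >= out_len) ? -1 : 0;
}
```

With a prefix of /usr/share/, tessdata_dir_for_prefix yields /usr/share/tessdata, and calling has_traineddata on that result with "eng" checks for /usr/share/tessdata/eng.traineddata before anything is passed to TessBaseAPIInit4.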
For reference, here is the complete output of ccextractor --version on my setup:
Version: 0.94
Git commit: b1cbfcea9b9c687143bf0d80bc179b563e99d025
Compilation date: 2023-03-10
CEA-708 decoder: Rust
File SHA256: 03bf3b76ff69b73e18166558675278cae9b91f52acce532b80a480c6920b87f4
Libraries used by CCExtractor
Tesseract Version: 5.3.0
Leptonica Version: leptonica-1.82.0
libGPAC Version: 1.0.1
zlib: 1.2.11
utf8proc Version: 2.4.0
protobuf-c Version: 1.3.1
libpng Version: 1.6.37
FreeType
libhash
nuklear
libzvbi