Interpreter takes very long on page with rotated text #231
Comments
+1
I've used the profiler to find out which lines take the most time to execute. It is the '.find()' call in the inline 'isany()' function, inside the 'group_textboxes()' method of 'LTLayoutContainer', that takes 65% of the time! This method takes so long because the So you can fix your problem by using
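For anyone who wants to repeat this kind of measurement, here is a minimal sketch using Python's standard-library profiler. The helper name `profile_call` is hypothetical and not part of pdfminer.six; it wraps any callable and returns a report sorted by cumulative time, which is one way a hotspot such as 'group_textboxes()' could be spotted.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func(*args, **kwargs) under cProfile.

    Returns a tuple (result, report), where report is the text of the
    `top` most expensive entries sorted by cumulative time.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Cumulative time includes time spent in callees, so expensive
    # call chains appear at the top of the report.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()
```

To profile a page-processing function, pass it as `func` together with its arguments and print the returned report.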
OK, thanks. I just ran some unit tests on normal (non-rotated) pages with detect_vertical=True and didn't see much of a performance loss, so I wonder why this is not enabled by default. This can be closed, though.
I'm not sure either why False is the default. Maybe it deteriorates the quality of the output?
Hey there guys. Thanks for spending time on that issue. For some of my PDFs I do see a slight improvement if I use For this document for example page 17 takes 6.6 seconds and page 18 takes 5.71 seconds with and without the flag. Any further suggestions? Maybe I can play with other LAParams properties?
I don't understand why this issue is closed. There is currently no solution to the problem.
Hello, are there any other solutions?
@SVasilev @migliorati the issue described by @thf24 is fixed by enabling detection of vertical text boxes. I consider this issue closed because this specific question is answered. I get that this solution does not work for all PDFs and for all code. If you have performance issues with specific PDFs, or if you think pdfminer.six is slow in general for some subset of PDFs, feel free to open a new issue.
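As an illustration of the fix discussed in this thread, here is a minimal sketch of enabling vertical text box detection. It assumes a pdfminer.six release that provides `pdfminer.high_level.extract_text`; the wrapper name `extract_text_with_vertical_detection` is hypothetical, and this is not the original poster's code.

```python
def extract_text_with_vertical_detection(pdf_path):
    """Extract text from pdf_path with vertical text box detection enabled."""
    # Imported inside the function so that pdfminer.six is only required
    # when the function is actually called.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    # detect_vertical defaults to False; every other layout parameter
    # keeps its default value here.
    laparams = LAParams(detect_vertical=True)
    return extract_text(pdf_path, laparams=laparams)
```

Calling `extract_text_with_vertical_detection("rotated.pdf")` would process the attached document with the flag set. As noted in the comments, this may not help for every PDF.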
Hi, thanks for the great project.
I noticed that the interpreter takes very long (5 minutes+ on my machine) to process a page when there is a lot of rotated text on it (90 degrees in the document, see attached).
This is the (shortened) code for reference:
rotated.pdf