* This is applied to two global variables in environ.h.
* The VS C++ compiler mangles the names of extern variables, so this
guarantees that these global variables are found unmangled.
* Use of Pixaa in the jbig2enc library can exceed the 1M Pixa limit under certain
conditions for very large books. Each Pixa represents a separate
character class.
* The reason for the very large number of character classes is that we
recently increased the correlation threshold from 0.85 to 0.92 to
avoid confusion errors with very small Cyrillic characters.
It is likely that with the threshold at 0.92, each Pixa in the Pixaa
array has either one or a very small number of bitmaps in the class.
* This caused an innocuous error message when attempting to access an invalid index.
* Clean up notes in pixFillMapHoles() and in pixGetBackgroundGrayMap()
* Improve documentation in prog/croppdf.c and prog/compresspdf.c
* New parameter 'minw' is roughly the minimum size textblock to be kept
* The function is tuned up:
MinDistFromPeak is reduced to 30 (allow closer text lines to be found)
PeakThresholdRatio is increased to 80 (allow smaller blocks to be found)
* 2 more test cases added to prog/baseline_reg
* This is in relation to Issue #766.
* If no textbox is found, we do not know the end points of the baseline.
It is almost certainly very short, so it is removed from output.
* Change order of operation: for each baseline, save all textboxes that
describe text at that y-location. There can be multiple textboxes
for each baseline if the line of text has large horizontal breaks.
* As a result of this change, all reported baselines have x-value
endpoints of text that can optionally be returned.
* Remove bogus textblocks that are really just part of a real textblock
but have very small height and are above or below the actual textblock.
* Continue to allow more than one textbox for each baseline.
This is because large gaps between textblocks in a line make it
difficult to join safely.
* Could also add a minimal vertical closing (c1.2) to filter in order to
join bogus textboxes; not done yet because it may not be necessary.
This fixes warnings from clang:
warning: implicit conversion increases floating-point precision: 'l_float32' (aka 'float') to 'double'
Signed-off-by: Stefan Weil <sw@weilnetz.de>
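A minimal sketch of the kind of change that silences this warning. The helper
names are hypothetical and not taken from the library; only the l_float32
typedef mirrors the warning text. Either make the widening explicit with a
cast, or keep the arithmetic in single precision.

```c
typedef float l_float32;  /* mirrors the typedef named in the warning */

/* Hypothetical helper, not taken from the library. */
static double scale_to_double(l_float32 val)
{
    /* The explicit cast makes the float-to-double widening intentional. */
    return 2.5 * (double)val;
}

/* Alternative: keep the whole computation in single precision. */
static l_float32 scale_in_float(l_float32 val)
{
    return 2.5f * val;
}
```

Which form is appropriate depends on whether the caller needs the extra
precision of a double result.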
This fixes warnings from the clang compiler:
../src/bmpio.c:169:37: warning: taking address of packed member 'bmpih' of class or structure 'BMP_HEADER' may result in an unaligned pointer value [-Waddress-of-packed-member]
169 | compression = convertOnBigEnd32(bmpih->biCompression);
| ^~~~~
../src/bmpio.c:147:17: note: expanded from macro 'bmpih'
147 | #define bmpih (&bmph->bmpih)
| ^~~~~~~~~~~
../src/bmpio.c:186:31: warning: taking address of packed member 'bmpih' of class or structure 'BMP_HEADER' may result in an unaligned pointer value [-Waddress-of-packed-member]
186 | width = convertOnBigEnd32(bmpih->biWidth);
| ^~~~~
../src/bmpio.c:147:17: note: expanded from macro 'bmpih'
147 | #define bmpih (&bmph->bmpih)
| ^~~~~~~~~~~
[...]
Signed-off-by: Stefan Weil <sw@weilnetz.de>
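A sketch of one way to avoid forming a pointer to a member of a packed
struct. The struct and function names are hypothetical and are not the ones
in bmpio.c; the packed attribute syntax assumes GCC or clang. This is not
necessarily how the commit itself resolves the warning.

```c
#include <stdint.h>

/* Hypothetical inner struct with natural (4-byte) alignment. */
struct demo_info {
    uint32_t width;
    uint32_t compression;
};

/* Hypothetical packed outer struct; __attribute__((packed)) is a
 * GCC/clang extension. */
struct __attribute__((packed)) demo_header {
    uint16_t         magic;  /* 2 bytes, so 'info' starts at offset 2 */
    struct demo_info info;
};

/* Access the field through the enclosing packed struct; no pointer to
 * the possibly misaligned inner struct is formed. */
static uint32_t get_width(const struct demo_header *h)
{
    return h->info.width;
}

/* Or copy the inner struct by value into an aligned local first. */
static uint32_t get_compression(const struct demo_header *h)
{
    struct demo_info local = h->info;
    return local.compression;
}
```

Both forms let the compiler generate loads that are safe for the unaligned
offset, whereas a macro such as (&bmph->bmpih) yields a pointer whose type
promises an alignment the data may not have.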
This fixes a bug reported by the clang compiler on macOS:
CC scale2.lo
src/scale2.c:257:28: warning: floating-point comparison is always false; constant cannot be represented exactly in type 'float' [-Wliteral-range]
257 | } else if (scalefactor == 0.16667) {
| ~~~~~~~~~~~ ^ ~~~~~~~
Signed-off-by: Stefan Weil <sw@weilnetz.de>
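A small illustration of why the comparison is always false and two ways
around it. The function names are hypothetical; the choice of fix here
(a float constant or a tolerance) is an assumption and may differ from what
the commit actually does in scale2.c.

```c
/* Same comparison as the one flagged: the float is promoted to double,
 * but 0.16667 has no exact float representation, so the promoted value
 * never equals the double constant. */
static int equals_double_literal(float s)
{
    return (double)s == 0.16667;
}

/* Compare against a float constant so both sides are rounded alike. */
static int equals_float_literal(float s)
{
    const float target = 0.16667f;
    return s == target;
}

/* Or compare within a tolerance chosen by the caller. */
static int approx_equal(float s, float target, float tol)
{
    float d = s - target;
    if (d < 0.0f) d = -d;
    return d <= tol;
}
```

Passing 0.16667f to the first function returns 0, while the second and
third accept it.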
* When there is an oversized media box, the pdftoppm renderer can get
confused about the image size and generate a larger image that
is embedded in a black background.
* Added a test to misctest2.c for this functionality.
* This implements a function used in several programs, that takes a
set of input pdf files that are assumed to be page images,
and renders every page in a temp directory at a requested resolution.
* Modified prog/cleanpdf to use this function; checked against the previous
implementation and ran under valgrind.
* The new implementation is better because it writes the output files
in a temp directory that is cleaned out with each invocation of
the function. In the previous implementation, it was necessary to
remove previously rendered files by hand.
* The new implementation allows output images to be rendered at
resolutions between 50 and 300 ppi, independent of the actual
resolution of the input images wrapped in the pdf files. This is
done by assuming the input pages are 612 x 792 printer points.
* pdf files generated by applications like cleanpdf, that use this
function, will print normally on 8.5 x 11 inch paper.
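The arithmetic behind the 612 x 792 point assumption can be sketched as
follows. The helper names are hypothetical, and clamping an out-of-range
request to the 50 to 300 ppi limits is an assumption; the actual function
may handle such requests differently.

```c
/* Assumed page size in printer points (8.5 x 11 inch at 72 points/inch). */
enum { PAGE_W_PTS = 612, PAGE_H_PTS = 792, PTS_PER_INCH = 72 };

/* Hypothetical helper: clamp the requested resolution to [50, 300] ppi. */
static int clamp_res(int res)
{
    if (res < 50) return 50;
    if (res > 300) return 300;
    return res;
}

/* Rendered image width in pixels for a requested resolution. */
static int page_width_px(int res)
{
    return PAGE_W_PTS * clamp_res(res) / PTS_PER_INCH;
}

/* Rendered image height in pixels for a requested resolution. */
static int page_height_px(int res)
{
    return PAGE_H_PTS * clamp_res(res) / PTS_PER_INCH;
}
```

For example, a request for 300 ppi gives a 2550 x 3300 pixel image, and
150 ppi gives 1275 x 1650, regardless of the resolution of the images
wrapped in the input pdf.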
* This allows filling to the full width of 8.5 x 11 or A4 paper when
the h/w ratio is too large to do this without extra horizontal stretching.
* This is not needed for displaying the pdf, because the viewers display
the image as it is, without consideration of printing.
* This is from the left and right sides, and is invoked in prog/croppdf
by using a negative input parameter for edgeclean.
* New prog/misctest2.c to have page cropping and cleaning examples,
with two sample music notation images. Existing cropping and cleaning
examples have been removed from prog/misctest1.c.