Reducing false positives in SPA compares

Last week I had to blackline a 312-page scanned SPA from a 2016 VDR archive against the 2024 draft. Re-OCRing at 300 dpi and running a cleaned Word export through Litera Compare preserved clause numbering for diligence, and that’s my baseline now. For those handling M&A document reviews daily: which OCR/cleanup settings or tools (Acrobat vs ABBYY, image compression, language packs) are reducing false positives at scale without distorting citations? General information only; specific matters should go to a qualified attorney.

After last week’s equipment chat, swapping to a brand-new blade before scoring made my ears pop; dull lames drag no matter how perfect your fermentation. If you’re low on blades, chill the shaped loaf so the skin firms up and the cut lifts. @bakerlee this guide nails the angle: https://www.theperfectloaf.com/scoring-bread/.

On long SPAs, I get fewer false positives by OCRing in FineReader at 400 dpi with EN-GB + any local language, then exporting to DOCX with ‘Keep line breaks’ off and ‘don’t split words’ — soft hyphens are gremlins that blow up compares. If you’re staying in Acrobat, ‘Searchable Image (Exact)’ + a quick strip of U+00AD before Litera Compare has been the cleanest for me; any reason you’re sticking to 300 dpi?
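For anyone who wants to script the U+00AD strip rather than do it by hand, here is a minimal sketch in Python. It assumes the OCR text has already been extracted to a plain string; the function names are hypothetical, and it only touches literal soft-hyphen characters, not Word's own optional-hyphen markup inside a DOCX.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD, the invisible optional hyphen OCR engines often emit


def count_soft_hyphens(text: str) -> int:
    """Return how many soft hyphens the text contains (useful as a pre-check)."""
    return text.count(SOFT_HYPHEN)


def strip_soft_hyphens(text: str) -> str:
    """Remove every U+00AD so words split by an optional hyphen rejoin."""
    return text.replace(SOFT_HYPHEN, "")


# Hypothetical sample line: 'indemnification' carries a hidden soft hyphen.
sample = "10.2 The Seller's indem\u00adnification obligations survive Completion."
cleaned = strip_soft_hyphens(sample)
print(count_soft_hyphens(sample), count_soft_hyphens(cleaned))
```

Real hyphens (U+002D) and clause numbers are left alone, so citations should not shift; it is still worth spot-checking a few defined terms after the strip.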
