OCR Tools for Uni and Research Notekeeping

its_me_xiphos@beehaw.org · 11 months ago


Hundun@beehaw.org · 11 months ago

Handwriting has been proven to enhance learning in humans, so you are doing great by keeping the habit!

I don’t have much to recommend, but so far this little tool has been very useful for my math studies: https://github.com/lukas-blecher/LaTeX-OCR

I am not a student, but I learn like a student all the time. I also enjoy handwriting (got an e-ink tablet for that) and knowledge management. I often dream of a “perfect setup” where everything I write gets pushed automatically through OCR into my knowledge vault (Obsidian, Logseq or whatever I/my peers happen to use). I even came up with a plan. I hope this new year leaves me enough energy to execute something useful.
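An Obsidian vault is just a folder of Markdown files, so the last step of that "write, OCR, push into the vault" plan can be quite small. Below is a minimal sketch of only that step, assuming the OCR text has already been produced by some other tool. The `Inbox` folder, the filename scheme and the frontmatter fields (`source`, `created`, `tags`) are hypothetical choices for illustration, not something Obsidian or Logseq requires.

```python
# Minimal sketch (hypothetical layout): write OCR output into an Obsidian-style vault.
import json
import re
from datetime import date
from pathlib import Path
from typing import Optional


def slugify(title: str) -> str:
    """Make a filename-safe slug: drop punctuation, turn whitespace runs into hyphens."""
    cleaned = re.sub(r"[^\w\s-]", "", title).strip()
    return re.sub(r"\s+", "-", cleaned) or "untitled"


def write_note(vault: Path, title: str, ocr_text: str, source: str,
               day: Optional[date] = None, folder: str = "Inbox") -> Path:
    """Write OCR text as a Markdown note with YAML frontmatter and return its path."""
    day = day or date.today()
    target_dir = vault / folder  # hypothetical inbox folder inside the vault
    target_dir.mkdir(parents=True, exist_ok=True)
    note = target_dir / f"{day.isoformat()}-{slugify(title)}.md"
    frontmatter = "\n".join([
        "---",
        f"source: {json.dumps(source)}",  # a JSON string is also a valid YAML string
        f"created: {day.isoformat()}",
        "tags: [ocr, needs-review]",  # illustrative tags, not required by Obsidian
        "---",
    ])
    note.write_text(f"{frontmatter}\n\n# {title}\n\n{ocr_text.strip()}\n", encoding="utf-8")
    return note
```

For example, `write_note(Path("MyVault"), "Linear Algebra: Lecture 3", text, "scan_001.pdf")` would create something like `MyVault/Inbox/<date>-Linear-Algebra-Lecture-3.md`. Tagging every note `needs-review` is deliberate: OCR of handwriting will likely need a manual pass before the note can be trusted.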

Would you like to collaborate on that perhaps?

its_me_xiphos@beehaw.org · 11 months ago

I appreciate your answer; give me time to research what you’ve laid before me and get back to you. Feel free to ping me (PM? DM? I’m new here) to discuss further. I am starting from 0 on this one, including having little to no knowledge of Obsidian beyond “Hey, that exists.”

Samsy · 11 months ago

I work in a digitalisation environment where we use OCR in different ways, sometimes with Tesseract and sometimes with Adobe. Their effectiveness differs: Tesseract needs training, while Adobe’s proprietary recognition is mostly better. Handwriting is a special case that mostly needs manual checking.
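Tesseract can report a confidence score per recognised word, which is one way to make manual checking of handwriting less tedious: only review the words it was unsure about. A minimal sketch, assuming the TSV output of the `tesseract` CLI (for example `tesseract page.png page tsv`, which writes `page.tsv`). The function name and the 60.0 threshold are assumptions; the threshold in particular should be tuned on your own scans.

```python
# Sketch: list low-confidence words from Tesseract TSV output for manual review.
import csv
import io
from typing import List, Tuple

WORD_LEVEL = 5  # in Tesseract's TSV, level 5 rows are individual words


def low_confidence_words(tsv_text: str, threshold: float = 60.0) -> List[Tuple[str, float, Tuple[int, int]]]:
    """Return (word, conf, (left, top)) for every recognised word scoring below threshold."""
    flagged = []
    # QUOTE_NONE: recognised text may contain quote characters that must not be parsed
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if int(row["level"]) != WORD_LEVEL:
            continue  # page/block/paragraph/line rows carry conf = -1 and no text
        word = (row.get("text") or "").strip()
        conf = float(row["conf"])
        if word and conf < threshold:
            flagged.append((word, conf, (int(row["left"]), int(row["top"]))))
    return flagged
```

The `(left, top)` pixel position is returned so a reviewer can jump to the right spot on the original page instead of rereading all of it.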

In my private environment I use a mix built around paperless-ngx (which only runs Tesseract OCR on a PDF if it hasn’t already been OCR’d). Paperless can edit the PDFs’ extracted output and export it as a JSON database, which I partly convert into Trilium (a database-based notebook).
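The Paperless-to-Trilium conversion could look roughly like this. It is a sketch, assuming the `manifest.json` written by paperless-ngx’s document exporter is a JSON list of Django-style records (`model`, `pk`, and `fields` containing `title` and `content`); check that against your own export before relying on it. It writes one Markdown file per document, which can then be brought in through Trilium’s import. The function name and file-naming scheme are hypothetical.

```python
# Sketch: turn a paperless-ngx export manifest into Markdown files for Trilium import.
# Assumed record shape: {"model": "documents.document", "pk": 1, "fields": {"title": ..., "content": ...}}
import json
import re
from pathlib import Path
from typing import List


def manifest_to_markdown(manifest_path: Path, out_dir: Path) -> List[Path]:
    """Write one Markdown file per exported document and return the paths written."""
    records = json.loads(manifest_path.read_text(encoding="utf-8"))
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for record in records:
        if record.get("model") != "documents.document":
            continue  # skip tags, correspondents and other exported objects
        fields = record.get("fields", {})
        title = fields.get("title") or f"document-{record.get('pk')}"
        content = (fields.get("content") or "").strip()  # the OCR'd text
        # Prefix with the primary key so two documents with the same title don't collide
        safe = re.sub(r"[^\w\s-]", "", title).strip().replace(" ", "_") or "untitled"
        path = out_dir / f"{record.get('pk')}_{safe}.md"
        path.write_text(f"# {title}\n\n{content}\n", encoding="utf-8")
        written.append(path)
    return written
```

Markdown is chosen here only because it is a simple, widely importable intermediate format; writing HTML instead would work the same way.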

I haven’t found a better solution yet, and most of my material isn’t handwritten anyway.

its_me_xiphos@beehaw.org · 11 months ago

I have some reading and learning to do, and I appreciate your reply.

its_me_xiphos@beehaw.org · 11 months ago

Thank you for the great responses so far. I’ve run into some limitations due to my university-provided laptop (running Windows 11) and my own coding inexperience. However, I am exploring a setup that uses Docker and Paperless NGX. I’ve yet to upload handwritten notes in PDF format, but for notes captured with a phone camera the OCR is abysmal. For typed PDFs, the OCR is perfect: it parsed a 100-page contract with no errors and provided the text for import into an analytical program.