Good news regarding the DharmaWheel scraper (see: Table of Contents for Malcolm Dharmawheel Posts + Astus, Krodha (Kyle Dixon), Geoff (Jnana), Meido Moore):

 
The problem with nested quotations in forum posts has been resolved. The scraper now captures every layer of quoted text rather than only the final portion, keeps the correct speaker attribution for each layer, and handles formatting inconsistencies. I have re-compiled the DharmaWheel forum posts of Acarya Malcolm Smith, Krodha (Kyle Dixon), and Astus.
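
For anyone curious how nested quotes can be walked layer by layer, here is a minimal sketch. It is not the scraper's actual code. It assumes phpBB-style markup in which each quote is a `<blockquote>` whose `<cite>` holds "Name wrote:", and the names `QuoteParser` and `parse_post` are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class QuoteParser(HTMLParser):
    """Record text at every <blockquote> depth, tagged with the quoted speaker."""

    def __init__(self):
        super().__init__()
        self.stack = []       # currently open quotes, outermost first
        self.segments = []    # (depth, speaker, text) in document order
        self.in_cite = False

    def handle_starttag(self, tag, attrs):
        if tag == "blockquote":
            self.stack.append({"speaker": None})
        elif tag == "cite" and self.stack:
            self.in_cite = True

    def handle_endtag(self, tag):
        if tag == "blockquote" and self.stack:
            self.stack.pop()
        elif tag == "cite":
            self.in_cite = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self.in_cite:
            # Assumed attribution format: "Name wrote:" -> "Name"
            if text.endswith("wrote:"):
                text = text[: -len("wrote:")].strip()
            self.stack[-1]["speaker"] = text
            return
        speaker = self.stack[-1]["speaker"] if self.stack else None
        self.segments.append((len(self.stack), speaker, text))


def parse_post(html, author):
    """Return (depth, speaker, text) tuples; depth 0 is the post author's own text."""
    parser = QuoteParser()
    parser.feed(html)
    parser.close()
    return [
        (depth, author if depth == 0 else (speaker or "unknown"), text)
        for depth, speaker, text in parser.segments
    ]


# Hypothetical sample post: Carol replies to Alice, who had quoted Bob.
SAMPLE = (
    "<blockquote><div><cite>Alice wrote:</cite>"
    "<blockquote><div><cite>Bob wrote:</cite>First point.</div></blockquote>"
    "Reply to Bob.</div></blockquote>"
    "Reply to Alice."
)

for depth, speaker, text in parse_post(SAMPLE, "Carol"):
    print("  " * depth + f"{speaker}: {text}")
```

Keeping a stack of open quotes is what lets each piece of text be attributed to the innermost speaker while the outer layers stay intact, instead of only the last fragment surviving.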
 
Several months ago, someone requested that I address this issue, but I procrastinated and have unfortunately forgotten who it was. I’d like to reconnect with that person and share the results on the blog for anyone else who might find it helpful.

 

Uploaded updated versions yesterday:

Malcolm posts in 12 files (docx and pdf and table of contents provided): https://app.box.com/s/ju3gothq09bmzzpcehv045ylwegvfzaj

Malcolm posts in 3 files (docx and pdf and table of contents provided): https://app.box.com/s/pwn72amv07cptm1wekvc2twv3k980iiv

Malcolm posts in one file (docx and pdf and table of contents provided): https://app.box.com/s/ibii96pyxps6nlhy71pj76s5mi92qxr1

Krodha (Kyle Dixon) Dharmawheel Posts: https://app.box.com/s/k0frsynnhxkivdsvjiqyhvt0zc8blbsl

Astus Dharmawheel Posts: https://app.box.com/s/ln2rvagp8u7xx0uytci78defdawgctsm

I have also pushed the updated code to GitHub. (see: Table of Contents for Malcolm Dharmawheel Posts + Astus, Krodha (Kyle Dixon), Geoff (Jnana), Meido Moore)
