Showing posts with label Ācārya Malcolm Smith. Show all posts

A continuation of Compilation of Acarya Malcolm Smith's 42157 Dharmawheel Posts

Added Table of Contents.

Also added compilations of posts by Astus, Krodha (Kyle Dixon), Geoff (Jnana) and Zen Master Meido Moore, all of whom John Tan and I like.


Astus Dharmawheel Posts: https://app.box.com/s/ln2rvagp8u7xx0uytci78defdawgctsm

Krodha (Kyle Dixon) Dharmawheel Posts: https://app.box.com/s/k0frsynnhxkivdsvjiqyhvt0zc8blbsl

Geoff (Jnana) Dharmawheel Posts: https://app.box.com/s/axsh5qeue8vdy47p7enztv9oryjr5qf7

Zen Master Meido Moore Dharmawheel Posts: https://app.box.com/s/lbiudrkbiueq89y2z4xvx499d6st95y8


Also, from previous post:

Malcolm posts in 12 files (docx and pdf and table of contents provided): https://app.box.com/s/ju3gothq09bmzzpcehv045ylwegvfzaj

Malcolm posts in 3 files (docx and pdf and table of contents provided): https://app.box.com/s/pwn72amv07cptm1wekvc2twv3k980iiv

Malcolm posts in one file (docx and pdf and table of contents provided): https://app.box.com/s/ibii96pyxps6nlhy71pj76s5mi92qxr1



....

Update: I noticed that some of the early posts by Krodha (Kyle Dixon) from 2011 may be prone to nihilism. That was before he was guided by Acarya Malcolm Smith. Hence do take note that everyone's understanding matures over time, and their current understanding may not reflect what was said more than a decade ago. Nonetheless, all of Kyle's postings are worth going through and very insightful, apart from a few of the earliest postings, which may be a little misleading.

Kyle spoke about this later as well, possibly in 2012:


“And to clarify, I only harp on this issue like I do because I used to carry the same view: that everything is already perfect... there's nothing to realize... there's no one here to do anything... there's no such thing as "correct" or "incorrect"... or that concepts were the enemy, and so on, and so on, and so on. All the same narratives you see being spun by most neo-nondual teachers and systems. I remember I used to argue with a friend/mentor all the time about how he doesn't get it, and he's just fooling himself with practice and so on. And I used to cite the same quotations from Longchenpa and others that were speaking from the point of view of the ultimate, and I (in my delusion) provided them as proof that I was correct etc.

Then one day that changed, and I experientially tasted what all of these masters are pointing to. And I was shown directly that I had been wrong, and that was very humbling.

That made these teachings real for me. And surprisingly, instead of continuing to reject practice, and all of these other aspects of these systems that I had previously thought to be extraneous and a waste of time... I saw their value and their place for the first time. It became clear how and why they are applied, where they fit into the scheme of things... and I saw the sheer wisdom behind the structures that I had once mistakenly rejected.

So I only speak out against those who attempt to propagate the same mistakes because I've been there. I was so certain that I was right, and that I "got it", and that others didn't understand. And I was so wrong... unbelievably wrong. 

I'm no teacher or messiah, I don't have a superiority complex or have some strange need to be "right", it's nothing like that. I simply speak out because when I see others who appear to be passionate about these teachings, making the same mistakes I made, I see myself, I can't help but to want to say "hey, it really isn't that way." And if all I accomplish is at least planting some shred of a seed of a possibility that X person may think twice and consider being open to the fact that they don't have it completely figured out, then that is good enough for me. If not, that is alright too, but at least I can say I tried......
” – Kyle Dixon

 

Kyle Dixon: Stian, Mr. J is implying that there is nothing to do, because all notions of 'anything to do', 'emptiness', 'right view', 'wrong view', 'ignorance', 'defilement' etc., are nothing more than concepts which arise and fall within the space of 'awareness' which cannot be improved upon or defiled... that is his view he is proposing. I beg to differ... to me this view is nothing more than a license for stagnation and complacency which only serves to perpetuate the issue. It is a false sense of security that one has already 'arrived' so to speak.

 

The quote applies to Mr. J, because he claims precisely what Jigme Lingpa is describing in that statement to be true, and did so directly above that quotation: Jackson's view being, nothing need be done, because all concepts (including those of the dharma such as emptiness etc.), are nothing more than thoughts which arise in what is already complete, as expressions of what is already complete. His logic therefore being, there is no need to even entertain such notions, one is already innately realized. Jigme Lingpa is stating that such a notion is an incorrect view which actually severs one from the profound dharma. Mr. J’s assertion that 'nothing needs fixin' is a view he has touted for a very long time now, it is very unskillful and misleading.

Yesterday at 1:41pm · Like

 

Mr. J: My view Kyle, is not that "nothing need be done". Bringing an end to conceptualizing is a huge task. Buddha stated "conceptualizing is a cancer". It is the sole source of samsara. If the cause of samsara ceases we only have nirvana as experience. However when that task has come to completion, then we know the space in which Nagarjuna lived along with all the masters of the Zen tradition.

Yesterday at 1:48pm · Like

 

Kyle Dixon: Stian, Yes, right and wrong should surely be understood as a necessary and indispensable duality when it comes to the dharma. Right view is that which will lead to realization, wrong view is that which will perpetuate delusion.

 

Right and wrong are conventional as well, any conceptual structure we are implementing here is conventional.

 

'Full' can only be a conventional designation, the ultimate nature of 'full' is its emptiness.

Yesterday at 1:51pm · Like · 1

 

Kyle Dixon: It's nothing more than a task that requires skillful recognition. At any rate though, it isn't simply a case of ending conceptualization... only conceptualization rejects conceptualization.

Yesterday at 1:53pm · Like · 1”


......

 https://www.facebook.com/groups/AwakeningToReality/posts/9212914758749872/?__cft__[0]=AZWQEcbdzmqyXZF9wPYroQntKm6ES_Y34ziixUsI-eHUJTyJqWy5wn_BOlh_dDhyN32SGEvaS9PoHfEc5NCclbPzXLHbOohuvh2Co_Jvn7RdSPpVz3L7RXcnpjbYjXO6ylDU16Zt2LZnPIgygEy9MmnKrnvueRKqWFPIYRlfkOg890SPpbjObGTWlIruFWVAxt4&__tn__=%2CO%2CP-R

Soh Wei Yu

I will update to include a table of contents page later.


Michael Bridge

It might be interesting to use OpenAI embeddings via Langchain to create a question/answer bot based on the content. Embeddings are a lot cheaper than fine tuning, but there would be costs involved.


Robert Dominik Tkanka

Joyful Ebbot 🧡👓


Soh Wei Yu

Michael Bridge John Tan was doing coding for similar things and for the AtR GPT project. He just said: “I have completed writing the program querying on books using OpenAI, LangChain and Chroma... but needs lots of refinement.

I wonder how to try the AI. The answers are weird... haha” “It is like querying a book and talking to a book. The books answer back.”


Soh Wei Yu

Created the third program:

User's Guide for Dharmawheel Table of Contents Generator Program

Introduction

The Table of Contents Generator Program is a tool designed to process PDF documents and automatically generate a table of contents (TOC) based on the titles in each document. Each entry in the TOC includes the title and the corresponding page number(s). The TOC for each processed document is saved as a separate PDF file.

This guide provides instructions for using the Table of Contents Generator Program.

Prerequisites

.NET Runtime installed on your system. This program is built with .NET, and you will need the .NET Runtime to run it. You can download it here.

The program file (DharmawheelTableOfContentsCreator.exe or similar). You should have received this file separately.

Your PDF documents should follow a specific format where each title is prefixed with "Title: ".

Running the Program

To run the program, open a command prompt, navigate to the directory containing the program file, and enter the following command:

DharmawheelTableOfContentsCreator.exe

This will start the program, and you will see a prompt in the command line interface.

Using the Program

When you run the program, it will first ask you to make a choice:

Enter '1' to create table of contents for all files in a directory and its subdirectories, or '2' to create a table of contents for a specific file:

Enter 1 or 2 depending on your needs, and press Enter.

Option 1: Process All Files in a Directory and Its Subdirectories

If you chose 1, you will be prompted to enter the path of the directory containing the PDF files you want to process:

Enter the directory path:

Enter the path to the directory, and press Enter. The program will then count the PDF files in that directory and its subdirectories and ask for confirmation before proceeding:

This will process [number] files. Are you sure you want to proceed? (y/n)

Enter y to proceed or n to cancel, and press Enter. If you chose to proceed, the program will start processing the files one by one, providing progress updates as it goes.

Option 2: Process a Specific File

If you chose 2, you will be prompted to enter the path of the specific PDF file you want to process:

Enter the file path:

Enter the path to the file, and press Enter. The program will then process this single file.
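The two options above boil down to "work out which PDF files to process, then confirm". Here is a minimal Python sketch of that step, not the program's actual code (which is a .NET executable). The function names `find_pdfs` and `confirmation_prompt` are made up for illustration.

```python
from pathlib import Path


def find_pdfs(target: str, recursive: bool = True) -> list[Path]:
    """Return the PDF files to process for a directory or a single file."""
    path = Path(target)
    if path.is_file():
        # Option 2: a single file, accepted only if it is a PDF.
        return [path] if path.suffix.lower() == ".pdf" else []
    # Option 1: every PDF in the directory and (optionally) its subdirectories.
    candidates = path.rglob("*") if recursive else path.glob("*")
    return sorted(p for p in candidates if p.is_file() and p.suffix.lower() == ".pdf")


def confirmation_prompt(files: list[Path]) -> str:
    """Build the confirmation message shown before batch processing."""
    return f"This will process {len(files)} files. Are you sure you want to proceed? (y/n)"
```

Matching on the lowercased suffix means files named `.PDF` are picked up too, which is an assumption about the desired behaviour.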

Output

After processing each file, the program will generate a new PDF file containing the TOC. The TOC file will be saved in the same directory as the original document, with the same name followed by _TableOfContents. For example, if the original document is named MyDocument.pdf, the TOC file will be named MyDocument_TableOfContents.pdf.

The TOC consists of lines of text. Each line includes a title and the corresponding page number(s) in the format Title: [page numbers].
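To make the output format concrete, here is a rough Python sketch of the TOC logic, assuming the text of each page has already been extracted into a list of strings (the PDF reading and writing is left out). The names `build_toc`, `format_toc` and `toc_path` are hypothetical, and merging repeated titles into one entry is my assumption about what "page number(s)" means.

```python
from pathlib import Path

TITLE_PREFIX = "Title: "


def build_toc(pages: list[str]) -> dict[str, list[int]]:
    """Map each title to the 1-based page numbers on which it appears."""
    toc: dict[str, list[int]] = {}
    for page_number, text in enumerate(pages, start=1):
        for line in text.splitlines():
            line = line.strip()
            if line.startswith(TITLE_PREFIX):
                title = line[len(TITLE_PREFIX):].strip()
                numbers = toc.setdefault(title, [])
                if page_number not in numbers:
                    numbers.append(page_number)
    return toc


def format_toc(toc: dict[str, list[int]]) -> list[str]:
    """Render one 'Title: page numbers' line per entry, in first-seen order."""
    return [f"{title}: {', '.join(map(str, numbers))}" for title, numbers in toc.items()]


def toc_path(document: str) -> Path:
    """MyDocument.pdf -> MyDocument_TableOfContents.pdf in the same directory."""
    source = Path(document)
    return source.with_name(f"{source.stem}_TableOfContents.pdf")
```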

Troubleshooting

If the program encounters an error, it will display a message in the command line interface. Note the message and consult the program's documentation or support resources. If the problem persists, contact the program's support team with the details of the error message and what you were doing when the error occurred.


Aaron Bohannon

It boggles my mind to think that those who have attained non-dual awareness can write so much. When I had attained a relatively stable, longer-term state of what we might call "non-conceptual awareness", I would always need at least an hour or so to fully find my way back to that state after writing or conversing for more than a few hours. I didn't use sitting meditation to do that, but I just needed time alone to shake off the pull away from the temptation to use language in my thinking. And on the whole, I wasn't very interested in using language much back then. One of the reasons that I haven't attempted to reestablish my relationship with that state of mind is that these days I spend most of my time writing, so I feel as though it would be relatively futile for me to make any great effort. (Now my main interest is in the philosophy of language. 🙂 )


Soh Wei Yu

Updated with Table of Contents:

Soh Wei Yu

Generated output of Malcolm's forum posts in PDF, word and text files: https://app.box.com/s/ju3gothq09bmzzpcehv045ylwegvfzaj


Soh Wei Yu

Warning: The following word and PDF documents are super huge. Save all your files before attempting to open them, as I cannot guarantee that opening the files will not crash your system.

It is recommended that you run the files in Malcolm12PartsForumPosts.zip instead from the previous post.

Malcolm posts in 3 files(docx and pdf provided): https://app.box.com/s/pwn72amv07cptm1wekvc2twv3k980iiv

Malcolm posts in one file (docx and pdf provided): https://app.box.com/s/ibii96pyxps6nlhy71pj76s5mi92qxr1

Warning: the one file contains about 10,000 pages of texts and will likely crash your system if you do not have enough RAM and processing power.


Soh Wei Yu

Longest table of contents I've ever seen: 187 pages, about the length of The Power of Now (191 pages).


 

 

Update:

 Created an app which splits Malcolm's posts into PDFs and Word documents based on keywords and themes. 

  You can now download the compilation of Malcolm's writings split up into hundreds of themes and topics here: https://drive.google.com/file/d/1eAVyuxiw4xx8qIZ3-kydO_jSKCQsktqB/view?usp=share_link


Two more programs that I created: 


DharmawheelKeywordExtractor and KeywordsCategorisedWordPDF are two console-based applications designed to work together to streamline the process of organizing and analyzing text data.

Firstly, DharmawheelKeywordExtractor is responsible for identifying and quantifying the presence of certain keywords within a collection of text files. It takes a file with specified keywords and their possible synonyms, then examines the text files in a user-defined directory to track the frequency of these keywords and their synonyms. The result of this process is a dataset highlighting the occurrence count of each keyword.
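As a rough illustration of the counting step (not the actual DharmawheelKeywordExtractor code), here is a Python sketch. It assumes the keyword file has already been loaded into a dict mapping each keyword to its list of synonyms; the real file format is not described here, and the name `count_keywords` is made up.

```python
import re
from collections import Counter


def count_keywords(posts: list[str], keywords: dict[str, list[str]]) -> Counter:
    """Count whole-word, case-insensitive hits of each keyword or its synonyms."""
    counts: Counter = Counter()
    for keyword, synonyms in keywords.items():
        terms = [keyword, *synonyms]
        # One pattern per keyword; \b keeps "rigpa" from matching inside longer words.
        pattern = re.compile(
            r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b",
            re.IGNORECASE,
        )
        counts[keyword] = sum(len(pattern.findall(post)) for post in posts)
    return counts
```

Synonym hits are added to the count of their parent keyword, so the result has one number per keyword.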

Following this, the KeywordsCategorisedWordPDF application comes into play. This program takes the output of the DharmawheelKeywordExtractor - the keywords - and further categorizes the text data from the specified directory according to these keywords. It then compiles these categorized posts into Microsoft Word and PDF documents. By doing this, KeywordsCategorisedWordPDF enhances data organization and simplifies the task of reviewing a significant volume of text data. Each output file generated contains posts that are sorted by date and grouped according to their associated keyword or synonyms.
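The grouping and date sorting can be sketched in Python like this. It is only an illustration under assumptions: posts are dicts with hypothetical "date" and "content" keys, dates follow the forum's "Thu May 04, 2023 9:40 pm" style, and a post that mentions several keywords is filed under each of them. Writing the Word and PDF files is left out.

```python
import re
from datetime import datetime

DATE_FORMAT = "%a %b %d, %Y %I:%M %p"  # e.g. "Thu May 04, 2023 9:40 pm"


def categorise_posts(posts: list[dict], keywords: dict[str, list[str]]) -> dict[str, list[dict]]:
    """Group posts under every keyword they mention, oldest post first."""
    groups: dict[str, list[dict]] = {keyword: [] for keyword in keywords}
    for post in posts:
        for keyword, synonyms in keywords.items():
            terms = [keyword, *synonyms]
            pattern = r"\b(?:" + "|".join(re.escape(term) for term in terms) + r")\b"
            if re.search(pattern, post["content"], re.IGNORECASE):
                groups[keyword].append(post)
    for matched in groups.values():
        # Sort by the parsed date, not by the raw string.
        matched.sort(key=lambda p: datetime.strptime(p["date"], DATE_FORMAT))
    return groups
```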


 Update: Table of Contents now available! Also compilation from other dharmawheel forummers available at: Table of Contents for Malcolm Dharmawheel Posts + Astus, Krodha (Kyle Dixon), Geoff (Jnana), Meido Moore



Soh Wei Yu
Next in line to compile: Astus, Krodha (Kyle Dixon), Geoff. Both John Tan and I like these authors too.


Compilation of All Dharmawheel Posts by Dzogchen teacher Acarya Malcolm Smith
So I programmed a web crawler to compile all of Malcolm's almost fifty thousand posts on Dharmawheel into text files (500 per text file; later I might combine them automatically into one big Word document).
Initially the plan was to produce it like:
"ThreeVows wrote on Thu May 04, 2023 9:21 pm:
Prasangika Madhyamaka is the doorway to the realization of the path of seeing.
Kai lord replied to ThreeVows on Thu May 04, 2023 9:58 pm:
If that is the case, then Bhāvaviveka, Asanga, etc, wouldn't have attained Arya bodhisattvahood.
ThreeVows replied to Kai lord on Thu May 04, 2023 10:03 pm:
No, I don't think that is necessarily the case. The essential point has to be assimilated, which can be done in a non-analytical manner as well, and one can still discuss other things. Dudjom Rinpoche for instance discusses how there is basically a lower and higher mode of understanding mind-only, and the intention of Asanga and Nagarjuna are aligned in truth. But I don't know that I have the capacity to discuss this precisely at this time, so I won't try overly hard.
Malcolm replied to ThreeVows on Thu May 04, 2023 10:16 pm:
Dudjom’s arguments about an inner madhyamaka are not convincing, partially predicated as they are on a mangled citation from Bhavaviveka that does not mean what he takes it to mean."
But then I realised it gets a little complex due to the way the HTML is structured, and I'm a little lazy.
So now I'm just outputting it in this format:
"Author: Malcolm
Date: Thu May 04, 2023 9:40 pm
Title: Re: Clarification on Nyingma view of emptiness
Content:
Agreed, so is the second, but the Buddha had to repeat himself because folks were not getting it. However not every sutra called “third turning” can be regarded as definitive."
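For anyone who wants to work with the text files, here is a small Python sketch of how this format could be read back into records. It is not part of my scraper. It assumes every record starts with a line beginning "Author: " and that everything after "Content:" belongs to the post until the next "Author: " line, so a post whose body contains such a line would be split wrongly. The name `parse_posts` is made up.

```python
def parse_posts(text: str) -> list[dict[str, str]]:
    """Split a text dump into dicts with author, date, title and content."""
    posts: list[dict[str, str]] = []
    current = None
    in_content = False
    for line in text.splitlines():
        if line.startswith("Author: "):
            # Assumed record boundary: a new "Author: " line starts a new post.
            current = {"author": line[len("Author: "):], "date": "", "title": "", "content": ""}
            posts.append(current)
            in_content = False
        elif current is None:
            continue  # ignore anything before the first record
        elif not in_content and line.startswith("Date: "):
            current["date"] = line[len("Date: "):]
        elif not in_content and line.startswith("Title: "):
            current["title"] = line[len("Title: "):]
        elif not in_content and line.strip() == "Content:":
            in_content = True
        elif in_content:
            current["content"] += line + "\n"
    for post in posts:
        post["content"] = post["content"].strip()
    return posts
```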
If anyone wants to work on the code to improve it to include the original format, I can send you my code in C# and in Python.
I will upload Malcolm's compilation once it's done. Currently at 1,000 posts out of 50k. I set the timer to wait 2 seconds per post to avoid getting banned.
Reason I made this: just like compiling Kyle Dixon's posts from Reddit, it is for my listening while I'm driving (text to speech on iPhone) https://www.awakeningtoreality.com/.../a-compilation-of... . And for sharing, of course, and ease of reference (can search faster). In 2019 when I met Acarya Malcolm in California he asked me if I had any backup or records of his E-Sangha forum posts. He even said we could collaborate (perhaps to help him out with compilation?). E-Sangha was the primary Buddhist forum before Dharmawheel but it was hacked and all postings were just gone. Unfortunately, I don't have any records except maybe a few, hardly any.
This time we can save all posts.



  • Soh Wei Yu
    And of course I can also easily compile other dharmawheel members by changing the target member name
  • Soh Wei Yu
    not exactly web crawler but a scraping software
  • Soh Wei Yu
    User guides for the programs:
    Introduction
    Dharmawheel Post Scraper is a console application written in C# that logs into the Dharmawheel forum (https://dharmawheel.net), collects all the posts written by a specific author, and saves them to text files. The program uses the AngleSharp library for web scraping and the RestSharp library for handling HTTP requests.
    Purpose
    The purpose of this program is to gather all posts by a specified author on the Dharmawheel forum and save them to text files, with a maximum of 500 posts per file.
    Prerequisites
    • .NET SDK 5.0 or higher installed on your system
    • NuGet packages:
    • AngleSharp
    • RestSharp
    How to Operate
    1. Open your preferred text editor or IDE and create a new C# console application project.
    2. Replace the default Program.cs file with the provided code.
    3. Install the required NuGet packages (AngleSharp and RestSharp).
    4. Update the loginData object with your Dharmawheel forum credentials:
    var loginData = new { username = "your_username", password = "your_password", login = "Login", redirect = "index.php" };
    5. Compile and run the program. It will log in to the Dharmawheel forum, scrape the specified author's posts, and save them to text files named AuthorName_posts_X.txt, where X is the file number.
    Understanding the Code
    • BaseUrl: The base URL of the Dharmawheel forum.
    • AuthorName: The name of the author whose posts you want to collect.
    • AuthorId: The ID of the author whose posts you want to collect.
    • CurrentPage: The current page of search results being processed.
    • FileCounter: The current file number being saved.
    • CollectedPosts: A list containing the collected posts.
    The Main method:
    1. Logs into the Dharmawheel forum.
    2. Iterates through the search result pages containing the author's posts.
    3. Scrapes the relevant information from each post.
    4. Saves the collected posts to text files, with a maximum of 500 posts per file.
    The CreateRestClient method:
    • Creates and configures a RestSharp RestClient instance.
    The SaveToFile method:
    • Saves the collected posts to a text file.
    The GetCorrectPost method:
    • Retrieves the correct post from the post page.
    The FormatPostContent method:
    • Formats the post content by removing HTML tags and unnecessary content.
    The GetQuoteLevel method:
    • Determines the quote level of a given element.
    Notes
    • The program may take some time to run, depending on the number of posts to be collected.
    • Make sure to replace the sample login credentials with your own Dharmawheel forum credentials before running the program.
    • The program uses a delay of 2 seconds between processing search result pages to avoid overwhelming the server.
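    The batching and delay described above can be sketched in Python as follows. This is not the C# program itself: `fetch_page` is a hypothetical callable standing in for the login, HTTP request and HTML parsing, and it is assumed to return the posts found on one search result page as a list of strings.

```python
import time
from pathlib import Path
from typing import Callable, Iterable

POSTS_PER_FILE = 500
DELAY_SECONDS = 2.0


def save_to_file(out_dir: Path, author: str, file_counter: int, posts: list[str]) -> Path:
    """Write one batch to AuthorName_posts_X.txt and return its path."""
    path = out_dir / f"{author}_posts_{file_counter}.txt"
    path.write_text("\n\n".join(posts), encoding="utf-8")
    return path


def collect_posts(
    pages: Iterable[int],
    fetch_page: Callable[[int], list[str]],  # hypothetical: returns the posts on one page
    author: str,
    out_dir: str,
    posts_per_file: int = POSTS_PER_FILE,
    delay: float = DELAY_SECONDS,
) -> int:
    """Fetch pages one by one, flushing every `posts_per_file` posts. Returns files written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    buffer: list[str] = []
    file_counter = 1
    for page in pages:
        buffer.extend(fetch_page(page))
        while len(buffer) >= posts_per_file:
            save_to_file(out, author, file_counter, buffer[:posts_per_file])
            buffer = buffer[posts_per_file:]
            file_counter += 1
        time.sleep(delay)  # pause between pages to avoid overwhelming the server
    if buffer:
        save_to_file(out, author, file_counter, buffer)
        file_counter += 1
    return file_counter - 1
```

    Passing the fetcher in as a parameter keeps the batching logic testable without touching the forum.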
  • Soh Wei Yu
    Text To Word Conversion App:
    Purpose
    This program is designed to combine multiple text files containing forum posts into a single Word document (.docx) and PDF file. The program can process a single folder or multiple subfolders, and you can choose to sort the files in ascending or descending order.
    How to use
    1. Ensure you have Microsoft Word installed on your computer, as this program relies on Microsoft.Office.Interop.Word to convert Word documents to PDF files.
    2. Compile and run the program.
    3. Follow the on-screen prompts to operate the program:
    a. Enter the folder path containing the text files you want to combine.
    b. Choose whether you want to process each subdirectory within the parent folder. Enter "y" to process subdirectories and "n" to process only the parent folder.
    c. If you chose to process only the parent folder, you will be prompted to specify the order of files. Enter "A" for ascending order (starting with the smallest file number) or "D" for descending order (starting with the largest file number).
    4. The program will then process the specified folder(s) and create a Word document and a PDF file in each folder. The newly created files will be named in the format "MalcolmCombinedPosts_SortByAscendingDate_X.docx" or "MalcolmCombinedPosts_SortByDescendingDate_X.docx" (and their corresponding PDF files), where X is a unique number.
    Important notes
    • The program expects text files to be named in the format "Malcolm_posts_#.txt", where # is a number. Ensure your text files follow this naming convention.
    • The program sorts the entries in each text file by date. Make sure the entries follow the format "Author: ...", "Date: ...", and so on, with each field on a new line.
    • The program uses the default paragraph style with custom spacing settings to adjust the line spacing between entries in the output Word document.
    • The program requires Microsoft Word to be installed on your computer to convert Word documents to PDF files. If Word is not installed, the program will not be able to create PDF files.
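    The file ordering described in the notes above can be sketched in Python like this (the real app is not written in Python, and `ordered_post_files` is a made-up name). Sorting on the parsed number rather than the file name matters, otherwise "Malcolm_posts_10.txt" would sort before "Malcolm_posts_2.txt".

```python
import re
from pathlib import Path

FILE_PATTERN = re.compile(r"^Malcolm_posts_(\d+)\.txt$")


def ordered_post_files(folder: str, order: str = "A") -> list[Path]:
    """Return Malcolm_posts_#.txt files sorted by #, "A" ascending or "D" descending."""
    numbered = []
    for path in Path(folder).iterdir():
        match = FILE_PATTERN.match(path.name)
        if match:
            numbered.append((int(match.group(1)), path))
    numbered.sort(key=lambda item: item[0], reverse=(order.upper() == "D"))
    return [path for _, path in numbered]
```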




p.s. does anyone want to volunteer their time to work on this?

Soh to ChatGPT:
now, you understand that Dharmawheel Post Scraper extracts all the posts into text files right. how do you propose that we can sort them by topic? something like, get ai to understand the content, tag by topic, then sort. or how do you suggest to do it. like for example, posts about rigpa and what it means are being automatically placed into one text file... etc

ChatGPT
To sort the collected posts by topic, you can use a combination of natural language processing (NLP) techniques and machine learning models like topic modeling. One such popular topic modeling technique is Latent Dirichlet Allocation (LDA). Here's a step-by-step guide on how to achieve this:

1. Preprocess the collected posts:
• Remove any unnecessary information such as author name, date, etc., and retain only the content of the posts.
• Tokenize the text: Split the post content into individual words.
• Remove stop words: Common words that don't carry much meaning (e.g., "the", "and", "in").
• Perform stemming or lemmatization: Reduce words to their base forms (e.g., "running" becomes "run").
2. Create a document-term matrix:
• Convert the preprocessed text into a numerical representation using techniques like Bag of Words (BoW) or Term Frequency-Inverse Document Frequency (TF-IDF). This will result in a matrix where rows represent the posts and columns represent the terms (words).
3. Apply the LDA model:
• Choose the number of topics you want to extract. This might require some experimentation to find the optimal number of topics.
• Train the LDA model on the document-term matrix.
• The LDA model will generate topic distributions for each post, which can be used to assign a primary topic to each post.
4. Sort the posts by topic:
• Create a separate text file for each topic.
• Iterate through the posts, determine their primary topic based on the LDA-generated topic distributions, and append each post to the corresponding topic text file.
Here's an outline of how you can integrate this process into your existing code:

After collecting all the posts, preprocess the text using an NLP library like NLTK (for Python) or Stanford NLP (for C#/.NET).
Create a document-term matrix using a library like scikit-learn (for Python) or Accord.NET (for C#/.NET).
Train an LDA model on the document-term matrix using a library like Gensim (for Python) or Accord.NET (for C#/.NET).
Modify the SaveToFile method to accept an additional parameter for the topic and create separate text files for each topic.
When saving the posts, use the LDA-generated topic distributions to determine the primary topic and save each post to the corresponding topic text file.
Keep in mind that this approach may not always produce perfect results, as topic modeling can be sensitive to the choice of parameters and the quality of the input data. You may need to experiment with different settings and preprocessing techniques to achieve satisfactory results.
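To make the first two steps of that answer less abstract, here is a minimal Python sketch using only the standard library. It is a toy under stated assumptions: the stop-word list is a tiny illustrative one, stemming and lemmatization are skipped, and the TF-IDF weight is the plain count times ln(N/df), which is not the smoothed variant that libraries such as scikit-learn use. The LDA step itself is not shown, and all function names are made up.

```python
import math
import re
from collections import Counter

STOP_WORDS = {"the", "and", "in", "is", "of", "a", "to"}  # tiny illustrative list


def tokenize(text: str) -> list[str]:
    """Lowercase, keep alphabetic words, drop stop words (no stemming here)."""
    return [word for word in re.findall(r"[a-z]+", text.lower()) if word not in STOP_WORDS]


def document_term_matrix(posts: list[str]) -> tuple[list[str], list[list[int]]]:
    """Bag of Words: one row per post, one column per vocabulary term."""
    docs = [Counter(tokenize(post)) for post in posts]
    vocabulary = sorted(set().union(*docs))
    matrix = [[doc[term] for term in vocabulary] for doc in docs]
    return vocabulary, matrix


def tf_idf(matrix: list[list[int]]) -> list[list[float]]:
    """Weight each count by ln(N / document frequency) of its term."""
    n_docs = len(matrix)
    n_terms = len(matrix[0]) if matrix else 0
    doc_freq = [sum(1 for row in matrix if row[j] > 0) for j in range(n_terms)]
    return [[row[j] * math.log(n_docs / doc_freq[j]) for j in range(n_terms)] for row in matrix]
```

A term that appears in every post gets weight zero, which is the point of the IDF factor: it carries no information for telling topics apart.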





User
how many mandays are needed to accomplish that
ChatGPT
The number of mandays required to accomplish this task depends on several factors, including the developer's expertise, familiarity with the required technologies, and the complexity of the dataset. Here's a rough estimate of the time required for each step of the process:

1. Preprocessing the text: 1-2 mandays
This step involves researching and implementing the appropriate NLP techniques to clean and preprocess the text data.

2. Creating the document-term matrix: 1 manday
This step involves converting the preprocessed text into a numerical representation using an appropriate library.

3. Training the LDA model: 1-2 mandays
This step includes researching the LDA model, selecting an appropriate library, and experimenting with different parameters to find the optimal number of topics.

4. Sorting the posts by topic and modifying the SaveToFile method: 1 manday
This step involves modifying the existing code to save the posts to separate text files based on their assigned topics.

5. Testing and refining the model: 2-3 mandays
This step includes testing the model on the dataset, evaluating its performance, and fine-tuning the preprocessing techniques and model parameters to achieve better results.
In total, this task might take approximately 6-9 mandays to accomplish. However, this is a rough estimate and may vary depending on the developer's skill level and experience with the required technologies. Additionally, unforeseen challenges or complications might arise, which could also affect the time required to complete the task.





User
is there a pre existing software that can do this so i dont have to develop a new software or do so much work
ChatGPT
Yes, there are pre-existing tools and libraries that can help you analyze and categorize the text data without developing a new software from scratch. Two popular Python libraries for natural language processing and topic modeling are Gensim and Scikit-learn.

Here's a high-level overview of how you can use these libraries to categorize the text data:

Preprocess the text data: You can use the Natural Language Toolkit (NLTK) library to tokenize, remove stopwords, and stem/lemmatize the text data.

Create a document-term matrix: You can use the Gensim library to convert the preprocessed text into a bag-of-words representation or use Scikit-learn's TfidfVectorizer to create a TF-IDF representation.

Train a topic model: Gensim provides the Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA) models, while Scikit-learn provides the LDA model. You can choose the appropriate model and train it on your document-term matrix.

Assign topics to the text: Once the model is trained, you can use it to assign topics to each post. Based on the assigned topics, you can then sort the posts and save them to separate text files.

There are many tutorials and resources available online that can guide you through the process of using these libraries for topic modeling. By leveraging these pre-existing tools, you can significantly reduce the development time and effort required to categorize the text data.
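The last step, assigning each post to its primary topic, is simple once a trained model has produced a topic distribution per post. A minimal Python sketch, assuming the distributions are given as plain lists of probabilities (how a particular library returns them will differ), with made-up function and file names:

```python
from collections import defaultdict


def primary_topic(distribution: list[float]) -> int:
    """Index of the most probable topic; the first one wins on ties."""
    return max(range(len(distribution)), key=lambda i: distribution[i])


def group_by_topic(posts: list[str], distributions: list[list[float]]) -> dict[int, list[str]]:
    """Collect posts under the index of their primary topic."""
    groups: dict[int, list[str]] = defaultdict(list)
    for post, distribution in zip(posts, distributions):
        groups[primary_topic(distribution)].append(post)
    return dict(groups)


def topic_filename(topic: int) -> str:
    """Hypothetical naming scheme for the per-topic text files."""
    return f"topic_{topic}.txt"
```

Each group can then be written to its own file, for example `topic_0.txt` for everything whose strongest topic is topic 0.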





User
and how many mandays if we use those tools? what is the simplest path of least resistance and shortest mandays required to accomplish my tasks. can use external tools, or not. just tell me what solutions
ChatGPT
Using the pre-existing tools mentioned above will considerably shorten the time required to accomplish your task. The total number of mandays needed depends on your familiarity with these tools and your programming skills. However, I will provide a rough estimate for each step:

Preprocess the text data (0.5-1 day): You'll need to familiarize yourself with the NLTK library and preprocess the text data accordingly.

Create a document-term matrix (0.5-1 day): Learn how to use Gensim or Scikit-learn to create a bag-of-words or TF-IDF representation of your text data.

Train a topic model (1-2 days): Understand how to use Gensim or Scikit-learn's LDA/LSA models, choose the appropriate model, and train it on your document-term matrix.

Assign topics to the text and sort them (0.5-1 day): Once you have a trained model, use it to assign topics to each post, sort them based on their assigned topics, and save them to separate text files.

Considering the estimates above, you may need approximately 3-5 mandays to complete your task using pre-existing tools. These estimates assume that you have some programming experience and will need time to learn how to use these libraries effectively.

If you are not familiar with programming or prefer a more user-friendly approach, you can consider using topic modeling tools with graphical user interfaces (GUIs) like Orange or RapidMiner. These tools might be easier to use but may not provide the same level of customization as using Python libraries directly.

In any case, the simplest path of least resistance would be to use pre-existing tools like Gensim, Scikit-learn, or GUI-based tools to minimize the development time and effort required to accomplish your task.


......



Dharmawheel Post Scraper User Guide: Listening to PDFs on iPhone and Android

This user guide explains how to download the PDF files generated by the Dharmawheel Post Scraper and listen to them on your iPhone or Android device. The text-to-speech feature lets you listen to the content while traveling, driving, or performing other tasks.

For iPhone Users:

  1. Download the PDF files:
     a. Open the Safari browser on your iPhone.
     b. Navigate to the box.com link provided, which contains the zip file with the PDF files.
     c. Tap on the zip file to download it.
     d. Once the download is complete, tap on the zip file to extract its contents. The PDF files will be visible in the Files app.

  2. Add the PDF files to the Books app:
     a. Open the Files app on your iPhone.
     b. Locate the folder containing the extracted PDF files.
     c. Tap on the "Select" button in the top-right corner.
     d. Select all the PDF files you want to add to the Books app.
     e. Tap on the "Share" button at the bottom-left corner.
     f. Scroll through the list of apps and select "Books." The PDF files will now be added to your Books library.

  3. Use the speech control panel to listen to the PDF files:
     a. Open the Settings app on your iPhone.
     b. Scroll down and tap on "Accessibility."
     c. Tap on "Spoken Content."
     d. Toggle on "Speak Screen."
     e. Open the Books app and navigate to the PDF file you want to listen to.
     f. Swipe down from the top of the screen with two fingers to activate the speech control panel. Your iPhone will start reading the text from the PDF file aloud.

For Android Users:

  1. Download the PDF files:
     a. Open the Chrome browser on your Android device.
     b. Navigate to the box.com link provided, which contains the zip file with the PDF files.
     c. Tap on the zip file to download it.
     d. Once the download is complete, use a file manager app to locate the downloaded zip file.
     e. Tap on the zip file and select "Extract" to extract its contents. The PDF files will be visible in the file manager app.

  2. Add the PDF files to a PDF reader app (e.g., Adobe Acrobat Reader, Foxit PDF Reader, etc.):
     a. Open the file manager app on your Android device.
     b. Locate the folder containing the extracted PDF files.
     c. Tap on a PDF file to open it. Your Android device may prompt you to select a PDF reader app to open the file. Choose your preferred app and select "Always" to set it as the default app for opening PDF files.

  3. Use a text-to-speech app to listen to the PDF files:
     a. Download a text-to-speech app from the Google Play Store (e.g., Voice Aloud Reader, Talk Free, etc.).
     b. Open the text-to-speech app and grant any necessary permissions.
     c. Use the app's file browser to locate and open the PDF file you want to listen to. The app will start reading the text from the PDF file aloud.

You can now listen to the content of the Dharmawheel Post Scraper PDFs on your iPhone or Android device while traveling, driving, or performing other tasks.





Update: Table of Contents now available! Compilations of posts by other Dharmawheel forum members are also available at: Table of Contents for Malcolm Dharmawheel Posts + Astus, Krodha (Kyle Dixon), Geoff (Jnana), Meido Moore



Astus Dharmawheel Posts: https://app.box.com/s/ln2rvagp8u7xx0uytci78defdawgctsm

Krodha (Kyle Dixon) Dharmawheel Posts: https://app.box.com/s/k0frsynnhxkivdsvjiqyhvt0zc8blbsl

Geoff (Jnana) Dharmawheel Posts: https://app.box.com/s/axsh5qeue8vdy47p7enztv9oryjr5qf7

Zen Master Meido Moore Dharmawheel Posts: https://app.box.com/s/lbiudrkbiueq89y2z4xvx499d6st95y8


Also, from previous post:

Malcolm posts in 12 files (docx and pdf and table of contents provided): https://app.box.com/s/ju3gothq09bmzzpcehv045ylwegvfzaj

Malcolm posts in 3 files (docx and pdf and table of contents provided): https://app.box.com/s/pwn72amv07cptm1wekvc2twv3k980iiv

Malcolm posts in one file (docx and pdf and table of contents provided): https://app.box.com/s/ibii96pyxps6nlhy71pj76s5mi92qxr1