Comments are collected twice daily through the Reddit API, and the results of the analysis are updated after each market close. Pre-processing handles duplicates, bot commentary and daily summaries, and applies regex to improve both the identification of ticker mentions and the performance of the machine learning libraries. NLTK, TextBlob and spaCy are used to remove stop words, tokenise, lemmatise and generate sentiment and subjectivity scores. Comments are vectorised with TF-IDF. Thus far, clustering and dimension reduction have been most successful with three clusters when subjectivity is not considered.
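As a rough illustration of the pre-processing step described above, the following is a minimal sketch using only the Python standard library. It is not the site's actual code: the regex, the `KNOWN_TICKERS` and `BOT_AUTHORS` sets, the comment dictionary layout and the function names are all assumptions made for the example.

```python
import re

# Hypothetical pattern: an optional "$" followed by 1-5 capital letters,
# not embedded inside a longer word or number.
TICKER_RE = re.compile(r"(?<![A-Za-z0-9$])\$?([A-Z]{1,5})(?![A-Za-z0-9])")

# Hypothetical lookup sets; a real pipeline would load these from data.
KNOWN_TICKERS = {"GME", "AMC", "TSLA", "AAPL"}
BOT_AUTHORS = {"AutoModerator"}


def extract_tickers(text):
    """Return known tickers mentioned in text, in order of first appearance."""
    found = []
    for match in TICKER_RE.finditer(text):
        symbol = match.group(1)
        if symbol in KNOWN_TICKERS and symbol not in found:
            found.append(symbol)
    return found


def preprocess(comments):
    """Drop bot comments and duplicates, then tag each comment with tickers."""
    seen = set()
    cleaned = []
    for comment in comments:
        if comment["author"] in BOT_AUTHORS:
            continue
        # Treat comments as duplicates if they match after lower-casing
        # and collapsing whitespace.
        key = " ".join(comment["body"].lower().split())
        if key in seen:
            continue
        seen.add(key)
        cleaned.append({**comment, "tickers": extract_tickers(comment["body"])})
    return cleaned
```

Checking candidate symbols against a known list is one simple way to keep capitalised words such as "SOLD" or "I" from being counted as tickers; the original pipeline may handle this differently.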
Interest Score
5
HIT Score
0.00
Domain
theerrantanalyst.com

Actual
www.theerrantanalyst.com

IP
185.199.108.153, 185.199.109.153, 185.199.110.153, 185.199.111.153

Status
OK

Category
Company