GitHub Repository
6 stars · TypeScript

HN Bot Detector - Detects LLM-Generated Comments on Hacker News

by umairnadeem123·Mar 2, 2026·7 points·3 comments

AI Analysis

●● Solid · Big Brain · Dark Horse

Clever n-gram TF-IDF matching catches paraphrased LLM phrasing that would slip past exact-match filters; it solves a real HN problem, but the use case is narrow.

Strengths
  • N-gram TF-IDF similarity with a 0.75 threshold catches paraphrases of known LLM phrases, not just exact regex matches, which makes the detector harder to evade
  • Multi-level heuristics create redundant signals: Unicode markers (curly quotes, em-dashes), structural patterns (a three-paragraph thesis layout), and semantic clustering across a user's comments
  • Post-level scanner with JSON export and optional LLM verification pass shows thoughtful feature layering for different use cases
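
To make the first strength concrete, here is a minimal sketch of thresholded n-gram TF-IDF matching. It is not taken from the repository: the function names, the tokenizer, the unigram-plus-bigram features, the smoothed IDF formula, and the sample phrases are all assumptions for illustration. Only the 0.75 threshold comes from the listing.

```typescript
// Illustrative sketch only; all names here are hypothetical, not the project's API.
const SIMILARITY_THRESHOLD = 0.75; // threshold quoted in the listing

/** Lowercase word n-grams; curly apostrophes are normalized first. */
function wordNgrams(text: string, n: number): string[] {
  const normalized = text.toLowerCase().replace(/[\u2018\u2019]/g, "'");
  const words: string[] = normalized.match(/[a-z0-9']+/g) ?? [];
  const grams: string[] = [];
  for (let i = 0; i + n <= words.length; i++) {
    grams.push(words.slice(i, i + n).join(" "));
  }
  return grams;
}

/** Unigrams plus bigrams, so partial overlaps still contribute (an assumption). */
function features(text: string): string[] {
  return wordNgrams(text, 1).concat(wordNgrams(text, 2));
}

/** Smoothed inverse document frequency over the reference phrases. */
function buildIdf(docs: string[][]): Map<string, number> {
  const docFreq = new Map<string, number>();
  docs.forEach((doc) => {
    new Set(doc).forEach((term) => {
      docFreq.set(term, (docFreq.get(term) ?? 0) + 1);
    });
  });
  const idf = new Map<string, number>();
  docFreq.forEach((count, term) => {
    idf.set(term, Math.log((1 + docs.length) / (1 + count)) + 1);
  });
  return idf;
}

/** Term frequency times IDF; unseen terms get the maximum (rarest) weight. */
function tfidfVector(
  grams: string[],
  idf: Map<string, number>,
  unseenIdf: number
): Map<string, number> {
  const counts = new Map<string, number>();
  grams.forEach((g) => counts.set(g, (counts.get(g) ?? 0) + 1));
  const vec = new Map<string, number>();
  counts.forEach((tf, g) => vec.set(g, tf * (idf.get(g) ?? unseenIdf)));
  return vec;
}

/** Cosine similarity of two sparse vectors; 0 if either is empty. */
function cosine(a: Map<string, number>, b: Map<string, number>): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((w, term) => {
    dot += w * (b.get(term) ?? 0);
    normA += w * w;
  });
  b.forEach((w) => {
    normB += w * w;
  });
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

/** Highest similarity between the candidate and any reference phrase. */
function maxSimilarity(candidate: string, references: string[]): number {
  const refFeatures = references.map(features);
  const idf = buildIdf(refFeatures);
  const unseenIdf = Math.log(1 + references.length) + 1;
  const candVec = tfidfVector(features(candidate), idf, unseenIdf);
  let best = 0;
  refFeatures.forEach((grams) => {
    best = Math.max(best, cosine(candVec, tfidfVector(grams, idf, unseenIdf)));
  });
  return best;
}

function isLikelyParaphrase(
  candidate: string,
  references: string[],
  threshold: number = SIMILARITY_THRESHOLD
): boolean {
  return maxSimilarity(candidate, references) >= threshold;
}

// Usage with made-up reference phrases:
const knownPhrases = ["it's worth noting that", "let's dive into the details"];
console.log(maxSimilarity("it is worth noting here that", knownPhrases));
```

Because the score is a cosine over weighted n-grams rather than a literal pattern match, a reworded phrase can still score partially, and the threshold decides how much rewording is tolerated. A real detector would likely score sliding windows of a comment rather than the whole text at once.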
Weaknesses
  • Single-platform scope (HN only): without support for Reddit, Twitter, or other forums, reusability is limited
  • Heuristics will drift as LLMs improve and users adapt; no clear maintenance plan or dataset versioning for recalibration
Target Audience

Hacker News moderators, comment quality researchers, bot-detection enthusiasts

Similar To

OpenAI AI Text Classifier (discontinued) · Turnitin plagiarism detection · Academic paper originality checkers
