Each sub-metric is scored 0–10 by the LLM. Final video score is 0–100.
base = (title_similarity)×5 + (focus_ratio)×3 + (time_to_content)×2
penalty = (deception)×2 + (sponsor)×1
score = base − penalty, clamped to 0–100
Channel score = average of its evaluated video scores.
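The formula above can be sketched in Python. This is a minimal illustration of the arithmetic described here; the function and parameter names are assumed, and the clamp to 0–100 follows from the stated score range.

```python
# Hypothetical sketch of the scoring formula; names are illustrative,
# not the project's actual code. Every sub-metric is a 0-10 value.

def video_score(title_similarity, focus_ratio, time_to_content,
                deception, sponsor):
    """Combine 0-10 sub-metrics into a 0-100 video score."""
    base = title_similarity * 5 + focus_ratio * 3 + time_to_content * 2
    penalty = deception * 2 + sponsor * 1
    # Clamp so the final score stays within the documented 0-100 range.
    return max(0, min(100, base - penalty))

def channel_score(video_scores):
    """Channel score is the average of its evaluated video scores."""
    return sum(video_scores) / len(video_scores)
```

A perfectly honest, focused video (all 10s, no penalties) scores 100; maximum penalties subtract 30.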
What the Metrics Mean
Title-Content Similarity (0–10)
Does the video deliver what the title promises? 0 = complete bait-and-switch, 10 = precise match.
Focus % + Time to Main Content (0–10)
What fraction of the video stays on topic, and how quickly it gets there. Higher = more focused, less preamble.
Deception Penalty (0–10)
Whether the title makes factual claims the video contradicts or never addresses. Up to −20 points.
Sponsor Penalty (0–10)
Proportion of sponsor/ad content relative to total video runtime. A 30-second ad in a 3-minute video is penalised more heavily than in a 30-minute video. Up to −10 points.
LLM Evaluation Process
Each video is evaluated using two parallel LLM calls (GPT-4o-mini):
1. Title Analysis - The LLM reads the video title and full transcript together. It identifies what the title promises in plain language, then scores title-content similarity and deception.
2. Content Analysis - The LLM analyzes the transcript with timestamps to measure focus ratio (% on-topic), time to main content, and sponsor interruption level.
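The two parallel calls can be sketched with `asyncio`. The coroutine names and returned fields are assumptions; the real versions would prompt GPT-4o-mini, and are stubbed here purely to show the concurrent structure.

```python
# Illustrative sketch of the two parallel evaluation calls described
# above. analyze_title / analyze_content are hypothetical stand-ins for
# the real LLM prompts and return fixed values here.
import asyncio

async def analyze_title(title, transcript):
    # Real version asks the LLM to score title-content similarity
    # and deception (each 0-10); stubbed for illustration.
    return {"title_similarity": 8, "deception": 1}

async def analyze_content(transcript_with_timestamps):
    # Real version measures focus ratio, time to main content,
    # and sponsor level from the timestamped transcript.
    return {"focus_ratio": 7, "time_to_content": 9, "sponsor": 2}

async def evaluate(title, transcript):
    # Run both analyses concurrently, then merge into one metric dict.
    title_metrics, content_metrics = await asyncio.gather(
        analyze_title(title, transcript),
        analyze_content(transcript),
    )
    return {**title_metrics, **content_metrics}
```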
For long transcripts that exceed the model’s context window, we chunk the transcript and aggregate metrics deterministically across all chunks.
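One deterministic way to aggregate per-chunk metrics is a duration-weighted mean. The weighting scheme below is an assumption for illustration, not the project's documented method; field names are hypothetical.

```python
# Assumed aggregation sketch: average a 0-10 metric across transcript
# chunks, weighting each chunk by its duration so long chunks count
# proportionally more than short ones.

def aggregate_chunks(chunks, key):
    """chunks: list of dicts with 'duration_s' plus per-chunk metrics."""
    total = sum(c["duration_s"] for c in chunks)
    return sum(c[key] * c["duration_s"] for c in chunks) / total
```

Because the weights depend only on chunk boundaries, the same transcript always aggregates to the same value, regardless of LLM call order.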
Model: GPT-4o-mini, chosen for its low cost and acceptable accuracy on 0–10 scored outputs.
Video Selection
15 videos are evaluated per channel - a mix of recent uploads and all-time popular videos, so the score reflects both current behaviour and historical patterns.
YouTube Shorts are excluded.
Videos longer than 90 minutes are excluded - transcripts at that length exceed the LLM’s context limit.
If a channel’s videos are predominantly over 90 minutes, the channel is skipped entirely.
Non-English channels are excluded - evaluation accuracy depends on English transcripts.
Visually-driven channels are excluded - if the content relies primarily on visuals rather than narration, transcript-based scoring isn’t a fair measure.
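The selection rules above can be expressed as a simple eligibility filter. This is a hedged sketch; the field names are assumed, and the visually-driven check is a channel-level judgment that isn't captured by per-video metadata.

```python
# Illustrative filter for the video-selection rules; field names are
# hypothetical, not the project's actual schema.
MAX_MINUTES = 90

def eligible(video):
    """Return True if a video passes the selection filters."""
    if video.get("is_short"):                 # YouTube Shorts excluded
        return False
    if video["duration_min"] > MAX_MINUTES:   # transcript would exceed context
        return False
    if video.get("language") != "en":         # English transcripts only
        return False
    if not video.get("has_transcript"):       # transcript required for scoring
        return False
    return True
```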
Why This Exists
Our feeds overstimulate us more than ever. Exaggerated titles, sensational previews, fake urgency, fake rarity - it all works amazingly well on our monkey brains. It's a natural consequence of capitalism and of algorithms that have evolved to reward our dopamine circuits as quickly as possible.
But there should be a counterforce. A platform that’s trusted and keeps content creators accountable while rewarding the honest ones.
Current Flaws
Transcripts are the only content input - visuals, tone, editing, and pacing are not evaluated.
AI scoring can be inconsistent across runs.
15 videos per channel is a small sample - outlier videos can skew a channel's score significantly.
Only English-language channels are currently supported.
Channels are selected from each of the 10 categories based on subscriber count - channels without transcripts, with primarily visual content, or with predominantly long-form content (>90 min) are excluded due to current limitations.
Future Plans
Re-evaluate with more capable LLMs for higher scoring accuracy (pending demand)
Thumbnail analysis (visual clickbait detection)
Chrome extension with real-time score overlays on YouTube
More channels across categories and languages
Feedback
Found a bug? Disagree with a score? Have a feature suggestion? Let me know.