
Overview

Toffee classifies every visitor as either human or agent. It does this through two complementary systems:
  1. Client-side heuristics: six categories of detectors that run in the browser
  2. Server-side ML model: a SAINT classifier that runs on accumulated behavioral data when the session ends
Both systems produce a probability. The final classification comes from whichever system has more data available: the client-side score while the session is in progress, and the ML model's score once the session has ended and been classified.
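The selection rule can be sketched as a small function. This is a minimal illustration, assuming hypothetical result objects of the form `{ probability }` and assuming the ML result (when present) is always the one backed by more data. `finalProbability` is not part of the Toffee SDK.

```javascript
// Hypothetical sketch: the result shapes are assumptions, not Toffee's API.
// clientResult is the latest in-browser heuristic score; mlResult exists only
// after the session has ended and the server has classified it.
function finalProbability(clientResult, mlResult) {
  // The ML model sees the whole session, so prefer it whenever it exists.
  if (mlResult !== null && mlResult !== undefined) {
    return { source: 'ml', probability: mlResult.probability };
  }
  // Otherwise fall back to the most recent client-side score.
  return { source: 'client', probability: clientResult.probability };
}
```

For example, `finalProbability({ probability: 0.3 }, null)` falls back to the client score, while passing `{ probability: 0.9 }` as the second argument makes the ML score take over.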

Detectors

The SDK runs 6 categories of client-side checks:
| Category | What it looks for |
| --- | --- |
| User-Agent | Known bot/agent patterns in the user-agent string |
| Headless | Signals that indicate a headless browser (missing plugins, permissions quirks, etc.) |
| Automation | Automation frameworks like Puppeteer, Playwright, Selenium, Cypress |
| Navigator | Inconsistencies in navigator properties (Client Hints mismatches, unusual hardware values) |
| Fingerprint | Browser fingerprint anomalies (WebGL renderer, canvas behavior, extension count) |
| Behavioral | Mouse movement patterns, click behavior, scroll patterns, keystroke dynamics |
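To make the instant-phase categories concrete, here is a sketch of three simple checks. The user-agent patterns and the empty-plugin-list heuristic are illustrative assumptions, not Toffee's actual rules; `navigator.webdriver` is the standard flag that WebDriver-controlled browsers set to `true`. The hypothetical `staticSignals` function takes any navigator-like object, so it can be called as `staticSignals(navigator)` in a browser or with a plain object elsewhere.

```javascript
// Illustrative patterns only; a real detector would use a much larger list.
const AGENT_UA_PATTERNS = [/HeadlessChrome/i, /bot\b/i, /crawler/i, /spider/i];

// Returns the list of signals that fired for a navigator-like object.
function staticSignals(nav) {
  const signals = [];
  const ua = nav.userAgent || '';

  // User-Agent: known bot/agent substrings in the UA string.
  if (AGENT_UA_PATTERNS.some((re) => re.test(ua))) {
    signals.push({ category: 'user-agent', signal: 'agent-pattern' });
  }

  // Automation: WebDriver-controlled browsers expose navigator.webdriver === true.
  if (nav.webdriver === true) {
    signals.push({ category: 'automation', signal: 'webdriver-flag' });
  }

  // Headless: an empty plugin list is a classic, but weak, headless hint.
  if (nav.plugins && nav.plugins.length === 0) {
    signals.push({ category: 'headless', signal: 'no-plugins' });
  }

  return signals;
}
```

Each check on its own is easy to spoof or prone to false positives, which is why the signals are treated as evidence to be combined rather than as verdicts.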
Each detector contributes evidence, and the evidence is combined using Bayesian fusion: a probabilistic method that produces a calibrated probability (0.0–1.0) rather than an arbitrary weighted score.
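One common form of Bayesian fusion works in log-odds space: the posterior odds equal the prior odds multiplied by each detector's likelihood ratio, LR = P(evidence | agent) / P(evidence | human). The sketch below assumes that naive-Bayes form (detectors treated as conditionally independent) and a hypothetical `fuse` function. It illustrates the technique and is not Toffee's documented implementation.

```javascript
// Naive-Bayes fusion in log-odds space (an assumed model, not Toffee's code).
// prior: prior probability of "agent", strictly between 0 and 1.
// likelihoodRatios: one positive LR per detector that produced evidence.
function fuse(prior, likelihoodRatios) {
  // Convert the prior probability into log-odds.
  let logOdds = Math.log(prior / (1 - prior));
  // Bayes' rule in odds form: multiplying odds becomes adding log-LRs.
  for (const lr of likelihoodRatios) {
    logOdds += Math.log(lr);
  }
  // Map back to a probability with the logistic function.
  return 1 / (1 + Math.exp(-logOdds));
}
```

With an even prior (`0.5`), a single detector reporting LR = 9 yields a posterior of 0.9, and so do two detectors each reporting LR = 3. An LR below 1 pulls the probability toward human. Adding log-ratios also avoids numeric underflow when many detectors contribute.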

Progressive scoring

Detection isn’t a one-shot check. The SDK scores progressively as more signals become available:
| Phase | When | What happens |
| --- | --- | --- |
| Instant | Page load (t = 0) | User-agent, headless, automation, navigator, and fingerprint checks |
| Early | ~3 seconds | First behavioral signals (mouse movement, scrolling) |
| Session | ~10 seconds | Richer behavioral patterns emerge |
| Extended | ~30 seconds | High-confidence behavioral analysis |
| Continuous | Every ~15 seconds | Ongoing rescoring as new events arrive |
| Interaction | On click/scroll | Immediate rescore after user interactions |
Early phases catch obvious bots (headless browsers, known automation). Later phases catch sophisticated agents that mimic human behavior.
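The timed phases can be modeled as a schedule of rescore points. This sketch assumes that the ~15-second continuous rescoring starts after the ~30-second extended phase; the real SDK's timing may differ. Interaction-triggered rescores happen on demand, so they are not part of the fixed schedule. `rescoreSchedule` and the constants are hypothetical names.

```javascript
// Approximate phase times from the table (assumed, not SDK constants).
const PHASES = [
  { phase: 'instant', atMs: 0 },
  { phase: 'early', atMs: 3_000 },
  { phase: 'session', atMs: 10_000 },
  { phase: 'extended', atMs: 30_000 },
];
const CONTINUOUS_INTERVAL_MS = 15_000;

// Lists every scheduled rescore up to horizonMs after page load.
function rescoreSchedule(horizonMs) {
  // Fixed phases that fall within the horizon (filter returns a new array).
  const schedule = PHASES.filter((p) => p.atMs <= horizonMs);
  // Periodic rescoring after the last fixed phase.
  const last = PHASES[PHASES.length - 1].atMs;
  for (let t = last + CONTINUOUS_INTERVAL_MS; t <= horizonMs; t += CONTINUOUS_INTERVAL_MS) {
    schedule.push({ phase: 'continuous', atMs: t });
  }
  return schedule;
}
```

With a one-minute horizon, this gives rescores at 0, 3, 10, and 30 seconds, followed by continuous rescores at 45 and 60 seconds. In a browser, each entry could drive a `setTimeout`, while click and scroll listeners trigger an immediate rescore.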

ML classification

When a session ends, the server extracts four behavioral features from the session's event stream and runs them through a SAINT classifier. The model returns a human or agent classification with confidence probabilities. When it is available, the ML classification is the final word, because it draws on the full session of behavioral data rather than only on what was visible at a single point in time.
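SAINT is a transformer for tabular data, so the variable-length event stream first has to be reduced to a fixed-length feature row. This page does not name the four features Toffee uses. The sketch below therefore uses placeholder features: per-second rates of four event types. It assumes events shaped like `{ type, t }`, with `t` in milliseconds. `extractFeatures` and `FEATURE_TYPES` are hypothetical.

```javascript
// Placeholder features: the real four features are not documented here.
const FEATURE_TYPES = ['mousemove', 'click', 'scroll', 'keydown'];

// Reduces an event stream to one fixed-length row of per-second rates.
function extractFeatures(events) {
  if (events.length === 0) return FEATURE_TYPES.map(() => 0);

  // Find the session span with a loop (spreading huge arrays into Math.max can overflow).
  let minT = Infinity;
  let maxT = -Infinity;
  for (const e of events) {
    if (e.t < minT) minT = e.t;
    if (e.t > maxT) maxT = e.t;
  }
  // Duration in seconds, clamped to avoid dividing by zero.
  const durationS = Math.max((maxT - minT) / 1000, 1e-3);

  // One rate per event type gives a row that a tabular model can consume.
  return FEATURE_TYPES.map(
    (type) => events.filter((e) => e.type === type).length / durationS,
  );
}
```

For example, a two-second session containing two mouse moves, one click, and one scroll produces the row `[1, 0.5, 0.5, 0]`.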

Risk tiers

Every detection result includes a `riskTier` based on the probability that the visitor is an agent:

| Risk tier | Agent probability | Interpretation |
| --- | --- | --- |
| definite-bot | ≥ 0.95 | Almost certainly automated |
| likely-bot | ≥ 0.80 and < 0.95 | High-confidence non-human |
| suspicious | ≥ 0.50 and < 0.80 | Could go either way |
| likely-human | ≥ 0.20 and < 0.50 | Probably human |
| definite-human | < 0.20 | Almost certainly human |
| Risk tier | Suggested action |
| --- | --- |
| definite-bot / likely-bot | Block, challenge (CAPTCHA), or rate-limit |
| suspicious | Soft challenge, log for review |
| likely-human / definite-human | Allow through |
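A typical `onDetection` handler branches on `result.riskTier`:

```javascript
// `init` is provided by the Toffee SDK.
// blockOrChallenge() and showCaptcha() stand in for your application's own handlers.
const toffee = init({
  apiKey: 'YOUR_API_KEY',
  endpoint: 'https://api.toffee.at',
  onDetection: (result) => {
    switch (result.riskTier) {
      case 'definite-bot':
      case 'likely-bot':
        // Block, challenge, or rate-limit
        blockOrChallenge()
        break
      case 'suspicious':
        // Soft challenge; consider also logging for review
        showCaptcha()
        break
      default:
        // likely-human / definite-human: allow through
        break
    }
  },
})
```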