Lead scoring: the complete guide
A complete, practical guide to lead scoring with forms — what it is, why it matters, how to build rule-based, AI-powered, and hybrid scoring models, and how to route qualified leads to the right place. Includes industry playbooks, benchmarks, and best practices.
What is lead scoring?
Lead scoring is the practice of assigning a numeric value — usually 0 to 100 — to every inbound lead to represent how likely they are to become a paying customer. The score compresses everything a form can collect (who the person is, what company they work for, what they need, how urgently they need it) into a single number that a sales team can act on without reading every submission line-by-line.
In practice, a lead scoring model answers three questions on every submission:
- Fit — does this person's company profile match our ideal customer profile (ICP)?
- Interest — how serious is their intent to buy, and how soon?
- Viability — can they actually afford and adopt the product?
The resulting score drives routing: high scores go to sales for immediate outreach; medium scores enter nurture sequences; low scores are either filtered out entirely or pointed at self-serve resources. Scoring moves qualification out of the human and into the form, which is dramatically more scalable, more consistent, and more measurable than manual triage.
Lead scoring vs. lead grading
Older marketing platforms distinguished two separate numbers:
- Lead score — measured interest (pages viewed, emails opened, forms filled). Behavior-driven, changes over time.
- Lead grade — measured fit (firmographic match to ICP). Mostly static once the company is identified.
Modern tools — including AnimationFunnel — merge both into a single composite lead score. Keeping two numbers added process without meaningfully improving decisions; sales reps want one prioritized queue, not a grid of A/B/C × 1 through 100. The signals that used to go into "grade" (role, team size, industry, budget) become weighted components of the score itself.
Classic scoring frameworks
Lead scoring predates software by decades. Most modern rubrics descend from one of the following sales-qualification frameworks:
- BANT — Budget, Authority, Need, Timeline. The oldest and still the most widely used. Every form field should ultimately inform one of these four dimensions.
- MEDDIC / MEDDPICC — Metrics, Economic buyer, Decision criteria, Decision process, (Paper process), Identify pain, Champion, Competition. Heavy enterprise framework; more than a form can capture on its own, but the "Economic buyer" and "Identify pain" dimensions map cleanly to form fields.
- CHAMP — Challenges, Authority, Money, Prioritization. A reordering of BANT that leads with the pain point, which tends to produce more honest answers in self-serve forms.
- GPCTBA/C&I — Goals, Plans, Challenges, Timeline, Budget, Authority / Consequences & Implications. A HubSpot-popularized framework for consultative SaaS sales.
- FAINT — Funds, Authority, Interest, Need, Timing. A budget-first reordering, popular with high-ticket services.
You don't need to pick one. Most good scoring models are eclectic — they borrow the "Budget" and "Authority" axes from BANT, the "Identify pain" axis from MEDDIC, and add product-specific signals that none of the frameworks anticipated.
MQL, SQL, SAL & PQL — where scoring fits
Scoring powers the handoff between marketing and sales. The standard stage model looks like this:
- Raw lead — anyone who completed the form. No qualification yet.
- MQL (Marketing Qualified Lead) — score ≥ 40. Enters nurture email sequences, retargeting audiences, and educational content drip campaigns. Still owned by marketing.
- SQL (Sales Qualified Lead) — score ≥ 70. Enters the CRM with an owner assigned and an SLA clock. Owned by sales.
- SAL (Sales Accepted Lead) — an SQL that a sales rep has formally accepted as workable after reviewing. Captured as a CRM stage transition.
- Hot / P0 lead — score ≥ 85. Routed to a real-time channel (Slack, SMS) for immediate follow-up within a 15-minute SLA.
- PQL (Product Qualified Lead) — a lead scored based on in-app usage signals, not form answers. Out of scope for a form scoring model, but AnimationFunnel can forward the form score to a CRM where it blends with PQL data.
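Expressed in code, the thresholds above reduce to a simple lookup. A minimal sketch in Python (the 40 / 70 / 85 cut-offs come from the list; the function name is illustrative, not an AnimationFunnel API):
def stage_for_score(score: int) -> str:
    """Map a 0-100 lead score to the funnel stage it promotes the lead into."""
    if score >= 85:
        return "hot"   # real-time routing, 15-minute SLA
    if score >= 70:
        return "sql"   # sales-owned, CRM record with an owner and an SLA clock
    if score >= 40:
        return "mql"   # marketing-owned nurture
    return "raw"       # unqualified; self-serve or filtered out

assert stage_for_score(92) == "hot"
assert stage_for_score(55) == "mql"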
Types of lead scoring
There are four dominant approaches in the market today. Most platforms use a mix:
- Demographic / firmographic scoring — scores based on who the lead is (role, company size, industry, country). Deterministic, easy to explain.
- Behavioral scoring — scores based on what the lead does (pages visited, emails opened, resources downloaded). Requires tracking infrastructure beyond the form.
- Predictive scoring — an ML model trained on historical close/won data predicts likelihood to convert. Requires a large labeled dataset and is hard to explain to stakeholders.
- Generative AI scoring — a large language model reads the submission and returns a score with natural-language reasoning. Covers free-text inputs that the other three approaches can't handle. The newest approach, and what AnimationFunnel's AI mode uses.
AnimationFunnel focuses on demographic + generative AI, which together cover 90% of what a form can see. Behavioral and predictive approaches require longitudinal data that's better handled inside a CRM or analytics platform downstream.
A brief history of lead scoring
The evolution of lead scoring over the last two decades tells the story of sales operations itself:
- 2000s — Manual qualification. A sales development rep (SDR) reads every inbound form and decides who to call. Cheap at low volume, catastrophic at scale.
- 2005–2015 — Rule-based scoring. Platforms like Marketo, Eloqua, and HubSpot introduce numeric rules: "+10 points for VP title, +20 for visiting pricing page". Scalable but brittle; requires a full-time marketing operations person to maintain.
- 2015–2023 — Predictive scoring. Tools like Einstein (Salesforce), 6sense, and Madkudu bring classical ML. Good at pattern-matching but require a huge historical dataset and produce black-box outputs that are hard to justify in deal reviews.
- 2024 onward — Generative AI scoring. LLMs read the full context of a submission — including free-text — and return scores with natural-language reasoning. Small teams without a data scientist can now run a qualitatively-better model than a mid-market company could five years ago.
AnimationFunnel was designed from the start around the generative-AI era: the form, the scoring model, and the routing engine all live in the same product so there's nothing to integrate.
Why lead scoring matters
Lead scoring is one of the highest-ROI changes a sales organization can make. The benefits compound across three dimensions: sales efficiency, marketing ROI, and customer experience.
Sales efficiency
In most B2B funnels, 60–80% of inbound leads will never buy. They are students, job seekers, competitors, agencies pitching services, or companies too small to afford the product. A fully-loaded sales rep costs $80–$200 per hour. Ten minutes of manual triage per submission, multiplied by hundreds of submissions per week, is a full SDR headcount wasted on sorting rather than selling.
Scoring flips that economy. The form runs the first pass in milliseconds; reps only ever see leads that have been pre-qualified. Typical outcomes after rolling out scoring on a mid-market SaaS funnel:
- AE time-to-first-touch for hot leads drops from hours to seconds. An ending-screen Calendly embed plus a Slack ping means the AE can be on a call before the visitor has closed the tab.
- Sales calls per rep per week drop 40–70%. Pipeline value typically grows, because every call is on a qualified account.
- Marketing CAC is measurable per tier. The cost of a hot lead is stable; the cost of the junk is separately measurable and can be used to defend or kill a channel.
- Rep burnout drops. Reps who spend the day on qualified accounts report materially higher job satisfaction than reps who spend half the day reading forms.
The lead response time effect
Decades of research (famously the Harvard Business Review study on 1.25M leads) show that contacting a lead within 5 minutes of their form submission makes them dramatically more likely to be reached and qualified than contacting them after 30 minutes; the oft-cited figure is a roughly 100× difference in contact rates. By 24 hours, the conversion rate is a fraction of the 5-minute baseline.
Lead scoring is the only practical way to hit a 5-minute SLA at scale. Without scoring, reps read every form and respond in order; with scoring, the hot 15% gets instant routing and the rest is handled asynchronously. Even if the absolute hot-lead volume is the same, scoring shifts where rep time is spent toward the moments where it matters most.
Marketing ROI & channel decisions
With raw lead counts, every channel looks similar — you get leads, the sales team complains they're bad, and nobody can prove anything. With scored leads, channel-level quality becomes measurable:
- Cost per hot lead per channel (not just cost per lead). A channel with a $200 CPL but a 40% hot rate is dramatically better than one with a $50 CPL at a 5% hot rate; the arithmetic is sketched after this list.
- UTM-sliced score distribution. Facebook leads might skew cold; Google Search leads might skew hot; a specific content asset might punch above its weight.
- Creative performance beyond clicks. Two ads with the same CTR can produce wildly different score distributions. Scoring lets you kill the visually-impressive creative that brings in junk traffic.
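As a quick worked example of the first bullet's arithmetic (the dollar figures are the illustrative ones from the list, not benchmarks):
def cost_per_hot_lead(cpl: float, hot_rate: float) -> float:
    """Cost per lead divided by the share of leads that score hot."""
    return cpl / hot_rate

# The "expensive" channel is actually half the price per qualified lead.
print(cost_per_hot_lead(200, 0.40))  # 500.0 per hot lead
print(cost_per_hot_lead(50, 0.05))   # 1000.0 per hot lead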
Customer experience
Scoring improves the visitor's experience too. The worst form flow is a hot lead being told "someone will be in touch within 48 hours", followed by silence for three days. With tier-based ending-screen routing, hot leads can book a call immediately, warm leads get a personalized next step, and cold leads get actually-useful self-serve content instead of a false promise.
How lead scoring works in AnimationFunnel
Scoring in AnimationFunnel runs server-side the moment a submission is marked complete. It's a non-blocking pipeline: the visitor sees the thank-you screen instantly while scoring runs in the background and writes its result back to the submission record.
The scoring lifecycle
- Submission received. The visitor reaches the final step; the submission is written to the primary table with status completed.
- Ending screen renders. The visitor sees the thank-you screen immediately — scoring does not add latency to the user experience.
- Scoring job enqueued. A background worker picks up the submission. Jobs are idempotent by submission ID, so retries never double-score.
- Rules evaluated. Rule-based scoring runs first (fast, deterministic). Matching rules are collected with their point contributions and reasons.
- Early-exit check. If rules produced a hard disqualifier gate, the pipeline skips AI scoring to save cost.
- AI evaluated (if enabled). The submission is sent to the configured OpenRouter model with the scoring prompt. Response JSON is parsed and validated against the schema, with up to two retries on invalid output.
- Combined & gated. The combiner merges rule and AI scores using the chosen strategy; gates apply overrides.
- Persisted. _score, _tier, _score_reason, _scored_at, _score_version, and a full _score_audit blob are written to the submission.
- Integrations fire. Webhooks, Pipedrive, Google Sheets, and Slack notifications run — all with the score included in the payload.
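A compressed, self-contained sketch of this pipeline in Python. None of the function or field names are AnimationFunnel internals, and the AI call is stubbed with a constant; the structure simply mirrors the steps above:
def score_submission(submission: dict) -> dict:
    """Idempotent scoring pass over one completed submission (keyed by its ID)."""
    # 1. Rule pass: deterministic, runs in milliseconds.
    rules_score, reasons = 0, []
    if submission.get("team_size", 0) >= 50:
        rules_score += 25
        reasons.append("Enterprise-size team (50+)")
    if submission.get("email", "").endswith("gmail.com"):
        rules_score -= 20
        reasons.append("Personal email")

    # 2. Early exit: skip the paid AI call if rules already disqualified the lead.
    disqualified = rules_score < 0
    ai_score = None if disqualified else 72  # stand-in for the OpenRouter call

    # 3. Combine (simple weighted sum), clamp, and return the audit trail.
    final = rules_score if ai_score is None else round(0.6 * rules_score + 0.4 * ai_score)
    return {
        "_score": max(0, min(100, final)),
        "_score_reason": "; ".join(reasons),
        "_score_audit": {"rules": rules_score, "ai": ai_score},
    }

print(score_submission({"team_size": 120, "email": "cto@acme.com"}))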
Performance & cost characteristics
- Rule-only scoring — 10–50ms per submission, zero marginal cost. Runs in the same worker that processed the submission.
- AI-only scoring — 1–3 seconds per submission depending on model and response length. Cost varies: $0.005–0.03 per submission on Claude Opus 4.7 or GPT-4o; under $0.001 on Haiku or GPT-4o-mini.
- Hybrid — the rule pass also acts as a cheap pre-filter. If a submission hits a hard-cold gate from rules alone, AI scoring is skipped entirely and the API call cost is saved.
- Throughput — up to 10 requests per second per workspace to OpenRouter. Higher bursts are queued and processed within seconds; there is no backpressure on form submissions themselves.
Failure handling
Scoring is designed to never block data capture. If the AI call fails (OpenRouter unavailable, timeout, invalid JSON after retries), the submission is still saved and downstream integrations still fire — the score field is simply left null and the submission is tagged _score_status: failed. Bulk re-scoring is available from Submissions → Bulk actions → Re-score once the underlying issue is resolved.
Scoring methods
There are three ways to score in AnimationFunnel. They are not mutually exclusive — most teams start with one and graduate up the ladder as the form matures and traffic grows.
Which method should you start with?
A rough decision tree:
- Form only asks structured questions (dropdowns, numbers, multi-select) → rule-based. AI would add latency and cost without meaningful accuracy lift.
- Form asks one or two free-text qualification questions ("tell us about your use case") → hybrid. Rules for structured fields, AI for free-text.
- Form is mostly free-text or conversational (open-ended discovery, unstructured interviews) → AI-based. Rules cannot do much here.
- You need a model you can explain to stakeholders → rule-based or hybrid with a rule-dominant weight. AI scoring is harder to justify in deal review.
- You run < 100 submissions per month → rule-based. AI cost is negligible, but the simplicity wins at low volume.
- You run > 50,000 submissions per month → rule-dominant hybrid. AI scoring only when a rule pre-filter confirms fit, to control cost.
What to score on — signal taxonomy
The quality of a scoring model is a function of the signals it has access to. A model with five strong signals will outperform a model with twenty weak ones. Before building rules or prompts, map every form field to one of the categories below — the strongest signals should drive the biggest weights.
Firmographic signals
Attributes of the company the lead works for. Typically the single highest-signal category for B2B.
- Company size (headcount, revenue band)
- Industry / vertical
- Country / region
- Year founded (stage)
- Public vs. private
- Parent company / subsidiary
Demographic signals
Attributes of the individual filling out the form.
- Role / job title
- Seniority (IC, manager, director, VP, C-level)
- Functional department (engineering, marketing, sales, HR)
- Email domain type (corporate vs. personal)
- LinkedIn profile presence & connection count
Technographic signals
What the lead's company currently uses. Especially valuable for SaaS products that displace specific competitors.
- Current tool in the category (direct competitor?)
- Stack adjacencies (are they on tools that integrate well?)
- Developer-first vs. non-technical organization
- Cloud provider / hosting
Intent signals
How ready the lead is to buy now, as opposed to eventually.
- Stated timeline (this week / this quarter / this year)
- Urgency language in free-text ("we need this ASAP")
- Budget range
- Decision process stage (evaluating / narrowing / buying)
- Current pain quantification ("costing us $X/month")
- Referenced a specific product feature or use case
Engagement signals
How the lead got to the form and what they did along the way. AnimationFunnel captures these automatically on every submission.
- UTM source / medium / campaign
- Referring URL
- Time on form (rushed vs. considered)
- Field revisit / re-edit counts
- Device type (mobile drive-by vs. desktop research session)
Negative signals
Just as important as positive signals. A clean scoring model catches obvious disqualifiers explicitly rather than hoping the positive weights don't accidentally promote them.
- Personal email domain (gmail, hotmail, yahoo)
- Role or company name containing "student"
- Competitor domain (from your block list)
- Geography where you don't sell / can't support
- Team size below your minimum viable customer size
- Free-text containing spam patterns, profanity, or obvious time-wasters
Explicit vs. implicit signals
Explicit signals are what the lead tells you directly (budget, timeline, role). Implicit signals are what you infer from metadata (UTM, referring page, time-on-form). Explicit is more trustworthy but rarer; implicit is abundant but noisier. A good model uses both.
First-party vs. third-party data
First-party signals come from the form itself or from your own systems. Third-party signals come from enrichment services (Clearbit, Apollo, Zoominfo) that fill in firmographics from an email address alone. Third-party data can dramatically improve fit scoring for forms that only ask for an email, but has privacy and cost implications — treat it as a conscious choice, not a default.
Rule-based scoring
Rule-based scoring is a weighted sum of answer conditions. Open the form, go to Logic → Scoring, and add rules. Each rule adds (or subtracts) points when its condition matches. The final score is the sum of all matching rules, clamped to [floor, cap].
Anatomy of a rule
A rule has four parts: a when condition, a points delta, a reason string, and optional stop / tag / requires modifiers. Conditions support equality, comparison, inclusion, string operators, and boolean composition.
rules:
# Firmographic — who is the company?
- when: "team_size >= 50"
points: 25
reason: "Enterprise-size team (50+)"
- when: "team_size >= 200"
points: 15 # stacked on top of the previous +25
reason: "Large enterprise (200+)"
- when: "industry in ['SaaS', 'Fintech', 'E-commerce']"
points: 10
reason: "Priority vertical"
# Fit — do we solve their problem?
- when: "budget >= 10000"
points: 30
reason: "In-budget"
- when: "has_existing_tool = true AND tool in ['Typeform', 'Jotform']"
points: 15
reason: "Switching from a direct competitor"
# Intent — how ready are they?
- when: "timeline in ['this month', 'this quarter']"
points: 20
reason: "Immediate timeline"
- when: "role in ['CEO', 'CTO', 'VP', 'Head of Growth']"
points: 15
reason: "Decision-maker role"
# Negatives — hard down-scores
- when: "email ends_with 'gmail.com' OR email ends_with 'hotmail.com'"
points: -20
reason: "Personal email"
- when: "country in ['CN', 'RU']"
points: -30
reason: "Geo we don't sell into"
- when: "company contains 'student' OR role contains 'student'"
points: -40
stop: true # don't evaluate further rules
reason: "Student — out of ICP"
cap: 100
floor: 0
default: 0
Available operators
- Equality — =, !=
- Comparison — >, >=, <, <=
- Inclusion — in, not in
- String — contains, starts_with, ends_with, matches (regex)
- Emptiness — is empty, is not empty
- Composition — AND, OR, NOT, parentheses
- Arithmetic — you can use basic math inside points, e.g. points: rating * 3 or points: team_size / 10 (clamped to integer).
Special rule modifiers
- stop: true — if this rule matches, no later rules are evaluated. Use for hard disqualifiers so positive signals later in the list can't rescue them.
- tag — attach a tag to the submission when the rule fires (tag: "gmail"). Useful for later filtering even when the rule contributes zero points.
- requires — only evaluate this rule if a list of earlier rules also matched (requires: ["in_budget"]). Lets you compose "bonus for decision-makers, but only if in budget" rules.
- id — give the rule a stable identifier so you can reference it from requires and so the audit log remains meaningful even after re-ordering.
Per-field-type tips
- Dropdown / Multiple choice — ideal for scoring since the set of values is finite. Treat this as the primary signal structure.
- Number — great for budget, team size, revenue bands. Use >= thresholds rather than exact matches so future values still score cleanly.
- Email — use ends_with to down-score free mail domains; use matches with a regex for custom domain allow/block lists.
- Rating / Scale (1–5, 1–10) — multiply the rating directly into the score (points: rating * 3).
- Date — score urgency: a date within 30 days is worth more than one six months out.
- Phone — the mere presence of a phone number is a strong intent signal. Visitors who volunteer a phone number are typically far along in their purchase decision.
- Short text / Long text — skip for rule-based scoring and hand to AI. Regex rules on free text tend to be brittle and miss synonyms.
- File upload — presence of a file (e.g. attached RFP) is often a strong buying-intent signal worth +10 to +30.
Testing your rule set
Before shipping, run the current rule set against historical submissions from Logic → Scoring → Replay. The replay shows the resulting score distribution, tier counts, and side-by-side comparisons with your previous scoring model. Aim for a distribution where hot is 10–20% of leads, warm is 30–40%, and cold is the rest. If everyone is hot, the rules are too loose; if nobody is, they are too tight or miscalibrated.
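To sanity-check a replay against those targets, a small script over an exported list of scores is enough. The tier thresholds below are illustrative (they match the SaaS playbook later in this guide, not a fixed platform default):
from collections import Counter

def tier_of(score: int) -> str:
    if score >= 75: return "hot"
    if score >= 45: return "warm"
    if score >= 20: return "cold"
    return "disqualified"

def distribution(scores: list[int]) -> dict[str, float]:
    """Share of submissions per tier; compare against the 10-20% hot / 30-40% warm targets."""
    counts = Counter(tier_of(s) for s in scores)
    return {tier: round(counts[tier] / len(scores), 2)
            for tier in ("hot", "warm", "cold", "disqualified")}

print(distribution([88, 62, 41, 17, 55, 79, 33, 9, 70, 48]))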
AI-based scoring
AI scoring passes the submission to a large language model via the OpenRouter integration and asks it to return a structured score. This is the only way to score free-text answers like "Describe your use case" with any nuance.
Choosing a model
OpenRouter gives you access to 100+ models. The right one depends on cost, latency, and how subtle the judgments are. Recommendations:
- High-stakes B2B scoring — anthropic/claude-opus-4-7. Strongest reasoning, best instruction-following, highest cost (~$0.02 per submission). Ideal when a single hot lead is worth thousands of dollars.
- Mid-market, high volume — anthropic/claude-sonnet-4-6 or openai/gpt-4o. 5–10× cheaper than Opus, still very accurate on qualification tasks.
- Very high volume or simple classification — anthropic/claude-haiku-4-5 or openai/gpt-4o-mini. Sub-$0.001 per call, ideal when the rubric is simple (e.g. pure intent classification into 3–5 buckets).
- Self-hosting or data residency as a hard constraint — meta-llama/llama-3.1-70b-instruct via a private OpenRouter endpoint, or point the integration at your own inference server.
Writing a scoring prompt
Good scoring prompts have four parts: a persona (who is judging), a rubric (what "good" looks like at each score band), the data (the submission), and the output schema (exactly what JSON to return).
System:
You are a senior B2B sales qualifier at a PLG SaaS company.
Your job is to score inbound leads from 0-100 on likelihood to close
within the next two quarters.
Rubric:
- 85-100: Enterprise-fit, decision-maker, in-budget, urgent timeline,
specific use case that matches our product strengths.
- 70-84: Clear mid-market fit, decision-maker or influencer, budget
plausible, 1-2 quarter timeline.
- 50-69: Early-stage interest, some fit signals, long or unclear
timeline, or one major concern (budget, geo, industry).
- 30-49: Weak signals, exploratory, or outside our ICP but not
obviously disqualified.
- 0-29: Students, job seekers, competitors, wrong geo, or personal
use.
Be strict. Most leads are NOT 85+. Reserve high scores for clear fit.
Return ONLY valid JSON, no prose:
{
"score": integer 0-100,
"tier": "hot" | "warm" | "cold" | "disqualified",
"reason": string (max 200 chars, sales-rep-facing),
"red_flags": array of strings (empty if none),
"strengths": array of strings (empty if none)
}
User:
Company: {{field:company}}
Role: {{field:role}}
Team size: {{field:team_size}}
Industry: {{field:industry}}
Budget: {{field:budget}}
Timeline: {{field:timeline}}
Use case: {{field:use_case}}
Current tool: {{field:current_tool}}
Email: {{field:email}}
Prompt design principles
- Anchor the rubric in percentiles, not vibes. "Reserve 85+ for the top 10% of leads you actually see" produces more consistent scoring than "score enterprise leads high".
- Tell the model to be strict. LLMs are naturally generous. Without an explicit "be strict" instruction, score distributions skew toward 60–80 and everyone looks warm.
- Require reasoning on both sides. The red_flags and strengths arrays force the model to look for negative signals, not just build a case for the score.
- Keep the prompt version-controlled. Store the prompt in the integration config, not in ad-hoc edits. Bump a version string when you change it.
- Never let the prompt leak PII to the logs. AnimationFunnel's integration log masks email addresses and phone numbers by default; keep the feature on.
JSON schema & validation
Enable JSON mode in the OpenRouter integration and paste a schema to force valid output. AnimationFunnel validates every response against the schema — if a field is missing or the score is out of range, the response is retried up to two times before falling back to rule-only scoring.
{
"type": "object",
"required": ["score", "tier", "reason"],
"properties": {
"score": { "type": "integer", "minimum": 0, "maximum": 100 },
"tier": { "enum": ["hot", "warm", "cold", "disqualified"] },
"reason": { "type": "string", "maxLength": 240 }
}
}
Temperature & determinism
Default temperature for scoring is 0.2 — low enough that the same submission scores consistently, high enough that the reason field reads naturally. Lower to 0 if you need full determinism for audit; raise to 0.5 only if you actively want varied reasoning text.
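The parse-validate-retry behaviour described above can be sketched in a few lines. This uses the open-source jsonschema library; call_model stands in for the OpenRouter request and is passed in as a stub here, so nothing below is AnimationFunnel's internal code:
import json
from jsonschema import validate, ValidationError

SCHEMA = {
    "type": "object",
    "required": ["score", "tier", "reason"],
    "properties": {
        "score": {"type": "integer", "minimum": 0, "maximum": 100},
        "tier": {"enum": ["hot", "warm", "cold", "disqualified"]},
        "reason": {"type": "string", "maxLength": 240},
    },
}

def score_with_retries(submission_text: str, call_model, max_retries: int = 2):
    """Call the model, validate the JSON against the schema, retry up to twice."""
    for attempt in range(max_retries + 1):
        raw = call_model(submission_text, temperature=0.2)
        try:
            parsed = json.loads(raw)
            validate(parsed, SCHEMA)
            return parsed
        except (json.JSONDecodeError, ValidationError):
            if attempt == max_retries:
                return None  # caller falls back to rule-only scoring

# Stub model that returns invalid output first, then valid JSON on the retry:
replies = iter(['not json', '{"score": 82, "tier": "hot", "reason": "CEO, in budget"}'])
print(score_with_retries("Company: Acme ...", lambda text, temperature: next(replies)))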
Cost management
Three built-in controls prevent runaway bills:
- Per-submission cap — requests that would exceed a dollar threshold are rejected before hitting OpenRouter.
- Monthly cap — pauses the integration and alerts workspace admins when reached. Integrations fall back to rule-only scoring until the cap resets or is raised.
- Pre-filter gates — rules can short-circuit AI scoring. If a submission is clearly disqualified by firmographics alone, skip the API call entirely.
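A minimal sketch of how the three controls compose before any API call is made (the cap values and function name are illustrative, not AnimationFunnel configuration keys):
def should_run_ai(estimated_cost: float, month_spend: float, rules_disqualified: bool,
                  per_submission_cap: float = 0.05, monthly_cap: float = 200.0) -> bool:
    """All three controls must pass before the OpenRouter call is made."""
    if rules_disqualified:                          # pre-filter gate: rules already said no
        return False
    if estimated_cost > per_submission_cap:         # per-submission dollar cap
        return False
    if month_spend + estimated_cost > monthly_cap:  # monthly cap: fall back to rule-only scoring
        return False
    return True

print(should_run_ai(0.02, 120.0, rules_disqualified=False))   # True
print(should_run_ai(0.02, 199.99, rules_disqualified=False))  # False: would breach the monthly cap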
Hybrid scoring
Most mature teams end up here: rules for deterministic signals (budget, geo, role) and AI for the nuanced signals (intent, fit, urgency from free-text). The two are combined with a configurable formula, and hard gates sit on top of the combined output.
Combine strategies
- weighted_sum — final score is rules_weight * rules + ai_weight * ai. Most common; use 0.6 / 0.4 (rule-dominant) when stakeholders need explainability, 0.4 / 0.6 (AI-dominant) when the form is mostly free-text.
- max — take whichever model scored higher. Useful when you want either signal to be sufficient.
- min — take the lower of the two. A conservative default — both signals must agree before calling a lead hot.
- rules_with_ai_bonus — rules set the floor, AI can add 0–20 bonus points for nuance. Good when your rules already work well and AI is a refinement, not a replacement.
- ai_with_rule_gate — AI sets the score, but rules can clamp or disqualify. Good for mostly-free-text forms where structured fields only exist as sanity checks.
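A sketch of the combine strategies above. The 0.6 / 0.4 weights and the 0–20 bonus range follow the descriptions in the list; the exact bonus formula is an illustrative interpretation, not the platform's implementation:
def combine(rules: int, ai: int, strategy: str, rules_weight: float = 0.6) -> int:
    """Merge the rule score and AI score into one 0-100 number."""
    if strategy == "weighted_sum":
        score = rules_weight * rules + (1 - rules_weight) * ai
    elif strategy == "max":
        score = max(rules, ai)          # either signal is sufficient
    elif strategy == "min":
        score = min(rules, ai)          # both signals must agree
    elif strategy == "rules_with_ai_bonus":
        score = rules + min(20, max(0, ai - rules))  # rules set the floor, AI adds up to 20
    elif strategy == "ai_with_rule_gate":
        score = ai                      # gates (not shown) can still clamp or disqualify
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return max(0, min(100, round(score)))

print(combine(rules=70, ai=90, strategy="weighted_sum"))  # 78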
Gates — hard overrides
Gates run after combination and override the tier (and optionally the score). They are your safety net — the place to encode "no matter what the model says, X should never happen".
combine:
strategy: "weighted_sum"
weights:
rules: 0.6
ai: 0.4
gates:
# Hard disqualifiers — rules already caught these, but just in case.
- when: "rules.score < 20"
clamp_tier: "cold"
reason: "Rules failed hard disqualifier"
# Even if AI is thrilled, no budget = no hot tier.
- when: "budget < 1000 AND budget is not empty"
clamp_tier: "warm" # cap at warm, don't let AI push to hot
# Consistency check — if rules say cold but AI says hot, trust
# rules and flag for review.
- when: "rules.tier = 'cold' AND ai.tier = 'hot'"
clamp_tier: "warm"
tag: "needs-review"
# Named-account override — if the lead's company is on our
# strategic accounts list, always route to the named AE regardless
# of score.
- when: "company in named_accounts"
clamp_tier: "hot"
owner: "named-account-owner"
Common anti-patterns
- Stacking too many weights. Three signals combined with different weights is fine; seven is an unexplainable soup. If you find yourself tuning a fifth weight, rethink the rubric.
- Using AI for hard filters. "Is this person a student?" should be a rule on the company / role fields, not an AI judgment. AI is fuzzy by design; disqualifiers should be crisp.
- Letting AI override negative rules. If your rules strongly down-score a signal (personal email, wrong geo), set a gate so AI can't rescue it. Otherwise you'll ship "hot" leads with @gmail.com addresses to your top AE.
- Ignoring the drift. Prompts drift. Rule sets drift. Without monthly review, a model that worked in Q1 is subtly wrong by Q3. Block 30 minutes a month.
Using the score
Once scored, the submission exposes a set of new fields that can be used across AnimationFunnel and any connected system.
The score fields
- _score — integer 0–100.
- _tier — string: hot, warm, cold, disqualified (thresholds configurable).
- _score_reason — human-readable explanation for the sales rep.
- _scored_at — ISO timestamp of when scoring ran.
- _score_status — ok, failed, or skipped.
- _score_version — string identifier for the scoring model that ran (e.g. v2026-04). Used to group historical submissions by the model that scored them.
- _score_audit — JSON blob with the full trace: matched rules, AI raw response, gates applied, combine result. Used for debugging and the UI's explainability pane.
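Putting the fields together, a scored submission record might look like the following. All values are invented for illustration; only the field names come from the list above:
scored_submission = {
    "id": "sub_01HXYZ",
    "company": "Acme Analytics",
    "_score": 82,
    "_tier": "hot",
    "_score_reason": "VP at a 300-person SaaS company, $25k budget, this-quarter timeline",
    "_scored_at": "2026-04-12T09:41:27Z",
    "_score_status": "ok",
    "_score_version": "v2026-04",
    "_score_audit": {
        "rules": {"score": 75, "matched": ["team_size >= 50", "budget >= 10000"]},
        "ai": {"score": 88, "red_flags": [], "strengths": ["specific use case"]},
        "combine": {"strategy": "weighted_sum", "weights": {"rules": 0.6, "ai": 0.4}},
        "gates": [],
    },
}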
Where the score is available
- Ending screens — show a different thank-you message per tier. Recall tokens {{field:_tier}} and {{field:_score}} work in any copy field.
- Pipedrive mapping — write _score to a custom deal field; drive pipeline, stage, and owner assignment by _tier.
- Webhook payloads — every scored field appears at the top level of the signed webhook body.
- Google Sheets sync — a column per scored field, automatically added when scoring is enabled.
- Dashboard filters — filter, sort, and save views by tier. Slack digests can be scoped to "hot leads from the last 24 hours".
- CSV / Excel exports — included in every export as separate columns.
- API — the GET /v1/submissions endpoint supports ?min_score=70 and ?tier=hot filters; a request sketch follows this list.
- Email automation — downstream tools can use the tier as a segmentation criterion in email sequences.
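A request sketch against the endpoint mentioned in the API bullet. The base URL, auth header, and response shape are assumptions to check against your workspace's API settings; only the path and the min_score / tier filters come from the list above:
import requests

# Base URL and API key are placeholders; check your workspace's API settings.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"

resp = requests.get(
    f"{BASE_URL}/submissions",
    params={"min_score": 70, "tier": "hot"},            # filters from the list above
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=10,
)
resp.raise_for_status()
for sub in resp.json().get("data", []):                  # response shape is an assumption
    print(sub.get("_score"), sub.get("_tier"), sub.get("_score_reason"))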
Dashboards & analytics
The form's Analytics → Scoring tab gives you:
- Score distribution histogram (how many leads at each bucket)
- Tier trend over time (are things getting better or worse?)
- Top rules by fire count and average contribution
- Source breakdown — score distribution sliced by UTM source, medium, or campaign
- Conversion rate by tier — once you wire close-won data back from the CRM, you can see "hot leads close at 34%, warm at 8%"
- Model diff — side-by-side comparison of two scoring versions on the same submissions
Routing by score
The most common use of the score is routing to the right team, pipeline, and experience based on tier. Configure under Logic → Routing.
Basic tier routing
routes:
- when: "_tier = 'hot'"
ending: "book-demo-now"
pipedrive:
pipeline: "Sales — Inbound"
stage: "Qualified"
owner: "round-robin:AEs"
notify:
- "slack:#sales-hot"
- "email:[email protected]"
sla_minutes: 15
- when: "_tier = 'warm'"
ending: "thanks-we-will-reach-out"
pipedrive:
pipeline: "Nurture"
stage: "New"
email_sequence: "nurture-14-day"
sla_minutes: 1440 # 24 hours
- when: "_tier = 'cold'"
ending: "self-serve-resources"
pipedrive: null # don't create a deal
email_sequence: "self-serve-welcome"
- when: "_tier = 'disqualified'"
ending: "not-a-fit-thank-you"
pipedrive: null
suppress_marketing: true
Advanced routing patterns
- Geo + tier — hot US leads go to US AEs; hot EU leads go to EU AEs with different SLA windows aligned to working hours.
- Round-robin with OOO skip — owner lists respect vacation calendars; out-of-office reps are skipped automatically.
- Load-balanced assignment — owners are weighted by current open-deal count so no single AE gets buried.
- Named-account override — if the submitter's company appears on the strategic-accounts list, bypass tier routing and go directly to the named AE regardless of score.
- SLA escalation — if a hot lead isn't touched within the SLA window, re-notify a manager channel.
- Tier-based enrichment — only run expensive third-party enrichment (Clearbit, Apollo) on hot or warm leads to control cost.
- Tiered ending experience — hot leads see a Calendly embed; warm see a video intro and promised follow-up; cold see product docs and pricing.
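The round-robin-with-OOO-skip and load-balanced patterns above can be illustrated with a plain owner list. This is a logic sketch only; the owner data and field names are hypothetical, not AnimationFunnel configuration:
from itertools import cycle

owners = [
    {"name": "Ana",   "ooo": False, "open_deals": 7},
    {"name": "Ben",   "ooo": True,  "open_deals": 2},   # on vacation: skipped
    {"name": "Chloe", "ooo": False, "open_deals": 3},
]

def round_robin_with_ooo(owners):
    """Yield available owners in a repeating cycle, skipping anyone out of office."""
    for owner in cycle(owners):
        if not owner["ooo"]:
            yield owner["name"]

def load_balanced(owners):
    """Assign to the available owner with the fewest open deals."""
    available = [o for o in owners if not o["ooo"]]
    return min(available, key=lambda o: o["open_deals"])["name"]

rr = round_robin_with_ooo(owners)
print(next(rr), next(rr), next(rr))   # Ana Chloe Ana
print(load_balanced(owners))          # Chloe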
Industry playbooks
B2B SaaS demo request
A typical end-to-end setup for a SaaS company with a "Request a demo" form.
- Form — 4 steps: contact info; company & role; budget & timeline; open-ended use case.
- Rules (60% weight) handle structured fields: +30 for budget ≥ $10k, +25 for team size ≥ 50, +20 for decision-maker roles, +15 for priority verticals, −20 for personal email, −30 for disqualified geos.
- AI (40% weight) reads use-case free-text and returns 0–100 for specificity, urgency, and product fit.
- Gates: rules < 20 clamps to disqualified; explicit budget < $1k caps at warm.
- Tiers: hot ≥ 75, warm 45–74, cold 20–44, disqualified < 20.
- Hot routing — Calendly ending, Pipedrive deal in Sales-Inbound/Qualified, round-robin to AEs, Slack ping, 15-minute SLA.
- Warm / cold routing — nurture sequence + self-serve respectively, no AE touch.
Expected outcome after 3 months: AE time shifts to the top ~15% of leads; hot-tier close rate moves from mid-teens to low-30s; warm self-serves or nurtures its way up; cold volume stops hitting the AE queue entirely.
E-commerce wholesale application
A DTC brand opens a wholesale application form for retail partners.
- Form collects: store name, store type (boutique / chain / online-only), annual revenue band, number of locations, current brands carried, order volume expectation, target markets, website URL.
- Rules: +25 for boutique or chain, +20 for revenue ≥ $1M, +15 for carrying complementary brands, −30 if carrying direct competitors, −25 if website URL is empty or a marketplace listing.
- AI evaluates the complementary-brands text and scores positioning fit 0–25.
- Tiers: hot ≥ 70 (account manager follow-up within 48h), warm 40–69 (self-serve wholesale portal), cold < 40 (polite decline + retail-store recommendations).
Education cohort admissions
A bootcamp or cohort-based course scoring applicant fit.
- Form: background, current role, goal, time commitment (hours/week), reason for applying, referral source.
- Rules: +25 for role alignment with track (e.g. "junior developer" for a senior-eng track scores 0), +20 for committed time ≥ 10h/week, +15 for a clear career goal stated.
- AI evaluates the "reason for applying" long-text for earnestness, specificity, and likelihood to complete the program.
- Tiers: hot → direct interview invite; warm → asynchronous screening exercise first; cold → rejection with feedback and resource recommendations.
Healthcare patient intake
A clinic or telehealth service prioritizing acute cases.
- Form: symptoms (multi-select), duration, severity (scale 1–10), existing diagnoses, insurance provider, preferred appointment window.
- Rules: +30 for acute-symptom categories, +25 for severity ≥ 8, +15 for duration > 2 weeks, +10 for in-network insurance.
- AI evaluates the free-text symptom description for red-flag phrases indicating urgency ("chest pain", "shortness of breath") — subject to strict prompt-safety guardrails; AI is never the sole basis for clinical decisions.
- Tiers: urgent → same-day slot auto-held; standard → next-available; consultation → asynchronous message-based review first.
Agency client intake
An agency qualifying inbound project inquiries.
- Form: company, industry, project type, budget range, timeline, project description, decision-maker status.
- Rules: +30 for budget ≥ agency minimum, +20 for in-service industries, +15 for decision-maker contact, −25 for "still gathering quotes" mentality.
- AI evaluates project-description text for scope clarity, realistic timeline vs. scope, and buying posture.
- Tiers: hot → partner email within 2 hours; warm → account manager follow-up; cold → resource-sharing email only.
Professional services
Consulting firms, law firms, accounting firms scoring prospective clients.
- Form: industry, company size, service area, urgency, existing provider, referral source, brief problem description.
- Rules: +30 for in-practice-area, +20 for company size match, +25 for "no existing provider" (greenfield), −20 for referral source being low-intent (e.g. Google Ads remarketing).
- AI reads the problem description and scores matter-fit (does this match what we're expert in?) and complexity (can we staff it?).
- Tiers: hot → partner introduction call; warm → associate consultation; cold → referral to partner firm or resource library.
Measuring success — KPIs
Rolling out scoring is only half the work. Measuring whether the model is calibrated against actual outcomes is what turns scoring from a feature into a competitive advantage.
Leading indicators (weekly)
- Score distribution shape — histogram across 100 buckets. Sudden skews signal either a prompt regression or a traffic-source shift.
- Tier proportions — percentage of submissions in each tier. Healthy ratio is roughly hot 10–20%, warm 30–40%, cold 40–50%, disqualified 5–10%.
- AI cost per submission — tracked against your per-submission cap.
- AI failure rate — percentage of submissions where AI scoring failed or returned invalid JSON.
- Rule fire counts — which rules are pulling weight and which never match? Rules that never fire are dead code.
Lagging indicators (monthly, quarterly)
- Close rate by tier — the definitive test. Hot should close at 3–5× warm; warm at 2–3× cold. If the slope is flat, the model is miscalibrated.
- Average deal size by tier — hot should produce bigger deals, not just more of them.
- Time-to-first-touch by tier — are SLAs being hit?
- Time-to-close by tier — hot leads should close faster (shorter sales cycles).
- Lost-reason analysis — of the hot leads that didn't close, what was the reason? If it's consistently "no budget" or "not a decision-maker", the scoring model is overweighting the wrong dimensions.
The monthly calibration loop
- Pull the prior month's scored submissions and CRM outcomes.
- Compute close rate per tier.
- Identify the 5–10 hot leads that didn't close. Read them individually. Why did the model rate them hot?
- Identify 5–10 warm/cold leads that did close. Why did the model underrate them?
- Adjust rules or prompt. Bump _score_version.
- Re-score the prior month via Replay. Compare tier distributions and observe whether the miss-cases move into their correct tier.
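Step 2 of the loop, close rate per tier, is a small join between scored submissions and CRM outcomes. A sketch assuming two plain Python collections (field names other than _tier are illustrative):
from collections import defaultdict

submissions = [   # from a scored-submissions export
    {"id": "s1", "_tier": "hot"},  {"id": "s2", "_tier": "hot"},
    {"id": "s3", "_tier": "warm"}, {"id": "s4", "_tier": "cold"},
]
closed_won_ids = {"s1"}   # from the CRM: submissions that became closed-won deals

def close_rate_by_tier(submissions, closed_won_ids):
    totals, wins = defaultdict(int), defaultdict(int)
    for sub in submissions:
        totals[sub["_tier"]] += 1
        wins[sub["_tier"]] += sub["id"] in closed_won_ids
    return {tier: round(wins[tier] / totals[tier], 2) for tier in totals}

print(close_rate_by_tier(submissions, closed_won_ids))  # {'hot': 0.5, 'warm': 0.0, 'cold': 0.0}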
Best practices
Start with rules, add AI later
Rule-based scoring is deterministic, free, and explainable. You can iterate in minutes and justify the model to a CRO with a spreadsheet. AI scoring is powerful but harder to debug, costs per call, and is more fragile to prompt drift. Ship rules first, watch a month of real submissions, then decide whether AI adds enough lift to be worth the complexity.
Calibrate tiers to real conversion data
A score of 80 should mean "this bucket historically closes at 30%+", not "the model feels good about this one". Wire close-won data back from your CRM monthly and recheck the conversion rate per tier. If hot closes at the same rate as warm, the thresholds are wrong or the signals are wrong — fix it before blaming the reps.
Always capture _score_reason
Sales reps trust scores 10× more when they can see the reasoning. A score of 82 is noise; a score of 82 with "CEO of a 500-person SaaS company with a $50k budget and a 30-day timeline" is gold. It also makes miscalibration trivial to debug — you can scroll through the hot leads that didn't close and see exactly what signals misled the model.
Gate hard disqualifiers with rules, not AI
AI is great at nuance but unreliable at "this lead has no budget and cannot buy". Keep hard filters in rule form — they run first, they're deterministic, and a change to the prompt can't accidentally disable them.
Don't show the raw score to the visitor
Visitors seeing "You scored 34/100" is rarely a good experience. Use tiers for downstream routing but keep the raw number internal. If you want to adjust the visitor experience, do it through ending-screen logic ("book a call" vs. "check out our docs") — not by exposing the score.
Version your scoring model
When you change weights or prompts, bump a _score_version string on the submission (e.g. "v2026-04"). This lets you compare conversion rates across scoring iterations — "did the new rubric improve close rate on hot leads, or just move more leads into the hot bucket?"
Decide deliberately about partial submissions
Partial-submission scoring is off by default. Turn it on when the existence of a lead matters more than completeness — e.g. you want to re-engage drop-offs at step 3 who looked promising on steps 1–2. Leave it off when you want a clean "these all finished the form" signal for sales.
Run a monthly scoring review
Block 30 minutes once a month to open the scoring analytics. Look at: distribution shape, tier conversion rates, top rules by fire count, and any submissions in the hot bucket that looked wrong. Small adjustments every month beat a big rewrite every year.
Pair scoring with email-based enrichment for short forms
If you run a short form (email + name only) for top-funnel capture, add enrichment: Clearbit, Apollo, or RB2B will fill in firmographics from the email address so your rule-based scoring has something to work with. The enriched fields live on the submission and feed scoring exactly the same way as user-provided fields.
Document the rubric in a place the whole team can see
Scoring is a cross-functional contract between marketing and sales. Marketing defines what a "hot" lead looks like; sales agrees to treat those leads with priority. The rubric should live somewhere both teams can read, and updates should be socialized before they ship.
Common mistakes to avoid
- Scoring against arbitrary numbers instead of outcomes. "We think enterprise should score high" becomes a self-fulfilling prophecy; reps only call enterprise, so only enterprise closes, so the scoring looks validated. Wire close-won back and let the data speak.
- Over-engineering the rule set. A rubric with 40 rules isn't precise; it's fragile. 8–12 well-chosen rules outperform 40 guesses.
- Shipping AI scoring without a JSON schema. Free-form LLM output will break. Always use JSON mode with a validated schema.
- Forgetting the OOO / load-balance logic. Routing to a single AE on vacation kills hot leads. Wire round-robin with skip-on-OOO from day one.
- Leaking the score to the visitor. Never display _score or _tier in visitor-facing copy. Use it for routing, not UX.
- Ignoring the reason field. Scores without reasons produce distrust. Rep trust is the whole point — protect it.
- Assuming the model is stable forever. Prompts drift; traffic sources shift; the ICP evolves. Review monthly or be surprised quarterly.
Alternatives compared
vs. manual qualification by SDRs
Manual triage works up to roughly 100 submissions per month. Above that, SDR time spent reading forms costs more than an automated scoring setup — and the inconsistency between reviewers (what one calls "hot", another calls "maybe") makes funnel metrics unreliable. Scoring wins on cost, speed, and consistency.
vs. HubSpot / Marketo rule-based scoring
HubSpot and Marketo invented modern rule-based scoring and remain strong at it. They lose on two fronts: they can't score free-text answers (no AI reading of "describe your use case"), and their scoring lives inside a larger marketing automation platform that most form users don't need. AnimationFunnel's scoring is form-native — it runs inline with the submission without a round trip to a separate platform — and it includes AI out of the box.
vs. Salesforce Einstein / 6sense predictive
Predictive scoring from Salesforce Einstein or 6sense is a classical ML approach: train on historical close-won data, produce a propensity score. Strengths: picks up patterns humans miss. Weaknesses: needs a large labeled dataset (thousands of closed deals), is a black box in sales reviews, and can't explain itself. AnimationFunnel's generative-AI approach works on day one with no training data and produces natural-language reasoning that reps can read.
vs. DIY Zapier + OpenAI scoring
A common DIY setup routes form submissions through Zapier to an OpenAI prompt and back to a CRM. It works but is fragile: every step is a failure mode, JSON parsing is manual, the prompt lives in a Zap nobody owns, and cost caps are invisible until the Zapier bill arrives. Moving this stack into AnimationFunnel collapses five tools into one integration with proper retries, cost caps, and an audit trail.
vs. Typeform / Jotform scoring
Typeform and Jotform both offer simple numeric scoring (add points per answer, show a total at the end). Neither supports AI scoring, neither has routing built in, and neither includes audit trails or calibration analytics. Fine for a quiz-style form, underpowered for a revenue-critical lead funnel.
Migrating from other tools
From HubSpot scoring
- Export your current HubSpot scoring rules (Settings → Properties → Score).
- Map each rule to AnimationFunnel's rule syntax. Most HubSpot point-rules translate 1:1.
- If your HubSpot model uses behavioral signals (pages visited, emails opened), those stay in HubSpot — blend them with the AnimationFunnel form score via a Pipedrive or HubSpot custom field that sums both.
- Replay both scores against the same dataset and confirm distribution shape before switching over.
From Zapier + OpenAI
- Copy the existing prompt from the Zapier step. Paste into the AnimationFunnel OpenRouter integration.
- Move the JSON schema into JSON mode — AnimationFunnel will validate and retry automatically.
- Decommission the Zap. The webhook or Google Sheets step that used to pick up the Zapier output now reads from the submission's scoring fields directly.
- Set per-submission and monthly cost caps — something Zapier lacked.
From Marketo / Eloqua
Marketo and Eloqua scoring models are typically more complex than their HubSpot equivalents — expect 30–50 rules and behavioral signals layered in. A pragmatic migration keeps Marketo for MQL scoring based on behavior, and uses AnimationFunnel's form scoring as the final gate that promotes MQL → SQL. The form is the last touchpoint before sales; scoring at that point has the most information available.
Privacy, compliance & ethics
GDPR
Lead scoring constitutes profiling under the GDPR, which brings Article 22's rules on automated decision-making into play. AnimationFunnel treats it as non-solely-automated decision-making (a human sales rep ultimately makes the contact decision), which keeps it within legitimate-interest grounds. For extra protection, the consent field can include a line disclosing that submissions are algorithmically scored for qualification — recommended for EU-resident traffic.
Other regional regimes (CCPA, LGPD, PIPEDA)
Most non-EU privacy regimes borrow GDPR's profiling rules almost verbatim. AnimationFunnel's default EU data residency, combined with optional regional storage on enterprise plans (US, BR), keeps scored submissions within the appropriate jurisdictional boundaries.
HIPAA (healthcare)
Forms that collect protected health information (PHI) must be configured under a Business Associate Agreement (BAA). Scoring on PHI is technically allowed but introduces heightened obligations: prompt content is PHI, the OpenRouter integration must point at a BAA-eligible endpoint, and logs must follow PHI retention rules. See the healthcare guide.
Transparency & ethical use
- Never use scoring to discriminate. Avoid signals correlated with protected characteristics. If your rules down-score based on a proxy for race, gender, or national origin — even unintentionally — you have a legal and ethical problem, not just a calibration problem.
- Never route based solely on free-mail domains for consumer-facing products. A B2C service that treats gmail users as lower-priority is disqualifying most of its own market.
- Audit AI-generated reasons periodically. LLMs sometimes produce reasoning that cites protected characteristics ("the applicant's name suggests foreign origin..."). Filter these at the prompt level ("do not consider name, nationality, or demographic characteristics") and audit the output.
Frequently asked questions
What is lead scoring?
Lead scoring is the practice of assigning a numeric value — typically 0 to 100 — to every inbound lead to represent how likely they are to become a paying customer. It combines demographic, firmographic, technographic, and behavioral signals into a single number that sales teams can act on, replacing manual triage with data-driven qualification.
What is the difference between lead scoring and lead grading?
Lead scoring measures interest and intent (how likely to buy), while lead grading measures fit (how well the lead matches the ICP). Most modern systems combine both into a single composite score. AnimationFunnel treats fit signals and intent signals as weighted components of one number rather than maintaining two separate values.
What is the difference between MQL and SQL?
An MQL (Marketing Qualified Lead) has shown enough interest to warrant marketing follow-up (email nurture, retargeting). An SQL (Sales Qualified Lead) has crossed a higher threshold and is worth direct sales outreach. Lead scoring defines the thresholds that promote leads between these stages — typically 40 for MQL and 70 for SQL on a 0–100 scale.
Is rule-based or AI-based lead scoring better?
Rule-based scoring is faster, cheaper, and deterministic — ideal for structured data like budget, team size, or role. AI scoring handles unstructured free-text and nuance but costs per call and is harder to debug. Most mature teams use a hybrid model: rules for hard filters and deterministic signals, AI layered on top for soft signals. Start with rules and add AI once the form reveals answers that rules cannot evaluate.
Can I score old submissions retroactively?
Yes. From Submissions → Bulk actions → Re-score you can run the current scoring model against any filtered set of past submissions. AI scoring will charge per submission through OpenRouter, so batch size and cost caps matter — a dry run against 100 submissions first is wise before backfilling 50,000.
Are partial (abandoned) submissions scored?
By default, no — a lead needs to complete the form to be scored. Enable Score partial submissions under scoring settings if you want drop-offs scored too; the model will see only the answers provided so far, and its confidence will naturally be lower. Partial scores are tagged _score_partial: true so you can filter them out of primary routing.
How do I explain a score to a sales rep?
Every submission's detail page shows which rules fired, what the AI returned, how the two were combined, and any gates that applied. The _score_reason field is designed to be dropped straight into a CRM note. In Pipedrive, the integration writes it onto the deal as a visible note so reps see the reasoning the moment they open the record.
What if the AI model's behavior drifts?
OpenRouter model versions are pinned by default — if you selected anthropic/claude-opus-4-7, that exact version is used until you change it. When a newer version ships, the UI surfaces a "new version available" banner and lets you A/B test the new model against the current one on a percentage of traffic before switching over. Bump _score_version when you cut over so historical comparisons remain clean.
Can a human override the score?
Yes. From the submission detail page, click Edit score to set a manual override. The original machine score is preserved in _score_machine, and the manual value takes precedence in all downstream integrations. Overrides are logged with the editing user and reason.
Is lead scoring GDPR-compliant?
Yes, with the caveat that lead scoring constitutes profiling under the GDPR and is therefore subject to Article 22's rules on automated decision-making. AnimationFunnel treats it as non-solely-automated decision-making (a human sales rep ultimately makes contact decisions), which keeps it within normal legitimate-interest grounds. For peace of mind, the consent field can include a line disclosing that submissions are algorithmically scored for qualification.
How much does AI lead scoring cost?
Cost depends on the model chosen via OpenRouter. Claude Opus 4.7 costs roughly $0.01–0.03 per submission; Claude Sonnet or GPT-4o cost 5–10× less; Haiku or GPT-4o-mini cost under $0.001 per call. AnimationFunnel adds no markup and lets you set per-submission and monthly cost caps to prevent runaway bills.
What happens when scoring is wrong? False positives vs. false negatives.
No scoring model is perfect. False positives (cold leads scored hot) waste AE time; false negatives (hot leads scored cold) are lost revenue. The calibration loop is designed to surface both: review hot leads that didn't close, and occasionally audit cold leads to check for missed intent. Adjust weights accordingly. Over time, a healthy model trends toward a 5–10% mis-scoring rate — which is still an order of magnitude better than manual review.
Glossary
- BANT — Budget, Authority, Need, Timeline. Classic sales qualification framework.
- Composite score — a single score that combines fit and intent signals.
- Demographic signal — information about the individual (role, seniority, email domain).
- Explicit signal — data the lead provides directly.
- Firmographic signal — information about the company (size, industry, country).
- Gate — a hard override that clamps tier or routing regardless of the combined score.
- Hot lead — top tier, typically 15–20% of traffic, immediate sales priority.
- Implicit signal — data inferred from metadata (UTM, referring page, device).
- Lead grade — legacy term for the fit component of a score.
- Lead scoring — the practice of assigning a numeric value to a lead to represent purchase likelihood.
- MQL — Marketing Qualified Lead. Score threshold typically ≥ 40.
- Negative signal — a fact that reduces the score (personal email, wrong geo).
- OpenRouter — the AI routing service AnimationFunnel uses to access 100+ LLMs.
- PQL — Product Qualified Lead. Scored on in-app usage, not form answers.
- Predictive scoring — classical ML approach trained on historical close/won data.
- Rubric — the written definition of what each score band means.
- SAL — Sales Accepted Lead. An SQL formally accepted by a rep.
- SDR — Sales Development Representative. Historically, the role that scoring automates.
- SQL — Sales Qualified Lead. Score threshold typically ≥ 70.
- Technographic signal — information about the lead's tech stack.
- Tier — human-readable bucket derived from the score (hot, warm, cold, disqualified).