Why this is hard to get right
A Scenario That Shows Why This Is Hard
Maya is a Director of Revenue Operations at a 400-person SaaS company. She's been tasked with leading a vendor evaluation for a new customer support platform. The current tool is crumbling under ticket volume. Her VP of Engineering, Head of Customer Success, and CFO all have a seat at the table — and each one has a different definition of "best."
She's collected 30 pages of RFP responses, two demo recordings, and a pricing spreadsheet full of asterisks. Her first instinct is to drop everything into ChatGPT and ask it to "compare these vendors and pick the best one." The output she gets back is a polite essay that praises all three vendors equally and ends with "the best choice depends on your specific needs." Useless.
The problem isn't the AI. The problem is that Maya gave it no decision framework.
Without criteria weights, the model treats pricing the same as security compliance. Without hard constraints, it recommends a vendor that flunked her SOC 2 check. Without a defined output format, she gets paragraphs when she needed a table her CFO could scan in 90 seconds.
She tries again, this time thinking about what a real procurement analyst would need to know before scoring anything. She maps out her criteria: Security at 25 points, Integrations at 20, Admin Effort at 15, Reporting at 15, Pricing at 15, and Vendor Stability at 10. She lists her hard stops — SOC 2 Type II, SAML-based SSO, EU data residency, an $80k annual budget cap, and a 6-week go-live window. She specifies the output: a weighted scorecard table, a 150-word exec recommendation, the top 5 risks, and 7 diligence questions for follow-up calls.
The second response is a different document entirely. Vendor B comes out on top with a score of 87/100. The risks section flags Vendor C's recent leadership turnover. The diligence questions go straight into the next vendor call.
Maya walks into the review meeting with a scorecard everyone can interrogate instead of a gut feeling no one can defend. The CFO asks two questions. The decision gets made in 40 minutes.
That shift — from "compare these" to a fully specified evaluation framework — is exactly what a well-constructed prompt unlocks. The domain knowledge was already in Maya's head. The prompt structure forced her to put it on paper in a form the model could use.
Common mistakes to avoid
Skipping Criteria Weights Entirely
When you list evaluation criteria without weights, the AI treats all factors as equal. A vendor that excels at reporting but fails on security gets the same composite score as one that nails both. Define percentage weights upfront — they're the difference between a ranking and a real decision. Without them, you get a balanced summary, not a scorecard.
Omitting Hard Constraints and Deal-Breakers
If you don't specify non-negotiable requirements, the AI can recommend a technically impressive vendor that can't pass your security review or blows your budget. List every hard constraint explicitly — compliance certifications, data residency, budget ceiling, timeline. This filters out non-starters before scoring begins, which is what a real procurement analyst does first.
Not Specifying the Decision Audience
A recommendation written for a technical lead looks nothing like one written for a CFO. Identify who will read and approve the output. Without this, you get a generic summary that's too technical for executives and too shallow for engineers. Audience context shapes tone, depth, and which risks deserve emphasis.
Pasting Raw RFP Notes Without Structure
Dumping unformatted vendor notes into a prompt forces the AI to interpret what matters. It will guess which details are relevant, often amplifying vendor marketing language over your actual requirements. Organize your notes by vendor and criteria category before pasting them in. Structured input produces structured output.
Asking for a Winner Without Requesting Risk Analysis
A recommendation without risk context is a liability. Vendors who score well on features often carry hidden risks — integration fragility, immature support, contract lock-in. Always request a risk section alongside the scorecard. This protects you in the post-signature debrief when something goes wrong.
Using Vague Scoring Instructions
Telling the AI to "score each vendor" without a defined scale produces inconsistent results. Does 8/10 mean good or great? Define your scoring scale explicitly (e.g., 1-5, where 3 means "meets requirements" and 5 means "exceeds significantly") and include a tie-break rule. This makes the output defensible across multiple reviewers and review cycles.
The transformation
Compare these vendors and tell me which one is best for our company.
You're a procurement analyst helping us select a customer support platform.
1. Use the RFP notes below to score Vendor A, B, and C.
2. Create a weighted scorecard with totals and rank order.
3. Write a 150-word recommendation for our exec team.
Criteria + weights: Security (25), Integrations (20), Admin effort (15), Reporting (15), Pricing (15), Vendor stability (10).
Hard constraints: SOC 2 Type II, SSO (SAML), data residency in EU, budget ≤ $80k/year, go-live in 6 weeks.
Output format: Table + top 5 risks + 7 diligence questions.
RFP notes: [paste notes]
Why this works
Role Assignment Anchors Judgment
The After Prompt opens with "You're a procurement analyst." This isn't cosmetic. It activates an evaluator mindset rather than a summarizer mindset. The model applies tradeoff logic, weights competing factors, and flags risks — behaviors that a neutral "compare these" instruction never triggers.
Weighted Criteria Force Real Tradeoffs
The After Prompt specifies "Security (25), Integrations (20), Admin effort (15)..." with explicit point values. This prevents the AI from averaging across all dimensions as if they're equal. The weights encode your organization's actual priorities, so the final rank order reflects what you care about — not what sounds balanced.
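Under the hood, the weighted total is just the sum of each criterion's score multiplied by its weight. Here is a minimal sketch in Python, assuming a 1-5 scoring scale normalized against the 100-point weights; the two vendors and their scores are purely illustrative:

```python
# Illustrative weights (summing to 100) and 1-5 scores for two hypothetical vendors.
weights = {"Security": 25, "Integrations": 20, "Admin effort": 15,
           "Reporting": 15, "Pricing": 15, "Vendor stability": 10}
scores = {
    "Strong on security": {"Security": 5, "Integrations": 4, "Admin effort": 4,
                           "Reporting": 4, "Pricing": 2, "Vendor stability": 4},
    "Strong on pricing":  {"Security": 2, "Integrations": 4, "Admin effort": 4,
                           "Reporting": 4, "Pricing": 5, "Vendor stability": 4},
}

for vendor, s in scores.items():
    unweighted = sum(s.values()) / len(s) / 5 * 100           # every criterion counts equally
    weighted = sum(s[c] / 5 * w for c, w in weights.items())  # criteria count per your weights
    print(f"{vendor}: unweighted {unweighted:.0f}/100, weighted {weighted:.0f}/100")
```

Both vendors land on the same unweighted average; the weights are what separate them, which is exactly the judgment the prompt asks the model to apply.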
Hard Constraints Act as a Pre-Filter
The line "SOC 2 Type II, SSO (SAML), data residency in EU, budget ≤ $80k/year, go-live in 6 weeks" eliminates vendors before scoring begins. This mirrors real procurement logic. It also protects you from a top-scored vendor that can't clear your legal or security review.
Defined Output Format Controls Usefulness
The After Prompt specifies "Table + top 5 risks + 7 diligence questions." Each output type serves a different audience — the table for executives, the risks for your security team, the diligence questions for follow-up vendor calls. Specifying format upfront means you get a document, not a draft.
Word Limit Creates Exec-Ready Output
The "150-word recommendation for our exec team" constraint forces compression. Without it, AI outputs sprawl into caveats and qualifications that executives skip. A tight word limit forces the model to prioritize, which means you get a recommendation someone will actually read and act on.
The framework behind the prompt
The Procurement Science Behind Vendor Scorecards
Vendor selection sits at the intersection of decision theory, organizational psychology, and procurement practice. Understanding why structured evaluation outperforms gut-feel comparison helps you build better prompts — and defend better decisions.
Multi-Criteria Decision Analysis (MCDA) is the formal framework behind weighted scorecards. MCDA acknowledges that real-world decisions involve competing objectives that can't be reduced to a single metric. By assigning explicit weights to criteria, evaluators make their value judgments visible and auditable. Research in behavioral economics consistently shows that unstructured group decisions amplify bias toward whichever vendor presented most recently or most confidently — a phenomenon called the recency effect. Weighted scorecards counteract this by anchoring discussion to pre-committed criteria rather than post-hoc impressions.
The analytic hierarchy process (AHP), developed by Thomas Saaty in the 1970s, formalized pairwise comparison of criteria as a way to derive defensible weights. Modern procurement teams rarely implement full AHP, but the underlying logic — that weights should reflect deliberate tradeoffs, not instinct — remains foundational. When you specify "Security (25), Pricing (15)" in a prompt, you're doing a simplified version of AHP.
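For readers curious how AHP turns pairwise judgments into weights: full AHP takes the principal eigenvector of the comparison matrix, but the row-geometric-mean shortcut below is a common approximation and easier to follow. The criteria subset and comparison values are illustrative only:

```python
import math

# Pairwise comparisons on Saaty's 1-9 scale: matrix[i][j] says how much more
# important criterion i is than criterion j (values here are illustrative).
criteria = ["Security", "Integrations", "Pricing"]
matrix = [
    [1,   2,   3],    # Security vs. Security, Integrations, Pricing
    [1/2, 1,   2],    # Integrations vs. ...
    [1/3, 1/2, 1],    # Pricing vs. ...
]

# Row geometric mean, then normalize: a common approximation of the
# principal-eigenvector weights used in full AHP.
geo_means = [math.prod(row) ** (1 / len(row)) for row in matrix]
total = sum(geo_means)
for criterion, g in zip(criteria, geo_means):
    print(f"{criterion}: {g / total:.0%}")
```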
Risk-adjusted procurement adds another layer. Standard scorecards measure capability at a point in time. Risk-adjusted evaluation also considers switching costs, vendor financial stability, and implementation failure probability. The "top 5 risks" section in the After Prompt operationalizes this principle. It forces the model to treat the evaluation as a forward-looking decision, not just a feature comparison.
Finally, stakeholder alignment theory explains why the recommendation format matters as much as the score. Research on organizational decision-making shows that evaluations fail not because they're wrong, but because they don't speak to the audience who approves them. An exec-ready 150-word recommendation with a clear rank order is more likely to produce a decision than a 10-page report that requires interpretation. Prompt structure controls output structure — and output structure controls adoption.
Prompt variations
You are a senior infrastructure architect evaluating cloud observability platforms for a 200-person engineering team.
Score these three vendors: Datadog, Honeycomb, and Grafana Cloud.
Weighted criteria:
- Distributed tracing depth (30 points)
- Log ingestion cost at 500GB/day (25 points)
- Kubernetes and Helm chart support (20 points)
- Alert routing and on-call integrations (15 points)
- Vendor support SLA (10 points)
Hard constraints: Must support OpenTelemetry natively. No per-seat licensing. Must integrate with PagerDuty. Annual budget cap of $120k.
Output format:
- Weighted scorecard table with totals and rank order
- 200-word recommendation for our VP of Engineering
- Top 4 operational risks
- 6 technical diligence questions for proof-of-concept planning
Vendor notes: [paste your evaluation notes here]
You are a marketing operations analyst helping select a B2B marketing automation platform for a 50-person marketing team running $2M in annual pipeline campaigns.
Evaluate these vendors: HubSpot Marketing Hub, Marketo Engage, and ActiveCampaign.
Weighted criteria:
- CRM sync reliability with Salesforce (30 points)
- Multi-touch attribution reporting (25 points)
- Email deliverability and A/B testing (20 points)
- Admin burden and ramp time (15 points)
- Contract flexibility (10 points)
Hard constraints: Must have native Salesforce bidirectional sync. No minimum contract over 12 months. Must support account-based marketing workflows. Annual budget under $60k.
Output format:
- Weighted scorecard with vendor totals
- 150-word recommendation for CMO and RevOps VP
- Top 3 migration risks from our current platform
- 5 questions to ask each vendor during a second demo
RFP notes: [paste vendor responses here]
You are a procurement risk analyst conducting a vendor risk assessment for a regulated financial services firm evaluating document management platforms.
Assess these vendors: iManage, NetDocuments, and SharePoint Online.
Weighted criteria:
- Data encryption and access controls (30 points)
- Audit trail completeness (25 points)
- eDiscovery and legal hold support (20 points)
- Integration with existing DMS workflow (15 points)
- Vendor financial stability (10 points)
Hard constraints: Must meet SEC Rule 17a-4 WORM storage requirements. Data must remain in US-based data centers. Must support Active Directory federation. Annual cost must not exceed $95k including implementation.
Output format:
- Scored comparison table with weighted totals
- 175-word recommendation written for our General Counsel
- Top 5 compliance and contract risks
- 8 due diligence questions focused on regulatory posture and breach notification
Vendor materials: [paste RFP responses and demo notes here]
You are an operations lead at a 15-person startup evaluating project management tools to replace our current system.
Compare: Linear, Notion, and Asana.
Weighted criteria:
- Setup time and onboarding ease (35 points)
- Engineering and product workflow fit (30 points)
- Reporting and sprint tracking (20 points)
- Per-seat cost at 20 users (15 points)
Hard constraints: Must be fully operational within one week. No annual contracts required upfront. Must have a usable free tier or trial. Monthly cost must stay under $400.
Output format:
- Simple scoring table with totals
- 100-word plain-language recommendation for our founding team
- Top 2 risks of switching mid-sprint
- 3 questions to validate our choice before committing
Our notes from trials: [paste your observations here]
When to use this prompt
Marketing Ops Leaders
Score marketing automation vendors using weighted criteria like integrations, attribution, and admin time.
Product Managers
Evaluate analytics or feature flag platforms with constraints around data residency, SDK support, and rollout timelines.
Customer Success Teams
Compare customer success tools and surface onboarding risks before you commit to a multi-year contract.
Engineering Managers
Assess infrastructure or observability vendors with security controls, SSO requirements, and implementation effort.
Sales Operations
Rank CRM add-ons by workflow fit, reporting depth, and total cost under a fixed budget cap.
Pro tips
1. Define your "hard constraints" first so the model filters out non-starters early.
2. Share who will approve the decision so the recommendation matches their priorities and reading time.
3. Include 2–3 deal-breaker scenarios so the output highlights risks that matter to you.
4. Set a scoring scale and tie-break rule so different reviewers reach the same conclusion.
Most procurement decisions don't happen in a single prompt. The best results come from a three-round workflow that mirrors a real RFP process.
Round 1 — Shortlisting: Use a lightweight prompt to filter a long vendor list down to three finalists based on hard constraints alone. Ask the AI to output only a pass/fail table with one-line reasoning per constraint per vendor. This takes 60 seconds and eliminates vendors before you invest time in full evaluation.
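If your notes are already structured, the same pass/fail logic can also be run mechanically before the model ever sees the vendor list. A minimal sketch, using the hard constraints from the earlier example and hypothetical vendor attributes:

```python
# Hypothetical vendor attributes pulled from RFP notes.
vendors = {
    "Vendor A": {"soc2_type2": True,  "saml_sso": True, "eu_residency": False, "annual_cost": 72_000},
    "Vendor B": {"soc2_type2": True,  "saml_sso": True, "eu_residency": True,  "annual_cost": 78_000},
    "Vendor C": {"soc2_type2": False, "saml_sso": True, "eu_residency": True,  "annual_cost": 65_000},
}

# Hard constraints as named predicates; a vendor must pass every one to advance.
constraints = {
    "SOC 2 Type II": lambda v: v["soc2_type2"],
    "SAML SSO": lambda v: v["saml_sso"],
    "EU data residency": lambda v: v["eu_residency"],
    "Budget <= $80k/year": lambda v: v["annual_cost"] <= 80_000,
}

for name, attrs in vendors.items():
    failures = [label for label, check in constraints.items() if not check(attrs)]
    verdict = "PASS" if not failures else "FAIL (" + ", ".join(failures) + ")"
    print(f"{name}: {verdict}")
```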
Round 2 — Weighted Scorecard: Apply the full structured prompt to your finalists. Include your weighted criteria, hard constraints, RFP notes, and defined output format. This is where the scorecard, recommendation, and diligence questions are generated.
Round 3 — Risk Deep Dive: Follow up on the riskiest vendor or the closest second-place finisher. Ask: "Assume we selected Vendor B. List the top 7 implementation risks in order of likelihood, and for each, describe one mitigation step." This turns a scorecard into a transition plan.
Running all three rounds takes under an hour and produces a document package that covers decision rationale, risk acknowledgment, and follow-up actions — everything a procurement review committee needs.
One of the most common causes of inconsistent AI output is an undefined scoring scale. Before you run any vendor evaluation prompt, decide on your scale and include it explicitly. Here's a reference you can paste directly into your prompt:
Recommended 5-point scale:
- 5 — Exceeds requirements significantly. Vendor capability goes beyond stated needs with minimal configuration.
- 4 — Meets requirements fully. Vendor meets the stated criterion with no gaps.
- 3 — Meets requirements partially. Vendor meets the minimum threshold but requires workarounds or customization.
- 2 — Does not meet requirements. Vendor falls short and would require significant investment to close the gap.
- 1 — Fails requirement. Vendor cannot meet this criterion within reasonable scope or budget.
Include a tie-break rule when two vendors land within 5 points of each other: "If totals are within 5 points, rank by the highest-weighted criterion where they differ." This prevents decision paralysis in close evaluations and gives your committee a clear procedural answer when scores converge. Consistent scales also make it easier to re-run the evaluation with updated notes six months later without reinterpreting past scores.
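The tie-break rule is mechanical enough to express directly. A minimal sketch, assuming a 1-5 scale and two hypothetical finishers whose weighted totals land within 5 points of each other:

```python
# Hypothetical 1-5 scores and weighted totals for two close finishers.
criteria_by_weight = ["Security", "Integrations", "Admin effort",
                      "Reporting", "Pricing", "Vendor stability"]  # highest-weighted first
scores = {
    "Vendor A": {"Security": 4, "Integrations": 4, "Admin effort": 4,
                 "Reporting": 4, "Pricing": 5, "Vendor stability": 4},
    "Vendor B": {"Security": 5, "Integrations": 4, "Admin effort": 4,
                 "Reporting": 4, "Pricing": 4, "Vendor stability": 4},
}
totals = {"Vendor A": 83, "Vendor B": 85}  # weighted totals out of 100

a, b = "Vendor A", "Vendor B"
winner = max((a, b), key=totals.get)  # default: higher weighted total
if abs(totals[a] - totals[b]) <= 5:
    # Tie-break: the highest-weighted criterion where the two vendors differ.
    for criterion in criteria_by_weight:
        if scores[a][criterion] != scores[b][criterion]:
            winner = a if scores[a][criterion] > scores[b][criterion] else b
            break

print(f"Recommended: {winner}")
```

Stating the rule this precisely in the prompt is what lets two reviewers, or two review cycles, reach the same answer.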
Vendor selection in healthcare, financial services, and government contexts carries compliance requirements that standard prompts don't capture. You need to build regulatory framing directly into your evaluation criteria and hard constraints.
Healthcare (HIPAA): Add a mandatory criteria category called Regulatory Compliance worth 30 points minimum. Include sub-criteria for BAA availability, PHI data handling documentation, and breach notification SLAs. Hard constraints should include HIPAA attestation status and third-party audit recency.
Financial Services (SOX, SEC, FINRA): Require vendors to address audit trail completeness, WORM storage certification, and data retention policy alignment. Add a diligence question specifically about regulatory examination readiness: "Describe how your platform supports a regulatory examination, including examiner access to records."
Government (FedRAMP, ITAR): Replace standard security criteria with a FedRAMP authorization level check as a hard constraint. Weight data sovereignty and personnel security clearance compatibility heavily. Flag any vendor headquartered outside the US for additional scrutiny in your risk section.
For all regulated contexts, add this line to your prompt: "Highlight any area where vendor claims cannot be verified from the materials provided and flag it as requiring third-party attestation."
When not to use this prompt
When This Prompt Pattern Is Not the Right Tool
This prompt type produces its best results when you have at least two vendors, real RFP notes, and defined internal requirements. There are situations where a different approach serves you better.
Don't use this prompt if you're in early discovery. If you haven't yet defined your requirements or talked to vendors, a scorecard will score against the wrong criteria. Start with a requirements-gathering prompt first, then return to vendor evaluation once you know what "good" actually looks like for your organization.
Don't use this if you have only one vendor. When you're evaluating a sole-source renewal or a mandated platform, a competitive scorecard adds false precision. Use a risk assessment or contract review prompt instead.
Don't use this for decisions that require legal or security sign-off. AI output can help you organize your thinking and draft diligence questions, but it cannot replace a qualified security review, legal contract analysis, or compliance certification check. Use the scorecard as an input to those processes, not a substitute for them.
Don't use this when stakeholder alignment is the real problem. If your executive team already has a preferred vendor and the evaluation is political cover, a scorecard won't resolve that. Address the alignment problem directly before investing in structured evaluation.
Troubleshooting
The scorecard scores all vendors within a few points of each other
Your criteria may be too broad or your notes too similar across vendors. Add a differentiation instruction: "For each criterion, explicitly note what distinguishes each vendor — even small differences matter." Also check whether you've given the AI enough vendor-specific detail. Sparse notes produce similar scores. Add specific quotes, pricing tiers, or demo observations to create separation in the input.
The recommendation contradicts the scorecard ranking
This happens when the AI weighs narrative context in your notes more heavily than your defined criteria. Add an explicit tie-breaking instruction: "The recommendation must align with the weighted scorecard totals. If you recommend a lower-scoring vendor, state the specific reason your criteria weights don't capture the deciding factor." This forces consistency or surfaces a genuine gap in your evaluation framework.
The risk section is too generic — risks like 'implementation delays' apply to any software
Push for vendor-specific risks by adding context. Include a sentence like: "Each risk must reference a specific characteristic of that vendor based on the notes provided — do not include risks that apply equally to all vendors." If you have vendor-specific details like recent funding events, support reviews, or integration complexity, include them explicitly so the risk section can draw on real signal.
The executive recommendation is too long and reads like a consultant report
Enforce length and format more explicitly. Replace "150-word recommendation" with: "Write a 3-sentence recommendation: sentence 1 states the top vendor, sentence 2 gives the primary reason based on weighted score, sentence 3 states the most important risk and one mitigation step." Structural constraints produce more useful executive summaries than word counts alone.
The diligence questions are too surface-level and vendors can answer them with marketing copy
Specify the type of evidence you want the questions to surface. Add: "Each diligence question must require the vendor to provide a specific artifact, reference customer, configuration example, or third-party certification — not a general description of their capabilities." This produces questions like "Can you provide your most recent SOC 2 Type II audit report?" instead of "How do you handle security?"
How to measure success
How to Evaluate the Quality of Your AI Output
Not all scorecard outputs are equally useful. Check yours against these signals before presenting it to a stakeholder.
The scorecard table must be internally consistent (these arithmetic checks are easy to automate; see the sketch after these checklists):
- Scores multiply correctly against weights
- Total points sum to the maximum defined by your scale
- Rank order in the table matches the rank order in the written recommendation
The recommendation must be decision-ready:
- Names a specific vendor without hedging or "it depends" language
- Cites the weighted score as the basis for the recommendation
- Stays within the requested word count — executive summaries that run long get skipped
The risk section must be vendor-specific:
- Each risk references a detail from your RFP notes or vendor materials
- No generic risks like "implementation may take longer than expected" appear without a specific vendor context
- At least one risk addresses what happens after you sign, not just during evaluation
The diligence questions must require evidence:
- Each question asks for a specific artifact, customer reference, certification, or configuration example
- No question can be answered with marketing copy
- Questions map to your highest-weighted or most uncertain criteria
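The arithmetic checks in the first list are easy to automate if you ask the model to return the table as structured data. A minimal sketch, assuming a 1-5 scale and a hypothetical parsed scorecard:

```python
# Hypothetical parsed scorecard: 1-5 scores per criterion, the weights, and the
# totals and recommendation the model reported back.
weights = {"Security": 25, "Integrations": 20, "Admin effort": 15,
           "Reporting": 15, "Pricing": 15, "Vendor stability": 10}
reported = {
    "Vendor A": {"scores": {"Security": 4, "Integrations": 4, "Admin effort": 4,
                            "Reporting": 4, "Pricing": 5, "Vendor stability": 4},
                 "total": 83},
    "Vendor B": {"scores": {"Security": 5, "Integrations": 4, "Admin effort": 4,
                            "Reporting": 4, "Pricing": 4, "Vendor stability": 4},
                 "total": 85},
}
recommended = "Vendor B"  # vendor named in the written recommendation

assert sum(weights.values()) == 100, "weights must sum to 100"

for vendor, data in reported.items():
    recomputed = sum(data["scores"][c] / 5 * w for c, w in weights.items())
    assert abs(recomputed - data["total"]) < 1, f"{vendor}: reported total doesn't match the weights"

# The written recommendation should name the top-ranked vendor in the table.
top = max(reported, key=lambda v: reported[v]["total"])
assert top == recommended, "recommendation contradicts the scorecard ranking"
print("Scorecard is internally consistent")
```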
Now try it on something of your own
Reading about the framework is one thing. Watching it sharpen your own prompt is another.
Build a weighted vendor scorecard with exec-ready recommendations and built-in risk analysis.
Frequently asked questions
How many vendors should I compare at once?
Three to five vendors works best. Beyond five, the scorecard becomes unwieldy and the recommendation loses precision. If you're evaluating more, consider a two-stage approach: use the first prompt to create a shortlist of three based on hard constraints, then run the full weighted scorecard on the finalists. This mirrors how real procurement teams use RFPs.
What if my RFP notes are incomplete or missing details for some vendors?
Add a note directly in the prompt: "Where vendor information is missing, flag the gap rather than estimate." This prevents the AI from inventing responses. Gaps in your notes become diligence questions for follow-up vendor calls — which is exactly what the output's diligence section should capture. Incomplete notes are data, not a blocker.
How do I adapt the output for a buying committee with multiple stakeholders?
Replace the exec recommendation with a decision brief format: include a one-paragraph summary, a scored table, and a section where each evaluation criterion links to a committee member's priority. You can add: "Structure the recommendation so each stakeholder can locate the criteria most relevant to their function." This turns one output into a document that runs a room.
Can I use this prompt for a contract renewal instead of a new purchase?
Yes, and it works well. Adjust the framing: "You are evaluating whether to renew our contract with Vendor X or switch to Vendor Y." Add a renewal-specific criterion like implementation switching cost or relationship continuity. Include your current contract terms as context. The scorecard then captures the real tradeoff between staying and moving, not just feature comparison.
What if the scorecard just confirms the vendor we already preferred?
This usually means your criteria weights reflect existing bias. Audit your weights — ask yourself if security is really worth more than pricing, or if you weighted it high because a vendor already passed that check. You can also add: "Flag any criteria where all three vendors score similarly, as these are not differentiators." This forces the output to surface where the real decision lives.
How do I make the scorecard defensible to finance and legal?
Add two elements to your prompt: a scoring rationale column in the table (one sentence per cell explaining each score), and a constraints compliance section confirming which vendors meet your hard requirements. This turns the scorecard from an opinion into an auditable document. Finance and legal teams want a paper trail — build it into the output format.
Should I run everything in one prompt or break it into follow-ups?
Start with one full-output prompt, then use targeted follow-ups to go deeper. After your first scorecard, ask: "Expand the risk section for Vendor B with specific contract and implementation risks." Or: "Rewrite the executive recommendation assuming the primary concern is speed to go-live, not cost." Iteration works best when your base output is already structured.