# Franhaven Source Ingestion Criteria

Use this gate before adding a row to `franchise-buyer-raw-comments-sprint-001.csv`.

## Source Type Labels

- `comment`: Reply/comment under a post, video, forum thread, review, or article.
- `submission`: Original Reddit post or forum thread starter.
- `post`: Original social post from LinkedIn, Facebook, TikTok, or similar.
- `review`: App store, Google, Trustpilot, franchise portal, or marketplace review.
- `video_caption`: Creator/video metadata worth preserving for context.
- `forum_reply`: Non-Reddit forum reply when the platform distinction matters.
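
As a minimal sketch, the labels above could be enforced at capture time with an enumeration. The `SourceType` class and the `parse_source_type` helper are hypothetical names chosen for illustration, not part of any existing tooling, and the trimming and lowercasing of input is an assumption about how labels might arrive.

```python
from enum import Enum


class SourceType(str, Enum):
    """Allowed `source_type` labels, copied from the list above."""

    COMMENT = "comment"
    SUBMISSION = "submission"
    POST = "post"
    REVIEW = "review"
    VIDEO_CAPTION = "video_caption"
    FORUM_REPLY = "forum_reply"


def parse_source_type(label: str) -> SourceType:
    """Normalize a raw label and raise ValueError if it is not allowed.

    Assumption: labels may arrive with stray whitespace or mixed case.
    """
    return SourceType(label.strip().lower())
```

A call such as `parse_source_type(" Review ")` would return `SourceType.REVIEW`, while an unlisted label such as `"tweet"` raises `ValueError`, so a mislabeled row fails before it reaches the CSV.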

## Ingest

Add the row when it contains at least one of these:

- First-hand franchise buyer, owner, operator, or former-franchisee experience.
- Specific due diligence question or comparison behavior.
- Cost detail: buildout, royalties, working capital, SBA loan, payroll, marketing fees, rent, cash flow, or break-even.
- FDD, Item 19, territory, legal, financing, validation call, or franchisor-support signal.
- Clear objection, regret, fear, decision-stage language, or expectation gap.
- High-signal phrasing that can become copy, FAQ, sales discovery, content, or offer positioning.

## Review Manually

Hold the row for manual review when it is potentially useful but the signal is weak:

- Second-hand advice without owner/buyer context.
- Broad small-business advice that may apply to franchise buyers.
- Low-detail claims with strong emotion but little evidence.
- Promotional content that also contains a real objection or owner pain.
- Duplicated topic but unusually strong wording.

## Reject

Do not ingest:

- Spam, affiliate pitches, lead-gen bait, bot-like comments, or obvious astroturfing.
- Duplicate rows unless the phrasing or context is materially different.
- Empty reactions, jokes, generic encouragement, or one-word sentiment.
- Generic entrepreneurship content with no franchise-specific signal.
- Claims that cannot be tied to a source URL, thread, platform, or author context.
- Defamatory personal claims about named individuals unless independently verified and necessary.

## Minimum Row Requirements

Every ingested row must include:

- `source_type`
- `platform`
- `source_url`
- `thread_url` when available
- `raw_comment_text`
- `captured_summary`
- `initial_relevance`
- `is_spam_or_irrelevant`
- `is_duplicate`
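
The field list above can be checked mechanically before a row is appended. The following is a rough sketch that assumes each row is a dictionary of strings (for example, as produced by `csv.DictReader`) and treats blank values as missing; `REQUIRED_FIELDS` and `missing_fields` are hypothetical names.

```python
# Fields every ingested row must carry, per the list above.
# `thread_url` is left out because it is required only when available.
REQUIRED_FIELDS = (
    "source_type",
    "platform",
    "source_url",
    "raw_comment_text",
    "captured_summary",
    "initial_relevance",
    "is_spam_or_irrelevant",
    "is_duplicate",
)


def missing_fields(row: dict) -> list:
    """Return the required field names that are absent or blank in `row`.

    Assumption: a value that is None, empty, or only whitespace counts as missing.
    """
    return [
        name
        for name in REQUIRED_FIELDS
        if not str(row.get(name) or "").strip()
    ]
```

An empty result means the row meets the minimum requirements; otherwise the returned names tell the collector what to fill in before ingesting.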

## Scoring Shortcut

- `high`: Specific, first-hand, franchise-relevant, commercially useful.
- `medium`: Relevant but less specific, second-hand, or needs confirmation.
- `low`: Weak signal kept only because it supports a recurring pattern.

Rows marked `is_spam_or_irrelevant=true` should stay out of dashboards and analysis unless they are being used to audit collection quality.
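
The exclusion rule above might be applied as a filter when loading rows for a dashboard. This sketch assumes the flag is stored as the text `true` or `false` in the CSV and that rows are dictionaries; `is_true`, `rows_for_analysis`, and the `audit_mode` parameter are hypothetical names introduced for illustration.

```python
def is_true(value) -> bool:
    """Interpret a CSV cell as a boolean flag.

    Assumption: only the text "true" (any case, surrounding spaces ignored)
    counts as set; anything else, including a blank cell, is treated as false.
    """
    return str(value).strip().lower() == "true"


def rows_for_analysis(rows, audit_mode: bool = False) -> list:
    """Drop spam/irrelevant rows unless auditing collection quality."""
    if audit_mode:
        # Audits need the rejected rows too, so nothing is filtered out.
        return list(rows)
    return [
        row for row in rows
        if not is_true(row.get("is_spam_or_irrelevant", ""))
    ]
```

Keeping the audit path as an explicit flag means the default call is the safe one for dashboards, and including flagged rows is a deliberate choice.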
