Cost savings (short notes)¶

1) Reduce LLM calls per article¶

Today each article can trigger multiple LLM calls (find articles + write summary + dedup/retry). Merging steps or removing redundant checks saves tokens immediately. Biggest savings come from fewer calls.

2) Lower completion limits¶

Set a tighter max_completion_tokens and make prompts shorter. This caps output length, which directly cuts spend. If articles are still OK, keep it low.

3) Stricter dedup + fewer trends¶

Skip items already covered (today + yesterday) more aggressively and reduce DEFAULT_TARGET_TRENDS if needed. Fewer articles = fewer LLM calls.

4) Run less often¶

The workflow runs hourly (6–21). Dropping to every 2–3 hours cuts cost roughly in half or more.

5) Use Batch/Flex tiers (if latency OK)¶

Batch/Flex pricing is cheaper but slower. Good if you don’t need near‑real‑time updates.

6) Cache results¶

If a trend repeats, reuse a previous summary or do a small update instead of full regeneration.

7) Model swap (optional)¶

gpt-5-nano is ~5x cheaper than gpt-5-mini, but quality drops (more errors, weaker reasoning). Only use if you can tolerate lower quality.