Cost savings (short notes)¶
1) Reduce LLM calls per article¶
Today each article can trigger multiple LLM calls (find articles + write summary + dedup/retry). Merging steps or removing redundant checks saves tokens immediately. Biggest savings come from fewer calls.
2) Lower completion limits¶
Set a tighter max_completion_tokens and make prompts shorter. This caps output length, which directly cuts spend. If articles are still OK, keep it low.
3) Stricter dedup + fewer trends¶
Skip items already covered (today + yesterday) more aggressively and reduce DEFAULT_TARGET_TRENDS if needed. Fewer articles = fewer LLM calls.
4) Run less often¶
The workflow runs hourly (6–21). Dropping to every 2–3 hours cuts cost roughly in half or more.
5) Use Batch/Flex tiers (if latency OK)¶
Batch/Flex pricing is cheaper but slower. Good if you don’t need near‑real‑time updates.
6) Cache results¶
If a trend repeats, reuse a previous summary or do a small update instead of full regeneration.
7) Model swap (optional)¶
gpt-5-nano is ~5x cheaper than gpt-5-mini, but quality drops (more errors, weaker reasoning). Only use if you can tolerate lower quality.