llm_stock_report Full Guide (English)
This guide covers end-to-end usage: local runs, GitHub Actions automation, configuration, model/data behavior, and troubleshooting.
For a step-by-step GitHub Actions setup walkthrough:
- docs/github-actions-setup_EN.md
1. Scope
llm_stock_report is a research/reporting pipeline that:
- retrains CN/US/HK models weekly,
- runs next-day prediction daily,
- combines news evidence and LLM reasoning,
- sends summary + detailed messages to Telegram.
It is not an auto-trading system.
2. Runtime Pipeline
A daily report run (run_report) executes:
1. Load symbol universe from config/universe.yaml
2. Fetch historical market data
- use local cache first and fetch only missing ranges
- apply exponential retry on fetch failures
3. Build technical features and next_day_return
4. Load latest model (auto-retrain if missing/expired)
5. Predict and rank symbols
6. Fetch news (Tavily primary, Brave fallback)
7. Generate narratives via the configured LLM provider (OpenAI, Gemini, or Ollama); Chinese by default, see REPORT_LANGUAGE
8. Render outputs and send Telegram messages
A retrain run (run_retrain) executes:
- fetch data -> build features -> train -> save model.
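Step 4 of the daily run retrains automatically when the model is missing or expired. A minimal sketch of that decision, assuming expiry is measured in calendar days against MODEL_EXPIRE_DAYS (default 8); the function name and the strict "greater than" comparison are illustrative assumptions, not the project's actual code:

```python
from datetime import date
from typing import Optional

MODEL_EXPIRE_DAYS = 8  # documented default


def needs_retrain(trained_on: Optional[date], asof: date,
                  expire_days: int = MODEL_EXPIRE_DAYS) -> bool:
    """Return True when no model exists or the latest one is too old."""
    if trained_on is None:  # no saved model for this market
        return True
    # Assumption: a model older than expire_days calendar days is expired.
    return (asof - trained_on).days > expire_days


print(needs_retrain(date(2026, 2, 20), date(2026, 3, 4)))
```

With weekly retraining in place, a default of 8 days leaves one day of slack before the daily job has to retrain on its own.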
3. Project Layout
app/
  common/    config, logging, schemas
  data/      data fetchers and symbol normalization
  features/  technical indicators
  model/     frame builder, trainer, predictor, registry
  news/      Tavily/Brave search + fallback
  llm/       OpenAI client + prompts + reasoner
  report/    markdown rendering + Telegram sender
  jobs/      CLI entrypoints
config/
  universe.yaml
  report.yaml
outputs/{market}/{date}/
  summary.md
  details.md
  predictions.csv
  run_meta.json
models/{market}/{model_version}/
  model.pkl
  metadata.json
4. Environment Setup
4.1 Python
- Python 3.11 recommended
4.2 Install
python -m pip install --upgrade pip
python -m pip install -e '.[dev]'
If you also need the optional pyqlib dependency set, run:
python -m pip install -e '.[qlib]'
4.2.1 Test Entry
The official test command is:
python -m pytest
4.3 Environment Variables
cp .env.example .env
Fill required values:
- TAVILY_API_KEY
- BRAVE_API_KEY
- TELEGRAM_BOT_TOKEN
- TELEGRAM_CHAT_ID
Configure at least one LLM path:
- LLM_PROVIDER=openai + OPENAI_API_KEY
- LLM_PROVIDER=gemini + GEMINI_API_KEY
- LLM_PROVIDER=ollama + OLLAMA_BASE_URL + OLLAMA_MODEL
Optional:
- TELEGRAM_MESSAGE_THREAD_ID
- OPENAI_BASE_URL
- OPENAI_MODEL
- GEMINI_MODEL
- GEMINI_BASE_URL
- OLLAMA_MODEL
- OLLAMA_BASE_URL
- LLM_PROVIDER
- REPORT_LANGUAGE (zh / en, default zh)
- PAGES_SITE_BASE_URL (used to append Pages links at the end of Telegram cards)
- PAGES_DEFAULT_LANGUAGE (zh / en, default zh)
- PAGES_CASE_RETENTION_DAYS (Pages case retention days, default 3)
- MAX_STOCKS_PER_RUN
- DETAIL_MESSAGE_CHAR_LIMIT
- MODEL_EXPIRE_DAYS
- DAILY_ANALYSIS_LOOKBACK_DAYS (daily reasoning lookback window, default 30)
- MARKET_INDEX_FETCH_ENABLED (default true)
- STOCK_LIST_CN / STOCK_LIST_US / STOCK_LIST_HK
- LLM_MAX_RETRIES
- LLM_RETRY_BASE_DELAY_SECONDS
- LLM_RETRY_MAX_DELAY_SECONDS
- LLM_RETRY_JITTER_SECONDS
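The provider pairings listed above can be checked before a run. The sketch below is a hypothetical pre-flight helper, not part of the project; it treats an unset LLM_PROVIDER as an error and requires both Ollama variables, although the real loader may fall back to documented defaults instead:

```python
import os
from typing import List, Mapping

# Required variables per provider, following the pairings in this guide.
REQUIRED_BY_PROVIDER = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "ollama": ["OLLAMA_BASE_URL", "OLLAMA_MODEL"],
}


def missing_llm_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of LLM settings that are missing or invalid."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_BY_PROVIDER:
        return ["LLM_PROVIDER"]
    return [name for name in REQUIRED_BY_PROVIDER[provider] if not env.get(name)]


if __name__ == "__main__":
    # An empty list means the selected provider has everything it needs.
    print(missing_llm_settings(os.environ))
```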
5. Universe Configuration
Edit config/universe.yaml:
cn:
- SH600519
- SZ000001
- SZ300750
us:
- AAPL
- MSFT
- NVDA
hk:
- HK00700
- HK03690
- HK09988
Notes:
- CN accepts SHxxxxxx / SZxxxxxx or plain 6-digit symbols (normalized internally).
- US uses normal tickers.
- HK accepts HK00700, 00700, 0700, or 700 (normalized to HK00700).
- Per-run symbol count is capped by MAX_STOCKS_PER_RUN (default 30).
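The normalization rules in the notes can be illustrated as follows. This is a sketch only: the function names are hypothetical, and the rule that plain 6-digit codes starting with 6 map to SH while all others map to SZ is a simplifying assumption that matches the examples above but may differ from the project's normalizer:

```python
import re


def normalize_hk(symbol: str) -> str:
    """Normalize HK00700 / 00700 / 0700 / 700 to HK00700."""
    digits = symbol.strip().upper().removeprefix("HK")
    if not digits.isdigit() or len(digits) > 5:
        raise ValueError(f"not an HK symbol: {symbol!r}")
    return "HK" + digits.zfill(5)  # left-pad to five digits


def normalize_cn(symbol: str) -> str:
    """Normalize SHxxxxxx / SZxxxxxx / plain 6-digit codes."""
    s = symbol.strip().upper()
    if re.fullmatch(r"(SH|SZ)\d{6}", s):
        return s
    if re.fullmatch(r"\d{6}", s):
        # Assumed rule: codes starting with 6 are Shanghai, others Shenzhen.
        return ("SH" if s.startswith("6") else "SZ") + s
    raise ValueError(f"not a CN symbol: {symbol!r}")


print(normalize_hk("700"), normalize_cn("600519"))
```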
6. Local Commands
6.1 Retrain
python -m app.jobs.run_retrain --market cn --date 2026-03-04
python -m app.jobs.run_retrain --market us --date 2026-03-04
python -m app.jobs.run_retrain --market hk --date 2026-03-04
Note: retraining checks the local cache under qlib_data/history/, incrementally tops up missing history, and prunes stale rows that fall outside the retention window.
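The incremental top-up and pruning described in the note could look roughly like this. The helper names and the in-memory dict cache are hypothetical; only the INCREMENTAL_OVERLAP_DAYS default (7) comes from this guide, and the on-disk layout of qlib_data/history/ is not modeled here:

```python
from datetime import date, timedelta
from typing import Dict, Optional

INCREMENTAL_OVERLAP_DAYS = 7  # documented default


def fetch_start(window_start: date, cached_end: Optional[date],
                overlap_days: int = INCREMENTAL_OVERLAP_DAYS) -> date:
    """Return the first date to request from the data source."""
    if cached_end is None:  # nothing cached: fetch the whole window
        return window_start
    # Re-fetch a short overlap before the cached end, never before the window.
    return max(window_start, cached_end - timedelta(days=overlap_days))


def prune(rows: Dict[date, float], keep_from: date) -> Dict[date, float]:
    """Drop cached rows older than the retention boundary."""
    return {d: v for d, v in rows.items() if d >= keep_from}


print(fetch_start(date(2024, 3, 4), date(2026, 3, 1)))
```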
6.2 Daily Report
python -m app.jobs.run_report --market cn --date 2026-03-04
python -m app.jobs.run_report --market us --date 2026-03-04
python -m app.jobs.run_report --market hk --date 2026-03-04
Run without Telegram sending:
python -m app.jobs.run_report --market cn --date 2026-03-04 --no-telegram
7. Output Contract
predictions.csv columns:
- market
- symbol
- asof_date
- score
- rank
- side
- pred_return
- model_version
- data_window_start
- data_window_end
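A downstream consumer can verify this column contract before reading rows. The sketch below uses only the standard library; the helper name is hypothetical, and it checks column presence only, not types or ordering:

```python
import csv
import io
from typing import List

EXPECTED_COLUMNS = [
    "market", "symbol", "asof_date", "score", "rank", "side",
    "pred_return", "model_version", "data_window_start", "data_window_end",
]


def missing_columns(csv_text: str) -> List[str]:
    """Return the contract columns absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []  # None when the file is empty
    return [c for c in EXPECTED_COLUMNS if c not in header]


print(missing_columns("market,symbol,asof_date\n"))
```

For a real file, pass the contents of outputs/{market}/{date}/predictions.csv as csv_text.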
run_meta.json keys:
- run_id
- market
- status
- total_symbols
- success_symbols
- failed_symbols
- failed_list
- model_version
- llm_model
- search_provider_primary
- search_provider_fallback
- start_time
- end_time
- model_engine
- model_fallback_used
- model_warning
8. Telegram Protocol
Send order:
1. Summary message ([CN] YYYY-MM-DD 日报摘要, i.e. "daily report summary")
2. Per-symbol detail chunks ([CN][symbol][i/n])
3. Market overview message ([CN][MARKET][1/1])
Behavior:
- Max message length default: 3500 chars
- Long content is force-chunked with (i/n) markers
- Local files stay in Markdown; Telegram messages are converted into Telegram-safe HTML formatting before send
- Titles, emphasis, and news links are rendered in a Telegram-compatible format instead of exposing raw Markdown
- Telegram copy is also condensed into short "card" style blurbs instead of long reasoning paragraphs
- If PAGES_SITE_BASE_URL is configured, each card footer includes the daily case webpage link
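The forced chunking can be sketched as below. This is an illustration under assumptions: the function name and the fixed 64-character header reserve are hypothetical, and it splits on raw character offsets, whereas the real sender presumably avoids cutting through HTML tags:

```python
from typing import List

DETAIL_MESSAGE_CHAR_LIMIT = 3500  # documented default
HEADER_RESERVE = 64  # assumed room for the "[CN][symbol][i/n]" header line


def chunk_message(market: str, symbol: str, text: str,
                  limit: int = DETAIL_MESSAGE_CHAR_LIMIT) -> List[str]:
    """Split text into chunks that fit the limit, each with an [i/n] header."""
    body_limit = limit - HEADER_RESERVE
    if body_limit <= 0:
        raise ValueError("limit is too small to hold the chunk header")
    parts = [text[i:i + body_limit] for i in range(0, len(text), body_limit)] or [""]
    total = len(parts)
    return [f"[{market}][{symbol}][{i}/{total}]\n{part}"
            for i, part in enumerate(parts, start=1)]


chunks = chunk_message("CN", "SH600519", "x" * 8000)
print(len(chunks), chunks[0].splitlines()[0])
```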
9. GitHub Actions
9.1 Workflow files
- .github/workflows/ci.yml
- .github/workflows/daily_cn.yml
- .github/workflows/daily_hk.yml
- .github/workflows/daily_us.yml
- .github/workflows/weekly_retrain.yml
- .github/workflows/deploy_pages.yml
9.2 Schedules (UTC)
- daily_cn.yml: 0 8 * * 1-5 (16:00 Asia/Shanghai, weekdays)
- daily_hk.yml: 30 9 * * 1-5 (17:30 Asia/Shanghai, weekdays)
- daily_us.yml: 30 23 * * 1-5 (07:30 Asia/Shanghai, next day)
- These are automatic runs for each trading-day window: CN/HK on Asia/Shanghai weekdays, US on Asia/Shanghai Tue-Sat mornings.
- ci.yml: runs the full python -m pytest suite on push / pull_request
- Daily report and retrain workflows run a fixed smoke-test suite before the main job.
- weekly_retrain.yml: scheduled Sunday retraining
- On GitHub-hosted runners, local Ollama is not reachable by default; use Ollama locally or on self-hosted runners.
- deploy_pages.yml: auto-publishes GitHub Pages when docs/** or pages_data/** changes (docs + daily cases)
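The UTC-to-local mapping of the cron schedules above can be verified with a few lines. The helper name is hypothetical; the fixed UTC+8 offset relies on Asia/Shanghai observing no daylight saving time:

```python
from datetime import datetime, timedelta, timezone

# Asia/Shanghai is UTC+8 all year round.
SHANGHAI = timezone(timedelta(hours=8))


def to_shanghai(utc_hour: int, utc_minute: int) -> datetime:
    """Convert a Monday cron time in UTC to Asia/Shanghai local time."""
    # 2026-03-02 is a Monday and serves as the reference date.
    utc = datetime(2026, 3, 2, utc_hour, utc_minute, tzinfo=timezone.utc)
    return utc.astimezone(SHANGHAI)


for name, (hour, minute) in {"daily_cn": (8, 0), "daily_hk": (9, 30), "daily_us": (23, 30)}.items():
    local = to_shanghai(hour, minute)
    print(name, local.strftime("%Y-%m-%d %H:%M"))
```

The daily_us run crosses midnight in local time, which is why a Monday-to-Friday UTC schedule lands on Tuesday-to-Saturday mornings in Asia/Shanghai.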
9.2.1 GitHub Pages notes
- After each daily report, python -m app.jobs.export_case writes a case snapshot into pages_data/cases/**.
- deploy_pages.yml renders docs + cases into a static site and deploys it to Pages.
- For first-time setup, go to Settings -> Pages and set Source to GitHub Actions.
9.3 GitHub Secrets
Required:
- TAVILY_API_KEY
- BRAVE_API_KEY
- TELEGRAM_BOT_TOKEN
- TELEGRAM_CHAT_ID
Optional:
- OPENAI_API_KEY (required when LLM_PROVIDER=openai)
- GEMINI_API_KEY (required when LLM_PROVIDER=gemini)
- OLLAMA_API_KEY (only for protected remote Ollama)
- TELEGRAM_MESSAGE_THREAD_ID
9.4 GitHub Variables (optional)
- MAX_STOCKS_PER_RUN (default 30)
- DETAIL_MESSAGE_CHAR_LIMIT (default 3500)
- MODEL_EXPIRE_DAYS (default 8)
- STOCK_LIST_CN / STOCK_LIST_US / STOCK_LIST_HK (env override for universe)
- OPENAI_BASE_URL
- OPENAI_MODEL
- LLM_PROVIDER (openai / gemini / ollama)
- REPORT_LANGUAGE (zh / en, default zh)
- PAGES_SITE_BASE_URL (used for Telegram card footer links)
- PAGES_DEFAULT_LANGUAGE (zh / en, default zh)
- PAGES_CASE_RETENTION_DAYS (default 3)
- GEMINI_MODEL
- GEMINI_BASE_URL
- OLLAMA_MODEL
- OLLAMA_BASE_URL
- DAILY_ANALYSIS_LOOKBACK_DAYS (default 30)
- LLM_MAX_RETRIES (default 6)
- LLM_RETRY_BASE_DELAY_SECONDS (default 5)
- LLM_RETRY_MAX_DELAY_SECONDS (default 120)
- LLM_RETRY_JITTER_SECONDS (default 1)
- MARKET_INDEX_FETCH_ENABLED (default true)
9.5 Artifacts
Each workflow uploads:
- outputs/**
- models/**
Retention: 14 days.
9.6 Training Window and Retry Knobs
- TRAINING_WINDOW_DAYS: main training window (default 730)
- DAILY_ANALYSIS_LOOKBACK_DAYS: lookback context window for daily reasoning (default 30)
- FEATURE_WARMUP_DAYS: extra warmup days for indicators (default 60)
- HISTORY_PRUNE_BUFFER_DAYS: extra cache retention (default 60)
- INCREMENTAL_OVERLAP_DAYS: overlap days for incremental sync (default 7)
- FETCH_MAX_RETRIES: max retries for data fetch (default 5)
- FETCH_RETRY_BASE_DELAY_SECONDS: base retry delay (default 15)
- FETCH_RETRY_MAX_DELAY_SECONDS: max retry delay (default 300)
- FETCH_RETRY_JITTER_SECONDS: random jitter seconds (default 2)
- LLM_MAX_RETRIES: max retry attempts for LLM calls (default 6)
- LLM_RETRY_BASE_DELAY_SECONDS: base delay for LLM retries (default 5)
- LLM_RETRY_MAX_DELAY_SECONDS: max delay for LLM retries (default 120)
- LLM_RETRY_JITTER_SECONDS: random jitter for LLM retries (default 1)
- LLM_PROVIDER: provider switch (openai / gemini / ollama)
- REPORT_LANGUAGE: report and Telegram language (zh / en)
- PAGES_SITE_BASE_URL: site root used to append daily case webpage links to Telegram cards
- PAGES_DEFAULT_LANGUAGE: default landing language for GitHub Pages (zh / en)
- PAGES_CASE_RETENTION_DAYS: GitHub Pages keeps the latest N calendar days in cases (3 by default)
- GEMINI_MODEL: Gemini model name (default gemini-2.0-flash)
- OLLAMA_MODEL: Ollama model name (default qwen2.5:7b)
- OLLAMA_BASE_URL: Ollama base URL (default http://127.0.0.1:11434)
- MARKET_INDEX_FETCH_ENABLED: enable benchmark index fetch (default true)
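To see how the retry knobs interact, here is a sketch of an exponential backoff delay using the documented fetch defaults (base 15 s, max 300 s, jitter 2 s). The exact formula the project uses is not stated in this guide; doubling per attempt and additive uniform jitter are assumptions:

```python
import random


def retry_delay(attempt: int, base: float = 15.0, max_delay: float = 300.0,
                jitter: float = 2.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    # Assumed formula: double the base delay each attempt, capped at max_delay.
    backoff = min(max_delay, base * (2 ** attempt))
    # Assumed jitter: add a uniform random offset in [0, jitter].
    return backoff + random.uniform(0.0, jitter)


# Jitter disabled here so the printed schedule is deterministic.
print([retry_delay(n, jitter=0.0) for n in range(5)])
```

Under these assumptions the cap is reached on the sixth attempt (15 * 2^5 = 480 > 300). The LLM knobs (base 5 s, max 120 s, jitter 1 s) fit the same shape via the keyword arguments.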
9.7 LLM Reliability Hardening
Prompt and parser now enforce:
- no fabricated metrics beyond provided inputs,
- explicit evidence references (N1/N2...) or clear evidence shortage,
- confidence score (0-100) and reliability notes,
- concrete, actionable risk points instead of generic statements.
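Two of these checks (confidence range and evidence references) are mechanical enough to sketch. The function below is hypothetical and does not reflect the project's parser; it assumes evidence items are numbered N1..Nk and that citing a number outside that range counts as a fabricated reference:

```python
import re
from typing import List


def validate_reasoning(confidence: int, text: str, evidence_count: int) -> List[str]:
    """Return a list of problems found in one LLM answer (empty if none)."""
    problems = []
    if not 0 <= confidence <= 100:
        problems.append("confidence out of range")
    # Collect cited evidence ids such as N1, N2, ...
    cited = {int(n) for n in re.findall(r"\bN(\d+)\b", text)}
    unknown = sorted(n for n in cited if n < 1 or n > evidence_count)
    if unknown:
        problems.append("unknown evidence refs: " + ", ".join(f"N{n}" for n in unknown))
    return problems


print(validate_reasoning(50, "Margins are under pressure (N3).", 2))
```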
10. Troubleshooting
10.1 Data fetch fails for all symbols
Check:
- Network access
- Symbol format
- Date validity around market trading days
10.2 Empty predictions
Check:
- Insufficient lookback history
- Feature columns full of NaN
- predict_frame.csv in outputs/debug/...
10.3 Telegram send errors
Check:
- TELEGRAM_BOT_TOKEN / TELEGRAM_CHAT_ID correctness
- Bot permissions in the target group
- TELEGRAM_MESSAGE_THREAD_ID if sending to a topic thread
10.4 LLM errors
Check:
- LLM_PROVIDER is one of openai/gemini/ollama
- For openai: verify OPENAI_API_KEY and OPENAI_BASE_URL
- For gemini: verify GEMINI_API_KEY and GEMINI_BASE_URL
- For ollama: verify OLLAMA_BASE_URL reachability and model pull status
- Model availability for the configured provider (OPENAI_MODEL / GEMINI_MODEL / OLLAMA_MODEL)
10.5 News is empty
- Tavily failure automatically falls back to Brave
- If both fail, report still runs with empty news evidence
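The fallback behavior can be sketched as follows. The providers are passed in as plain callables; the two stand-in functions are for illustration only and are not the real Tavily or Brave clients. Treating any raised exception as a provider failure is an assumption:

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[Dict[str, str]]]


def search_with_fallback(query: str, primary: SearchFn,
                         fallback: SearchFn) -> List[Dict[str, str]]:
    """Try the primary provider, then the fallback; return [] if both fail."""
    for provider in (primary, fallback):
        try:
            return provider(query)
        except Exception:
            continue  # treat any error as a provider failure and move on
    return []  # the report still runs, just without news evidence


# Stand-in providers for illustration only.
def tavily_down(query: str) -> List[Dict[str, str]]:
    raise RuntimeError("simulated Tavily outage")


def brave_ok(query: str) -> List[Dict[str, str]]:
    return [{"title": f"News about {query}"}]


print(search_with_fallback("AAPL", tavily_down, brave_ok))
```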
11. Operational Advice
- Keep weekly retraining enabled
- Avoid frequent large universe changes
- Run --no-telegram smoke tests before production
- Keep outputs and models artifacts for audit/replay
- Investigate failed_list when status=partial
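A quick way to surface failed symbols from run_meta.json is sketched below. It assumes failed_list is a JSON array of symbol strings; the guide names the key but does not specify its shape, and the helper name is hypothetical:

```python
import json
from typing import List


def failed_symbols(run_meta_text: str) -> List[str]:
    """Return failed_list when the run finished with status=partial."""
    meta = json.loads(run_meta_text)
    if meta.get("status") != "partial":
        return []
    return list(meta.get("failed_list", []))


sample = '{"status": "partial", "failed_symbols": 1, "failed_list": ["HK03690"]}'
print(failed_symbols(sample))
```

For a real run, pass the contents of outputs/{market}/{date}/run_meta.json.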