llm_stock_report Full Guide (English)

This guide covers end-to-end usage: local runs, GitHub Actions automation, configuration, model/data behavior, and troubleshooting.

For a step-by-step GitHub Actions setup walkthrough:
- docs/github-actions-setup_EN.md

1. Scope

llm_stock_report is a research/reporting pipeline that:
- retrains CN/US/HK models weekly,
- runs next-day prediction daily,
- combines news evidence and LLM reasoning,
- sends summary + detailed messages to Telegram.

It is not an auto-trading system.

2. Runtime Pipeline

A daily report run (run_report) executes:
1. Load symbol universe from config/universe.yaml
2. Fetch historical market data
   - use the local cache first and fetch only the missing date ranges
   - retry failed fetches with exponential backoff
3. Build technical features and next_day_return
4. Load latest model (auto-retrain if missing/expired)
5. Predict and rank symbols
6. Fetch news (Tavily primary, Brave fallback)
7. Generate report narratives with the configured LLM provider (Chinese by default; see REPORT_LANGUAGE)
8. Render outputs and send Telegram messages
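
The "fetch only missing ranges" idea in step 2 can be sketched as below. This is a minimal illustration, not the project's actual fetcher: the function name missing_ranges is hypothetical, and it assumes the cache for a symbol covers one contiguous date span.

```python
from datetime import date, timedelta


def missing_ranges(cached_start, cached_end, want_start, want_end):
    """Return (start, end) date ranges in the wanted window that the cache lacks.

    Assumes the cached history is a single contiguous span; pass None for
    both cached bounds when nothing is cached yet.
    """
    if cached_start is None or cached_end is None:
        return [(want_start, want_end)]
    gaps = []
    if want_start < cached_start:
        # Older history that precedes the cached span.
        gaps.append((want_start, min(want_end, cached_start - timedelta(days=1))))
    if want_end > cached_end:
        # Newer history that follows the cached span.
        gaps.append((max(want_start, cached_end + timedelta(days=1)), want_end))
    return gaps


gaps = missing_ranges(date(2026, 1, 10), date(2026, 2, 10),
                      date(2026, 1, 1), date(2026, 3, 4))
for start, end in gaps:
    print(start, "->", end)
```

Only the returned gaps would need a network request; a fully covered window yields an empty list.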

A retrain run (run_retrain) executes:
- fetch data -> build features -> train -> save model.
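
The "auto-retrain if missing/expired" check from the daily run could look roughly like the sketch below. The function name needs_retrain is hypothetical, and the exact boundary rule (strictly older than the limit) is an assumption; the expiry limit is the value of MODEL_EXPIRE_DAYS, whose default is not stated in this guide.

```python
from datetime import date


def needs_retrain(trained_on, run_date, expire_days):
    """True when no model exists or the latest one is older than expire_days.

    trained_on is the training date of the latest model, or None if no
    model has been saved yet. The strict ">" comparison is an assumption.
    """
    if trained_on is None:
        return True
    return (run_date - trained_on).days > expire_days


print(needs_retrain(None, date(2026, 3, 4), 7))
print(needs_retrain(date(2026, 3, 1), date(2026, 3, 4), 7))
```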

3. Project Layout

app/
  common/      config, logging, schemas
  data/        data fetchers and symbol normalization
  features/    technical indicators
  model/       frame builder, trainer, predictor, registry
  news/        Tavily/Brave search + fallback
  llm/         OpenAI client + prompts + reasoner
  report/      markdown rendering + Telegram sender
  jobs/        CLI entrypoints

config/
  universe.yaml
  report.yaml

outputs/{market}/{date}/
  summary.md
  details.md
  predictions.csv
  run_meta.json

models/{market}/{model_version}/
  model.pkl
  metadata.json

4. Environment Setup

4.1 Python

4.2 Install

python -m pip install --upgrade pip
python -m pip install -e '.[dev]'

If you also need the optional pyqlib dependency set, run:

python -m pip install -e '.[qlib]'

4.2.1 Test Entry

The official test command is:

python -m pytest

4.3 Environment Variables

cp .env.example .env

Fill in the required values:
- TAVILY_API_KEY
- BRAVE_API_KEY
- TELEGRAM_BOT_TOKEN
- TELEGRAM_CHAT_ID

Configure at least one LLM path:
- LLM_PROVIDER=openai + OPENAI_API_KEY
- LLM_PROVIDER=gemini + GEMINI_API_KEY
- LLM_PROVIDER=ollama + OLLAMA_BASE_URL + OLLAMA_MODEL
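
A quick pre-flight check for the three provider paths above might look like this. It is a sketch only: missing_llm_settings is a hypothetical helper, and it requires LLM_PROVIDER to be set explicitly because the project's default provider is not stated here.

```python
import os

# Variables each provider path needs, as listed above.
REQUIRED_BY_PROVIDER = {
    "openai": ["OPENAI_API_KEY"],
    "gemini": ["GEMINI_API_KEY"],
    "ollama": ["OLLAMA_BASE_URL", "OLLAMA_MODEL"],
}


def missing_llm_settings(env):
    """Return the names of required variables that are unset or empty."""
    provider = env.get("LLM_PROVIDER", "").strip().lower()
    if provider not in REQUIRED_BY_PROVIDER:
        raise ValueError(f"LLM_PROVIDER must be one of {sorted(REQUIRED_BY_PROVIDER)}")
    return [name for name in REQUIRED_BY_PROVIDER[provider] if not env.get(name)]


if __name__ == "__main__":
    try:
        print("missing:", missing_llm_settings(os.environ))
    except ValueError as exc:
        print("config error:", exc)
```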

Optional:
- TELEGRAM_MESSAGE_THREAD_ID
- OPENAI_BASE_URL
- OPENAI_MODEL
- GEMINI_MODEL
- GEMINI_BASE_URL
- OLLAMA_MODEL
- OLLAMA_BASE_URL
- LLM_PROVIDER
- REPORT_LANGUAGE (zh / en, default zh)
- PAGES_SITE_BASE_URL (used to append Pages links at the end of Telegram cards)
- PAGES_DEFAULT_LANGUAGE (zh / en, default zh)
- PAGES_CASE_RETENTION_DAYS (Pages case retention days, default 3)
- MAX_STOCKS_PER_RUN
- DETAIL_MESSAGE_CHAR_LIMIT
- MODEL_EXPIRE_DAYS
- DAILY_ANALYSIS_LOOKBACK_DAYS (daily reasoning lookback window, default 30)
- MARKET_INDEX_FETCH_ENABLED (default true)
- STOCK_LIST_CN / STOCK_LIST_US / STOCK_LIST_HK
- LLM_MAX_RETRIES
- LLM_RETRY_BASE_DELAY_SECONDS
- LLM_RETRY_MAX_DELAY_SECONDS
- LLM_RETRY_JITTER_SECONDS
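
The four LLM_RETRY_* knobs suggest capped exponential backoff with jitter. The sketch below shows one common way such settings interact; the function names and the exact formula are assumptions, not the project's confirmed behavior.

```python
import random
import time


def retry_delay(attempt, base_delay, max_delay, jitter, rng=random):
    """Seconds to wait before retry number `attempt` (0-based).

    The delay doubles per attempt, is capped at max_delay, then gets a
    random extra of up to `jitter` seconds added on top.
    """
    capped = min(max_delay, base_delay * (2 ** attempt))
    return capped + rng.uniform(0, jitter)


def call_with_retries(fn, max_retries, base_delay, max_delay, jitter, sleep=time.sleep):
    """Call fn(); on failure retry up to max_retries times, then re-raise."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise
            sleep(retry_delay(attempt, base_delay, max_delay, jitter))
```

In this reading, LLM_MAX_RETRIES maps to max_retries, LLM_RETRY_BASE_DELAY_SECONDS to base_delay, LLM_RETRY_MAX_DELAY_SECONDS to max_delay, and LLM_RETRY_JITTER_SECONDS to jitter.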

5. Universe Configuration

Edit config/universe.yaml:

cn:
  - SH600519
  - SZ000001
  - SZ300750
us:
  - AAPL
  - MSFT
  - NVDA
hk:
  - HK00700
  - HK03690
  - HK09988

Notes:
- CN accepts SHxxxxxx / SZxxxxxx or plain 6-digit symbols (normalized internally).
- US uses normal tickers.
- HK accepts HK00700, 00700, 0700, or 700 (normalized to HK00700).
- Per-run symbol count is capped by MAX_STOCKS_PER_RUN (default 30).
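
The HK rule above (HK00700, 00700, 0700, and 700 all become HK00700) can be illustrated as follows. Both function names are hypothetical, and keeping the first N symbols when applying the cap is an assumption about how MAX_STOCKS_PER_RUN selects symbols.

```python
def normalize_hk(symbol):
    """Normalize an HK symbol to the HK + 5-digit form, e.g. '700' -> 'HK00700'."""
    text = symbol.strip().upper()
    digits = text[2:] if text.startswith("HK") else text
    if not digits.isdigit() or len(digits) > 5:
        raise ValueError(f"not a valid HK symbol: {symbol!r}")
    return "HK" + digits.zfill(5)


def cap_universe(symbols, max_stocks=30):
    """Keep at most max_stocks symbols (assumption: the first ones in the list)."""
    return symbols[:max_stocks]


for raw in ["HK00700", "00700", "0700", "700"]:
    print(raw, "->", normalize_hk(raw))
```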

6. Local Commands

6.1 Retrain

python -m app.jobs.run_retrain --market cn --date 2026-03-04
python -m app.jobs.run_retrain --market us --date 2026-03-04
python -m app.jobs.run_retrain --market hk --date 2026-03-04

Note: retraining checks the local cache under qlib_data/history/, fetches only the missing history, and prunes rows that fall outside the retention window.

6.2 Daily Report

python -m app.jobs.run_report --market cn --date 2026-03-04
python -m app.jobs.run_report --market us --date 2026-03-04
python -m app.jobs.run_report --market hk --date 2026-03-04

Run without Telegram sending:

python -m app.jobs.run_report --market cn --date 2026-03-04 --no-telegram

7. Output Contract

predictions.csv columns:
- market
- symbol
- asof_date
- score
- rank
- side
- pred_return
- model_version
- data_window_start
- data_window_end

run_meta.json keys:
- run_id
- market
- status
- total_symbols
- success_symbols
- failed_symbols
- failed_list
- model_version
- llm_model
- search_provider_primary
- search_provider_fallback
- start_time
- end_time
- model_engine
- model_fallback_used
- model_warning
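
Downstream consumers may want to verify this contract before reading the files. The sketch below checks a predictions.csv header and a run_meta.json payload against the lists above; the helper names are hypothetical and the check only covers presence of names, not value types.

```python
import csv
import io

PREDICTION_COLUMNS = [
    "market", "symbol", "asof_date", "score", "rank", "side", "pred_return",
    "model_version", "data_window_start", "data_window_end",
]

RUN_META_KEYS = [
    "run_id", "market", "status", "total_symbols", "success_symbols",
    "failed_symbols", "failed_list", "model_version", "llm_model",
    "search_provider_primary", "search_provider_fallback", "start_time",
    "end_time", "model_engine", "model_fallback_used", "model_warning",
]


def missing_prediction_columns(csv_text):
    """Return contract columns absent from the CSV header row."""
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return [col for col in PREDICTION_COLUMNS if col not in header]


def missing_meta_keys(meta):
    """Return contract keys absent from a parsed run_meta.json dict."""
    return [key for key in RUN_META_KEYS if key not in meta]
```

Typical usage would read outputs/{market}/{date}/predictions.csv as text and pass json.load() output for run_meta.json.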

8. Telegram Protocol

Send order:
1. Summary message ([CN] YYYY-MM-DD 日报摘要, i.e. "daily report summary")
2. Per-symbol detail chunks ([CN][symbol][i/n])
3. Market overview message ([CN][MARKET][1/1])

Behavior:
- Max message length default: 3500 chars
- Content longer than the limit is split into chunks, each tagged with an [i/n] marker
- Local files stay in Markdown; Telegram messages are converted into Telegram-safe HTML formatting before send
- Titles, emphasis, and news links are rendered in a Telegram-compatible format instead of exposing raw Markdown
- Telegram copy is also condensed into short "card" style blurbs instead of long reasoning paragraphs
- If PAGES_SITE_BASE_URL is configured, each card footer includes the daily case webpage link
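
The chunking behavior can be sketched as below. This is an illustration under assumptions: chunk_message is a hypothetical name, the split is a plain character split (a real sender would also need to avoid cutting HTML tags in half), and the header reserves room for markers up to [999/999].

```python
def chunk_message(prefix, body, limit=3500):
    """Split body so each '<prefix>[i/n]' header plus its piece fits within limit."""
    # Reserve space for the prefix, the largest expected marker, and a newline.
    reserve = len(prefix) + len("[999/999]\n")
    size = limit - reserve
    if size <= 0:
        raise ValueError("limit is too small for the chunk header")
    pieces = [body[i:i + size] for i in range(0, len(body), size)] or [""]
    total = len(pieces)
    return [f"{prefix}[{i}/{total}]\n{piece}" for i, piece in enumerate(pieces, start=1)]


chunks = chunk_message("[CN][SH600519]", "x" * 8000)
print(len(chunks), [len(c) for c in chunks])
```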

9. GitHub Actions

9.1 Workflow files

9.2 Schedules (UTC)

9.2.1 GitHub Pages notes

9.3 GitHub Secrets

Required:
- TAVILY_API_KEY
- BRAVE_API_KEY
- TELEGRAM_BOT_TOKEN
- TELEGRAM_CHAT_ID

Optional:
- OPENAI_API_KEY (required when LLM_PROVIDER=openai)
- GEMINI_API_KEY (required when LLM_PROVIDER=gemini)
- OLLAMA_API_KEY (only for protected remote Ollama)
- TELEGRAM_MESSAGE_THREAD_ID

9.4 GitHub Variables (optional)

9.5 Artifacts

Each workflow uploads:
- outputs/**
- models/**

Artifact retention: 14 days.

9.6 Training Window and Retry Knobs

9.7 LLM Reliability Hardening

The prompt and the response parser enforce:
- no fabricated metrics beyond provided inputs,
- explicit evidence references (N1/N2...) or clear evidence shortage,
- confidence score (0-100) and reliability notes,
- concrete, actionable risk points instead of generic statements.
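
A parser-side check for two of these rules (evidence references and the 0-100 confidence score) might look like the sketch below. The field names confidence, analysis, and evidence_shortage are hypothetical; the project's real response schema is not shown in this guide.

```python
import re

# Matches evidence tags such as N1, N2, N10 (but not tickers like NVDA).
EVIDENCE_REF = re.compile(r"\bN\d+\b")


def validate_reasoning(result, evidence_count):
    """Return a list of problems found in a parsed LLM result dict."""
    problems = []
    confidence = result.get("confidence")
    is_number = isinstance(confidence, (int, float)) and not isinstance(confidence, bool)
    if not is_number or not 0 <= confidence <= 100:
        problems.append("confidence must be a number in 0-100")
    refs = {int(tag[1:]) for tag in EVIDENCE_REF.findall(result.get("analysis", ""))}
    if any(n < 1 or n > evidence_count for n in refs):
        problems.append("reference to unknown evidence item")
    if not refs and not result.get("evidence_shortage", False):
        problems.append("no evidence reference and no stated evidence shortage")
    return problems


sample = {"confidence": 72, "analysis": "Revenue beat (N1); guidance cut (N2)."}
print(validate_reasoning(sample, evidence_count=2))
```

A non-empty list would be a reason to retry the LLM call or to flag the symbol in the report.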

10. Troubleshooting

10.1 Data fetch fails for all symbols

Check:
- Network access
- Symbol format
- Date validity around market trading days

10.2 Empty predictions

Possible causes to check:
- Lookback history is too short to build features
- Feature columns are entirely NaN
- Inspect predict_frame.csv in outputs/debug/...

10.3 Telegram send errors

Check:
- Bot token and chat ID are correct
- The bot has permission to post in the target group
- The topic ID (TELEGRAM_MESSAGE_THREAD_ID) is correct if sending to a thread

10.4 LLM errors

Check:
- LLM_PROVIDER is one of openai/gemini/ollama
- For openai: verify OPENAI_API_KEY and OPENAI_BASE_URL
- For gemini: verify GEMINI_API_KEY and GEMINI_BASE_URL
- For ollama: verify OLLAMA_BASE_URL reachability and model pull status
- For any provider: verify the configured model name is actually available from that provider

10.5 News is empty

11. Operational Advice