A Weekly AI Research Trend Alerter Powered by Agentic RAG

A 46-node n8n workflow that fetches arXiv AI papers every week, classifies them with an LLM, stores them in Weaviate, and uses agentic RAG to email a trend digest.

The brief: a research scout that reads the week's AI papers so you don't have to.

The goal was a hands-off system that, once a week, surveys the latest AI and machine-learning research on arXiv and delivers a short, approachable email summarising what's actually trending — not a raw list of papers, but a synthesised view of the dominant themes and the work most worth knowing about. We built this as a 46-node n8n workflow in two clean halves: an ingestion-and-enrichment pipeline that builds a searchable knowledge base, and an agentic-RAG analysis layer that turns that knowledge base into a readable digest.

Part 1 — Fetch, clean, enrich, and store. A weekly schedule trigger kicks things off by computing a rolling one-week date window. The workflow then queries the free arXiv API for recent machine-learning abstracts (the cs.LG and stat.ML categories), sorted newest-first, capped at a configurable maximum (200 by default). arXiv returns XML, so the data is converted to JSON, split out by article, and cleaned — multi-value fields like authors and categories are normalised into arrays and the publication date is properly formatted — before duplicates are removed.
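
The fetch-and-clean step above can be sketched in Python as follows. This is a minimal illustration, not the n8n nodes themselves: the function names and record field names are hypothetical, and filtering by `submittedDate` is an assumption about how the one-week window is expressed in the query.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

ARXIV_API = "http://export.arxiv.org/api/query"
ATOM = "{http://www.w3.org/2005/Atom}"  # namespace used by the arXiv Atom feed


def build_query_url(now: datetime, max_results: int = 200) -> str:
    """Build an arXiv API URL covering the seven days before `now`."""
    start = now - timedelta(days=7)
    fmt = "%Y%m%d%H%M"
    window = f"submittedDate:[{start.strftime(fmt)} TO {now.strftime(fmt)}]"
    params = {
        "search_query": f"(cat:cs.LG OR cat:stat.ML) AND {window}",
        "sortBy": "submittedDate",
        "sortOrder": "descending",  # newest first
        "max_results": max_results,
    }
    return f"{ARXIV_API}?{urlencode(params)}"


def parse_feed(xml_text: str) -> list[dict]:
    """Convert the Atom XML into clean records, skipping duplicate IDs."""
    root = ET.fromstring(xml_text)
    seen, records = set(), []
    for entry in root.iter(f"{ATOM}entry"):
        arxiv_id = entry.findtext(f"{ATOM}id", "").split("/abs/")[-1]
        if arxiv_id in seen:
            continue
        seen.add(arxiv_id)
        records.append({
            "id": arxiv_id,
            # Collapse the line breaks arXiv inserts into titles and abstracts.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "abstract": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
            # Multi-value fields are normalised into arrays.
            "authors": [a.findtext(f"{ATOM}name", "") for a in entry.iter(f"{ATOM}author")],
            "categories": [c.get("term") for c in entry.iter(f"{ATOM}category")],
            # Keep only the date part of the ISO timestamp.
            "published": entry.findtext(f"{ATOM}published", "")[:10],
        })
    return records
```

Calling `build_query_url(datetime.now(timezone.utc))` gives the URL to request; the response body can then be passed to `parse_feed`. The network call is left out so the sketch stays self-contained.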

The cleaned abstracts then pass through an LLM enrichment agent (Claude 3.7 Sonnet via OpenRouter), which adds what a raw feed cannot. It reads each paper's title and abstract and classifies the paper against a curated taxonomy of AI topics, including Foundation Models, LLM Fine-tuning, PEFT, RAG, Model Quantization, and Agentic AI. Each paper receives a single primary category, up to two secondary categories, and a 1-to-5 potential-impact score that ranges from incremental work to potential paradigm shifts. This structured metadata is what later makes trend analysis possible.
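
One way to keep the enrichment output trustworthy is to validate it against the constraints described above before storing it. The sketch below is illustrative only: the field names (`primary_category`, `secondary_categories`, `impact_score`) are assumptions, and `TAXONOMY` lists just the topics named in this write-up rather than the full curated set.

```python
from dataclasses import dataclass, field

# Only the topics named in this write-up; the real taxonomy is longer.
TAXONOMY = {
    "Foundation Models", "LLM Fine-tuning", "PEFT", "RAG",
    "Model Quantization", "Agentic AI",
}


@dataclass
class Enrichment:
    primary_category: str
    secondary_categories: list[str] = field(default_factory=list)
    impact_score: int = 1


def validate_enrichment(raw: dict) -> Enrichment:
    """Reject LLM output that breaks the taxonomy or score constraints."""
    primary = raw.get("primary_category")
    secondary = raw.get("secondary_categories", [])
    score = raw.get("impact_score")

    if primary not in TAXONOMY:
        raise ValueError(f"unknown primary category: {primary!r}")
    if len(secondary) > 2:
        raise ValueError("at most two secondary categories are allowed")
    unknown = [c for c in secondary if c not in TAXONOMY]
    if unknown:
        raise ValueError(f"unknown secondary categories: {unknown}")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 5:
        raise ValueError(f"impact score must be an integer from 1 to 5, got {score!r}")
    return Enrichment(primary, list(secondary), score)
```

A record that fails validation could be retried or skipped, so that a malformed classification never reaches the vector store.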

Each enriched record is then embedded with OpenAI embeddings, with chunking set so that each paper's abstract stays a single clean chunk, and upserted into a Weaviate vector collection. A deliberate verification step follows: the workflow aggregates the list of arXiv IDs that were just uploaded and generates a static session ID, confirming that the week's articles are safely in Weaviate before any analysis runs. This guards against the analysis agent operating on incomplete data.
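
The idea behind the verification step can be illustrated with a simple set comparison. This is a sketch under assumptions: `verify_ingest` is a hypothetical helper, and how the stored IDs are read back from Weaviate is deliberately left out.

```python
def verify_ingest(expected_ids: list[str], stored_ids: list[str]) -> dict:
    """Compare the IDs sent for upload with the IDs found in the store."""
    expected = set(expected_ids)
    missing = sorted(expected - set(stored_ids))
    return {
        "ok": not missing,          # True only when every paper arrived
        "expected": len(expected),
        "missing": missing,         # IDs to retry before analysis starts
    }
```

Analysis would proceed only when `ok` is true; otherwise the missing IDs show exactly which uploads need another attempt.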

Part 2 — Agentic RAG trend analysis and alerting. The second half configures an AI agent with Weaviate attached as a retrieval tool. Rather than being handed the papers directly, the agent actively queries the vector store: it first runs an aggregate query to count articles by primary topic — using publication volume as a signal of what's hot that week — then performs vector searches to pull the most relevant and highest-impact papers. From there it groups work by topic, prioritises trends backed by more papers or higher impact and quality scores, and selects one or two representative papers to cite for each trend. The agent is held to strict grounding rules: query the store, retry until data is retrieved, never rely on memory, and never hallucinate.
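
The ranking logic the agent is asked to follow can be approximated in plain Python. In the workflow itself the counts come from an aggregate query against Weaviate and the reasoning is done by the agent, so the code below is only a sketch of the heuristic: `rank_trends` and the record field names are hypothetical, and using mean impact as a tie-breaker is an assumption.

```python
from collections import defaultdict


def rank_trends(papers: list[dict], top_n: int = 3, cite: int = 2) -> list[dict]:
    """Group papers by primary topic and rank topics by volume, then impact."""
    by_topic = defaultdict(list)
    for paper in papers:
        by_topic[paper["primary_category"]].append(paper)

    trends = []
    for topic, group in by_topic.items():
        # Highest-impact papers first, so the top ones become the citations.
        group.sort(key=lambda p: p["impact_score"], reverse=True)
        trends.append({
            "topic": topic,
            "paper_count": len(group),
            "mean_impact": sum(p["impact_score"] for p in group) / len(group),
            "representative_ids": [p["id"] for p in group[:cite]],
        })

    # Publication volume is the main signal; mean impact breaks ties.
    trends.sort(key=lambda t: (t["paper_count"], t["mean_impact"]), reverse=True)
    return trends[:top_n]
```

Each returned entry carries the topic, its paper count, and one or two representative paper IDs, which matches the shape of evidence the digest cites for every trend.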

The agent outputs a clean, structured JSON object containing an email subject line and body. A short post-processing chain converts the body from Markdown to HTML (restoring paragraph breaks and auto-linking arXiv URLs so every cited paper is clickable), and the finished digest is sent via SMTP email.
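
The post-processing described above can be sketched as follows. It is not a full Markdown converter: it only restores paragraph breaks and links new-style arXiv abstract URLs. The JSON keys `subject` and `body` and both function names are assumptions made for illustration.

```python
import html
import json
import re

# Matches new-style arXiv abstract URLs such as https://arxiv.org/abs/2501.00001v2
ARXIV_URL = re.compile(r"https?://arxiv\.org/abs/\d{4}\.\d{4,5}(?:v\d+)?")


def body_to_html(body: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> tags and link arXiv URLs."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", body) if p.strip()]
    rendered = []
    for paragraph in paragraphs:
        # Escape first so text from the agent cannot inject markup.
        escaped = html.escape(paragraph).replace("\n", "<br>")
        linked = ARXIV_URL.sub(
            lambda m: f'<a href="{m.group(0)}">{m.group(0)}</a>', escaped
        )
        rendered.append(f"<p>{linked}</p>")
    return "\n".join(rendered)


def render_digest(agent_output: str) -> tuple[str, str]:
    """Parse the agent's JSON and return (subject, html_body)."""
    data = json.loads(agent_output)
    return data["subject"], body_to_html(data["body"])
```

The resulting subject and HTML body are what an SMTP step would then send.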

Why this design works. The two-stage split is what makes it both reliable and genuinely insightful. Part 1 does the heavy, deterministic work of building a trustworthy, enriched knowledge base — and verifies it before proceeding — while Part 2 lets an agent reason over that structured data rather than over a noisy raw feed. Using publication counts plus impact scores as trend signals means the summary reflects what the field is actually focused on, not just whatever the agent saw first. Because arXiv's API is free and the topic taxonomy, paper limit, and embedding model are all configurable, the system is cheap to run and easy to retarget at other research categories. The result is a self-maintaining research scout that quietly does a week's worth of reading and lands a concise, cited, easy-to-read digest in your inbox.

LLM-enriched research taxonomy
Agentic RAG over a Weaviate vector store
Verified ingest before analysis