About the Role
Mantis Analytics is an AI-powered threat intelligence and simulation platform that helps enterprises and public-sector organizations detect, analyze, and forecast real-world risks affecting operations and supply chains.
You’ll be in charge of the LLM application layer across our projects — the retrieval, agent design, and document-processing work that turns models into products. Today this includes a copilot (RAG + agentic + document processing), the agentic representation layer in our topic-modelling pipeline, and an agentic fallback for news scraping. Soon it will extend into geopolitical forecasting and a custom LLM analytics workspace.
You won’t be pre-training or fine-tuning models (deep ML and LLMOps are covered in-house), but you do need to understand how LLMs and embeddings work well enough to design systems around their real behavior, debug them, and make retrieval and agents genuinely reliable. This is a build-heavy role with direct mentorship on the model-internals side and room to grow fast.
Responsibilities
* Build and improve RAG systems: chunking, embedding, retrieval quality, re-ranking, context-window optimization, grounding, and hallucination reduction.
* Design agentic workflows: tool use, multi-step reasoning, retrieval-augmented agents, and agentic fallbacks for when classical methods fail (e.g. news scraping).
* Own document-processing pipelines: ingestion, parsing, structuring, and turning unstructured sources into queryable knowledge.
* Build the agentic representation layer for topic modelling: LLM-driven labelling and summarization over embedding-based clusters.
* Help stand up the geopolitical analytics assistant, a custom LLM workspace over a curated corpus.
* Integrate models via OpenAI-compatible APIs across hosted and self-hosted endpoints (with serving/ops support available in-house).
* Own retrieval and agent quality: measure grounding, tool-call correctness, and hallucination rates; debug failure modes (bad retrievals, tool-call errors, loops, mode collapse, format mismatches).
What You Need
* A real working understanding of how LLMs and embeddings work, enough to reason about why retrieval missed or why an agent failed rather than just retrying.
* Hands-on experience building something non-trivial with LLMs: a RAG system, an agent with tool use, or a document-processing pipeline.
* Strong Python; comfortable with async patterns.
* Practical grounding in retrieval and embeddings: vector search, similarity, chunking tradeoffs, metadata-based retrieval.
* A verification mindset: you check exact behavior and measure quality rather than eyeballing it.
Nice to Have
* Experience with agent frameworks (LangChain, LlamaIndex, or similar), MCP, or self-extending/skill-based agents.
* Prompt-engineering depth: structured (JSON) output, few-shot prompting, reasoning control.
* Familiarity with self-hosted serving (vLLM): useful but not required, since the ops layer is handled in-house.
* Exposure to clustering/topic modelling or working with embeddings at scale.
* Familiarity with LLM observability and cost-performance tradeoffs.
What You’ll Learn Here
Direct mentorship on model internals, post-training, and evaluation from someone deliberately working to close the gap between building with LLMs and understanding them. You’ll move from using LLMs to deeply understanding them, and grow into harder forecasting and analytics problems as those projects come online.