Bringing AI to Backstage: Building an LLM-Powered Developer Portal

Posted on in Technology Management / Leadership

Backstage is already where many platform teams want developers to go for service ownership, docs, APIs, runbooks, and operational metadata. The problem is that developers do not always want to navigate a portal. Sometimes they just want to ask a question:

  • "Who owns checkout-service?"
  • "Where is the runbook for restarting Kafka?"
  • "What changed before last night's payments incident?"
  • "Which services still depend on the old Redis cluster?"
  • "Where is the Terraform for staging RDS?"

That is the useful version of "AI in Backstage." Not a chatbot bolted onto the corner of the page. Not a demo that summarizes whatever text happens to be near the cursor. A useful Backstage AI assistant should sit on top of the catalog, TechDocs, search, deployment metadata, and ownership model that Backstage already tries to organize.

The hard part is not calling an LLM. The hard part is grounding the answer in fresh, permission-aware engineering metadata and showing the developer where the answer came from.

Start With The Backstage Data Model

Backstage is valuable because it gives you a structured model for software ownership. The Software Catalog can represent systems, components, APIs, resources, users, groups, and relationships. The catalog backend exposes a JSON REST API, and catalog entity descriptor files are YAML but map to the same shape when returned through the API.

That matters for AI integration because you should not treat Backstage like a pile of pages to scrape. Treat it like a structured metadata system.

A typical Component entity might include:

apiVersion: backstage.io/v1alpha1
kind: Component
metadata:
  name: checkout-service
  description: Handles checkout and payment authorization.
  annotations:
    github.com/project-slug: example/checkout-service
    pagerduty.com/service-id: P123ABC
spec:
  type: service
  lifecycle: production
  owner: team-payments
  system: commerce

That gives you several useful retrieval hooks:

  • Entity name.
  • Owner.
  • System.
  • Lifecycle.
  • Repository annotation.
  • PagerDuty annotation.
  • Description.
  • Entity relationships.

The LLM should not invent this data. It should retrieve it, summarize it, and cite it.

What The Assistant Should Answer

Do not start with "chat with the portal." That is too vague.

Start with specific developer questions:

  • Ownership: "Who owns this service?"
  • Docs: "Where is the runbook?"
  • Deployment: "What changed recently?"
  • Infrastructure: "Where is the Terraform?"
  • Dependencies: "What depends on this API?"
  • Operations: "Who is on call?"
  • Discovery: "Which services are related to checkout?"

These questions naturally map to different data sources. Some are catalog questions. Some are search questions. Some require API calls to GitHub, Argo CD, PagerDuty, CI, or incident tooling. Some should not go through vector search at all.

That is an important design point. A Backstage AI assistant should use retrieval and tools, not just embeddings.

Reference Architecture

I would split the system into five layers:

  1. Backstage UI plugin: chat or query interface inside the portal.
  2. AI backend service: handles prompts, retrieval, authorization, and model calls.
  3. Metadata connectors: catalog, TechDocs, search, deployment systems, incident tools, GitHub, and on-call systems.
  4. Retrieval stores: vector index for docs and fuzzy search, plus structured stores for exact facts.
  5. Observability and evaluation: logs, traces, feedback, test questions, and answer-quality checks.

This separation keeps the Backstage plugin thin. That is usually the right instinct. The UI should not know how to assemble prompts, manage embeddings, apply permissions, or decide whether a deployment answer came from Argo CD or GitHub Actions.

Use Backstage Search Before Inventing A New Search System

Backstage already has a Search feature. It integrates with the Software Catalog and TechDocs, and it is meant to provide extensible search across the Backstage ecosystem.

That does not make it a complete LLM retrieval system, but it is a good starting point. If Backstage Search can already find a catalog entity or TechDocs page, your AI layer should consider using those search results before duplicating the entire indexing pipeline.

The practical architecture is often hybrid:

  • Use Backstage Catalog APIs for exact entity facts.
  • Use Backstage Search for existing portal search results.
  • Use a vector index for semantic retrieval over long docs, runbooks, and postmortems.
  • Use live API calls for volatile state such as deployment status or current on-call.

This is less elegant than "put everything in a vector database," but it is much more likely to be correct.

Index The Right Things

Not every piece of Backstage data belongs in a vector store.

Good vector candidates:

  • TechDocs pages.
  • Runbooks.
  • Service READMEs.
  • Architecture decision records.
  • Incident summaries.
  • Operational guides.
  • Human-readable catalog descriptions.

Poor vector candidates:

  • Current on-call.
  • Current deployment state.
  • Secret-bearing logs.
  • Exact dependency graph queries.
  • Access-controlled documents without permission metadata.
  • Anything that must be correct to the minute.

For exact facts, use structured APIs. For fuzzy discovery, use semantic search. For answers that combine both, retrieve from both and make the answer show its sources.

Extracting Catalog Context

The catalog API is the most obvious starting point. A simple prototype can pull entities from the catalog backend:

curl http://localhost:7007/api/catalog/entities | jq

For each entity, build an internal representation that preserves both readable text and structured metadata:

{
  "kind": "Component",
  "name": "checkout-service",
  "owner": "team-payments",
  "system": "commerce",
  "lifecycle": "production",
  "repo": "example/checkout-service",
  "pagerduty": "P123ABC",
  "description": "Handles checkout and payment authorization.",
  "source": "backstage-catalog"
}

The readable version is useful for embeddings. The structured fields are useful for citations, permissions, filters, and exact answers.

Keep Prompting Boring

The prompt should make the assistant less creative, not more.

Example:

You are an internal developer portal assistant.

Answer using only the provided context and tool results.
If the answer is not present, say that you do not know.
Never invent owners, repositories, deployment times, on-call rotations,
infrastructure paths, or runbook URLs.

Return:
- answer
- confidence: high | medium | low
- sources
- suggested next step

This is not glamorous. It is the point.

For more detail on prompt boundaries, see How to Write Secure Prompts for AI-Driven Developer Workflows.

Build The Backstage Plugin As A Thin Client

Backstage frontend plugins can provide the UI for the assistant. The plugin should send the developer's question and current context to an internal backend:

  • Current entity reference, if the developer is on a service page.
  • User identity or token context.
  • Question text.
  • Optional conversation ID.

The backend should return:

  • Answer.
  • Source links.
  • Confidence.
  • Follow-up actions.
  • Error or "not enough information" state.

The plugin should not hide uncertainty. If the assistant only found a runbook from 2023 or a catalog entity with no owner, show that. A polished wrong answer is worse than an honest incomplete one.

Entity-Aware Questions Are The First Win

The easiest useful UI is not a global chatbot. It is an entity-aware assistant on catalog pages.

If the developer is looking at checkout-service, the assistant already knows:

  • The entity ref.
  • The owner.
  • The system.
  • The annotations.
  • The TechDocs link.
  • The related APIs and resources.

That context makes questions better:

What changed recently?
Where is the runbook?
Who is on call?
What dashboards should I check?
Where is the deployment config?

Starting on entity pages also reduces ambiguity. "Who owns this?" is answerable when "this" is a catalog entity. In a global search box, the assistant has to guess.

Permissions Are Not Optional

This is where many prototypes get dangerous.

Backstage often centralizes metadata that points at private systems: repos, deployment records, incidents, runbooks, dashboards, on-call rotations, and internal docs. An AI assistant can accidentally become a permission bypass if you index everything into one store and answer every user from the same context.

At minimum:

  • Store source identifiers and permission metadata with indexed documents.
  • Filter retrieval results based on the requesting user.
  • Avoid indexing secrets and sensitive logs.
  • Do not leak private document snippets through summaries.
  • Keep audit logs for sensitive queries.
  • Respect the access model of upstream systems.

If a user cannot open the source document, the assistant should not summarize it for them.

Freshness Matters More Than Embedding Cleverness

Embedding stale data beautifully does not make it true.

Backstage catalog data may be stable enough to index periodically. TechDocs may be fine on a CI-driven refresh. Deployment status, incident state, and on-call rotation should usually be fetched live.

Think about freshness by data type:

Data Type Suggested Approach
Catalog ownership Catalog API plus periodic indexing
TechDocs/runbooks Search/vector index refreshed by CI or schedule
Current on-call Live PagerDuty/Opsgenie API call
Recent deployment Live CI/CD or deployment API call
Incident status Live incident-management API call
Architecture docs Vector index with source links

The answer should also expose freshness:

Source: Backstage catalog, fetched 2026-06-09 14:05 UTC

That kind of detail is not noise when the answer may affect production.

Evaluation: Test The Assistant Like A Developer Tool

If you put this in front of engineers, they will trust it faster than they should. That means you need evaluation before launch.

Create a small question set:

  • "Who owns checkout-service?"
  • "Where is checkout-service's runbook?"
  • "Which service owns the payments API?"
  • "What changed before incident INC-123?"
  • "Who owns a fake service that does not exist?"

For each question, record:

  • Expected answer.
  • Required source.
  • Whether live data is required.
  • Whether the assistant should refuse or say it does not know.

Run this set whenever you change the prompt, retrieval settings, model, or data sources. If the assistant becomes more fluent and less accurate, roll it back.

For a more implementation-oriented walkthrough, see Building a Full-Stack LangChain Prototype for Natural Language Developer Queries.

Build vs. Buy

You do not have to build all of this yourself.

Commercial developer portal vendors and AI documentation tools are moving in this direction. Backstage service providers may also offer hosted features that solve parts of the problem. The build-versus-buy question depends on where your metadata lives and how custom your workflow is.

Build when:

  • Backstage is already central to your platform strategy.
  • You have custom internal systems the assistant must understand.
  • Permission boundaries are complicated.
  • You need tight integration with internal workflows.
  • You have platform engineering capacity to maintain it.

Buy when:

  • Your needs are mostly documentation search and summaries.
  • You do not have the team to maintain retrieval infrastructure.
  • Your metadata is already in a supported SaaS ecosystem.
  • You need something useful quickly and can live with vendor constraints.

The wrong answer is building a fragile prototype and pretending it is a platform.

A Practical Rollout Plan

I would roll this out in phases:

  1. Entity-page assistant for ownership, docs, and related links.
  2. TechDocs Q&A with citations and explicit stale-doc warnings.
  3. Live operational lookups for deployment and on-call.
  4. Slack or CLI integration backed by the same service.
  5. Action suggestions such as "open runbook" or "file catalog fix," not autonomous production changes.

Do not start with write actions. Reading and explaining metadata is already a large enough trust problem. Let the system earn confidence before it can mutate anything.

Conclusion

Bringing AI to Backstage is not about making the portal feel trendy. It is about reducing the friction between a developer's question and the metadata your organization already has.

The useful architecture is grounded: catalog APIs for exact facts, TechDocs and Search for discoverability, vector retrieval for long-form docs, live APIs for volatile state, and a thin Backstage plugin that makes the workflow feel native.

If the assistant can answer "who owns this?", "where is the runbook?", and "what changed recently?" with sources and appropriate uncertainty, it will earn its place. If it guesses, hides stale context, or leaks information across permission boundaries, it will become another platform toy that engineers learn to ignore.

Start small. Keep sources visible. Make uncertainty acceptable. Treat the AI assistant like production developer tooling, because that is what it becomes the moment people depend on it.

For more practical engineering and developer tooling notes, visit Slaptijack.

Slaptijack's Koding Kraken