Git is usually the first place developers look when they need to understand a system. That makes sense. The code is there. The commit history is there. The pull requests are there. If you are lucky, the README is not lying too badly.
But Git is only one layer of the developer experience.
The real answer to "how does this service work?" may be spread across a service catalog, TechDocs, Terraform, Kubernetes manifests, CI runs, deployment events, incident tickets, on-call schedules, Slack threads, dashboards, and a handful of tribal conventions that have somehow survived three reorganizations.
Internal developer portals are supposed to pull that mess together. Backstage, OpsLevel, Port, homegrown service catalogs, and platform dashboards all try to answer the same basic question: "Where is the information a developer needs to ship and operate this thing?"
LLMs can help, but only if we use them as a language layer over real metadata. If the portal becomes a chatbot that guesses from stale docs, we have not solved developer productivity. We have built a more confident version of search.
The Portal Is Not The Product
A common platform engineering mistake is treating the portal itself as the product. The real product is the developer workflow the portal improves.
Developers want to answer questions like:
- Who owns this service?
- Where is the runbook?
- What changed before this incident?
- Which repo contains the deployment config?
- What dashboard should I check first?
- What API version is this consumer using?
- Is this service production, experimental, deprecated, or abandoned?
Those are workflow questions. Some require search. Some require structured metadata. Some require live operational data. Some require judgment.
An LLM-powered portal should make those questions easier to answer. It should not be a novelty interface that sits beside the same stale catalog.
Start With Metadata Quality
LLMs expose metadata quality problems quickly.
If your service catalog has missing owners, stale repository links, inconsistent names, and runbooks last updated before half the team joined, an AI assistant will not fix that. It will either refuse to answer, which is honest but disappointing, or it will invent the missing connective tissue, which is worse.
Before building the assistant, inspect the metadata:
- Are service owners current?
- Are lifecycle states meaningful?
- Are repository annotations consistent?
- Are docs linked from the catalog?
- Are runbooks discoverable?
- Are deployment systems connected to services?
- Are incident records tied back to services?
- Are API relationships represented anywhere?
The first win may not be the LLM at all. It may be cleaning up ownership and linking the catalog to the systems people already use.
That is not glamorous work. It is also exactly the work that makes the AI layer useful.
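The checklist above can be turned into a small audit script. This is a minimal sketch, assuming catalog entities have been exported as plain dictionaries. The field names (`owner`, `lifecycle`, `repo_url`, `runbook_url`), the allowed lifecycle values, and the sample data are all hypothetical; map them to whatever your catalog actually stores.

```python
from dataclasses import dataclass, field

# Hypothetical lifecycle vocabulary; replace with the values your catalog uses.
ALLOWED_LIFECYCLES = {"production", "experimental", "deprecated"}

# Hypothetical fields that every service entry is expected to carry.
REQUIRED_FIELDS = ("owner", "repo_url", "runbook_url")


@dataclass
class AuditResult:
    name: str
    problems: list[str] = field(default_factory=list)


def audit_entity(entity: dict) -> AuditResult:
    """Report missing or meaningless metadata for one catalog entity."""
    result = AuditResult(name=entity.get("name", "<unnamed>"))
    for key in REQUIRED_FIELDS:
        if not entity.get(key):
            result.problems.append(f"missing {key}")
    if entity.get("lifecycle") not in ALLOWED_LIFECYCLES:
        result.problems.append("lifecycle missing or not in allowed set")
    return result


def audit_catalog(entities: list[dict]) -> list[AuditResult]:
    """Return only the entities that have at least one problem."""
    results = [audit_entity(e) for e in entities]
    return [r for r in results if r.problems]


if __name__ == "__main__":
    catalog = [
        {"name": "checkout-service", "owner": "team-payments",
         "lifecycle": "production", "repo_url": "example/checkout-service",
         "runbook_url": "docs/runbooks/checkout.md"},
        {"name": "legacy-batch", "owner": "", "lifecycle": "unknown"},
    ]
    for r in audit_catalog(catalog):
        print(r.name, r.problems)
```

A report like this will not tell you whether an owner is current, only whether one is recorded. It is still a cheap way to size the cleanup before any model is involved.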
Use The Right Retrieval Mode
Do not shove everything into a vector database and call it architecture.
Different developer questions need different retrieval strategies:
| Question | Better Source |
|---|---|
| Who owns this service? | Service catalog |
| Where is the runbook? | Catalog link or docs search |
| What changed recently? | Deployment system, GitHub, CI/CD |
| Who is on call? | PagerDuty, Opsgenie, or calendar system |
| What does this runbook say? | Vector search over docs |
| Which services depend on this API? | Catalog relationships or dependency graph |
| Why did this incident happen? | Incident review plus deployment history |
Vector search is useful for fuzzy, long-form content: runbooks, READMEs, architecture decision records, incident summaries, and docs. Structured APIs are better for exact facts. Live APIs are better for volatile state.
The right architecture combines them.
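One way to picture that combination is a small router that picks a retrieval mode per question. The sketch below uses naive keyword matching purely for illustration; the hint lists and the `route` function are hypothetical, and a real router might use a classifier or model tool-calling instead.

```python
from enum import Enum


class RetrievalMode(Enum):
    STRUCTURED = "structured"  # exact facts from the catalog API
    LIVE = "live"              # volatile state from operational systems
    SEMANTIC = "semantic"      # fuzzy search over long-form docs


# Hypothetical keyword hints, checked in order. Substring matching is a
# stand-in for whatever intent detection you actually use.
ROUTES = [
    (("on call", "on-call", "deployed", "changed recently"), RetrievalMode.LIVE),
    (("who owns", "owner", "depend", "lifecycle"), RetrievalMode.STRUCTURED),
]


def route(question: str) -> RetrievalMode:
    """Pick a retrieval mode; fall back to semantic search over docs."""
    q = question.lower()
    for hints, mode in ROUTES:
        if any(h in q for h in hints):
            return mode
    return RetrievalMode.SEMANTIC
```

The point is the shape, not the matching logic: exact and volatile questions get routed to systems of record, and only the leftovers go to vector search.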
A Practical Architecture
An LLM-powered internal developer portal usually needs five pieces:
- Portal UI: Backstage plugin, Port page, OpsLevel extension, Slack command, or internal web UI.
- Query backend: receives the question, user identity, and current context.
- Retrieval layer: searches catalog data, docs, vector stores, and live operational APIs.
- Answer layer: builds a constrained prompt, calls the model, and formats the answer.
- Evaluation and observability: logs retrieval inputs, answer quality, latency, confidence, source usage, and user feedback.
Keep the UI thin. The portal should not assemble prompts or decide which systems to query. That belongs in a backend service where you can test it, secure it, and change it without rebuilding every front end.
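A rough sketch of what that backend entry point might look like. The `Query`, `Answer`, and `handle_query` names are hypothetical, and the retriever and model are passed in as plain callables so the pipeline can be tested without a real LLM or real data sources.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Query:
    question: str
    user_id: str
    entity_ref: Optional[str] = None  # current catalog entity, if any


@dataclass
class Answer:
    text: str
    sources: list[str]


# Retrieval returns (source_id, snippet) pairs. The model is any function
# that maps a prompt string to a completion string.
Retriever = Callable[[Query], list[tuple[str, str]]]
Model = Callable[[str], str]


def handle_query(query: Query, retrieve: Retriever, model: Model) -> Answer:
    """Backend entry point: retrieve, build the prompt, call the model."""
    hits = retrieve(query)
    if not hits:
        # Nothing retrieved: do not ask the model to guess.
        return Answer(text="I do not know.", sources=[])
    context = "\n".join(f"[{src}] {snippet}" for src, snippet in hits)
    prompt = f"Context:\n{context}\n\nQuestion: {query.question}"
    return Answer(text=model(prompt), sources=[src for src, _ in hits])
```

Every front end (portal plugin, Slack command, CLI) would send a `Query` and render an `Answer`. None of them needs to know how prompts are built.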
Context Beats Chat
The most useful AI portal experiences are context-aware.
If a developer is already on the catalog page for checkout-service, the assistant should know that. The question "who owns this?" is trivial when the entity reference is known. The question "what changed recently?" can start from the service's repository, deployment annotations, and owning team.
That is better than a global chatbot that treats every question as a cold start.
Useful context includes:
- Current catalog entity.
- User identity and permissions.
- Current page or route.
- Linked repository.
- Owning team.
- Related APIs and resources.
- Recent deployment or incident links.
The assistant should use the portal context as a retrieval filter, not just as decorative prompt text.
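As a minimal illustration of context as a filter, assume each indexed chunk carries the entity reference it belongs to. The `Chunk` type and `scope_to_entity` function are hypothetical; in a real vector store this would more likely be a metadata filter on the query itself than a Python list comprehension.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Chunk:
    entity_ref: str  # catalog entity the chunk belongs to
    text: str


def scope_to_entity(chunks: list[Chunk], entity_ref: Optional[str]) -> list[Chunk]:
    """Narrow the candidates to the current entity before ranking.

    With no entity in context, the search stays global.
    """
    if entity_ref is None:
        return chunks
    return [c for c in chunks if c.entity_ref == entity_ref]
```

Scoping before ranking means a question asked on the checkout-service page cannot be answered from another service's runbook just because the wording was similar.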
Answers Need Sources
If an internal assistant answers an operational question without sources, the answer is not done.
A good response should include:
- The direct answer.
- The source document, entity, API, or event.
- Freshness, when relevant.
- Confidence level.
- A suggested next step.
For example:
checkout-service is owned by team-payments.
Sources:
- Backstage catalog entity: component:default/checkout-service
- GitHub repository annotation: example/checkout-service
- PagerDuty annotation: payments-primary
Confidence: high
Next step: open the service runbook.
That answer is reviewable. A developer can click through and verify it.
This also protects the portal team. When the assistant gives a bad answer, you need to know whether the model reasoned poorly, retrieval returned bad context, or the underlying metadata was wrong.
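One way to enforce this is to make sources part of the answer type rather than free text the model may or may not include. The sketch below is an assumption about how you might model it; `Source` and `PortalAnswer` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Source:
    kind: str  # e.g. "Backstage catalog entity"
    ref: str   # e.g. "component:default/checkout-service"


@dataclass
class PortalAnswer:
    text: str
    sources: list[Source] = field(default_factory=list)
    confidence: str = "low"
    next_step: Optional[str] = None

    def render(self) -> str:
        """Format the answer; an unsourced answer is rendered as a refusal."""
        if not self.sources:
            return "I do not know. No source was found for this question."
        lines = [self.text, "Sources:"]
        lines += [f"- {s.kind}: {s.ref}" for s in self.sources]
        lines.append(f"Confidence: {self.confidence}")
        if self.next_step:
            lines.append(f"Next step: {self.next_step}")
        return "\n".join(lines)
```

Because the sources are structured, the same object can be logged for debugging: when an answer is wrong, you can see which source it cited and check that source directly.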
Permissions Are The Hard Part
Internal developer portals often sit near sensitive information:
- Private repositories.
- Incident timelines.
- Deployment history.
- Architecture docs.
- Ownership and escalation paths.
- Security runbooks.
- Infrastructure paths.
If your assistant indexes all of that and ignores permissions, it becomes a leakage system.
The permission model needs to exist at retrieval time, not just in the UI. Do not retrieve documents the user cannot access and then hope the model will avoid mentioning them. Filter first. Prompt second.
Practical requirements:
- Store source identifiers with indexed chunks.
- Preserve ACL or ownership metadata.
- Filter retrieval by user permission.
- Avoid indexing secrets and raw sensitive logs.
- Log sensitive queries carefully: the query text itself can contain sensitive details, so restrict who can read those logs.
- Respect upstream system authorization.
If a developer cannot open the source, the assistant should not summarize the source.
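"Filter first, prompt second" can be sketched in a few lines. This assumes each indexed chunk stores a source identifier and the set of groups allowed to read the source; the `IndexedChunk` type and group names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedChunk:
    source_id: str                  # identifier of the upstream document
    allowed_groups: frozenset[str]  # ACL copied from the source at index time
    text: str


def filter_by_permission(
    chunks: list[IndexedChunk], user_groups: set[str]
) -> list[IndexedChunk]:
    """Keep only chunks whose ACL shares at least one group with the user."""
    return [c for c in chunks if c.allowed_groups & user_groups]


def build_context(chunks: list[IndexedChunk], user_groups: set[str]) -> str:
    """Filter first, then build prompt context from what remains."""
    visible = filter_by_permission(chunks, user_groups)
    return "\n".join(f"[{c.source_id}] {c.text}" for c in visible)
```

An ACL copied at index time can drift from the source system. For sensitive sources, it may be safer to re-check access against the upstream system at query time.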
Freshness Is A Product Feature
Developer metadata has different shelf lives.
A README might be useful for months. A runbook might be useful until the next architecture change. Current on-call is useful only if it is current. Deployment state may be stale after an hour. Incident context can change while the incident is active.
Use the right source for the freshness requirement:
- Catalog facts can be fetched from the catalog API.
- Docs can be indexed in CI or on a schedule.
- Current on-call should come from the on-call system.
- Recent deployments should come from CI/CD or deployment tooling.
- Incident state should come from the incident system.
The answer should expose freshness when it matters:
Deployment data fetched from Argo CD at 2026-06-09 18:42 UTC.
That is not busywork. It lets the reader decide how much to trust the answer.
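A small sketch of how a backend might attach that freshness line and flag stale data. The shelf-life values in `MAX_AGE` are invented for illustration, and the `Fetched` type is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical shelf lives per kind of data; tune these to your environment.
MAX_AGE = {
    "docs": timedelta(days=90),
    "deployment": timedelta(hours=1),
    "on_call": timedelta(minutes=5),
}


@dataclass
class Fetched:
    source: str           # e.g. "Argo CD"
    kind: str             # key into MAX_AGE
    fetched_at: datetime  # timezone-aware timestamp, assumed to be UTC


def utc_now() -> datetime:
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def is_stale(item: Fetched, now: datetime) -> bool:
    """True when the data is older than the shelf life for its kind."""
    return now - item.fetched_at > MAX_AGE[item.kind]


def freshness_line(item: Fetched, now: datetime) -> str:
    """Render a freshness note, flagging stale data explicitly."""
    stamp = item.fetched_at.strftime("%Y-%m-%d %H:%M UTC")
    label = item.kind.replace("_", " ").capitalize()
    line = f"{label} data fetched from {item.source} at {stamp}."
    if is_stale(item, now):
        line += " This may be stale."
    return line
```

Passing `now` in explicitly keeps the functions easy to test; in production you would call `freshness_line(item, utc_now())`.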
Prompting Should Be Constrained
An internal developer assistant should not be creative with facts.
The prompt should say things like:
Answer only from retrieved context and tool results.
If the answer is missing, say you do not know.
Do not invent owners, repositories, runbooks, deployment times, on-call
rotations, dashboards, or infrastructure paths.
Always include sources.
This is not enough by itself, but it is still worth doing. A vague prompt invites vague behavior. A constrained prompt makes the expected failure mode clear.
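Here is a minimal sketch of a prompt builder that wraps those rules around retrieved context. The `build_prompt` function and the `(source_id, text)` pair format are assumptions for illustration, not a prescribed template.

```python
SYSTEM_RULES = """\
Answer only from retrieved context and tool results.
If the answer is missing, say you do not know.
Do not invent owners, repositories, runbooks, deployment times, on-call
rotations, dashboards, or infrastructure paths.
Always include sources."""


def build_prompt(question: str, context: list[tuple[str, str]]) -> str:
    """Assemble rules, labeled context, and the question into one prompt.

    `context` holds (source_id, text) pairs from the retrieval layer.
    """
    if context:
        body = "\n".join(f"[{src}] {text}" for src, text in context)
    else:
        # Make the absence of context explicit instead of leaving a gap.
        body = "(no context retrieved)"
    return f"{SYSTEM_RULES}\n\nContext:\n{body}\n\nQuestion: {question}"
```

Labeling each context line with its source identifier gives the model something concrete to cite, and gives you something to check its citations against.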
For deeper prompt guidance, see How to Write Secure Prompts for AI-Driven Developer Workflows.
Evaluation Comes Before Rollout
The portal team should treat the assistant like developer tooling, not like a content experiment.
Before launch, build a small evaluation set:
- Known ownership questions.
- Known runbook lookup questions.
- Questions that should require live data.
- Ambiguous service names.
- Fake services that should return "I do not know."
- Permission-bound documents.
For each question, define:
- Expected answer.
- Required source.
- Allowed confidence.
- Whether refusal is correct.
Run this set whenever you change prompts, retrieval logic, embeddings, models, or data sources. If the assistant gets smoother but less accurate, that is a regression.
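A bare-bones harness for that evaluation set might look like the following. The `EvalCase` and `AssistantReply` types are hypothetical, and substring matching is a deliberately crude stand-in for whatever answer grading you settle on.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EvalCase:
    question: str
    expected_substring: Optional[str]  # None when refusal is correct
    required_source: Optional[str]     # source id that must be cited


@dataclass
class AssistantReply:
    text: str
    sources: list[str]


REFUSAL = "I do not know"


def passes(case: EvalCase, reply: AssistantReply) -> bool:
    """Check one reply against its expected answer and required source."""
    if case.expected_substring is None:
        return REFUSAL.lower() in reply.text.lower()
    if case.expected_substring not in reply.text:
        return False
    return case.required_source is None or case.required_source in reply.sources


def run_eval(cases: list[EvalCase], ask: Callable[[str], AssistantReply]) -> float:
    """Return the pass rate so regressions show up as a number."""
    if not cases:
        return 0.0
    return sum(passes(c, ask(c.question)) for c in cases) / len(cases)
```

Wire `run_eval` into CI with a threshold, and a prompt or model change that lowers the pass rate fails the build instead of reaching developers.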
This is also where feedback loops matter. Add "helpful / not helpful" feedback, but do not rely on that alone. Developers are busy. Silent failure is common.
Where Backstage Fits
Backstage is a natural place to start because it already has the right shape: catalog entities, TechDocs, search, plugins, ownership, and relationships. A Backstage AI assistant can start with entity-aware Q&A and expand from there.
If you are specifically working in Backstage, read Bringing AI to Backstage: Building an LLM-Powered Developer Portal. That article goes deeper on the Backstage-specific architecture.
But the broader pattern applies beyond Backstage:
- OpsLevel can provide service maturity and ownership data.
- Port can model developer workflows and scorecards.
- A homegrown portal can expose internal metadata directly.
- Slack can be a lightweight query interface.
- A CLI can support engineers who live in terminals.
The portal surface matters less than the metadata quality, permissions, retrieval strategy, and evaluation discipline.
Build vs. Buy
The build-versus-buy decision depends on how unique your engineering environment is.
Buy or extend a product when:
- Your needs are mostly service catalog, docs, and basic ownership lookup.
- Your data sources are standard and well supported.
- Your platform team is small.
- You need something useful quickly.
- You can accept vendor constraints around models, indexing, and permissions.
Build when:
- You have unusual internal systems.
- Permission boundaries are complex.
- Developer workflows are tightly integrated with custom tooling.
- You need control over retrieval, logging, evaluation, and prompts.
- Platform engineering can support the system long term.
Do not build because AI demos are fun. Build because the workflow is important enough to own.
A Sensible Rollout
I would roll this out in phases:
- Read-only service Q&A: ownership, docs, links, lifecycle, related systems.
- Docs and runbook Q&A: semantic retrieval with citations.
- Operational lookup: current deployments, on-call, dashboards, incidents.
- Workflow suggestions: "open runbook," "file catalog fix," "create ticket."
- Carefully governed actions: only after trust, permissions, and audit logs are boring.
Start where the blast radius is low. Read-only answers are valuable and much easier to govern than write actions.
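If you want the phases to be more than a slide, one option is to gate capabilities on the current rollout phase in code. This is only a sketch; the `Phase` enum and `is_enabled` function are hypothetical names that mirror the list above.

```python
from enum import IntEnum


class Phase(IntEnum):
    SERVICE_QA = 1
    DOCS_QA = 2
    OPERATIONAL_LOOKUP = 3
    WORKFLOW_SUGGESTIONS = 4
    GOVERNED_ACTIONS = 5


def is_enabled(capability: Phase, current_phase: Phase) -> bool:
    """A capability is available once the rollout has reached its phase."""
    return capability <= current_phase
```

With the current phase stored in configuration, enabling the next phase is a reviewed config change rather than a code deploy, and write actions stay off until someone deliberately turns them on.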
What Success Looks Like
A good LLM-powered developer portal does not make engineers say, "Wow, AI."
It makes them say:
- "I found the owner without asking Slack."
- "I got to the right runbook faster."
- "The portal told me the data was stale."
- "The assistant linked the source, so I trusted it."
- "The platform team found broken catalog metadata because the assistant could not answer basic questions."
That last one is underrated. A good assistant will expose bad metadata. That is not failure. That is a roadmap.
Conclusion
LLMs can make internal developer portals more useful, but only when they are grounded in real engineering metadata and constrained by the same operational discipline we expect from other platform tools.
Git gives you code and history. A developer portal should connect that code to ownership, docs, infrastructure, deployments, incidents, and support paths. An LLM can make that connected metadata conversational, but it cannot make stale, missing, or unauthorized data safe by wishing.
Start with the questions developers already ask. Clean up the metadata. Use structured APIs for facts, semantic retrieval for docs, live APIs for volatile state, and sources for every answer. Then evaluate the system like something people will depend on.
Because if it works, they will.
For more practical engineering and developer tooling notes, visit Slaptijack.