AI Engineer

Updated for 2026: AI Engineer interview questions and answers covering core skills, tools, and best practices for roles in the US, Europe & Canada.

20 Questions

hardai-engineer-rag

What is Retrieval-Augmented Generation (RAG) and how do you build it?

RAG combines retrieval (search) with generation (LLM) to ground answers in your data. Core steps: - Chunk documents and create embeddings - Store in a vector database - Retrieve top-k relevant chunks - Prompt the model with retrieved context Quality depends on chunking, retrieval, and evaluation—not just the LLM.

RAGLLMVector Search

mediumai-engineer-embeddings

What are embeddings and how do you use them for search and recommendations?

Embeddings are vector representations that capture semantic similarity. Use cases: - Semantic search - Clustering - Recommendations - Deduplication Key considerations: model choice, normalization, distance metric, and evaluation with real queries. Monitor drift and update embeddings when content changes.

EmbeddingsSearchAI

hardai-engineer-vector-database

How do vector databases work and what should you consider when choosing one?

Vector DBs store embeddings and support approximate nearest neighbor (ANN) search. Consider: - Index type and recall/latency trade-offs - Filtering + hybrid search (keyword + vector) - Update frequency and reindexing - Multi-tenant isolation and cost Choose based on workload: query volume, freshness needs, and filter complexity.

Vector DBSearchArchitecture

hardai-engineer-prompt-injection

What is prompt injection and how do you mitigate it in LLM applications?

Prompt injection is when untrusted input manipulates the model to ignore instructions or reveal secrets. Mitigations: - Treat all external text as untrusted - Separate system instructions from user content - Use allowlisted tools/actions - Output filtering + policy checks - Least-privilege tool permissions Test with red-team prompts and monitor for policy violations.

SecurityLLMSafety

hardai-engineer-evaluation

How do you evaluate LLM applications beyond simple accuracy?

LLM evaluation is multi-dimensional. Measure: - Factuality/grounding - Relevance and completeness - Toxicity/safety - Latency and cost - User satisfaction Use golden sets, human review, and automated checks. Track regressions when prompts/models change.

EvaluationLLMQuality

hardai-engineer-hallucinations

How do you reduce hallucinations in LLM-powered products?

Hallucinations happen when the model generates unsupported claims. Mitigations: - Use RAG with high-quality retrieval - Require citations from sources - Add refusal behavior when context is missing - Use constrained outputs (schemas) Also improve prompts and evaluate on failure cases. Never present answers as authoritative without grounding for high-stakes domains.

LLMRAGSafety

mediumai-engineer-structured-output

Why use structured outputs (JSON schemas) with LLMs and how do you implement them safely?

Structured outputs reduce parsing errors and enable reliable automation. Use: - JSON schema constraints - Post-parse validation - Retry-on-parse-failure with strict prompts Never let the model execute privileged actions directly—validate and authorize tool calls server-side.

LLMToolingReliability

mediumai-engineer-fine-tuning-vs-rag

Fine-tuning vs RAG: when should you use each for an AI product?

RAG is best for injecting up-to-date knowledge and citations. Fine-tuning is best for style, format, and domain behavior. Often you combine them: - Fine-tune for tone and instruction-following - RAG for factual, current content Choose based on latency, cost, update frequency, and evaluation results.

RAGFine-tuningLLM

hardai-engineer-safety-guardrails

What safety guardrails should AI engineers implement for user-facing assistants?

Guardrails reduce harmful outputs and unsafe actions. Include: - Content policy filters - Sensitive topic handling - Tool/action allowlists - Rate limiting and abuse detection - Logging + review workflows Design for least privilege and handle jailbreak attempts as a normal threat, not an edge case.

SafetySecurityLLM

mediumai-engineer-prompt-templates

How do you build maintainable prompt templates and avoid prompt spaghetti?

Treat prompts like code. Best practices: - Use versioned templates and small reusable components - Separate instructions, context, and output schema - Add tests with golden inputs - Track changes and regressions This makes prompt iterations auditable and reduces accidental behavior changes.

PromptingLLMEngineering

hardai-engineer-conversation-memory

How do you implement conversation memory without leaking sensitive data or growing costs?

Memory should be selective and privacy-safe. Approaches: - Summarize history - Store structured user preferences - Retrieve only relevant past context Avoid storing secrets, implement retention policies, and cap tokens. Use RAG-style retrieval for long-term memory instead of sending full history every time.

LLMPrivacyArchitecture

hardai-engineer-tool-calling

How do you design safe tool calling (function calling) in AI agents?

Tool calling must be constrained and authorized. Best practices: - Allowlist tools and validate arguments - Require confirmations for destructive actions - Enforce permissions server-side - Log tool calls for auditing Never let the model directly execute privileged actions without validation and policy checks.

AgentsSecurityLLM

mediumai-engineer-llm-caching

How do you cache LLM responses safely to reduce latency and cost?

Cache when outputs are deterministic enough. Techniques: - Cache embeddings and retrieval results - Cache prompt+context hashes - Use short TTLs for dynamic data Avoid caching sensitive content, and include model/version in cache keys to prevent mixing outputs across model changes.

PerformanceCostLLM

hardai-engineer-chunking-strategy

How do you choose chunking strategies for RAG (size, overlap, structure)?

Chunking quality strongly affects retrieval. Guidelines: - Chunk by structure (headings/sections) - Keep chunks small enough to be specific - Use overlap to preserve context - Store metadata (source, section) Evaluate retrieval with real queries and tune chunk size/overlap based on recall and answer quality.

RAGRetrievalLLM

mediumai-engineer-hybrid-search

What is hybrid search and when is it better than pure vector search?

Hybrid search combines keyword (BM25) and vector similarity. It’s better when: - Exact terms matter (IDs, error codes) - Queries are short or ambiguous - You need filtering and precision Hybrid approaches often outperform pure vector search for enterprise docs where terminology is important.

SearchVector SearchArchitecture

mediumai-engineer-dataset-curation

How do you curate datasets for evaluation and fine-tuning in AI products?

Dataset quality drives model behavior. Practices: - Define user intents and failure cases - Create balanced, labeled examples - Remove sensitive data - Version datasets and track provenance Use a golden set for regression testing and update it as product requirements evolve.

DataEvaluationLLM

hardai-engineer-privacy

How do you handle privacy and sensitive data in AI/LLM applications?

LLM apps can leak data if not designed carefully. Practices: - Minimize what you send to the model - Redact sensitive fields - Use retention controls - Apply access control and auditing For enterprise, consider on-prem/isolated deployments and strict data processing agreements.

PrivacySecurityLLM

mediumai-engineer-monitoring

What should you monitor in production LLM applications?

Monitor both system and quality signals. Track: - Latency and error rate - Cost per request - Safety policy violations - Retrieval quality (for RAG) - User feedback and escalations Use sampling for human review and track regressions when prompts/models change.

MonitoringLLMReliability

hardai-engineer-cost-optimization

How do you reduce LLM costs without harming quality?

Cost reduction is a system design problem. Levers: - Smaller/cheaper models for simple tasks - Caching and batching - Shorter prompts and better retrieval - Multi-model routing Measure quality with a golden set so you don’t optimize cost at the expense of user experience.

CostOptimizationLLM

hardai-engineer-model-routing

What is multi-model routing and how do you implement it?

Multi-model routing chooses different models based on task complexity. Examples: - Cheap model for classification/summaries - Strong model for reasoning - Fallback when confidence is low Implement with routing rules, confidence scoring, and evaluation. Always log routing decisions to debug failures and costs.

ArchitectureOptimizationLLM