Module 3 — Glossary Recap, Mini Projects, Next Steps
Objective: consolidate fundamentals and ship outputs. This page provides a glossary recap (50 items), three hands-on mini projects using free tools, and a short execution plan.
Glossary Recap (Plain English)
Generative AI [Recap]
Models that create text, images, audio, or code.
- Use: drafting, design, assistants.
- Note: pair with retrieval and tools for accuracy.
Foundation Model [Recap]
Large pretrained model adapted to many tasks.
- Common: language, vision, multimodal.
- SLMs (small language models) can beat LLMs on focused tasks.
Transformer [LLM]
Architecture using attention blocks.
- Dominant for text and code.
- Efficient attention extends context affordably.
Token / Tokenization [LLM]
Small text chunks; pricing and latency are per token.
- Short prompts + smart retrieval = lower cost.
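A quick back-of-envelope sketch of token cost in Python. Both numbers are illustrative assumptions, not any provider's real pricing: a hypothetical $0.50 per million input tokens and a rough 4-characters-per-token heuristic for English text.
PRICE_PER_MILLION_TOKENS = 0.50  # hypothetical USD price, for illustration only

def estimate_cost(text: str) -> float:
    approx_tokens = len(text) / 4  # crude chars-per-token heuristic; real tokenizers vary
    return approx_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(estimate_cost("A" * 8000))  # ~2000 tokens -> ~$0.001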
Context Window [LLM]
Max input size an LLM can read at once.
- Longer isn’t always better; use reranked retrieval.
Prompt / System Prompt [LLM]
Instructions that steer behavior and tone.
- Keep concise; rely on tools for facts & math.
Temperature / Top-k / Top-p [LLM]
Controls randomness and candidate choices.
- Low = consistent; high = creative.
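To make the knobs concrete, here is a minimal nucleus (top-p) sampling sketch in plain Python. The toy logits are invented for illustration; a real model produces thousands of them per step.
import math, random

def sample_top_p(logits: dict[str, float], temperature: float = 0.8, top_p: float = 0.9) -> str:
    # Softmax with temperature scaling (lower temperature -> sharper distribution).
    scaled = {t: math.exp(l / temperature) for t, l in logits.items()}
    total = sum(scaled.values())
    probs = sorted(((t, v / total) for t, v in scaled.items()), key=lambda x: -x[1])
    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for token, p in probs:
        kept.append((token, p))
        cum += p
        if cum >= top_p:
            break
    tokens, weights = zip(*kept)
    return random.choices(tokens, weights=weights)[0]

print(sample_top_p({"the": 2.0, "a": 1.5, "zebra": -1.0}))  # "zebra" is almost never kept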
Function Calling (Tool Use) [Agentic]
LLM calls APIs (calc, DB, search) for facts/actions.
- Require schemas, timeouts, and logging.
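A hedged sketch of what "schemas, timeouts, and logging" can look like. The parameter spec follows the JSON-Schema style common to tool-calling APIs; run_tool is a hypothetical dispatcher, not any specific provider's API.
import json, logging, concurrent.futures

logging.basicConfig(level=logging.INFO)

# JSON-Schema-style parameter spec, similar to most tool-calling APIs.
CALCULATOR_TOOL = {
    "name": "calculator",
    "description": "Evaluate a basic arithmetic expression.",
    "parameters": {
        "type": "object",
        "properties": {"expression": {"type": "string"}},
        "required": ["expression"],
    },
}

def run_tool(tool: dict, args: dict, timeout_s: float = 5.0):
    # Check required arguments, log the call, and enforce a wall-clock timeout.
    missing = set(tool["parameters"]["required"]) - set(args)
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    logging.info("tool=%s args=%s", tool["name"], json.dumps(args))
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        # eval with empty builtins is for demo only; use a real parser in production.
        future = pool.submit(eval, args["expression"], {"__builtins__": {}})
        return future.result(timeout=timeout_s)

print(run_tool(CALCULATOR_TOOL, {"expression": "2 + 2 * 10"}))  # 22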
Agent [Agentic]
Loop: plan → call tools → check → continue.
- Cap steps and budget to avoid loops.
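A minimal sketch of that capped loop. plan_step, call_tool, and is_done are hypothetical callbacks you would supply, and the budget numbers are illustrative.
MAX_STEPS = 6        # hard cap on loop iterations
MAX_COST_USD = 0.05  # hard cap on spend per task (illustrative)

def run_agent(task: str, plan_step, call_tool, is_done) -> str:
    # Plan -> call tools -> check -> continue, with step and budget caps.
    spent, state = 0.0, task
    for step in range(MAX_STEPS):
        action, est_cost = plan_step(state)
        if spent + est_cost > MAX_COST_USD:
            return f"stopped: budget exceeded at step {step}"
        spent += est_cost
        state = call_tool(action, state)
        if is_done(state):
            return state
    return f"stopped: step cap ({MAX_STEPS}) reached"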
RAG (Retrieval-Augmented Generation) [Search]
Retrieve sources, then answer with citations.
- Reranker + good chunking drive quality.
Embedding [Search]
Numeric representation of meaning.
- Use for search, clustering, recommendations.
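Search and clustering over embeddings usually reduce to cosine similarity. A self-contained sketch with toy 4-dimensional vectors (real embeddings have hundreds of dimensions):
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # 1.0 = same direction (similar meaning), near 0.0 = unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

doc = [0.8, 0.1, 0.0, 0.3]     # toy document embedding
query = [0.7, 0.2, 0.1, 0.2]   # toy query embedding
print(round(cosine_similarity(doc, query), 3))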
Vector Database [Search]
Store and search embeddings efficiently.
- Pick for hybrid search, filters, ops fit.
Hybrid Search [Search]
Combine keyword + vector search.
- Baseline for robust retrieval systems.
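One common way to merge keyword and vector rankings is Reciprocal Rank Fusion (RRF). A minimal sketch; the document IDs are invented for illustration.
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    # Score each doc by the sum of 1/(k + rank) across all ranked lists.
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_hits = ["doc3", "doc1", "doc7"]  # BM25-style ranking
vector_hits = ["doc1", "doc9", "doc3"]   # embedding-similarity ranking
print(rrf([keyword_hits, vector_hits]))  # doc1 and doc3 rise to the top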
Reranker [Search]
Reorder retrieved chunks by true relevance.
- Can sharply raise answer accuracy on long documents.
Chunking Strategy [Search]
Split docs with headings and small overlaps.
- Bad chunking silently ruins RAG quality.
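A deliberately naive sketch of paragraph-based chunking with overlap; production chunkers also track headings and token counts, but the shape is the same.
def chunk_text(text: str, max_chars: int = 800, overlap: int = 100) -> list[str]:
    # Split on blank lines (paragraph/heading boundaries), pack paragraphs
    # up to max_chars, and carry a small overlap between adjacent chunks.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        if current and len(current) + len(para) > max_chars:
            chunks.append(current)
            current = current[-overlap:]  # overlap preserves cross-chunk context
        current = (current + "\n\n" + para).strip()
    if current:
        chunks.append(current)
    return chunks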
Citations & Attribution [Trust]
Show sources for auditability.
- Mandatory for policy/finance/health content.
Diffusion (Latent / Flow / Consistency) [Vision]
Generate by iteratively denoising random noise into images/audio.
- Flow/consistency reduce steps → faster outputs.
VAE (Variational Autoencoder) [Vision]
Compress to a smooth latent space you can sample.
- Backbone for many diffusion pipelines.
GAN (Generative Adversarial Network) [Vision]
Generator vs discriminator duel; sharp results.
- Great for super-resolution and critics.
Visual Control [Vision]
Guide generation with edges/poses/masks.
- Use for catalog consistency and swaps.
OCR & Layout Parsing [Vision]
Turn scans/PDFs into structured text + tables.
- Prefer CSV/HTML for tables, not raw text.
ASR (Speech-to-Text) [Audio]
Transcribe audio; diarization labels who spoke when.
- Use for meetings, call centers, podcasts.
TTS (Text-to-Speech) [Audio]
Natural-sounding voice output.
- Use for IVR, narration, accessibility.
Voice Cloning (Consent) [Audio]
Replicate a voice for branding.
- Require explicit consent; watermark outputs.
Code Completion [Dev]
Suggest next lines as you type.
- Backed by repo-aware context for best results.
SQL Generation [Dev]
Turn questions into SQL safely.
- Enforce schemas and read-only for safety.
Documentation Assist [Dev]
Draft docstrings/READMEs/examples.
- Keep source of truth in repo + RAG to prevent drift.
LoRA / Adapters [Tuning]
Lightweight fine-tuning to add tone or formats.
- Cheap, reversible, stackable by task.
Full Fine-Tuning [Tuning]
Update many/all weights for maximum control.
- Expensive; needs clean, licensed data.
Transfer Learning [Tuning]
Reuse pretrained knowledge for a new task.
- Strong results with limited data.
Guardrails [Safety]
Block unsafe content/actions via policies and checks.
- Use classifiers + allowlists + tool limits.
Prompt Injection [Safety]
Untrusted input text tries to override rules or exfiltrate data.
- Sanitize inputs; separate roles; restrict tools.
PII & Privacy [Safety]
Limit personal data usage and storage.
- Prefer local/SLM paths for sensitive flows.
Copyright & Licensing [Safety]
Respect content licenses; keep provenance tags.
- Applies to training data and outputs.
Quantization [Deploy]
Store weights in fewer bits to save memory & time.
- 4-bit and mixed-precision formats are common, with small quality loss.
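A toy sketch of symmetric 8-bit quantization. Real systems use per-channel scales, 4-bit packing, and calibration data; the weights here are invented.
def quantize_int8(weights: list[float]) -> tuple[list[int], float]:
    # Map floats to [-127, 127] with a single shared scale factor.
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]

weights = [0.12, -0.98, 0.40, 0.003]
q, scale = quantize_int8(weights)
print(q, [round(w, 3) for w in dequantize(q, scale)])  # small rounding error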
Knowledge Distillation [Deploy]
Train a small “student” to imitate a big “teacher”.
- Enables on-device and cost control.
KV Cache [Deploy]
Reuse attention history to speed generation.
- Critical for chat latency and long outputs.
Prompt Caching [Deploy]
Reuse responses for repeated prompts.
- Set TTLs; manage keys to avoid stale content.
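A minimal in-memory sketch of a prompt cache with a TTL; call_model is a placeholder for your provider call, and the TTL value is illustrative.
import time

CACHE_TTL_S = 300  # illustrative 5-minute freshness window
_cache: dict[str, tuple[float, str]] = {}

def cached_answer(prompt: str, call_model) -> str:
    # Return the cached response for an identical prompt while the TTL holds.
    now = time.time()
    hit = _cache.get(prompt)
    if hit and now - hit[0] < CACHE_TTL_S:
        return hit[1]               # fresh cache hit, no model call
    answer = call_model(prompt)     # miss or stale entry: call the model
    _cache[prompt] = (now, answer)
    return answer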
Batching & Queues [Deploy]
Process requests in batches to raise throughput.
- Batch offline; stream interactive tasks.
Inference Servers [Deploy]
vLLM, Triton, llama.cpp, ONNX Runtime.
- Choose by hardware, latency, and scaling needs.
Perplexity [Eval]
How “surprised” the model is by data (lower is better).
- Not a guarantee of factual accuracy.
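The definition in one line of code: perplexity is the exponential of the average negative log-likelihood per token. The probabilities below are toy values.
import math

def perplexity(token_probs: list[float]) -> float:
    # exp(average negative log-likelihood of the actual next tokens)
    nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(nll)

print(round(perplexity([0.5, 0.25, 0.8, 0.1]), 2))  # ~3.16; lower = less "surprised"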
Exact-Match / F1 [Eval]
Q&A correctness metrics.
- Add citation rate and harmful error rate for safety.
Retrieval Metrics [Eval]
Recall@k, MRR, nDCG to judge search quality.
- Measure before and after reranking.
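Recall@k and MRR are a few lines each; a sketch with invented document IDs so the arithmetic is easy to check.
def recall_at_k(relevant: set[str], retrieved: list[str], k: int) -> float:
    # Fraction of relevant docs that appear in the top-k results.
    return len(relevant & set(retrieved[:k])) / len(relevant)

def mrr(relevant: set[str], retrieved: list[str]) -> float:
    # Reciprocal rank of the first relevant result (0 if none found).
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0

relevant = {"doc1", "doc4"}
retrieved = ["doc9", "doc1", "doc3", "doc4"]
print(recall_at_k(relevant, retrieved, 3), mrr(relevant, retrieved))  # 0.5 0.5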
Latency & Cost KPIs [Eval]
Time-to-first-token, total time, cost per answer.
- Set SLOs; stream for UX; batch offline.
A/B Tests & Canary [Eval]
Safely compare variants with small traffic slices.
- Apply to prompts, models, rerankers.
Alignment (RLHF / DPO) [Policy]
Optimize behavior to follow human preferences.
- Combine with guardrails; don’t rely on filters alone.
Content Drift [Ops]
Knowledge changes; answers go stale without re-indexing.
- Schedule ingestion; watch freshness KPIs.
Observability & Tracing [Ops]
Track prompts, tool calls, latency, cost, errors.
- Required for debugging and audits.
Prompt Management [Ops]
Version, test, and roll back prompts like code.
- Keep a golden set for regression testing.
Routing (SLM vs LLM) [Ops]
Send simple queries to small models; hard ones to large.
- Delivers major cost savings with stable quality.
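A toy router sketch to close the glossary. Real routers use trained classifiers or confidence scores; the length threshold and model names here are invented.
def route(query: str, needs_tools: bool = False) -> str:
    # Send short, tool-free queries to a small model; everything else to a large one.
    if needs_tools or len(query.split()) > 50:
        return "large-model"
    return "small-model"

print(route("What are our store hours?"))                    # small-model
print(route("Draft a migration plan...", needs_tools=True))  # large-model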
Mini Projects — Text, Image, Code
Project 1 — Text (200-word Blog Intro)
Time: 30–45 min
Tools: Any LLM (ChatGPT/Gemini/Claude)
Output: 1 intro + 3 bullets + meta description
Step 1 — Sources. Pick one page from your site and one reliable external source.
Step 2 — Prompt.
Write a 180–220 word blog intro on “Digital Growth in 2025” for enterprise leaders. Tone: concise, direct, ROI-focused. Use these facts only:
- [Fact from my site, 2025]
- [Fact from external source, 2025]
Return: one paragraph + 3 bullet takeaways. Cite inline (Source, 2025).
Step 3 — Tighten. Ask for shorter sentences and remove filler. Temperature 0.2–0.4.
Deliverables.
- 200-word intro (±20 words).
- 3 bullet takeaways with inline citations.
- Meta description ≤ 155 characters.
Project 2 — Image (Hero Banner)
Time: 30–45 min
Tools: Leonardo.ai, Mage.space, or Clipdrop
Output: 1 banner + alt text + caption
Step 1 — Prompt.
Flat illustration, corporate palette (blue/teal/neutral), modern workspace, marketer at multi-screen desk, subtle city skyline, clean negative space.
Step 2 — Variants. Generate 4; keep guidance moderate. If supported, add brand hex codes.
Step 3 — Edit & Export. Fix small issues with inpainting/outpainting. Export 1920×1080 (or responsive sizes).
Deliverables.
- Final banner JPG/PNG (~200–400 KB).
- Alt text ≤ 125 chars and a 1-line caption.
- Optional: second variant for A/B test.
Project 3 — Code (Colab: JSON Summary)
Time: 30–60 min
Tools: Google Colab + any model API
Output: .ipynb + valid JSON
Step 1 — Colab Notebook. Create a new notebook. Install the provider SDK if needed.
Step 2 — Minimal code skeleton.
# Pseudocode — replace call_model with your provider's client
import json, os

API_KEY = os.environ.get("MODEL_API_KEY", "YOUR_KEY")  # avoid hard-coding keys
prompt = (
    "Summarize https://example.com/policy into 3 bullets of 15 words each. "
    'Return only valid JSON: {"bullets": ["...", "...", "..."]}'
)
# text = call_model(API_KEY, prompt)  # returns the model's raw string
# data = json.loads(text)            # raises ValueError if the string is not valid JSON
# print(json.dumps(data, ensure_ascii=False, indent=2))
Step 3 — Validate JSON. If the model returns text, instruct it to output valid JSON and retry.
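A minimal retry sketch for this step; call_model is the same placeholder as in Step 2.
import json

def get_valid_json(prompt: str, call_model, max_retries: int = 2) -> dict:
    # Call the model, parse the reply, and tighten the prompt on each failure.
    for attempt in range(max_retries + 1):
        text = call_model(prompt)
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            prompt += "\nReturn ONLY valid JSON, with no prose or code fences."
    raise ValueError("model never returned valid JSON")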
Deliverables.
- Notebook (.ipynb) with a successful run.
- Printed JSON with 3 bullets.
- 1–2 sentences noting latency and (if shown) token cost.
Optional Packaging
- Publish the blog intro + banner on your site.
- Link the read-only Colab and a screenshot of the JSON result.
- Add a 5-bullet “How it was built” summary.
Next Steps
Execution Plan (7 / 30 / 90 days)
7-Day: Ship one page: 200-word explainer, 1 banner, 1 JSON summary. Track time & cost.
30-Day: Build 4 pages. Add retrieval with citations for at least one page.
90-Day: Introduce reranker, logging, and a small evaluation set. Cut cost via SLM routing.
Focus Areas for 2025
- RAG Quality: hybrid search, rerankers, clean chunking, strict citations.
- Agents: strict tool schemas, budgets, timeouts, and stop rules.
- Evaluation: golden sets; EM/F1 + citation rate; latency/cost SLOs.
- Efficiency: prefer SLMs; quantization; KV cache; batching.
- Governance: privacy, licensing, abuse prevention, rate limits.
Last updated: 2025-09 • Self-contained learning page; no external dependencies.