Module 3 — Glossary Recap, Mini Projects, Next Steps

Objective: consolidate fundamentals and ship outputs. This page provides a glossary recap (50 items, tagged by topic), three hands-on mini projects using free tools, and a short execution plan.

Glossary Recap (Plain English)

Generative AI [Recap]
Models that create text, images, audio, or code.
  • Use: drafting, design, assistants.
  • Note: pair with retrieval and tools for accuracy.
Foundation Model [Recap]
Large pretrained model adapted to many tasks.
  • Common: language, vision, multimodal.
  • Small language models (SLMs) can beat LLMs on focused tasks.
Transformer [LLM]
Architecture using attention blocks.
  • Dominant for text and code.
  • Efficient attention extends context affordably.
Token / Tokenization [LLM]
Small text chunks; pricing and latency are per token.
  • Short prompts + smart retrieval = lower cost.
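
For intuition, you can count tokens locally (a sketch, assuming the open-source tiktoken package; other providers ship similar tokenizers):

import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # encoding used by several OpenAI models
text = "Short prompts plus smart retrieval keep per-token costs down."
print(len(enc.encode(text)))  # billing and latency scale with this count
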
Context Window [LLM]
Max input size an LLM can read at once.
  • Longer isn’t always better; use reranked retrieval.
Prompt / System Prompt [LLM]
Instructions that steer behavior and tone.
  • Keep concise; rely on tools for facts & math.
Temperature / Top-k / Top-p [LLM]
Controls randomness and candidate choices.
  • Low = consistent; high = creative.
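
A minimal sketch of how temperature and top-p shape sampling (illustrative only; real inference stacks do this in optimized kernels):

import numpy as np

def sample_top_p(logits, temperature=0.8, top_p=0.9, seed=None):
    rng = np.random.default_rng(seed)
    # Lower temperature sharpens the distribution; higher flattens it.
    probs = np.exp((logits - logits.max()) / temperature)
    probs /= probs.sum()
    # Top-p: keep the smallest set of tokens covering >= top_p probability mass.
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), top_p)) + 1
    keep = order[:cutoff]
    return int(rng.choice(keep, p=probs[keep] / probs[keep].sum()))
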
Function Calling (Tool Use) [Agentic]
LLM calls APIs (calc, DB, search) for facts/actions.
  • Require schemas, timeouts, and logging.
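
A guarded dispatch sketch (the tool name get_fx_rate and the schema shape are illustrative, not any specific provider's API):

import json

TOOLS = {
    "get_fx_rate": {
        "fn": lambda base, quote: 1.08,        # stub; a real tool would call an API
        "required": ["base", "quote"],
    },
}

def dispatch(tool_call_json):
    call = json.loads(tool_call_json)          # model output: {"name": ..., "arguments": {...}}
    tool = TOOLS.get(call.get("name"))
    if tool is None:                           # allowlist: reject unregistered tools
        return json.dumps({"error": "unknown tool"})
    args = call.get("arguments", {})
    missing = [k for k in tool["required"] if k not in args]
    if missing:
        return json.dumps({"error": f"missing args: {missing}"})
    return json.dumps({"result": tool["fn"](**args)})  # production: add timeouts + logging

print(dispatch('{"name": "get_fx_rate", "arguments": {"base": "EUR", "quote": "USD"}}'))
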
Agent [Agentic]
Loop: plan → call tools → check → continue.
  • Cap steps and budget to avoid loops.
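
The loop above with hard caps, in sketch form (plan, call_tool, and check_done are hypothetical callables you would supply):

def run_agent(goal, plan, call_tool, check_done, max_steps=8, budget_usd=0.50):
    state, spent = {"goal": goal, "history": []}, 0.0
    for _ in range(max_steps):                 # step cap: no infinite loops
        action, est_cost = plan(state)
        if spent + est_cost > budget_usd:      # budget cap: stop before overspending
            break
        result = call_tool(action)
        spent += est_cost
        state["history"].append((action, result))
        if check_done(state):
            break
    return state                               # best effort, even if unfinished
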
RAG [Search]
Retrieve sources, then answer with citations.
  • Reranker + good chunking drive quality.
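
Assembling the grounded prompt is the simple half (a sketch; retrieve is a hypothetical search function returning (source_id, text) pairs):

def build_rag_prompt(question, retrieve, k=4):
    chunks = retrieve(question, k=k)
    context = "\n\n".join(f"[{sid}] {text}" for sid, text in chunks)
    return ("Answer using ONLY the sources below. Cite like [source_id]. "
            "If the sources are insufficient, say so.\n\n"
            f"{context}\n\nQuestion: {question}")
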
Embedding [Search]
Numeric representation of meaning.
  • Use for search, clustering, recommendations.
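
Cosine similarity over embeddings is the basic search primitive (the vectors here are toy values; real ones come from an embedding model):

import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = np.array([0.1, 0.8, 0.3])
docs = {"refund policy": np.array([0.2, 0.7, 0.2]),
        "office hours":  np.array([0.9, 0.1, 0.4])}
print(max(docs, key=lambda name: cosine(query, docs[name])))  # nearest meaning wins
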
Vector Database [Search]
Store and search embeddings efficiently.
  • Pick for hybrid search, filters, ops fit.
Hybrid Search [Search]
Combine keyword + vector search.
  • Baseline for robust retrieval systems.
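
One standard way to merge the two ranked lists is reciprocal rank fusion (a sketch; k=60 is the conventional smoothing constant):

def rrf(keyword_hits, vector_hits, k=60):
    # Inputs are lists of doc IDs, best first; earlier rank => bigger score.
    scores = {}
    for hits in (keyword_hits, vector_hits):
        for rank, doc_id in enumerate(hits):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf(["d1", "d2", "d3"], ["d3", "d1", "d4"]))  # d1 first: strong in both lists
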
Reranker [Search]
Reorder retrieved chunks by true relevance.
  • Can substantially raise answer accuracy on long documents.
Chunking Strategy [Search]
Split docs with headings and small overlaps.
  • Bad chunking silently ruins RAG quality.
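
A minimal chunker with overlap (a sketch; it splits on blank lines as rough section boundaries, where real pipelines key on headings and token counts):

def chunk(text, max_chars=800, overlap=100):
    chunks = []
    for section in text.split("\n\n"):         # rough section boundaries
        start = 0
        while start < len(section):
            chunks.append(section[start:start + max_chars])
            start += max_chars - overlap       # overlap preserves context at edges
    return chunks
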
Citations & Attribution [Trust]
Show sources for auditability.
  • Mandatory for policy/finance/health content.
Diffusion (Latent / Flow / Consistency) [Vision]
Generate by denoising noise into images/audio.
  • Flow/consistency reduce steps → faster outputs.
VAE [Vision]
Compress to a smooth latent space you can sample.
  • Backbone for many diffusion pipelines.
GAN [Vision]
Generator vs discriminator duel; sharp results.
  • Great for super-resolution and critics.
Visual Control [Vision]
Guide generation with edges/poses/masks.
  • Use for catalog consistency and swaps.
OCR & Layout Parsing [Vision]
Turn scans/PDFs into structured text + tables.
  • Prefer CSV/HTML for tables, not raw text.
ASR (Speech-to-Text) [Audio]
Transcribe audio; diarization labels who spoke when.
  • Use for meetings, call centers, podcasts.
TTS (Text-to-Speech) [Audio]
Natural-sounding voice output.
  • Use for IVR, narration, accessibility.
Voice Cloning (Consent) [Audio]
Replicate a voice for branding.
  • Require explicit consent; watermark outputs.
Code Completion [Dev]
Suggest next lines as you type.
  • Backed by repo-aware context for best results.
SQL Generation [Dev]
Turn questions into SQL safely.
  • Enforce schemas and read-only for safety.
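
A first line of defense before executing generated SQL (a sketch; pair it with a read-only database role, not instead of one):

import re

FORBIDDEN = re.compile(r"\b(insert|update|delete|drop|alter|truncate|grant)\b", re.I)

def is_safe_select(sql):
    stmt = sql.strip().rstrip(";")
    if ";" in stmt:                            # reject multi-statement payloads
        return False
    return stmt.lower().startswith("select") and not FORBIDDEN.search(stmt)

print(is_safe_select("SELECT name FROM users WHERE id = 1"))  # True
print(is_safe_select("SELECT 1; DROP TABLE users"))           # False
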
Documentation Assist [Dev]
Draft docstrings/READMEs/examples.
  • Keep source of truth in repo + RAG to prevent drift.
LoRA / Adapters [Tuning]
Lightweight fine-tuning to add tone or formats.
  • Cheap, reversible, stackable by task.
Full Fine-Tuning [Tuning]
Update many/all weights for maximum control.
  • Expensive; needs clean, licensed data.
Transfer Learning [Tuning]
Reuse pretrained knowledge for a new task.
  • Strong results with limited data.
Guardrails [Safety]
Block unsafe content/actions via policies and checks.
  • Use classifiers + allowlists + tool limits.
Prompt Injection [Safety]
Text tries to override rules or exfiltrate data.
  • Sanitize inputs; separate roles; restrict tools.
PII & Privacy [Safety]
Limit personal data usage and storage.
  • Prefer local/SLM paths for sensitive flows.
Copyright & Licensing [Safety]
Respect content licenses; keep provenance tags.
  • Applies to training data and outputs.
Quantization [Deploy]
Store weights in fewer bits to save memory & time.
  • 4-bit mixed precision is common, usually with small quality loss.
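
The core idea in a few lines (a toy int8 sketch; real 4-bit schemes add per-group scales and keep sensitive layers in higher precision):

import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0            # map the largest weight to int8 range
    return np.round(w / scale).astype(np.int8), scale

w = np.array([0.12, -0.98, 0.45, 0.01], dtype=np.float32)
q, scale = quantize_int8(w)
print(q, np.abs(q * scale - w).max())          # small error, 4x less memory than float32
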
Knowledge Distillation [Deploy]
Train a small “student” to imitate a big “teacher”.
  • Enables on-device and cost control.
KV Cache [Deploy]
Reuse attention history to speed generation.
  • Critical for chat latency and long outputs.
Prompt Caching [Deploy]
Reuse responses for repeated prompts.
  • Set TTLs; manage keys to avoid stale content.
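
A minimal TTL cache keyed on the normalized prompt (a sketch; hosted APIs also offer built-in prompt caching with their own semantics, and call_model is a hypothetical provider call):

import hashlib, time

CACHE, TTL_S = {}, 3600

def cached_answer(prompt, call_model):
    key = hashlib.sha256(prompt.strip().lower().encode()).hexdigest()
    hit = CACHE.get(key)
    if hit and time.time() - hit["t"] < TTL_S:  # TTL guards against stale answers
        return hit["answer"]
    answer = call_model(prompt)
    CACHE[key] = {"answer": answer, "t": time.time()}
    return answer
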
Batching & Queues [Deploy]
Process requests in batches to raise throughput.
  • Batch offline; stream interactive tasks.
Inference Servers [Deploy]
Runtimes that serve models efficiently: vLLM, Triton, llama.cpp, ONNX Runtime.
  • Choose by hardware, latency, and scaling needs.
Perplexity [Eval]
How “surprised” the model is by data (lower is better).
  • Not a guarantee of factual accuracy.
Exact-Match / F1 [Eval]
Q&A correctness metrics.
  • Add citation rate and harmful error rate for safety.
Retrieval Metrics [Eval]
Recall@k, MRR, nDCG to judge search quality.
  • Measure before and after reranking.
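
Recall@k and MRR take only a few lines each (a sketch over ranked doc-ID lists and a set of known-relevant IDs):

def recall_at_k(ranked, relevant, k=5):
    return len(set(ranked[:k]) & relevant) / max(len(relevant), 1)

def mrr(ranked, relevant):
    for i, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / i                     # reciprocal rank of first relevant hit
    return 0.0

ranked, relevant = ["d4", "d1", "d7"], {"d1", "d2"}
print(recall_at_k(ranked, relevant, k=3), mrr(ranked, relevant))  # 0.5 0.5
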
Latency & Cost KPIs [Eval]
Time-to-first-token, total time, cost per answer.
  • Set SLOs; stream for UX; batch offline.
A/B Tests & Canary [Eval]
Safely compare variants with small traffic slices.
  • Apply to prompts, models, rerankers.
Alignment (RLHF / DPO) [Policy]
Optimize behavior to follow human preferences.
  • Combine with guardrails; don’t rely on filters alone.
Content Drift [Ops]
Knowledge changes; answers go stale without re-indexing.
  • Schedule ingestion; watch freshness KPIs.
Observability & Tracing [Ops]
Track prompts, tool calls, latency, cost, errors.
  • Required for debugging and audits.
Prompt Management [Ops]
Version, test, and roll back prompts like code.
  • Keep a golden set for regression testing.
Routing (SLM vs LLM) [Ops]
Send simple queries to small models; hard ones to large.
  • Delivers major cost savings with stable quality.
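
A first-pass router can be this simple (a sketch; the threshold and keyword list are illustrative, and production routers often train a small classifier instead):

HARD_SIGNALS = ("analyze", "compare", "multi-step", "reconcile", "legal")

def route(query):
    hard = len(query) > 400 or any(s in query.lower() for s in HARD_SIGNALS)
    return "large-model" if hard else "small-model"   # cheap default, escalate on signals

print(route("What are your office hours?"))                            # small-model
print(route("Compare Q3 and Q4 revenue and reconcile the variance."))  # large-model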

Mini Projects — Text, Image, Code

Project 1 — Text (200-word Blog Intro)

Time: 30–45 min
Tools: Any LLM (ChatGPT/Gemini/Claude)
Output: 1 intro + 3 bullets + meta description
Step 1 — Sources. Pick one page from your site and one reliable external source.
Step 2 — Prompt.
Write a 180–220 word blog intro on “Digital Growth in 2025” for enterprise leaders.
Tone: concise, direct, ROI-focused.
Use these facts only:
- [Fact from my site, 2025]
- [Fact from external source, 2025]
Return: one paragraph + 3 bullet takeaways. Cite inline (Source, 2025).
Step 3 — Tighten. Ask for shorter sentences and remove filler. Temperature 0.2–0.4.
Deliverables.
  • 200-word intro (±20 words).
  • 3 bullet takeaways with inline citations.
  • Meta description ≤ 155 characters.

Project 2 — Image (Hero Banner)

Time: 30–45 min
Tools: Leonardo.ai, Mage.space, or Clipdrop
Output: 1 banner + alt text + caption
Step 1 — Prompt.
Flat illustration, corporate palette (blue/teal/neutral), modern workspace, marketer at multi-screen desk, subtle city skyline, clean negative space.
Step 2 — Variants. Generate 4 variants; keep guidance strength moderate. If supported, add brand hex codes.
Step 3 — Edit & Export. Fix small issues with inpainting/outpainting. Export 1920×1080 (or responsive sizes).
Deliverables.
  • Final banner JPG/PNG (~200–400 KB).
  • Alt text ≤ 125 chars and a 1-line caption.
  • Optional: second variant for A/B test.

Project 3 — Code (Colab: JSON Summary)

Time: 30–60 min
Tools: Google Colab + any model API
Output: .ipynb + valid JSON
Step 1 — Colab Notebook. Create a new notebook. Install the provider SDK if needed.
Step 2 — Minimal Shape.
# Pseudocode — replace with your provider client
import json, os

API_KEY = os.environ.get("API_KEY", "YOUR_KEY")
prompt = "Summarize https://example.com/policy into 3 bullets of 15 words each."

# response = call_model(API_KEY, prompt)
# expected JSON: {"bullets": ["...", "...", "..."]}
# print(json.dumps(response, ensure_ascii=False, indent=2))
Step 3 — Validate JSON. If the model returns text, instruct it to output valid JSON and retry.
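
A validate-and-retry helper for this step (a sketch; call_model stands for the provider call from Step 2):

import json

def get_json(call_model, prompt, retries=2):
    for _ in range(retries + 1):
        raw = call_model(prompt)
        try:
            data = json.loads(raw) if isinstance(raw, str) else raw
            if isinstance(data.get("bullets"), list) and len(data["bullets"]) == 3:
                return data                    # shape matches the expected contract
        except (json.JSONDecodeError, AttributeError):
            pass
        prompt += '\nReturn ONLY valid JSON: {"bullets": ["...", "...", "..."]}'
    raise ValueError("No valid JSON after retries.")
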
Deliverables.
  • Notebook (.ipynb) with a successful run.
  • Printed JSON with 3 bullets.
  • 1–2 sentences noting latency and (if shown) token cost.

Optional Packaging

  • Publish the blog intro + banner on your site.
  • Link the read-only Colab and a screenshot of the JSON result.
  • Add a 5-bullet “How it was built” summary.

Next Steps

Execution Plan (7 / 30 / 90 days)

7-Day: Ship one page (200-word explainer, 1 banner, 1 JSON summary). Track time & cost.
30-Day: Build 4 pages. Add retrieval with citations for at least one page.
90-Day: Introduce a reranker, logging, and a small evaluation set. Cut cost via SLM routing.

Focus Areas for 2025

  • RAG Quality: hybrid search, rerankers, clean chunking, strict citations.
  • Agents: strict tool schemas, budgets, timeouts, and stop rules.
  • Evaluation: golden sets; EM/F1 + citation rate; latency/cost SLOs.
  • Efficiency: prefer SLMs; quantization; KV cache; batching.
  • Governance: privacy, licensing, abuse prevention, rate limits.

Last updated: 2025-09 • Self-contained learning page; no external dependencies required.