Module 1 — Introduction & Capabilities of Generative AI

This is a plain-English glossary with examples. Each card has a one-line meaning, simple use-cases, and a brief “2025 note” so you don’t memorize outdated ideas.


Core AI & ML

Artificial Intelligence (AI)
Synonyms: smart software
Plain meaning: software that performs tasks we associate with human thinking (understanding, deciding, creating).
  • Use for: recommendations, assistants, content generation.
  • Example: an assistant that drafts emails from bullet points.
  • 2025: Most “AI” you meet is ML-powered; “generative AI” creates new content, not just predictions.
Machine Learning (ML)
Plain meaning: letting computers learn patterns from data instead of hand-coding rules.
  • Use for: spam filters, forecasting, churn prediction.
  • Example: a model predicts if a visitor will convert.
  • 2025: Generative models are now a major branch of ML, not a niche.
Deep Learning
Plain meaning: ML using many-layer neural networks to learn complex patterns.
  • Use for: language, images, audio, code.
  • Example: a network that turns text prompts into images.
  • 2025: Transformers dominate many tasks; other architectures still matter for efficiency.
Neural Network
Plain meaning: a web of “neurons” (numbers) connected by “weights” that turn inputs into outputs.
  • Use for: almost all modern AI tasks.
  • Example: a network maps words → next word probabilities.
  • 2025: Size isn’t everything; data quality and training method often matter as much as parameter count.
Parameters (Weights)
Plain meaning: the numbers inside a model that get tuned during training.
  • Use for: storing what the model has learned.
  • Example: a 7B-parameter model has ~7 billion numbers.
  • 2025: Smaller but well-trained models (SLMs) can beat larger ones for focused tasks.
Training Data
Plain meaning: examples the model learns from.
  • Use for: teaching patterns and facts.
  • Example: product descriptions + outcomes (clicked / not clicked).
  • 2025: Curated, de-duplicated data beats huge noisy scrapes for many tasks.
Labels / Annotations
Plain meaning: the answers attached to training examples.
  • Use for: supervised learning.
  • Example: “spam” vs “not spam”.
  • 2025: Preference data (what humans prefer) is crucial for aligned generative models.
Supervised Learning
Plain meaning: learn from input → correct output pairs.
  • Use for: prediction and classification.
  • Example: predict price from features.
  • 2025: Still everywhere; also used to initially train LLMs before preference tuning.
Unsupervised Learning
Plain meaning: find structure in data without labels.
  • Use for: clustering, compression, anomaly detection.
  • Example: grouping customer behavior automatically.
  • 2025: Many “self-supervised” methods fall under this umbrella.
Self-Supervised Learning
Plain meaning: create your own labels from the data itself (e.g., hide a word and predict it).
  • Use for: pretraining LLMs and vision models at scale.
  • Example: next-token prediction.
  • 2025: Core method behind modern foundation models.
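To make "create your own labels" concrete, here is a minimal sketch of how next-token training pairs can be cut from raw text with no human labeling. It assumes whitespace-separated words as tokens, which is a simplification (real tokenizers use subword pieces), and the function name is hypothetical.

```python
def next_token_pairs(tokens):
    """Turn one token sequence into (context, target) training pairs."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]

# Assumption: words stand in for tokens to keep the example readable.
tokens = "the cat sat on the mat".split()
pairs = next_token_pairs(tokens)

for context, target in pairs:
    print(context, "->", target)
```

Every position in the text supplies its own "answer", which is why this scales to huge unlabeled datasets.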
Reinforcement Learning (RL)
Plain meaning: learn by trial and error using rewards.
  • Use for: game agents, optimization, preference tuning (RLHF).
  • Example: making a chatbot prefer helpful replies via reward signals.
  • 2025: Often replaced or complemented by simpler preference-optimization methods like DPO.

Generative vs Discriminative; Model Families

Generative AI
Plain meaning: models that create new content (text, images, audio, code).
  • Use for: drafting, summarizing, translating, designing.
  • Example: write a product description in your brand voice.
  • 2025: Most products mix generation with retrieval and tools.
Discriminative AI
Plain meaning: models that decide or label (is this A or B?).
  • Use for: spam/not spam, approve/deny, positive/negative.
  • Example: classify support tickets by urgency.
  • 2025: Often used alongside generative models as critics/rerankers.
Foundation Model
Plain meaning: a large pretrained model you adapt for many tasks.
  • Use for: text, vision, multimodal tasks with minimal extra data.
  • Example: start from a general LLM, then add your brand style.
  • 2025: “Small but capable” models are popular for cost/privacy.
Pretraining
Plain meaning: teach a model general skills on huge datasets before any specialization.
  • Use for: language understanding, world knowledge.
  • Example: train on public text to predict next tokens.
  • 2025: Data mix and dedup matter as much as total size.
Transfer Learning
Plain meaning: reuse knowledge from a pretrained model for your task.
  • Use for: good results with little data.
  • Example: adapt a general LLM to your support FAQs.
  • 2025: Often done with adapters (LoRA) rather than full retrains.
Fine-Tuning
Plain meaning: adjust the model on your examples so it behaves your way.
  • Use for: tone, jargon, formats.
  • Example: make the model write like your brand guide.
  • 2025: Use LoRA/adapters for low cost; mix with retrieval for facts.
Autoregressive Model
Plain meaning: generates one token at a time, each based on previous ones.
  • Use for: text/code generation.
  • Example: LLMs predicting the next word.
  • 2025: Dominant for language; also appears in image/audio models.
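The one-token-at-a-time loop can be sketched with a toy bigram model. This is only an illustration: a bigram model looks at the single previous token, while an LLM conditions on all previous tokens, and the corpus and function names here are made up.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count which token follows which."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_new_tokens):
    """Greedy autoregressive loop: append the most likely next token, repeat."""
    out = [start]
    for _ in range(max_new_tokens):
        followers = counts.get(out[-1])
        if not followers:
            break  # nothing ever followed this token in training
        out.append(followers.most_common(1)[0][0])
    return out

corpus = "a b c a b d a b c".split()
model = train_bigram(corpus)
generated = generate(model, "a", 3)
print(generated)  # ['a', 'b', 'c', 'a']
```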
Diffusion Model
Plain meaning: learns to turn noise into a clean image/audio step by step.
  • Use for: image/audio generation, editing, inpainting.
  • Example: create lifestyle photos from a short prompt.
  • 2025: Latent diffusion + flow/consistency methods reduce steps → faster outputs.
Latent Diffusion
Plain meaning: diffusion done in a compressed space so it’s faster and cheaper.
  • Use for: high-quality images with lower compute.
  • Example: VAE encodes image → diffusion denoises in latent space.
  • 2025: Standard approach for production image models.
VAE (Variational Autoencoder)
Plain meaning: compress data into a smooth latent space you can sample from.
  • Use for: generation with controllable factors; anomaly detection.
  • Example: interpolate between customer personas in latent space.
  • 2025: Often used as the latent compressor inside diffusion pipelines.
VQ-VAE
Plain meaning: an autoencoder that uses a codebook (discrete latents) for crisp reconstructions.
  • Use for: discrete tokens for images/audio; stable training.
  • Example: turn images into code tokens then generate tokens.
  • 2025: Common in token-based image/audio generators.
GAN (Generative Adversarial Network)
Plain meaning: a generator tries to fool a discriminator; the duel improves realism.
  • Use for: sharp small-domain images, super-resolution, augmentation.
  • Example: upscale product images cleanly.
  • 2025: Less common than diffusion for general use, but still useful as critics/evaluators and in narrow domains.
Normalizing Flows / Flow Matching
Plain meaning: learn a reversible path from noise to data for efficient sampling.
  • Use for: faster generation with good control.
  • 2025: Gaining popularity because it needs fewer sampling steps than classic diffusion.
Multimodal Model
Plain meaning: handles more than one type of input/output (text, image, audio, video).
  • Use for: describe images, talk about videos, generate alt text.
  • 2025: Text+image is common; audio/video is growing fast.

LLM Core Concepts

Token
Plain meaning: a small chunk of text (or audio/image patch) the model sees.
  • Why it matters: costs and speed are per token.
  • 2025: Tokenizers are improving for code and multilingual text.
Tokenization
Plain meaning: how text is split into tokens.
  • Use for: preparing inputs for LLMs.
  • 2025: Good tokenization reduces cost and weird breaks in Bangla/English mixes.
Embedding (Vector)
Plain meaning: a list of numbers that captures the meaning of text/image/audio.
  • Use for: search, clustering, recommendations, RAG.
  • 2025: Cross-encoder reranking often improves results even more than better vectors alone.
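"Similar meaning" is usually measured as cosine similarity between vectors. The sketch below ranks three documents against a query; the 3-number vectors are invented for illustration (a real embedding model would produce hundreds or thousands of dimensions).

```python
import math

def cosine_similarity(a, b):
    """1.0 = same direction, 0.0 = unrelated, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Assumption: hand-written toy vectors, not output of a real model.
docs = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return an item": [0.8, 0.2, 0.1],
}
query = [0.85, 0.15, 0.05]

ranked = sorted(docs, key=lambda name: cosine_similarity(query, docs[name]), reverse=True)
print(ranked)
```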
Positional Encoding
Plain meaning: a way to tell the model the order of tokens.
  • Use for: keep sequence structure (beginning vs end).
  • 2025: Rotary embeddings (RoPE) and variants help longer contexts.
Attention
Plain meaning: the model focuses on the most relevant tokens when predicting the next one.
  • Use for: capturing long-range relationships.
  • 2025: Memory-efficient attention and SSM hybrids extend context cheaply.
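The "focus" in attention is a weighted average. A minimal sketch of scaled dot-product attention for a single query, in plain Python, might look like this; the tiny vectors are invented, and real models run this with many heads over learned projections.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Score each key against the query, then blend the values by those weights."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(len(values[0]))]
    return weights, output

# Assumption: toy 2-d keys and values for three earlier tokens.
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend([3.0, 0.0], keys, values)
print(weights, output)
```

The first key points the same way as the query, so it receives the largest weight and dominates the output.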
Transformer
Plain meaning: the architecture that powers most LLMs via attention blocks.
  • Use for: text, code, multimodal tasks.
  • 2025: Still dominant; efficiency tweaks matter more than raw size.
Context Window
Plain meaning: how much input the model can read at once.
  • Tip: more context isn’t always better; retrieve the right chunks.
  • 2025: Long contexts are common, but retrieval + reranking often wins on accuracy/cost.
Prompt
Plain meaning: the text instructions you give the model.
  • Use for: steer style, format, steps.
  • 2025: Structured prompts + tools beat clever wording alone.
System Prompt
Plain meaning: hidden “role” instructions that set behavior (e.g., tone, safety).
  • Use for: guardrails and style defaults.
  • 2025: Keep it short, test it, and pair with policy checks.
Temperature
Plain meaning: controls randomness; low = safe and consistent, high = creative.
  • 2025: For production, keep it low and rely on tools/retrieval.
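Under the hood, temperature divides the model's raw scores (logits) before they are turned into probabilities. A minimal sketch, with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution; higher flattens it."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]  # assumption: invented scores for three candidate tokens
cold = softmax_with_temperature(logits, 0.2)
hot = softmax_with_temperature(logits, 2.0)
print("T=0.2:", [round(p, 3) for p in cold])
print("T=2.0:", [round(p, 3) for p in hot])
```

At low temperature nearly all probability lands on the top token; at high temperature the alternatives get a real chance of being sampled.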
Top-k / Top-p
Plain meaning: sampling tricks that limit the choices to the most likely tokens.
  • 2025: Tune with temperature for stable style.
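Both tricks trim the probability list before sampling. The sketch below is one plausible way to write them; the probabilities are invented, and library implementations differ in details such as tie handling.

```python
def _renormalize(probs, keep):
    """Zero out dropped tokens and rescale the kept ones to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]

def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose total probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)

probs = [0.5, 0.3, 0.15, 0.05]
k_filtered = top_k_filter(probs, 2)
p_filtered = top_p_filter(probs, 0.9)
print(k_filtered, p_filtered)
```

Top-k always keeps a fixed number of candidates, while top-p keeps more candidates when the model is unsure and fewer when it is confident.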
Beam Search
Plain meaning: keep several candidate sentences and pick the best one.
  • Use for: translation, when consistency matters more than creativity.
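A small sketch shows why keeping several candidates can help. The next-token table below is entirely hypothetical and stands in for a model; it is built so that the greedy first choice leads to a worse full sentence.

```python
import math

# Assumption: a hand-written probability table instead of a real model.
NEXT = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.55, "dog": 0.45},
    "a": {"cat": 0.9, "dog": 0.1},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}

def beam_search(beam_width, max_steps):
    beams = [(0.0, ["<s>"])]  # (total log-probability, tokens so far)
    for _ in range(max_steps):
        candidates = []
        for score, tokens in beams:
            last = tokens[-1]
            if last == "</s>":  # finished sentences carry over unchanged
                candidates.append((score, tokens))
                continue
            for token, prob in NEXT[last].items():
                candidates.append((score + math.log(prob), tokens + [token]))
        # Keep only the best few partial sentences.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
    return beams[0][1]

print(beam_search(beam_width=1, max_steps=3))  # ['<s>', 'the', 'cat', '</s>']
print(beam_search(beam_width=2, max_steps=3))  # ['<s>', 'a', 'cat', '</s>']
```

With a width of 1 (greedy), the search commits to "the" and ends at probability 0.33; with a width of 2 it keeps "a" alive and finds the 0.36 sentence.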
Function Calling / Tool Use
Plain meaning: the model decides to call an API (calculator, DB, search) to get facts or take actions.
  • 2025: Core to reliable apps; log every call and add limits.
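The "log every call and add limits" advice can be sketched as a small dispatcher. Everything here is an assumption for illustration: the JSON shape of the call, the tool name, and the limit of three calls. Real provider APIs define their own formats.

```python
import json

def add(a, b):
    return a + b

ALLOWED_TOOLS = {"add": add}  # hypothetical allow-list: only these can run
MAX_CALLS = 3                 # hypothetical per-conversation budget
call_log = []

def run_tool_call(raw_call):
    """Run one model-requested call, given as a JSON string."""
    if len(call_log) >= MAX_CALLS:
        raise RuntimeError("tool-call budget exhausted")
    call = json.loads(raw_call)
    name, args = call["name"], call["arguments"]
    if name not in ALLOWED_TOOLS:
        raise PermissionError(f"tool not allowed: {name}")
    result = ALLOWED_TOOLS[name](**args)
    call_log.append({"name": name, "arguments": args, "result": result})
    return result

result = run_tool_call('{"name": "add", "arguments": {"a": 2, "b": 3}}')
print(result, call_log)
```

The model only proposes the call; your code decides whether it runs, which is where limits and logging belong.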
Agent (Planner + Tools)
Plain meaning: a loop where the model plans steps and uses tools until the job is done.
  • 2025: Add guardrails: allowed tools, timeouts, budgets.
RAG (Retrieval-Augmented Generation)
Plain meaning: fetch relevant docs first, then have the model answer using them.
  • Use for: up-to-date, auditable answers with citations.
  • 2025: Rerankers boost quality; keep indexes fresh.
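The fetch-then-answer flow can be sketched in a few lines. To stay self-contained, this version scores documents by shared words instead of embeddings, and it stops at building the prompt rather than calling a model; the documents, ids, and prompt wording are all invented.

```python
def score(query, text):
    """Crude relevance: number of words the query and document share."""
    return len(set(query.lower().split()) & set(text.lower().split()))

def retrieve(query, docs, k=2):
    return sorted(docs, key=lambda d: score(query, d["text"]), reverse=True)[:k]

def build_prompt(query, docs):
    sources = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"{sources}\n"
        f"Question: {query}"
    )

docs = [
    {"id": "faq-1", "text": "Refunds are issued within 5 days of approval"},
    {"id": "faq-2", "text": "Standard shipping takes 3 to 7 days"},
    {"id": "faq-3", "text": "Gift cards never expire"},
]
query = "how long do refunds take"
top = retrieve(query, docs, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

Because the source ids travel with the text, the model's answer can cite them, which is what makes the result auditable.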
Vector Database
Plain meaning: a store built for embeddings so you can find “similar meaning” items.
  • 2025: Choose based on scale, filters, hybrid search, and ops comfort.
Reranker
Plain meaning: a model that reorders retrieved chunks to bring the best ones to the top.
  • 2025: Can noticeably improve answer accuracy in long-document search; measure the gain on your own data.
Chunking
Plain meaning: splitting documents into pieces sized for retrieval/context limits.
  • 2025: Use headings and overlap; bad chunking ruins RAG.
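A minimal sketch of overlap chunking, assuming fixed-size word windows (production chunkers usually also respect headings and sentence boundaries, as the note above suggests):

```python
def chunk_words(text, chunk_size, overlap):
    """Split text into word windows that share `overlap` words with the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end
    return chunks

text = " ".join(f"w{i}" for i in range(10))
chunks = chunk_words(text, chunk_size=4, overlap=1)
print(chunks)  # ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

The shared word at each boundary is the point of overlap: a sentence cut in half still appears whole in at least one chunk when the overlap is large enough.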

Training, Optimization & Evaluation

Loss Function (Cross-Entropy)
Plain meaning: the score the model tries to minimize during training (lower is better).
  • 2025: Perplexity, the exponential of the average cross-entropy, tracks how well a language model predicts tokens.
Perplexity
Plain meaning: how “surprised” a language model is by the data (lower is better).
  • Note: Good perplexity doesn’t guarantee factual answers without retrieval.
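Cross-entropy and perplexity are two views of the same number. The sketch below takes the probability a model gave to each correct token (invented values here) and computes both.

```python
import math

def cross_entropy(token_probs):
    """Average negative log-probability assigned to the correct tokens."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def perplexity(token_probs):
    """Exponential of the cross-entropy."""
    return math.exp(cross_entropy(token_probs))

confident = [0.9, 0.8, 0.95]  # assumption: made-up probabilities
unsure = [0.25, 0.25, 0.25]

print(perplexity(confident))
print(perplexity(unsure))
```

A perplexity near 4 for the second case reads as "the model was about as unsure as if it were choosing between four equally likely tokens".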
Exact-Match / F1
Plain meaning: common accuracy metrics for Q&A (match answer text; F1 balances precision/recall).
  • 2025: Also track citation rate and harmful error rate.
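A sketch of both metrics, assuming the simplest possible normalization (lowercase, split on whitespace). Benchmark scripts usually also strip punctuation and articles, so their numbers can differ.

```python
from collections import Counter

def normalize(text):
    return text.lower().split()

def exact_match(prediction, reference):
    return normalize(prediction) == normalize(reference)

def token_f1(prediction, reference):
    """F1 over shared tokens: balances precision (no extra words) and recall (no missing words)."""
    pred, ref = normalize(prediction), normalize(reference)
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

print(exact_match("Paris", "paris"))
print(token_f1("in Paris France", "Paris"))
```

The second answer contains the right word but pads it, so exact match would fail while F1 gives partial credit.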
Overfitting
Plain meaning: the model memorizes training data and fails on new data.
  • Fix: more diverse data, regularization, early stopping.
Regularization (Dropout / Weight Decay)
Plain meaning: techniques that make models generalize better, not memorize.
  • 2025: Still useful even with massive models.
Batch / Epoch / Iteration
Plain meaning: batch = examples at once; epoch = full pass over data; iteration = one update step.
  • Tip: tune batch size for stability and speed.
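The three terms are tied together by simple arithmetic. A quick sketch, with example numbers chosen only for illustration:

```python
import math

def iterations_per_epoch(num_examples, batch_size):
    """One iteration per batch; the last batch may be smaller."""
    return math.ceil(num_examples / batch_size)

num_examples, batch_size, epochs = 1000, 32, 3
per_epoch = iterations_per_epoch(num_examples, batch_size)
total_updates = per_epoch * epochs
print(per_epoch, total_updates)  # 32 96
```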
Learning Rate
Plain meaning: how big a step the model takes when learning.
  • 2025: Schedules and warmup remain important for stability.
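One common schedule shape is a linear warmup followed by cosine decay. The sketch below is an assumption about what such a schedule could look like, not a specific library's implementation; the step counts and base rate are arbitrary.

```python
import math

def lr_at(step, base_lr, warmup_steps, total_steps):
    """Ramp up linearly during warmup, then decay along a cosine curve to zero."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * base_lr * (1 + math.cos(math.pi * progress))

for step in (0, 9, 10, 55, 100):
    print(step, lr_at(step, base_lr=1e-3, warmup_steps=10, total_steps=100))
```

Small early steps keep the freshly initialized model from jumping wildly; the slow decay lets it settle at the end.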
Optimizers (SGD, Adam, AdamW)
Plain meaning: rules for how to adjust the weights to reduce loss.
  • 2025: AdamW is a strong default for transformers.
Backpropagation / Gradient Descent
Plain meaning: calculate how much each weight caused the error, then nudge it to improve.
  • 2025: Ubiquitous; automatic differentiation libraries do the math.
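The "calculate the blame, then nudge" loop fits in a few lines for a model with a single weight. This sketch writes the gradient by hand for y = w * x with squared error; the data and learning rate are invented, and real frameworks compute the gradient automatically.

```python
# Toy data generated by the rule y = 3 * x.
data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]

w = 0.0               # start with a wrong guess
learning_rate = 0.05  # assumption: small enough to converge on this data

for step in range(200):
    # Gradient of the mean squared error with respect to w.
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= learning_rate * grad  # nudge w against the gradient

print(w)
```

After enough steps w settles very close to 3, the value that makes the error zero.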

Safety, Alignment & Reliability

Hallucination
Plain meaning: a confident-sounding wrong answer.
  • Fix: retrieval with citations, tool use, lower temperature, validation steps.
Alignment
Plain meaning: shaping a model to follow human intent and policy.
  • 2025: Preference optimization plus guardrails tends to work better than filters alone.
RLHF (Reinforcement Learning from Human Feedback)
Plain meaning: train a reward model from human preferences, then optimize the LLM for higher reward.
  • 2025: Often complemented or replaced by DPO for simplicity.
DPO (Direct Preference Optimization)
Plain meaning: optimize directly from preferred vs. rejected answers without a reward model.
  • 2025: Popular for simpler, stable alignment.
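For one preferred/rejected pair, the DPO loss can be sketched as below. The inputs are the total log-probabilities each answer gets from the model being trained (the policy) and from a frozen reference copy; the parameter names and numbers are hypothetical, and `beta` controls how hard the model is pushed away from the reference.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """-log(sigmoid(beta * margin)), written as log(1 + exp(-beta * margin))."""
    # How much more the policy favors the chosen answer than the reference does.
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return math.log1p(math.exp(-beta * margin))

# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-11.0, -11.0, -11.0, -11.0))
# Policy now rates the chosen answer higher and the rejected one lower.
print(dpo_loss(-10.0, -12.0, -11.0, -11.0))
```

The loss falls as the policy widens the gap in favor of the preferred answer, and no separate reward model appears anywhere in the calculation.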
Guardrails / Safety Filters
Plain meaning: checks that block unsafe content or actions.
  • 2025: Combine policy prompts, classifiers, and tool restrictions.
Prompt Injection
Plain meaning: malicious text tries to override instructions or leak data.
  • Mitigate: sanitize inputs, separate roles, restrict tools, scan outputs.
PII / Data Privacy
Plain meaning: protect personal data; avoid sending sensitive info unnecessarily.
  • 2025: On-prem or SLMs often used for privacy-critical tasks.

Efficiency & Deployment

Quantization
Plain meaning: store numbers with fewer bits to save memory and speed up inference.
  • 2025: 4-bit and mixed-precision formats are common, usually with only a small quality drop.
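A minimal sketch of the idea, using simple symmetric 8-bit rounding (an assumption chosen for readability; production 4-bit schemes group weights and are more elaborate). The last helper shows the memory arithmetic for a 7-billion-parameter model.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest > 0 else 1.0
    return [round(w / scale) for w in weights], scale

def dequantize(quantized, scale):
    return [q * scale for q in quantized]

def model_size_gb(num_params, bits_per_param):
    return num_params * bits_per_param / 8 / 1e9

weights = [0.12, -0.5, 0.33, 0.0]  # made-up values
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
print(quantized, restored)

print(model_size_gb(7e9, 16))  # 16-bit weights
print(model_size_gb(7e9, 4))   # 4-bit weights
```

Each restored weight is off by at most half a scale step, which is the "small quality drop" traded for a model a quarter the size.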
Pruning / Sparsity
Plain meaning: remove less important connections to shrink the model.
  • 2025: Structured sparsity helps on GPUs that support it.
Knowledge Distillation
Plain meaning: train a small model to imitate a big model.
  • 2025: Widely used to build strong SLMs from LLM teachers.
KV Cache
Plain meaning: remember attention keys/values from previous tokens to speed up long generations.
  • 2025: Critical for streaming chat performance.
Prompt Caching
Plain meaning: reuse results for repeated prompts to cut cost/latency.
  • 2025: Pair with good versioning and TTLs.
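Versioning and TTLs can be sketched with a small in-memory cache. The class and its parameters are hypothetical; the idea is that the version string is part of the key, so changing your prompt template or model invalidates old entries, and the TTL expires stale ones.

```python
import hashlib
import time

class PromptCache:
    def __init__(self, ttl_seconds, version, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.version = version
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._store = {}

    def _key(self, prompt):
        return hashlib.sha256(f"{self.version}\n{prompt}".encode("utf-8")).hexdigest()

    def put(self, prompt, response):
        self._store[self._key(prompt)] = (self._clock(), response)

    def get(self, prompt):
        key = self._key(prompt)
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, response = entry
        if self._clock() - stored_at > self.ttl:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return response

cache = PromptCache(ttl_seconds=60, version="v1")
cache.put("Summarize our refund policy", "Refunds are issued within 5 days.")
print(cache.get("Summarize our refund policy"))
```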
Streaming Tokens
Plain meaning: send tokens to the user as they are generated for faster perceived speed.
  • 2025: Standard UX for chat and assistants.
Latency vs Throughput
Plain meaning: latency = time per request; throughput = requests per second.
  • 2025: Batch non-interactive tasks; stream interactive ones.
Cost per Token
Plain meaning: most providers bill by tokens in and out.
  • Tip: compress prompts and retrieve only what you need.
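The billing arithmetic is simple enough to sketch. The prices below are placeholders, not any provider's real rates; check the current price sheet before estimating.

```python
def request_cost(input_tokens, output_tokens, input_price_per_million, output_price_per_million):
    """Cost of one request in the same currency as the prices."""
    return (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000

# Assumption: hypothetical prices of 1.00 and 4.00 per million tokens.
full_prompt = request_cost(2000, 500, 1.0, 4.0)
trimmed_prompt = request_cost(500, 500, 1.0, 4.0)
print(full_prompt, trimmed_prompt)
```

Trimming the prompt from 2,000 to 500 tokens cuts the input share of the bill by three quarters, and the saving repeats on every request.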
Rate Limits
Plain meaning: caps on how many requests/tokens you can send per minute.
  • 2025: Use queues and fallbacks to avoid errors.
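One way to handle a rate-limit error is to retry with growing pauses and then switch to a fallback. This sketch adds exponential backoff, which the note above does not name, and the error class and function names are hypothetical stand-ins for whatever your client library raises.

```python
import time

class RateLimitError(Exception):
    """Hypothetical error raised when the provider rejects a call for rate reasons."""

def call_with_fallback(primary, fallback, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Try the primary call, pausing 1s, 2s, ... between retries, then use the fallback."""
    for attempt in range(max_retries + 1):
        try:
            return primary()
        except RateLimitError:
            if attempt < max_retries:
                sleep(base_delay * 2 ** attempt)
    return fallback()

attempts = {"count": 0}

def flaky_primary():
    # Simulated provider: rejects the first two calls, then succeeds.
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("slow down")
    return "primary answer"

delays = []  # record the pauses instead of actually sleeping
answer = call_with_fallback(flaky_primary, lambda: "fallback answer", sleep=delays.append)
print(answer, delays)  # primary answer [1.0, 2.0]
```

The fallback could be a smaller model, a cached answer, or a polite "try again shortly" message.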
Context Compression
Plain meaning: shrink or summarize text so it fits the context window.
  • 2025: Use rerankers + summaries; don’t dump whole PDFs.
External Memory
Plain meaning: store facts outside the model (DB, notes) and fetch them when needed.
  • 2025: Prefer explicit stores to “long system prompts”.

Tools Snapshot (plain definitions)

These are names you’ll see in docs and tutorials. This section is descriptive, not promotional.

PyTorch [DL framework]
Build and train neural networks in Python; most research & many products use it.
  • 2025: Strong ecosystem; lots of example code.
TensorFlow [DL framework]
Another major library for training models; popular in some enterprises.
  • 2025: Often used with Keras for higher-level APIs.
JAX [DL framework]
High-performance library with fast math and function transforms.
  • 2025: Popular in cutting-edge research; steeper learning curve.
Hugging Face Transformers [Model hub]
Ready-to-use code & pretrained models for NLP, vision, audio.
  • 2025: Standard for experimenting quickly.
Model APIs (OpenAI, Azure, Anthropic, Google, Mistral, Cohere) [Hosted]
Call powerful hosted models without running them yourself.
  • 2025: Compare price, latency, features, and data policies.
Llama (Meta) [Open weights]
Widely used family of open-weight language models.
  • 2025: Strong SLM/LLM baselines; check license terms.
Ollama [Local run]
Run many open models locally with simple commands.
  • 2025: Great for demos and privacy.
vLLM [Inference]
High-throughput LLM server that speeds up generation.
  • 2025: Common in production for cost savings.
llama.cpp [Inference]
Runs quantized open models on CPUs/GPUs with tiny footprints.
  • 2025: Powers many on-device apps.
ONNX Runtime [Portability]
Run models across different hardware and languages.
  • 2025: Useful for edge and Windows stacks.
CUDA / ROCm [GPU drivers]
Software layers that let models use NVIDIA/AMD GPUs.
  • 2025: Version mismatches are a common failure point.
FAISS / Milvus / Weaviate / Pinecone [Vector DBs]
Datastores/search engines for embeddings.
  • 2025: Pick based on ops skill, hybrid search, and reranker support.
LangChain / LlamaIndex [Orchestration]
Libraries to wire prompts, tools, retrieval, and memory together.
  • 2025: Useful, but keep control—avoid heavy magic for production.
NVIDIA Triton Inference Server [Serving]
Hosts models at scale with batching and GPU utilization features.
  • 2025: Option for teams running their own infrastructure.
Weights & Biases / MLflow [Tracking]
Track experiments, datasets, and model versions.
  • 2025: Helps compare runs and reproduce results.

Last updated: 2025-09