Workshop 4: Training & Fine-Tuning
Duration: 60 minutes | Level: Intermediate | Prerequisites: Workshops 1-3
What You'll Master
Transform your expert from generic to specialized by training it with your own data and knowledge.

1. Understanding Training: Learn the different types of training available
2. Document Upload: Add knowledge through documents and files
3. QA Pair Creation: Create question-answer pairs for precise training
4. Model Fine-Tuning: Fine-tune a base model with your data
5. Training Evaluation: Test and evaluate your training results
Types of Training
B-Bot offers multiple training approaches:

Document Retrieval (RAG)
Best for: Large knowledge bases, manuals, documentation. Your expert searches documents to find relevant information.

QA Pairs
Best for: Specific questions, exact answers, brand voice. Direct question-answer mappings for precise responses.

Fine-Tuning
Best for: Unique behavior, consistent style, specialized tasks. Train a custom model with your data.
Document Training (RAG)
How RAG Works
When you upload documents, they are split into chunks and converted into vector embeddings. When a user asks a question, the expert retrieves the most relevant chunks and uses them as context to generate its answer.
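To make the retrieval step concrete, here is a minimal retrieve-then-answer sketch. It is illustrative only; B-Bot runs this pipeline for you when you upload documents. The embedding and chat calls assume the OpenAI Python SDK, and the `chunks` list stands in for your already-split document text.

```python
# Minimal retrieve-then-answer sketch (illustrative; B-Bot does this internally).
# Assumes the OpenAI Python SDK and a list of pre-split document chunks.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

chunks = [
    "If the LED is red, the device is overheating. Power it off for 10 minutes.",
    "To reset the device, hold the power button for 5 seconds.",
]

def embed(texts):
    """Convert text into vectors with an embedding model."""
    response = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in response.data])

chunk_vectors = embed(chunks)

def answer(question, top_k=1):
    """Retrieve the most relevant chunk(s) and answer using them as context."""
    q_vec = embed([question])[0]
    # Cosine similarity between the question and every chunk
    scores = chunk_vectors @ q_vec / (
        np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(q_vec)
    )
    context = "\n".join(chunks[i] for i in scores.argsort()[::-1][:top_k])
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": f"Answer using this context:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return reply.choices[0].message.content

print(answer("What should I do if the LED is red?"))
```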
Uploading Documents
- Navigate to Training in the sidebar
- Click the Documents tab
- Click Upload Documents

Supported Formats
| Format | Best For | Notes |
|---|---|---|
| PDF | Manuals, reports | Extracts text and structure |
| DOCX | Word documents | Preserves formatting |
| TXT | Plain text | Simplest format |
| MD | Markdown docs | Great for technical docs |
| CSV | Structured data | Creates searchable rows |
| JSON | API docs, structured | Maintains hierarchy |
🎯 Exercise: Document Upload
Create a simple product manual for training:

1. Create Document: Create a file called product_manual.md (a sample is sketched below)
2. Upload to B-Bot: Upload the document to your TechSupport AI expert
3. Test Retrieval: Ask: "What should I do if the LED is red?"
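The workshop does not prescribe exact manual content, so here is a small illustrative product_manual.md. The product name and all details are made up; the red-LED entry simply gives the retrieval test in step 3 something to find.

```markdown
# SmartHub Pro - Product Manual (sample content; adapt to your own product)

## LED Status Indicators
- Green: Device is operating normally.
- Blinking blue: Device is pairing with the app.
- Red: Device is overheating. Power it off, let it cool for 10 minutes, and check for blocked vents.

## Resetting the Device
Hold the power button for 5 seconds until the LED blinks twice.

## Warranty
The device is covered by a 12-month limited warranty.
```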
Document Processing Options

Chunking Strategy
How documents are split for search:
- Paragraph: Best for structured documents
- Sentence: Best for FAQs
- Token-based: Best for long documents

Embedding Model
How text is converted to vectors:
- OpenAI ada-002: High quality, standard
- Cohere: Good for multilingual
- Local: Privacy-focused

Overlap
How much context is shared between chunks:
- Higher overlap = better context preservation
- Lower overlap = faster search
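To see how chunk size and overlap interact, here is a minimal token-based chunking sketch. It splits on whitespace for simplicity (real pipelines use a proper tokenizer), and the file name refers to the manual from the exercise above.

```python
# Minimal token-based chunking with overlap (illustrative only).
# Splits on whitespace for simplicity; production pipelines use a real tokenizer.
def chunk_text(text, chunk_size=100, overlap=20):
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap  # higher overlap -> more shared context, more chunks
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks

manual_text = open("product_manual.md", encoding="utf-8").read()
for i, chunk in enumerate(chunk_text(manual_text, chunk_size=100, overlap=20)):
    print(f"Chunk {i}: {chunk[:60]}...")
```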
QA Pair Training
When to Use QA Pairs
✅ Great For
- Brand-specific terminology
- Exact pricing/policies
- Consistent answers to common questions
- Company voice and tone
⚠️ Less Effective For
- Open-ended questions
- Complex reasoning
- Large knowledge bases
- Frequently changing info
Creating QA Pairs
Navigate to Training → QA Pairs.
🎯 Exercise: Create QA Pairs
Create three QA pairs for your TechSupport AI. Each pair needs a Question and an Answer; a few illustrative pairs are sketched below.
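The exact pairs are up to you. One convenient way to draft them before entering them in the UI is a simple Python structure like the one below; this is not B-Bot's import format, just a way to keep question variations and answers together, and all content is illustrative.

```python
# Draft QA pairs before entering them in Training -> QA Pairs.
# Illustrative content only; replace with your own product details.
qa_pairs = [
    {
        "question": "What should I do if the LED is red?",
        "variations": ["Why is the light red?", "What does a red LED mean?"],
        "answer": "A red LED means the device is overheating. Power it off, "
                  "let it cool for 10 minutes, and check that the vents are clear.",
    },
    {
        "question": "How long is the warranty?",
        "variations": ["Is this under warranty?", "What's the warranty period?"],
        "answer": "Every device includes a 12-month limited warranty from the purchase date.",
    },
    {
        "question": "How do I reset the device?",
        "variations": ["What are the factory reset steps?"],
        "answer": "Hold the power button for 5 seconds until the LED blinks twice.",
    },
]

for pair in qa_pairs:
    print(pair["question"], "->", pair["answer"][:40], "...")
```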
QA Pair Best Practices
Write Natural Questions
Use questions as real users would ask them:
- ❌ "Provide information about shipping policies"
- ✅ "How long does shipping take?"

Include Variations
Add multiple phrasings for the same question:
- "What's the warranty?"
- "How long is my product covered?"
- "Is this under warranty?"

Complete Answers
Provide full, helpful answers:
- Include all relevant details
- Add next steps or links
- Use your brand voice
Fine-Tuning
What is Fine-Tuning?
Fine-tuning trains the model's neural network weights with your data, creating a specialized version of the base model.

Starting a Fine-Tune Job
1. Prepare Data: Collect at least 50-100 high-quality training examples
2. Navigate to Fine-Tuning: Go to Training → Fine-Tuning
3. Select Base Model: Choose the model to fine-tune (e.g., GPT-4o-mini)
4. Upload Training Data: Upload your prepared dataset
5. Start Training: Begin the fine-tuning job and monitor progress
Training Data Format
Fine-tuning uses JSONL format, with one complete training example per line.
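No sample file is included here, so below is a small illustrative snippet in the chat-style JSONL layout used by OpenAI-compatible fine-tuning; the exact schema depends on the base model you select, and the content is made up.

```jsonl
{"messages": [{"role": "system", "content": "You are TechSupport AI, a friendly support expert."}, {"role": "user", "content": "What should I do if the LED is red?"}, {"role": "assistant", "content": "A red LED means the device is overheating. Power it off, let it cool for 10 minutes, and check the vents."}]}
{"messages": [{"role": "system", "content": "You are TechSupport AI, a friendly support expert."}, {"role": "user", "content": "How long is the warranty?"}, {"role": "assistant", "content": "Every device includes a 12-month limited warranty from the purchase date."}]}
```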
Fine-Tuning Tips
Data Quality
Quality over quantity. 50 excellent examples beat 500 mediocre ones.
Diverse Examples
Include various topics, question types, and edge cases.
Consistent Format
Maintain consistent response style across all examples.
Iterate
Fine-tune in rounds, testing and improving each time.
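Before each round, it can help to sanity-check your training file. The sketch below counts examples and flags obviously broken lines; the file name is a placeholder and the 50-example threshold is just the workshop's rule of thumb.

```python
# Quick sanity check for a fine-tuning JSONL file (illustrative).
import json

def check_training_file(path, minimum=50):
    examples, problems = 0, []
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            if not line.strip():
                continue
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {line_number}: not valid JSON")
                continue
            messages = record.get("messages", [])
            if not any(m.get("role") == "assistant" and m.get("content") for m in messages):
                problems.append(f"line {line_number}: missing assistant response")
            examples += 1
    if examples < minimum:
        problems.append(f"only {examples} examples; aim for at least {minimum}")
    return examples, problems

count, issues = check_training_file("training_data.jsonl")
print(f"{count} examples found")
for issue in issues:
    print("-", issue)
```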
Model Distillation
What is Distillation?
Distillation transfers knowledge from a powerful model (teacher) to a smaller, faster model (student).

Benefits
| Aspect | Before Distillation | After Distillation |
|---|---|---|
| Cost | $$$$ (GPT-4o) | $$ (fine-tuned mini) |
| Speed | ~3s per response | ~1s per response |
| Quality | Excellent | Very Good (for your domain) |
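Conceptually, the teacher's answers become the student's training data. The sketch below shows only that data-collection step, assuming the OpenAI Python SDK; the prompts, model names, and file name are placeholders, and it is meant to illustrate the idea rather than the exact B-Bot workflow.

```python
# Distillation step 1 (illustrative): let the teacher answer your prompts,
# then save the pairs as fine-tuning data for the smaller student model.
import json
from openai import OpenAI

client = OpenAI()

prompts = [
    "What should I do if the LED is red?",
    "How do I reset the device?",
]

with open("distillation_data.jsonl", "w", encoding="utf-8") as out:
    for prompt in prompts:
        teacher_reply = client.chat.completions.create(
            model="gpt-4o",  # the expensive, high-quality teacher
            messages=[{"role": "user", "content": prompt}],
        )
        answer = teacher_reply.choices[0].message.content
        example = {"messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": answer},
        ]}
        out.write(json.dumps(example) + "\n")
# The resulting file can then be used to fine-tune a smaller student model
# such as GPT-4o-mini.
```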
Evaluating Training Results
Testing Your Trained Expert
1. Create Test Set: Prepare 10-20 questions your expert should answer well
2. Run Tests: Ask each question and record the response
3. Evaluate: Score responses for accuracy, tone, and completeness
4. Iterate: Add more training data where gaps exist
Evaluation Criteria
| Criterion | What to Check |
|---|---|
| Accuracy | Is the information correct? |
| Completeness | Is the answer thorough? |
| Tone | Does it match your brand voice? |
| Relevance | Does it answer the actual question? |
| Helpfulness | Would a real user be satisfied? |
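To make the run-and-score steps repeatable, a small harness like the one below can help. It is a sketch only: the test questions and keywords are placeholders, the expert is queried manually (paste its answer from the B-Bot chat when prompted), and the score fields mirror the criteria table above.

```python
# Minimal evaluation harness (illustrative).
CRITERIA = ["accuracy", "completeness", "tone", "relevance", "helpfulness"]

test_set = [
    {"question": "What should I do if the LED is red?", "must_mention": "overheating"},
    {"question": "How long is the warranty?", "must_mention": "12-month"},
]

def ask_expert(question):
    # Placeholder for however you query your expert (chat UI or API).
    return input(f"Paste the expert's answer to {question!r}:\n> ")

results = []
for case in test_set:
    response = ask_expert(case["question"])
    # Automatic keyword check for accuracy; score the other criteria by hand (1-5).
    scores = {criterion: None for criterion in CRITERIA}
    scores["accuracy"] = 5 if case["must_mention"].lower() in response.lower() else 1
    results.append({"question": case["question"], "scores": scores})

for result in results:
    print(result["question"], "->", result["scores"])
```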
🎯 Challenge: Complete Training Pipeline

1. Upload 3 Documents: Add product manuals, FAQs, and policy documents
2. Create 10 QA Pairs: Cover common questions with perfect answers
3. Test with 5 Questions: Verify the expert uses the training data
4. Refine: Improve based on test results
Best Practices Summary
Start with RAG
Document retrieval is fastest to implement and easiest to update
Add QA for Precision
Use QA pairs for questions that must have exact answers
Fine-Tune Last
Only fine-tune when you have enough data and clear improvement goals
Test Continuously
Regular testing catches regressions early