Generative AI at Work: From Theory to Real Practice
Practical guide for implementing generative AI in your company. Real cases, proven architectures and how to avoid the most common mistakes.
Generative AI has gone from science fiction to a working tool in record time. But between the hype and reality there's a gap. This article is not a theoretical introduction: it's a practical guide for those who already understand the potential and want to implement it without getting burned.
🎯 This article is for you if:
- You've tried ChatGPT but don't know how to integrate it into workflows
- You're concerned about data privacy when using third-party APIs
- You want to understand which model to use for which task
- You need to justify the investment with measurable ROI
The Real State of Generative AI in 2026
Let's forget the hype and stick to what you actually need to know.
The 3 Implementation Models
When we talk about "implementing AI", there are three fundamentally different architectures. Each has its use cases:
☁️ 1. Cloud API (The Most Common)
You call an API (OpenAI, Anthropic, Google) and receive responses. Simple, fast, but with implications:
| Aspect | Advantage | Disadvantage |
|---|---|---|
| Quality | Frontier models (GPT-4o, Claude 3.5) | Vendor dependency |
| Cost | Pay-per-use, no infrastructure | Costs grow quickly with volume |
| Privacy | ❌ | Data sent to third parties |
| Latency | Good (100-500ms) | Network dependent |
When to use it: Rapid prototyping, non-sensitive tasks, when you need the best available model.
🖥️ 2. Local LLM (Self-Hosted)
You run the model on your own infrastructure using tools like Ollama, LMStudio or vLLM.
```bash
# Run Llama 3.1 8B locally with Ollama
ollama run llama3.1:8b

# Or with LMStudio (graphical interface)
# Download the GGUF model → Load → Ready

# OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Hello"}]}'
```
| Model Size | VRAM Required | Recommended Hardware |
|---|---|---|
| 7B - 8B | 8 GB | RTX 3070 or higher |
| 13B | 12 GB | RTX 4070 Ti |
| 70B | 48+ GB | 2x RTX 4090 or A100 |
When to use it: Sensitive data, compliance (GDPR, HIPAA), high request volume, no variable costs.
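The VRAM figures in the table above follow a rough rule of thumb: memory needed is approximately parameter count × bytes per weight, plus overhead for the KV cache and activations. The 1.2 overhead factor below is our own assumption, not an official formula:

```typescript
// Rough VRAM estimate: params × bytes-per-weight × overhead factor.
// The 1.2 overhead factor (KV cache, activations) is an assumption.
function estimateVramGB(paramsBillions: number, bitsPerWeight: number): number {
  const bytes = paramsBillions * 1e9 * (bitsPerWeight / 8)
  return (bytes * 1.2) / 1e9
}

// An 8B model quantized to 4 bits fits comfortably in 8 GB:
console.log(estimateVramGB(8, 4).toFixed(1))  // ≈ 4.8
// The same model at FP16 needs a much larger card:
console.log(estimateVramGB(8, 16).toFixed(1)) // ≈ 19.2
```

This is why the common 8B models in the table run on an 8 GB consumer GPU: they ship quantized at 4 to 8 bits, not at full FP16 precision.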
🎯 3. Custom Models (Fine-Tuning)
You train or fine-tune a model with your own data. The holy grail for specific use cases.
⚠️ Important: Fine-tuning is not magic. You need quality data (minimum 1000+ examples) and a clear use case. If you can solve the problem with prompting, you probably don't need fine-tuning.
| Technique | Description | Use |
|---|---|---|
| LoRA | Efficient parameter tuning | ✅ Most used |
| QLoRA | LoRA with quantization | Limited memory |
| Full Fine-Tune | Full model tuning | Rarely needed |
The Decision Framework: Which Model For What?
After implementing AI in dozens of projects, this is the framework we use:
```text
// Simplified decision tree
Is the data sensitive?
├── YES → Do you need frontier-quality responses?
│   ├── YES → Claude/GPT-4 with Enterprise Agreement
│   └── NO  → Local LLM (Llama 3.1, Mistral)
└── NO  → Is the volume high (>10K req/day)?
    ├── YES → Limited budget?
    │   ├── YES → Local LLM or DeepSeek
    │   └── NO  → GPT-4o or Claude 3.5
    └── NO  → Is the task very specific?
        ├── YES → Fine-tune a small model
        └── NO  → Cloud API (GPT-4o-mini, Claude Haiku)
```
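The tree above can be expressed as a small routing function. The return labels are illustrative; in a real system you would map them to concrete provider and model identifiers:

```typescript
// The decision tree, as a routing function. Labels are illustrative.
interface TaskProfile {
  sensitiveData: boolean
  needsFrontierQuality: boolean
  highVolume: boolean // >10K req/day
  limitedBudget: boolean
  verySpecificTask: boolean
}

function routeModel(t: TaskProfile): string {
  if (t.sensitiveData) {
    return t.needsFrontierQuality
      ? "Claude/GPT-4 with Enterprise Agreement"
      : "Local LLM (Llama 3.1, Mistral)"
  }
  if (t.highVolume) {
    return t.limitedBudget ? "Local LLM or DeepSeek" : "GPT-4o or Claude 3.5"
  }
  return t.verySpecificTask
    ? "Fine-tuned small model"
    : "Cloud API (GPT-4o-mini, Claude Haiku)"
}
```

Encoding the policy as code has a side benefit: the routing decision becomes testable and auditable instead of living in someone's head.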
Practical Architecture: The Multi-Provider Pattern
At Cadences we implement a pattern we call "AI Service with fallback". The idea is simple:
- Define an abstract interface for AI operations
- Implement multiple providers (14 in our case)
- Route based on task type, cost and availability
- If a provider fails, automatically switch to another
```typescript
// Simplified AI Service example
interface AIProvider {
  chat(messages: Message[]): Promise<string>
  embed(text: string): Promise<number[]>
}

class AIService {
  private providers: Map<string, AIProvider>

  async chat(messages: Message[], options: ChatOptions) {
    // 1. Select provider based on task
    const provider = this.selectProvider(options)
    try {
      return await provider.chat(messages)
    } catch (error) {
      // 2. Fallback to alternative provider
      const fallback = this.getFallback(provider)
      return await fallback.chat(messages)
    }
  }
}
```
Benefits of this pattern:
- Resilience: If OpenAI has an outage, you switch to Claude or Gemini
- Cost optimization: Use cheap models for simple tasks
- Flexibility: Adding a new provider means implementing an interface
- A/B Testing: Compare responses from different models
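To make the fallback behavior concrete, here is a self-contained sketch with mock providers. The provider names and the failure behavior are invented for illustration:

```typescript
interface Provider {
  name: string
  chat(prompt: string): Promise<string>
}

// Mock providers: the first one fails, the second answers.
const flaky: Provider = {
  name: "primary",
  chat: async () => { throw new Error("503 Service Unavailable") },
}
const stable: Provider = {
  name: "fallback",
  chat: async (prompt) => `echo: ${prompt}`,
}

// Try providers in priority order; return the first successful response.
async function chatWithFallback(providers: Provider[], prompt: string): Promise<string> {
  let lastError: unknown
  for (const p of providers) {
    try {
      return await p.chat(prompt)
    } catch (err) {
      lastError = err // log here, then try the next provider
    }
  }
  throw lastError
}

chatWithFallback([flaky, stable], "Hello").then(console.log) // "echo: Hello"
```

In production you would add timeouts and per-provider health tracking, but the core loop is this simple.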
Providers We Use (And Why)
Google Gemini
Recommended: Excellent quality/price ratio. Gemini 2.0 Flash is our default option for most tasks.
Anthropic Claude
Premium: The best for complex reasoning and instruction following. Claude 3.5 Sonnet is impressive.
DeepSeek
Best Price: Chinese model with quality comparable to GPT-4 at a fraction of the price. DeepSeek-V3 is excellent.
Ollama + Llama 3.1
Local: Our option for local execution. Llama 3.1 8B runs on consumer hardware with good results.
Real Use Cases: How We Use AI at Cadences
🤖 1. Project Assistant with Context
The Cadences assistant is not a generic chatbot. It knows your projects, tasks, clients and data. How?
Pattern: Context Window Optimization. Instead of stuffing the whole workspace into the prompt, we select only the data relevant to the current conversation and fit it within the model's token budget.
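A minimal sketch of the idea, with assumptions of ours: a relevance score per snippet (in practice produced by a retrieval step) and the common heuristic of roughly 4 characters per token:

```typescript
interface Snippet { text: string; relevance: number }

// Rough token estimate: ~4 characters per token (a common heuristic).
const estimateTokens = (text: string) => Math.ceil(text.length / 4)

// Keep the most relevant snippets that fit within the token budget.
function buildContext(snippets: Snippet[], tokenBudget: number): string[] {
  const kept: string[] = []
  let used = 0
  for (const s of [...snippets].sort((a, b) => b.relevance - a.relevance)) {
    const cost = estimateTokens(s.text)
    if (used + cost <= tokenBudget) {
      kept.push(s.text)
      used += cost
    }
  }
  return kept
}
```

The greedy fill is deliberately simple; the point is that context selection is an explicit, budgeted step, not an afterthought.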
🏷️ 2. Automatic Ticket Classification
When an email or form arrives, we automatically classify it using a fine-tuned model:
| Field | Values |
|---|---|
| Urgency | High / Medium / Low |
| Category | Support / Sales / Billing / Other |
| Sentiment | Positive / Neutral / Negative |
| Language | Automatic detection |
This classifier runs on the Cadences Local ML Trainer, without sending data to third parties.
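Whichever model does the classification, its output should be constrained to the closed label sets above before it enters the system. A sketch (the field names and validator are ours, not the actual Cadences code):

```typescript
type Urgency = "high" | "medium" | "low"
type Category = "support" | "sales" | "billing" | "other"
type Sentiment = "positive" | "neutral" | "negative"

interface TicketLabels {
  urgency: Urgency
  category: Category
  sentiment: Sentiment
  language: string // e.g. an ISO 639-1 code detected by the model
}

// Reject anything outside the closed label sets before it enters the system.
function parseTicketLabels(raw: unknown): TicketLabels | null {
  if (typeof raw !== "object" || raw === null) return null
  const o = raw as Record<string, unknown>
  const ok =
    ["high", "medium", "low"].includes(String(o.urgency)) &&
    ["support", "sales", "billing", "other"].includes(String(o.category)) &&
    ["positive", "neutral", "negative"].includes(String(o.sentiment)) &&
    typeof o.language === "string"
  return ok ? (o as unknown as TicketLabels) : null
}
```

Tickets that fail validation go to a human queue instead of being routed on a hallucinated label.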
💬 3. Storefront Chatbot
Each Storefront can have a chatbot that knows the business's products and services:
```typescript
// Chatbot configuration
{
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "context": {
    "businessName": "Restaurant23",
    "products": [...],  // Full menu
    "faqs": [...],      // Frequently asked questions
    "rules": [
      "Always respond in English",
      "If they ask about reservations, give the phone number",
      "Don't give approximate prices, use real ones"
    ]
  }
}
```
Most Common Mistakes (And How to Avoid Them)
❌ Mistake 1: Sending all the context
"I pass the entire database to the model so it has context"
✅ Solution: Use RAG (Retrieval Augmented Generation). Search only for relevant documents and pass them to the model.
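A toy illustration of the retrieval step. Real systems use embeddings and a vector store; simple keyword overlap stands in for similarity search here:

```typescript
// Score documents by word overlap with the query and keep the top K.
function retrieve(query: string, docs: string[], topK = 2): string[] {
  const queryWords = new Set(query.toLowerCase().split(/\W+/).filter(Boolean))
  return docs
    .map((doc) => ({
      doc,
      score: doc.toLowerCase().split(/\W+/).filter((w) => queryWords.has(w)).length,
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .filter((d) => d.score > 0)
    .map((d) => d.doc)
}

// Only the relevant documents go into the prompt, not the whole database.
const context = retrieve("refund policy", [
  "Our refund policy allows returns within 30 days.",
  "The office is closed on public holidays.",
])
```

The shape is what matters: retrieve a handful of relevant passages, inject only those, and keep the prompt small and the answers grounded.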
❌ Mistake 2: Not validating outputs
"The model returns JSON, I parse it directly"
✅ Solution: Always validate with a schema (Zod, JSON Schema). Models hallucinate. Add retries with corrected prompts.
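A sketch of the validate-and-retry loop. The check is hand-rolled here to stay self-contained; in production we would reach for a schema library like Zod:

```typescript
interface Classification { label: string; confidence: number }

function isClassification(v: unknown): v is Classification {
  if (typeof v !== "object" || v === null) return false
  const o = v as { label?: unknown; confidence?: unknown }
  return typeof o.label === "string" &&
    typeof o.confidence === "number" &&
    o.confidence >= 0 && o.confidence <= 1
}

// Ask the model, validate, and retry with a corrective prompt on failure.
async function classifyWithRetry(
  ask: (prompt: string) => Promise<string>,
  prompt: string,
  maxRetries = 2,
): Promise<Classification> {
  let currentPrompt = prompt
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await ask(currentPrompt)
    try {
      const parsed = JSON.parse(raw)
      if (isClassification(parsed)) return parsed
    } catch { /* invalid JSON: fall through to retry */ }
    currentPrompt = `${prompt}\n\nYour previous answer was not valid JSON of the form ` +
      `{"label": string, "confidence": number between 0 and 1}. Answer with only that JSON.`
  }
  throw new Error("Model failed to produce valid output")
}
```

Note that the retry prompt tells the model exactly what was wrong; a bare "try again" wastes the attempt.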
❌ Mistake 3: Premature fine-tuning
"I need fine-tuning so the model understands my business"
✅ Solution: Optimize the prompt first. Then few-shot learning. 95% of cases are solved without fine-tuning.
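Few-shot learning in practice is just worked examples in the prompt. A sketch (the example tickets are made up):

```typescript
interface Example { input: string; output: string }

// Build a few-shot prompt: instructions, worked examples, then the new input.
function fewShotPrompt(instruction: string, examples: Example[], input: string): string {
  const shots = examples
    .map((e) => `Input: ${e.input}\nOutput: ${e.output}`)
    .join("\n\n")
  return `${instruction}\n\n${shots}\n\nInput: ${input}\nOutput:`
}

const prompt = fewShotPrompt(
  "Classify the ticket as support, sales or billing.",
  [
    { input: "My invoice is wrong", output: "billing" },
    { input: "How much does the Pro plan cost?", output: "sales" },
  ],
  "The app crashes on login",
)
```

Two or three representative examples often move quality more than any amount of instruction rewording, and they cost nothing to change.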
❌ Mistake 4: One model for everything
"We use GPT-4 for everything because it's the best"
✅ Solution: Use the right model for each task. Simple classification → small model. Complex reasoning → frontier model.
Metrics and ROI: How to Measure Impact
Generative AI is easy to implement, hard to measure. These are the metrics we recommend:
⚡ Productivity Metrics
- Time saved: Minutes per task before vs. after
- Tasks completed: Work volume processed per day
- Automation rate: % of tasks that don't require human intervention
✅ Quality Metrics
- Accuracy: % of correct responses (you need ground truth)
- Escalation rate: % of cases requiring human review
- User satisfaction: NPS or response ratings
💰 Cost Metrics
- Cost per request: $ spent per API call
- Cost per completed task: Total $ / tasks processed
- ROI: (Value generated - AI cost) / AI cost × 100
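A worked example using the formulas above. The numbers are illustrative, not benchmarks:

```typescript
// Illustrative numbers, not real benchmarks.
const apiCostPerMonth = 300      // $ spent on model APIs per month
const tasksAutomated = 1_500     // tasks processed per month
const minutesSavedPerTask = 4
const hourlyRate = 30            // $ value of an hour of human work

const costPerTask = apiCostPerMonth / tasksAutomated
const valueGenerated = (tasksAutomated * minutesSavedPerTask / 60) * hourlyRate
const roi = ((valueGenerated - apiCostPerMonth) / apiCostPerMonth) * 100

console.log(costPerTask.toFixed(2)) // 0.20
console.log(valueGenerated)         // 3000
console.log(roi)                    // 900
```

Even with modest per-task savings the arithmetic compounds fast, which is exactly why measuring it beats guessing.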
Want to see AI in action?
Try the Cadences assistant. It's not a generic chatbot: it knows your projects, executes actions and learns from your business.
Try AI Assistant

Conclusion: AI is a Tool, Not Magic
Generative AI is transformative, but it's not magic. It requires thoughtful architecture, proper model selection, output validation and clear metrics.
The keys to a successful implementation:
- Start small: One use case, one model, measurable results
- Multi-provider: Don't depend on a single vendor
- Local-first when it matters: Privacy is non-negotiable for sensitive data
- Measure everything: If you can't measure the impact, you can't improve it
- Iterate fast: The landscape changes every month, stay updated
Generative AI won't replace your job. But someone who knows how to use it well probably will.