
Generative AI at Work: From Theory to Real Practice

A practical guide to implementing generative AI in your company: real cases, proven architectures, and how to avoid the most common mistakes.

Cadences Team

Generative AI has gone from science fiction to a working tool in record time. But between the hype and the reality there's a gap. This article is not a theoretical introduction: it's a practical guide for those who already understand the potential and want to implement it without getting burned along the way.

🎯 This article is for you if:

  • You've tried ChatGPT but don't know how to integrate it into workflows
  • You're concerned about data privacy when using third-party APIs
  • You want to understand which model to use for which task
  • You need to justify the investment with measurable ROI
Current Landscape

The Real State of Generative AI in 2026

Let's forget the hype. These are the facts you need to know:

  • 14+ viable LLM providers for production (GPT-4, Claude, Gemini, DeepSeek, Llama, Mistral...)
  • 90% of companies have tried generative AI, but only 20% use it systematically
  • 3-5x productivity improvement in content generation and coding tasks
  • 7B-70B is the parameter range of models you can run locally on consumer hardware
Architectures

The 3 Implementation Models

When we talk about "implementing AI", there are three fundamentally different architectures. Each has its use cases:

☁️ 1. Cloud API (The Most Common)

You call an API (OpenAI, Anthropic, Google) and receive responses. Simple, fast, but with implications:

| Aspect  | Advantage                            | Disadvantage               |
| ------- | ------------------------------------ | -------------------------- |
| Quality | Frontier models (GPT-4o, Claude 3.5) | Vendor dependency          |
| Cost    | Pay-per-use, no infrastructure       | Can scale up quickly       |
| Privacy | (none)                               | Data sent to third parties |
| Latency | Good (100-500 ms)                    | Network dependent          |

When to use it: Rapid prototyping, non-sensitive tasks, when you need the best available model.
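
For the cloud route, here is a minimal TypeScript sketch of an OpenAI-compatible chat call. The endpoint, model name and environment variable are illustrative; any OpenAI-compatible provider works the same way:

```typescript
// Minimal OpenAI-compatible chat call. Endpoint, model and env var
// are illustrative; swap in your provider's values.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Pure helper: assembles the JSON body every OpenAI-compatible API expects.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages, temperature: 0.7 };
}

async function chat(messages: ChatMessage[]): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(buildChatRequest("gpt-4o-mini", messages)),
  });
  if (!res.ok) throw new Error(`API error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The same request shape works against a local Ollama endpoint, which is part of what makes provider switching cheap.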

🖥️ 2. Local LLM (Self-Hosted)

You run the model on your own infrastructure using tools like Ollama, LM Studio or vLLM.

# Run Llama 3.1 8B locally with Ollama
ollama run llama3.1:8b

# Or with LM Studio (graphical interface)
# Download the GGUF model → Load → Ready

# OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Hello"}]}'

| Model Size | VRAM Required | Recommended Hardware |
| ---------- | ------------- | -------------------- |
| 7B-8B      | 8 GB          | RTX 3070 or higher   |
| 13B        | 12 GB         | RTX 4070 Ti          |
| 70B        | 48+ GB        | 2x RTX 4090 or A100  |

When to use it: Sensitive data, compliance (GDPR, HIPAA), high request volume, no variable costs.

🎯 3. Custom Models (Fine-Tuning)

You train or fine-tune a model with your own data. The holy grail for specific use cases.

⚠️ Important: Fine-tuning is not magic. You need quality data (minimum 1000+ examples) and a clear use case. If you can solve the problem with prompting, you probably don't need fine-tuning.

| Technique      | Description                | Use            |
| -------------- | -------------------------- | -------------- |
| LoRA           | Efficient parameter tuning | ⭐ Most used    |
| QLoRA          | LoRA with quantization     | Limited memory |
| Full Fine-Tune | Full model tuning          | Rarely needed  |
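
If you do fine-tune, the training data is typically chat-format JSONL, one example per line. A hypothetical ticket-classification example (the field names follow the common OpenAI-style convention; the ticket text is invented):

```json
{"messages": [{"role": "system", "content": "Classify the ticket: Support, Sales, Billing or Other."}, {"role": "user", "content": "My last invoice has the wrong VAT number."}, {"role": "assistant", "content": "Billing"}]}
```

You need 1000+ lines like this, covering the edge cases, before fine-tuning pays off.
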
Strategy

The Decision Framework: Which Model For What?

After implementing AI in dozens of projects, this is the framework we use:

// Simplified decision tree

Is the data sensitive?
├─ YES → Do you need frontier-quality responses?
│       ├─ YES → Claude/GPT-4 with Enterprise Agreement
│       └─ NO → Local LLM (Llama 3.1, Mistral)
│
└─ NO → Is the volume high (>10K req/day)?
        ├─ YES → Limited budget?
        │       ├─ YES → Local LLM or DeepSeek
        │       └─ NO → GPT-4o or Claude 3.5
        │
        └─ NO → Is the task very specific?
                ├─ YES → Fine-tuning a small model
                └─ NO → Cloud API (GPT-4o-mini, Claude Haiku)
Implementation

Practical Architecture: The Multi-Provider Pattern

At Cadences we implement a pattern we call "AI Service with fallback". The idea is simple:

  1. Define an abstract interface for AI operations
  2. Implement multiple providers (14 in our case)
  3. Route based on task type, cost and availability
  4. If a provider fails, automatically switch to another
// Simplified AI Service example
type Message = { role: "system" | "user" | "assistant"; content: string }
type ChatOptions = { task?: string; maxCost?: number }

interface AIProvider {
  chat(messages: Message[]): Promise<string>
  embed(text: string): Promise<number[]>
}

class AIService {
  private providers: Map<string, AIProvider>

  async chat(messages: Message[], options: ChatOptions) {
    // 1. Select provider based on task, cost and availability
    const provider = this.selectProvider(options)

    try {
      return await provider.chat(messages)
    } catch (error) {
      // 2. Fallback to an alternative provider
      // (in production: log the error and walk the whole fallback chain)
      const fallback = this.getFallback(provider)
      return await fallback.chat(messages)
    }
  }

  // Routing and fallback policies elided for brevity
  private selectProvider(options: ChatOptions): AIProvider { /* ... */ }
  private getFallback(failed: AIProvider): AIProvider { /* ... */ }
}

Benefits of this pattern:

  • Resilience: If OpenAI has an outage, you switch to Claude or Gemini
  • Cost optimization: Use cheap models for simple tasks
  • Flexibility: Adding a new provider means implementing an interface
  • A/B Testing: Compare responses from different models
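
The routing step can be sketched as a static table plus a health check. The task names and provider identifiers here are illustrative, not the actual Cadences configuration:

```typescript
// Illustrative routing table: map task types to an ordered list of
// providers, cheapest-adequate first. Names are examples only.
type Task = "classification" | "chat" | "code" | "analysis";

const routes: Record<Task, string[]> = {
  classification: ["local-llama", "gemini-flash"], // cheap models first
  chat: ["gemini-flash", "gpt-4o-mini"],
  code: ["claude-sonnet", "deepseek", "gpt-4o"],
  analysis: ["claude-sonnet", "gpt-4o"],
};

// Pick the first provider for a task that is currently marked healthy.
function selectProvider(task: Task, healthy: Set<string>): string {
  for (const name of routes[task]) {
    if (healthy.has(name)) return name;
  }
  throw new Error(`No healthy provider for task: ${task}`);
}
```

The ordering of each list encodes the cost preference, and the fallback behaviour falls out of it for free: when the first choice is unhealthy, the next candidate wins.
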
Providers

Providers We Use (And Why)

Google Gemini

Recommended

Excellent quality/price ratio. Gemini 2.0 Flash is our default option for most tasks.

Best for: Document analysis, multimodal (images+text), general tasks
Anthropic Claude

Premium

The best for complex reasoning and instruction following. Claude 3.5 Sonnet is impressive.

Best for: Code, complex analysis, tasks requiring precision
DeepSeek

Best Price

Chinese model with quality comparable to GPT-4 at a fraction of the price. DeepSeek-V3 is excellent.

Best for: High volume, limited budget, code tasks
Ollama + Llama 3.1

Local

Our option for local execution. Llama 3.1 8B runs on consumer hardware with good results.

Best for: Sensitive data, offline, no variable costs
In Action

Real Use Cases: How We Use AI at Cadences

🤖 1. Project Assistant with Context

The Cadences assistant is not a generic chatbot. It knows your projects, tasks, clients and data. How?

Pattern: Context Window Optimization

1. Retrieve relevant context
   • Current project (name, description, status)
   • Recent tasks (last 20)
   • Custom project fields
   • Conversation history (last 10 messages)

2. Compress context
   • Only include data the model needs
   • Summarize long tasks
   • Omit irrelevant metadata

3. Inject into system prompt
   • "You are an assistant for [project name]"
   • "The user has [N] pending tasks"
   • "You can use these functions: [list]"
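
The steps above can be sketched as a single prompt-builder function. The context shape is hypothetical, not Cadences' real schema:

```typescript
// Illustrative context compression: include only the fields the model
// needs, cap the task list, and truncate long titles.
interface ProjectContext {
  name: string;
  status: string;
  tasks: { title: string; done: boolean }[];
}

function buildSystemPrompt(ctx: ProjectContext, maxTasks = 20): string {
  const pending = ctx.tasks.filter((t) => !t.done).slice(0, maxTasks);
  const taskList = pending
    .map((t) => `- ${t.title.slice(0, 80)}`) // summarize long tasks
    .join("\n");
  return [
    `You are an assistant for the project "${ctx.name}" (status: ${ctx.status}).`,
    `The user has ${pending.length} pending tasks:`,
    taskList,
  ].join("\n");
}
```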

🏷️ 2. Automatic Ticket Classification

When an email or form arrives, we automatically classify it using a fine-tuned model:

  • Urgency: High / Medium / Low
  • Category: Support / Sales / Billing / Other
  • Sentiment: Positive / Neutral / Negative
  • Language: automatic detection

This classifier runs on the Cadences Local ML Trainer, without sending data to third parties.
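
However the classifier is hosted, its raw output should still be validated before anything downstream trusts it. A sketch using the label sets above (the parsing logic is illustrative):

```typescript
// Validate a classifier's raw output against the allowed label sets.
const LABELS = {
  urgency: ["High", "Medium", "Low"],
  category: ["Support", "Sales", "Billing", "Other"],
  sentiment: ["Positive", "Neutral", "Negative"],
} as const;

type Classification = { urgency: string; category: string; sentiment: string };

// Returns null on any failure so the caller can retry or escalate.
function parseClassification(raw: string): Classification | null {
  let obj: unknown;
  try {
    obj = JSON.parse(raw);
  } catch {
    return null; // model did not return JSON
  }
  const c = obj as Record<string, string>;
  const valid =
    LABELS.urgency.includes(c.urgency as any) &&
    LABELS.category.includes(c.category as any) &&
    LABELS.sentiment.includes(c.sentiment as any);
  return valid
    ? { urgency: c.urgency, category: c.category, sentiment: c.sentiment }
    : null;
}
```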

💬 3. Storefront Chatbot

Each Storefront can have a chatbot that knows the business's products and services:

// Chatbot configuration
{
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "context": {
    "businessName": "Restaurant23",
    "products": [...],  // Full menu
    "faqs": [...],      // Frequently asked questions
    "rules": [
      "Always respond in English",
      "If they ask about reservations, give the phone number",
      "Don't give approximate prices, use real ones"
    ]
  }
}
Lessons Learned

Most Common Mistakes (And How to Avoid Them)

❌ Mistake 1: Sending all the context

"I pass the entire database to the model so it has context"

✅ Solution: Use RAG (Retrieval Augmented Generation). Search only for relevant documents and pass them to the model.
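
A minimal sketch of the retrieval step, assuming chunk embeddings are precomputed (the provider's embed() call is omitted): rank stored chunks by cosine similarity and keep only the top k.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Chunk { text: string; embedding: number[] }

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k = 3): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

Only the winning chunks go into the prompt, instead of the entire database.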

❌ Mistake 2: Not validating outputs

"The model returns JSON, I parse it directly"

✅ Solution: Always validate with a schema (Zod, JSON Schema). Models hallucinate. Add retries with corrected prompts.
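
The retry part can be a small generic wrapper; callModel and validate are injected, so it works with any provider or schema library (a sketch, not a specific library's API):

```typescript
// Ask the model, validate the reply, and on failure re-prompt with a
// correction instruction. `callModel` and `validate` are injected.
async function askWithRetry<T>(
  callModel: (prompt: string) => Promise<string>,
  validate: (raw: string) => T | null,
  prompt: string,
  maxRetries = 2,
): Promise<T> {
  let current = prompt;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(current);
    const parsed = validate(raw);
    if (parsed !== null) return parsed;
    // Feed the failure back so the model can correct itself.
    current = `${prompt}\n\nYour previous reply was not valid JSON matching the schema. Reply with JSON only.`;
  }
  throw new Error("Model failed validation after retries");
}
```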

❌ Mistake 3: Premature fine-tuning

"I need fine-tuning so the model understands my business"

✅ Solution: Optimize the prompt first, then try few-shot learning. 95% of cases are solved without fine-tuning.
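
Few-shot here just means putting two or three worked examples in the message list before the real input; the ticket texts below are invented:

```typescript
// Few-shot prompt for urgency classification: worked examples first,
// the real input last. Often enough to replace fine-tuning.
const messages = [
  { role: "system", content: "Classify the ticket urgency as High, Medium or Low. Reply with one word." },
  { role: "user", content: "The whole site is down and we are losing sales." },
  { role: "assistant", content: "High" },
  { role: "user", content: "Could you add a dark mode some day?" },
  { role: "assistant", content: "Low" },
  // Real input goes last:
  { role: "user", content: "Invoice export fails for one client." },
];
```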

❌ Mistake 4: One model for everything

"We use GPT-4 for everything because it's the best"

✅ Solution: Use the right model for each task. Simple classification → small model. Complex reasoning → frontier model.

Measurement

Metrics and ROI: How to Measure Impact

Generative AI is easy to implement, hard to measure. These are the metrics we recommend:

⚡ Productivity Metrics

  • Time saved: Minutes per task before vs. after
  • Tasks completed: Work volume processed per day
  • Automation rate: % of tasks that don't require human intervention

✅ Quality Metrics

  • Accuracy: % of correct responses (you need ground truth)
  • Escalation rate: % of cases requiring human review
  • User satisfaction: NPS or response ratings

💰 Cost Metrics

  • Cost per request: $ spent per API call
  • Cost per completed task: Total $ / tasks processed
  • ROI: (Value generated - AI Cost) / AI Cost × 100
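
A worked example with invented numbers: if AI saves 40 hours a month valued at 30 EUR/hour and the API bill is 200 EUR/month:

```typescript
// ROI formula from above, with illustrative numbers.
function roiPercent(valueGenerated: number, aiCost: number): number {
  return ((valueGenerated - aiCost) / aiCost) * 100;
}

const valueGenerated = 40 * 30; // 1200 EUR of time saved per month
const aiCost = 200;             // EUR of API spend per month
const roi = roiPercent(valueGenerated, aiCost); // (1200 - 200) / 200 * 100 = 500
```

A 500% ROI sounds high, but only if the "time saved" estimate survives contact with the productivity metrics above.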

Want to see AI in action?

Try the Cadences assistant. It's not a generic chatbot: it knows your projects, executes actions and learns from your business.

Final Thoughts

Conclusion: AI is a Tool, Not Magic

Generative AI is transformative, but it's not magic. It requires thoughtful architecture, proper model selection, output validation and clear metrics.

The keys to a successful implementation:

  1. Start small: One use case, one model, measurable results
  2. Multi-provider: Don't depend on a single vendor
  3. Local-first when it matters: Privacy is non-negotiable for sensitive data
  4. Measure everything: If you can't measure the impact, you can't improve it
  5. Iterate fast: The landscape changes every month, stay updated

Generative AI won't replace your job. But someone who knows how to use it well probably will.
