
Generative AI at Work: From Theory to Real Practice

A practical guide to implementing generative AI in your company: real cases, proven architectures, and how to avoid the most common mistakes.

Cadences Team

Generative AI has gone from science fiction to a working tool in record time. But between the hype and the reality there's a gap. This article is not a theoretical introduction: it's a practical guide for those who already understand the potential and want to implement it without getting burned along the way.

🎯 This article is for you if:

  • You've tried ChatGPT but don't know how to integrate it into workflows
  • You're concerned about data privacy when using third-party APIs
  • You want to understand which model to use for which task
  • You need to justify the investment with measurable ROI
Current Landscape

The Real State of Generative AI in 2026

Let's forget the hype. These are the facts you need to know:

  • 14+ viable LLM providers for production (GPT-4, Claude, Gemini, DeepSeek, Llama, Mistral...)
  • 90% of companies have tried generative AI, but only 20% use it systematically
  • 3-5x productivity improvement in content generation and coding tasks
  • 7B-70B is the parameter range of models you can run locally on consumer hardware
Architectures

The 3 Implementation Models

When we talk about "implementing AI", there are three fundamentally different architectures. Each has its use cases:

☁️ 1. Cloud API (The Most Common)

You call an API (OpenAI, Anthropic, Google) and receive responses. Simple, fast, but with implications:

| Aspect  | Advantage                            | Disadvantage               |
| ------- | ------------------------------------ | -------------------------- |
| Quality | Frontier models (GPT-4o, Claude 3.5) | Vendor dependency          |
| Cost    | Pay-per-use, no infrastructure       | Can scale up quickly       |
| Privacy | (none)                               | Data sent to third parties |
| Latency | Good (100-500 ms)                    | Network dependent          |

When to use it: Rapid prototyping, non-sensitive tasks, when you need the best available model.
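
For the cloud route, here is a minimal TypeScript sketch of an OpenAI-compatible chat call. The endpoint, model name and environment variable are illustrative; any OpenAI-compatible provider works the same way:

```typescript
// Minimal OpenAI-compatible chat call. Endpoint, model and env var
// are illustrative; swap in your provider's values.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };

// Pure helper: assembles the JSON body every OpenAI-compatible API expects.
function buildChatRequest(model: string, messages: ChatMessage[]) {
  return { model, messages, temperature: 0.7 };
}

async function chat(messages: ChatMessage[]): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
    },
    body: JSON.stringify(buildChatRequest("gpt-4o-mini", messages)),
  });
  if (!res.ok) throw new Error(`API error: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

The same request shape works against a local Ollama endpoint, which is part of what makes provider switching cheap.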

🖥️ 2. Local LLM (Self-Hosted)

You run the model on your own infrastructure using tools like Ollama, LM Studio or vLLM.

# Run Llama 3.1 8B locally with Ollama
ollama run llama3.1:8b

# Or with LM Studio (graphical interface)
# Download the GGUF model → Load → Ready

# OpenAI-compatible API
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.1:8b", "messages": [{"role": "user", "content": "Hello"}]}'

| Model Size | VRAM Required | Recommended Hardware |
| ---------- | ------------- | -------------------- |
| 7B-8B      | 8 GB          | RTX 3070 or higher   |
| 13B        | 12 GB         | RTX 4070 Ti          |
| 70B        | 48+ GB        | 2x RTX 4090 or A100  |

When to use it: Sensitive data, compliance (GDPR, HIPAA), high request volume, no variable costs.

🎯 3. Custom Models (Fine-Tuning)

You train or fine-tune a model with your own data. The holy grail for specific use cases.

⚠️ Important: Fine-tuning is not magic. You need quality data (minimum 1000+ examples) and a clear use case. If you can solve the problem with prompting, you probably don't need fine-tuning.

| Technique      | Description                | Use            |
| -------------- | -------------------------- | -------------- |
| LoRA           | Efficient parameter tuning | ⭐ Most used    |
| QLoRA          | LoRA with quantization     | Limited memory |
| Full Fine-Tune | Full model tuning          | Rarely needed  |
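
If you do fine-tune, the training data is typically chat-format JSONL, one example per line. A hypothetical ticket-classification example (the field names follow the common OpenAI-style convention; the ticket text is invented):

```json
{"messages": [{"role": "system", "content": "Classify the ticket: Support, Sales, Billing or Other."}, {"role": "user", "content": "My last invoice has the wrong VAT number."}, {"role": "assistant", "content": "Billing"}]}
```

You need 1000+ lines like this, covering the edge cases, before fine-tuning pays off.
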
Strategy

The Decision Framework: Which Model For What?

After implementing AI in dozens of projects, this is the framework we use:

// Simplified decision tree

Is the data sensitive?
├─ YES → Do you need frontier-quality responses?
│       ├─ YES → Claude/GPT-4 with Enterprise Agreement
│       └─ NO → Local LLM (Llama 3.1, Mistral)
│
└─ NO → Is the volume high (>10K req/day)?
        ├─ YES → Limited budget?
        │       ├─ YES → Local LLM or DeepSeek
        │       └─ NO → GPT-4o or Claude 3.5
        │
        └─ NO → Is the task very specific?
                ├─ YES → Fine-tuning a small model
                └─ NO → Cloud API (GPT-4o-mini, Claude Haiku)
Implementation

Practical Architecture: The Multi-Provider Pattern

At Cadences we implement a pattern we call "AI Service with fallback". The idea is simple:

  1. Define an abstract interface for AI operations
  2. Implement multiple providers (14 in our case)
  3. Route based on task type, cost and availability
  4. If a provider fails, automatically switch to another
// Simplified AI Service example
type Message = { role: "system" | "user" | "assistant"; content: string }
type ChatOptions = { task?: string; maxCost?: number }

interface AIProvider {
  chat(messages: Message[]): Promise<string>
  embed(text: string): Promise<number[]>
}

class AIService {
  private providers: Map<string, AIProvider>

  async chat(messages: Message[], options: ChatOptions) {
    // 1. Select provider based on task, cost and availability
    const provider = this.selectProvider(options)

    try {
      return await provider.chat(messages)
    } catch (error) {
      // 2. Fallback to an alternative provider
      // (in production: log the error and walk the whole fallback chain)
      const fallback = this.getFallback(provider)
      return await fallback.chat(messages)
    }
  }

  // Routing and fallback policies elided for brevity
  private selectProvider(options: ChatOptions): AIProvider { /* ... */ }
  private getFallback(failed: AIProvider): AIProvider { /* ... */ }
}

Benefits of this pattern:

  • Resilience: If OpenAI has an outage, you switch to Claude or Gemini
  • Cost optimization: Use cheap models for simple tasks
  • Flexibility: Adding a new provider means implementing an interface
  • A/B Testing: Compare responses from different models
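
The routing step can be sketched as a static table plus a health check. The task names and provider identifiers here are illustrative, not the actual Cadences configuration:

```typescript
// Illustrative routing table: map task types to an ordered list of
// providers, cheapest-adequate first. Names are examples only.
type Task = "classification" | "chat" | "code" | "analysis";

const routes: Record<Task, string[]> = {
  classification: ["local-llama", "gemini-flash"], // cheap models first
  chat: ["gemini-flash", "gpt-4o-mini"],
  code: ["claude-sonnet", "deepseek", "gpt-4o"],
  analysis: ["claude-sonnet", "gpt-4o"],
};

// Pick the first provider for a task that is currently marked healthy.
function selectProvider(task: Task, healthy: Set<string>): string {
  for (const name of routes[task]) {
    if (healthy.has(name)) return name;
  }
  throw new Error(`No healthy provider for task: ${task}`);
}
```

The ordering of each list encodes the cost preference, and the fallback behaviour falls out of it for free: when the first choice is unhealthy, the next candidate wins.
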
Providers

Providers We Use (And Why)

Google Gemini

Recommended

Excellent quality/price ratio. Gemini 2.0 Flash is our default option for most tasks.

Best for: Document analysis, multimodal (images+text), general tasks
Anthropic Claude

Premium

The best for complex reasoning and instruction following. Claude 3.5 Sonnet is impressive.

Best for: Code, complex analysis, tasks requiring precision
DeepSeek

Best Price

Chinese model with quality comparable to GPT-4 at a fraction of the price. DeepSeek-V3 is excellent.

Best for: High volume, limited budget, code tasks
Ollama + Llama 3.1

Local

Our option for local execution. Llama 3.1 8B runs on consumer hardware with good results.

Best for: Sensitive data, offline, no variable costs
In Action

Real Use Cases: How We Use AI at Cadences

🤖 1. Project Assistant with Context

The Cadences assistant is not a generic chatbot. It knows your projects, tasks, clients and data. How?

Pattern: Context Window Optimization

1. Retrieve relevant context
   • Current project (name, description, status)
   • Recent tasks (last 20)
   • Custom project fields
   • Conversation history (last 10 messages)

2. Compress context
   • Only include data the model needs
   • Summarize long tasks
   • Omit irrelevant metadata

3. Inject into system prompt
   • "You are an assistant for [project name]"
   • "The user has [N] pending tasks"
   • "You can use these functions: [list]"
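
The steps above can be sketched as a single prompt-builder function. The context shape is hypothetical, not Cadences' real schema:

```typescript
// Illustrative context compression: include only the fields the model
// needs, cap the task list, and truncate long titles.
interface ProjectContext {
  name: string;
  status: string;
  tasks: { title: string; done: boolean }[];
}

function buildSystemPrompt(ctx: ProjectContext, maxTasks = 20): string {
  const pending = ctx.tasks.filter((t) => !t.done).slice(0, maxTasks);
  const taskList = pending
    .map((t) => `- ${t.title.slice(0, 80)}`) // summarize long tasks
    .join("\n");
  return [
    `You are an assistant for the project "${ctx.name}" (status: ${ctx.status}).`,
    `The user has ${pending.length} pending tasks:`,
    taskList,
  ].join("\n");
}
```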

🏷️ 2. Automatic Ticket Classification

When an email or form arrives, we automatically classify it using a fine-tuned model:

  • Urgency: High / Medium / Low
  • Category: Support / Sales / Billing / Other
  • Sentiment: Positive / Neutral / Negative
  • Language: automatic detection

This classifier runs on the Cadences Local ML Trainer, without sending data to third parties.
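
However the classifier is hosted, its raw output should still be validated before anything downstream trusts it. A sketch using the label sets above (the parsing logic is illustrative):

```typescript
// Validate a classifier's raw output against the allowed label sets.
const LABELS = {
  urgency: ["High", "Medium", "Low"],
  category: ["Support", "Sales", "Billing", "Other"],
  sentiment: ["Positive", "Neutral", "Negative"],
} as const;

type Classification = { urgency: string; category: string; sentiment: string };

// Returns null on any failure so the caller can retry or escalate.
function parseClassification(raw: string): Classification | null {
  let obj: unknown;
  try {
    obj = JSON.parse(raw);
  } catch {
    return null; // model did not return JSON
  }
  const c = obj as Record<string, string>;
  const valid =
    LABELS.urgency.includes(c.urgency as any) &&
    LABELS.category.includes(c.category as any) &&
    LABELS.sentiment.includes(c.sentiment as any);
  return valid
    ? { urgency: c.urgency, category: c.category, sentiment: c.sentiment }
    : null;
}
```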

💬 3. Storefront Chatbot

Each Storefront can have a chatbot that knows the business's products and services:

// Chatbot configuration
{
  "provider": "gemini",
  "model": "gemini-2.0-flash-exp",
  "context": {
    "businessName": "Restaurant23",
    "products": [...],  // Full menu
    "faqs": [...],      // Frequently asked questions
    "rules": [
      "Always respond in English",
      "If they ask about reservations, give the phone number",
      "Don't give approximate prices, use real ones"
    ]
  }
}
Lessons Learned

Most Common Mistakes (And How to Avoid Them)

❌ Mistake 1: Sending all the context

"I pass the entire database to the model so it has context"

✅ Solution: Use RAG (Retrieval Augmented Generation). Search only for relevant documents and pass them to the model.
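
A minimal sketch of the retrieval step, assuming chunk embeddings are precomputed (the provider's embed() call is omitted): rank stored chunks by cosine similarity and keep only the top k.

```typescript
// Cosine similarity between two embedding vectors of equal length.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

interface Chunk { text: string; embedding: number[] }

// Return the k chunks most similar to the query embedding.
function topK(query: number[], chunks: Chunk[], k = 3): Chunk[] {
  return [...chunks]
    .sort((x, y) => cosine(query, y.embedding) - cosine(query, x.embedding))
    .slice(0, k);
}
```

Only the winning chunks go into the prompt, instead of the entire database.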

❌ Mistake 2: Not validating outputs

"The model returns JSON, I parse it directly"

✅ Solution: Always validate with a schema (Zod, JSON Schema). Models hallucinate. Add retries with corrected prompts.
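
The retry part can be a small generic wrapper; callModel and validate are injected, so it works with any provider or schema library (a sketch, not a specific library's API):

```typescript
// Ask the model, validate the reply, and on failure re-prompt with a
// correction instruction. `callModel` and `validate` are injected.
async function askWithRetry<T>(
  callModel: (prompt: string) => Promise<string>,
  validate: (raw: string) => T | null,
  prompt: string,
  maxRetries = 2,
): Promise<T> {
  let current = prompt;
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const raw = await callModel(current);
    const parsed = validate(raw);
    if (parsed !== null) return parsed;
    // Feed the failure back so the model can correct itself.
    current = `${prompt}\n\nYour previous reply was not valid JSON matching the schema. Reply with JSON only.`;
  }
  throw new Error("Model failed validation after retries");
}
```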

❌ Mistake 3: Premature fine-tuning

"I need fine-tuning so the model understands my business"

✅ Solution: Optimize the prompt first, then try few-shot learning. 95% of cases are solved without fine-tuning.
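
Few-shot here just means putting two or three worked examples in the message list before the real input; the ticket texts below are invented:

```typescript
// Few-shot prompt for urgency classification: worked examples first,
// the real input last. Often enough to replace fine-tuning.
const messages = [
  { role: "system", content: "Classify the ticket urgency as High, Medium or Low. Reply with one word." },
  { role: "user", content: "The whole site is down and we are losing sales." },
  { role: "assistant", content: "High" },
  { role: "user", content: "Could you add a dark mode some day?" },
  { role: "assistant", content: "Low" },
  // Real input goes last:
  { role: "user", content: "Invoice export fails for one client." },
];
```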

❌ Mistake 4: One model for everything

"We use GPT-4 for everything because it's the best"

✅ Solution: Use the right model for each task. Simple classification → small model. Complex reasoning → frontier model.

Measurement

Metrics and ROI: How to Measure Impact

Generative AI is easy to implement, hard to measure. These are the metrics we recommend:

⚡ Productivity Metrics

  • Time saved: Minutes per task before vs. after
  • Tasks completed: Work volume processed per day
  • Automation rate: % of tasks that don't require human intervention

✅ Quality Metrics

  • Accuracy: % of correct responses (you need ground truth)
  • Escalation rate: % of cases requiring human review
  • User satisfaction: NPS or response ratings

💰 Cost Metrics

  • Cost per request: $ spent per API call
  • Cost per completed task: Total $ / tasks processed
  • ROI: (Value generated - AI Cost) / AI Cost × 100
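
A worked example with invented numbers: if AI saves 40 hours a month valued at 30 EUR/hour and the API bill is 200 EUR/month:

```typescript
// ROI formula from above, with illustrative numbers.
function roiPercent(valueGenerated: number, aiCost: number): number {
  return ((valueGenerated - aiCost) / aiCost) * 100;
}

const valueGenerated = 40 * 30; // 1200 EUR of time saved per month
const aiCost = 200;             // EUR of API spend per month
const roi = roiPercent(valueGenerated, aiCost); // (1200 - 200) / 200 * 100 = 500
```

A 500% ROI sounds high, but only if the "time saved" estimate survives contact with the productivity metrics above.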

Want to see AI in action?

Try the Cadences assistant. It's not a generic chatbot: it knows your projects, executes actions and learns from your business.

Final Thoughts

Conclusion: AI is a Tool, Not Magic

Generative AI is transformative, but it's not magic. It requires thoughtful architecture, proper model selection, output validation and clear metrics.

The keys to a successful implementation:

  1. Start small: One use case, one model, measurable results
  2. Multi-provider: Don't depend on a single vendor
  3. Local-first when it matters: Privacy is non-negotiable for sensitive data
  4. Measure everything: If you can't measure the impact, you can't improve it
  5. Iterate fast: The landscape changes every month, stay updated

Generative AI won't replace your job. But someone who knows how to use it well probably will.
