Applied AI

Conversational assistant without platform lock-in

Chatbots you host, control, and can audit. No per-message cost, no vendor lock-in, no customer data leaving your infrastructure.

+40% exam completion

Assistant for a gastroenterology clinic (~380 patients/month), built on self-hosted Llama 3 + LangChain with answers grounded only in clinical guidelines. Result: +25% appointment attendance and +40% endoscopy exam completion, with a lighter load on the front desk.

The problem we solve

Platforms such as Intercom and Zendesk charge per conversation and keep your data. When you want to leave, the history stays behind, and the per-message cost grows with your success, penalizing exactly the growth you were after. On top of that, generic chatbots get the business context wrong: they answer confidently about what they do not know, because they have no reliable access to your knowledge base.

For regulated sectors — healthcare, finance — there is one more problem: customer data flowing through a third-party platform is a compliance risk that often makes adoption impossible.

How we build

We use an open-source stack (Llama, Mistral) or Claude through your own API account, combined with RAG over your knowledge base, in an interface you host. You keep everything: the code, the interface, the history, and the infrastructure.

RAG (Retrieval-Augmented Generation) means the chatbot answers from your documents (PDF, Notion, Confluence, a database) and cites the source. It is not model training: when you update a document, it is re-indexed and the chatbot uses the new content on the next query, with no retraining. This keeps answers anchored to your real content and substantially reduces hallucinations.
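To make the retrieve-then-cite loop concrete, here is a minimal Python sketch, not our production code. It assumes a toy word-count similarity in place of a real embedding model and an in-memory list in place of a vector store; `DocumentIndex`, `upsert`, `search` and `build_prompt` are hypothetical names for illustration. The point it shows is that updating a document only replaces its indexed chunks, so the next query sees the new text without touching any model weights.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str  # document name, shown to the user as the citation
    text: str


def vectorize(text: str) -> Counter:
    # Toy word-count vector; a real system would call an embedding model here.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class DocumentIndex:
    """In-memory stand-in for a vector store (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.chunks: list[Chunk] = []

    def upsert(self, source: str, text: str) -> None:
        # Re-indexing a document replaces its old chunks; no model weights change.
        self.chunks = [c for c in self.chunks if c.source != source]
        for paragraph in text.split("\n\n"):
            if paragraph.strip():
                self.chunks.append(Chunk(source, paragraph.strip()))

    def search(self, query: str, k: int = 2) -> list[tuple[float, Chunk]]:
        # Score every chunk against the query and keep the k most similar.
        q = vectorize(query)
        scored = [(cosine(q, vectorize(c.text)), c) for c in self.chunks]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:k]


def build_prompt(index: DocumentIndex, question: str, k: int = 2) -> str:
    # Retrieved chunks, tagged with their source, become the model's only context.
    context = "\n".join(f"[{c.source}] {c.text}" for _, c in index.search(question, k))
    return (
        "Answer using only the context below and cite the [source] you used.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    index = DocumentIndex()
    index.upsert("hours.md", "The clinic opens at 8am on weekdays.")
    index.upsert("parking.md", "Free parking is available behind the building.")
    index.upsert("hours.md", "The clinic opens at 7am on weekdays.")  # document edited
    print(build_prompt(index, "When does the clinic open?"))
```

The prompt printed after the edit contains the 7am text and its `[hours.md]` citation; the resulting string would then be sent to whichever model you run (Llama, Mistral, or Claude).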

When privacy is non-negotiable, as it was for the gastroenterology clinic we served (around 380 patients/month), we use self-hosted models, and the data never leaves the client's infrastructure. There, the assistant runs on Llama 3 + LangChain with answers restricted to the clinical guidelines: when a question falls outside its sources, it declines to answer rather than make something up. The assistant was delivered via app and WhatsApp, alongside self-service kiosks that lightened the load on the front desk, and the result was +25% appointment attendance and +40% endoscopy exam completion. A small GPU is usually enough; we help you size it.
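One common way to get the "decline rather than make something up" behavior is a relevance gate in front of the model: if no source passage is similar enough to the question, the system returns a fixed refusal and never asks the model to improvise. The Python sketch below illustrates that idea under stated assumptions: the word-count similarity stands in for real embeddings, the `0.3` threshold is an arbitrary value that would need tuning, and `route` and `REFUSAL` are hypothetical names, not the clinic deployment's code.

```python
import math
import re
from collections import Counter

# Hypothetical refusal text; a real deployment would word this with the client.
REFUSAL = "I can't answer that from the guidelines I have. Please contact the front desk."


def vectorize(text: str) -> Counter:
    # Toy word-count vector standing in for an embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def route(question: str, passages: dict[str, str], min_score: float = 0.3) -> str:
    """Return a grounded prompt for the model, or a fixed refusal if nothing relevant is found."""
    q = vectorize(question)
    scored = [(cosine(q, vectorize(text)), source, text) for source, text in passages.items()]
    if not scored:
        return REFUSAL
    score, source, text = max(scored)
    if score < min_score:
        # Out of scope: refuse instead of letting the model improvise an answer.
        return REFUSAL
    return (
        "Answer only from this passage and cite it.\n"
        f"[{source}] {text}\nQuestion: {question}"
    )


if __name__ == "__main__":
    guidelines = {"hours.md": "The clinic opens at 8am on weekdays."}
    print(route("What time does the clinic open on weekdays?", guidelines))
    print(route("Can you recommend a dose of ibuprofen?", guidelines))
```

The gate is deliberately conservative: a wrongly refused question costs a trip to the front desk, while an invented answer in a clinical setting costs far more. In practice it is usually paired with a system prompt that also tells the model to refuse when the context does not cover the question.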

What you get

A conversational assistant in production, integrated with your channel (WhatsApp, app, web), answering over your knowledge base with source citations. No per-message cost, no lock-in, with the history and control in your hands.

Paying for a chat platform and want out?

Tell us about your situation, and we will work out together whether self-hosting makes sense in your case.

  • RAG over PDF, Notion, Confluence, or a database, with source citations
  • Open source stack (Llama, Mistral) or Claude via your own API
  • Interface and data hosted on your infrastructure
  • Integration with WhatsApp, an app, or an existing backend
  • No per-message cost and no history retained by third parties
  • Content updates without retraining the model

Investment ranges

Micro Project

Proof of concept, corporate website, WhatsApp bot, or small chatbot. Non-regulated sector, or your first AI project.

$8,000 – $20,000

  • Delivery in weeks
  • RAG + light harness

Small Project

Well-defined scope: targeted automation, lean MVP, focused integration.

$20,000 – $70,000

  • Fixed scope
  • Delivery in weeks

Medium Project

RAG chatbot, enterprise AI agent, SaaS MVP, performance execution.

$70,000 – $220,000

  • Dedicated architecture
  • Integrations

Large Project

Legacy modernization, system rewrite, multi-phase transformation.

From $230,000

  • Multiple phases
  • Dedicated team

Indicative ranges. The exact figure is defined during Discovery, whose cost is 100% credited toward the project.

FAQ

Can I use it with my internal documents?

Yes. RAG (Retrieval-Augmented Generation) works over PDF, Notion, Confluence, or a database: the chatbot answers from your content, with source citations.

Do I need a dedicated server?

For self-hosted models (Llama, Mistral), yes: you need a server with a GPU, though a small one is usually enough. For Claude or GPT via API, any server works. We help you size the infrastructure.
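To give a sense of what "small" means, here is a back-of-the-envelope sizing sketch in Python. It assumes GPU memory is dominated by the model weights plus a flat 20% overhead for the KV cache, activations and runtime; the function name and the overhead figure are illustrative assumptions, and real usage depends on context length, concurrency and the serving runtime.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: int, overhead: float = 0.2) -> float:
    """Rough GPU memory estimate for serving a model (a rule of thumb, not a guarantee)."""
    # 1 billion parameters at 1 byte each is about 1 GB (decimal), so weights = params * bytes per weight.
    weights_gb = params_billion * bits_per_weight / 8
    # Assumed 20% extra for KV cache, activations and runtime; this grows with context length and concurrency.
    return round(weights_gb * (1 + overhead), 1)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"8B model at {bits}-bit: ~{estimate_vram_gb(8, bits)} GB")
```

By this rough rule, an 8-billion-parameter model such as Llama 3 8B needs about 19 GB at 16-bit precision and under 5 GB when quantized to 4 bits, which is why a single mid-range GPU is often sufficient. Sizing for a real deployment should be confirmed with a load test.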

Is it hard to train it on our content?

There is no training; it is RAG. You update the documents and the chatbot uses the new content on the next query, with no model retraining.

How does the cost compare to a SaaS platform?

The investment is a project, not a per-message subscription. Once the assistant is delivered, the operating cost is your own server. We estimate an indicative range during Discovery, and the Discovery cost is credited if the project moves forward.

Have a project like this?

Get an Estimate