Service

AI + ML

We integrate AI into business software — RAG (retrieval-augmented generation) over your data, AI agents that take real actions, structured-output extraction from PDFs / forms / emails, classification + scoring pipelines. Our internal use cases include the Merot Finance AI bookkeeping assistant, lead scoring in Merot Leads, and invoice OCR.

Concrete outcomes

  • RAG over your knowledge base — internal docs / Notion / Confluence / Slack archives → searchable AI assistant (see the retrieval sketch after this list).
  • Structured extraction from documents — invoices, contracts, forms, expense reports. JSON output you can store in your DB.
  • AI agents that take actions — book meetings, draft emails, run database queries, post to Slack.
  • Classification + scoring — lead scoring, fraud detection, sentiment, content moderation.
  • Embeddings + search — semantic search over your product catalog, support tickets, code repository.
  • On-premises model serving — when data privacy means cloud APIs are off-limits. Llama 3, Mixtral, fine-tuned smaller models.
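
What the RAG and embeddings bullets look like in code: a minimal retrieval sketch, assuming OpenAI embeddings and an in-memory store. In production the vectors would live in pgvector or Qdrant (listed below); the document snippets here are invented.

```python
# Minimal RAG retrieval: embed documents once, embed the query,
# return the top-k most similar chunks as context for the LLM call.
import numpy as np
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

docs = [
    "Refunds are processed within 5 business days.",
    "Enterprise plans include SSO and audit logs.",
]
doc_vecs = embed(docs)  # in production: stored in pgvector / Qdrant, not in memory

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed([query])[0]
    # cosine similarity between the query and every stored chunk
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]

print(retrieve("how long do refunds take?"))
```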

What we work with

We choose what fits your team — no forced preferences.

API providers

OpenAI · Anthropic Claude · Google Gemini · Mistral AI · Cohere

Frameworks

LangChain (sometimes) · LlamaIndex · Vercel AI SDK · Anthropic SDK · OpenAI SDK

Vector databases

pgvector (Postgres) · Pinecone · Qdrant · Weaviate · Chroma

Self-hosted models

Llama 3 (8B-70B) · Mistral 7B / Mixtral 8x7B · Whisper (speech-to-text) · Stable Diffusion (image)

Inference infra

AWS Bedrock · Replicate · Together AI · self-hosted via vLLM / TGI
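
On the self-hosted path, vLLM exposes an OpenAI-compatible endpoint, so application code barely changes when moving off a hosted API. A minimal sketch, assuming a locally started server (e.g. `vllm serve meta-llama/Meta-Llama-3-8B-Instruct`) on the default port 8000; vLLM ignores the API key unless one is configured, but the client library requires a placeholder.

```python
# Query a self-hosted Llama 3 through vLLM's OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-placeholder")

resp = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Summarize our data-residency policy."}],
    max_tokens=200,
)
print(resp.choices[0].message.content)
```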

Eval + observability

LangSmith · Helicone · OpenAI usage dashboards · custom eval harnesses

How we work

01

Discovery (1 week)

Define the use case crisply: what input → what output. Write evals up front so we know what 'working' means.
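
A sketch of what "write evals up front" can look like. Both test cases and the regex stand-in for run_model are invented; the prototype from step 02 replaces the stub.

```python
# Tiny eval harness: a labelled test set plus an exact-match scorer.
import re

cases = [
    {"input": "Invoice #1042, total EUR 250.00", "expected": "250.00"},
    {"input": "Rechnung 77, Summe 1.100,50 EUR", "expected": "1100.50"},
]

def run_model(text: str) -> str:
    # placeholder baseline: grab the last number; the real LLM call goes here
    nums = re.findall(r"[\d.,]+", text)
    return nums[-1] if nums else ""

def evaluate() -> float:
    hits = sum(run_model(c["input"]).strip() == c["expected"] for c in cases)
    print(f"{hits}/{len(cases)} correct")
    return hits / len(cases)

evaluate()  # rerun on every prompt or model change
```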

02

Prototype (1-2 weeks)

Smallest possible LLM call that produces the desired output. Measure on the eval set. Decide: API or self-hosted, which model, what prompt.
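
A minimal sketch of such a prototype call, using the Anthropic SDK; model name and prompt are illustrative.

```python
# Smallest possible LLM call: one prompt in, one structured answer out.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=256,
    messages=[{
        "role": "user",
        "content": 'Return only JSON {"total": <number>} for: '
                   "Invoice #1042, total EUR 250.00",
    }],
)
print(resp.content[0].text)  # score this output against the step-01 eval set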

03

Production (3-6 weeks)

Wrap with retries, fallbacks, observability, cost controls (token budgets, rate limits). Wire into your product.
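
A hedged sketch of that wrapper shape: retries with exponential backoff, a fallback model, and a hard token budget. Model names, limits, and error handling are illustrative, not production code.

```python
# Retries, model fallback, and a hard token budget around every LLM call.
import time
import anthropic

client = anthropic.Anthropic()
MODELS = ["claude-3-5-sonnet-latest", "claude-3-5-haiku-latest"]  # primary, fallback
DAILY_TOKEN_BUDGET = 2_000_000
tokens_spent = 0

def complete(prompt: str, retries: int = 3) -> str:
    global tokens_spent
    if tokens_spent >= DAILY_TOKEN_BUDGET:
        raise RuntimeError("token budget exhausted: alert and degrade gracefully")
    for model in MODELS:
        for attempt in range(retries):
            try:
                resp = client.messages.create(
                    model=model, max_tokens=512,
                    messages=[{"role": "user", "content": prompt}],
                )
                tokens_spent += resp.usage.input_tokens + resp.usage.output_tokens
                return resp.content[0].text
            except anthropic.APIError:
                time.sleep(2 ** attempt)  # backoff, then fall through to fallback
    raise RuntimeError("all models failed")
```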

04

Iterate

AI features need ongoing eval as models change. Monthly retainer or scheduled quarterly tune-up.

From our own products

Merot Finance AI assistant

Anthropic Claude integrated for bank-statement matching, journal-entry suggestions, and month-close review.

Merot Leads scoring

Claude scores product fit on enriched lead data and drafts outreach. Custom prompts + structured-output extraction.

Invoice OCR pipeline

Multi-stage extraction: OCR → LLM structured output → human-review queue for low-confidence items.
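
A simplified sketch of the routing step, assuming OCR already produced text. The confidence value here is the model's own estimate, a simplification of what production would use; save_to_db and enqueue_for_review are hypothetical stubs.

```python
# OCR text -> LLM structured output -> human review for low confidence.
import json
import anthropic

client = anthropic.Anthropic()
REVIEW_THRESHOLD = 0.9

def save_to_db(fields: dict) -> None:          # hypothetical persistence stub
    print("saved:", fields)

def enqueue_for_review(fields: dict) -> None:  # hypothetical review-queue stub
    print("needs human review:", fields)

def process(ocr_text: str) -> None:
    resp = client.messages.create(
        model="claude-3-5-sonnet-latest", max_tokens=512,
        messages=[{"role": "user", "content":
            'Return only JSON {"vendor": str, "total": number, "confidence": 0-1} '
            "for this OCR'd invoice:\n" + ocr_text}],
    )
    fields = json.loads(resp.content[0].text)  # sketch: assumes clean JSON back
    if fields["confidence"] >= REVIEW_THRESHOLD:
        save_to_db(fields)
    else:
        enqueue_for_review(fields)
```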

Engagement model

AI prototype: $10-30K (1-2 weeks, working demo + eval set + decision memo on 'should we ship'). Production AI feature: $30-100K depending on scope. Embedded AI engineer: monthly retainer. We're upfront about model costs — API token spend is billed separately on top of the engineering fee.

Frequently asked questions — AI + ML

Should I use OpenAI, Anthropic, or self-host?

Default: start with Anthropic (Claude 3.5 Sonnet / Claude 4) or OpenAI (GPT-4 family) for the prototype. Switch to self-hosted only when (a) data residency requires it, or (b) the per-call cost exceeds the engineering + infra cost of running it yourself. Most clients stay on the API providers for years.

Will my data train someone's model?

Not on the enterprise tiers of OpenAI / Anthropic / Google — they have explicit no-training-on-customer-data terms. We turn on those settings during onboarding.

What if the AI hallucinates / produces wrong output?

Two layers: (1) Eval harness — we measure correctness on a labelled test set before shipping and again on every prompt change. (2) Production — high-confidence outputs go straight through; low-confidence outputs go to a human-review queue.

Cost — won't this get expensive?

Common worry, often overblown. Token costs have dropped 90%+ in 18 months. Most production features cost <$1K/month in API spend at meaningful traffic. We set hard token budgets + alerts to catch runaway calls.

Do you do fine-tuning?

Sometimes — usually only when the prompt approach truly can't get there. Fine-tuning has a higher up-front cost (curating training data) and requires re-tuning at every model upgrade. Typically we recommend better prompts + RAG first.

Privacy / on-premises only — can you do that?

Yes. We've deployed Llama 3 70B and Mixtral 8x22B on-premises (single-GPU H100 or 4xA100 setups) for clients in regulated industries. Higher up-front cost, lower per-call cost, full data residency.

AI agents — are these real yet?

Cautiously yes. Single-purpose agents (book a meeting, draft an email, run a SQL query) work well with proper guardrails. Generic 'do anything' agents are still flaky. We scope to single-purpose by default.
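
What "single-purpose with guardrails" can look like via Anthropic's tool-use API: one allowlisted tool with a strict schema, where application code, not the model, decides whether to execute. Tool name and schema are invented for illustration.

```python
# Single-purpose agent: the model may only request one allowlisted action.
import anthropic

client = anthropic.Anthropic()

tools = [{
    "name": "book_meeting",
    "description": "Book a 30-minute meeting on the shared calendar.",
    "input_schema": {
        "type": "object",
        "properties": {"email": {"type": "string"}, "iso_time": {"type": "string"}},
        "required": ["email", "iso_time"],
    },
}]

resp = client.messages.create(
    model="claude-3-5-sonnet-latest", max_tokens=512, tools=tools,
    messages=[{"role": "user",
               "content": "Book a call with ana@example.com tomorrow at 10:00."}],
)

for block in resp.content:
    if block.type == "tool_use" and block.name == "book_meeting":
        # guardrail: validate block.input here before actually booking
        print("would book:", block.input)
```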

Voice / speech?

Whisper for speech-to-text, ElevenLabs / OpenAI TTS for synthesis. We've built call-summary + voice-note transcription features for clients in the legal + healthcare verticals.
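
For reference, hosted Whisper transcription is a few lines through the OpenAI API; the file name is a placeholder.

```python
# Voice-note transcription via the hosted Whisper API.
from openai import OpenAI

client = OpenAI()

with open("voice_note.m4a", "rb") as audio:  # placeholder file
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

print(transcript.text)  # feed into a summary prompt from here
```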

Scope an AI + ML project

Free 60-minute discovery call. 6-page plan within 48 hours.