AI + ML
We integrate AI into business software — RAG (retrieval-augmented generation) over your data, AI agents that take real actions, structured-output extraction from PDFs / forms / emails, classification + scoring pipelines. Our internal use cases include the Merot Finance AI bookkeeping assistant, lead scoring in Merot Leads, and invoice OCR.
Concrete results
- RAG over your knowledge base — internal docs / Notion / Confluence / Slack archives → searchable AI assistant.
- Structured extraction from documents — invoices, contracts, forms, expense reports. JSON output you can store in your DB.
- AI agents that take actions — book meetings, draft emails, run database queries, post to Slack.
- Classification + scoring — lead scoring, fraud detection, sentiment, content moderation.
- Embeddings + search — semantic search over your product catalog, support tickets, code repository.
- On-premises model serving — when data privacy means cloud APIs are off-limits. Llama 3, Mixtral, fine-tuned smaller models.
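As a sketch of the "embeddings + search" item above: a minimal top-k cosine-similarity lookup. The three-dimensional toy vectors are placeholders standing in for real embedding-model output; in production the vectors would come from an embedding API and live in a vector store such as pgvector.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_k(query_vec, docs, k=2):
    """docs: list of (doc_id, embedding) pairs; returns the k best matches."""
    scored = [(doc_id, cosine(query_vec, vec)) for doc_id, vec in docs]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy corpus: ids and made-up embeddings.
docs = [
    ("refund-policy", [0.9, 0.1, 0.0]),
    ("api-auth",      [0.1, 0.9, 0.2]),
    ("onboarding",    [0.2, 0.2, 0.9]),
]
best = top_k([0.85, 0.15, 0.05], docs, k=1)  # query close to "refund-policy"
```

The same shape scales up: swap the list for a vector-indexed table and the toy vectors for embedding-API calls.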
What we work with
We pick what fits your team; no forced preferences.
API providers
OpenAI · Anthropic Claude · Google Gemini · Mistral AI · Cohere
Frameworks
LangChain (sometimes) · LlamaIndex · Vercel AI SDK · Anthropic SDK · OpenAI SDK
Vector databases
pgvector (Postgres) · Pinecone · Qdrant · Weaviate · Chroma
Self-hosted models
Llama 3 (8B-70B) · Mistral 7B / Mixtral 8x7B · Whisper (speech-to-text) · Stable Diffusion (image)
Inference infra
AWS Bedrock · Replicate · Together AI · self-hosted via vLLM / TGI
Eval + observability
LangSmith · Helicone · OpenAI usage dashboards · custom eval harnesses
How we work
Discovery (1 week)
Define the use case crisply: what input → what output. Write evals up front so we know what 'working' means.
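The evals-up-front idea can be sketched as a tiny harness: a labelled test set, a predict function, and a pass threshold agreed on before any prompt work starts. Here `predict` is a hypothetical keyword stub standing in for the eventual LLM call.

```python
def run_evals(predict, cases, threshold=0.9):
    """cases: list of (input, expected) pairs. Returns (accuracy, passed)."""
    hits = sum(1 for inp, expected in cases if predict(inp) == expected)
    accuracy = hits / len(cases)
    return accuracy, accuracy >= threshold

# Hypothetical stub classifier; the real version would be a model call.
def predict(text):
    return "invoice" if "invoice" in text.lower() else "other"

cases = [
    ("Invoice #42 attached", "invoice"),
    ("Lunch next week?", "other"),
    ("Please pay invoice 7", "invoice"),
    ("Meeting notes", "other"),
]
accuracy, passed = run_evals(predict, cases)
```

The harness stays; only `predict` changes as prompts and models change, which is what makes "working" measurable.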
Prototype (1-2 weeks)
Smallest possible LLM call that produces the desired output. Measure on the eval set. Decide: API or self-hosted, which model, what prompt.
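A minimal shape for that smallest call, assuming a structured-extraction use case: a prompt, a JSON reply, and strict validation of the fields the product needs. `fake_model` is a stub standing in for the actual API call during prototyping; the field names are illustrative.

```python
import json

REQUIRED = {"vendor", "total", "currency"}  # illustrative schema

def parse_structured(raw):
    """Parse a model's JSON reply and reject it if required fields are missing."""
    data = json.loads(raw)
    missing = REQUIRED - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data

# Stub standing in for the real provider call during the prototype phase.
def fake_model(prompt):
    return '{"vendor": "Acme GmbH", "total": 119.0, "currency": "EUR"}'

prompt = "Extract vendor, total, and currency as JSON from the document below: ..."
record = parse_structured(fake_model(prompt))
```

Validating at the boundary keeps a malformed model reply from ever reaching the database.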
Production (3-6 weeks)
Wrap with retries, fallbacks, observability, cost controls (token budgets, rate limits). Wire into your product.
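One of those production wrappers, sketched: retry the primary model with exponential backoff, then fall back to a second model. The flaky/stable functions below are test doubles, not real provider clients.

```python
import time

def call_with_retries(primary, fallback, prompt, retries=2, delay=0.0):
    """Try the primary model, retry on failure, then fall back."""
    for attempt in range(retries + 1):
        try:
            return primary(prompt), "primary"
        except Exception:
            if attempt < retries:
                time.sleep(delay * (2 ** attempt))  # exponential backoff
    return fallback(prompt), "fallback"

calls = {"n": 0}

def flaky_primary(prompt):
    calls["n"] += 1
    raise TimeoutError("upstream timeout")  # simulate a provider outage

def stable_fallback(prompt):
    return "ok"

result, route = call_with_retries(flaky_primary, stable_fallback, "hi")
```

In production the same wrapper also records latency and token counts for the observability layer.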
Iterate
AI features need ongoing eval as models change. Monthly retainer or scheduled quarterly tune-up.
From our own products
Merot Finance AI assistant
Anthropic Claude integrated for bank-statement matching, journal-entry suggestions, and month-close review.
Merot Leads scoring
Claude for product-fit scoring on enriched lead data and for outreach drafting. Custom prompts plus structured-output extraction.
Invoice OCR pipeline
Multi-stage extraction: OCR → LLM structured output → human-review queue for low-confidence items.
Engagement model
Frequently asked questions: AI + ML
Should I use OpenAI, Anthropic, or self-host?
Default: start with Anthropic (Claude 3.5 Sonnet / Claude 4) or OpenAI (GPT-4 family) for the prototype. Switch to self-hosted only when (a) data residency requires it, or (b) the per-call cost exceeds the engineering and infrastructure cost of running it yourself. Most clients stay on the API providers for years.
Will my data train someone's model?
Not on the enterprise tiers of OpenAI / Anthropic / Google — they have explicit no-training-on-customer-data terms. We turn on those settings during onboarding.
What if the AI hallucinates / produces wrong output?
Two layers: (1) Eval harness — we measure correctness on a labelled test set before shipping and again on every prompt change. (2) Production — high-confidence outputs go straight through; low-confidence outputs go to a human-review queue.
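Layer (2) can be sketched as a confidence router: items above a threshold pass through automatically, the rest land in a review queue. The field names and threshold are illustrative.

```python
def route(extractions, threshold=0.8):
    """Split extractions into auto-approved and human-review buckets."""
    auto, review = [], []
    for item in extractions:
        (auto if item["confidence"] >= threshold else review).append(item)
    return auto, review

items = [
    {"field": "total", "value": "119.00", "confidence": 0.97},
    {"field": "vendor", "value": "Acme?", "confidence": 0.41},
]
auto, review = route(items)
```

The threshold is tuned against the eval set: tighter for high-stakes fields, looser where a wrong value is cheap to correct.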
Cost — won't this get expensive?
Common worry, often overblown. Token costs have dropped 90%+ in 18 months. Most production features cost <$1K/month in API spend at meaningful traffic. We set hard token budgets + alerts to catch runaway calls.
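A hard token budget is a few lines, sketched here as a simple counter that refuses any call that would blow the monthly cap; the numbers are illustrative.

```python
class TokenBudget:
    """Hard monthly cap on token spend; refuses a charge that would exceed it."""

    def __init__(self, monthly_limit):
        self.limit = monthly_limit
        self.used = 0

    def charge(self, tokens):
        if self.used + tokens > self.limit:
            raise RuntimeError(f"token budget exceeded: {self.used}/{self.limit}")
        self.used += tokens

budget = TokenBudget(monthly_limit=1_000_000)
budget.charge(250_000)
budget.charge(700_000)
# A further charge(100_000) would raise: only 50_000 tokens remain.
```

In practice the counter lives in shared storage and an alert fires well before the hard limit, so a runaway loop is caught in minutes rather than on the invoice.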
Do you do fine-tuning?
Sometimes — usually only when the prompt approach truly can't get there. Fine-tuning has higher up-front cost (curating training data) and re-tuning maintenance every model upgrade. Typically we recommend better prompts + RAG first.
Privacy / on-premises only — can you do that?
Yes. We've deployed Llama 3 70B and Mixtral 8x22B on-premises (single-GPU H100 or 4xA100 setups) for clients in regulated industries. Higher up-front cost, lower per-call cost, full data residency.
AI agents — are these real yet?
Cautiously yes. Single-purpose agents (book a meeting, draft an email, run a SQL query) work well with proper guardrails. Generic 'do anything' agents are still flaky. We scope to single-purpose by default.
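One of the guardrails that makes single-purpose agents workable is an explicit tool allowlist: the agent can only invoke tools you enumerated, and everything else is refused. A minimal sketch, with hypothetical tool names:

```python
ALLOWED_TOOLS = {"book_meeting", "draft_email"}  # guardrail: explicit allowlist

def dispatch(tool_call, tools):
    """Run a requested tool only if it is on the allowlist; refuse all others."""
    name = tool_call["name"]
    if name not in ALLOWED_TOOLS:
        return {"ok": False, "error": f"tool not allowed: {name}"}
    return {"ok": True, "result": tools[name](**tool_call["args"])}

tools = {
    "book_meeting": lambda when: f"booked for {when}",
    "draft_email": lambda to, subject: f"draft to {to}: {subject}",
}

booked = dispatch({"name": "book_meeting", "args": {"when": "Tue 10:00"}}, tools)
blocked = dispatch({"name": "drop_table", "args": {}}, tools)  # refused
```

The model proposes tool calls; this layer decides what actually runs, which is what keeps "single-purpose" enforceable.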
Voice / speech?
Whisper for speech-to-text, ElevenLabs / OpenAI TTS for synthesis. We've built call-summary + voice-note transcription features for clients in the legal + healthcare verticals.
Scope an AI + ML project
Free 60-minute discovery call. A 6-page plan within 48 hours.