Technology & Architecture
Enterprise AI infrastructure for reliable, scalable and privacy-compliant form assistants.
Architecture Overview
The FINO Suite follows a modular cloud architecture with clear layer separation. Each component is independently scalable and replaceable.
Language Models (LLMs)
FINO is model-agnostic and supports various Large Language Models. The choice of model can be configured per tenant and use case.
| Provider | Models | Use Case | EU Hosting |
|---|---|---|---|
| Anthropic | Claude Sonnet family | Dialogue management, form assistance, complex follow-up questions | ✅ via EU infrastructure |
| Amazon | Nova Pro, Nova Lite, Titan | Document analysis, image processing, embeddings | ✅ EU (Frankfurt) |
| Amazon | Nova Sonic | Speech processing (FINO Voice) | ✅ EU (Stockholm) |
| Others | Configurable on request | Customer-specific requirements | Depends on provider |
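Per-tenant, per-use-case model selection could be sketched as follows. This is a minimal illustration, not FINO's actual configuration format: `DEFAULT_MODELS`, `TenantConfig` and `resolve_model` are hypothetical names, and the model strings are plain labels taken from the table above rather than real provider model identifiers.

```python
from dataclasses import dataclass, field

# Hypothetical platform-wide defaults: use case -> model label (labels only, not real model IDs).
DEFAULT_MODELS = {
    "dialogue": "claude-sonnet",
    "document_analysis": "nova-pro",
    "speech": "nova-sonic",
}


@dataclass
class TenantConfig:
    """Assumed shape of a tenant record; overrides replace defaults per use case."""
    tenant_id: str
    overrides: dict[str, str] = field(default_factory=dict)


def resolve_model(tenant: TenantConfig, use_case: str) -> str:
    """Return the tenant's override if present, otherwise the platform default."""
    if use_case in tenant.overrides:
        return tenant.overrides[use_case]
    if use_case in DEFAULT_MODELS:
        return DEFAULT_MODELS[use_case]
    raise KeyError(f"no model configured for use case {use_case!r}")
```

A tenant that overrides only `document_analysis` would still receive the default model for every other use case.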
RAG - Retrieval-Augmented Generation
FINO uses RAG to base AI responses on verified facts rather than general model knowledge. The result: answers that are accurate for the subject domain, up to date and traceable to their sources.
Retrieval (Knowledge Retrieval)
For each user query, relevant information is retrieved from the connected knowledge bases.
- Vector search: Semantic matching via embeddings
- Hybrid search: Combination of semantic and keyword search
- Ranking: Relevance scoring and filtering of results
- Source references: Every piece of information is traceable to its source
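The retrieval steps listed above can be illustrated with a small, self-contained sketch. It assumes a simple weighted sum of cosine similarity and keyword overlap; the actual scoring and ranking used by FINO are not described in this document, and `Chunk`, `hybrid_search`, `alpha` and `min_score` are hypothetical names and parameters.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str             # kept so every hit stays traceable to its origin
    embedding: list[float]  # assumed to be precomputed by an embedding model


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that occur in the text (naive whitespace tokens)."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(query, query_embedding, chunks, alpha=0.5, min_score=0.2, top_k=3):
    """Blend semantic and keyword scores, drop weak hits, return the best top_k."""
    scored = []
    for chunk in chunks:
        semantic = cosine(query_embedding, chunk.embedding)
        lexical = keyword_score(query, chunk.text)
        score = alpha * semantic + (1 - alpha) * lexical
        if score >= min_score:
            scored.append((score, chunk))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Each result is a `(score, chunk)` pair, so the `source` field travels with the text and can be shown as a reference alongside the answer.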
Generation (Response Generation)
The language model generates a response based on the retrieved information and conversation context.
- Context window: Relevant documents are provided to the model
- Prompt engineering: Domain-specific instructions control tone and accuracy
- Validation: Responses are checked for consistency
- Multilingual: Responses are given in the user's language, while the form itself is completed in German
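How retrieved passages and conversation context might be combined into a single prompt is sketched below. The wording of the instructions and the function name `build_prompt` are assumptions for illustration only; FINO's actual prompts are not part of this document.

```python
def build_prompt(question, passages, history, user_language):
    """Assemble one prompt from numbered sources, prior turns and the new question.

    passages: list of (source_name, text) pairs from the retrieval step
    history:  list of (role, message) pairs from the conversation so far
    """
    sources = "\n".join(
        f"[{i}] ({src}) {text}" for i, (src, text) in enumerate(passages, start=1)
    )
    turns = "\n".join(f"{role}: {msg}" for role, msg in history)
    return (
        "Answer using only the numbered sources and cite them like [1].\n"
        f"Reply in: {user_language}. Form field values stay in German.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Conversation so far:\n{turns}\n\n"
        f"User question: {question}"
    )
```

Numbering the passages gives the model a compact way to cite them, which is one possible route to the source references mentioned above.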
Why RAG instead of pure LLM?
Without RAG (pure LLM):
- Responses based on training data (potentially outdated)
- Hallucinations possible
- No source references
- Not tenant-specific
With RAG (FINO):
- Responses based on current, verified sources
- Fact-based and traceable
- Source references with every response
- Individual knowledge base per tenant
MCP - Model Context Protocol
FINO uses the Model Context Protocol (MCP) as an open standard for communication between AI systems and knowledge sources. This enables a flexible, extensible architecture.
Modularity
- Knowledge sources as independent MCP servers
- Easy addition and removal of data sources
- Independent scaling per source
- Standardised interfaces
Distributed Architecture
- Multiple knowledge sources queryable in parallel
- Multi-tenant configuration
- Real-time knowledge base updates
- Interoperability with various AI models
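Querying several knowledge sources in parallel can be sketched with `asyncio`. The coroutine `query_source` is a hypothetical stand-in for a call to one MCP server; it does not use any real MCP client library, and the source names are invented for the example.

```python
import asyncio


async def query_source(name: str, query: str) -> dict:
    """Placeholder for a tool call against one knowledge source (assumption)."""
    await asyncio.sleep(0.01)  # simulates network latency
    return {"source": name, "hits": [f"{name}: result for {query!r}"]}


async def query_all(sources: list[str], query: str, timeout: float = 2.0) -> list[dict]:
    """Fan the query out to all sources at once and keep only successful answers."""
    tasks = [asyncio.wait_for(query_source(s, query), timeout) for s in sources]
    # return_exceptions=True keeps one slow or failing source from sinking the rest
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return [r for r in results if not isinstance(r, Exception)]


if __name__ == "__main__":
    print(asyncio.run(query_all(["statutes", "faq"], "fees")))
```

Because each source is an independent task with its own timeout, adding or removing a source only changes the list passed to `query_all`.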
Direct Use in AI Clients
Thanks to MCP, FINO knowledge bases can be integrated directly into AI assistants like Claude Desktop, JetBrains AI or other MCP-compatible clients. Knowledge sources are available to the AI model as tools - no custom development, no middleware required.
This means: Your specialist content is instantly available in any MCP-compatible client - whether for internal research, customer consulting or automated workflows.
Integration & Interfaces
Frontend Integration
FINO is provided as a Web Component - a single HTML tag is all you need for integration.
- No framework required
- Works on any website
- Responsive and accessible
- Customisable design
Backend Interfaces
Standardised APIs are available for deeper integrations.
- REST API: Standard HTTP interface for all products
- MCP: For knowledge base connectivity
- Webhooks: Event-based notifications
- Form mapping: Automatic mapping of AI responses to form fields
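The form mapping step can be pictured as a lookup from values extracted by the AI to the target form's field names, plus a check for required fields that are still empty. The sketch below is an assumption about how such a mapping might look; `FIELD_MAP`, the field names and `map_to_form` are hypothetical and not part of the FINO API.

```python
# Hypothetical mapping: key in the AI response -> field name in the (German) form.
FIELD_MAP = {
    "applicant_name": "antragsteller_name",
    "date_of_birth": "geburtsdatum",
    "street": "strasse",
}


def map_to_form(extracted: dict, field_map: dict, required: list) -> tuple:
    """Translate extracted values to form fields and report required fields still missing."""
    form = {field_map[key]: value for key, value in extracted.items() if key in field_map}
    missing = [name for name in required if name not in form]
    return form, missing
```

The list of missing fields is what a dialogue layer could use to decide which follow-up question to ask next.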
CMS Plugins
WordPress (available)
Ready-to-use plugin with graphical configuration interface. Installation via WordPress admin, branding customisation without code changes.
- All branding options (colours, texts, logos)
- Multilingual configuration
- Page-level visibility control
Other CMS (on request)
Integrations for Drupal, Joomla and Shopware are planned. Contact us for your specific use case.
Performance & Scalability
Performance
- Response times: Typically 3–5 seconds for complex queries
- Caching: Intelligent caching for frequent queries
- Streaming: Responses are streamed in real time
- Availability: 24/7 operation with automatic failover
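One simple way to cache frequent queries is a per-tenant cache with a time-to-live, keyed on a normalised form of the question. This is a sketch under stated assumptions, not a description of FINO's caching: the class `QueryCache`, the five-minute default and the normalisation rule are all hypothetical.

```python
import time


class QueryCache:
    """In-memory cache keyed by (tenant, normalised query) with expiry."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested without waiting
        self._entries = {}

    @staticmethod
    def _key(tenant_id: str, query: str) -> tuple:
        # Lower-case and collapse whitespace so trivially different spellings share an entry
        return tenant_id, " ".join(query.lower().split())

    def get(self, tenant_id: str, query: str):
        key = self._key(tenant_id, query)
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        if self._clock() - stored_at > self._ttl:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        return answer

    def put(self, tenant_id: str, query: str, answer: str) -> None:
        self._entries[self._key(tenant_id, query)] = (self._clock(), answer)
```

Including the tenant in the key keeps answers from one tenant's knowledge base from being served to another.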
Scalability
- Horizontal: Automatic scaling during peak loads
- Multi-tenant: Hundreds of tenants on shared infrastructure
- Modular: Individual components independently scalable
- From pilot to production: Same architecture, different sizing
Technical questions?
We are happy to explain the architecture in detail and show how FINO fits into your system landscape.