01 Responsibilities
- Architect, implement, and optimize end-to-end Retrieval-Augmented Generation (RAG) pipelines for enterprise use cases in on-premises environments
- Design and integrate retrieval mechanisms (e.g. vector or graph databases such as Neo4j) with generative models (e.g. Llama 3.2, Mistral)
- Fine-tune and optimize retrieval and generation components to achieve high accuracy and low latency
- Implement and customize inference servers using vLLM and LiteLLM for efficient and scalable LLM serving
- Integrate open-source large language models with proprietary data sources and enterprise APIs
- Design GPU-optimized, scalable on-prem infrastructure for model training and inference, ensuring security and data governance compliance
- Collaborate with DevOps teams to containerize workflows with Docker, orchestrate them on Kubernetes, and automate MLOps pipelines
- Apply performance optimization techniques such as quantization, pruning, and dynamic batching
- Monitor system performance, troubleshoot bottlenecks, and ensure high availability
- Work closely with data engineers and business stakeholders to translate business requirements into technical AI solutions in telco environments
