AI Inference Junior Engineer | Delhi, Noida

We are seeking a highly motivated AI Inference Junior Engineer to support the deployment, optimization, and operation of AI models across modern GPU infrastructure. This role is ideal for early-career engineers passionate about Artificial Intelligence, Machine Learning Infrastructure, Cloud Computing, GPU Acceleration, and Large Language Model (LLM) serving.

As an AI Inference Junior Engineer, you will work alongside experienced AI, platform, and cloud engineers to deploy and manage production-grade AI models, optimize inference performance, support scalable serving environments, and contribute to AI platform development. You will gain hands-on experience with cutting-edge AI technologies, NVIDIA GPU environments, Kubernetes-based infrastructure, cloud-native platforms, and modern inference frameworks.

This position provides an excellent opportunity to build expertise in AI infrastructure, model serving, GPU optimization, and large-scale AI deployment while contributing to innovative AI-powered products and services.

Key Responsibilities

AI Model Deployment & Operations

Assist in deploying and managing Large Language Models (LLMs), multimodal models, vision models, speech models, and embedding models.
Support AI model serving and inference workflows across production and testing environments.
Participate in model versioning, deployment validation, and rollback procedures.
Assist in implementing scalable AI serving architectures.
Monitor model performance and help optimize inference efficiency.

GPU & Performance Optimization

Support GPU resource monitoring and utilization analysis.
Assist in optimizing inference latency, throughput, and memory usage.
Learn and implement model optimization techniques including quantization and caching strategies.
Help identify performance bottlenecks using monitoring and profiling tools.
Contribute to benchmarking AI workloads across different hardware environments.

Cloud & Infrastructure Engineering

Support deployment of containerized AI workloads using Kubernetes and Docker.
Assist in managing cloud-based AI infrastructure environments.
Participate in infrastructure monitoring, troubleshooting, and maintenance activities.
Help maintain scalable and reliable inference clusters.
Support automation and infrastructure improvement initiatives.

Inference Framework Support

Gain experience with modern inference frameworks such as:

vLLM
NVIDIA TensorRT-LLM
Triton Inference Server
TGI (Text Generation Inference)
Ollama
Ray Serve
SGLang
OpenAI-Compatible APIs

Platform Development

Assist in developing APIs and backend services supporting AI workloads.
Support authentication, usage tracking, monitoring, and platform integrations.
Collaborate with engineering teams to improve platform reliability and scalability.
Participate in testing and deployment activities for AI platform services.
Contribute to documentation and operational procedures.

Collaboration & Learning

Work closely with AI engineers, data scientists, cloud engineers, and platform teams.
Participate in code reviews, technical discussions, and knowledge-sharing sessions.
Stay up to date with emerging AI technologies, LLM frameworks, and GPU innovations.
Continuously improve technical skills in AI infrastructure and cloud-native technologies.

Required Skills

Programming & Development

Strong foundation in Python programming.
Understanding of software engineering principles and coding best practices.
Familiarity with REST APIs and backend development concepts.
Basic understanding of version control systems such as Git.
Ability to write clean, maintainable, and testable code.

AI & Machine Learning

Understanding of Machine Learning fundamentals and AI model deployment concepts.
Familiarity with transformer architectures and Large Language Models (LLMs).
Exposure to:

PyTorch
Hugging Face Transformers
Embedding Models
RAG (Retrieval-Augmented Generation)

Interest in model optimization and inference performance.

Cloud & Infrastructure

Basic knowledge of Docker and containerization concepts.
Familiarity with Kubernetes fundamentals.
Understanding of Linux operating systems and command-line environments.
Exposure to AWS, Azure, GCP, or cloud computing concepts.
Basic knowledge of distributed systems and cloud-native architectures.

Databases & Backend Technologies

Understanding of:

PostgreSQL
MongoDB
Redis

Familiarity with API integrations and data management concepts.
Basic understanding of event-driven systems and microservices architecture.

Professional Skills

Strong analytical and problem-solving abilities.
Excellent communication and collaboration skills.
Ability to learn new technologies quickly.
Strong attention to detail and commitment to quality.
Self-motivated and eager to work in fast-paced technology environments.

Education

Bachelor's degree in Computer Science, Artificial Intelligence, Data Science, Information Technology, Software Engineering, Electronics, or a related field.
B.Tech, BE, BCA, B.Sc. (Computer Science/IT), or equivalent qualification.
Master's degree in AI, Machine Learning, Computer Science, or related disciplines is an advantage but not mandatory.
