Deep Learning Algorithms Engineer - ACOT
NVIDIA
- Ho Chi Minh City
- Training
- Full-time
- Assist in optimizing AI models such as LLMs, VLMs, diffusion models, and multimodal models for inference and training on NVIDIA GPUs.
- Profile workloads and help identify performance bottlenecks across GPU compute, memory, networking, and storage.
- Support the development and integration of optimization techniques such as quantization, kernel fusion, parallelism, and memory efficiency improvements.
- Use tools including CUDA, TensorRT, Nsight, and NVIDIA acceleration libraries to analyze and improve model performance.
- Work with deep learning frameworks including PyTorch, JAX, and TensorFlow, as well as open-source inference frameworks like vLLM and SGLang.
- Contribute to performance benchmarking, testing, and internal tooling to improve optimization workflows.
- Partner with senior engineers and cross-functional teams to evaluate workload behavior and inform future performance improvements.
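One of the optimization techniques named above, quantization, can be illustrated with a minimal sketch. This is not NVIDIA's or TensorRT's implementation; it is a hypothetical pure-Python example of symmetric post-training int8 quantization, the basic idea behind reducing model memory and bandwidth cost:

```python
def quantize_int8(values):
    """Symmetric post-training int8 quantization of a list of floats.

    A single scale maps the largest magnitude onto the int8 range
    [-127, 127]; each value is rounded to the nearest representable code.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale

def dequantize(codes, scale):
    """Recover approximate float values from int8 codes."""
    return [c * scale for c in codes]

# Hypothetical weight values, chosen so the round trip is near-exact.
weights = [0.5, -1.27, 0.02, 1.0]
q, scale = quantize_int8(weights)
approx = dequantize(q, scale)
```

Production frameworks add per-channel scales, calibration over activation statistics, and fused dequantize-compute kernels, but the storage-versus-precision trade-off is the same.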
- Bachelor’s or Master’s degree in Computer Science, Electrical Engineering, Computer Engineering, or related field (or equivalent experience).
- 2–4 years of experience, or strong academic/project experience, in deep learning, performance engineering, systems, or high-performance computing.
- Good understanding of deep learning fundamentals and modern AI model architectures, especially transformers.
- Familiarity with GPU architecture and parallel computing concepts such as CUDA, kernels, memory hierarchy, and streams.
- Exposure to profiling and performance analysis tools.
- Programming skills in Python.
- Experience with at least one major ML framework such as PyTorch, TensorFlow, or JAX.
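The memory-hierarchy concept mentioned above can be sketched with a blocked (tiled) matrix multiply. This is a hedged, framework-free illustration in plain Python, not a real GPU kernel: the tiling pattern mirrors how CUDA kernels stage small blocks in fast memory (shared memory/registers) before moving on, rather than repeatedly streaming whole rows from slow memory:

```python
def tiled_matmul(a, b, tile=2):
    """Blocked matrix multiply over nested lists (row-major).

    The three outer loops walk tile-sized blocks of the output and the
    reduction dimension; the inner loops work entirely within one block,
    which is the locality pattern GPU kernels exploit.
    """
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for k0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        s = 0.0
                        for kk in range(k0, min(k0 + tile, k)):
                            s += a[i][kk] * b[kk][j]
                        c[i][j] += s
    return c
```

On a GPU the tile size is chosen to fit shared memory and maximize occupancy; here it only changes the loop order, but the result is identical for any tile size.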
- Internship, research, or project experience optimizing AI/ML workloads on GPUs.
- Hands-on experience with TensorRT, TensorRT-LLM, vLLM, SGLang, or similar inference/runtime frameworks.
- Familiarity with quantization, sparsity, or mixed-precision techniques.
- Experience with distributed training or inference concepts.
- Contributions to open-source ML systems, performance tools, or infrastructure projects.
- Proficiency in C++, strong debugging skills, and an interest in low-level performance optimization.
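The distributed-training concept mentioned above centers on data parallelism: each worker computes gradients on its own shard of the batch, then an all-reduce averages them so every replica applies the same update. A minimal sketch, simulating the all-reduce step in plain Python with hypothetical gradient values (real systems use NCCL collectives via a framework such as torch.distributed):

```python
def all_reduce_mean(per_worker_grads):
    """Simulate the all-reduce step of data-parallel training.

    Each inner list is one worker's gradient vector; the result is the
    element-wise mean that every worker would hold after the collective.
    """
    n = len(per_worker_grads)
    return [sum(g) / n for g in zip(*per_worker_grads)]

# Hypothetical gradients from 3 workers on disjoint micro-batches.
grads = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = all_reduce_mean(grads)  # every worker now holds [3.0, 4.0]
```

Inference-side parallelism (tensor or pipeline parallelism in TensorRT-LLM, vLLM, or SGLang) partitions the model rather than the batch, but relies on the same collective-communication primitives.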