Principal Software Engineer - GPU Performance

Microsoft
United States, Washington, Redmond
Nov 20, 2024
Overview

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.

The Artificial Intelligence (AI) Platform organization at Microsoft builds the end-to-end Azure AI stack/Platform as a Service (PaaS) and is core to Azure's innovation and differentiation, as well as all of Microsoft's flagship products, from Office to Teams to Xbox. We are the team building Azure OpenAI, Azure Machine Learning (ML), Cognitive Services, and the global Azure AI infrastructure for running the largest AI workloads on the planet.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

The AI Frameworks team at Microsoft develops the AI software used to train and deploy the world's most advanced AI models. We collaborate with our hardware teams and partners to build the software stacks for Microsoft's next-generation supercomputers and the new Maia-100 AI accelerator. We work closely with ML researchers and developers to optimize and scale out model training and inference. We work directly with OpenAI on the models hosted on the Azure OpenAI service.

We are hiring a Principal Software Engineer to work on graphics processing unit (GPU) performance analysis and optimization. As a member of this team, you will have the opportunity to work on the fundamental abstractions, programming models, runtimes, libraries, and application programming interfaces (APIs) that enable large-scale training and inferencing of models on novel AI hardware. This is a technical role: it requires hands-on software design and development skills.
We're looking for someone who has a demonstrated history of solving hard technical problems and is motivated to tackle the hardest problems in building a full end-to-end AI stack. An entrepreneurial approach and the ability to take initiative and move fast are essential. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

Collaborate broadly across multiple disciplines, from hardware designers to ML developers.
Engage with key partners to understand and implement robust performance analysis and optimization for state-of-the-art large language models (LLMs) and other models.
Perform software development in C/C++ and Python, and GPU development using CUDA, ROCm, or Triton.
Identify requirements, scope solutions, estimate work, and schedule deliverables.
Embody our Culture and Values.