Senior Software Engineer (GPU Performance)
Microsoft
United States, Washington, Redmond
Nov 14, 2024
Overview
The Artificial Intelligence (AI) Frameworks team at Microsoft develops the AI software used to train and deploy the world's most advanced AI models. We collaborate with our hardware teams and partners to build the software stacks for Microsoft's next-generation supercomputers and the new Maia-100 AI accelerator. We work closely with ML researchers and developers to optimize and scale out model training and inference. We work directly with OpenAI on the models hosted on the Azure OpenAI service.

We are hiring a Senior Software Engineer (GPU Performance) to work on GPU performance analysis and optimization. As a member of this team, you will have the opportunity to work on the fundamental abstractions, programming models, runtimes, libraries and APIs to enable large scale training and inferencing of models on novel AI hardware. This is a technical role focused on performance analysis and optimization of machine learning models: it requires hands-on software development skills. We're looking for someone who has a demonstrated history of solving hard technical problems and is motivated to tackle the hardest problems in building a full end-to-end AI stack. An entrepreneurial approach and ability to take initiative and move fast are essential.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
Software development in C/C++, Python, and in GPU languages such as CUDA, ROCm, or Triton.
Work with cutting-edge hardware stacks and a fast-moving software stack to deliver best-in-class inference and optimal cost.
Engage with key partners to understand and implement performance analysis and optimization for state-of-the-art LLMs and other models.
Embody our culture and values.