In this role, you will have the opportunity to develop data generation methodologies for evaluating personalized models and search systems, design online and offline metrics for quality assessment, and build scalable systems for comprehensive evaluation. Our team is responsible for the models and evaluation of search experiences that answer users' questions using their personal documents, with privacy at the forefront.
Role responsibilities include:
- Design and implement novel data generation and data quality assessment methods to support modeling and evaluation of personal Q&A.
- Build scalable, automated systems for large-scale, end-to-end evaluation of models and search-powered systems.
- Design and implement offline and online metrics to assess product and component level quality for personal Q&A.
- Collaborate with partner teams to define data and evaluation requirements and priorities, and to explore opportunities for enhancements to the Personal Q&A stack.
- Develop long-term technical vision for Personal Q&A quality; identify problem areas and drive solutions as part of a larger roadmap.
- 6+ years of industry experience building machine learning or ML evaluation systems at scale
- Strong software engineering skills in mainstream programming languages, such as Python or C/C++
- Strong communication skills and the ability to drive solutions in collaboration with partner teams
- Bachelor's degree in Computer Science, Engineering, Statistics, or a related field
- Experience designing and building production ML systems and applications in search, NLP, recommendation systems, or information retrieval
- Experience in data collection, data generation, and/or data quality assessment of language, image, or multi-modal data
- Ability to quickly prototype ideas and solutions, and to perform critical analysis
- Strong skills in quality metrics development, interpretation of evaluations, and presentation to executive audiences
- Advanced degree (Master's or Ph.D.) in Computer Science, Engineering, Statistics, or a related field, or equivalent industry work experience