Description
We are looking for a Computational Scientist in Computational Biology and Machine Learning to join our growing translational research program at the Tisch Cancer Institute. Our team studies myeloproliferative neoplasms (MPNs), acute myeloid leukemia (AML), and related myeloid malignancies, combining single-cell multi-omics, clinical data, and artificial intelligence–based approaches to understand disease mechanisms, identify biomarkers, support drug development, and improve patient care.
The scientist will work closely with longitudinal patient datasets, integrating genomics, immune and cytokine profiling, treatment responses, and clinical trial outcomes.
The scientist will report directly to Dr. Md Babu Mia, lead of the Computational Biology and Machine Learning program within the MPN team.
Responsibilities
Computational Biology & Single-Cell Analytics
- Lead analysis of single-cell genomics datasets and build reproducible pipelines for data integration, clustering, differential expression, and clonal architecture reconstruction
- Apply rigorous statistical methods that appropriately account for sample-level replication, longitudinal structure, and multi-modal data
Machine Learning & Predictive Modeling
- Build machine learning models linking genomic drivers to clinical phenotypes, cytokine profiles, and treatment outcomes using ensemble methods and deep learning
- Develop interpretable risk stratification models for disease progression and treatment response, with a focus on clinical relevance
AI & Large Language Model Development
- Develop retrieval-augmented generation (RAG) systems and AI-assisted workflows that enable natural-language querying of clinical and genomic datasets
- Build LLM-powered pipelines for extracting structured information from clinical notes and pathology reports, with an emphasis on transparency and clinical usability
Data Integration & Infrastructure
- Build unified data models connecting treatments, laboratory results, cytokines, multi-omic biomarkers, and clinical trial endpoints
- Maintain HIPAA-compliant databases and ETL pipelines, and develop dashboards and APIs to support cross-institutional collaboration
Scientific Communication
- Prepare publication-ready figures and analyses, and contribute to manuscripts, grant applications, and research proposals
- Present findings at conferences and collaborate closely with lab scientists and clinicians
Qualifications
- Master's degree or equivalent in a domain science; Ph.D. in Computational Biology, Bioinformatics, Computer Science, Data Science, or a related scientific domain preferred.
- 3 years of relevant experience, preferably in a scientific/academic computing environment, or equivalent experience.
- Experience in a batch HPC cluster environment with a parallel file system
- Experience installing and supporting biology and chemistry codes (NAMD, AMBER, MATLAB, GROMACS, Desmond) as well as laboratory equipment such as sequencers
- Experience with MPI, OpenMP, and numerical libraries
- Experience with scientific workflows
- Experience with instrumenting and optimizing application codes
- Experience in an academic or research community environment
- Programming experience in any applicable language
Preferred:
- Strong experience with next-generation sequencing data analysis and proficiency in Python and R
- Demonstrated track record building machine learning models for biomedical applications; familiarity with LLM frameworks or RAG systems
- Experience with cloud or HPC environments, containerization (Docker), and database design
- Background in building dashboards, LLM fine-tuning, and working across laboratory, clinical, and computational teams