Research Fellow / Engineer (Vision-Language Models) - WS1
Role: Research Fellow / Engineer (Vision-Language Models) - WS1
Organisation: Singapore Institute of Technology (SIT)
Location: Singapore
Contract Type: Fixed Term, Full-time
Role Overview:
As a University of Applied Learning, the Singapore Institute of Technology (SIT) works closely with industry in its research pursuits. This position is situated within the SIT x NVIDIA AI Centre (SNAIC). The role is part of an industry innovation project with a large consumer goods company, where you will develop an evaluation framework for vision-language models (VLMs) with applications in the personal care sector. The research focuses on fine-grained VLM capabilities such as spatial reasoning, temporal grounding, event tracking, and domain knowledge, evaluated using a curated multimodal dataset.
Responsibilities:
- Manage the research project together with the Principal Investigator (PI) and industry partner to ensure all project deliverables are met
- Design and implement evaluation frameworks and metrics for vision-language models
- Develop annotated video datasets and capability-tagged evaluation tasks
- Build end-to-end evaluation pipelines and failure-mode analysis tools to assess VLM performance across reasoning dimensions
- Prepare technical reports, publications, and industry-facing deliverables
- Mentor student assistants
- Communicate with internal and external parties to ensure project deliverables are met
- Perform any other ad-hoc duties as assigned by the Supervisor
Requirements and Qualifications:
- PhD in Computer Science or a related field
- Expertise in computer vision and vision-language models
- Experience with ML evaluation metrics and benchmarking
- Proficiency in Python and deep learning frameworks (e.g., PyTorch)
- Interest in applied, industry-collaborative research