Character.AI
Research Engineer, Multimodal
Redwood City, CA (Remote) | $225k–$400k | Full-time | Mid-level | Added 2 days ago
About this role
Join Character.AI's Multimodal team as a Research Engineer to develop advanced video and image generation models that power AI character interactions. You'll lead model training efforts including audio-visual generation and image-to-video capabilities, while collaborating across research, product, and infrastructure to scale visual AI experiences.
What you'll do
- Lead fine-tuning and training of video generation models including image-to-video and joint audio-visual generation
- Design and experiment with novel multimodal architectures with conditioning from voice, text, and reference images
- Apply techniques like LoRA, RLHF, and full-parameter fine-tuning to improve model quality
- Build large-scale data pipelines and automated annotation workflows for continuous improvement
- Explore model compression and inference optimization for efficient real-time video processing at scale
What they're looking for
- PyTorch proficiency with end-to-end ML experience
- Video and image generation architectures (diffusion models, DiT, ControlNet)
- Multimodal model training with audio, vision, and language
- Distributed training tools (FSDP, DeepSpeed)
- Large-scale data processing and dataset construction
- Model deployment and optimization
- Audio-visual or speech-conditioned generation (nice to have)
- ML deployment tools such as Kubernetes, Docker, and cloud platforms (nice to have)