Character.AI

Research Engineer, Multimodal

Redwood City, CA (Remote) | $225k–$400k | Full-time | Mid-level | Added 2 days ago

About this role

Join Character.AI's Multimodal team as a Research Engineer to develop advanced video and image generation models that power AI character interactions. You'll lead model training efforts including audio-visual generation and image-to-video capabilities, while collaborating across research, product, and infrastructure to scale visual AI experiences.

What you'll do

  • Lead fine-tuning and training of video generation models including image-to-video and joint audio-visual generation
  • Design and experiment with novel multimodal architectures conditioned on voice, text, and reference images
  • Apply techniques like LoRA, RLHF, and full-parameter fine-tuning to improve model quality
  • Build large-scale data pipelines and automated annotation workflows for continuous improvement
  • Explore model compression and inference optimization for efficient real-time video processing at scale

What they're looking for

  • PyTorch proficiency with end-to-end ML experience
  • Video and image generation architectures (diffusion models, DiT, ControlNet)
  • Multimodal model training with audio, vision, and language
  • Distributed training tools (FSDP, DeepSpeed)
  • Large-scale data processing and dataset construction
  • Model deployment and optimization
  • Audio-visual or speech-conditioned generation (nice to have)
  • ML deployment tools such as Kubernetes, Docker, and cloud platforms (nice to have)