Character.AI

Research Engineer, Multimodal

Redwood City, CA (Remote) | $225k–$400k | Full-time | Mid-level | Added 2 days ago

About this role

Join Character.AI's Multimodal team as a Research Engineer to develop advanced video and image generation models that power AI character interactions. You'll lead model training efforts including audio-visual generation and image-to-video capabilities, while collaborating across research, product, and infrastructure to scale visual AI experiences.

What you'll do

Lead fine-tuning and training of video generation models, including image-to-video and joint audio-visual generation
Design and experiment with novel multimodal architectures conditioned on voice, text, and reference images
Apply techniques like LoRA, RLHF, and full-parameter fine-tuning to improve model quality
Build large-scale data pipelines and automated annotation workflows for continuous improvement
Explore model compression and inference optimization for efficient real-time video processing at scale

What they're looking for

PyTorch proficiency with end-to-end ML experience
Video and image generation architectures (diffusion models, DiT, ControlNet)
Multimodal model training with audio, vision, and language
Distributed training tools (FSDP, DeepSpeed)
Large-scale data processing and dataset construction
Model deployment and optimization
Audio-visual or speech-conditioned generation (nice to have)
ML deployment tools such as Kubernetes, Docker, and cloud platforms (nice to have)
