We raised a $25M Series A! 🚀

MEET PROTEGE
The data layer for AI training
Explore the trusted source for finding and sharing AI training data
Trusted by the world's leading AI companies
Sourced with Integrity
Protege enables the ethical sourcing of hard-to-find, multimodal, and real-world AI training data at scale.

Built by Scientists
We operate as scientific partners, curating datasets from our expansive catalogue aligned to specific use cases, research goals, and regulatory standards.
Powered by Partnership
We help data holders turn underutilized assets into strategic and compliant revenue streams.

Explore Our Data Products
Curated datasets built specifically for AI training
CLERK
Clinical and billing data at the encounter level curated for real-world use; tens of millions of Electronic Health Records (EHR) and claims
Healthcare
MOCAP
Motion capture datasets of audiovisual human motion content paired with precise sensor-based metadata
Motion Capture
FRAME
Multimodal imaging dataset that includes 1 million patients and over 10 million imaging studies
Healthcare
ROLL
Globally diverse catalog of full-length scripted and unscripted movies, TV, news, sports and more
Media

Design a Dataset with Us
Contact us to create a proprietary dataset that best matches your needs.

Protege News
Updates on our progress
News
Our $25M Series A led by Footwork

News
Our Feature in The Information

News
Motion Capture by Protege

News
Introducing Audio & Speech

News
Our $10M Seed Round led by CRV

News
Partnership with SA



Join the Conversation
Subscribe to our Substack for expert insights on AI data, ethics, and research breakthroughs.