
Acquired by Check Point in 2026
Deepchecks is an enterprise-grade evaluation, observability, and monitoring platform for production AI. As organizations move generative AI from pilots into real production deployments, they hit a structural problem that existing tooling does not solve: evaluating LLMs in the wild cannot be handled with benchmarks, unit tests, or an open-source script running LLM-as-a-judge. Generative AI introduces a new class of quality problems that require expert judgment, deep context, and repeated review, which makes quality assurance slow, inconsistent, and fragile, especially as models, prompts, and agentic workflows evolve. Most teams end up stitching together brittle infrastructure that they cannot trust and cannot operate at scale.
Deepchecks unifies that fragmented stack into a single platform. It compares versions of prompts, models, agents, and AI systems against each other, sets up auto-scoring pipelines that handle nuanced constraints, generates evaluation datasets and LLM judges in minutes, and tests LLM applications inside CI/CD before they are promoted to production monitoring. The result is the visibility, control, and trust that AI teams need to put generative systems in front of customers and keep them working there.
The platform has been built from day one for the requirements of regulated and security-conscious enterprises. It offers SOC 2 Type 2, GDPR, and HIPAA compliance, single sign-on, and AWS GovCloud support. Deployment options span SaaS, virtual private cloud on GCP or Azure, bare-metal and air-gapped environments, and a native AWS-managed deployment via Amazon SageMaker Partner AI Apps with direct integrations into Bedrock and SageMaker. Customers include Anthem, Booking.com, Wix, AWS, Moovit, Takeda, America First Credit Union, MIT, and a Fortune 50 global pharmaceutical company that uses Deepchecks to evaluate its internal AI platform. The open-source ML testing library that the company began with continues to underpin the commercial platform and has over 4,000 GitHub stars, reflecting the team's deep roots in the research and developer community.







