Today, we’re announcing Scale has closed $1B in financing at a $13.8B valuation, led by existing investor Accel. For 8 years, Scale has been the leading AI data foundry, helping fuel the most exciting advances in AI, including autonomous vehicles, defense applications, and generative AI. With today’s funding, we’re moving into the next phase of our journey: accelerating the abundance of frontier data to pave the road to Artificial General Intelligence (AGI).

“Our vision is one of data abundance, where we have the means of production to continue scaling frontier LLMs many more orders of magnitude. We should not be data-constrained in getting to GPT-10.” - Alexandr Wang, CEO and founder of Scale AI

This new funding also enables Scale to build on our prior model evaluation work with enterprise customers, the U.S. Department of Defense, and the White House, deepening our capabilities and offerings for both public and private evaluations.

There’s a lot left to do. If this challenge excites you, join us: https://scale.com/careers

Read the full announcement: https://lnkd.in/gVBhaPZ5
Scale AI
Software Development
San Francisco, California 173,509 followers
The Data Engine that powers the most advanced AI models.
About us
At Scale, our mission is to accelerate the development of AI applications. We believe that to make the best models, you need the best data. The Scale Generative AI Platform leverages your enterprise data to customize powerful base generative models and safely unlock the value of AI. The Scale Data Engine provides all the tools and features you need to collect, curate, and annotate high-quality data, along with robust tools to evaluate and optimize your models. Scale powers the most advanced LLMs and generative models in the world through world-class RLHF, data generation, model evaluation, safety, and alignment.

Scale is trusted by leading technology companies like Microsoft and Meta, enterprises like Fox and Accenture, generative AI companies like OpenAI and Cohere, U.S. government agencies like the U.S. Army and the U.S. Air Force, and startups like Brex and OpenSea.
- Website: https://scale.com
- Industry: Software Development
- Company size: 501-1,000 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2016
- Specialties: Computer Vision, Data Annotation, Sensor Fusion, Machine Learning, Autonomous Driving, APIs, Ground Truth Data, Training Data, Deep Learning, Robotics, Drones, NLP, and Document Processing
Locations
-
Primary
303 2nd St
South Tower, 5th FL
San Francisco, California 94107, US
Updates
-
Today we’re releasing two new SEAL LLM leaderboards: Agentic Tool Use (Chat) and Agentic Tool Use (Enterprise). The leaderboards measure how well leading LLMs, including OpenAI’s o1-preview, are able to use external tools, including a Python interpreter and Google Search. See how they rank: https://lnkd.in/gRbsmERa
-
Introducing: Humanity’s Last Exam. Help develop the most ambitious AI benchmark to date with Scale and the Center for AI Safety, and share in $500k in prizes: https://lnkd.in/gkiT-_F5

Expert-level models are rapidly evolving, and we need more difficult tests to measure their progress. At Scale, we are on a mission to craft benchmarks that truly measure AI's growing capabilities, and we need your expertise to make this possible. We are collecting the hardest and broadest set of questions ever assembled to evaluate how close we are to achieving expert-level AI across diverse domains.

If you have 5+ years of experience in a technical field or hold (or are pursuing) a PhD, submit your questions by November 1, 2024. The top 50 questions will earn $5,000 each, and the next 500 will earn $500 each. All selected questions grant optional co-authorship on the resulting paper.

Learn more and enter: agi.safe.ai/submit
-
Scale’s headed to National Harbor for the Air & Space Forces Association’s Air, Space & Cyber 2024! Swing by Booth MB135 to learn about Scale’s offerings for defense and government agencies, including demos of Donovan, tabular data, and LLM evaluation for DoD-relevant benchmarks. Learn more about Scale’s work with the public sector: https://lnkd.in/gC-v9T8g
-
LLMs have become more capable with better training and data, but they haven’t figured out how to “think” through problems at test time. The latest research from Scale finds that simply scaling inference compute (giving models more time or attempts to solve a problem) is not effective, because the attempts are not diverse enough from one another.

👉 Enter PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language to encourage response diversity. PlanSearch enables the model to “think” through various strategies before generating code, making it more likely to solve the problem correctly. The Scale team tested PlanSearch on major coding benchmarks (HumanEval+, MBPP+, and LiveCodeBench) and found it consistently outperforms baselines, particularly in extended search scenarios. On LiveCodeBench, overall performance improves by over 16 percentage points, from 60.6% to 77%.

Here’s how it works:
✅ PlanSearch first generates high-level strategies, or "plans," in natural language before proceeding to code generation.
✅ These plans are then broken down into structured observations and solution sketches, allowing a wider exploration of possible solutions. This increases diversity, reducing the chance of the model recycling similar ideas.
✅ The plans are then combined before settling on the final idea and implementing the solution in code.

Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. When PlanSearch is paired with filtering techniques, such as submitting only solutions that pass initial tests, we get better results overall and achieve the top score of 77% with only 10 submission attempts.

Big thanks to all collaborators on this paper: Evan Wang, Hugh Zhang, Federico Cassano, Catherine Wu, Yunfeng Bai, William Song, Vaskar Nath, Ziwen H., Sean Hendryx, Summer Yue

👉 Read the full paper here: arxiv.org/abs/2409.03733
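The three steps above can be sketched in a few lines. This is a hypothetical, simplified illustration of the PlanSearch idea only, not Scale's implementation: `query_llm`, the prompt wording, and the observation count are all made-up stand-ins, and a real system would call an actual model and filter the attempts against public tests.

```python
# Minimal sketch of the PlanSearch structure: observations -> combined
# sketches -> code attempts. `query_llm` is a stub standing in for a model.
from itertools import combinations

def query_llm(prompt):
    # Stand-in for a real LLM call; returns a canned string so the
    # sketch is runnable without any model access.
    return f"response to: {prompt[:40]}"

def plan_search(problem, n_observations=3):
    # Step 1: generate high-level natural-language observations ("plans").
    observations = [
        query_llm(f"Observation {i} about: {problem}")
        for i in range(n_observations)
    ]
    # Step 2: combine pairs of observations into solution sketches.
    # Searching over combinations, rather than sampling code directly,
    # is what drives the diversity of the final attempts.
    sketches = [
        query_llm(f"Sketch a solution to {problem} using: {a} | {b}")
        for a, b in combinations(observations, 2)
    ]
    # Step 3: turn each sketch into a code attempt; a filtering step
    # (e.g. running public tests) would then select which to submit.
    return [query_llm(f"Write code for: {s}") for s in sketches]

attempts = plan_search("reverse a linked list")
print(len(attempts))  # 3 observations -> C(3,2) = 3 sketches -> 3 attempts
```

Note how the number of candidate attempts grows combinatorially with the number of observations, which is where the extended-search gains come from.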
-
We’ve added Mistral Large 2, GPT-4o (August 2024), and Gemini 1.5 Pro (August 27, 2024) to the SEAL LLM Leaderboards. See how they rank compared to leading LLMs across Coding, Instruction Following, Math, and Spanish domains: https://lnkd.in/g7N_Hs9p
-
📣 Happening tomorrow! Get ahead of the curve and learn how to credibly assess LLM performance. 👇

How do you know if a model is truly solving problems or just repeating answers from its training data? Scale machine learning engineer Hugh Zhang and his team tackled this question earlier this year and discovered that current benchmarks may be inadvertently compromising model evaluations due to data contamination. This matters because improving a model’s reasoning is key to advancing large language models overall, so it’s critical that the benchmarks we use accurately reflect LLMs' true reasoning capabilities.

In response, Hugh’s team developed GSM1k, a mathematical evaluation dataset comparable to GSM8k but built exclusively by humans, ensuring the models have never seen the problems they’re tested on. The researchers then used GSM1k to evaluate leading open-source and closed-source LLMs and compared the results against existing GSM8k performance to detect overfitting.

Join Hugh tomorrow, Wednesday, September 4 at 10 AM PT, for a tech talk on what the team found. He will also cover the latest trends in LLM performance evaluation and how you can stay ahead of the curve in the rapidly evolving field of AI benchmarking and evaluation. Can’t make it? Register to receive the recording.

Register here 👉 https://lnkd.in/gRFgMSVT
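The GSM8k-vs-GSM1k comparison boils down to a simple check: a model that scores much higher on the public benchmark than on the held-out, human-written analogue has likely memorized test problems. The sketch below is purely illustrative; the model names and accuracy numbers are made up and are not results from the paper.

```python
# Hypothetical illustration of the overfitting check behind GSM1k:
# compare accuracy on the public benchmark (GSM8k) against accuracy
# on a contamination-free analogue (GSM1k).
def overfit_gap(public_acc, held_out_acc):
    # A large positive gap suggests memorization of benchmark problems
    # rather than genuine mathematical reasoning.
    return public_acc - held_out_acc

# (model name: (GSM8k accuracy, GSM1k accuracy)) -- fabricated numbers.
models = {"model_a": (0.92, 0.78), "model_b": (0.85, 0.84)}

for name, (public, held_out) in models.items():
    print(name, round(overfit_gap(public, held_out), 2))
```

Here the fictional "model_a" shows a 14-point gap (likely contaminated), while "model_b" transfers cleanly to the unseen problems.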
-
“The most valuable thing for most businesses going forward is the proprietary data that they have. You can point that model at your data and be able to extract information and value from that that no one else can.”

Scale’s Managing Director and former CTO of the United States, Michael Kratsios, joined the You Might be Right podcast from the Howard H. Baker Jr. School of Public Policy and Public Affairs at the University of Tennessee, Knoxville. With hosts, former Tennessee Governors Philip Bredesen and Bill Haslam, he discussed:
• Whether AI has been as disruptive as we imagined a year ago
• Whether the United States is in a position to lead in AI
• How businesses can use AI to their competitive advantage

Listen → https://lnkd.in/dSBcT94m
-
Scale is on Forbes’ 2024 Cloud 100 list! The list recognizes the world’s top 100 cloud computing companies: https://lnkd.in/etRHGNfs Join us on our mission to accelerate the development of AI applications 👉 https://scale.com/careers