Today, we’re announcing Scale has closed $1B in financing at a $13.8B valuation, led by existing investor Accel. For 8 years, Scale has been the leading AI data foundry, helping fuel the most exciting advances in AI, including autonomous vehicles, defense applications, and generative AI. With today’s funding, we’re moving into the next phase of our journey: accelerating the abundance of frontier data to pave the road to Artificial General Intelligence (AGI).

“Our vision is one of data abundance, where we have the means of production to continue scaling frontier LLMs many more orders of magnitude. We should not be data-constrained in getting to GPT-10.” - Alexandr Wang, CEO and founder of Scale AI

This new funding also enables Scale to build on our prior model evaluation work with enterprise customers, the U.S. Department of Defense, and the White House, deepening our capabilities and offerings for both public and private evaluations.

There’s a lot left to do. If this challenge excites you, join us: https://scale.com/careers

Read the full announcement: https://lnkd.in/gVBhaPZ5
Scale AI
Software Development
San Francisco, California 173,509 followers
The Data Engine that powers the most advanced AI models.
About us
At Scale, our mission is to accelerate the development of AI applications. We believe that to make the best models, you need the best data. The Scale Generative AI Platform leverages your enterprise data to customize powerful base generative models and safely unlock the value of AI. The Scale Data Engine provides all the tools and features you need to collect, curate, and annotate high-quality data, along with robust tools to evaluate and optimize your models. Scale powers the most advanced LLMs and generative models in the world through world-class RLHF, data generation, model evaluation, safety, and alignment.

Scale is trusted by leading technology companies like Microsoft and Meta, enterprises like Fox and Accenture, generative AI companies like OpenAI and Cohere, U.S. government agencies like the U.S. Army and the U.S. Air Force, and startups like Brex and OpenSea.
- Website: https://scale.com
- Industry: Software Development
- Company size: 501-1,000 employees
- Headquarters: San Francisco, California
- Type: Privately Held
- Founded: 2016
- Specialties: Computer Vision, Data Annotation, Sensor Fusion, Machine Learning, Autonomous Driving, APIs, Ground Truth Data, Training Data, Deep Learning, Robotics, Drones, NLP, and Document Processing
Locations
-
Primary
303 2nd St
South Tower, 5th FL
San Francisco, California 94107, US
Updates
-
Today we’re releasing two new SEAL LLM leaderboards: Agentic Tool Use (Chat) and Agentic Tool Use (Enterprise). The leaderboards measure how well leading LLMs, including OpenAI’s o1-preview, are able to use external tools, including a Python interpreter and Google Search. See how they rank: https://lnkd.in/gRbsmERa
-
Introducing: Humanity’s Last Exam. Help develop the most ambitious AI benchmark to date with Scale and the Center for AI Safety, and share in $500k in prizes: https://lnkd.in/gkiT-_F5

Expert-level models are rapidly evolving, and we need more difficult tests to measure their progress. At Scale, we are on a mission to craft benchmarks that truly measure AI's growing capabilities, and we need your expertise to make this possible. We are collecting the hardest and broadest set of questions ever assembled to evaluate how close we are to achieving expert-level AI across diverse domains.

If you have 5+ years of experience in a technical field or hold (or are pursuing) a PhD, submit your questions by November 1, 2024. The top 50 questions will earn $5,000 each, and the next 500 will earn $500 each. All selected questions grant optional co-authorship on the resulting paper.

Learn more and enter: agi.safe.ai/submit
-
Scale’s headed to National Harbor for the Air & Space Forces Association’s Air, Space & Cyber 2024! Swing by Booth MB135 to learn about Scale’s offerings for defense and government agencies, including demos of Donovan, tabular data, and LLM evaluation for DoD-relevant benchmarks. Learn more about Scale’s work with the public sector: https://lnkd.in/gC-v9T8g
-
LLMs have become more capable with better training and data, but they haven’t figured out how to “think” through problems at test time. The latest research from Scale finds that simply scaling inference compute (giving models more time or attempts to solve a problem) is not effective, because the attempts are not diverse enough from one another.

👉 Enter PlanSearch, a novel method for code generation that searches over high-level "plans" in natural language to encourage response diversity. PlanSearch enables the model to “think” through various strategies before generating code, making it more likely to solve the problem correctly. The Scale team tested PlanSearch on major coding benchmarks (HumanEval+, MBPP+, and LiveCodeBench) and found it consistently outperforms baselines, particularly in extended search scenarios. On LiveCodeBench, overall performance improves by over 16 percentage points, from 60.6% to 77%.

Here’s how it works:
✅ PlanSearch first generates high-level strategies, or "plans," in natural language before proceeding to code generation.
✅ These plans are then broken down into structured observations and solution sketches, allowing a wider exploration of possible solutions. This increases diversity, reducing the chance of the model recycling similar ideas.
✅ The plans are then combined before settling on the final idea and implementing the solution in code.

Enabling LLMs to reason more deeply at inference time via search is one of the most exciting directions in AI right now. When PlanSearch is paired with filtering techniques, such as submitting only solutions that pass initial tests, we get better results overall and achieve the top score of 77% with only 10 submission attempts.

Big thanks to all collaborators on this paper: Evan Wang, Hugh Zhang, Federico Cassano, Catherine Wu, Yunfeng Bai, William Song, Vaskar Nath, Ziwen H., Sean Hendryx, Summer Yue

👉 Read the full paper here: arxiv.org/abs/2409.03733
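The three steps above can be sketched in a few lines. This is a hypothetical, simplified illustration of the PlanSearch idea only, not Scale's implementation: `query_llm`, the prompt wording, and the observation count are all made-up stand-ins, and a real system would call an actual model and filter the attempts against public tests.

```python
# Minimal sketch of the PlanSearch structure: observations -> combined
# sketches -> code attempts. `query_llm` is a stub standing in for a model.
from itertools import combinations

def query_llm(prompt):
    # Stand-in for a real LLM call; returns a canned string so the
    # sketch is runnable without any model access.
    return f"response to: {prompt[:40]}"

def plan_search(problem, n_observations=3):
    # Step 1: generate high-level natural-language observations ("plans").
    observations = [
        query_llm(f"Observation {i} about: {problem}")
        for i in range(n_observations)
    ]
    # Step 2: combine pairs of observations into solution sketches.
    # Searching over combinations, rather than sampling code directly,
    # is what drives the diversity of the final attempts.
    sketches = [
        query_llm(f"Sketch a solution to {problem} using: {a} | {b}")
        for a, b in combinations(observations, 2)
    ]
    # Step 3: turn each sketch into a code attempt; a filtering step
    # (e.g. running public tests) would then select which to submit.
    return [query_llm(f"Write code for: {s}") for s in sketches]

attempts = plan_search("reverse a linked list")
print(len(attempts))  # 3 observations -> C(3,2) = 3 sketches -> 3 attempts
```

Note how the number of candidate attempts grows combinatorially with the number of observations, which is where the extended-search gains come from.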
-
We’ve added Mistral Large 2, GPT-4o (August 2024), and Gemini 1.5 Pro (August 27, 2024) to the SEAL LLM Leaderboards. See how they rank compared to leading LLMs across Coding, Instruction Following, Math, and Spanish domains: https://lnkd.in/g7N_Hs9p
-
📣 Happening tomorrow! Get ahead of the curve and learn how to credibly assess LLM performance. 👇

How do you know if a model is truly solving problems or just repeating answers from its training data? Scale machine learning engineer Hugh Zhang and his team tackled this question earlier this year and discovered that current benchmarks may be inadvertently compromising model evaluations due to data contamination. This matters because improving a model’s reasoning is key to advancing large language models overall, so it’s critical that the benchmarks we use accurately reflect LLMs' true reasoning capabilities.

In response, Hugh’s team developed GSM1k, a mathematical evaluation dataset comparable to GSM8k but built exclusively by humans, ensuring the models have never seen the problems they’re tested on. The researchers then used GSM1k to evaluate leading open-source and closed-source LLMs and compared the results against existing GSM8k performance to detect overfitting.

Join Hugh tomorrow, Wednesday, September 4 at 10 AM PT, for a tech talk on what the team found. He will also cover the latest trends in LLM performance evaluation and how you can stay ahead of the curve in the rapidly evolving field of AI benchmarking and evaluation. Can’t make it? Register to receive the recording.

Register here 👉 https://lnkd.in/gRFgMSVT
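The GSM8k-vs-GSM1k comparison boils down to a simple check: a model that scores much higher on the public benchmark than on the held-out, human-written analogue has likely memorized test problems. The sketch below is purely illustrative; the model names and accuracy numbers are made up and are not results from the paper.

```python
# Hypothetical illustration of the overfitting check behind GSM1k:
# compare accuracy on the public benchmark (GSM8k) against accuracy
# on a contamination-free analogue (GSM1k).
def overfit_gap(public_acc, held_out_acc):
    # A large positive gap suggests memorization of benchmark problems
    # rather than genuine mathematical reasoning.
    return public_acc - held_out_acc

# (model name: (GSM8k accuracy, GSM1k accuracy)) -- fabricated numbers.
models = {"model_a": (0.92, 0.78), "model_b": (0.85, 0.84)}

for name, (public, held_out) in models.items():
    print(name, round(overfit_gap(public, held_out), 2))
```

Here the fictional "model_a" shows a 14-point gap (likely contaminated), while "model_b" transfers cleanly to the unseen problems.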
-
“The most valuable thing for most businesses going forward is the proprietary data that they have. You can point that model at your data and be able to extract information and value from that that no one else can.”

Scale’s Managing Director and former CTO of the United States, Michael Kratsios, joined the You Might be Right podcast from the Howard H. Baker Jr. School of Public Policy and Public Affairs at the University of Tennessee, Knoxville. With hosts, former Tennessee Governors Philip Bredesen and Bill Haslam, he discussed:
• Whether AI has been as disruptive as we imagined a year ago
• Whether the United States is in a position to lead in AI
• How businesses can use AI to their competitive advantage

Listen → https://lnkd.in/dSBcT94m
-
Scale is on Forbes’ 2024 Cloud 100 list! The list recognizes the world’s top 100 cloud computing companies: https://lnkd.in/etRHGNfs Join us on our mission to accelerate the development of AI applications 👉 https://scale.com/careers