AI Submissions for Tue Apr 08 2025
Obituary for Cyc
Submission URL | 408 points | by todsacerdoti | 216 comments
In the realm of artificial intelligence, few names evoke as much intrigue as Douglas Lenat's, particularly due to his monumental Cyc project. Begun in 1985, Cyc aimed to crack the nut of artificial general intelligence through an ambitious strategy: scaling symbolic logic by encoding vast amounts of common sense knowledge. Lenat believed that for AI to truly learn and reason like humans, it needed a massive repository of world knowledge to draw from—a theory steeped in his experiences with past projects like the Automated Mathematician and EURISKO.
Despite Lenat's unwavering dedication and the project's staggering investment, which ultimately amounted to 30 million assertions, $200 million, and 2,000 person-years, the promised breakthroughs never materialized. The Cyc system, conceived as a groundbreaking "knowledge pump," never achieved the self-sustaining knowledge expansion and learning Lenat envisioned. Instead, its practical applications remained rooted in established AI methodologies, akin to those employed by tech giants like Oracle and IBM, offering no discernible competitive edge.
The project's long-term sustainability can partly be attributed to substantial funding from military and commercial sectors. However, this financial footing didn’t translate into revolutionary academic contributions or public success stories. As academia largely shunned Cyc for its inaccessibility and lack of performance on public benchmarks, the project became an insular endeavor, closed off from the outside innovation swirling around in AI research.
Despite the ultimate lack of success in achieving its grand ambitions, the Cyc project stands as a fascinating chapter in AI history—a testament to the challenges of scaling symbolic logic for general intelligence. The story of Cyc, and the "secret history" behind it, underline the complexities and limits of symbolic approaches in AI, raising questions—and lessons—for the field's ongoing evolution. Today, with much of Cyc's archival materials now available for public scrutiny on platforms like GitHub, the legacy of Douglas Lenat's quest sheds light on the perpetual struggle to teach machines to understand the world as deeply as humans do.
The Hacker News discussion revolves around the legacy of Douglas Lenat’s Cyc project and its contrast with modern AI approaches like Large Language Models (LLMs). Here’s a concise summary of key points:
Cyc’s Ambitions vs. Reality
- Cyc's Goal: Encode "common sense" via symbolic logic (e.g., understanding that a person can't shave with an unplugged electric razor); a toy illustration of this style of encoding appears after this list.
- Critique: Despite decades of effort and millions of hand-coded assertions, Cyc struggled with scalability, brittleness, and practical relevance. Participants noted its failure to achieve self-sustaining learning or outperform traditional rule-based systems (e.g., Prolog) or newer data-driven methods.
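To make the symbolic-logic approach concrete, here is a toy sketch in Python, not Cyc's actual CycL representation, of the kind of hand-written facts and rules such a system depends on. The facts, relation names, and rule are illustrative assumptions built around the razor example above.

```python
# Toy illustration of symbolic common-sense encoding (not real CycL).
# Facts are (subject, relation, object) triples; one hand-written rule
# derives whether an item is currently usable.

facts = {
    ("electric-razor", "requires", "electricity"),
    ("unplugged-razor", "is-a", "electric-razor"),
    ("unplugged-razor", "lacks", "electricity"),
}

def requires(item, resource):
    """True if `item` requires `resource`, directly or via an is-a link."""
    if (item, "requires", resource) in facts:
        return True
    return any(
        requires(parent, resource)
        for (subj, rel, parent) in facts
        if subj == item and rel == "is-a"
    )

def can_use(item):
    """An item is usable unless it lacks something it requires."""
    return not any(
        requires(item, res)
        for (subj, rel, res) in facts
        if subj == item and rel == "lacks"
    )

print(can_use("unplugged-razor"))  # False: requires electricity but lacks it
```

Cyc's bet was that millions of such hand-curated assertions would be needed before this kind of inference starts to resemble common sense.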
Symbolic AI vs. LLMs
- Symbolic AI (Cyc/GOFAI): Relied on explicit rules and logic but was limited by manual curation and inability to handle real-world ambiguity. Examples like SAT solvers and planners were praised for structured problems but criticized for lacking flexibility.
- LLMs: Praised for scaling via vast data and "fuzzy" contextual understanding. While not perfect, LLMs excel at generating plausible outputs by recognizing patterns, even if they lack true reasoning. Critics argued LLMs still struggle with logical rigor (e.g., transitive relations).
Hybrid Approaches
- Bridging the Gap: Some suggested combining LLMs (for natural language understanding) with symbolic systems (for structured reasoning). For example, using LLMs to translate natural language into Prolog/Python code or structured formats (JSON) for traditional computation.
- Use Cases: LLMs could handle ambiguous tasks (e.g., parsing nutrition labels), while symbolic systems manage precise logic (e.g., scheduling, math); a rough sketch of this division of labor follows below.
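As a rough sketch of that division of labor, and assuming a generic `call_llm` helper that stands in for whatever model API is actually used, the pattern looks like this: the LLM only turns messy text into structured JSON, and ordinary deterministic code does the exact computation afterwards.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (hosted or local model)."""
    raise NotImplementedError

def parse_nutrition_label(label_text: str) -> dict:
    # Ambiguous part: let the LLM extract structured fields from free text.
    prompt = (
        "Return a JSON object with serving_size_g, calories, and protein_g "
        f"for this nutrition label:\n{label_text}"
    )
    return json.loads(call_llm(prompt))

def protein_per_100g(label: dict) -> float:
    # Precise part: exact arithmetic with no model in the loop.
    return 100.0 * label["protein_g"] / label["serving_size_g"]
```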
Legacy and Lessons
- Cyc’s Legacy: A bold experiment highlighting the challenges of manual knowledge engineering. Its closed, insular development contrasted with today’s open, collaborative AI research.
- Modern Parallels: Debates echoed around whether LLMs are merely "stochastic parrots" or stepping stones toward AGI. Participants acknowledged that neither symbolic nor statistical methods alone suffice for general intelligence.
Final Takeaway
The discussion underscores a shift from rigid symbolic systems (Cyc) to flexible, data-driven models (LLMs), while acknowledging that hybrid approaches—leveraging the strengths of both paradigms—might be the future of AI. Cyc remains a cautionary tale about scalability and the limits of human-curated knowledge, even as LLMs redefine what’s possible.
Solving a “Layton Puzzle” with Prolog
Submission URL | 101 points | by Tomte | 27 comments
Have you ever wanted to solve a brain-tickling puzzle with logic programming? Here's a delightful tale of solving a "Layton Puzzle" in Prolog, the classic logic programming language. Imagine a true/false test with 10 questions: the answers and scores of three students are known, and the goal is to deduce the score of a fourth student. Pablo Meier once tackled a similar challenge, using Prolog to crack it, but this tale goes a step further with an elegant solution in fewer lines of code.
Here's the setup: you know the three students' answers and scores, and need to deduce the elusive fourth score. Diving into Prolog, the programmer begins by importing the modules needed for inequalities and constraint handling, then uses Prolog's pattern matching to define a recursive predicate that scores an answer sheet against a key, cleverly sidestepping the procedural if-else chains a conventional program would use.
The magic of Prolog shines here with its bidirectionality; you can use the program both to verify a known score and to deduce an unknown one. Through predicates and mappings, Prolog also generates potential answer keys, further showcasing the elegant and multipurpose nature of the language.
When the program runs, it finds four possible keys all producing the same score for the mystery student, revealing that the puzzle isn't broken but intriguingly self-consistent. This logic solution not only solves the problem but does so with remarkable efficiency—15 lines to accomplish what took 80 lines in another version. It's puzzles like these and solutions like this that show the enduring magic of logic programming.
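For readers who prefer plain enumeration to logic programming, the same deduction can be brute-forced in a few lines of Python: enumerate all 2^10 candidate answer keys, keep those consistent with the three known scores, and read off the fourth student's score. The answer strings and scores below are placeholders, not the puzzle's actual data.

```python
from itertools import product

def score(answers: str, key: str) -> int:
    """Each of the 10 true/false questions is worth 10 points."""
    return 10 * sum(a == k for a, k in zip(answers, key))

# Placeholder data ('T'/'F' per question); substitute the answers and
# scores given in the puzzle.
known = [
    ("TTFTFTFTFT", 70),
    ("TFTTFTTTFT", 50),
    ("TTFTTFFTTT", 30),
]
fourth = "FTFTTFTTFF"  # the student whose score we want to deduce

# Keep only the answer keys consistent with every known score.
keys = []
for combo in product("TF", repeat=10):
    key = "".join(combo)
    if all(score(ans, key) == s for ans, s in known):
        keys.append(key)

print(len(keys), {score(fourth, key) for key in keys})
# In the actual puzzle, four keys survive and all give the same score.
```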
So if you find yourself pondering logic puzzles or practical applications with Prolog, keep in mind the potential simplicity behind seemingly complex challenges. Plus, who doesn't want a chance to show up a fellow programmer with fewer lines of smarter code?
The Hacker News discussion revolves around solving a Professor Layton logic puzzle using Prolog and alternative methods, with key points summarized below:
Technical Solutions
- Prolog & Constraint Programming:
  - Users shared code snippets using Prolog's clpfd library to model the puzzle, emphasizing brevity (e.g., a 15-line solution). The approach leverages constraints to find valid answer keys, revealing Colin's score as 60 across four possible configurations.
  - Debates arose about Prolog's practicality, comparing implementations (SWI-Prolog, SICStus) and praising its bidirectional reasoning for logic puzzles.
- Z3 in Python:
  - A user demonstrated a concise 7-line solution using Z3, a theorem prover, to model the problem declaratively. Others highlighted its simplicity compared to Prolog (example blog post); a minimal sketch in the same spirit appears below.
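A minimal Z3 sketch in that declarative spirit might look like the following; the answer sheets and scores are placeholders for the puzzle's real data, and the loop enumerates every answer key consistent with the constraints.

```python
from z3 import Bools, Solver, Sum, If, Or, sat

# One Boolean per question: True means the correct answer is "true".
key = Bools(" ".join(f"q{i}" for i in range(10)))

def score(answers, key):
    """Score an answer sheet against the unknown key (10 points each)."""
    return Sum([If(k == a, 10, 0) for k, a in zip(key, answers)])

# Placeholder answer sheets and scores; substitute the puzzle's data.
mary = [True, True, False, True, False, True, False, True, False, True]
dan  = [True, False, True, True, False, True, True, True, False, True]

s = Solver()
s.add(score(mary, key) == 70, score(dan, key) == 50)

while s.check() == sat:            # enumerate all consistent keys
    m = s.model()
    print([m.evaluate(q) for q in key])
    s.add(Or([q != m.evaluate(q) for q in key]))  # exclude this key
```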
Human-Logic Approaches
- A manual, heuristic-driven breakdown analyzed overlaps in student answers to deduce Colin’s score. For instance, comparing Mary’s and Dan’s incorrect answers narrowed down possible correct keys, concluding Colin scored 60 by leveraging shared patterns.
Nostalgia & Puzzle Design
- Fans reminisced about the Professor Layton game series (specifically Diabolical Box), praising its puzzle design. Some linked the challenge to classic logic puzzles like the Zebra Puzzle, often tackled with Prolog.
Broader Discussions
- Prolog’s Relevance: A tangent debated Prolog’s decline in popularity despite its strengths in constraint-solving, referencing a 2023 thread on its historical context.
- Educational Value: Users emphasized puzzles like these for teaching constraint programming and reducing trial-and-error through structured logic.
Key Takeaways
- The puzzle highlights the elegance of logic programming (Prolog) and modern solvers (Z3).
- While Prolog’s syntax and ecosystem quirks drew criticism, its suitability for combinatorial problems remains unmatched.
- Community appreciation for both computational and manual problem-solving underscored the puzzle’s design and educational appeal.
Neural Graffiti – Liquid Memory Layer for LLMs
Submission URL | 103 points | by vessenes | 25 comments
In today's tech buzz, a fascinating project called "Neural Graffiti" is making waves on Hacker News. Developed by the user "babycommando," this innovative approach takes the artistry of graffiti and marries it with the neuroplasticity of the brain, applied to large language models (LLMs). This experimental layer, known as the "Spray Layer," is positioned within the final stages of transformer model inference, requiring no fine-tuning or retraining.
Inspired by liquid neural networks, Neural Graffiti allows for real-time behavior modulation of pre-trained models. By injecting memory traces directly into vector embeddings, it subtly changes the "thinking" of a model over time, encouraging the model to lean into certain concepts more naturally as it interacts. This method is akin to a "soft whisper" influencing the model's perception and internal state.
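The project's code is open, but as a rough mental model (and explicitly not the project's actual implementation), a memory-trace layer of this kind can be pictured as an exponentially decaying running vector blended into the model's final hidden states at inference time. Everything below, including the class name and the decay and strength values, is an illustrative assumption.

```python
import torch
import torch.nn as nn

class SprayLayerSketch(nn.Module):
    """Illustrative sketch of a 'spray'-style memory layer: a running trace
    of past activations gently nudges new final-layer hidden states.
    This is a guess at the general idea, not Neural Graffiti's code."""

    def __init__(self, hidden_dim: int, decay: float = 0.99, strength: float = 0.05):
        super().__init__()
        self.decay = decay          # how slowly the memory trace fades
        self.strength = strength    # how strongly it biases new outputs
        self.register_buffer("memory", torch.zeros(hidden_dim))

    @torch.no_grad()
    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, hidden_dim) final-layer states before the LM head.
        # Fold the current activations into the running memory trace...
        self.memory.mul_(self.decay).add_((1 - self.decay) * hidden.mean(dim=0))
        # ...and let the trace steer subsequent generations slightly.
        return hidden + self.strength * self.memory
```

This amounts to an EMA-style update, which is also how several commenters in the discussion below characterized the approach.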
What makes this technique intriguing is its potential for developing AI with a more active personality and enhanced curiosity, akin to a digital persona finding itself. However, the developers note that while this may not be suited for commercial deployments, it's a creative exploration in AI self-awareness and memory.
The project is open-source, hosted on GitHub, and the demo is available on Google Colab, inviting AI enthusiasts and researchers to explore its potential further. As AI continues to evolve, Neural Graffiti is a step towards creating AI entities with unique character traits and rich internal mental landscapes.
The Hacker News discussion on the Neural Graffiti project reflects a mix of technical curiosity, skepticism, and broader debates about AI behavior. Key points from the comments include:
- Technical Scrutiny: Users debated the mechanics of the "Spray Layer," comparing it to existing methods like LoRA (Low-Rank Adaptation) and EMA (Exponential Moving Average) vectors. Some questioned whether the approach genuinely introduces novel memory retention or merely resembles established techniques, with discussions around random weight initialization and reservoir computing.
- Skepticism About Efficacy: Several users tested the demo and expressed doubts about its ability to retain context or influence model behavior meaningfully. One user noted the model failed to remember concepts even after repeated prompts, sparking debates about whether observed changes were due to actual memory or random artifacts.
- Comparisons to Commercial AI: The conversation veered into critiques of mainstream models like ChatGPT and Claude, highlighting issues like "sycophancy" (overly agreeable responses) and inconsistent behavior. Users contrasted OpenAI's approach with alternatives, suggesting Claude handles conversations more naturally.
- Criticism of Novelty: Some dismissed Neural Graffiti as a "buzzword-laden" project, questioning its innovation compared to frequent reinventions in the industry. Others acknowledged the creative naming (e.g., "liquid memory layer") but remained unconvinced of its technical substance.
- Safety and Ethics: Concerns were raised about AI models inadvertently producing harmful outputs, with mentions of challenges in moderating responses related to controversial topics or figures.
Overall, the discussion highlighted both intrigue about Neural Graffiti’s experimental goals and skepticism about its execution, while broader themes around AI behavior and industry trends dominated tangential threads.
Vishap Oberon Compiler
Submission URL | 18 points | by sevoves | 7 comments
The Vishap Oberon Compiler (voc) has proudly made its presence felt on Hacker News! This open-source project brings the elegance of the Oberon-2 language to modern operating systems, including Linux, BSD, Android, macOS, and Windows, by leveraging a C backend with popular compilers like gcc, clang, and tcc.
Complete with libraries borrowed from the Ulm, oo2c, and Ofront Oberon compilers, Vishap’s implementation adheres to the Oakwood Guidelines, ensuring robust functionality for all users. The documentation is thorough, offering a step-by-step installation process and examples to help users get started with compiling their first Oberon programs.
Whether you're a Linux user, BSD aficionado, or even a Windows enthusiast, the Vishap Oberon Compiler has you covered with its comprehensive installation instructions. The community around this project has contributed code and expertise, elevating it to a polished and capable tool for anyone interested in exploring the Oberon-2 language.
With its open-source nature under the GPL-3.0 license, this project invites developers to tinker, improve, and potentially contribute back to the ecosystem. If you want a taste of Oberon's simplicity and power, Vishap's Compiler is right up your alley. Visit the project repository to start compiling, and perhaps contribute your own code to this active community!
Summary of Discussion:
The discussion around the Vishap Oberon Compiler highlights efforts to modernize Oberon-2 and integrate it into contemporary systems while acknowledging historical challenges. Key points include:
- UNIX Legacy & Vision: A user alludes to the difficulty of transcending UNIX-based paradigms, metaphorically framing it as an unfulfilled task for modern systems. This sets a backdrop for Oberon's role in exploring alternative approaches.
- Compiler Integration & Standards:
  - The compiler's adherence to the Oakwood Guidelines ensures system-independent functionality, with libraries like NAppGUI and SDL bindings enabling cross-platform graphical interfaces.
  - Discussions mention native compilation aspirations and projects like Dusk OS21, which aim to embed Oberon into graphical systems using SDL.
- Technical Challenges:
  - Users note transpiling efforts (e.g., leveraging C backends) and compatibility hurdles, such as integrating OS-specific functions (e.g., file operations) via libraries like libc and HTTP packages.
  - Existing codebases (e.g., Vishap's compiler and VIPack) are cited for enabling practical workflows, though scattered compatibility layers and code portability remain challenges.
- Community Contributions:
  - Appreciation is shown for alternative Oberon implementations (e.g., Rochus Keller's OberonSystem3) and tools that blend Oberon with modern syntax or libraries.
  - Projects like HTTP libraries and package managers demonstrate ongoing efforts to expand Oberon's ecosystem.
The conversation underscores a blend of nostalgia for Oberon’s design principles and pragmatic steps to evolve it—using modern tooling, community-driven libraries, and cross-platform frameworks—to stay relevant in today’s programming landscape.
Meta got caught gaming AI benchmarks
Submission URL | 328 points | by pseudolus | 157 comments
Meta recently found itself in hot water after it was revealed that the tech giant may have gamed AI benchmarks to make its new Llama 4 models appear more competitive. Over the weekend, Meta released two new models, Scout and Maverick, asserting that Maverick outperformed OpenAI's GPT-4o and others on LMArena, a popular comparison site for AI outputs. However, eagle-eyed AI researchers discovered a crucial bit of information buried in Meta’s documentation: the version of Maverick tested was an "experimental chat version" specifically designed to excel in conversational tasks.
This revelation prompted LMArena to clarify its policies, emphasizing the importance of fair and reproducible evaluations. Meta's spokesperson responded, highlighting their extensive experimentation with custom versions of their AI models. While not violating direct LMArena rules, the incident raises serious concerns about the integrity of AI benchmarks. Developers often rely on these scores to make informed decisions about which AI models to use, but if companies like Meta submit souped-up versions that aren't publicly available, these benchmarks lose their reliability as tools for assessing true performance.
The timing of the release further fueled speculation, as it dropped on a Saturday—a rarity in the tech world. Meta CEO Mark Zuckerberg simply noted, "That’s when it was ready." The mixed messages and questionable tactics have left many in the AI community scratching their heads, pondering whether benchmarks are becoming more about gaming the system rather than providing genuine insights into AI capabilities. This saga underscores the increasing competitiveness in AI development as companies vie for leadership and recognition.
The Hacker News discussion revolves around skepticism toward Meta’s AI benchmarking practices and connects it to broader criticisms of the company’s internal culture and management strategies. Key themes include:
- Benchmark Gaming Concerns: Users speculate that Meta's claim of outperforming GPT-4o with its experimental "Maverick" model may reflect a focus on optimizing for benchmarks rather than genuine performance. This ties to fears that benchmarks are becoming marketing tools instead of reliable evaluations.
- Cultural Critiques:
  - "Move Fast and Break Things" Legacy: Commenters argue Meta's historical emphasis on speed over quality fosters rushed, half-baked releases. This leads to technical debt and undermines long-term product reliability.
  - Performance-Driven Pressures: Employees are incentivized to prioritize metrics (e.g., launching features quickly) over quality to meet review cycles, exacerbating issues like technical debt and unstable AI models.
- Management and Incentive Structures:
  - Layoffs and high turnover are critiqued for reducing institutional knowledge, leaving fewer skilled workers to manage complex projects.
  - The Hawthorne Effect and McNamara Fallacy are cited: management's focus on short-term metrics (e.g., quarterly results) creates temporary performance boosts but neglects sustainable progress, leading to employee burnout and rushed releases.
- Data Quality Questions: Skepticism arises about whether Meta's AI improvements stem from better data or are merely tactics to game benchmarks. Users question if the company's internal data management can support claims of superior model performance.
- Broader Tech Industry Parallels: Comparisons to Netflix's high-pressure culture highlight systemic issues in tech, where aggressive performance incentives prioritize rapid delivery over innovation and employee well-being.
Conclusion: The discussion reflects distrust in Meta’s benchmarking claims, linking them to a culture that prioritizes short-term wins and metrics over transparency and quality. Critics argue this undermines both the reliability of AI evaluations and the sustainability of Meta’s advancements.
Cogito Preview: IDA as a path to general superintelligence
Submission URL | 38 points | by parlam | 3 comments
In an exciting leap towards achieving general superintelligence, Cogito has unveiled its latest suite of large language models (LLMs) in various sizes from 3 billion to 70 billion parameters. These models, launched under an open license, set a new benchmark by outperforming all existing open-source models of equivalent sizes, even beating advanced models like the recently released Llama 4 109B MoE.
The magic behind these superior LLMs is the innovative Iterated Distillation and Amplification (IDA) training strategy. IDA offers a pathway to break free from the inherent limitations set by human overseers, ensuring consistent self-improvement without their bounds. This method enhances the model's reasoning by iteratively amplifying intelligence through computational power before distilling these capabilities back into the model, creating a powerful feedback loop of growth.
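In schematic form, the loop the announcement describes might be sketched as below; the `amplify` and `distill` callables are placeholders, since Cogito has not published the training procedure itself.

```python
from typing import Callable, List

def ida_loop(
    model,
    tasks: List[str],
    amplify: Callable,   # e.g. extra inference-time compute: search, long reasoning, self-critique
    distill: Callable,   # e.g. fine-tuning the model on the amplified answers
    rounds: int = 3,
):
    """Schematic Iterated Distillation and Amplification loop (illustrative only)."""
    for _ in range(rounds):
        # Amplification: spend more compute per task than a single forward
        # pass, producing answers better than the current model's defaults.
        improved = [amplify(model, task) for task in tasks]
        # Distillation: fold those improved answers back into the weights,
        # so the next round starts from a stronger base model.
        model = distill(model, tasks, improved)
    return model
```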
In practice, these models can answer directly like traditional LLMs and can also engage in self-reflective reasoning before responding, making them robust tools for complex problem-solving and agentic tasks. As a testament to the method's efficiency, the Cogito team developed these models in just over two months, indicating the scalability and time-effectiveness of their approach.
The excitement doesn’t stop here—larger models, including the ambitious 671 billion parameter model, are on the horizon. For developers and researchers eagerly awaiting to tinker with these cutting-edge models, they're readily accessible on platforms like Huggingface or through APIs on Fireworks AI and Together AI.
The success of Cogito’s models in industry-standard benchmarks not only validates IDA's potential but paves a structured path to transcend today's AI intelligence limits, nudging closer to the dawn of general superintelligence. As this journey unfolds, these models promise to adapt well beyond dry benchmarks, aiming to deliver powerful real-world solutions tailored to user needs.
Summary of Discussion:
- Ethical Concerns (rndphs): Commenters express apprehension about developing AGI (Artificial General Intelligence) targeting superintelligence, citing ethical risks. Superintelligent AI, they argue, could displace humanity, a danger widely acknowledged among philosophers and researchers.
- Technical Explanation of IDA (Reubend): The Iterated Distillation and Amplification (IDA) method is outlined:
  - Amplification: Leveraging computational power to create higher intelligence.
  - Distillation: Encoding these advanced capabilities into model parameters.
  Reubend highlights IDA's potential to enable self-improvement beyond human-generated training data, possibly leading to superintelligence. They praise the open-source availability (Hugging Face, Ollama, etc.) as a significant step forward.
- Skepticism and Hype-Checking (bbr): Skeptics question the submission's claims, suspecting overhyped marketing or gatekeeping. They demand concrete benchmarks and evidence over promotional screenshots. Concerns are raised about IDA's unpredictability, including the risk of an uncontrollable "intelligence explosion" surpassing human oversight.
- Mixed Reactions: While some applaud the technical ambition and open-access approach, others urge caution, emphasizing the need for transparency and validation. The discussion reflects both excitement for progress and wariness of unproven claims or existential risks.
Key Themes:
- Ethical dilemmas of superintelligent AI.
- Technical optimism vs. skepticism about IDA’s feasibility.
- Calls for empirical proof over marketing.
- Open-source accessibility as a positive step.
Tom and Jerry One-Minute Video Generation with Test-Time Training
Submission URL | 79 points | by walterbell | 16 comments
Introducing a breakthrough in video generation technology, researchers have innovatively incorporated Test-Time Training (TTT) layers into pre-trained Transformers, allowing them to craft more coherent and expressive one-minute videos from text prompts. This development marks a significant leap from traditional self-attention layers, which struggle with long contexts, and alternatives like Mamba layers that falter with intricate stories. Utilizing a dataset drawn from classic Tom and Jerry cartoons, the experiment revealed that TTT layers vastly improved video coherence and storytelling, outperforming other methods such as Mamba 2, Gated DeltaNet, and sliding-window attention by 34 Elo points in human evaluations.
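Very roughly, a test-time-training layer maintains a small set of fast weights that are updated by a gradient step on a self-supervised loss as each chunk of the sequence streams through, so the layer adapts to the current video's context on the fly. The sketch below is a simplified illustration of that idea with an assumed denoising objective and chunk size, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TTTLayerSketch(nn.Module):
    """Simplified test-time-training layer: an inner linear map is updated by
    gradient descent on a self-supervised loss during inference.
    Illustrative only; the paper's layers are more elaborate."""

    def __init__(self, dim: int, inner_lr: float = 0.1, chunk: int = 16):
        super().__init__()
        self.inner_lr = inner_lr
        self.chunk = chunk
        self.init_weight = nn.Parameter(torch.eye(dim))  # starting fast weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim). Fast weights W act as the layer's "hidden state".
        W = self.init_weight
        outputs = []
        for part in x.split(self.chunk, dim=0):
            with torch.enable_grad():
                W = W.detach().requires_grad_(True)
                noisy = part + 0.1 * torch.randn_like(part)   # corrupted view
                loss = F.mse_loss(noisy @ W, part)             # denoising objective
                (grad,) = torch.autograd.grad(loss, W)
            W = W - self.inner_lr * grad      # one inner gradient step per chunk
            outputs.append(part @ W)          # output uses the freshly updated W
        return torch.cat(outputs, dim=0)
```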
While promising, the generated content still grapples with certain artifacts, attributed to the limited capabilities of the current 5B model. Characters and scenes occasionally exhibit inconsistencies, such as color variations and unrealistic motion physics. Despite these challenges, the approach showcases strong potential for generating longer and more complex video narratives in the future.
Credit goes to a collaboration among researchers from institutions including NVIDIA, Stanford, UCSD, UC Berkeley, and UT Austin, with support from Hyperbolic Labs. As a proof of concept, this advancement lays the groundwork for future developments in video generation, pushing boundaries towards refined storytelling via AI.
Summary of Discussion:
- Technical Feasibility and Cost: Users debated the computational resources required (50+ hours on 256 H100 GPUs) to train the model. While some found this "impressively low" for research-grade work, others questioned the expense, especially for generating short video clips. Skyylr noted skepticism about whether the cost justifies the output quality, while brgrkng contrasted these costs with the resource constraints faced by human creators in traditional industries.
- Impact on Creativity and Jobs: A heated debate emerged about AI's role in creativity. Critics like brgrkng argued that AI-generated content risks displacing human creativity, reducing art to "copy-pasting" and undermining industries like film (e.g., LOTR, Star Wars). Others, however, countered that AI could democratize content creation, enabling new forms of expression despite concerns about homogenization.
- Quality and Progress: While kfrwsmn acknowledged flaws in current outputs (e.g., artifacts, inconsistencies), they praised the rapid progress compared to older models. Skeptics like nmrsp questioned whether video-generation advancements truly represent meaningful progress, dismissing hype around "unlimited customizable content" as overblown.
- Ethical and Cultural Implications: quantumHazer raised concerns about personalized content creating "filter bubbles," limiting exploration of diverse ideas. spfrdmms drew a literary parallel to Steven Millhauser's Cat 'N' Mouse, hinting at deeper questions about AI's role in storytelling and cultural narratives.
- Future Potential: Despite criticisms, many acknowledged the project's promise for generating longer, more coherent narratives as models scale. Test-Time Training (TTT) was highlighted as a practical step forward, though its real-world utility remains to be seen.
Key Takeaway: The discussion reflects both optimism about AI’s potential to revolutionize video generation and skepticism about its costs, ethical implications, and impact on human creativity. Critics stress the need to balance technical progress with cultural and economic considerations.