diff --git a/README.md b/README.md index c51dba987..5d16fd988 100644 --- a/README.md +++ b/README.md @@ -30,7 +30,6 @@

- # Table of contents - [Introduction](#introduction) - [Installation](#installation) @@ -87,8 +86,6 @@ SELECT pgml.transform( ] ``` - - **Sentiment Analysis** *SQL query* @@ -117,7 +114,6 @@ SELECT pgml.transform( - [Millions of transactions per second](https://postgresml.org/blog/scaling-postgresml-to-one-million-requests-per-second) - [Horizontal scalability](https://github.com/postgresml/pgcat) - **Training a classification model** *Training* @@ -242,7 +238,6 @@ SELECT pgml.transform( ``` The default model used for text classification is a fine-tuned version of DistilBERT-base-uncased that has been specifically optimized for the Stanford Sentiment Treebank dataset (sst2). - *Using specific model* To use one of the over 19,000 models available on Hugging Face, include the name of the desired model and `text-classification` task as a JSONB object in the SQL query. For example, if you want to use a RoBERTa model trained on around 40,000 English tweets that has POS (positive), NEG (negative), and NEU (neutral) labels for its classes, include this information in the JSONB object when making your query. @@ -681,7 +676,6 @@ SELECT pgml.transform( Sampling methods involve selecting the next word or sequence of words at random from the set of possible candidates, weighted by their probabilities according to the language model. This can result in more diverse and creative text, as well as avoiding repetitive patterns. In its most basic form, sampling means randomly picking the next word $w_t$ according to its conditional probability distribution: $$ w_t \sim P(w_t|w_{1:t-1}) $$ - However, the randomness of the sampling method can also result in less coherent or inconsistent text, depending on the quality of the model and the chosen sampling parameters such as temperature, top-k, or top-p. Therefore, choosing an appropriate sampling method and parameters is crucial for achieving the desired balance between creativity and coherence in generated text. You can pass `do_sample = True` in the arguments to use sampling methods. It is recommended to alter `temperature` or `top_p` but not both. @@ -821,7 +815,6 @@ SELECT * from tweet_embeddings limit 2; |"QT @user In the original draft of the 7th book, Remus Lupin survived the Battle of Hogwarts. #HappyBirthdayRemusLupin"|{-0.1567948312,-0.3149209619,0.2163394839,..}| |"Ben Smith / Smith (concussion) remains out of the lineup Thursday, Curtis #NHL #SJ"|{-0.0701668188,-0.012231146,0.1304316372,.. }| - ## Step 2: Indexing your embeddings using different algorithms After you've created embeddings for your data, you need to index them using one or more indexing algorithms. There are several different types of indexing algorithms available, including B-trees, k-nearest neighbors (KNN), and approximate nearest neighbors (ANN). The specific type of indexing algorithm you choose will depend on your use case and performance requirements. For example, B-trees are a good choice for range queries, while KNN and ANN algorithms are more efficient for similarity searches. @@ -860,7 +853,6 @@ SELECT * FROM items, query ORDER BY items.embedding <-> query.embedding LIMIT 5; |5 RT's if you want the next episode of twilight princess tomorrow| |Jurassic Park is BACK! New Trailer for the 4th Movie, Jurassic World -| - - # LLM Fine-tuning In this section, we will provide a step-by-step walkthrough for fine-tuning a Large Language Model (LLM) for different tasks.
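As a preview of where this walkthrough lands, here is a minimal sketch of a `pgml.tune` call. The project name, training view, and parameter values are illustrative assumptions; each argument is covered step by step in the sections that follow.

```sql
-- A hedged sketch of LLM fine-tuning with pgml.tune; the project name,
-- relation, and values are illustrative, not prescriptive.
SELECT pgml.tune(
    'imdb_review_sentiment',                  -- hypothetical project name
    task => 'text-classification',
    relation_name => 'pgml.imdb_train_view',  -- hypothetical training view
    model_name => 'distilbert-base-uncased',  -- base model to fine-tune
    test_size => 0.2,                         -- hold out 20% for evaluation
    test_sampling => 'last'
);
```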
@@ -1036,7 +1027,6 @@ Fine-tuning a language model requires careful consideration of training paramete * hub_token: Your Hugging Face API token to push the fine-tuned model to the Hugging Face Model Hub. Replace "YOUR_HUB_TOKEN" with the actual token. * push_to_hub: A boolean flag indicating whether to push the model to the Hugging Face Model Hub after fine-tuning. - #### 5.3 Monitoring During training, metrics like loss and gradient norm will be printed as info and also logged in the pgml.logs table. Below is a snapshot of such output. @@ -1151,7 +1141,6 @@ Here is an example pgml.transform call for real-time predictions on the newly mi Time: 175.264 ms ``` - **Batch predictions** ```sql @@ -1247,7 +1236,6 @@ SELECT pgml.tune( By following these steps, you can effectively restart training from a previously trained model, allowing for further refinement and adaptation of the model based on new requirements or insights. Adjust parameters as needed for your specific use case and dataset. - ## 8. Hugging Face Hub vs. PostgresML as Model Repository We utilize the Hugging Face Hub as the primary repository for fine-tuning Large Language Models (LLMs). Leveraging the HF hub offers several advantages: diff --git a/pgml-apps/pgml-chat/README.md b/pgml-apps/pgml-chat/README.md index aed2ae173..737a82914 100644 --- a/pgml-apps/pgml-chat/README.md +++ b/pgml-apps/pgml-chat/README.md @@ -14,7 +14,6 @@ Before you begin, make sure you have the following: - Python version >=3.8 - (Optional) OpenAI API key - # Getting started 1. Create a virtual environment and install `pgml-chat` using `pip`: ```bash @@ -104,7 +103,6 @@ model performance, as well as integrated notebooks for rapid iteration. Postgres If you have any further questions or need more information, please feel free to send an email to team@postgresml.org or join the PostgresML Discord community at https://discord.gg/DmyJP3qJ7U. ``` - ### Slack **Setup** @@ -128,7 +126,6 @@ Once the Slack app is running, you can interact with the chatbot on Slack as sho ![Slack Chatbot](./images/slack_screenshot.png) - ### Discord **Setup** @@ -194,8 +191,6 @@ pip install . 4. Check the [roadmap](#roadmap) for features that you would like to work on. 5. If you are looking for features that are not included here, please open an issue and we will add it to the roadmap. - - # Roadmap - ~~Use a collection for chat history that can be retrieved and used to generate responses.~~ - Support for file formats like rst, html, pdf, docx, etc.
diff --git a/pgml-cms/blog/.gitbook/assets/landscape.png b/pgml-cms/blog/.gitbook/assets/landscape.png new file mode 100644 index 000000000..18560da84 Binary files /dev/null and b/pgml-cms/blog/.gitbook/assets/landscape.png differ diff --git a/pgml-cms/blog/.gitbook/assets/machine-learning-platform.png b/pgml-cms/blog/.gitbook/assets/machine-learning-platform.png new file mode 100644 index 000000000..247da5930 Binary files /dev/null and b/pgml-cms/blog/.gitbook/assets/machine-learning-platform.png differ diff --git a/pgml-cms/blog/.gitbook/assets/open-weight-models.png b/pgml-cms/blog/.gitbook/assets/open-weight-models.png new file mode 100644 index 000000000..f3571634c Binary files /dev/null and b/pgml-cms/blog/.gitbook/assets/open-weight-models.png differ diff --git a/pgml-cms/blog/SUMMARY.md b/pgml-cms/blog/SUMMARY.md index 83810411d..6c419021b 100644 --- a/pgml-cms/blog/SUMMARY.md +++ b/pgml-cms/blog/SUMMARY.md @@ -1,13 +1,14 @@ # Table of contents * [Home](README.md) -* [Meet us at the 2024 Postgres Conference!](meet-us-at-the-2024-postgres-conference.md) -* [The 1.0 SDK is Here](the-1.0-sdk-is-here.md) -* [Using PostgresML with Django and embedding search](using-postgresml-with-django-and-embedding-search.md) -* [PostgresML is going multicloud](postgresml-is-going-multicloud.md) * [Introducing the OpenAI Switch Kit: Move from closed to open-source AI in minutes](introducing-the-openai-switch-kit-move-from-closed-to-open-source-ai-in-minutes.md) * [Speeding up vector recall 5x with HNSW](speeding-up-vector-recall-5x-with-hnsw.md) * [How-to Improve Search Results with Machine Learning](how-to-improve-search-results-with-machine-learning.md) +* [LLMs are commoditized; data is the differentiator](llms-are-commoditized-data-is-the-differentiator.md) +* [PostgresML is going multicloud](postgresml-is-going-multicloud.md) +* [The 1.0 SDK is Here](the-1.0-sdk-is-here.md) +* [Using PostgresML with Django and embedding search](using-postgresml-with-django-and-embedding-search.md) +* [Meet us at the 2024 Postgres Conference!](meet-us-at-the-2024-postgres-conference.md) * [pgml-chat: A command-line tool for deploying low-latency knowledge-based chatbots](pgml-chat-a-command-line-tool-for-deploying-low-latency-knowledge-based-chatbots-part-i.md) * [Announcing Support for AWS us-east-1 Region](announcing-support-for-aws-us-east-1-region.md) * [LLM based pipelines with PostgresML and dbt (data build tool)](llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md) diff --git a/pgml-cms/blog/announcing-support-for-aws-us-east-1-region.md b/pgml-cms/blog/announcing-support-for-aws-us-east-1-region.md index 54d27c2ba..55008a223 100644 --- a/pgml-cms/blog/announcing-support-for-aws-us-east-1-region.md +++ b/pgml-cms/blog/announcing-support-for-aws-us-east-1-region.md @@ -27,12 +27,8 @@ To demonstrate the impact of moving the data closer to your application, we've c
-\\ -
-\\ - ## Using the New Region To take advantage of latency savings, you can [deploy a dedicated PostgresML database](https://postgresml.org/signup) in `us-east-1` today. We make it as simple as filling out a very short form and clicking "Create database". diff --git a/pgml-cms/blog/data-is-living-and-relational.md b/pgml-cms/blog/data-is-living-and-relational.md index 806e14fc2..d285a3770 100644 --- a/pgml-cms/blog/data-is-living-and-relational.md +++ b/pgml-cms/blog/data-is-living-and-relational.md @@ -56,6 +56,4 @@ Meanwhile, denormalized datasets: We think it’s worth attempting to move the machine learning process and modern data architectures beyond the status quo. To that end, we’re building the PostgresML Gym, a free offering, to provide a test bed for real world ML experimentation, in a Postgres database. Your personal Gym will include the PostgresML dashboard, several tutorial notebooks to get you started, and access to your own personal PostgreSQL database, supercharged with our machine learning extension. - - Many thanks and ❤️ to all those who are supporting this endeavor. We’d love to hear feedback from the broader ML and Engineering community about applications and other real world scenarios to help prioritize our work. diff --git a/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md b/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md index e14dd18f6..317a0d346 100644 --- a/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md +++ b/pgml-cms/blog/generating-llm-embeddings-with-open-source-models-in-postgresml.md @@ -216,8 +216,6 @@ For comparison, it would cost about $299 to use OpenAI's cheapest embedding mode | GPU | 17ms | $72 | 6 hours | | OpenAI | 300ms | $299 | millennia | -\\ - You can also find embedding models that outperform OpenAI's `text-embedding-ada-002` model across many different tests on the [leaderboard](https://huggingface.co/spaces/mteb/leaderboard). It's always best to do your own benchmarking with your data, models, and hardware to find the best fit for your use case. > _HTTP requests to a different datacenter cost more time and money for lower reliability than co-located compute and storage._ diff --git a/pgml-cms/blog/llms-are-commoditized-data-is-the-differentiator.md b/pgml-cms/blog/llms-are-commoditized-data-is-the-differentiator.md new file mode 100644 index 000000000..68acc10bf --- /dev/null +++ b/pgml-cms/blog/llms-are-commoditized-data-is-the-differentiator.md @@ -0,0 +1,57 @@ +# LLMs are Commoditized; Data is the Differentiator + +
+ +
Author
+ +
+ +Montana Low + +April 14, 2024 + +## Introduction + +Last year, OpenAI’s GPT-4 launched to great fanfare and was widely hailed as the arrival of AI. Last week, Meta’s Llama 3 surpassed the launch performance of GPT-4, making AI truly available to all with an open-weight model. + +The closed-source GPT-4 is rumored to have more than 1 trillion parameters, making it more than 10x larger and more expensive to operate than the latest 70-billion-parameter open-weight model from Meta. Yet, the smaller open-weight model achieves indistinguishable quality responses when judged by English-speaking human evaluators in a side-by-side comparison. Meta is still training a larger 405B version of Llama 3, and plans to release the weights to the community in the next couple of months. + +Open-weight models are not only leading in high-end performance; further optimized and scaled-down open-weight versions are also replacing many of the tasks that were only serviceable by proprietary vendors last year. Mistral, Qwen, Yi and a host of community members regularly contribute high-quality fine-tuned models optimized for specific tasks at a fraction of the operational cost. + +
GPT-4 progress has stagnated across recent updates. We look forward to continuing the trend lines when Llama 3 405B and other models are tested soon.
+ +## Increasing Complexity + +At the same time, few of the thinly implemented LLM wrapper applications survived their debut last year. Quality, latency, security, complexity and other concerns have stymied many efforts. + +The machine learning infrastructure required to deliver value continues to grow increasingly complex, despite or perhaps because of advances on multiple fronts. Tree-based approaches still outperform LLMs on tabular data. Older encoder models can easily handle tasks like sentiment analysis orders of magnitude more efficiently. LLMs and vector databases are just two of the many commoditized components of the machine learning stack, part of a toolkit that continues to grow. + +
Original diagram credit to a16z.com
+ +The one aspect that remains consistent is that data differentiates open-source algorithms and models. In the modern age of LLMs, fine-tuning, RAG, re-ranking, and RLHF all require data. Implementing high-quality search, personalization, recommendation, anomaly detection, forecasting, classification and so many more use cases all depends on the data. + +The hard part of AI & ML systems has always been managing that data. Vastly more engineers have a full-time job managing data pipelines than models. Vastly more money is spent on data management systems than LLMs, and this will continue to be the case, because data is the bespoke differentiator. + +Getting the data to the models in a timely manner often spans multiple teams and multiple disciplines collaborating for multiple quarters. When the landscape is changing as quickly as modern AI & ML, many applications are out of date before they launch, and unmaintainable long term. Unfortunately, for those teams, the speed of innovation is only increasing. + +Keeping up with the latest innovations in just one small area of the field is a full-time job, and wiring all of those innovations together with ever-changing business requirements is several more. That’s the force that created the previous diagram with a ton of siloed solutions and interconnections. Only the most lucrative businesses can afford the engineers and services required by the status quo. + +### _Move models to the data, rather than constantly pulling data to the models_ + +In-database machine learning represents a strategic shift to leverage data more effectively. By enabling machine learning operations directly within database environments, even organizations outside of the “magnificent seven” can build real-world applications that are more efficient, effective and reactive to real-time data changes. How? + +- *Reduced engineering overhead* Eliminate the need for an excess of engineers managing data pipelines full-time. +- *Increased efficiency* Reduce the number of external network calls from your data to the models, which are costly in speed, spend, and uptime. +- *Enhanced security* No need to send your data to multiple third parties, or worry about new attack vectors on unproven technology. +- *Scalability* Store and scale your data with a proven platform handling millions of requests per second and billion-row datasets. +- *Flexibility* Open-weight models on an open-source platform give you greater control over upgrades, use cases and deployment options. + +## How PostgresML fits in +We built PostgresML after a series of hard lessons learned building (and re-building) and then scaling the machine learning platform at Instacart during one of the company’s highest-ever growth periods. At the end of the day, nothing worked better than building it all on a trusted, 35-year-old RDBMS. That’s why I’m confident that in-database machine learning is the future of real-world AI applications. +PostgresML brings AI & ML capabilities directly into a PostgreSQL database. It allows users to train, deploy, and predict using models inside the database. It’s all the benefits of in-database machine learning, packaged in a few easy-to-access ways. You can use our open-source extension or our hosted cloud. You can get started quickly with SDKs in Python and JavaScript, or you can get complete AI & ML capabilities with just a few SQL calls. That means generating embeddings, performing vector operations, using transformers for NLP – all directly where your data resides.
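To make "a few SQL calls" concrete, here is a minimal sketch using the extension's `pgml.embed` and `pgml.transform` functions; the table, column, and embedding model names are illustrative assumptions:

```sql
-- Generate embeddings and run sentiment analysis without the data ever
-- leaving Postgres. Table, column, and model names are illustrative.
SELECT pgml.embed('intfloat/e5-small-v2', review_body) AS embedding
FROM customer_reviews
LIMIT 10;

SELECT pgml.transform(
    task   => 'text-classification',
    inputs => ARRAY['PostgresML brings the models to the data!']
) AS sentiment;
```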
Real-world applications range from predicting customer behaviors to automating financial forecasts. + +
Original diagram credit to a16z.com
+ +## Conclusion +The practical benefits of in-database machine learning are many, and we built PostgresML to deliver those benefits in the simplest way. By running LLMs and other predictive models inside the database, PostgresML enhances the agility and performance of software engineering teams. For developers, this means less context switching and greater ease of use, as they can manage data and model training in the environment they are already familiar with. Users benefit from reduced latency and improved accuracy in their predictive models. Organizations benefit not only from more performant applications, but also from the flexibility of a platform that can be easily updated with the latest models once a week rather than once a year. +Feel free to give PostgresML a try and let us know what you think. We’re open source, and welcome contributions from the community, especially when it comes to the rapidly evolving ML/AI landscape. diff --git a/pgml-cms/blog/meet-us-at-the-2024-postgres-conference.md b/pgml-cms/blog/meet-us-at-the-2024-postgres-conference.md index 00cca46fb..bacb8a6f1 100644 --- a/pgml-cms/blog/meet-us-at-the-2024-postgres-conference.md +++ b/pgml-cms/blog/meet-us-at-the-2024-postgres-conference.md @@ -22,7 +22,6 @@ Why should you care? It's not every day you get to dive headfirst into the world Save 25% on your ticket with our discount code: 2024\_POSTGRESML\_25 {% endhint %} -\ PostgresML CEO and founder, Montana Low, will kick off the event on April 17th with a keynote about navigating the confluence of hardware evolution and machine learning technology. We’ll also be hosting a masterclass in retrieval augmented generation (RAG) on April 18th. Our own Silas Marvin will give hands-on guidance to equip you with the ability to implement RAG directly within your database. @@ -37,4 +36,3 @@ So, why sit on the sidelines when you could be right in the thick of it, soaking See you there! -\\ diff --git a/pgml-cms/blog/mindsdb-vs-postgresml.md b/pgml-cms/blog/mindsdb-vs-postgresml.md index 4d631d8b4..9b92bd851 100644 --- a/pgml-cms/blog/mindsdb-vs-postgresml.md +++ b/pgml-cms/blog/mindsdb-vs-postgresml.md @@ -47,8 +47,6 @@ Both Projects integrate several dozen machine learning algorithms, including the | Full Text Search | - | ✅ | | Geospatial Search | - | ✅ | -\\ - Both MindsDB and PostgresML support many classical machine learning algorithms to do classification and regression. They are both able to load ~~the latest LLMs~~ some models from Hugging Face, supported by underlying implementations in libtorch. I had to cross that out after exploring all the caveats in the MindsDB implementations. PostgresML supports newly released models immediately, as long as underlying dependencies are met. MindsDB has to release an update to support any new models, and their current model support is extremely limited. New algorithms, tasks, and models are constantly released, so it's worth checking the documentation for the latest list. Another difference is that PostgresML also supports embedding models, and closely integrates them with vector search inside the database, which is well beyond the scope of MindsDB, since it's not a database at all. PostgresML has direct access to all the functionality provided by other Postgres extensions, like vector indexes from [pgvector](https://github.com/pgvector/pgvector) to perform efficient KNN & ANN vector recall, or [PostGIS](http://postgis.net/) for geospatial information, as well as built-in full text search.
Multiple algorithms and extensions can be combined in compound queries to build state-of-the-art systems, like search and recommendations or fraud detection, that generate an end-to-end result with a single query, something that might take a dozen different machine learning models and microservices in a more traditional architecture. @@ -70,8 +68,6 @@ The architectural implementations for these projects are significantly different. | | MindsDB | PostgresML | | On Premise | ✅ | ✅ | | Web UI | ✅ | ✅ | -\\ - The difference in architecture leads to different tradeoffs and challenges. There are already hundreds of ways to get data into and out of a Postgres database, from just about every other service, language and platform, which makes PostgresML highly compatible with other application workflows. On the other hand, the MindsDB Python service accepts connections from specifically supported clients like `psql` and provides a pseudo-SQL interface to the functionality. The service will parse incoming MindsDB commands that look similar to SQL (but are not), for tasks like configuring database connections, or doing actual machine learning. These commands typically have what looks like a sub-select that will actually fetch data over the wire from configured databases for Machine Learning training and inference. MindsDB is actually a pretty standard Python microservice-based architecture that separates data from compute over the wire, just with an SQL-like API instead of gRPC or REST. MindsDB isn't actually a DB at all, but rather an ML service with adapters for just about every database that Python can connect to. @@ -298,8 +294,6 @@ PostgresML is the clear winner in terms of performance. It seems to me that it c | translation\_en\_to\_es | t5-base | 1573 | 1148 | 294 | | summarization | sshleifer/distilbart-cnn-12-6 | 4289 | 3450 | 479 | -\\ - There is a general trend: the larger and slower the model, the more work is spent inside libtorch and the less the performance of the rest matters, but for interactive models and use cases there is a significant difference. We've tried to cover the most generous use case we could between these two. If we were to compare XGBoost or other classical algorithms, which can have sub-millisecond prediction times in PostgresML, the 20ms Python service overhead of MindsDB just to parse the incoming query would be hundreds of times slower. ## Clouds diff --git a/pgml-cms/blog/postgres-full-text-search-is-awesome.md b/pgml-cms/blog/postgres-full-text-search-is-awesome.md index 8cc8a8205..c1cab12b5 100644 --- a/pgml-cms/blog/postgres-full-text-search-is-awesome.md +++ b/pgml-cms/blog/postgres-full-text-search-is-awesome.md @@ -105,6 +105,4 @@ LIMIT 100; If you'd like to play through an interactive notebook to generate models for search relevance in a Postgres database, try it in the Gym. An exercise for the curious reader would be to combine all three scores above into a single algebraic function for ranking, and then into a fourth learned model (a first sketch follows below)... - - Many thanks and ❤️ to all those who are supporting this endeavor. We’d love to hear feedback from the broader ML and Engineering community about applications and other real world scenarios to help prioritize our work.
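Returning to the exercise above, here is a minimal sketch of combining several relevance signals into a single algebraic ranking function. The score columns and weights are hypothetical placeholders, not values from the post:

```sql
-- Hypothetical score columns; the fixed weights are a starting point
-- that a fourth, learned model could later replace.
SELECT id,
       title,
       0.5 * text_rank_score
     + 0.3 * popularity_score
     + 0.2 * personalization_score AS rank
FROM search_results
ORDER BY rank DESC
LIMIT 100;
```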
diff --git a/pgml-cms/blog/postgresml-is-going-multicloud.md b/pgml-cms/blog/postgresml-is-going-multicloud.md index 0100a2162..d6388a65c 100644 --- a/pgml-cms/blog/postgresml-is-going-multicloud.md +++ b/pgml-cms/blog/postgresml-is-going-multicloud.md @@ -10,7 +10,6 @@ Lev Kokotov Jan 18, 2024 - We started PostgresML two years ago with the goal of making machine learning and AI accessible and easy for everyone. To make this a reality, we needed to deploy PostgresML as closely as possible to our end users. With that goal in mind, today we're proud to announce support for a new cloud provider: Azure. ### How we got here diff --git a/pgml-cms/blog/postgresml-is-moving-to-rust-for-our-2.0-release.md b/pgml-cms/blog/postgresml-is-moving-to-rust-for-our-2.0-release.md index 8b642a255..623d3e006 100644 --- a/pgml-cms/blog/postgresml-is-moving-to-rust-for-our-2.0-release.md +++ b/pgml-cms/blog/postgresml-is-moving-to-rust-for-our-2.0-release.md @@ -158,7 +158,6 @@ LIMIT 1; {% tab title="BLAS" %} - ```rust #[pg_extern(immutable, strict, parallel_safe)] fn dot_product_blas(vector: Vec<f32>, other: Vec<f32>) -> f32 { diff --git a/pgml-cms/docs/api/apis.md b/pgml-cms/docs/api/apis.md index 15f1dd37f..146d83be8 100644 --- a/pgml-cms/docs/api/apis.md +++ b/pgml-cms/docs/api/apis.md @@ -11,7 +11,6 @@ We also provide Client SDKs that implement the best practices on top of the SQL ## SQL Extension PostgreSQL is designed to be _**extensible**_. This has created a rich open-source ecosystem of additional functionality built around the core project. Some [extensions](https://www.postgresql.org/docs/current/contrib.html) are included in the base Postgres distribution, but others are also available via the [PostgreSQL Extension Network](https://pgxn.org/).\ -\ There are two foundational extensions included in a PostgresML deployment that provide functionality inside the database through SQL APIs. * **pgml** - provides Machine Learning and Artificial Intelligence APIs with access to more than 50 ML algorithms to train classification, clustering and regression models on your own data, or you can perform dozens of tasks with thousands of models downloaded from HuggingFace. diff --git a/pgml-cms/docs/api/sql-extension/pgml.train/joint-optimization.md b/pgml-cms/docs/api/sql-extension/pgml.train/joint-optimization.md index dac67f25a..b65812045 100644 --- a/pgml-cms/docs/api/sql-extension/pgml.train/joint-optimization.md +++ b/pgml-cms/docs/api/sql-extension/pgml.train/joint-optimization.md @@ -13,6 +13,4 @@ SELECT * FROM pgml.train_join( ); ``` - - You can read more in the [scikit-learn](https://scikit-learn.org/stable/modules/classes.html#module-sklearn.multioutput) documentation.
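The `pgml.train_join` snippet above elides its arguments. Purely as an illustration of the shape a joint optimization call can take (the argument names here are assumptions, not the documented signature):

```sql
-- Illustrative only: argument names are assumed, not documented.
-- Joint optimization fits a single model against several target columns.
SELECT * FROM pgml.train_join(
    'my_joint_project',                             -- hypothetical project name
    task => 'regression',
    relation_name => 'my_training_data',            -- hypothetical table
    y_column_name => ARRAY['target_a', 'target_b']  -- multiple targets at once
);
```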
diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/copy.md b/pgml-cms/docs/introduction/getting-started/import-your-data/copy.md index 131c4a0fd..1e590cb87 100644 --- a/pgml-cms/docs/introduction/getting-started/import-your-data/copy.md +++ b/pgml-cms/docs/introduction/getting-started/import-your-data/copy.md @@ -30,7 +30,6 @@ If you're using another data store, it will almost always provide a CSV export f Create a table in PostgresML with the correct schema: - {% tabs %} {% tab title="SQL" %} diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/README.md b/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/README.md index fceec7f42..11de28b51 100644 --- a/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/README.md +++ b/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/README.md @@ -12,7 +12,6 @@ Setting up & maintaining logical replication requires a few steps, but once you' First things first, make sure your primary database is configured to support logical replication. To do so, verify the following settings: - | Setting | Value | |-------------------------|----------------| | `wal_level` | `logical` | @@ -50,7 +49,6 @@ FOR TABLE your_list_of_tables; where `your_list_of_tables` are the tables you'd like to replicate. For example, if you have two tables, _users_ and _blog_posts_, you can create a publication for those two tables using this command: - {% tabs %} {% tab title="SQL" %} diff --git a/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/inside-a-vpc.md b/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/inside-a-vpc.md index 9f77b300a..4c45db575 100644 --- a/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/inside-a-vpc.md +++ b/pgml-cms/docs/introduction/getting-started/import-your-data/logical-replication/inside-a-vpc.md @@ -5,7 +5,6 @@ and we also provide an nginx-based Docker image that can be used without any add
- ## PostgresML IPs by region | Region | List of IP addresses | diff --git a/pgml-cms/docs/product/pgcat/README.md b/pgml-cms/docs/product/pgcat/README.md index f92de63bd..326252032 100644 --- a/pgml-cms/docs/product/pgcat/README.md +++ b/pgml-cms/docs/product/pgcat/README.md @@ -22,7 +22,6 @@ description: Nextgen PostgreSQL Pooler - PgCat, like PostgresML, is free and open source, distributed under the MIT license. It's currently running in our [cloud](https://postgresml.org/signup), powering both Serverless and Dedicated databases. ## [Features](features) diff --git a/pgml-cms/docs/product/vector-database.md b/pgml-cms/docs/product/vector-database.md index b1db1fc6b..71db1684f 100644 --- a/pgml-cms/docs/product/vector-database.md +++ b/pgml-cms/docs/product/vector-database.md @@ -41,7 +41,6 @@ ALTER TABLE {% endtab %} {% endtabs %} - #### Generating embeddings At first, the column is empty. To generate embeddings, we can use the PostgresML [pgml.embed()](/docs/api/sql-extension/pgml.embed) function and generate an embedding of another column in the same (or different) table. This is where machine learning inside the database really shines: @@ -49,7 +48,6 @@ At first, the column is empty. To generate embeddings, we can use the PostgresML {% tabs %} {% tab title="SQL" %} - ```sql UPDATE usa_house_prices @@ -62,7 +60,6 @@ SET embedding = pgml.embed( {% endtab %} {% tab title="Output" %} - ``` UPDATE 5000 ``` @@ -133,7 +130,6 @@ LIMIT 3; {% endtab %} {% tab title="Output" %} - ``` address ---------------------------------------- @@ -181,7 +177,6 @@ SELECT round(sqrt(5000000)) AS lists; {% endtab %} {% endtabs %} - #### Creating an IVFFlat index You can create an IVFFlat index with just one query: @@ -261,7 +256,6 @@ REINDEX {% endtab %} {% endtabs %} - As of this writing, _pgvector_ doesn't provide monitoring tools for index degradation. The user should monitor recall from their vector search operations, and if it starts dropping, run a reindex. ### HNSW @@ -292,7 +286,6 @@ CREATE INDEX {% endtab %} {% endtabs %} - #### Maintaining an HNSW index HNSW requires little to no maintenance. When new vectors are added, they are automatically inserted at the optimal place in the graph. However, as the graph gets bigger, rebalancing it becomes more expensive, and inserting new rows becomes slower. We address this trade-off and how to solve this problem in [Partitioning](../resources/data-storage-and-retrieval/partitioning.md). diff --git a/pgml-cms/docs/resources/benchmarks/making-postgres-30-percent-faster-in-production.md b/pgml-cms/docs/resources/benchmarks/making-postgres-30-percent-faster-in-production.md index 508110db5..030a84398 100644 --- a/pgml-cms/docs/resources/benchmarks/making-postgres-30-percent-faster-in-production.md +++ b/pgml-cms/docs/resources/benchmarks/making-postgres-30-percent-faster-in-production.md @@ -20,8 +20,6 @@ This is not only a performance benefit, but also a usability improvement for cli ## Benchmark -\\ -
The benchmark was conducted using `pgbench` with 1, 10, 100 and 1000 clients sending millions of queries to PgCat, which itself was running on a different EC2 machine alongside the database. This is a simple setup often used in production. Another configuration sees a pooler use its own machine, which of course increases latency but improves availability. The clients were on another EC2 machine to simulate the latency experienced in typical web apps deployed in Kubernetes, ECS, EC2 and others. diff --git a/pgml-cms/docs/resources/benchmarks/mindsdb-vs-postgresml.md b/pgml-cms/docs/resources/benchmarks/mindsdb-vs-postgresml.md index 2faa141c3..414068275 100644 --- a/pgml-cms/docs/resources/benchmarks/mindsdb-vs-postgresml.md +++ b/pgml-cms/docs/resources/benchmarks/mindsdb-vs-postgresml.md @@ -35,7 +35,6 @@ Both Projects integrate several dozen machine learning algorithms, including the | Full Text Search | - | ✅ | | Geospatial Search | - | ✅ | -\ Both MindsDB and PostgresML support many classical machine learning algorithms to do classification and regression. They are both able to load ~~the latest LLMs~~ some models from Hugging Face, supported by underlying implementations in libtorch. I had to cross that out after exploring all the caveats in the MindsDB implementations. PostgresML supports newly released models immediately, as long as underlying dependencies are met. MindsDB has to release an update to support any new models, and their current model support is extremely limited. New algorithms, tasks, and models are constantly released, so it's worth checking the documentation for the latest list. Another difference is that PostgresML also supports embedding models, and closely integrates them with vector search inside the database, which is well beyond the scope of MindsDB, since it's not a database at all. PostgresML has direct access to all the functionality provided by other Postgres extensions, like vector indexes from [pgvector](https://github.com/pgvector/pgvector) to perform efficient KNN & ANN vector recall, or [PostGIS](http://postgis.net/) for geospatial information, as well as built-in full text search. Multiple algorithms and extensions can be combined in compound queries to build state-of-the-art systems, like search and recommendations or fraud detection, that generate an end-to-end result with a single query, something that might take a dozen different machine learning models and microservices in a more traditional architecture. @@ -44,8 +43,6 @@ Another difference is that PostgresML also supports embedding models, and closel The architectural implementations for these projects are significantly different. PostgresML takes a data-centric approach with Postgres as the provider for both storage _and_ compute. To provide horizontal scalability for inference, the PostgresML team has also created [PgCat](https://github.com/postgresml/pgcat) to distribute workloads across many Postgres databases. On the other hand, MindsDB takes a service-oriented approach that connects to various databases over the network. -\\ -
| | MindsDB | PostgresML | @@ -59,8 +56,6 @@ The architectural implementations for these projects are significantly different. | On Premise | ✅ | ✅ | | Web UI | ✅ | ✅ | -\\ - The difference in architecture leads to different tradeoffs and challenges. There are already hundreds of ways to get data into and out of a Postgres database, from just about every other service, language and platform, which makes PostgresML highly compatible with other application workflows. On the other hand, the MindsDB Python service accepts connections from specifically supported clients like `psql` and provides a pseudo-SQL interface to the functionality. The service will parse incoming MindsDB commands that look similar to SQL (but are not), for tasks like configuring database connections, or doing actual machine learning. These commands typically have what looks like a sub-select that will actually fetch data over the wire from configured databases for Machine Learning training and inference. MindsDB is actually a pretty standard Python microservice-based architecture that separates data from compute over the wire, just with an SQL-like API instead of gRPC or REST. MindsDB isn't actually a DB at all, but rather an ML service with adapters for just about every database that Python can connect to. @@ -287,8 +282,6 @@ PostgresML is the clear winner in terms of performance. It seems to me that it c | translation\_en\_to\_es | t5-base | 1573 | 1148 | 294 | | summarization | sshleifer/distilbart-cnn-12-6 | 4289 | 3450 | 479 | -\\ - There is a general trend: the larger and slower the model, the more work is spent inside libtorch and the less the performance of the rest matters, but for interactive models and use cases there is a significant difference. We've tried to cover the most generous use case we could between these two. If we were to compare XGBoost or other classical algorithms, which can have sub-millisecond prediction times in PostgresML, the 20ms Python service overhead of MindsDB just to parse the incoming query would be hundreds of times slower. ## Clouds diff --git a/pgml-cms/docs/resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md b/pgml-cms/docs/resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md index d67fb8b70..80e9be8a2 100644 --- a/pgml-cms/docs/resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md +++ b/pgml-cms/docs/resources/data-storage-and-retrieval/llm-based-pipelines-with-postgresml-and-dbt-data-build-tool.md @@ -2,8 +2,6 @@ In the realm of data analytics and machine learning, text processing and large language models (LLMs) have become pivotal in deriving insights from textual data. Efficient data pipelines play a crucial role in enabling streamlined workflows for processing and analyzing text. This blog explores the synergy between PostgresML and dbt, showcasing how they empower organizations to build efficient data pipelines that leverage large language models for text processing, unlocking valuable insights and driving data-driven decision-making. - - ## PostgresML PostgresML, an open-source machine learning extension for PostgreSQL, is designed to handle text processing tasks using large language models. Its motivation lies in harnessing the power of LLMs within the familiar PostgreSQL ecosystem.
By integrating LLMs directly into the database, PostgresML eliminates the need for data movement and offers scalable and secure text processing capabilities. This native integration enhances data governance and security, and ensures the integrity of text data throughout the pipeline. diff --git a/pgml-cms/docs/resources/developer-docs/contributing.md b/pgml-cms/docs/resources/developer-docs/contributing.md index 3648acbe3..b5d53f55d 100644 --- a/pgml-cms/docs/resources/developer-docs/contributing.md +++ b/pgml-cms/docs/resources/developer-docs/contributing.md @@ -115,7 +115,6 @@ CREATE EXTENSION pgml; That's it, PostgresML is ready. You can validate the installation by running: - {% tabs %} {% tab title="SQL" %} ```sql @@ -214,7 +213,6 @@ cargo watch --exec run The website can be packaged for distribution. You'll need to copy the static files along with the `target/release` directory to your server. - ## General We are a cross-platform team: some of us use WSL and some use Linux or Mac OS. Keeping that in mind, it's good to use common line endings for all files to avoid production errors, e.g. broken Bash scripts. diff --git a/pgml-cms/docs/use-cases/chatbots/README.md b/pgml-cms/docs/use-cases/chatbots/README.md index 06ed2741d..333cbfa8f 100644 --- a/pgml-cms/docs/use-cases/chatbots/README.md +++ b/pgml-cms/docs/use-cases/chatbots/README.md @@ -300,12 +300,10 @@ What is your name?<|im_end|> <|im_start|>assistant """ - async def main(): model_output = await model.transform([user_input], {"max_new_tokens": 1000}) print(model_output[0][0]["generated_text"], "\n") - asyncio.run(main()) ``` @@ -341,12 +339,10 @@ What did I just ask you? assistant """ - async def main(): model_output = await model.transform([user_input], {"max_new_tokens": 1000}) print(model_output[0][0]["generated_text"], "\n") - asyncio.run(main()) ``` @@ -441,7 +437,6 @@ pipeline = Pipeline("test-pipeline-1", model, splitter) # Create a collection to house these documents collection = Collection("chatbot-knowledge-base-1") - async def main(): # Add the pipeline to the collection await collection.add_pipeline(pipeline) @@ -461,7 +456,6 @@ async def main(): ) print(most_relevant_section[0][1]) - asyncio.run(main()) ``` @@ -509,12 +503,10 @@ system_message = """You are a friendly and helpful chatbot named Hermes. Given t history = [{"role": "system", "content": ""}] - def build_history_with_context(context): history[0]["content"] = system_message.replace("{context}", context) return history - async def main(): while True: user_input = input("=> ") @@ -537,7 +529,6 @@ async def main(): ) print(model_output["choices"][0]["message"]["content"], "\n") - asyncio.run(main()) ``` diff --git a/pgml-cms/docs/use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml.md b/pgml-cms/docs/use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml.md index 8ed3a34f8..396301e59 100644 --- a/pgml-cms/docs/use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml.md +++ b/pgml-cms/docs/use-cases/embeddings/generating-llm-embeddings-with-open-source-models-in-postgresml.md @@ -198,8 +196,6 @@ For comparison, it would cost about $299 to use OpenAI's cheapest embedding mode | GPU | 17ms | $72 | 6 hours | | OpenAI | 300ms | $299 | millennia | -\\ - You can also find embedding models that outperform OpenAI's `text-embedding-ada-002` model across many different tests on the [leaderboard](https://huggingface.co/spaces/mteb/leaderboard).
It's always best to do your own benchmarking with your data, models, and hardware to find the best fit for your use case. > _HTTP requests to a different datacenter cost more time and money for lower reliability than co-located compute and storage._ diff --git a/pgml-cms/docs/use-cases/embeddings/personalize-embedding-results-with-application-data-in-your-database.md b/pgml-cms/docs/use-cases/embeddings/personalize-embedding-results-with-application-data-in-your-database.md index 0e70c569d..d6094233b 100644 --- a/pgml-cms/docs/use-cases/embeddings/personalize-embedding-results-with-application-data-in-your-database.md +++ b/pgml-cms/docs/use-cases/embeddings/personalize-embedding-results-with-application-data-in-your-database.md @@ -12,7 +12,6 @@ This article is the third in a multipart series that will show you how to build 4. Optimizing semantic results with an XGBoost ranking model - coming soon! - _Embeddings can be combined into personalized perspectives when stored as vectors in the database._ ## Personalization diff --git a/pgml-cms/docs/use-cases/embeddings/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md b/pgml-cms/docs/use-cases/embeddings/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md index fad02dcb6..7e762128b 100644 --- a/pgml-cms/docs/use-cases/embeddings/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md +++ b/pgml-cms/docs/use-cases/embeddings/tuning-vector-recall-while-generating-query-embeddings-in-the-database.md @@ -1,7 +1,6 @@ # Tuning vector recall while generating query embeddings in the database - PostgresML makes it easy to generate embeddings using open source models and perform complex queries with vector indexes, unlike any other database. The full expressive power of SQL as a query language is available to seamlessly combine semantic, geospatial, and full text search, along with filtering, boosting, aggregation, and ML reranking in low latency use cases. You can do all of this faster, more simply, and with higher quality compared to applications built on disjoint APIs like OpenAI + Pinecone. Prove the results in this series to your own satisfaction, for free, by signing up for a GPU-accelerated database. ## Introduction @@ -16,7 +15,6 @@ This article is the second in a multipart series that will show you how to build The previous article discussed how to generate embeddings that perform better than OpenAI's `text-embedding-ada-002` and save them in a table with a vector index. In this article, we'll show you how to query those embeddings effectively. - _Embeddings show us the relationships between rows in the database, using natural language._ Our example data is based on 5 million DVD reviews from Amazon customers submitted over a decade. For reference, that's more data than fits in a Pinecone Pod at the time of writing. Webscale: check.
Let's start with a quick refresher on the data in our `pgml.amazon_us_reviews` table: diff --git a/pgml-dashboard/content/blog/benchmarks/python_microservices_vs_postgresml/README.md b/pgml-dashboard/content/blog/benchmarks/python_microservices_vs_postgresml/README.md index 4e45061b0..93f875b34 100644 --- a/pgml-dashboard/content/blog/benchmarks/python_microservices_vs_postgresml/README.md +++ b/pgml-dashboard/content/blog/benchmarks/python_microservices_vs_postgresml/README.md @@ -95,4 +95,3 @@ ab -n 10000 -c 10 -T application/json -k -p ab.txt http://localhost:8000/ ``` - diff --git a/pgml-dashboard/static/css/bootstrap-5.3.0-alpha1/README.md b/pgml-dashboard/static/css/bootstrap-5.3.0-alpha1/README.md index 9f9374ced..cceb1f9a8 100644 --- a/pgml-dashboard/static/css/bootstrap-5.3.0-alpha1/README.md +++ b/pgml-dashboard/static/css/bootstrap-5.3.0-alpha1/README.md @@ -21,12 +21,10 @@ Blog

- ## Bootstrap 5 Our default branch is for development of our Bootstrap 5 release. Head to the [`v4-dev` branch](https://github.com/twbs/bootstrap/tree/v4-dev) to view the readme, documentation, and source code for Bootstrap 4. - ## Table of contents - [Quick start](#quick-start) @@ -41,7 +39,6 @@ Our default branch is for development of our Bootstrap 5 release. Head to the [` - [Thanks](#thanks) - [Copyright and license](#copyright-and-license) - ## Quick start Several quick start options are available: @@ -55,7 +52,6 @@ Several quick start options are available: Read the [Getting started page](https://getbootstrap.com/docs/5.3/getting-started/introduction/) for information on the framework contents, templates, examples, and more. - ## Status [![Build Status](https://img.shields.io/github/actions/workflow/status/twbs/bootstrap/js.yml?branch=main&label=JS%20Tests&logo=github)](https://github.com/twbs/bootstrap/actions?query=workflow%3AJS+Tests+branch%3Amain) @@ -74,7 +70,6 @@ Read the [Getting started page](https://getbootstrap.com/docs/5.3/getting-starte [![Sponsors on Open Collective](https://img.shields.io/opencollective/sponsors/bootstrap?logo=opencollective&logoColor=fff)](#sponsors) ![OpenSSF Scorecard](https://img.shields.io/ossf-scorecard/github.com/twbs/bootstrap) - ## What's included Within the download you'll find the following directories and files, logically grouping common assets and providing both compiled and minified variations. @@ -135,12 +130,10 @@ Within the download you'll find the following directories and files, logically g We provide compiled CSS and JS (`bootstrap.*`), as well as compiled and minified CSS and JS (`bootstrap.min.*`). [Source maps](https://developers.google.com/web/tools/chrome-devtools/javascript/source-maps) (`bootstrap.*.map`) are available for use with certain browsers' developer tools. Bundled JS files (`bootstrap.bundle.js` and minified `bootstrap.bundle.min.js`) include [Popper](https://popper.js.org/). - ## Bugs and feature requests Have a bug or a feature request? Please first read the [issue guidelines](https://github.com/twbs/bootstrap/blob/main/.github/CONTRIBUTING.md#using-the-issue-tracker) and search for existing and closed issues. If your problem or idea is not addressed yet, [please open a new issue](https://github.com/twbs/bootstrap/issues/new/choose). - ## Documentation Bootstrap's documentation, included in this repo in the root directory, is built with [Hugo](https://gohugo.io/) and publicly hosted on GitHub Pages at . The docs may also be run locally. @@ -162,7 +155,6 @@ You can find all our previous releases docs on . - ## Community Get updates on Bootstrap's development and chat with the project maintainers and community members. @@ -183,14 +174,12 @@ Get updates on Bootstrap's development and chat with the project maintainers and - Implementation help may be found at Stack Overflow (tagged [`bootstrap-5`](https://stackoverflow.com/questions/tagged/bootstrap-5)). - Developers should use the keyword `bootstrap` on packages which modify or add to the functionality of Bootstrap when distributing through [npm](https://www.npmjs.com/browse/keyword/bootstrap) or similar delivery mechanisms for maximum discoverability. - ## Versioning For transparency into our release cycle and in striving to maintain backward compatibility, Bootstrap is maintained under [the Semantic Versioning guidelines](https://semver.org/). Sometimes we screw up, but we adhere to those rules whenever possible. 
See [the Releases section of our GitHub project](https://github.com/twbs/bootstrap/releases) for changelogs for each release version of Bootstrap. Release announcement posts on [the official Bootstrap blog](https://blog.getbootstrap.com/) contain summaries of the most noteworthy changes made in each release. - ## Creators **Mark Otto** @@ -203,7 +192,6 @@ See [the Releases section of our GitHub project](https://github.com/twbs/bootstr - - - ## Thanks @@ -218,7 +206,6 @@ Thanks to [BrowserStack](https://www.browserstack.com/) for providing the infras Thanks to [Netlify](https://www.netlify.com/) for providing us with Deploy Previews! - ## Sponsors Support this project by becoming a sponsor. Your logo will show up here with a link to your website. [[Become a sponsor](https://opencollective.com/bootstrap#sponsor)] @@ -234,14 +221,12 @@ Support this project by becoming a sponsor. Your logo will show up here with a l [![OC sponsor 8](https://opencollective.com/bootstrap/sponsor/8/avatar.svg)](https://opencollective.com/bootstrap/sponsor/8/website) [![OC sponsor 9](https://opencollective.com/bootstrap/sponsor/9/avatar.svg)](https://opencollective.com/bootstrap/sponsor/9/website) - ## Backers Thank you to all our backers! 🙏 [[Become a backer](https://opencollective.com/bootstrap#backer)] [![Backers](https://opencollective.com/bootstrap/backers.svg?width=890)](https://opencollective.com/bootstrap#backers) - ## Copyright and license Code and documentation copyright 2011–2022 the [Bootstrap Authors](https://github.com/twbs/bootstrap/graphs/contributors). Code released under the [MIT License](https://github.com/twbs/bootstrap/blob/main/LICENSE). Docs released under [Creative Commons](https://creativecommons.org/licenses/by/3.0/). diff --git a/pgml-dashboard/static/images/gym/quick_start.md b/pgml-dashboard/static/images/gym/quick_start.md index a493f8e32..3662b0c45 100644 --- a/pgml-dashboard/static/images/gym/quick_start.md +++ b/pgml-dashboard/static/images/gym/quick_start.md @@ -68,10 +68,8 @@ The `diabetes` dataset is a toy (small, not realistic) dataset published by Scik | s6 | Blood sugar level. | float | | **target** | Quantitative measure of disease progression one year after baseline. | float | - This dataset is not realistic because all data is perfectly arranged and normalized, which won't be the case with most real world datasets you'll run into, but it's perfect for our quick tutorial. - Alright, we're ready to do some machine learning! ## First project diff --git a/pgml-extension/examples/dbt/embeddings/README.md b/pgml-extension/examples/dbt/embeddings/README.md index 2190edf51..a46f8636e 100644 --- a/pgml-extension/examples/dbt/embeddings/README.md +++ b/pgml-extension/examples/dbt/embeddings/README.md @@ -90,7 +90,6 @@ Here's a summary of the key parameters: These configuration parameters offer a specific setup for the task, allowing for customization and flexibility in performing embeddings with the chosen splitter, model, table, query, and result limit. - # Models dbt models form the backbone of data transformation and analysis pipelines. These models allow you to define the structure and logic for processing your data, enabling you to extract insights and generate valuable outputs. @@ -103,7 +102,6 @@ The Splitters [model](./models/splitters.sql) serves as a central repository for ## Models The Models [model](./models/models.sql) serves as a repository for storing information about different embeddings models and their associated hyperparameters. 
This model allows you to keep track of the various embedding techniques used in your data pipeline and their specific configuration settings. - ## Embeddings [Embeddings](./models/embeddings.sql) focus on generating feature embeddings from chunks using an embedding model from the models table. These embeddings capture the semantic representation of textual data, facilitating more effective machine learning models. diff --git a/pgml-extension/src/bindings/transformers/whitelist.rs b/pgml-extension/src/bindings/transformers/whitelist.rs index 44ab2703f..6c00a9c28 100644 --- a/pgml-extension/src/bindings/transformers/whitelist.rs +++ b/pgml-extension/src/bindings/transformers/whitelist.rs @@ -17,7 +17,9 @@ pub fn verify_task(task: &Value) -> Result<(), Error> { let model_is_allowed = whitelisted_models.is_empty() || whitelisted_models.contains(&task_model); if !model_is_allowed { - bail!("model {task_model} is not whitelisted. Consider adding to `pgml.huggingface_whitelist` in postgresql.conf"); + bail!( + "model {task_model} is not whitelisted. Consider adding to `pgml.huggingface_whitelist` in postgresql.conf" + ); } let task_trust = get_trust_remote_code(task);
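For operators hitting the error above, here is a hedged sketch of configuring the whitelist. It assumes, based on the error message and the `contains` check, that `pgml.huggingface_whitelist` accepts a comma-separated list of model names, set in `postgresql.conf` or, equivalently, via `ALTER SYSTEM`:

```sql
-- A sketch assuming a comma-separated GUC; the model names are examples.
ALTER SYSTEM SET pgml.huggingface_whitelist = 'distilbert-base-uncased,roberta-base';
SELECT pg_reload_conf();
```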