lakeFS is an open source layer that delivers resilience and manageability to object-storage based data lakes.
With lakeFS you can build repeatable, atomic and versioned data lake operations - from complex ETL jobs to data science and analytics.
lakeFS is API-compatible with AWS S3 and works seamlessly with modern data frameworks such as Spark, Hive, AWS Athena, and Presto.
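Because lakeFS speaks the S3 API, an existing S3 client typically needs only a different endpoint and a path that names the branch. The sketch below assumes the S3 gateway addressing convention in which the repository name takes the place of the bucket and the branch (or commit) is the first path component. The helper `lakefs_s3_uri`, the repository `example-repo`, and the object path are hypothetical names used only for illustration.

```python
def lakefs_s3_uri(repository: str, ref: str, path: str) -> str:
    """Build an S3-style URI for an object on a lakeFS branch or commit.

    Assumption: the repository is addressed as the bucket and the
    ref (branch name or commit ID) is the first component of the key.
    """
    if not repository or "/" in repository:
        raise ValueError("repository must be a non-empty name without '/'")
    if not ref:
        raise ValueError("ref must be a non-empty branch name or commit ID")
    # Drop a leading slash so the result never contains an empty path segment.
    return f"s3://{repository}/{ref}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical repository and object path.
    print(lakefs_s3_uri("example-repo", "main", "events/2020/part-0.parquet"))
```

An S3 client pointed at the lakeFS endpoint (for a local installation, http://localhost:8000) could then read or write that URI the same way it would an ordinary S3 object.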
For more information, see the official documentation.
Atomic Operations - lakeFS allows data producers to manipulate multiple objects as a single, atomic operation. If something fails half-way, all changes can be instantly rolled back.
Consistency - lakeFS enables object-level and cross-collection consistency:
- object-level consistency ensures all operations within a branch are strongly consistent (read-after-write, list-after-write, read-after-delete, etc.).
- cross-collection consistency is achieved by providing snapshot isolation. Using branches, writers can provide consistency guarantees across different logical collections - merging to “main” is only done after several datasets have been created successfully.
History - Commits are retained for a configurable duration, so readers can query data from the latest commit, or from any other point in time. Writers can atomically and safely roll back changes to previous versions.
Cross-Lake Isolation - Creating a lakeFS branch provides you with a snapshot of the entire lake at a given point in time. All reads from that branch are guaranteed to always return the same results.
Data CI/CD - lakeFS lets you define automated rules and tests that must pass before changes to data are committed or merged.
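The branch-and-merge behavior described above can be illustrated with a toy in-memory model. This is a conceptual sketch only: `ToyLake` and its methods are hypothetical, it is not lakeFS code or the lakeFS API, and it ignores deletes and merge conflicts. It shows that writes on a branch stay invisible to readers of "main" until a single merge publishes them together.

```python
class ToyLake:
    """Toy stand-in for a versioned lake: each branch points at a snapshot dict."""

    def __init__(self):
        self.branches = {"main": {}}

    def create_branch(self, name, source="main"):
        # The new branch starts as a copy of the source snapshot, so later
        # writes to the source do not change what this branch sees.
        self.branches[name] = dict(self.branches[source])

    def write(self, branch, path, data):
        # Copy-on-write: build a new snapshot rather than mutating the old one.
        snapshot = dict(self.branches[branch])
        snapshot[path] = data
        self.branches[branch] = snapshot

    def read(self, branch, path):
        return self.branches[branch].get(path)

    def merge(self, source, dest="main"):
        # All objects from the source become visible on dest in one pointer
        # swap. (Toy behavior: source wins on conflict, deletes are ignored.)
        merged = dict(self.branches[dest])
        merged.update(self.branches[source])
        self.branches[dest] = merged


if __name__ == "__main__":
    lake = ToyLake()
    lake.create_branch("etl-job")
    lake.write("etl-job", "users/part-0", b"users")
    lake.write("etl-job", "orders/part-0", b"orders")
    print(lake.read("main", "users/part-0"))   # still hidden from main
    lake.merge("etl-job")
    print(lake.read("main", "orders/part-0"))  # visible after the merge
```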
- Ensure you have Docker installed on your computer. The macOS and Windows installations include Docker Compose by default.
- Create a docker-compose.yaml file, containing the following configuration:

      ---
      version: '3'
      services:
        lakefs:
          image: "treeverse/lakefs:latest"
          ports: ["8000:8000"]
          links: ["postgres"]
          environment:
            LAKEFS_AUTH_ENCRYPT_SECRET_KEY: "some random secret string"
            LAKEFS_DATABASE_CONNECTION_STRING: postgres://lakefs:lakefs@postgres/postgres?sslmode=disable
            LAKEFS_BLOCKSTORE_TYPE: local
            LAKEFS_BLOCKSTORE_LOCAL_PATH: /home/lakefs
          entrypoint: ["/app/wait-for", "postgres:5432", "--", "/app/lakefs", "run"]
        postgres:
          image: "postgres:11"
          environment:
            POSTGRES_USER: lakefs
            POSTGRES_PASSWORD: lakefs
- From the directory that contains our new docker-compose.yaml file, run the following command:

      $ docker-compose up
- Open http://localhost:8000/setup in your web browser to set up an initial admin user, which is used to log in and send API requests.
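Two of the steps above lend themselves to small helpers: replacing the placeholder value of LAKEFS_AUTH_ENCRYPT_SECRET_KEY with a random string, and waiting for the server to answer on port 8000 before opening the setup page. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and it assumes that any HTTP response from the setup URL, even an error status, means the server process is running.

```python
import secrets
import time
import urllib.error
import urllib.request


def new_secret_key(num_bytes: int = 32) -> str:
    """Return a random hex string to use in place of the placeholder secret."""
    return secrets.token_hex(num_bytes)


def is_up(url: str) -> bool:
    """Return True if the server answers at all, even with an HTTP error status."""
    try:
        with urllib.request.urlopen(url, timeout=2):
            return True
    except urllib.error.HTTPError:
        return True  # the server responded, so the process is running
    except OSError:
        return False  # connection refused, timeout, DNS failure, etc.


def wait_until_up(url, attempts=30, delay=1.0, probe=is_up, sleep=time.sleep):
    """Poll until probe(url) succeeds; return False if every attempt fails."""
    for attempt in range(attempts):
        if probe(url):
            return True
        if attempt < attempts - 1:
            sleep(delay)  # no need to wait after the final attempt
    return False
```

For example, `print(new_secret_key())` yields a value to paste into docker-compose.yaml, and `wait_until_up("http://localhost:8000/setup")` returns once the container started by `docker-compose up` accepts requests. The `probe` and `sleep` parameters exist so the polling loop can be exercised without a running server.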
Alternatively, you can download the lakeFS binaries and run them directly.
Binaries are available at https://releases.lakefs.io.
Please follow the Guide to Get Started to set up your local lakeFS installation.
For more detailed information on how to set up lakeFS, please visit the documentation.
Keep up to date and get lakeFS support via:
- Join our Community Slack Channel to get help from our team and other users.
- Follow us on Twitter.
- Contact us
- lakeFS documentation
- If you would like to contribute, check out our contributing guide.
lakeFS is completely free, open source, and licensed under the Apache 2.0 License.