
A dummy project to try to generate data lineage from SQL files, since I'm working in a very messy project


ilkernator/simple_sql_lineage

 
 


simple_sql_lineage

Get lineage from SQL files. This might be of help if your project is a mess. Oh, my life is better after this one.

Install

As simple as:

poetry install

Run

  1. Run docker compose:
docker compose up -d
  2. Create a .env file from the .env-template, filling in the value of the SIMPLE_LINEAGE_ROOT_FOLDER variable.

  3. Run the generator:

poetry run python3 simple_lineage_generator/simple_lineage_generator.py

It might take a while to do everything needed.

  4. After it finishes, open Neo4j at http://localhost:7474

The first time you access it, you might face the Neo4j login screen. Just choose "No Authentication".
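
Before running the generator, it can help to check what it will be pointed at. The snippet below is a minimal sketch, not the project's own code: it assumes the .env file uses plain KEY=VALUE lines and that SIMPLE_LINEAGE_ROOT_FOLDER points to a folder containing .sql files (the steps above do not say so explicitly). The helper names read_env_file, find_sql_files and count_sql_files are hypothetical.

```python
from pathlib import Path


def read_env_file(env_path):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in Path(env_path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def find_sql_files(root_folder):
    """Return every .sql file under root_folder, sorted for stable output."""
    return sorted(Path(root_folder).rglob("*.sql"))


def count_sql_files(env_path=".env"):
    """Count the .sql files under the folder named in the .env file."""
    # Assumption: the variable holds the folder the SQL files live in.
    root = read_env_file(env_path).get("SIMPLE_LINEAGE_ROOT_FOLDER", "")
    if not root or not Path(root).is_dir():
        raise ValueError("SIMPLE_LINEAGE_ROOT_FOLDER is missing or is not a folder")
    return len(find_sql_files(root))
```

Calling count_sql_files() from the repository root gives a quick idea of how much work the run has ahead of it, and fails early if the variable was left empty.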

Shutting down

  1. Stop docker compose:
docker compose down
  2. If you want to get rid of your Neo4j data:
sudo rm -rf .neo4j

Useful queries:

  • Get all tables that directly source from a table
MATCH (t:Table {name: 'table name'})<-[r:SOURCES_FROM]-(x:Table) RETURN t, x
  • Get all columns that inherit from a column, up to 10 connection levels
MATCH (n:Column {name: 'column name'})-[:SOURCES_FROM*1..10]->(m:Column)
RETURN n, m
  • Get all columns up to 3 levels from the original column, and also all tables 1 level away
MATCH (x:Column {name: 'column name'})<-[:SOURCES_FROM*1..3]-(y:Column),
      (x)-[:HAS_COLUMN*1]->(z:Table)
RETURN y, z
  • Get the table with the most connections in the lineage database
MATCH (t:Table)-[r]-(x)
WITH t, COUNT(r) AS con_count
ORDER BY con_count DESC
RETURN t
LIMIT 1

Now plot that table's 1st-level relationships:

MATCH (t:Table {name: 'table name'})-[r]-(x)
RETURN t, x
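
The column queries above hard-code the traversal depth (1..10, 1..3). As far as I know, Cypher does not accept a parameter for the bounds of a variable-length pattern, so a script that needs a different depth has to build that part of the query text itself, while the column name can still go in as a $name parameter. Here is a minimal sketch with a hypothetical helper name, column_lineage_query; the Column label and the SOURCES_FROM relationship are taken from the queries above.

```python
def column_lineage_query(max_depth=10):
    """Build the column lineage query with a validated depth bound."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(max_depth, bool) or not isinstance(max_depth, int):
        raise TypeError("max_depth must be an int")
    if max_depth < 1:
        raise ValueError("max_depth must be at least 1")
    # The depth is interpolated only after validation; the column name
    # stays a query parameter ($name) and is never pasted into the text.
    return (
        "MATCH (n:Column {name: $name})"
        f"-[:SOURCES_FROM*1..{max_depth}]->(m:Column)\n"
        "RETURN n, m"
    )
```

The returned text and a name value can then be handed to whatever client you use, for example session.run(query, name='column name') with the official Neo4j Python driver.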

Needed improvements

  • Docstrings. This thing was created in just 3 days, so no time to do it properly
  • Unit tests. Same reason as above
  • Improve parsing to get fewer errors.

Thanks

  • Lineage was built using sqllineage

  • Thanks, Caetano Veloso, for providing an incredible soundtrack for coding this.
