Experiment evaluations for the paper "Making RDBMSs Efficient on Graph Workloads Through Predefined Joins".
Please see the graindb/, graphflowdb/, and neo4j/ directories for detailed instructions on running the experiments for each system.
End to end benchmarks: JOB, SNB-M, TPC-H.
- Prepare input csv files:
  - Merge graindb/evaluations/job_duckdb_avg.out and graindb/evaluations/job_graindb_avg.out into result/end2end_job.csv.
  - Merge graindb/evaluations/snb_duckdb_avg.out, graindb/evaluations/mv_duckdb_avg.out, graindb/evaluations/snb_graindb_avg.out, and graphflowdb/evaluations/snb_gfdb_avg.out into result/end2end_snb.csv.
  - Merge graindb/evaluations/tpch_duckdb_avg.out and graindb/evaluations/tpch_graindb_avg.out into result/end2end_tpch.csv.
- Plot the graphs:
> python3 scripts/plot_boxplot_job.py result/end2end_job.csv
> python3 scripts/plot_boxplot_snb.py result/end2end_snb.csv
> python3 scripts/plot_boxplot_tpch.py result/end2end_tpch.csv
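The merge step above can be done by hand or with a small helper. The following is a minimal sketch, not part of the repository. It assumes that each `*_avg.out` file holds one value per line and that all files list the queries in the same order; the function names `read_column` and `merge_columns` and the optional `header` argument are hypothetical. Check the plot scripts for the column names and order they actually expect.

```python
import csv
from pathlib import Path


def read_column(path):
    """Read one value per non-empty line (assumed layout of a *_avg.out file)."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def merge_columns(inputs, output, header=None):
    """Write the input files side by side as columns of one csv file.

    inputs: list of paths, one per column, in the desired column order.
    header: optional list of column names (an assumption; the README does
            not state which header the end-to-end plot scripts expect).
    """
    columns = [read_column(p) for p in inputs]
    lengths = {len(c) for c in columns}
    if len(lengths) != 1:
        raise ValueError(f"input files have different row counts: {sorted(lengths)}")
    # Create the output directory (e.g. result/) if it does not exist yet.
    Path(output).parent.mkdir(parents=True, exist_ok=True)
    with open(output, "w", newline="") as f:
        writer = csv.writer(f)
        if header is not None:
            writer.writerow(header)
        writer.writerows(zip(*columns))


# Example call for the JOB benchmark (paths taken from the steps above):
# merge_columns(
#     ["graindb/evaluations/job_duckdb_avg.out",
#      "graindb/evaluations/job_graindb_avg.out"],
#     "result/end2end_job.csv",
# )
```

The row-count check is there because a silent `zip` would truncate to the shortest file and misalign queries across systems.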
Ablation
- Merge all performance results of the ablation study into a single csv file, result/ablation.csv.
Each configuration takes one column in the final csv file, in the following order: 'DUCKDB', 'GR-JM-RSJ', 'GR-JM', 'GR-FULL'.
- Plot the graph:
> python3 scripts/plot_boxplot_ablation.py result/ablation.csv
- Prepare input csv files:
  - Create an empty file result/micro_p.csv. Append "Selectivity" as the first column in the csv file.
  - Append performance columns from graindb/evaluations/micro_p_duckdb_avg.out, graindb/evaluations/micro_p_graindb_avg.out, graphflowdb/evaluations/micro_p_gfdb_avg.out, and neo4j/micro_p_neo_results.csv into result/micro_p.csv.
  - The final csv file's header is Selectivity,DuckDB,GRainDB,GFDB,Neo4j.
  - result/micro_k.csv is prepared in a similar way.
- MICRO-P
> python3 scripts/plot_selectivity.py result/micro_p.csv
- MICRO-K
> python3 scripts/plot_selectivity.py result/micro_k.csv
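Because result/micro_p.csv and result/micro_k.csv are assembled by hand, a quick shape check before plotting can catch a missing or misordered column. The sketch below is not part of the repository; `check_micro_csv` is a hypothetical helper, and the only thing it relies on from this README is the header `Selectivity,DuckDB,GRainDB,GFDB,Neo4j`.

```python
import csv

# Header stated in the preparation steps above.
EXPECTED_HEADER = ["Selectivity", "DuckDB", "GRainDB", "GFDB", "Neo4j"]


def check_micro_csv(path):
    """Raise ValueError on a wrong header or row width; return the data row count."""
    with open(path, newline="") as f:
        # Skip completely empty lines such as a trailing blank line.
        rows = [row for row in csv.reader(f) if row]
    if not rows or [h.strip() for h in rows[0]] != EXPECTED_HEADER:
        raise ValueError(f"unexpected header in {path}: {rows[:1]}")
    for index, row in enumerate(rows[1:], start=1):
        if len(row) != len(EXPECTED_HEADER):
            raise ValueError(
                f"{path}: data row {index} has {len(row)} fields, "
                f"expected {len(EXPECTED_HEADER)}"
            )
    return len(rows) - 1
```

Calling `check_micro_csv("result/micro_p.csv")` returns the number of data rows when the file is well formed.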
- Prepare input csv files:
  For each query, organize the performance of DuckDB and GRainDB under different plans in a single csv file. The header of the csv file is DuckDB, GRainDB. Each row in the csv file holds the performance numbers of DuckDB and GRainDB under the same join order.
> python3 scripts/plot_spectrum.py graindb/evaluations/spectrum_q1.csv -t q1a
> python3 scripts/plot_spectrum.py graindb/evaluations/spectrum_q2.csv -t q2a
> python3 scripts/plot_spectrum.py graindb/evaluations/spectrum_q3.csv -t q3a
> python3 scripts/plot_spectrum.py graindb/evaluations/spectrum_q4.csv -t q4a
> python3 scripts/plot_spectrum.py graindb/evaluations/spectrum_q5.csv -t q5a
> python3 scripts/plot_spectrum.py graindb/evaluations/spectrum_q6.csv -t q6a
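The six commands above differ only in the query number, so they can also be generated in a loop. This is a minimal sketch under that observation; `spectrum_commands` and `run_all` are hypothetical names, and the script path and the `-t` flag are copied from the commands above rather than checked against the script itself.

```python
import subprocess


def spectrum_commands(queries=range(1, 7)):
    """Build one plot_spectrum.py invocation per query, mirroring the commands above."""
    return [
        ["python3", "scripts/plot_spectrum.py",
         f"graindb/evaluations/spectrum_q{q}.csv", "-t", f"q{q}a"]
        for q in queries
    ]


def run_all():
    """Run the plot commands one after another from the repository root."""
    for cmd in spectrum_commands():
        # check=True stops at the first failing plot instead of continuing silently.
        subprocess.run(cmd, check=True)
```

Calling `run_all()` from the repository root runs the same six plots in order.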
Using screen or tmux is recommended when running these experiments, as some of them may take a very long time.