Storing data in a cross-language form #71
Comments
It would be really useful to have cross-language datasets. Maybe a spDatapy or spDatax repo could be worthwhile, to avoid issues with CRAN. |
@martinfleis what do you have in mind? Do you want to store the files in some python package? Many of the datasets from spData are available in inst/shapes -- https://github.com/Nowosad/spData/tree/master/inst/shapes (although we plan to remove shapefiles soon from there -- #62). Do you need any other dataset from spData as a file? |
Missed that! That is what I was looking for. If these links are considered stable, I would just include them in |
Yes, they are v. stable. (Except the .shp files, which will be removed in ~two months) |
@martinfleis .gal files are at https://github.com/Nowosad/spData/tree/master/inst/weights |
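For readers who want to link to individual files under the directories mentioned above, here is a minimal sketch of turning a GitHub `tree` directory URL into a direct-download URL on `raw.githubusercontent.com`. The helper name `raw_url_from_tree` and the file name `example.gal` are hypothetical and used only for illustration; the sketch also assumes the branch name contains no slashes.

```python
from urllib.parse import urlsplit


def raw_url_from_tree(tree_url, filename):
    """Build a raw-download URL for `filename` inside a GitHub tree directory.

    Assumes a URL of the form
    https://github.com/<owner>/<repo>/tree/<branch>/<dir>/...
    with a branch name that has no "/" in it.
    """
    parts = urlsplit(tree_url).path.strip("/").split("/")
    if len(parts) < 4 or parts[2] != "tree":
        raise ValueError(f"not a GitHub tree URL: {tree_url}")
    owner, repo, _, branch, *subdirs = parts
    path = "/".join([owner, repo, branch, *subdirs, filename])
    return f"https://raw.githubusercontent.com/{path}"


# "example.gal" is a placeholder, not a file known to exist in the repo.
url = raw_url_from_tree(
    "https://github.com/Nowosad/spData/tree/master/inst/weights",
    "example.gal",
)
print(url)
```

A URL built this way can be passed to any reader that accepts HTTP sources, so Python code does not need R to write the file to disk first.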
I have exposed those datasets that live in |
The rest of them are .rda objects -- do you want all of the datasets from the README available (except the one we discussed yesterday)? If so, I could just create another GH repo for that. |
It would be nice for independence of R and Python examples depending on the same data. The tiny snippet @Robinlovelace used during SDSL required R running prior to Python to load the file and dump it to the disk before it could be read by geopandas. Having it available directly would allow more freedom in what runs first and in what runs at all. |
+1 to increasing modularity and x-language compat (without having to depend on either for shared examples). |
@martinfleis I took a look at the data available in R files -- they consist of spatial vector data, some raster data, a few tables, and also some graph data. Do you have any suggestions on the data formats you would prefer for each of the data types (e.g., vector -- gpkg, raster -- geotiff, etc)? |
As long as GDAL can read it I don't really care. |
@martinfleis take a look at https://github.com/Nowosad/spData_files and let me know what you think. @rsbivand Roger, do you maybe also have any suggestions on how to store and share these datasets? |
I think it will work for me if we don't touch it (so the shasum won't change). |
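To illustrate why an unchanged shasum matters, here is a minimal sketch of how a downstream tool might check a downloaded file against a pinned SHA-256 digest. The function names `sha256_of` and `matches_pinned_hash` are hypothetical; this is not the actual verification code of any package mentioned in this thread.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pinned_hash(path, expected_sha256):
    """True if the file content still matches the digest recorded earlier."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

If a file in the repository is re-exported or recompressed, its digest changes and a check like this fails, even when the data it contains is the same.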
I agree that using file formats that have active GDAL drivers is sensible; for larger data sets, maybe SOzip, for GDAL versions that provide it? Otherwise, for vector data, zipped files may reduce bandwidth: r-spatial/sf#2433 and #62 (comment). |
Thanks, Roger. @martinfleis -- what do you think? Should I compress the gpkg files or keep them as they are? |
@Nowosad given the largest file in that repo is 905KB, I don't think compression is worth it. |
Maybe also https://github.com/Nowosad/spDataLarge/tree/master/inst and the |
Good idea, @rsbivand -- I just added some additional files from spData and all of the files from spDataLarge to https://github.com/Nowosad/spData_files. Comments/suggestions are welcome. |
This all looks fine with me. I'll wait a bit if there are any further comments and then expose these in |
Hi,
would you be keen on storing the data in some open formats alongside rda so we could link to it from Python? We have a geodatasets package that holds metadata and some tooling to cache the data locally, so if you include the data here as GeoJSON, CSV, GPKG or whatever is needed, we could include them in geodatasets, allowing easier access to the same data from R and Python and avoiding the need to run R first to save the data Python can read.