Snapshot size 0 breaks smart clones with OpenShift Virtualization #129

Comments
Welcome! Hmm, we may need to work jointly with them to home in on the meaning of the spec:
Speaking from a more theoretical standpoint, the size is genuinely unspecified. Using the ZFS snapshot feature, the size is 0 and will only change over time as the data of the parent begins to differ from the snapshot. Furthermore, creating a new volume from a snapshot (the purpose of the field) requires 0 extra bytes as well (again, until the new volume begins to differ from the snapshot). In short, it was intentionally set to 0, but I'm open to discussion for sure.
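For illustration, this is roughly what that looks like on the ZFS side (pool/dataset/snapshot names here are made up):

```sh
# Hypothetical names, purely for illustration.
zfs snapshot tank/k8s/pvc-1234@csi-snap

# A brand-new snapshot "uses" ~0 bytes; USED only grows as the parent
# dataset diverges from it, while REFER shows the data the snapshot points at.
zfs list -o name,used,referenced tank/k8s/pvc-1234@csi-snap
```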
Hello @travisghansen, thank you for the warm welcome 😄
I get that, and it makes sense from the perspective of the storage system. However, from the perspective of k8s you couldn't create a zero-sized PVC - it needs to be at least large enough to hold the data within the source volume. For what it's worth, here's the relevant CDI code.
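As a quick way to see the field in question from the Kubernetes side (object and namespace names are placeholders), the reported restore size ends up in `status.restoreSize`:

```sh
# Placeholder names. The RESTORESIZE column in the default output comes from
# status.restoreSize, which is populated from the CSI CreateSnapshot response.
oc get volumesnapshot -n my-namespace
oc get volumesnapshot my-snap -n my-namespace -o jsonpath='{.status.restoreSize}{"\n"}'
```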
Yes, understood. We had some good discussion about the matter on Slack: https://kubernetes.slack.com/archives/C8EJ01Z46/p1635807374020000

In any case, I'll be working on revisiting the logic based on that conversation, but it will take me a little bit. It's actually more nuanced than it seems because of the different volume types I support. I'm glad you brought it up because I otherwise would never have known :)
@acsulli out of curiosity, do you happen to know how k8s/CDI responds when the value is simply omitted from responses? It may be easier for me to not try to manage it at all.
I do not =\
OK, I've removed the attribute from the responses entirely for now. Can you update your image tags to the latest build and give it another try?
Looks to be the same behavior as before.
OK, thanks for giving it a try. I'll work on returning a value; hopefully it won't take too long.
What exactly is the size meant to represent?

Example 1: A filesystem dataset shared over NFS...

Example 2: Same as above but using zvol/iscsi (ie: block-based with an fs), then no matter what the...

Thoughts? More info?
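To make the dataset-vs-zvol distinction concrete, these are the ZFS properties a driver could plausibly report a size from (dataset/zvol names are hypothetical):

```sh
# Filesystem dataset backing an NFS share: the logical size is governed by
# quota/refquota, while used/referenced reflect the data actually stored.
zfs get -H -o property,value refquota,used,referenced,logicalreferenced tank/k8s/pvc-fs

# Zvol backing an iSCSI LUN: volsize is the fixed block-device size,
# regardless of how much data the filesystem inside it currently holds.
zfs get -H -o property,value volsize,used,referenced tank/k8s/pvc-blk
```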
Disclaimer: I am not a developer, so please take my interpretation of code with a massive grain of salt. From what I understand, it is using the restore size for the initial PVC request, then expanding it to the size requested by the DataVolume.
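For reference, this is the sort of DataVolume clone being discussed; the names, namespace, and size are placeholders, and it assumes CDI is installed with a snapshot-capable storage class (which is what triggers the smart-clone path):

```sh
cat <<'EOF' | oc apply -f -
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: cloned-disk
spec:
  source:
    pvc:
      namespace: vm-templates        # placeholder source namespace
      name: rhel8-template-disk      # placeholder source PVC
  pvc:
    accessModes:
      - ReadWriteOnce
    resources:
      requests:
        storage: 30Gi                # CDI expands the snapshot-sized PVC up to this
EOF
```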
OK, lemme commit a tweak for your scenario (and what I think is the proper response value outside that case as well). Thanks for the patience and info! Excited to get the feature working for you.
Ok, I've made a new commit which attempts to handle this properly. I've a bit more research to do to make sure the value being returned will always be high enough, but it should be sane enough for lab testing. Force the deployment to update to the most recent image and give it a shot.
Apologies for the delay; that works exactly as expected, thank you!
Awesome! I'll put a little more polish on the feature and then close this down once it's merged into a release.
One interesting thing I've noticed is the clone volume dependency tree(s). What I understand to be happening is that a CSI snapshot is created, resulting in a ZFS snap, then a new PV/PVC is created from the snapshot - which results in a dependent clone volume in ZFS. Deleting the original volume (the one which had the snapshot created) results in the PVC being in a released state and the backing ZFS volume staying in place, since the clone still depends on its snapshot.

I don't think this behavior is limited to just CDI / OpenShift Virtualization; I would expect it any time a PVC is cloned and the original deleted. More importantly, I'm not sure that this is incorrect based on my (admittedly primitive) understanding of how ZFS clones work. I do think it simply means that users need to be aware of how it works. Is there an option to do a true "full" clone of the volume? NetApp's Trident does this via a backend configuration option.
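To illustrate the dependency (and the standard ZFS ways to break it) with made-up dataset names:

```sh
# A cloned volume records which snapshot it came from:
zfs list -o name,origin tank/k8s/pvc-clone

# Destroying the original dataset is blocked while its snapshot has
# dependent clones:
zfs destroy -r tank/k8s/pvc-original   # fails: snapshot has dependent clones

# Option 1: reverse the parent/child relationship so the clone owns the data.
zfs promote tank/k8s/pvc-clone

# Option 2: create a fully independent copy instead of a clone.
zfs send tank/k8s/pvc-original@csi-snap | zfs receive tank/k8s/pvc-full-copy
```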
@acsulli yeah, super glad you're running through this stuff! This is partially what I was alluding to earlier when I mentioned the whole situation is actually more complex than it seems, for several reasons (it's a large matrix of potential paths). In `democratic-csi` there are settings to create detached snapshots and to create detached volumes (from snapshots or from other volumes) using `zfs send | zfs receive` instead of `zfs clone`.

The particular sequence of events you've hit is likely unable to be handled differently than the behavior you're seeing. In the case of deleting snapshots I can flag the removal as deferred so ZFS cleans things up once the dependent clones are gone. Frankly, there are probably some gaps in the logic currently that aren't being dealt with entirely because there are so many possibilities. For example, if you enable the first (but the parent is on another pool) and do not enable the second, then it will blow up, as you can't clone across pools.

From a theoretical standpoint there are no silver bullets, but I tried to cover what I felt would be the most likely use-cases.

EDIT: The CSI spec is purposely ambiguous about the relationship of snapshots to parent volumes, specifically around the delete behavior (it doesn't take a stance...at least last I knew). ie: snapshots may or may not have a joint life-cycle with the parent volume. I'm interested in knowing of other specific configurations/combinations that would better serve the kubevirt use-case if you have them.
Here is a nice summary of the problem. Thinking out loud, it would make sense that having deduplication turned on at the top-level dataset / volume - where both the source volumes and the detached copies live - would allow the blocks to be shared again. But, AFAIK, it doesn't work that way 😞

I do think that you could save capacity with the above configuration, but it would still rely on dedup to reclaim the space.
Yeah, the linked article is a decent representation of the various trade-offs depending on which route you go. If you don't care about quotas and wanted to run dedup, you could create a single dataset, share it over NFS, and then use that single share for all volumes. There are simply trade-offs between capacity, performance, etc. that need to be weighed to choose what makes the most sense for the circumstances :(

If you have any bright ideas do share!

EDIT: correction, dedup is pool-wide according to this: https://unix.stackexchange.com/questions/382303/does-zfs-deduplicates-across-datasets-or-only-inside-a-single-dataset

In such a case you can enable dedup on the parent datasets and you should be covered.
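If going the dedup route, it would look something like this (the parent dataset name is hypothetical; mind the RAM and performance cost of dedup before enabling it):

```sh
# Dedup is set per dataset and inherited by children, while the dedup table
# itself is maintained pool-wide.
zfs set dedup=on tank/k8s
zfs get -r dedup tank/k8s
```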
How has this been working? There is one last potential issue that I'm aware of, and that is ZFS reporting bad values because all pending transactions have not been flushed.
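Assuming an OpenZFS release with `zpool sync` (0.8 or newer), one way to force pending transaction groups out before reading the space accounting (names are placeholders):

```sh
# Flush outstanding transaction groups so snapshot space accounting settles,
# then read the raw (parseable) values.
zpool sync tank
zfs get -Hp used,referenced,logicalreferenced tank/k8s/pvc-1234@csi-snap
```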
It's been working exactly as expected, thank you! I've created several dozen PVC clones for VMs without issue.
Released in
There are three settings that change behaviour from `zfs clone` to `zfs send | zfs receive`. For explanation, see democratic-csi/democratic-csi#129 (comment). Try 2 settings that detach volumes created from snapshots and other volumes.
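For anyone looking for the settings referenced above: the option names below are taken from my reading of the democratic-csi example configs and may differ between versions, so treat this as a sketch rather than reference documentation.

```sh
# Sketch of the relevant zfs section of a freenas-nfs driver config.
# Option names/values are assumptions based on the project's example configs.
cat > freenas-nfs.values.yaml <<'EOF'
driver:
  config:
    driver: freenas-nfs
    zfs:
      datasetParentName: tank/k8s/vols
      detachedSnapshotsDatasetParentName: tank/k8s/snaps
      # Use zfs send | zfs receive instead of zfs clone so new volumes carry
      # no dependency on their source snapshot/volume.
      detachedVolumesFromSnapshots: "true"
      detachedVolumesFromVolumes: "true"
EOF
```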
Summary
When using the `freenas-nfs` driver with OpenShift 4.8+ and OpenShift Virtualization 4.8+, offloaded "smart" clones do not work. This appears to be a result of how the Containerized Data Importer (CDI) looks at the size of the snapshot to determine the size of the newly created volume. I would expect this to affect KubeVirt on other Kubernetes distributions as well.
Recreating the issue
1. Deploy OpenShift 4.8 or later.
2. Deploy OpenShift Virtualization 4.8 or later.
3. Configure and deploy the `freenas-nfs` driver, with snapshot support.
4. Add a VM template disk image, using the OpenShift Virtualization GUI or by creating a PVC with a template disk inside.
   - The PVC and PV are successfully created and populated with data.
5. Create a `DataVolume`, using the previous as the source.
   - `VolumeSnapshot` and `VolumeSnapshotContents` objects are successfully created, however there is no ZFS snap. Note that the `RESTORESIZE` field for both is `0`.
   - The `DataVolume` stays in the `SnapshotForSmartCloneInProgress` status indefinitely.
6. The logs for the CDI deployment Pod (`oc logs cdi-deployment-<identifier> -n openshift-cnv`) have these messages:

I believe that is the result of this line of code, which statically sets the size to `0`.
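A few commands that may help confirm the symptoms above (namespace and object names are placeholders):

```sh
# The DataVolume phase stays in SnapshotForSmartCloneInProgress:
oc get datavolume -n my-namespace

# The snapshot objects exist, but their RESTORESIZE column shows 0:
oc get volumesnapshot -n my-namespace
oc get volumesnapshotcontent

# CDI's own view of the failure:
oc logs deploy/cdi-deployment -n openshift-cnv | tail -n 50
```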
Expected result
The objects should have a valid value in the size field, allowing the `DataVolume` clone operation to succeed.