Container doesn't populate resolv.conf properly #135
I'm seeing the same issue running in Kubernetes. Might be related to this bug in Alpine.
I ported to node:12-slim and successfully worked around the problem. I'm running into a lot of DNS issues on Alpine-based images; not sure if it's my k8s cluster's configuration or what.
Thanks for the research on this. I'm not entirely against switching the base image from Alpine to Debian, but I'd like to give upstream a bit of time to resolve this before jumping ship. @jdmarble, what was the impact to the image size using Debian-slim?
I expected the Debian (even slim) based image to be larger than the Alpine one. I was surprised, although I'm not sure I can trust the results because I don't understand them. I'm getting different numbers depending on the source.
The GitLab repo reports 32.56 MiB (edit: 34.14 MB) for my "slim" spin. Regardless, I wouldn't suggest something as drastic as a base image change only to fix this type of problem, but a slightly smaller image size (if it's true) might be interesting. :)
I think I'm being affected by this issue too, but in the weirdest way I could imagine. I've spent the last 4 hours debugging and searching, lol. I'm spinning this up in Kubernetes. It started when I got errors about rejected certs during the download process. I managed to get a shell into a container, and voila! (All four commands ran in quick succession.) The 404s are from my own public-facing Traefik instance, and then it eventually curls correctly, randomly. The next request was back to the 404s.

I'm going to try building the image myself from different bases like @jdmarble did, but this is just an impact report, I guess.

Edit: Bless you, jdmarble, you forked and pushed your port. May the coding gods smile upon you.
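For anyone else trying to reproduce the intermittent behavior described above, a loop along these lines (URL and request count are only examples) makes the flakiness easy to see by tallying the HTTP status codes that come back:

```
# Repeat the same request and count the status codes returned.
# A healthy setup prints a single line of 200s; intermittent DNS
# failures show up as a mix of 000 (request failed), 404, or other codes.
for i in $(seq 1 20); do
  curl -s -o /dev/null -w "%{http_code}\n" https://foundryvtt.com
  sleep 1
done | sort | uniq -c
```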
Update: Looks like that was unsuccessful. I was able to build the image successfully, but I still have the same problem. Sorry for the noise. Considering this may be unrelated, I can move my information to another ticket if you prefer.
I also have this networking issue in my k3s cluster. @jdmarble's repo worked :D
I had hoped upstream would have fixed this issue in busybox, but that doesn't seem to be happening. Also, this is starting to affect more people. I have started a branch using the Debian-slim base.

I'm a little concerned about the size increase (but it is not a show stopper):

```
❱ docker images | grep foundry
felddy/foundryvtt   0.7.9-slim      ce29f9a2bc03   44 minutes ago   195MB
felddy/foundryvtt   0.8.0           f676a803cfcb   3 weeks ago      126MB
felddy/foundryvtt   release         e3706094d2a7   2 months ago     103MB
felddy/foundryvtt   release-0.7.9   38a78b0459a4   2 months ago     103MB
```

The bigger issue that I need to resolve is that only half of the architectures supported by Alpine are offered by Debian:
I don't have any idea how many users this would impact, and I can only guess which of the missing architectures would be missed most. In any case, if you'd like to test the image from this branch, it is available to be pulled. If you have any comments about the limited architectures, that would also be helpful.
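For anyone weighing in on the architecture question, one way to compare what each base actually publishes is to inspect the image manifests; the image names below are just examples of an Alpine and a Debian-slim Node base, not necessarily the tags used in the branch:

```
# List the architectures published for each multi-arch base image.
docker manifest inspect node:14-alpine | grep '"architecture"' | sort | uniq -c
docker manifest inspect node:14-slim   | grep '"architecture"' | sort | uniq -c
```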
Could I also get folks to try running this and posting the results? I'm unable to reproduce the behavior here, and want to verify that it hasn't been fixed upstream:

```
❱ docker run -it --rm --dns 8.8.8.8 node:14-alpine nslookup foundryvtt.com
Server:    8.8.8.8
Address:   8.8.8.8:53

Non-authoritative answer:

Non-authoritative answer:
Name:      foundryvtt.com
Address:   44.234.61.225
```
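Since the failures reported above are intermittent, a single clean run may not prove much; a loop like this (the count of 100 is arbitrary) repeats the same test and only prints the runs that fail:

```
# Repeat the lookup many times and report only the failures.
for i in $(seq 1 100); do
  docker run --rm --dns 8.8.8.8 node:14-alpine \
    nslookup foundryvtt.com > /dev/null 2>&1 || echo "lookup $i failed"
done
```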
Sure thing. I'll test it this evening (or possibly tomorrow if I run out of time) and I'll post back here.
In case this is helpful, here are my results, both locally and inside k3s (yaml included). This also worked setting the DNS server to 8.8.8.8.
@annonch Those are promising results. When you get a chance, could you check if the nightly build is exhibiting the same behavior as the last release? 🤞
Unfortunately I can't provide such good results. I'm running these in Kubernetes, and I just curled the foundry website to test resolution. I tried this against the last release, and against nightly it was all 4s; I didn't get a single good hit to the foundry website.

Now, if I'm the only one here, I'm willing to concede that it's just my setup, that this may be unrelated, and that I'm just making noise 😆. I can work around it by setting my DNS policy to None and manually assigning DNS servers (see the sketch below).
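For reference, that workaround roughly corresponds to a pod spec like the one below. This is only a trimmed-down sketch; the pod name, nameserver, and image tag are placeholders, not the exact manifest I use:

```
# Bypass the cluster resolver entirely by pinning the pod's DNS config.
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: foundry-dns-workaround
spec:
  dnsPolicy: "None"          # ignore the cluster DNS settings
  dnsConfig:
    nameservers:
      - 8.8.8.8              # manually assigned DNS server
  containers:
    - name: foundry
      image: felddy/foundryvtt:release
EOF
```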
@adam8797 how are you running your K8S? I never had an issue with DNS resolution using the Alpine container. I did 1000 requests in a row using @felddy's example command and they all came out clean. I know that with K8S, policies or security groups (if you are running in AWS) can sometimes result in inconsistent DNS resolution. I'm running Foundry today on an RPi4 with k3s, locally with compose, and on a server with KIND and k8s for development and testing. If you have any other tests that I could run, please let me know.
I am also having this issue on a k8s cluster set up via kubeadm. This is the only container exhibiting the behavior, and it does so on both nightly and release-0.7.9. Not sure if it matters, but my k8s cluster is using CoreDNS and not kube-dns.
Hi @aetaric, how is the network on your clusters configured? I have seen problems with k8s and CoreDNS name resolution due to security groups and firewall rules between the nodes. In all my environments I never had issues, and my K8S development cluster, which runs on AWS with EKS, also uses CoreDNS and doesn't have the problem.
Well, I am using flannel as the backing network fabric, so no network-policy antics should be going on. I am running in vxlan mode for communication between nodes, so that might have something to do with it? As for physical and logical networking, all k8s nodes are on the same VLAN, same ToR switch, same subnet. As I mentioned before, other containers are able to resolve DNS without issue.
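One way to narrow down whether the problem is the cluster network or the Alpine/musl resolver specifically is to run the same lookup from two throwaway pods, one using the busybox image suggested by the Kubernetes DNS debugging docs and one using a Node Alpine base (image tags here are only examples):

```
# Compare the canonical k8s DNS test image against an Alpine-based one.
kubectl run dns-test-busybox --image=busybox:1.28 --restart=Never -it --rm -- nslookup foundryvtt.com
kubectl run dns-test-alpine --image=node:14-alpine --restart=Never -it --rm -- nslookup foundryvtt.com
```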
So I might have some insight into the weird thing the container is doing here: I was reviewing my DNS query logs, and it seems the container is appending the search domain from DHCP options to the foundry address.
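If you want to confirm that search-domain behavior from inside the pod, checking what the cluster injected into resolv.conf is a quick first step (the pod name is a placeholder):

```
# Show the nameserver, search domains, and ndots option the pod received.
kubectl exec -it <foundry-pod> -- cat /etc/resolv.conf
```

With the ndots:5 default that Kubernetes sets, a name like foundryvtt.com is first tried with each search domain appended, which would match the queries seen in the logs above.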
I've resolved the DNS issue I've been having while running this and other Alpine-based images in Kubernetes clusters on my network. Short answer: I turned off DNSSEC for my domain name managed by Cloudflare and everything started working. Read on for details.

Some information about my setup:
Some general information about what causes the problem for me (and possibly for you):
Here are some links that helped me figure this out.

I could verify that this was a problem, and that my fix worked, using alpine/git and dig.

Before fix:

```
[jdmarble@jdmarble-desktop ~]$ kubectl run alpine-git --image=alpine/git --restart=Never -it --rm clone https://github.com/octocat/Spoon-Knife.git
fatal: unable to access 'https://github.com/octocat/Spoon-Knife.git/': Could not resolve host: github.com
...
```

(note that github.com could not be resolved)

```
[jdmarble@jdmarble-desktop ~]$ dig github.com.mylocaldomain.tld
...
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 26637
...
;; AUTHORITY SECTION:
mylocaldomain.tld. 1720 IN SOA cleo.ns.cloudflare.com. dns.cloudflare.com. ...
...
```

(note the status: NOERROR for the search-qualified name)

After fix:

```
[jdmarble@jdmarble-desktop ~]$ kubectl run alpine-git --image=alpine/git --restart=Never -it --rm clone https://github.com/octocat/Spoon-Knife.git
Cloning into 'Spoon-Knife'...
remote: Enumerating objects: 16, done.
remote: Total 16 (delta 0), reused 0 (delta 0), pack-reused 16
Receiving objects: 100% (16/16), done.
Resolving deltas: 100% (3/3), done.
```

(note that the clone now succeeds)

```
[jdmarble@jdmarble-desktop ~]$ dig github.com.myinternaldomain.tld
...
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 56469
...
;; AUTHORITY SECTION:
myinternaldomain.tld. 1044 IN SOA cleo.ns.cloudflare.com. dns.cloudflare.com. ...
...
```

(note the status: NXDOMAIN for the search-qualified name)

In my case, it was an easy decision to disable DNSSEC because the domain is only used internally and I'm not using Cloudflare for normal records. If you want to keep DNSSEC on, you may have to get creative or switch away from Cloudflare.
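If you suspect the same root cause, a quick way to check whether DNSSEC is actually enabled for your search domain is to ask for its DS record at the parent zone; mylocaldomain.tld below is just the placeholder name from my comment:

```
# A non-empty answer means the parent publishes a DS record,
# i.e. DNSSEC is enabled for the domain.
dig +short DS mylocaldomain.tld

# And this shows how a search-domain-qualified external name answers.
dig github.com.mylocaldomain.tld +noall +comments
```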
I have upgraded my K8S cluster to 1.22 and got this error for the first time. Just to get it registered here: the fix for me was to ensure that ...
I have not been able to fix this yet, but I suspect this may be an issue with CoreDNS. Lookups for foundryvtt.com appear to be failing because passthrough does not seem to be working; the CoreDNS logs show no lookups for foundryvtt.com, though.
I'll test this again on my 3 k8s clusters with the Alpine image (my default), and update here and in the other thread too. I still have 8.8.8.8 on my CoreDNS, so I'll try both and edit this post.

My 3 clusters currently run K8S version:

```
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.0", GitCommit:"c2b5237ccd9c0f1d600d3072634ca66cefdf272f", GitTreeState:"clean", BuildDate:"2021-08-04T20:01:24Z", GoVersion:"go1.16.6", Compiler:"gc", Platform:"linux/amd64"}
```

running CoreDNS k8s.gcr.io/coredns/coredns:v1.8.4.

Confirmed with the same ...
I have found something interesting that may solve the issue:

"Though the call to dns.lookup() will be asynchronous from JavaScript's perspective, it is implemented as a synchronous call to getaddrinfo(3) that runs on libuv's threadpool. This can have surprising negative performance implications for some applications, see the UV_THREADPOOL_SIZE documentation for more information."

https://nodejs.org/api/cli.html#cli_uv_threadpool_size_size

Setting UV_THREADPOOL_SIZE solved my issue running 200 deployments.
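For anyone wanting to try the same thing: UV_THREADPOOL_SIZE is just an environment variable read by Node, so it can be passed straight to the container. The value and the deployment name below are only examples, not the exact settings used above:

```
# Stand-alone Docker:
docker run -d --env UV_THREADPOOL_SIZE=16 felddy/foundryvtt:release

# Or on an existing Kubernetes deployment:
kubectl set env deployment/foundryvtt UV_THREADPOOL_SIZE=16
```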
This issue has been automatically marked as stale because it has been inactive for 28 days. To reactivate the issue, simply post a comment with the requested information to help us diagnose this issue. If this issue remains inactive for another 7 days, it will be automatically closed.
This issue has been automatically closed due to inactivity. If you are still experiencing problems, please open a new issue.
🐛 Bug Report
I'd like to supply my own DNS server to the container using the --dns flag. Yet this is not correctly picked up and inserted into /etc/resolv.conf, which makes it impossible for the container to run in bridge networking mode.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
I'm expecting the DNS server to be populated into /etc/resolv.conf; I don't know why it isn't, however.
This works fine for all my other containers I'm running.
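A minimal way to see the reported behavior, assuming the published release tag and an arbitrary DNS server (both are just examples), is to compare what lands in resolv.conf for this image versus a stock one:

```
# Both containers are started with --dns; the file should list 1.1.1.1.
docker run --rm --dns 1.1.1.1 --entrypoint cat felddy/foundryvtt:release /etc/resolv.conf
docker run --rm --dns 1.1.1.1 --entrypoint cat alpine:3.13 /etc/resolv.conf
```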
Any helpful log output
I use docker-compose
Paste the results here:
The /etc/resolv.conf