Skip to content

"Token not active" error when logging in via OIDC #8537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
smolinari opened this issue Jul 17, 2023 · 22 comments · Fixed by #8950
Closed

"Token not active" error when logging in via OIDC #8537

smolinari opened this issue Jul 17, 2023 · 22 comments · Fixed by #8950
Assignees
Labels
need-help Assign this label prompts an engineer to check the issue. Only humans may set this. s2 Broken use cases or features (with a workaround). Only humans may set this.

Comments

@smolinari
Copy link

smolinari commented Jul 17, 2023

Hi,

Before 0.25, I could log in to my Keycloak instance via another application and then come to my Coder instance and click on the "Open ID Connect" button and since my OIDC session was already started, I would be logged in immediately.

Now, when I click the button, I get this:

{"message":"Could not refresh expired Oauth token.","detail":"oauth2: \"invalid_grant\" \"Token is not active\""}

If I delete all cookies, I get the Keycloak log in screen, and I can log in (again) via Keycloak and then everything works as it should.

I believe something is (still?) wrong with the OIDC refresh process and I'm uncertain it is now my setup or an actual issue.

Scott

Coder version: 0.26.1
Environment: k8s

@cdr-bot cdr-bot bot added the bug label Jul 17, 2023
@smolinari smolinari changed the title "Token not active" error when logging in "Token not active" error when logging in via OIDC Jul 17, 2023
@smolinari
Copy link
Author

smolinari commented Jul 18, 2023

I can confirm this happened again today. OIDC token for Coder is expired, but I've logged myself into a new session with Keycloak via another application and Coder doesn't catch that I'm logged in to a new Keycloak session and gives me the error above.

If I delete only the Coder cookie and hit the "Open ID Connect" button, then I'm directly logged in (so no additional login). After that, everything works as it should.

To note once more, this login process was working perfectly before 0.25.

Scott

@smolinari
Copy link
Author

smolinari commented Jul 20, 2023

Update. Yesterday although I was already logged in, Coder logged me out after several hours (not sure how many, but more than 5, and less than 10), despite the refresh token from Keycloak being set for 10 hours. So, I clicked on the "Open ID Connect" button to log in and got the error above, deleted the cookie, got the Keycloak login screen (although my Keycloak session was active) and logged in and was in.

This morning, I logged in via a different app with Keycloak as I usually do, then moved to our Coder instance and clicked the "Open ID Connect" button and was immediately logged in, as if nothing was broken. I'll report again tomorrow about how the login goes again.

Question or suggestion though, couldn't Coder follow the refresh timeout of the Keycloak (OIDC provider) token to log me out, when the token expires expires?

Scott

@matifali matifali added need-help Assign this label prompts an engineer to check the issue. Only humans may set this. and removed need-help Assign this label prompts an engineer to check the issue. Only humans may set this. labels Jul 20, 2023
@smolinari
Copy link
Author

Ok. Last report. This morning everything worked like it should. Clicked on the "Open ID Connect" button in Coder after logging in via another app and I was immediately logged in.

I'm closing this issue. If I can figure out why I was having the issue initially, I'll report back again, but I doubt I will. Sorry to have wasted anyone's time.

Scott

@bpmct
Copy link
Member

bpmct commented Jul 21, 2023

Hmm... thanks for the rundown and sorry that you're running into issues. If this happens again, could you ensure CODER_VERBOSE is set to true and send any server logs occurring at that time?

@smolinari
Copy link
Author

@bpmct - Sure thing Ben. Willdo.

Scott

@smolinari
Copy link
Author

smolinari commented Jul 22, 2023

Small update: Got this refreshing the workspaces page with an expired token.

image

Refreshed the page, hit the "Open ID Connect" button and was faced with this again.

image

I didn't have the logs set to verbose. Doing that now. I'll report back again, once I have the logs. I'll leave it to you all to open the issue up again or not. 😁

Scott

@smolinari
Copy link
Author

smolinari commented Jul 24, 2023

Ok. This is the log (in VERBOSE mode) in the coder server, when I try to log in and get the error shown in the OP. Can't say I can see that it sheds any light on the situation. But, for sure I'm very far from any expert. 😁

2023-07-24 07:17:39.706 [warn]  coderd: run replica update loop ...
    error= update replica:
               github.com/coder/coder/enterprise/replicasync.(*Manager).syncReplicas
                   /home/runner/actions-runner/_work/coder/coder/enterprise/replicasync/replicasync.go:304
             - sql: no rows in result set
2023-07-24 07:17:44.350 [debu]  coderd.metrics_cache: deployment stats metrics refreshed  took=45.830902ms  interval=30s
2023-07-24 07:17:44.474 [debu]  coderd: GET  host=coder-zeus.m8a.dev  path=/api/v2/appearance  proto=HTTP/1.1  remote_addr=10.42.5.38  took=52.23784ms  status_code=200  latency_ms=52  request_id=12832d4d-804c-4c72-b528-8d4e0f783500
2023-07-24 07:17:44.618 [warn]  coderd: run replica update loop ...
    error= update replica:
               github.com/coder/coder/enterprise/replicasync.(*Manager).syncReplicas
                   /home/runner/actions-runner/_work/coder/coder/enterprise/replicasync/replicasync.go:304
             - sql: no rows in result set
2023-07-24 07:17:45.588 [debu]  coderd: read stats report  request_id=01a51e28-e49e-4f83-8b2c-f72e540e3cc7  interval=30s  workspace_agent_id=571877d9-e43e-459d-80de-275b801c25c7  workspace_id=ebeddb84-ce0e-4760-a3c6-51289cc343ae  payload={"connections_by_proto":{"TCP":1},"connection_count":1,"connection_median_latency_ms":47.265,"rx_packets":371,"rx_bytes":240380,"tx_packets":302,"tx_bytes":28908,"session_count_vscode":1,"session_count_jetbrains":0,"session_count_reconnecting_pty":0,"session_count_ssh":0,"metrics":[{"name":"derp_home_change","type":"counter","value":0},{"name":"issue_869_workaround","type":"counter","value":0},{"name":"magicdns_packet_in","type":"counter","value":0},{"name":"magicsock_disco_recv_bad_key","type":"counter","value":0},{"name":"magicsock_disco_recv_bad_parse","type":"counter","value":0},{"name":"magicsock_disco_recv_bad_peer","type":"counter","value":0},{"name":"magicsock_disco_recv_bpf_ipv4","type":"counter","value":0},{"name":"magicsock_disco_recv_bpf_ipv6","type":"counter","value":0},{"name":"magicsock_disco_recv_callmemaybe","type":"counter","value":16909},{"name":"magicsock_disco_recv_callmemaybe_bad_disco","type":"counter","value":0},{"name":"magicsock_disco_recv_callmemaybe_bad_node","type":"counter","value":0},{"name":"magicsock_disco_recv_derp","type":"counter","value":96859},{"name":"magicsock_disco_recv_ping","type":"counter","value":51619},{"name":"magicsock_disco_recv_pong","type":"counter","value":28331},{"name":"magicsock_disco_recv_udp","type":"counter","value":0},{"name":"magicsock_disco_send_derp","type":"counter","value":87341},{"name":"magicsock_disco_send_udp","type":"counter","value":109574},{"name":"magicsock_disco_sent_callmemaybe","type":"counter","value":7358},{"name":"magicsock_disco_sent_derp","type":"counter","value":87341},{"name":"magicsock_disco_sent_ping","type":"counter","value":80954},{"name":"magicsock_disco_sent_pong","type":"counter","value":51619},{"name":"magicsock_disco_sent_udp","type":"counter","value":52590},{"name":"magicsock_netmap_num_peers","type":"gauge","value":1},{"name":"magicsock_num_derp_conns","type":"gauge","value":1},{"name":"magicsock_rebind_calls","type":"counter","value":496},{"name":"magicsock_recv_data_derp","type":"counter","value":536995},{"name":"magicsock_recv_data_ipv4","type":"counter","value":0},{"name":"magicsock_recv_data_ipv6","type":"counter","value":0},{"name":"magicsock_restun_calls","type":"counter","value":3506},{"name":"magicsock_send_data","type":"counter","value":505479},{"name":"magicsock_send_data_network_down","type":"counter","value":0},{"name":"magicsock_send_derp","type":"counter","value":592790},{"name":"magicsock_send_derp_error","type":"counter","value":27},{"name":"magicsock_send_derp_error_chan","type":"counter","value":0},{"name":"magicsock_send_derp_error_closed","type":"counter","value":0},{"name":"magicsock_send_derp_error_queue","type":"counter","value":0},{"name":"magicsock_send_derp_queued","type":"counter","value":592820},{"name":"magicsock_send_udp","type":"counter","value":52590},{"name":"magicsock_send_udp_error","type":"counter","value":0},{"name":"magicsock_update_endpoints","type":"counter","value":3498},{"name":"netcheck_https_measure","type":"counter","value":503},{"name":"netcheck_report","type":"counter","value":3497},{"name":"netcheck_report_error","type":"counter","value":0},{"name":"netcheck_report_full","type":"counter","value":262},{"name":"netcheck_stun_recv_ipv4","type":"counter","value":3614},{"name":"netcheck_stun_recv_ipv6","type":"counter","value":0},{"name":"netcheck_stun_send_ipv4","type":"counter","value":3698},{"name":"netcheck_stun_send_ipv6","type":"counter","value":0},{"name":"packet_reflect_to_os","type":"counter","value":0},{"name":"portmap_pcp_address_mismatch","type":"counter","value":0},{"name":"portmap_pcp_not_authorized","type":"counter","value":0},{"name":"portmap_pcp_ok","type":"counter","value":0},{"name":"portmap_pcp_sent","type":"counter","value":3497},{"name":"portmap_pcp_unhandled_response_code","type":"counter","value":0},{"name":"portmap_pmp_network_failure","type":"counter","value":0},{"name":"portmap_pmp_not_authorized","type":"counter","value":0},{"name":"portmap_pmp_ok","type":"counter","value":0},{"name":"portmap_pmp_out_of_resources","type":"counter","value":0},{"name":"portmap_pmp_sent","type":"counter","value":3497},{"name":"portmap_pmp_unhandled_opcode","type":"counter","value":0},{"name":"portmap_pmp_unhandled_response_code","type":"counter","value":0},{"name":"portmap_pxp_response","type":"counter","value":0},{"name":"portmap_upnp_ok","type":"counter","value":0},{"name":"portmap_upnp_parse_err","type":"counter","value":0},{"name":"portmap_upnp_response","type":"counter","value":0},{"name":"portmap_upnp_sent","type":"counter","value":3497},{"name":"portmap_upnp_updated_meta","type":"counter","value":0},{"name":"wgengine_major_changes","type":"counter","value":0},{"name":"wgengine_minor_changes","type":"counter","value":0},{"name":"agent_reconnecting_pty_connections_total","type":"counter","value":0},{"name":"agent_sessions_total","type":"counter","value":2,"labels":[{"name":"magic_type","value":"vscode"},{"name":"pty","value":"no"}]},{"name":"agent_ssh_server_failed_connections_total","type":"counter","value":0},{"name":"agent_ssh_server_sftp_connections_total","type":"counter","value":0},{"name":"agent_ssh_server_sftp_server_errors_total","type":"counter","value":0}]}
2023-07-24 07:17:45.735 [debu]  coderd.activity_bump: bumped deadline from activity  request_id=01a51e28-e49e-4f83-8b2c-f72e540e3cc7  workspace_id=ebeddb84-ce0e-4760-a3c6-51289cc343ae
2023-07-24 07:17:45.787 [debu]  coderd: POST  host=coder-zeus.m8a.dev  path=/api/v2/workspaceagents/me/report-stats  proto=HTTP/1.1  remote_addr=10.42.5.38  took=254.735534ms  status_code=200  latency_ms=254  request_id=01a51e28-e49e-4f83-8b2c-f72e540e3cc7
2023-07-24 07:17:46.081 [debu]  coderd: GET  host=10.42.6.103:8080  path=/healthz  proto=HTTP/1.1  remote_addr=10.42.6.1  took=86.291µs  status_code=200  latency_ms=0  request_id=3710ef79-e942-432e-92a5-78e72d665fbb
2023-07-24 07:17:46.082 [debu]  coderd: GET  host=10.42.6.103:8080  path=/healthz  proto=HTTP/1.1  remote_addr=10.42.6.1  took=44.202µs  status_code=200  latency_ms=0  request_id=6370d08b-2c86-4ddb-8ae1-e6dce8e7ac86
2023-07-24 07:17:49.620 [warn]  coderd: run replica update loop ...
    error= update replica:
               github.com/coder/coder/enterprise/replicasync.(*Manager).syncReplicas
                   /home/runner/actions-runner/_work/coder/coder/enterprise/replicasync/replicasync.go:304
             - sql: no rows in result set
2023-07-24 07:17:54.612 [warn]  coderd: run replica update loop ...
    error= update replica:
               github.com/coder/coder/enterprise/replicasync.(*Manager).syncReplicas
                   /home/runner/actions-runner/_work/coder/coder/enterprise/replicasync/replicasync.go:304
             - sql: no rows in result set
2023-07-24 07:17:56.082 [debu]  coderd: GET  host=10.42.6.103:8080  path=/healthz  proto=HTTP/1.1  remote_addr=10.42.6.1  took=174.717µs  status_code=200  latency_ms=0  request_id=65f4cf66-21a3-4a55-b836-d3e50413891e
2023-07-24 07:17:56.082 [debu]  coderd: GET  host=10.42.6.103:8080  path=/healthz  proto=HTTP/1.1  remote_addr=10.42.6.1  took=411.288µs  status_code=200  latency_ms=0  request_id=cbe6aa92-b064-4859-b9ea-1219108cbdcb

Scott

@bpmct bpmct reopened this Jul 24, 2023
@bpmct bpmct added the need-help Assign this label prompts an engineer to check the issue. Only humans may set this. label Jul 24, 2023
@matifali matifali added need-help Assign this label prompts an engineer to check the issue. Only humans may set this. and removed need-help Assign this label prompts an engineer to check the issue. Only humans may set this. labels Jul 25, 2023
@deansheather
Copy link
Member

When you login with OIDC we generate a coder API key with fixed lifespan (not affected by the OIDC expiry) and store an "oauth link" in the DB with the access token, refresh token and expiry from the provider. The Coder API key's lifespan is increased automatically when you make requests. When you make a request using an API key generated through OIDC login, we verify that the OIDC access token hasn't expired yet, and if it has we will try to refresh it. If the refresh fails then we delete the API key which logs you out.

The oauth2: "invalid_grant" "Token is not active" error comes from your OIDC provider when we try to refresh the token. Since it's happening during login it probably means that the token isn't active or is already expired before Coder receives it via token exchange.

Searching online for keycloak "Token is not active" says that it could be an issue with the server time/timezone, or it could be an issue with the "SSO session time" and "revoke refresh token" settings.

@smolinari
Copy link
Author

smolinari commented Jul 25, 2023

I'd like to agree it could be a Keycloak issue, but the OIDC login with Coder with an OIDC session already started was working perfectly before upgrading to 0.25.

It seems to me, once logged out, instead of requesting a new refresh token, Coder is trying to refresh with the old token. Could that be?

You wrote:

Since it's happening during login it probably means that the token isn't active or is already expired before Coder receives it via token exchange.

If I'm logged out of Coder, there should be no current (active or inactive/ expired) token AFAIK. Or, let me ask, which token are you speaking of? I believe, if I'm already logged in via an OIDC session to Keycloak, Coder should ping Keycloak and since I'm already logged in, Coder should get a brand new token.

Or, put another way. once logged out of Coder and I don't have an OIDC session open, then I should get the Keycloak login screen. This works correctly. At that point, I can log into Keycloak and I'm successfully logged in to Coder with a new token and a new OIDC session. After that, I can also go and do "SSO" logins with my other apps with no issues.

Unfortunately, it doesn't work the other way around with Coder and it did before 0.25. And I'm pretty much certain I've made no changes to Keycloak to possibly break the SSO login via OIDC (session) either.

Scott

@smolinari
Copy link
Author

smolinari commented Jul 27, 2023

I just noticed something. My usual morning routine is to reopen the windows in Chrome I had open the day before. When I get to Coder, it redirects me to the login page with

https://coder-zeus.m8a.dev/login?redirect=%2F%40smolinari%2Fzeus-dev

Today, on a whim, I decided to just go back to the URL only, so I shortened the URL to https://coder-zeus.m8a.dev in the browser, hit enter, and I was again redirected to the login page. But, this time the login worked as expected. Not sure this will help find the issue, but I hope it will.

I'll experiment some more tomorrow and report back.

Scott

@smolinari
Copy link
Author

smolinari commented Jul 28, 2023

Today, because I had overwritten my Chrome history and couldn't open up the whole window of tabs, I opened a brand new window and went to my Coder instance (after getting a new OIDC session via another app) and the log in worked as it should with one click.

So, I believe the issue is when I access a page other than the index page in Coder and the process of the redirect to the login page isn't doing what it should.

The issue also happened to me yesterday, after a few hours of non-use of Coder. Navigating to another page ended up in the redirect to the login page, which ended up in the error after pressing the "Open ID Connect" button. My OIDC sessions are set to 10 hours, so the session was still active. Is there a way to extend Coder's session timeout to match the OIDC's session timeout? If that should be another issue (suggestion), please let me know.

Scott

@matifali
Copy link
Member

@smolinari to set coder session duration you can use https://coder.com/docs/v2/latest/cli/server#--session-duration

@smolinari
Copy link
Author

@matifali - Thanks. According to those docs, the session timeout is 24 hours. I'm not seeing that. It's much shorter. Like 4 hours. Could that be?

I'll set it to the same 10 hours as the OIDC session and see what happens.

Scott

@smolinari
Copy link
Author

smolinari commented Jul 29, 2023

So, after setting Coder's token session duration to 10 hours, I could log in normally after Coder redirected me from the workspaces page (which was saved as an open tab) to the login page.

I'll continue to monitor this for a couple of days and if all works as it should, I think we can close this issue again.

Scott

@smolinari
Copy link
Author

Today I opened a "history" page in the browser of our coder instance, one from two days ago and with definitely old cookies. It caused a redirect to the login page. And, I was met with the error in the OP. Instead of deleting all cookies, I went into dev tools and deleted cookies one by one to see which one was the culprit. It was the coder_session_token. After deleting it only, I could log in normally.

Scott

@shawnweeks
Copy link

shawnweeks commented Aug 1, 2023

This definitely seems like a bug in Coder, I'm running several other ODIC apps like GitLab, WikiJS, and Rancher and none of them do this. It seems like Coder should force a re-login when this happens instead of making the user have to manually delete the cookie from their browser. A token being expired is a normal occurrence and could happen for any number of reasons such as a forced logout on the IDM side. Issue 8179 is probably the same issue. Verified this is still happening in v0.27.2

@matifali matifali added the s2 Broken use cases or features (with a workaround). Only humans may set this. label Aug 2, 2023
@z3ky
Copy link

z3ky commented Aug 5, 2023

Same behavior with our system

@z3ky
Copy link

z3ky commented Aug 6, 2023

@smolinari, problem is probably resolved in 2.0.0. i haven't had any issues since the update.

@mforell
Copy link

mforell commented Aug 8, 2023

We've just updated to v2.0.0, but we're still facing the issue with the token. First, we receive this message in Coder when the window is not in focus for a while:

Could not refresh expired OAuth token.
oauth2: "invalid_grant" "Session lacks the necessary client"

Upon reloading the page, we get logged out. After logging back in, we encounter the following error:
{"message":"Could not refresh expired OAuth token.","detail":"oauth2: \"invalid_grant\" \"Refresh token issued prior to the start of the client session\""}.

maybe it is connected to this issue (but I am not sure) #8179

@shawnweeks
Copy link

Confirming this still happens on 2.0.0 for me as well. Only fix is to delete the cookie or users can't login ever again.

@smolinari
Copy link
Author

I upgraded to 2.0.2 and the issue is now gone. Thank you for the fix!

Scott

@mforell
Copy link

mforell commented Aug 14, 2023

With 2.0.2 we have no more problems either. A big thanks to you guys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
need-help Assign this label prompts an engineer to check the issue. Only humans may set this. s2 Broken use cases or features (with a workaround). Only humans may set this.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants