-
Notifications
You must be signed in to change notification settings - Fork 417
MSC4104: Auth Lock: Soft-failure-be-gone! #4104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
MSC4104: Auth Lock: Soft-failure-be-gone! #4104
Conversation
We need to say explicitly that all `m.auth_lock` events that have been created that match the `auth_event` are considered.
It's allowed because servers will still retain the history and an admin can easily recanonicalise the state so that joining servers can see the relevant history again.
| The receiving server must combine the `extremities` from both events | ||
| such that the canonical history becomes C -> B -> A. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if Alice sees Chelsea send B -> A but Bob sees C -> A instead, because Chelsea crafted two extremities from the same event, delivering one to Alice's HS and the other to Bob's HS?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a better example. However, provided Bob receives Alice's m.auth_lock, then Bob would reject any further events sent by Chelsea referencing A or C. Bob could also send their own m.auth_lock referencing Chelsea's prior membership that includes C within extremities. Then Alice will be able to pull in C when Alice gets Bob's m.auth_lock. Alice could then redact C.
Though this presents another problem when Bob is unprivileged and can't send m.auth_lock. The reason why we give bob the capability to send m.auth_lock is because we trust that he will observe and apply Alice's m.auth_lock and not leak more of Chelsea's events to us. And this means he's most likely already a room admin. This is a problem that already exists that soft failure doesn't prevent either element-hq/synapse#9329. It would be nice to find some kind of solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe negotiation of extremities could to considered separately, such that servers with unprivileged users could propose them and room admins could preview them and choose to accept or reject them. That does feel a little bit like over engineering for and entrenching a bad state of affairs though. But at least it's something that would work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I should probably clarify somewhere in the proposal that we don't propose eagerly issuing an m.auth_lock with every ban, we simply do not soft fail anymore. If we then see events being authorized that would have normally been candidates for soft failure, the room admin or their tooling would then issue m.auth_lock. In principle, this should reduce the number of instances where element-hq/synapse#9329 occurs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
negotiation
I'm forgetting that a server can ask for the prev_events, find the diverged extremity provide a preview to an admin and allow them to choose incorporate it, all without manual intervention from another server (other than continuing to participate)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dkasak I have rewritten the potential issues section to try cover this concern and all the solutions that I have discovered so far
| We counteract this issue by requiring that servers maintain all | ||
| existing `extremities` and their `prev_events`, regardless of | ||
| whether they are missing from the `extremities` field of | ||
| an `m.auth_lock` event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But what is "existing" at the point of receipt of the m.auth_lock event could differ across different servers participating in the room. Wouldn't this strategy lock them into having diverging views of the room?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, because users on those servers can issue their own m.auth_lock events for the same locked_event_id, which would allow them to converge again. Am I understanding this correctly as the same problem described in the second paragraph of https://github.com/matrix-org/matrix-spec-proposals/pull/4104/files#r1494856367 ?
| @@ -0,0 +1,204 @@ | |||
| # MSC4104: Auth Lock: Soft-failure-be-gone! | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding the proposal very hard to understand, and there seems to be some deal breakers in the potential issues section, like being able to lock all auth events and effectively locking the room..?
I suggest adding diagrams for each scenario to explain what is going on.
I am very interested in a solution better than soft failure, so if I can understand this proposal better than maybe this is something we can work with. However, the scenarios presented here make it seem like this is not the final form of this proposal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, the scenarios presented here make it seem like this is not the final form of this proposal.
It isn't, the MSC is still draft.
I'm finding the proposal very hard to understand
Is there a way to improve this outside of diagrams for the scenarios in the potential issues section?
I am very interested in a solution better than soft failure
Me too but I no longer know if this is the right approach
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Diagrams for normal operations would help too! Diagrams are great.
As of yesterday I think I now have an idea of how we can do away with soft fail, but I need to check various scenarios and think more on the performance implications. The gist is CALM theorem + state resolve literally all events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CALM theorem
I'm unfamiliar with CALM, is this the best resource to explain what this is?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's basically it.
In the intervening 9 months since writing that comment, I've been almost entirely consumed by DAG related work. There's a host of tradeoffs we'd need to make to get the semantics you're describing in this proposal. However, it's interesting that some of my thoughts are similar enough to what this proposal is doing. I have a few thoughts now on this proposal:
state_key: A sha256 of the locked_event_id concatenated with the mxid of the event sender. We do this so that different room admins can optionally specify their own versions of the extremities that will combine to form a complete set to be considered during authorization.
This makes no sense, as users on the same server will see the same events. Soft-failure is designed to protect against backdating, it isn't a general moderation tool (so multiple admins on the same server should always agree on the same set of extremities in practice).
extremities: A list of extremities that should be shared to other servers representing the canonicalised version of the room history. This is protected from redaction.
This is really no different from the server sending this event just sharing their forward extremities, which is prev_events, so just use that.
For each auth_event all m.auth_lock events with a locked_event_id matching the event_id of the auth_event are found from the room's current state. From each m.auth_lock event relevant to one auth_event, the extremities fields are combined to form a set of all extremities relevant to the auth_event. If the considered auth_event's event_id matches any m.auth_lock's locked_event_id field AND the event_id of the event being authorized is absent from both the extremities set AND the event_id of any connected event referenced via each extremity's prev_events, reject.
This is hard to parse, but I think what you're saying is "look at an event's auth_events, are they in the auth chain from the locked_event_id? If yes, this means they are at-risk of being rejected since they refer to a locked section of the DAG. Are the auth_events in the auth chain from extremities? If yes, then they are in the protected region of the DAG, so it's allowed. If no, then this is a concurrent fork, so reject."
The receiving server must combine the extremities from both events
You need to be more specific here, particularly when the boundary between locked/protected regions disagree. Using an intersection of locked regions ensures we don't accidentally reject events which we genuniely sent concurrently, but this demands servers never go away (as a dead server will never updated the locked region).
Solution A: Dedicated API for showing diverged events to room admins
This isn't user-friendly. My grandma would never be able to understand and use this for her group.
Solution B: Room administrators on other servers issue their own m.auth_lock
I like the idea of coordinating to converge on a consistent set of auth locks, but given auth locks block messages, this may be hard to achieve.
Solution C: Unprivileged users propose changes to the extremities of m.auth_lock
Similarly, I like the idea but it has the same problems as solution B. It's also not clear how unprivileged users would even know of this considering the "Dedicated API" is admin only.
Solution D: Recrafting PDUs
I've been calling this "rebasing" but yeah, it has its own set of problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes no sense, as users on the same server will see the same events. Soft-failure is designed to protect against backdating, it isn't a general moderation tool (so multiple admins on the same server should always agree on the same set of extremities in practice).
This is to allow room admins on different servers to also send the event because they might not see the same events. Which is why they might like to adjust the canonical room history to include the events that only they can currently see.
As for whether it's a moderation tool? it isn't, but the only reason this exists is because power levels cannot be revoked and instead Matrix uses soft failure to cover our ass when stale auth events are used, so unfortunately this does mean it may be necessary for us to micromanage soft failure.
This is really no different from the server sending this event just sharing their forward extremities, which is prev_events, so just use that.
ok.
This is hard to parse, but I think what you're saying is "look at an event's auth_events, are they in the auth chain from the locked_event_id? If yes, this means they are at-risk of being rejected since they refer to a locked section of the DAG. Are the auth_events in the auth chain from extremities? If yes, then they are in the protected region of the DAG, so it's allowed. If no, then this is a concurrent fork, so reject."
I think so.
You need to be more specific here, particularly when the boundary between locked/protected regions disagree. Using an intersection of locked regions ensures we don't accidentally reject events which we genuniely sent concurrently, but this demands servers never go away (as a dead server will never updated the locked region).
I don't understand what you mean here?
Rendered
Signed-off-by: Gnuxie [email protected]