linux-nerds.org

Backfilling Conversations: Two Major Approaches

ActivityPub

9 posts, 5 commenters, 0 views

julian@community.nodebb.org

wrote, last edited by julian@community.nodebb.org

#1

In February 2025, I gave a talk at FOSDEM in Brussels entitled "The Fediverse is Quiet — Let's Fix That!" In it, I outlined several "hard problems" endemic to the fediverse, focusing on one complaint often voiced by newcomers and old-timers alike: that the fediverse is quiet because you never see the full conversation, owing to design decisions made at the protocol level.

Since then there have been a number of approaches toward solving this problem, and it is worth spending the time to review the two main approaches and their pros and cons.

N.B. I have a conflict of interest in this subject as I am a proponent of one of the approaches (FEP 7888/f228) outlined below. This article should be considered an opinion piece.

Crawling of the reply tree

jonny@neuromatch.social pioneered this approach to "fetch all replies" by crawling the entirety of the reply tree. It was first discussed on 15 April 2024 and merged into Mastodon core on 12 March 2025. When presented with an object, the Mastodon service makes a call to the context endpoint and, if supported(?), starts to crawl the reply tree via the replies collection, generating a list of statuses to ingest.
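As a rough illustration of the crawl described above, here is a minimal sketch in Python. It is not Mastodon's implementation. The `fetch` callable, the in-memory `FAKE_NETWORK`, and the simplification that `replies` is a flat list of IDs (real servers typically return a paged Collection) are all assumptions made to keep the example short.

```python
from collections import deque
from typing import Callable, Optional

# Hypothetical stand-in for remote servers: object ID -> ActivityPub-like object.
# `replies` is a flat list of IDs here purely for brevity.
FAKE_NETWORK = {
    "https://a.example/1": {
        "id": "https://a.example/1",
        "replies": ["https://b.example/2", "https://c.example/3"],
    },
    "https://b.example/2": {
        "id": "https://b.example/2",
        "inReplyTo": "https://a.example/1",
        "replies": ["https://c.example/4"],
    },
    "https://c.example/3": {
        "id": "https://c.example/3",
        "inReplyTo": "https://a.example/1",
        "replies": [],
    },
    "https://c.example/4": {
        "id": "https://c.example/4",
        "inReplyTo": "https://b.example/2",
        "replies": [],
    },
}


def crawl_replies(start_id: str, fetch: Callable[[str], Optional[dict]]) -> list:
    """Walk down the reply tree breadth-first and return the IDs that were fetched."""
    seen = {start_id}
    queue = deque([start_id])
    found = []
    while queue:
        current = queue.popleft()
        obj = fetch(current)
        if obj is None:
            # Unreachable object: nothing below it can be discovered.
            continue
        found.append(obj["id"])
        for reply_id in obj.get("replies", []):
            if reply_id not in seen:
                seen.add(reply_id)
                queue.append(reply_id)
    return found


statuses = crawl_replies("https://a.example/1", FAKE_NETWORK.get)
```

Every object in the tree costs one fetch, which is where the linear scaling discussed further down comes from.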

This approach is advantageous for a number of reasons, most notably that inReplyTo and replies are properties that are ubiquitous among nearly all implementations, and their usage tends not to differ markedly between implementations.

N.B. I am not certain whether the service would crawl up the inReplyTo chain first, before expanding downwards, or whether context is set in intermediate and leaf nodes that point to the root-level object.

One disadvantage is this approach's susceptibility to network fragility. If a single node in the reply tree is temporarily or permanently inaccessible, then every branch of the reply tree emanating from that node is inaccessible as well.
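To make the fragility concrete, here is a small sketch with a made-up reply tree. The tree shape, the IDs, and the `down` parameter are hypothetical; the point is only that one unresponsive node hides its whole subtree from a crawler.

```python
# Hypothetical reply tree: parent ID -> IDs of its direct replies.
REPLIES = {
    "root": ["a", "b"],
    "a": ["a1", "a2"],
    "a1": ["a1x"],
    "a2": [],
    "a1x": [],
    "b": [],
}


def fetched(start: str, down: set) -> set:
    """Return the IDs a crawler manages to fetch when the nodes in `down` do not respond."""
    if start in down:
        return set()
    found = {start}
    for child in REPLIES.get(start, []):
        found |= fetched(child, down)
    return found


everything = fetched("root", down=set())
with_outage = fetched("root", down={"a"})
lost = everything - with_outage
```

Here `lost` contains not just `a` but every object beneath it, even though those objects may live on servers that are perfectly healthy.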

Another disadvantage is the reliance on intermediate nodes for indexing the reply tree. The amount of work (CPU time, network requests, etc.) scales linearly with the size of the reply tree and, more importantly, discovering new branches necessitates a re-crawl of the entire reply tree. For fast-growing trees, a crawl may not net you a complete tree, depending on when you begin crawling.

Lastly, in the ideal case, a full tree crawl would net you a complete tree with all branches and leaves. Great!

Mastodon is the sole implementor of this approach, although it is not proprietary or special to Mastodon by any means.

FEP 7888/f228, or FEP 171b/f228

Summarized by silverpill@mitra.social in FEP f228 (as an extension of FEPs 7888 by trwnh@mastodon.social and 171b by mikedev@fediversity.site), this conversational backfill approach defines the concept of a "context owner" as referenced by compatible nodes in the tree. This context owner returns an OrderedCollection containing all members of the context.

A major advantage of this approach centers around the pseudo-centralization provided by the context owner. This "single source of truth" maintains the index of objects (or activities) and supplies their IDs (or signed full activities) on request. Individual implementations then retrieve the objects (or activities). It is important to note that should the context owner become inaccessible, then backfill is no longer possible to achieve. On the other hand, a dead or unresponsive intermediate node will not affect the ability of the downstream nodes to be processed.

The context owner is only able to respond with a list of objects/activities that it knows about. This does mean that downstream branches that do not propagate upwards back to the root will not be known to the context owner.

Additionally, consumers are also able to query the context owner for an index without needing to crawl the entire reply tree. The ability to de-duplicate objects at this level reduces the overall number of network requests (and CPU time from parsing retrieved objects) required, making this approach relatively more efficient.
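A minimal sketch of that de-duplication step, assuming the context owner has already returned an OrderedCollection whose `orderedItems` are object IDs. The collection URL, the IDs, and the `plan_fetches` helper are illustrative and not taken from any of the FEPs.

```python
def plan_fetches(context_items, already_stored):
    """Return the IDs still worth fetching, in collection order, without duplicates."""
    seen = set(already_stored)
    todo = []
    for item_id in context_items:
        if item_id not in seen:
            seen.add(item_id)
            todo.append(item_id)
    return todo


# Hypothetical response from a context owner.
context_collection = {
    "type": "OrderedCollection",
    "id": "https://forum.example/topic/1/context",
    "orderedItems": [
        "https://forum.example/post/1",
        "https://b.example/notes/2",
        "https://b.example/notes/2",  # a duplicate entry costs nothing extra
        "https://c.example/notes/3",
    ],
}

# IDs the local server has already ingested (assumed).
already_stored = {"https://forum.example/post/1"}

to_fetch = plan_fetches(context_collection["orderedItems"], already_stored)
```

Only the IDs in `to_fetch` need a network request, so the cost tracks the number of objects that are actually new rather than the size of the tree.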

Additional synchronization methods (via id hashsums) could be leveraged to reduce the number of network calls further.
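The post does not say how such a hashsum would be computed, so the following is only one possible scheme, sketched under that assumption: a SHA-256 digest over the sorted, de-duplicated IDs. If the digest advertised by the context owner matches the local one, the consumer can skip fetching the collection entirely.

```python
import hashlib


def ids_digest(ids) -> str:
    """Order-independent digest of a set of object IDs (hypothetical scheme)."""
    joined = "\n".join(sorted(set(ids)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def needs_sync(local_ids, remote_digest: str) -> bool:
    """True when the local view of the context differs from the remote one."""
    return ids_digest(local_ids) != remote_digest


remote_ids = ["https://a.example/1", "https://b.example/2", "https://c.example/3"]
remote_digest = ids_digest(remote_ids)

in_sync = not needs_sync(reversed(remote_ids), remote_digest)
behind = needs_sync(remote_ids[:2], remote_digest)
```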

A number of implementors follow this approach to backfill, including NodeBB, Discourse, WordPress, Frequency, Mitra, and Streams. Additional implementors like Lemmy and Piefed have expressed interest.

One hurdle with this approach is buy-in from implementors themselves. Unlike crawling a reply tree, this approach only works when the context owner supports it, and thus should be combined with various other backfill strategies as part of an overall conversational backfill solution.
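One way to combine strategies is to try the context collection first and fall back to a crawl when it is missing or unsupported. The sketch below assumes two injected callables, `fetch_context` and `crawl`, which are hypothetical placeholders for whatever an implementation actually uses.

```python
def backfill(obj, fetch_context, crawl):
    """Prefer the context owner's collection; fall back to crawling the reply tree."""
    context_url = obj.get("context")
    if context_url:
        items = fetch_context(context_url)
        if items is not None:
            return ("context", items)
    return ("crawl", crawl(obj["id"]))


# Stubs standing in for real network code.
def context_ok(url):
    return ["https://a.example/1", "https://b.example/2"]


def context_unsupported(url):
    return None  # e.g. the URL did not resolve to a usable collection


def crawl_stub(start_id):
    return [start_id]


note = {"id": "https://a.example/1", "context": "https://a.example/contexts/1"}

via_context = backfill(note, context_ok, crawl_stub)
via_crawl = backfill(note, context_unsupported, crawl_stub)
```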

Conclusion

2025 is shaping up to be an exciting year for resolving some of the harder technical and social problems endemic to the open social web/fediverse. It is this author's opinion that we may be able to make good headway towards resolving the "quiet fedi" problem with these two approaches.

It is important to note that neither approach conflicts with the other. Implementations are free to utilise multiple approaches to backfill a conversation. Both methods presented here have pros and cons, and a combination of both (or more) could be key.

Feel free to use this as a starting point for discussions regarding either approach. Does one speak to you more than the other? Are the cons of either approach significant enough for you to disregard it? What other approaches or changes could you recommend?

robz@toot.robzazueta.com

wrote

#2

@julian Quick, somewhat unrelated note - I follow you on Mastodon and see your posts with the HTML tags showing. Is NodeBB escaping those tags prior to sending out the AP message?

trwnh@mastodon.social

wrote

#3

@julian unrelated to the post, but the links to the FEPs are malformed and seem to be missing the https: scheme

julian@community.nodebb.org

wrote

#4

trwnh@mastodon.social thanks, I've updated them to add the protocol. I guess you can't rely on support for protocol-relative URLs everywhere

julian@community.nodebb.org

wrote

#5

Hi robz@toot.robzazueta.com! This could be related to better support for non-Note types introduced by Mastodon in later versions. Your instance is running v4.1.18, which is 11 months behind the latest version.

That isn't necessarily cause for concern, but I think that might be why you're seeing the HTML tags?

robz@toot.robzazueta.com

wrote

#6

@julian Ah... that actually may make more sense - thanks.
I'm working on my own AP implementation and hadn't yet run into this issue, so assumed.
Time to upgrade!

silverpill@mitra.social

wrote

#7

@julian @trwnh @mikedev
> neither approach conflicts with the other

I don't fully agree with this statement, because these "threading paradigms" suggest two different solutions to the problem of moderation. If the OP is the single source of truth, they can moderate the entire conversation (represented by context collection: Streams). If not, then each reply is independent and authors moderate only the direct replies (represented by replies collections: GoToSocial).
In theory the two solutions can be combined, but at the cost of significantly increased complexity.

jonny@neuromatch.social

wrote

#8

@julian
> N.B. I am not certain whether the service would crawl up the inReplyTo chain first, before expanding downwards, or whether context is set in intermediate and leaf nodes that point to the root-level object.

Current impl starts at the expanded post and goes down - one can start a crawl at any point in a tree. If one starts at a lower point in the tree and then triggers a crawl higher up in the tree, lower part only gets crawled once within a configurable cooldown period to avoid double crawling.
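A sketch of how a cooldown like that could work, for readers who want something concrete. This is not the Mastodon code; the class name, the 15-minute value, and the explicit `now` argument (used so the example is deterministic) are all assumptions.

```python
class CrawlCooldown:
    """Remember when each object was last crawled and refuse repeats within the cooldown."""

    def __init__(self, cooldown_seconds: float):
        self.cooldown_seconds = cooldown_seconds
        self._last_crawled = {}

    def should_crawl(self, object_id: str, now: float) -> bool:
        last = self._last_crawled.get(object_id)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # crawled recently, skip this subtree
        self._last_crawled[object_id] = now
        return True


cooldown = CrawlCooldown(cooldown_seconds=900)  # 15 minutes, arbitrary

first = cooldown.should_crawl("https://b.example/2", now=0.0)
repeat = cooldown.should_crawl("https://b.example/2", now=60.0)
later = cooldown.should_crawl("https://b.example/2", now=1000.0)
```

A crawl started higher up the tree would consult the same tracker before descending into a subtree that was already crawled.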

julian@community.nodebb.org

wrote

#9

silverpill@mitra.social said:
> If the OP is the single source of truth, they can moderate the entire conversation (represented by context collection: Streams). If not, then each reply is independent and authors moderate only the direct replies (represented by replies collections: GoToSocial).

That is a good point. The approaches are broadly compatible when top-down moderation by the context owner is not assumed.

In a moderated scenario, crawling the reply tree would not be useful unless paired with some sort of "is member of" validation with the context owner... at which point the served collection would be more performant.
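For illustration, such an "is member of" check could be as simple as filtering crawled IDs against the collection served by the context owner. The helper name and the IDs below are hypothetical, and no such validation is currently defined in either approach.

```python
def filter_by_membership(crawled_ids, context_member_ids):
    """Keep only crawled objects that the context owner lists as members."""
    members = set(context_member_ids)
    return [object_id for object_id in crawled_ids if object_id in members]


crawled = [
    "https://a.example/1",
    "https://b.example/2",
    "https://spam.example/9",  # reachable via replies, but removed by the context owner
]
approved = ["https://a.example/1", "https://b.example/2", "https://c.example/3"]

visible = filter_by_membership(crawled, approved)
```

Note that the check needs the full member list anyway, which is the point made above: once you have it, fetching from the served collection directly is cheaper than crawling.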

It could be useful for discovery by the context owner itself though.