
How might adding search change Mastodon?

March 11, 2023

with Michael Maes & Davide Grossi

1. Arguments about adding search to Mastodon

Mastodon presently does not support full text search: it is not possible to search for words that are not accompanied by a # hashtag, that is, for words that have not been intentionally made available for search. Users (particularly those migrating from Twitter) regularly lament this absence, leading to calls for its inclusion.

However, the absence of full text search has, to date, been a conscious design choice. For example, Mastodon’s founder and lead developer Eugen Rochko noted in 2017:

“If text search is ever implemented, it should be limited to your home timeline/mentions only. Lack of full-text search on general content is intentional, due to negative social dynamics of it in other networks.”

In keeping with this, the Mastodon Project currently supports only limited functionality search:

“Mastodon’s full-text search allows logged in users to find results from their own statuses, their mentions, their favourites, and their bookmarks. It deliberately does not allow searching for arbitrary strings in the entire database.”

At issue are multiple underlying concerns, from protecting marginalised groups from intrusion and harassment to consciously anti-viral design (on Mastodon’s anti-viral design more generally, see Thompson, 2022).

The absence of unrestricted text search is contested in several ways: through regular requests for search made in posts on Mastodon, through specific GitHub requests for feature changes, and through attempts to simply circumvent these restrictions by providing alternative software tools.

Arguments for unrestricted text search typically appeal to individual freedom or individual ‘rights’ as a rationale. In keeping with this individual-focussed perspective, much of the discussion on ameliorating the impact of search focusses on consent. While opt-in to discoverability as a design-choice clearly addresses some concerns about search, it does not address the fundamental issue that “negative social dynamics” (real or imagined) are *system level properties* of the online discourse as a whole. This means they may potentially impact all users in some way, whether they consent or not. This raises multiple questions about the relationship between individuals on the platform, between individuals and the collective, and platform governance. We return to these implications below; the main goal of this blog post, however, is to make such talk of ‘system level properties’ non-mysterious in order to help promote better thinking and discussion about the design of online communication systems. For this, we present a toy example designed to provide some basic insights into online communities as ‘complex systems’ and how they might be altered by a feature such as text search.

2. Online communication as a complex system

Complex systems are systems characterised by large numbers of interactions between their (often simple) components that give rise to emergent properties of the system as a whole that are typically difficult to predict (e.g., Ladyman et al., 2013).

Communicating individuals form social networks (such as the sample network seen in the figure below) in which individuals are nodes and their communication paths are represented by links between those nodes. Even though each individual may have direct connections with only a handful of other individuals, the interconnections between individuals may link them, collectively, into a much, much larger network. In this way, individual Mastodon users are linked not only to direct followers, but indirectly to those followers’ followers and so on.

Understanding online communication platforms as complex systems consequently involves trying to understand how information is propagated across such networks and what kind of emergent patterns this may give rise to. For example, although it is individuals that read, write, and boost posts, we may think of their interactions combining to determine characteristics of the discourse across the network as a whole. Is the discourse predominantly friendly or hostile? Do focal topics that attract widespread interest persist, or is attention fleeting and fragmented? What kinds of community norms govern interactions?

To understand questions like these, researchers make use of computer simulations involving agent-based models (ABMs). Such models allow one to explore the behaviour of systems in ways that we could never do in real life. Figure 1 shows the interface of a simple model created to illustrate some basic aspects of online communication systems (a link to the model itself and instructions for further, hands-on exploration can be found below).

The simulation creates a simple social network. It then simulates a contagion process across that network. Such models are widely used to study opinion dynamics or the spread of behaviour. They simply apply the notion of ‘being infected’ to receiving a message or witnessing a behaviour. Depending on what we are trying to model, it may make sense to think of the process of spread as either a ‘simple’ or a ‘complex contagion’. For a simple contagion, a single exposure is enough for ‘infection’. For a complex contagion, multiple exposures are required. This can be modelled, for example, by setting a threshold: an individual only becomes ‘infected’ once the number of infected neighbours reaches the specified threshold. This is typically more appropriate than a simple contagion for modelling the adoption of behaviours (e.g., Guilbeault et al., 2018).

The threshold value, like the number of agents in the network or the structure of that network, is a ‘parameter’: an attribute of the configuration of the model. The goal of agent-based simulations is to gain insight into the patterns that emerge from agent interactions and understand how those patterns depend on the values of the model’s parameters.
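The threshold rule can be written down in a few lines. What follows is a minimal Python sketch, not the code of the NetLogo model itself: the function name `step`, the dictionary-of-lists network, and the five-agent example are our own illustrative assumptions, as is the choice to update all agents at once. We read the threshold as ‘at least this many infected neighbours’, so that a threshold of 1 corresponds to a simple contagion.

```python
def step(neighbours, infected, threshold):
    """One synchronous update of the contagion.

    An agent becomes infected once at least `threshold` of its
    neighbours are infected; infected agents stay infected.
    (Synchronous updating is an assumption of this sketch.)
    """
    newly = {
        node
        for node, nbrs in neighbours.items()
        if node not in infected and sum(n in infected for n in nbrs) >= threshold
    }
    return infected | newly


# Hypothetical five-agent network: a path 0-1-2-3-4 plus one extra link 0-2.
neighbours = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4], 4: [3]}

# Simple contagion: one infected neighbour is enough.
print(sorted(step(neighbours, {0}, threshold=1)))     # [0, 1, 2]

# Complex contagion: a single infected agent cannot spread anything ...
print(sorted(step(neighbours, {0}, threshold=2)))     # [0]

# ... but two infected agents that share a neighbour can.
print(sorted(step(neighbours, {0, 1}, threshold=2)))  # [0, 1, 2]
```

In the last case the spread then stops at agent 2: agent 3 has only one infected neighbour, which is below the threshold.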

For our toy example, let’s now think of the ‘infection’ as something like “anger”. Anger may be socially mediated in as much as others’ angry behaviour may make us angry in turn. So we will interpret the model as capturing the spread of anger via a diffusion process on the network: let’s assume, for example, that at each time step in the model, agents communicate a message and the message content reflects their current state as either not angry (black) or angry (red). The model is set up with an initial number of randomly selected ‘angry’ agents. Our interest then lies with the extent to which anger spreads.

Exploring our simple model reveals that, if we set the contagion threshold to 1 (that is, to a simple contagion), anger will eventually spread to all members of the network, regardless of the exact number of initially angry agents, the size of the network, or its structure.

This is no longer true if we raise the contagion threshold to 2, so that an individual agent becomes ‘angry’ only when 2 of its direct neighbours are themselves ‘angry’. This means that ‘anger’ requires sufficient support among the direct neighbours of an agent: without that support, the contagion will stop in that neighbourhood, and may eventually come to a stop in the network as a whole.
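The contrast between the two thresholds can be reproduced on a deliberately tiny example. The sketch below is an illustration under our own assumptions rather than the NetLogo model: it uses a regular ring of 20 agents in which everyone is linked to the two nearest agents on either side, and the helper names `ring_lattice` and `run_until_stable` are hypothetical.

```python
def ring_lattice(n, each_side=2):
    """Ring of n agents, each linked to `each_side` nearest agents per side."""
    return {
        i: {(i + d) % n for d in range(-each_side, each_side + 1) if d != 0}
        for i in range(n)
    }


def run_until_stable(neighbours, seeds, threshold):
    """Repeat the threshold update until no agent changes state."""
    angry = set(seeds)
    while True:
        newly = {
            node
            for node, nbrs in neighbours.items()
            if node not in angry and len(nbrs & angry) >= threshold
        }
        if not newly:
            return angry
        angry |= newly


ring = ring_lattice(20)

# Threshold 1: two angry agents on opposite sides of the ring reach everyone.
print(len(run_until_stable(ring, {0, 10}, threshold=1)))  # 20

# Threshold 2, same starting point: nobody has two angry neighbours.
print(len(run_until_stable(ring, {0, 10}, threshold=2)))  # 2

# Threshold 2, but the two angry agents are neighbours: anger spreads again.
print(len(run_until_stable(ring, {0, 1}, threshold=2)))   # 20
```

With a threshold of 2 the outcome depends on where the initially angry agents sit relative to one another, not just on how many there are.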

This means also that anything that makes it more likely that multiple neighbours become infected will change the dynamics of spread. This includes increasing the proportion of initially infected agents. But it also includes changes to the structure of the network, for example, increasing the number of neighbours an agent has (i.e., increasing the number of links), or increasing the clustering whereby an agent’s neighbours are themselves neighbours. All of these now matter, and will interact to produce the patterns of spread, making that spread increasingly difficult to predict.

This brings us back to the topic of this blog post: unrestricted text search. How might we think about the effects of adding search? In effect, what search does is that it (dynamically and temporarily) rewires the network. Instead of seeing posts only from those we follow, we (and others!) can now see posts from arbitrary individuals as a function of message content.

So what happens in our toy model if we introduce some limited rewiring in the model as a result of hypothetical ‘search’? On a given time-step there is now a small probability (p=.02) that a randomly selected node receives three additional random links as a result of a ‘search’. All else about the model stays the same. Yet this simple addition dramatically changes the behaviour of the system. The complex contagion (with the familiar threshold = 2) now stands a decent chance of, once again, reaching all agents in the network. Figure 2 below shows the outcome of 1000 runs of the model (each with a different random starting configuration) both with and without this rewiring (‘search’). The x-axis shows the respective proportions of ‘angry’ agents in the population, and the y-axis shows the count of how many simulated networks ended up with that proportion.
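To make the ‘search’ step concrete, here is one possible reading of it as a Python sketch. The function name `maybe_search`, the dictionary-of-sets network, and the decision to keep the new links for the rest of the run are assumptions of ours; the NetLogo model may implement the details differently.

```python
import random


def maybe_search(neighbours, rng, p_search=0.02, extra_links=3):
    """With probability p_search, link one random agent to three random others.

    Returns the 'searching' agent, or None if no search happened this step.
    New links are undirected and are kept (an assumption of this sketch).
    """
    if rng.random() >= p_search:
        return None
    agents = sorted(neighbours)
    searcher = rng.choice(agents)
    # Only agents the searcher is not yet linked to can become new neighbours.
    candidates = [a for a in agents if a != searcher and a not in neighbours[searcher]]
    for target in rng.sample(candidates, min(extra_links, len(candidates))):
        neighbours[searcher].add(target)
        neighbours[target].add(searcher)
    return searcher


def count_links(neighbours):
    """Each undirected link appears in two neighbour sets."""
    return sum(len(nbrs) for nbrs in neighbours.values()) // 2


# Hypothetical ring of 10 agents, each linked to its two direct neighbours.
rng = random.Random(0)
ring = {i: {(i - 1) % 10, (i + 1) % 10} for i in range(10)}

links_before = count_links(ring)
searcher = maybe_search(ring, rng, p_search=1.0)  # force a search for the demo
links_after = count_links(ring)
print(links_before, links_after)  # 10 13
```

In a full simulation this function would be called once per time step, before or after the contagion update, with the default probability of .02.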

Fig. 2 Histogram of 1000 sample runs by condition (‘search’/no ‘search’).

The plot reveals several important features of our model, and with it of complex systems more generally:

1. The system’s behaviour varies as a result of random factors (e.g., which agents are randomly selected for initial contagion, and which are randomly selected for ‘search’).
2. As a result, there is no single answer as to “what happens” (e.g., when we introduce ‘search’). Rather, there is a space of possible outcomes, some of which are more probable than others.
3. There are discontinuities in the space of possible outcomes: there are *no* model runs that end in 60% angry. Rather, once a large enough subset is reached, all agents are reached eventually.
4. A minor change to the model can lead to very different outcomes: no run without ‘search’ saw spread to all agents.
5. The impact of ‘search’ is not restricted to the agents that are rewired. On those runs where all agents eventually become angry, the vast majority end up in a different state than they would have without that intervention.
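The idea of a space of possible outcomes can be explored by repeating a simulation many times and tallying where each run ends up. The sketch below does this for a much cruder setup than the NetLogo model, and it will not reproduce the numbers in Fig. 2: the regular ring network, the 40 agents, the 4 initially angry agents, the fixed number of time steps, and the function names are all assumptions made purely for illustration.

```python
import random
from collections import Counter


def one_run(rng, n=40, n_seeds=4, threshold=2, p_search=0.0, steps=200):
    """Return the final proportion of angry agents for one random run."""
    # Ring lattice: every agent starts with four neighbours (two per side).
    nbrs = {i: {(i + d) % n for d in (-2, -1, 1, 2)} for i in range(n)}
    angry = set(rng.sample(range(n), n_seeds))
    for _ in range(steps):
        # Hypothetical 'search': one random agent gains three random links.
        if rng.random() < p_search:
            s = rng.randrange(n)
            others = [m for m in range(n) if m != s and m not in nbrs[s]]
            for t in rng.sample(others, min(3, len(others))):
                nbrs[s].add(t)
                nbrs[t].add(s)
        # Complex contagion: at least `threshold` angry neighbours needed.
        angry |= {
            i for i in range(n)
            if i not in angry and len(nbrs[i] & angry) >= threshold
        }
        if len(angry) == n:
            break
    return len(angry) / n


def outcome_counts(runs, seed, **params):
    """Tally final proportions (rounded to one decimal) over many runs."""
    rng = random.Random(seed)
    return Counter(round(one_run(rng, **params), 1) for _ in range(runs))


without_search = outcome_counts(100, seed=1, p_search=0.0)
with_search = outcome_counts(100, seed=1, p_search=0.02)

for label, counts in [("no search", without_search), ("search", with_search)]:
    print(label, sorted(counts.items()))
```

Each printed list is a text version of a histogram: the first number in each pair is a final proportion of ‘angry’ agents, the second is how many runs ended there.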

These points are echoed in Fig. 3, which provides further insight into the relevant mechanisms. Our stylised ‘search’ adds links to the network. To examine the impact of this, the plots show the correlation between the final proportion of ‘angry’ agents and the number of final links (left hand) and the number of ‘searches’ that took place (right hand), with the top panels (green) showing the results of the 1000 ‘no search’ runs, and the bottom panels (blue) showing the results of the 1000 ‘search’ runs.

We can again see clearly how much variability there is even under exactly the same parameter settings. We can see also that increasing numbers of searches (and with that additional links) increase the chances of spread, but that correlation is loose.

Fig. 3. Correlations between final proportion “angry” (‘infected’) and the number of additional links to the initial 200, and with the number of ‘searches’ in the simulation run.

3. Establishing an evidence base?

So what does this simple model tell us? It is nothing like real communication or the real scale of a network such as the one comprising all Mastodon users, and our ‘search’ and ‘anger’ are nothing like real search or real anger.

The latter are just labels; we could equally have chosen ‘happiness’, or a wholly uninformative label such as ‘gleeb’. The meaning of the labels we chose to describe the model is exhausted in what they actually represent: state changes in a particular social network model. What we really want to know is what would happen *on Mastodon* if we introduced unrestricted text search.

This toy model cannot tell us that. It can still tell us plenty that is relevant, however. All of the general characteristics 1-5 above apply to complex systems more generally. Making a model more realistic will not generally make these features go away.

This means also that there are limits on what any kind of real-world empirical study could tell us. Even if we could run ‘experiments’ on Mastodon (or some other platform) that allowed us to look at the effects of introducing search, an individual ‘run’ of our real world network remains just a single point in a space of possible outcomes —outcomes that could have been different given random variation.

That space of possible outcomes likely contains non-linearities and phase transitions (Solé et al., 1996). So even if we combine our best methods (experiments, simulations, observational studies) we will likely understand only something about the broad directions in which changing a parameter might push the system. It will remain the case that ‘just a little bit more’ could lead to a qualitatively wholly different outcome. And the interactions of multiple parameters will be even more resistant to our understanding.

While this means that we will likely never get a definitive answer on what adding search would do to Mastodon, it does shed light on some of the arguments and intuitions in that debate. First and foremost, the idea of restricting search as part of an anti-viral design stance is plausible, in the sense that it is plausible that search and the degree to which things may go viral are connected. Second, examination of even our toy model as a complex system shows that one cannot conclude that because a bit of something had little to no impact, a bit more will continue to be innocuous: #hashtag search is already search, but that doesn’t mean that adding more won’t radically transform the system. A bit more can become radically different, so friction matters as a factor dampening the rate at which individuals take certain actions, here and elsewhere. Third, ‘minority actions’ can have global, system-wide effects far beyond those directly involved. That may seem mysterious in the absence of consideration of an actual complex system, but our simple example shows that it is not. This means also that “consent” to having one’s own posts included in unrestricted search does not solve all issues: one can’t consent on behalf of others to effects that *they* will incur.

4. Who decides?

The upshot of all of this is that decisions on system design for online communication platforms seem unlikely to occur in a context with an abundance of evidence that clarifies precisely what effects a particular change to the system would have. This makes all the more important considerations of who gets to decide and on what grounds.

It is natural to try and cast these issues in terms of ‘rights’: individual rights of users, rights of those who have invested most in building the platform, and so on. But rights are never limitless, because exercising them typically touches on the rights of others, all the more so when we are dealing with public or collective goods. These raise a host of problems of their own (e.g., Réaume, 1988). And it seems extremely unlikely that those problems will magically disappear or resolve by virtue of decentralisation or federation. So Mastodon (as not just a piece of software, or a development company, but also as a community) might ultimately find itself seeking to develop governance structures to resolve such issues.

References

Guilbeault, D., Becker, J., & Centola, D. (2018). Complex contagions: A decade in review. In Lehmann, S. and Ahn, Y.Y. (eds.) Complex spreading phenomena in social systems: Influence and contagion in real-world social networks, Springer. accessed at: https://arxiv.org/pdf/1710.07606.pdf

Ladyman, J., Lambert, J., & Wiesner, K. (2013). What is a complex system? European Journal for Philosophy of Science, 3, 33-67. https://link.springer.com/article/10.1007/s13194-012-0056-8

Réaume, D. (1988). Individuals, Groups, and Rights to Public Goods. The University of Toronto Law Journal, 38(1), 1-27. https://www.jstor.org/stable/825760

Solé, R. V., Manrubia Cuevas, S., Luque, B., Delgado, J., & Bascompte, J. (1996). Phase transitions and complex systems: Simple, nonlinear models capture complex systems at the edge of chaos. https://digital.csic.es/bitstream/10261/44294/1/COMPLEXITY-96.pdf

Thompson, C. (2022). Twitter alternative: how Mastodon is designed to be “antiviral”. Medium. https://uxdesign.cc/mastodon-is-antiviral-design-42f090ab8d51

Appendix: Instructions for model exploration

A link to the model is here. Clicking it will open a version of the model for running within a browser. All NetLogo models have three tabs (see Figure 1 above): the Interface Tab, the Info Tab, and the Code Tab. The Interface Tab lets the user run the model via buttons and to-be-entered parameter values. The Info Tab contains a description of the model, and the Code Tab shows the computer programme itself.

Pressing the Setup button will initialise the model, Go Once will execute a single time step, and Go will let the model run until behaviour stabilises so that there is no more change, at which point the run ends.

The contagion threshold is set by the number entered in the “threshold” box. Entering 1 into the “search” box enables search, 0 turns it off (alternatively, pressing the purple Search button will execute ‘search’ once with the pre-defined probability, regardless of setting). The topology parameters N and p determine the size and structure of the network, respectively (the graphs above were all produced with a network of N=100 agents, and a rewiring probability p= .19, giving rise to a so-called small world network; see the Info Tab for more detail).
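For readers who want to see what a small world network with these parameters looks like outside NetLogo, the sketch below builds one in plain Python using the standard Watts-Strogatz recipe: start from a regular ring and rewire each link with probability p. The function name `small_world` and the starting degree of 4 are our assumptions (a degree of 4 would be consistent with the 200 initial links mentioned in the caption of Fig. 3); the Info Tab remains the reference for how the model actually builds its network.

```python
import random


def small_world(n=100, k=4, p=0.19, seed=None):
    """Watts-Strogatz style network as a dict mapping agent -> set of neighbours.

    Starts from a ring in which each agent is linked to its k nearest
    agents (k/2 per side), then rewires each link with probability p.
    """
    rng = random.Random(seed)
    nbrs = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            nbrs[i].add(j)
            nbrs[j].add(i)
    for d in range(1, k // 2 + 1):
        for i in range(n):
            j = (i + d) % n
            if j in nbrs[i] and rng.random() < p:
                # New endpoint: anyone who is not i and not already linked to i.
                candidates = [w for w in range(n) if w != i and w not in nbrs[i]]
                if candidates:
                    w = rng.choice(candidates)
                    nbrs[i].discard(j)
                    nbrs[j].discard(i)
                    nbrs[i].add(w)
                    nbrs[w].add(i)
    return nbrs


network = small_world(n=100, k=4, p=0.19, seed=1)
n_links = sum(len(v) for v in network.values()) // 2
print(n_links)  # 200
```

Rewiring moves links rather than adding them, so the total number of links stays at 200 whatever the value of p; only ‘search’ adds links beyond that.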

For more thorough exploration, the model can be downloaded together with NetLogo (which is free), and explored using NetLogo’s inbuilt “BehaviorSpace” which allows one to define experiments involving many runs (for instructions on how to use BehaviorSpace).