Bridging to Bluesky: The open social web, consent, and GDPR

There’s been intense discussion across the fediverse, GitHub, blogs, and articles about a bridge that would let you use a Mastodon account to follow people on Bluesky, see their posts, reply to, like, and repost them – and vice versa. It’s an exciting prospect: there is quality content on Bluesky, and it feels in the spirit of an open social web that connects people without restriction by platform (particularly corporate-owned platforms). For some, this quickly elevated the bridge to a decisive step for the future of decentralised online social media and even for the ‘future of the internet’ itself.

Prompted partly by the fact that the bridge was first floated as opt-out, there was a wave of backlash arguing that it violated user consent and potentially endangered vulnerable groups (as one can only opt out of something one actually knows exists). This was countered by arguments claiming that opt-out consent was sufficient (“the fediverse is built on opt-out consent”), was already given by virtue of signing up and publicly posting, or simply wasn’t relevant (“if you want privacy, don’t post on the internet”).

Little of this discussion involved GDPR – the EU’s General Data Protection Regulation – so this post is my attempt to work through the issues raised by the bridge, consent, and GDPR: why it’s tricky and why one should care.

First, why should you care? GDPR applies to all EU citizens, those of the EEA, and the U.K. (even if the data processor isn’t based in the EU). That’s already more people than the population of North America. Additionally, GDPR applies to the processing of personal data of all users by companies based in the EU, regardless of where users live (e.g., all users of a Mastodon server run by an EU-based operator). Even where none of that applies, you might care because law is a tool for regulating social conflict. So thinking about the way legal rules approach certain issues may help clarify wider ethical intuitions.

Second, basic GDPR: For the EU, data protection is a fundamental right that reflects basic values such as privacy, transparency, autonomy, and non-discrimination [11]. GDPR sets out the following fundamentals on processing personal data: 1) processing must be lawful, fair, and transparent to the data subject; 2) it is restricted to the legitimate purposes specified explicitly on collection; 3) data are collected and processed only as strictly necessary for those purposes; 4) kept accurate and up to date; 5) stored no longer than necessary; 6) processed with appropriate security, integrity, and confidentiality; and 7) accountability – the data controller must be able to demonstrate GDPR compliance. This site provides a great overview.

At the heart of GDPR is consent, which must be “freely given, specific, informed and unambiguous” (whether by statement or “a clear affirmative action”). The European Data Protection Board (EDPB) and others offer guidance on what constitutes lawful ‘consent’ [e.g., ICO guidelines, EDPB guidelines].

Art. 6 makes clear that processing personal data is only legal given one of the specific circumstances it sets out. Art. 9 contains special provisions for “sensitive personal data”: data revealing, e.g., racial or ethnic origin, political opinions, religious beliefs; health-related data; or data concerning a person’s sex life or sexual orientation. Their processing is prohibited unless specific exceptions apply.

Sensitive personal data emerge frequently in social media posts. Art. 9(2)(e) provides a potential exception for these, as it allows processing where:

processing relates to personal data which are manifestly made public by the data subject

As always with law, issues of interpretation arise. Legal scholars develop guidelines for interpretation, but it is courts that decide what things mean. Art. 9(2)(e) has two components: 1) the data must actually be “manifestly made public”, and 2) the individual themselves must have made them so [1,2,5,16]. Online social media raise problems here because they occupy ‘an interpretive grey zone between traditional conceptions of public and private spaces’ [16], and lawyers will include in their interpretation the goals of the legislation. If that goal is protective, interpretation will tend to be more restrictive, which might include consideration of how data might be automatically acquired or re-linked [16]. “Public” doesn’t simply mean ‘available on the internet’ but rather invites consideration of how easy access to the data actually is [2]. Scholars have suggested that “public” requires possible access by an indeterminate number of individuals, without significant access barriers [1,5,16]. For example, the Norwegian DPA has raised the possibility that using an LGBTQ online dating app does not manifestly make public data about sexual orientation, as an account is required to access the app and it will largely be used only by the LGBTQ community [cited here]. EDPB guidance concerned specifically with social media users lists multiple elements relevant for assessment (p. 35), stating that a combination of these may need to be considered and that case-by-case assessment is needed. Again, the decision will ultimately lie with the courts.

Being public in the requisite sense is still not enough, because the individual has to have “manifestly made public” their information. This requires a legally relevant action (including intention) [5]. Such an action could be missing, for example, due to a failure to understand system settings, so that a post intended for family and friends goes to an entire network instead because of default settings [ICO, cited in 2]. Judgments on such actions will consider how an objective external observer would evaluate them [5].

Finally, Art. 6 GDPR might then still limit what actions are allowed with these (now manifestly public) data [1,2,5] (though the intended relationship between Art. 6 and Art. 9 is itself controversial).

So, how does that square with Mastodon and Bluesky?

Third, “public posts” and the fediverse:

The place where we consent to what happens with our posts (and other personal data) is the privacy policy or other terms of service of our servers. Mine (like many) basically mirrors the Mastodon.social privacy policy. That states:

“Public and unlisted posts are available publicly. When you feature a post on your profile, that is also publicly available information. Your posts are delivered to your followers, in some cases it means they are delivered to different servers and copies are stored there. When you delete posts, this is likewise delivered to your followers. The action of reblogging or favouriting another post is always public.”

Note the mention of ‘public’ and ‘unlisted’ posts, not ‘followers only’ posts, as the latter go to a specific, delimited set of people.

But the policy also says:

“Any of the information we collect from you may be used in the following ways:

“To provide the core functionality of Mastodon. You can only interact with other people's content and post your own content when you are logged in. For example, you may follow other people to view their combined posts in your own personalized home timeline.”

And:

We do not sell, trade, or otherwise transfer to outside parties your personally identifiable information. This does not include trusted third parties who assist us in operating our site, conducting our business, or servicing you, so long as those parties agree to keep this information confidential…

Your public content may be downloaded by other servers in the network. Your public and followers-only posts are delivered to the servers where your followers reside, and direct messages are delivered to the servers of the recipients, in so far as those followers or recipients reside on a different server than this….

When you authorize an application to use your account, depending on the scope of permissions you approve, it may access your public profile information, your following list, your followers, your lists, all your posts, and your favourites. “

Finally, it is worth noting the basic ActivityPub/fediverse design: public posts are shared with other servers only when requested. The ‘local timeline’ chronologically shows every public post on a server. The ‘federated timeline’ shows all public posts from users ‘known to my instance’: i.e., everyone on my instance plus users elsewhere that someone on my instance follows [see here]. It is not a firehose of all public posts ‘on Mastodon’, let alone the fediverse at large.
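To make that federated-timeline rule concrete, here is a minimal sketch of the membership logic just described: an instance ‘knows’ its own users plus any remote account that at least one local user follows – not the whole fediverse. The data model is a hypothetical illustration, not Mastodon’s actual implementation.

```python
# Sketch of the federated-timeline rule: an instance sees public posts
# from its own users plus from remote users that some local user follows.
# Hypothetical data model for illustration only.

def federated_timeline_authors(local_users, follows):
    """Return the set of accounts whose public posts this instance 'knows'.

    local_users: set of accounts registered on this instance
    follows: dict mapping an account -> set of accounts they follow
             (remote accounts written as 'user@otherserver')
    """
    known = set(local_users)
    for follower, followees in follows.items():
        if follower in local_users:  # only local users' follows count
            known |= followees
    return known

authors = federated_timeline_authors(
    local_users={"alice", "bob"},
    follows={
        "alice": {"carol@example.social"},
        "bob": {"alice"},
    },
)
# carol@example.social is known because alice follows her;
# an unrelated remote account like dave@elsewhere.net is not.
print(sorted(authors))
```

The point of the sketch is the asymmetry: whether a given public post reaches a given server depends on that server’s local follow graph, not on the post merely being ‘public’.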

What does all that mean? Are (public and unlisted) posts on my server “manifestly made public” in the sense of Art. 9(2)(e) GDPR? I don’t know: probably? But I could certainly make arguments against it. My server’s privacy policy sounds like someone could reasonably interpret it to mean that my public posts are “public” to the network, not to anyone, anywhere, for whatever purpose they choose.

Likewise, the way posts are distributed to other parts of the fediverse (or anything using ActivityPub) feels like it could be interpreted as a design that tries to not make it entirely easy for “anybody” to find those posts (see Appendix “added wrinkles” below).

By contrast, things seem quite different on Bluesky.

Fourth, “public posts” and the design of Bluesky.

The Bluesky privacy policy notes:

“Profiles and Posts Are Public. The Bluesky App is a microblogging service for public conversation, so any information you add to your public profile and the information you post on the Bluesky App is public.”

Note also other passages of the privacy policy:

“Business Partners. …We may also share your personal information with business partners with whom we jointly offer products or services.”

This already feels more “public” and that is supported by the actual design of Bluesky itself, as set out in this recent preprint [9], coauthored by Bluesky employees.

Figure 3 from [9] (reproduced here under a CC-BY license).

The preprint notes:

“… atproto is currently designed for content that users want to make publicly available. In particular, Bluesky user profiles, posts, follows, and likes are all public. Blocking actions are also currently public” pg. 3

This seems necessary because core processes, such as feed generation or moderation, are intentionally designed to be fulfilled by third parties:

“Bluesky allows anybody to run moderation services that make subjective decisions of selecting desirable content or flagging undesirable content, and users can choose which moderation services they want to subscribe to. Moderation services are decoupled from hosting providers, making it easy for users to switch moderation services until they find ones that match their preferences.” pg. 1

For that to work, Bluesky gathers posts etc. into a common index that mediates all communication (see Figure):

“Each user account has one repository, and it contains all of the actions they have ever performed, minus any records they have explicitly deleted. A Personal Data Server (PDS) hosts the user’s repository and makes it publicly available as a web service; we discuss PDSes in more detail in Section 3.2.” pg. 4

“A PDS stores repositories and associated media files, and allows anybody to query the data it hosts via a HTTP API. Moreover, a PDS provides a real-time stream of updates for the repositories it hosts via a WebSocket. Indexers (see Section 3.3) subscribe to this stream in order to find out about new or deleted records (posts, likes, follows, etc.) with low latency. This architecture is illustrated in Figure 3.” pg 4

“interaction between users goes via the indexing infrastructure in any case.”

“At the time of writing, most of Bluesky’s indexing infrastructure is operated by Bluesky Social PBC (indicated by a shaded area in Figure 3). However, the company does not have any privileged access: since repositories are public, anybody can crawl and index them using the same protocols as our systems use.” pg 5. [emphasis mine]
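To illustrate what “anybody can query the data” means in practice: atproto repositories are exposed over plain HTTP via XRPC, and listing an account’s public records takes nothing more than constructing a URL – no API key, no account. The endpoint name `com.atproto.repo.listRecords` and its parameters are from the AT Protocol documentation; the host and DID below are illustrative placeholders.

```python
# Sketch: building the public XRPC URL that lists an account's records
# (posts, likes, follows) from its PDS. Host and DID are placeholders.
from urllib.parse import urlencode

def list_records_url(pds_host, repo_did, collection, limit=50):
    """URL that lists public records from a repository, no auth required."""
    params = urlencode({
        "repo": repo_did,          # the account's DID
        "collection": collection,  # record type, e.g. app.bsky.feed.post
        "limit": limit,
    })
    return f"https://{pds_host}/xrpc/com.atproto.repo.listRecords?{params}"

url = list_records_url("pds.example.com", "did:plc:abc123", "app.bsky.feed.post")
print(url)
```

An ordinary HTTP GET to such a URL returns the records as JSON; that is the entire access barrier, which is precisely what the “manifestly made public” analysis above turns on.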

The preprint also suggests that it is very much the intention of Bluesky that this infrastructure be used by third parties for purposes unrelated to communication:

“…as Bluesky grows, there are likely to be multiple professionally-run indexers for various purposes. For example, a company that performs sentiment analysis on social media activity about brands could easily create a whole-network index that provides insights to their clients. Web search engines can incorporate Bluesky activity into their indexes,…” pg. 5 [emphasis mine]

From all of that, it seems clear to me that Bluesky not only considers public posts to be ‘manifestly public’ – and hence usable by anyone, anywhere, for any purpose – but is intentionally designed to make such use as easy as possible.

Finally, it might also be worth mentioning this from Bluesky’s privacy policy:

“Do Not Track. Do Not Track (“DNT”) is a privacy preference that users can set in certain web browsers. Please note that we currently do not respond to or honor DNT signals or similar mechanisms transmitted by web browsers, as there is no consistent industry standard for compliance.”

A recent court judgment found a similar policy by LinkedIn to conflict with German law. LinkedIn, too, had appealed to the ‘lack of an industry standard’, but the court maintained that this did not stand in the way of Art. 21(5) GDPR, which allows “automated means using technical specifications” in order to exercise the right to object to the processing of personal data (whether a standard actually existed or not).
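Honouring such an automated objection signal is technically trivial, which is part of why the ‘no industry standard’ argument rings hollow. The header names below (`DNT` and `Sec-GPC`, the Global Privacy Control signal) are real browser headers; the handler itself is a hypothetical sketch of how a service could treat them as an Art. 21(5) objection.

```python
# Sketch: treating the DNT / Sec-GPC request headers as an automated
# objection to processing. Header names are real; the handler is a
# hypothetical illustration, not any service's actual implementation.

def tracking_allowed(headers):
    """Return False if the request carries an automated objection signal."""
    normalized = {k.lower(): v.strip() for k, v in headers.items()}
    if normalized.get("dnt") == "1":
        return False  # Do Not Track signalled
    if normalized.get("sec-gpc") == "1":
        return False  # Global Privacy Control signalled
    return True

print(tracking_allowed({"DNT": "1"}))         # objection signalled
print(tracking_allowed({"User-Agent": "x"}))  # no signal present
```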

So… what does it all mean?

First, “followers only” posts (and of course DMs) have no explicit consent via the stock Mastodon privacy policy to be treated as “public”, and they go to a specific, limited set of recipients; both points speak against “manifestly made public”. How confident you are that my (Mastodon) public posts are manifestly public will determine your views on the GDPR implications of bridging them to Bluesky and on what, if any, kind of further consent is required (including for Bluesky’s subsequent processing, remembering the possibility of further Art. 6 constraints).

There can then still be problems by other standards, such as ethical or moral ones [6]. It’s easy to find evidence of people feeling uses of their online posts are wrong, even where legal, or feeling that current law doesn’t go far enough. There are also attempts to develop new notions of consent that would deserve reviews of their own [11].

Crucially, perceptions of data protection, privacy and consent vary across individuals, but also across groups (with marginalisation [4], age [15], gender [14], education [3,6], or country [12]), driven (plausibly) by how exactly individuals perceive themselves to be affected. That will limit the scope of any single person’s intuition! So maybe the best reason for learning about GDPR is that it reflects the sentiments of a rather sizeable number of people whose intuitions might be very different from yours. There are also structural reasons why we will systematically mis-estimate the popularity of certain beliefs [7].

That matters, because even if ethical considerations don’t worry you, strategic ones should: any online social media project ultimately rests on people wanting to use it. People don’t like feeling coerced or deceived [8]. While I know nothing about developing software or protocols for online communities, I do think a lot about discourse participation and I think about trust. Consent is tied up with both; and for both, (seeming) short-term gains can easily turn into massive long-term losses. ‘Developing tools’ (like the bridge) involves satisfying not just technical, but also legal, social and financial constraints – I just can’t see that being any other way, and that has implications for what it can mean ‘to grow’ the open social web.

The value of debate. Finally, a comment on the debate itself. Debate can be intensely frustrating when it repeats, feels stuck, or we feel others just don’t get it. This can also make the mere fact of debate seem a sign of dysfunction (“there they are over on Mastodon, tearing themselves apart -AGAIN- over whether or not to bridge.”). But debate is the only way to understand all relevant aspects of a problem when diverse knowledge, interests, personal situations, cultural contexts and perspectives are involved. I couldn’t have written this post two weeks ago, without seeing and exchanging arguments, including with people who disagreed.

And my understanding is still really limited. I should already know about consent and GDPR through my science; I also once qualified as a lawyer (though I didn’t like it and wasn’t particularly good) so should be able to read up on and understand legal problems. But my few days reading about GDPR and active legal practice in an area are two very different things (and I certainly wouldn’t want to give legal advice). Details really matter here and that includes technical computing details. So it’s likely I still got things wrong or overlooked important aspects. That does make me very confident about one thing, though: these issues are complicated and we really do need debate to sort them out.

———

Appendix: Added Wrinkles.

So as not to overburden an already overly long text, some further details of potential relevance here.

First, in addition to the privacy policy at initial sign up, my (fantastic!) current server, like many, allows me to check a tick box that I:

I see this only once I’ve set up my account, but I could well see a lawyer arguing that it is relevant both to shaping people’s expectations of what they took ‘public’ to mean (as consent can also be withdrawn under GDPR) and to judging the extent to which my posts are, in fact, accessible to anybody without an account. Given the way Bluesky is designed, it’s also not clear to me how the bridge could honour those settings other than by not bridging my content at all.

Second, on the above-mentioned technical features that might count as Mastodon limiting how accessible personal data are in practice (with potential consequences for whether “anybody” can access them), Manton Reece notes in his blog post: “We can already see some signs of Mastodon putting up slight roadblocks to open web access. For example, permalink posts on Mastodon require JavaScript — you can’t view HTML source and get the post details, making it a little more difficult to build tools that understand Mastodon pages. At the API level, some servers also require signed ActivityPub requests, making it a little more difficult to look up user profiles.” But note also that anyone can currently set up an RSS feed for an account’s public and unlisted posts.
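That RSS access is itself just a URL convention: Mastodon serves a per-account feed at `https://<server>/@<user>.rss`, with no login required. A small sketch turning a fediverse handle into that feed URL (the handle and server below are illustrative placeholders):

```python
# Sketch: deriving an account's public Mastodon RSS feed URL from its
# fediverse handle. Handle and server are illustrative placeholders.

def rss_feed_url(handle):
    """'@user@server' or 'user@server' -> that account's RSS feed URL."""
    user, server = handle.lstrip("@").split("@", 1)
    return f"https://{server}/@{user}.rss"

print(rss_feed_url("@alice@mastodon.example"))
```

Which cuts both ways for the accessibility argument above: the roadblocks Reece describes coexist with a stable, unauthenticated per-account feed.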

Finally, a recent joint statement by 12 (non-EU) data protection authorities emphasised that publicly accessible personal information online is still subject to data protection laws in most jurisdictions, noted that social media companies have “obligations with respect to third-party scraping from their sites”, and exhorted social media companies to consider steps against unlawful scraping. Bluesky’s design feels to me like a step in the opposite direction.

References

[1] Altobelli, C., Johnson, E., Forgó, N., & Napieralski, A. (2021). To Scrape or Not to Scrape? The Lawfulness of Social Media Crawling under the GDPR. Deep Diving into Data Protection; Herveg, J., Ed.; Larcier: Namur, Belgium.

[2] Dove, E. S., & Chen, J. (2021). What does it mean for a data subject to make their personal data ‘manifestly public’? An analysis of GDPR Article 9(2)(e). International Data Privacy Law, 11(2), 107–124.

[3] Epstein, D., & Quinn, K. (2020). Markers of online privacy marginalization: Empirical examination of socioeconomic disparities in social media privacy attitudes, literacy, and behavior. Social Media + Society, 6(2), 2056305120916853.

[4] Ganesh, M. I., Deutch, J., & Schulte, J. (2016). Privacy, anonymity, visibility: dilemmas in tech use by marginalised communities. Brighton: IDS.

[5] Gola, P., & Heckmann, D. (2022). Datenschutz-Grundverordnung VO (EU) 2016/679. Bundesdatenschutzgesetz. Kommentar, 3rd ed. Rn. 32–33.

[6] Hanlon, A., & Jones, K. (2023). Ethical concerns about social media privacy policies: do users have the ability to comprehend their consent actions? Journal of Strategic Marketing, 1–18.

[7] Jackson, M. O. (2019). The friendship paradox and systematic biases in perceptions and social norms. Journal of Political Economy, 127(2), 777–818. Accessed version on arXiv.

[8] Khan, M. I., Loh, J. M., Hossain, A., & Talukder, M. J. H. (2023). Cynicism as strength: Privacy cynicism, satisfaction and trust among social media users. Computers in Human Behavior, 142, 107638.

[9] Kleppmann, M., Frazee, P., Gold, J., Graber, J., Holmgren, D., Ivy, D., ... & Volpert, J. (2024). Bluesky and the AT Protocol: Usable Decentralized Social Media. arXiv preprint arXiv:2402.03239.

[10] Kühling, J., & Buchner, B. (2024). DS-GVO BDSG, 4th ed. Beck. Rn. 77–82.

[11] Politou, E., Alepis, E., & Patsakis, C. (2018). Forgetting personal data and revoking consent under the GDPR: Challenges and proposed solutions. Journal of Cybersecurity, 4(1), tyy001.

[12] Rughiniș, R., Rughiniș, C., Vulpe, S. N., & Rosner, D. (2021). From social netizens to data citizens: variations of GDPR awareness in 28 European countries. Computer Law & Security Review, 42, 105585. Retrieved version on arXiv.

[13] Spindler, G., & Schuster, F. (2019). Recht der elektronischen Medien. beck-online. Rn. 14.

[14] Tifferet, S. (2019). Gender differences in privacy tendencies on social network sites: A meta-analysis. Computers in Human Behavior, 93, 1–12.

[15] Van den Broeck, E., Poels, K., & Walrave, M. (2015). Older and wiser? Facebook use, privacy concern, and privacy protection in the life stages of emerging, young, and middle adulthood. Social Media + Society, 1(2), 2056305115616149.

[16] Wolff, H. A., Brink, S., & von Ungern-Sternberg, A. (2023). Beck’scher Online-Kommentar Datenschutzrecht. Rn. 74–79.