Ants, Sharks and Flies: how most Data Products harm us, even if they are bullshit

13 October 2020

In the bubble, a bunch of Tech Bros burn their brains producing useless or dangerous data products designed on ridiculous premises, victims of what is usually called “dataism”. Dataism has many faces: it is a complex ideology that often devolves into superstition or religious fanaticism, but we can summarize it as the belief that data, its collection and its circulation produce unbiased knowledge about an objective truth. The Human, the scientist, the analyst, the theorist become useless: machines can replace the production of knowledge and become the ultimate source of truth.

Researchers are stumbling again into old forms of pseudoscience, falling for the same mistakes made by scientists in the 18th and 19th centuries. Economic incentives towards novelty at all costs create monsters that look grotesque to anybody living outside that specific bubble. The total absence of a strategy and a narrative about how to build a collective future that doesn't look like the present keeps the best minds of our generation busy trying to spot friendly patterns on their screens.

On the other side of the fence there are countless journalists, activists, politicians, researchers and, more generally, the public sphere. These people are trying to stop this crazy, suicidal machine that turns data into compelling stories and stories into profit.

At the global level, the forces driving this phenomenon are quite clear: the immense political and economic power of the main offenders (Facebook, Google, Amazon, Netflix, ...) protects a system in which the proliferation of bullshit for profit is possible. Systemic feedback rewards the production of the useless. These big trees cast the shade in which an ecosystem of services and products can grow undisturbed by regulators, protesters and cultural resistance. Rivers of ink have been spent on this topic and I don't have much to add right now.

But what about the daily experience of the people involved in this system? What about your local tech hub, your meetups, the companies you find on StackOverflow Jobs and apply to? What does this surreal process look like on the ground and from the inside? As a Machine Learning Engineer, I think I can give you some perspective.

The three types of Tech Workers in the Data World

The disciplines and roles involved in developing data products are always a complicated topic and there's no agreement on what their names mean: data scientist, data analyst, data engineer, machine learning engineer, MLOps and so on. For the sake of our argument, though, this division is not that relevant. Your contribution to bullshit as a technical person is not necessarily determined by your skillset, but by how you're positioned in the system or organization producing it.

That's why I believe it's fundamental to identify categories that translate the macro system described above into realities and individual experiences at the micro level. You might relate to them more easily, understand which one you fit into and eventually decide to change something in your life or career.

I believe there are two relevant dimensions to this analysis: the soundness of the thinking behind the product and the awareness of how sound or unsound it is. From these we identify three categories: the product is Sound and you're Aware (SA), the product is Bullshit and you're Aware (BA), the product is Bullshit and you're Unaware (BU).

In this categorization I mostly want to include individual workers but, more often than not, they operate in companies, departments, teams and organizations that are similarly positioned, so it's not unlikely that the behavior and problems of a type BA worker are shared by their department or company. We can therefore speak of BA workers in BA companies, where the network of individuals expresses traits similar to the individual's and vice versa.

Where do you stand?

Let's go into more detail:

Type SA

These workers know that their product is Sound and they are Aware of it. They know because they have strong, direct, unambiguous evidence (or, if they are lucky, formal definitions) of the impact they have. This doesn't mean their product has a positive impact: a good part of these people produce software that is extremely harmful to society. Example: data-driven UI/UX designers at Facebook, who use data to maximize engagement on the platform and turn it into clicks. Their metric (engagement) can be tracked through experimental methodologies (like A/B testing) and it mostly depends on elements defined inside the system (i.e. engagement doesn't need proxies; it's something that Facebook itself defines). Producing clicks is their mission and produced clicks are what they see.

Another example: the company I work for, Teraki. Luckily, we are not evil. We do data compression using Machine Learning on embedded devices (cars, drones, delivery robots and so on). Our metric is the size of the data that gets transferred: if what we put on the network is consistently smaller than what traditional algorithms produce, it's working. Otherwise, it's not. There are no proxies and no ambiguities: the formal definition of the size of a binary sequence can be written down on paper in a few minutes, and it's very easy to implement a reliable tracking system for this metric.
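To make the idea of a proxy-free metric concrete, here is a minimal Python sketch of what "consistently smaller than traditional algorithms" could look like as code. It is an illustration under assumptions, not Teraki's actual tooling: zlib (DEFLATE at level 9) stands in for the "traditional algorithm", "consistently" is read as "on every payload", and the function names and sample payloads are hypothetical.

```python
import zlib
from typing import Callable, Sequence

# A compressor maps raw bytes to the bytes actually sent over the network.
Compressor = Callable[[bytes], bytes]


def baseline(payload: bytes) -> bytes:
    """Stand-in for a 'traditional algorithm' (assumption): DEFLATE at level 9."""
    return zlib.compress(payload, 9)


def transferred_bytes(payloads: Sequence[bytes], compress: Compressor) -> int:
    """Total number of bytes put on the network for all payloads."""
    return sum(len(compress(p)) for p in payloads)


def win_rate(payloads: Sequence[bytes], candidate: Compressor) -> float:
    """Fraction of payloads where the candidate's output is strictly smaller than the baseline's."""
    if not payloads:
        raise ValueError("need at least one payload")
    wins = sum(len(candidate(p)) < len(baseline(p)) for p in payloads)
    return wins / len(payloads)


def is_working(payloads: Sequence[bytes], candidate: Compressor) -> bool:
    """Hypothetical criterion: the candidate beats the baseline on every payload."""
    return win_rate(payloads, candidate) == 1.0


def no_compression(payload: bytes) -> bytes:
    """Trivial candidate that sends the raw bytes unchanged."""
    return payload


if __name__ == "__main__":
    # Hypothetical telemetry-like payloads.
    samples = [b"speed=42;" * 200, b"lat=48.1;lon=11.5;" * 100]
    print("baseline bytes:", transferred_bytes(samples, baseline))
    print("raw bytes:", transferred_bytes(samples, no_compression))
    print("raw beats baseline:", is_working(samples, no_compression))
```

Plugging in a different candidate only changes the function passed as `candidate`; the metric itself remains a plain byte count with nothing to interpret.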

These workers, while not necessarily capable of framing their activities in a broader social and economic context, still find meaningful engagement in the problem they are trying to solve. They might narrow their perspective to focus exclusively on the metrics that reliably testify to their impact, but they are nonetheless able to root their actions in a shared reality.

Type BA

Their product is Bullshit and they are Aware of it. This is the most common category in sheer numbers, especially in small and medium-sized companies, and in some fields they are the majority in corporate environments too. Type BA workers are the most interesting to observe because their work is extremely peculiar and fundamental to building the castle of delusions that we collectively buy into and call “big data” or “data science”. Small bricks made of tiny lies, half-truths and omitted details are made and laid every day. They form thick walls and shiny towers made of investor cash flow, designed to impress the external observer and to hide an uncomfortable truth from view: knowledge is in another castle.

Type BAs know that what they are doing is pointless, often because they have enough background in Statistics, Epistemology or Scientific Methodology. These tools are often sufficient to trigger some reflection on the meaning of the button smashing: they know it's pointless, but they also know it's profitable, and therefore they keep doing it. The whole endeavour, though, depends critically on one thing alone: the person who pays for that work, be it the manager, the customer or the investor, needs to hold a strong belief in what is being delivered. When they cannot articulate to themselves how the economic and social incentives play into the epistemic inconsistencies, they dismiss their own work as an anomaly in an otherwise well-functioning system.

Their work more closely resembles that of a graphic designer: it's not important whether your logo effectively drives up sales, it's important that everybody is convinced it's a good logo and that it will therefore eventually drive up sales. A minute, yet critical, difference. The same is true for a model: it's not important whether you can show rigorous, unambiguous metrics picked from established research in your domain; what's important is that you can show metrics that convince people. The people you are paid to convince might be your managers: you have to explain your work to them, lead them where you want, connect what you did to increased revenue and justify your role in the company. Or perhaps the managers are complicit: they know the product doesn't deliver the promised impact, and then the people you need to convince are your customers. It doesn't make much of a difference: half of your day is spent doing data science, the other half massaging the numbers, inventing stories and narratives and developing fancy presentations with plenty of graphs and tables.

It's easy to find lines like “you need to be able to explain and narrate your work” in job descriptions for these positions. Sometimes they call it “data storytelling”. While explaining a complex data analysis is a fundamental skill for any serious scientist, most companies are actually looking for people capable of and willing to participate in this bullshit-production process, which is built as much on the effective production of insights as on systems of smoke and mirrors. Complacency is not enough to qualify for such a job: the candidate needs a technical background as solid as anyone else's and, on top of that, must be able to combine it with interpersonal and communication skills to concoct the perfect narrative, one capable of getting past the defense mechanisms of the target audience through a masterful use of statistics, metrics and other kinds of lies.

Clearly, you cannot say the quiet part out loud before the job starts; it's whispered months later, in meeting rooms, when you are fully included in the company and have a stake in its success. Only when bullshit is already your daily life can it be acknowledged. Some people decide to stay because they have to pay the bills; others buy into it enthusiastically and embrace a mindset that portrays management or the customers as ignorant people to be deceived. They feel that most of the IT industry produces bullshit anyway and they tell themselves: “Why should I move somewhere else?” Relevant and meaningful work is a privilege we cannot aspire to.

Type BU

They are the most dangerous ones. Their product is Bullshit and they are Unaware. These are the people who really believe they are saving the world even though everybody is constantly yelling at them to stop what they're doing. They think: “They are clearly not talking about me, only Facebook and Google are greedy, I'm not like them”. Critical self-reflection is not an option: their world, their beliefs, their religion do not allow them to distance themselves from the product of their work and observe it from the perspective of the critics or the victims. They think Technology (with a capital T) is intrinsically liberating, good and beneficial, and they are its loyal servants. Questioning this truth is beyond them because everything around them reinforces this self-deception: they are like flies repeatedly hitting a window. They make the same mistakes over and over, the same ones made by the people who came before them, because they lack the means to understand and articulate what a glass window is and why their continuous movement is not getting them anywhere. They believe their nonsense is an attitude of progress, of being on the bleeding edge, necessary to divine what the market really wants. The very rare success of one of their peers justifies in hindsight any waste of resources. The market becomes an optimization algorithm converging towards a global optimum, and this requires exploring seemingly irrelevant paths. This magical thinking is rooted in the capitalist realism that dominates our times. It somehow links the systemic sacrifice of minds and money to the production of a few champions, capable of lifting themselves above the masses and saving us all by achieving the most important goal for humanity: profitability, the true judge that separates the worthy from the unworthy. Crazy is the new smart and we are still waiting for the supposed shared benefits.

This spirit, this social bubble capable of nurturing generations of worshippers of shallow criticism and enemies of radical criticism, of creators unable to understand their position in the world, attracts and unleashes an insane amount of energy. This comes in the form of skilled engineers, fervent entrepreneurs, huge investments from Digital Capital and sometimes even big revenues, when the system becomes so huge that a vast majority of the public starts buying into the delusion. This world is also the source of the most ridiculous and surreal products that we often make fun of. Useless, toxic, dangerous machine learning or data-driven applications and research are no different; the only difference is that their issues are harder for the layperson to spot.

Where to go from here

Regardless of the type of worker you are, you're still involved in a deeply dysfunctional system. Tech has largely failed to deliver on its promises, and while it has produced some benefits for everybody and huge profits for the elites, it has done so at prohibitive human, social, economic and environmental costs.

The Western Tech Industry is aggravating many structural challenges that, in the face of an approaching economic and social collapse, cannot go unsolved. Keeping the best minds of our generation busy creating useless gimmicks or nefarious contraptions to maximize ad impressions is already a crime in such critical times.

To defuse the dangers brought by this bubble, we have to challenge the ideological and pseudo-religious premises of the whole endeavour:

Technology cannot solve everything: some problems are better solved through political and social effort.
Technology is not intrinsically good or useful, even if the market seems to say so.
A system can be big, pervasive, naturalized and still be a superstition. Astrology and divination guided the decisions of the most important people on Earth for millennia. The Enlightenment just skewed our preference towards superstitions that resemble science: our elites are not smarter than a Chinese Emperor or a Roman General. Most “Big Data” is just well-designed astrology, incapable of producing knowledge.
Slow, incremental change cannot tackle every challenge: sometimes change must be radical. A self-regulating system will always fail as soon as some new element is not registered as a feedback. Today we have plenty of feedback signals that go unregistered by the Tech Industry: ecological costs, social unrest, drainage of mental energy and so on.

On a personal level, the best thing you can do to avoid being part of the problem is to be critical of the problems you tackle and the solutions you propose. As a Tech Worker you're in a special position nowadays: you're among the people who have the tools to build a new world, but the old world deems you too valuable for its own reproduction and feeds you an idea of the future that looks exactly like the present, but with flying cars and some other useless gadgets.

Being critical is not just about thinking very hard about what you're doing: it means creating a richer context for your decisions, reading about politics, philosophy, sociology and ecology, and understanding where you sit in the world and what impact your work will have. Ask yourself questions: what is my work changing in the world? Can I measure it? Am I just measuring proxies of proxies? Who benefits from these changes? Why? Am I really convinced of what I'm doing? Am I just trying to find a problem for my solution? Is the problem I want to solve even relevant outside my bubble? How small is my bubble? Why are the market and the investors rewarding me even if I don't seem to do anything meaningful? In the end, should I just give up and go do something else? Be honest with yourself first, then with others.

Technology, including Machine Learning and data-driven approaches, can be useful. It must be designed and applied in useful ways, because there are great challenges in front of us. This is not something a single person can solve and it's not something you can solve inside the startup world. Be critical, be skeptical, reject everything around you if you believe it's tainted by this new religion of data and code. The next generations are asking you to. History is asking you to. In the face of a dying ecosystem, rising authoritarian governments and the disintegration of the social fabric, technologists cannot waste time. There is no alternative.