agonzalezsosto

The automation of specialist jobs

I find the parallels between the mechanically-focused industrial revolution of the 18th century and the data-focused industrial revolution of our present times to be very interesting, thought-provoking, and somewhat terrifying.

Living through a technological revolution has made me develop empathy for the Luddites. Often we look at the past as a series of events that happened to some people we're not very emotionally invested in. We know what happens at the end of their story, and we know their endpoints (somehow) lead to where we are now, but we often forget the obvious fact that they didn't know what was going to happen as it was happening to them.

A notable parallel I feel between these two industrial revolutions is the fear that new innovations bring to the working classes. I'm personally terrified by the idea of current job obsolescence. But at the same time, it makes me hopeful. It's a strange combination.

How will I provide for myself and look after people I care about in a future where there are no jobs my present mind can imagine? And furthermore, who will take the helm of the new world created by this revolution? Will it be ill-intentioned people? Will it be well-meaning but incompetent people? Or will it be the “system” itself that will take lead of the new world?

Despite the huge anxiety and fear that our possible futures bring, they also bring interesting questions. Not all changes are negative. And some changes have the opposite effect of what we would've expected.

I think about this specifically in the context of music. If algorithms can write believable pieces of text, then they can write believable pieces of music.

A very naive part of me is hopeful that once we realize algorithms can write “better” songs than anyone else, and that algorithms can mix “better” than anyone else, we'll come to the conclusion that it's ultimately meaningless to try to be “good” at music.

When technical achievements are easily manufactured by computers, individuality will have a different meaning.

Instead of feeling defeated by computers that are better at our jobs, we might find ourselves liberated from the meaninglessly mechanical routine of creation for a commercial marketplace, be it of art or other products.

We might remember that it is our weird and personal biases and views and inherent incoherencies and personal fears and experiences that really make interesting works of art.

We might lift the veil that technical proficiency has been providing for too long. It will no longer be special to be talented, but it'll be special to be honest.

Hit-composing algorithms might “break” the music business, but then again, maybe that's the best thing that could happen to creativity.

But alas, it is not that simple. And I did provide a disclaimer – I am being naive. And the old question comes back: how will I make money if a robot is replacing me?


The algorithm

It's interesting how sensitive language is.

At the beginning of his essay, Gillespie refers to Raymond Williams' book, "Keywords". Williams explains how groups of people often find themselves talking about different things while using the same term.

Gillespie uses this reference as a way of pointing out that what is implied by the usage of the term 'algorithm' is dependent on the context and those involved in its usage.

I think this is interesting, because it points out how obfuscated conversations can become when discussing abstract concepts. Communication breaks down when there is no common ground of agreement, despite having the impression of one. It is confusing, and can even turn people hostile.

It's interesting how the meaning one derives from a word functions as a form of mirror – what someone understands by a word reveals a lot about them. But what is even more interesting to me, is how easy it is to shift from one usage to another without being consciously aware of it.

I wonder if this shift in understanding is simply an empathetic mechanism in order to acknowledge the fact that language is somewhat fluid.

Personally, when I think of the word 'algorithm', I'm thinking very technically. I'm thinking of lines of code, of instructions... things like that. I try to understand how a given algorithm works and I try to go from there.

But when I talk to my mother, the “algorithm” might turn into something more abstract. My descriptions are more vague, and probably verge on sounding like “it's the thing computers do”.

To what extent do we shift definitions of words to our convenience?

It seems harmless to simplify a concept when talking to my mother – not because she's not capable of understanding a technical term, because she is; it just seems like a lot of work for a brief point in a conversation.

However, what she understands by that word is not what the word meant at one point. Wouldn't it be unreasonable for me to expect her to understand the original meaning when we implicitly agreed its definition also included a simplified version of the concept?

Eventually, it will become a feedback loop.

A multi-leveled feedback loop with no point of agreement.

And we're somehow supposed to understand each other.


Writing from the perspective of an algorithm

MERGE SORT

I am given a list of numbers and my purpose is to organize it. It is expected that I will arrange numbers from smaller to larger, their size determined by their position in the number line. The number line is a convention agreed upon by the human race at the time of my existence.

I am fed the following list:

583712946

I take the list of numbers and divide them into their smallest possible grouping unit. In this case, the smallest possible grouping unit is a single digit. After dividing the list of numbers, I am left with the following individual units:

5 8 3 7 1 2 9 4 6

My main mechanism for sorting is called the merge. When I merge, I re-group numbers that have been separated. In this case I will turn single digits into small lists. However, I will not only merge, but I will also perform an evaluation to determine which number is lower in the number line, and I will place it first.

My new grouping looks as follows:

58 37 12 49 6

My main mechanism for sorting is called the merge. When I merge, I re-group numbers that have been separated. In this case I will turn small lists into larger lists. However, I will not only merge, but I will also perform an evaluation to determine which numbers are lower in the number line, and I will place them first.

My new grouping looks as follows:

3578 1249 6

My main mechanism for sorting is called the merge. When I merge, I re-group numbers that have been separated. In this case I will turn small lists into larger lists. However, I will not only merge, but I will also perform an evaluation to determine which numbers are lower in the number line, and I will place them first.

My new grouping looks as follows:

3578 12469

My main mechanism for sorting is called the merge. When I merge, I re-group numbers that have been separated. In this case I will turn small lists into larger lists. However, I will not only merge, but I will also perform an evaluation to determine which numbers are lower in the number line, and I will place them first.

My new grouping looks as follows:

123456789

I am done.
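For reference, the same procedure could be written out in ordinary code. Here is a minimal bottom-up merge sort sketch in Python (the names and structure are just illustrative, not taken from any particular implementation):

```python
def merge(left, right):
    """Merge two already-sorted lists into one sorted list."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    result.extend(left[i:])
    result.extend(right[j:])
    return result


def merge_sort(numbers):
    """Bottom-up merge sort: start from single-item groups and repeatedly merge pairs."""
    groups = [[n] for n in numbers]          # the smallest possible grouping unit
    while len(groups) > 1:
        merged = []
        for k in range(0, len(groups), 2):
            if k + 1 < len(groups):
                merged.append(merge(groups[k], groups[k + 1]))
            else:
                merged.append(groups[k])     # odd group out, carried forward
        groups = merged
    return groups[0] if groups else []


print(merge_sort([5, 8, 3, 7, 1, 2, 9, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```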

Choreographies with Software

For obvious reasons, when encountering the concept of choreography, I think of dance.

I think that the choice of this word is interesting because of the implications that arise when juxtaposing the art of dance with the usage of computers.

Interactions between computers and humans are often seen as rigid encounters, utilitarian at their core. To see these processes described with the implied elegance and beauty of dance is something that opens up a lot of opportunities in how we conceive these interactions in the first place.

This is important, because by changing how we conceive interactions, we can find new outcomes. We can also find new routes that lead to familiar outcomes, which in turn reveal new ways of dealing with problems of expression.

However, is this word choice simply a rhetorical mechanism employed by a writer? Is this more than someone overanalyzing something as straightforward as interacting with a computer (or any tool)?

I don't know, and honestly, I don't have an answer yet.

As interesting and tempting as it may be to think of the interaction between a computer and a human as a deep experience, can we really assert that it is without the risk of relying on anecdotal experiences and heavily opinionated perspectives?

What quantitative mechanisms can we use to determine if computers are inherently different tools from any others we've had before? Are quantitative methods appropriate for measuring this in the first place?

I don't think that anecdotal experiences and heavily opinionated perspectives are bad, negative, or unconstructive, by the way. I'm just wondering how much credibility they have when we consider how widespread computers are. Anecdotal experiences could just reflect the experiences of a select few, so we can't rely on them to understand a larger phenomenon.

I often fear that we think our technologies are exceptional when maybe they're not, not necessarily, and not entirely. I don't mean to downplay the large effect of modern technology – it certainly has had a huge effect on how modern life is lived. It is ubiquitous and pervasive. But sometimes I can't help but perceive a certain degree of technological self-involvement when reading and thinking about these issues – which is not to say that I want to disqualify the notion of discussing them in the first place, because I don't.


Choreographies are usually dictated by a choreographer – a composer that is removed from the real-time interaction between the performers.

I think that this is also the case between software and users. I am a musician, and I often use a piece of software called Ableton Live. The engineers for this program, people I've never met, have decided what features they think are important for a musician/sound-maker to have.

It seems to me that the last sentence of the previous paragraph generates a lot of feelings in the free software/open source software communities, especially considering the readings "A fish can't judge the water" and "Beyond Photoshop with Code". Something along the lines of "Why should Ableton (or whatever other company) get to decide what you think is important to do with your work!?"

I think that's a rather infantile and somewhat impertinent response to being presented with an array of tools. You have the choice of using them as you please. You have the choice to create what you want with them. And ultimately, you have the choice of not using them when you don't want to, or when they don't fit what you need for a project.

I often balance an array of audio tools depending on my needs. Sometimes Ableton Live is useless at things I want to do, so I use Logic Pro. Sometimes Logic Pro is useless at things I want to do, so I use Pro Tools. It's a balancing act. But what if none of these tools match what I want?

That's, to me, where the amazing world of programming comes in. I have no use for reinventing the wheel when other tools do an amazing job of what I need. But when my life and my necessities are so specific, I want to be able to develop the tools that can adjust to my specific setting and desire.

I don't really feel limited by other tools. I do feel that the tools have limits, but I don't feel limited by them. The limits of the tools are a place I'm artistically very interested in, and where I feel somewhat comfortable. There's something oddly comforting about being at the edge of where everything about a system might go totally wrong and break down.

What I find appealing about computational arts is not the "liberation" from "consumer tools" that "control" the way I think. What I find appealing is the language of computational thought – I believe that its aesthetic effect and flexibility are profound. I'm not looking to replace my old tools. I'm looking to express something I can't with them, so I'm looking in a new place.


I think that every single tool that can be used has an implied bias, and to a certain extent, you have to agree with its bias to use it.

A hammer has the pre-conceived notion that concentrating a lot of force in a small space with relatively little effort is a good thing, that it's preferable and ideal. A guitar has the pre-conceived notion that playing pitches in the ways that its frets and string arrangement allow is desirable. Computers have the pre-conceived notion that achieving certain results is desirable. However, with computers (if we can program them), we have the liberty to decide what those notions are.


Software as culture

I do think that, despite what I wrote in the previous section, there is a lot to be said about the cultural effects that certain technologies can have.

In music, an obvious example to me is the creation of the sequencer. The sequencer allows musicians to write musical phrases without having to play them. It's perfectly feasible to write musical ideas that are impossible to perform by humans, but are easily reproducible by a sequencer. A lot of styles of electronic music have been born out of this.

To what extent is the result of those pieces of music a collaboration not only between the composers involved in a track, but also with the instrument makers? Are instrument makers part of a musical composition?

I believe that to an extent they are. This relates to the notion of limits I was discussing previously. I wouldn't write the music I'm writing if I wasn't using a particular set of tools. The tools I'm using are definitely going to color the end result.

But is this bad? I don't think so. Not necessarily.

I like working in musical ensembles because I get to react to how other people think. No one thinks exactly like me, so working with other people forces me to encounter ways of thought that make me react.

I believe that by working with software, you're often led into a position where you're in a form of ensemble with other people. And different combinations of people end up giving very different results.

However, this raises the question: if a given company's product is being used by millions of people, are they directly responsible for all of the music their users create? Does this result in the homogenization of music?

I believe this is a very deep topic, one I can't really do justice to right now. It involves considerations on the nature of tools and culture that I need to do additional research on.


A brief note on Borges

I brought in two books from Latin American authors to class – “El Aleph”, by Jorge Luis Borges, and “Rayuela”, by Julio Cortázar. I ended up just presenting Borges, as it felt more relevant, but I feel Cortázar's work also has a degree of computational thought behind it.

To a degree, I chose these books because I have very nostalgic tendencies, and despite their authors being from a completely different country from mine, they remind me of my general cultural area. They're Latin American, so am I, so I feel a degree of connection to them.

However, going beyond my sentimentality, I think that Borges is a very interesting writer. I feel like he had the capacity to very elegantly encapsulate complicated notions and present them in an easily accessible way.

This is something I strive for as an artist, and I find it somewhat amazing that he was doing it without access to technology that would make these points and concepts self-evident. I wonder what that means for artists of this generation.

An interesting thing about Borges is that he has already inspired computational arts projects in the past – namely, the website https://libraryofbabel.info/, which is based on his short story of the same name.

On this website, users can look up strings of text and find the book in which they were randomly generated.

What if looping could be more interactive than constant overdubbing?

Don’t get me wrong – I think this way of looping is amazing, and looping pedals have constantly been part of my music creation. They allow the exploration of rhythm, harmony, and melody in a way that is very stimulating to me. But – what if loops were more dynamic than that? What if loops could somehow change, mutate, and be controlled in a new way?

This basic desire drove me to investigate new ways of approaching looping. As part of this investigation, I developed a series of systems in Max that would allow me to control loops in different ways – systems that hopefully could come in handy in other people's music creation as well.

Lately, I’ve been dusting some of those patches off and trying to find ways to improve on them. I wanted to feel the patches were in a stable state before I shared them, and I feel that some of them have reached that point.

As of today, I have made two of these loopers public – lento and pixel. I want to write about the composition of pixel, as I feel it's fairly interesting. It makes use of a lot of different technologies in Max 8, and it makes me happy to see how it ties them all together.

Pixel makes extensive use of MC, a feature that was introduced in Max 8. It is a fascinating addition to Max, and it allows multichannel patching using single patch cords. The ramifications of this simple idea are rather huge, and this short definition doesn't really do justice to the magnitude of this development. I hope that the examples I give as I examine the pixel looping system will demonstrate some of the applications and implications of this new technology.

The M4L device can be downloaded from my GitHub (https://github.com/agonzalezsosto/m4LLooping), so if you want to check it out, please feel free to do so.

Interface Overview

Main Interface

Pixel takes an incoming audio signal, splits it into 8 different spectral bands, and records each band onto a unique buffer. A buffer is a space in computer memory where we can store audio loops, and each buffer can have its own size, meaning that you can have a spectral band that is looping on a shorter buffer than the rest. This means that you can dynamically re-size the playback length of spectral loops, and create rhythmic pulsations between the different spectral components of a sound.

In general, one of my key priorities with interface design is for the user's experience to be as simple as possible, while still retaining the flexibility to make the device their own. It was quite a challenge to achieve this with Pixel, as there are a lot of parameters to control, but I believe that the final version strikes a good balance between minimal controls and required parameter accessibility.

Some controls affect each band separately, and others affect all of the bands jointly. Each band has an independent amplitude and length control. The length control can be thought of as a percentage – the dial above the fader controls the length percentage of the loop from its maximum length. If the dial is fully turned to the right, the loop will play to completion, whereas the opposite is true if the dial is turned in the other direction. The maximum length can be set from the “Loop Length” control.
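In rough code terms, that per-band length mapping amounts to something like this (a hypothetical Python sketch; the names and the 44.1 kHz sample rate are assumptions, not the device's internals):

```python
SAMPLE_RATE = 44100  # assumed sample rate

def band_loop_length(length_dial, max_loop_seconds):
    """Per-band playback length as a fraction of the maximum loop length.

    length_dial: 0.0-1.0, the dial above the band's fader.
    max_loop_seconds: the global "Loop Length" setting.
    """
    max_samples = int(max_loop_seconds * SAMPLE_RATE)
    return max(1, int(length_dial * max_samples))

# Dial fully to the right: the band loops over its whole buffer.
print(band_loop_length(1.0, 2.0))   # 88200 samples
# Dial turned down: the same band loops over a much shorter slice.
print(band_loop_length(0.25, 2.0))  # 22050 samples
```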

The rest of the controls affect all of the bands together. All bands have the same feedback, they’re all played in the same direction, they all have the same envelope, and they all play at the same rate. I believe that having some controls that affect all of the loops at once lets the loop retain some relation to itself. One of the goals is to achieve independence whilst maintaining some degree of unity, and having some controls affect all loops and some controls affect each loop separately is a way of achieving that goal.

Let's look inside the patch.

Patch Overview

Inside the patch

Pixel is a relatively simple-looking patch. This is, in large part, due to the usage of MC and the compartmentalization of code.

MC allowed me to simplify a lot of code I was repeating over and over again in my older versions of the system. Having this tool helps you think about the nature of your signal processing, and it made me ask myself, time and again: how can I optimize this signal path?

Of course, I also just wanted my patch to look simple, so I did as much as I could to achieve that. Let’s look at the patch section by section to see how I accomplished that goal – let’s start by going into the input subpatch.

Input Subpatch

Input Subpatch

The input subpatch takes the audio input from Ableton Live and does two things with it.

Firstly, it takes the stereo signal that's coming from the plugin~ object through two separate patch cables and packs it into a single MC signal. It then sends it somewhere else using the mc.send~ object. This signal will be used again near the output stage when mixing between the wet and dry signals. Using mc.pack~ and mc.send~ just simplifies the patch – by doing so, I don't have to use two separate send~ objects for each input channel.

Secondly, it takes the input signal and sends it into a pfft~ object called band-sep. band-sep has 8 outputs, which are then packed into an 8-channel MC signal. As the name suggests, band-sep essentially just separates the incoming audio into multiple bands.

Before we dive any deeper and explore how this works – let's think about what it means to separate a sound into different spectral bands.

Most signals that people work with in musical and sonic contexts have a relatively complex harmonic makeup. They are sounds that are built up from components in different spectral regions – from the low end, all the way to the high end. The spectral constitution of a sound gives it its own unique timbre. An instrument like a piano has harmonics that range from its fundamental pitch all the way to harmonics we might not even be aware are present. In a rather beautiful way, the unique characteristic timbre of a sound is sometimes determined by hidden sounds that we take for granted because they're always there and they're hard to separate from their "parent" sound. By separating spectral bands, we can isolate the fundamental frequency of a sound from its harmonics – essentially decomposing the characteristic timbre of our imaginary piano and listening to the parts separately.

In the case of Pixel we're not separating individual frequencies, but rather, ranges of frequencies. We're going to understand how these ranges are determined once we examine the mechanisms we're using, but as a general conceptual notion, it's useful to think about band-sep as a tool that separates our incoming signal into 8 different spectral regions – from the low end to the high end of our sound.
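The patch does this separation inside pfft~, but as a rough conceptual sketch, the same idea can be expressed offline with Python and numpy (this illustrates the principle only; the frame size and the evenly spaced band boundaries are assumptions, not a translation of band-sep):

```python
import numpy as np

def split_into_bands(signal, num_bands=8, fft_size=1024):
    """Split one FFT frame of a signal into num_bands spectral regions.

    Returns a list of time-domain signals, one per band, that sum back
    (approximately) to the original frame.
    """
    frame = signal[:fft_size]
    spectrum = np.fft.rfft(frame)                 # complex bins: real + imaginary parts
    num_bins = len(spectrum)
    edges = np.linspace(0, num_bins, num_bands + 1, dtype=int)  # evenly spaced bin ranges

    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(spectrum)
        masked[lo:hi] = spectrum[lo:hi]           # keep only the bins inside this range
        bands.append(np.fft.irfft(masked, n=fft_size))  # resynthesize this band alone
    return bands

# A test tone plus noise, decomposed into 8 bands from the low end to the high end.
t = np.arange(1024) / 44100.0
test = np.sin(2 * np.pi * 220 * t) + 0.1 * np.random.randn(1024)
bands = split_into_bands(test)
print(len(bands), np.allclose(sum(bands), test, atol=1e-6))  # 8 True
```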

Let’s see how this is achieved.

band-sep

band-sep

Opening band-sep doesn't seem to give much information about how it actually works. It features yet another abstraction, called spec-unit, which has 8 MC outputs. Each MC output has 2 channels, which are then unpacked and sent to an fftout~ object for re-synthesis.

While we might not be able to discern how it is done yet, we can tell that spec-unit is outputting the real and imaginary components of our band-separated FFT analysis signal on a 2-channel MC patch cable. We're using the mc.unpack~ object to separate these real and imaginary components and route them to their respective inputs in the fftout~ object for resynthesis. It's good to know this before we continue going into deeper layers – we now know what kind of output we're expecting from spec-unit.

Uh oh. Let’s go one layer deeper.

spec-unit

spec-unit

So we’re down in the belly of the spectral-separating beast.

This patcher is where all the band-separation is realized. Let’s go step by step to understand what’s really going on here.

a. Inputs

We have three inputs.

The three inputs correspond to the outputs from fftin~ 1 in our parent patch (see band-sep for reference). Inlet 1 is the real component of our FFT analysis, inlet 2 is the imaginary component, and inlet 3 is the bin index.

If you’re familiar with working within the pfft~ environment in Max, that should be simple enough, but if you’re not, then this might all be a bit confusing.

The pfft~ object carries out something called a Short-Time Fourier Transform. This is a process where a signal that is represented as changes in amplitude in the time domain is converted into a signal that is represented as changes in amplitude in the spectral domain. There are a few ways of accomplishing this conversion, and there's a lot to be said on this topic, but for the sake of scope, I'll simply mention the components that are relevant to this case.

When we convert a given signal vector from the time domain into the spectral domain, we're essentially calculating what sum of sinusoids and their associated phase values would correspond to that incoming signal. Those values are output as real and imaginary numbers, and each of them has a bin index position that corresponds with a particular frequency range. A different bin size will affect which frequencies fit in each bin, and the bin size is determined by the arguments to the pfft~ object.

We can think of the real and imaginary part of our signal as being our analysis data. They're a spectral representation of our signal. The bin index, on the other hand, is simply their position in the spectral range. If we had, for instance, 10 bins ranging from 0-9, the analysis information associated with bin 0 would contain the lowest harmonics and the analysis information associated with bin 9 would contain the highest harmonics of a given sound.
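As a small illustrative calculation (with assumed numbers, not the patch's actual settings), the relationship between a bin index and the frequency region it covers looks like this:

```python
SAMPLE_RATE = 44100   # assumed sample rate
FFT_SIZE = 1024       # assumed FFT size (in the patch, set by the arguments to pfft~)

def bin_center_frequency(bin_index):
    """Approximate center frequency of a given FFT bin index."""
    return bin_index * SAMPLE_RATE / FFT_SIZE

print(bin_center_frequency(0))    # 0.0 Hz - the lowest spectral region
print(bin_center_frequency(4))    # ~172 Hz
print(bin_center_frequency(511))  # ~22007 Hz - near the top of the audible range
```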

So, after our analysis data is input into our subpatcher, it’s repeated 8 times – 1 time per channel. The real and imaginary parts are then sent to a mc.*~ object for later multiplication, and the bin index is sent to a series of logical conditions. Let's see what that's all about.

b. Range Checking

range-check

As previously stated, we can think of our bin index as a value that tells us the spectral position of its associated analysis information. The logical condition that is set in spec-unit is checking if our current bin position is within a given range.

If the incoming value is within a specified range, then the logical operator objects (mc.>~ and mc.<~) will output a value of 1. When a value of 1 is output, it means that the analysis data will be allowed through, as the output from these logical operators is being multiplied against the analysis data. What this means is that only the analysis data within a specified range will be allowed to pass through. But how are we specifying those ranges?

In this case, we're using the mc.sig~ object as a way of specifying the value against which our bin index is compared. By using the signal probe function in Max 8, we can see what the values coming out of those mc.sig~ objects are:

bins-one bins-two

What this means is that channel 1 will check if our bin index lies between 0 and 4, channel 2 will check if our bin index lies between 4 and 8, and so on. We're using mc.sig~ as a way of determining different values for different channels of the same object. It's a simple way of reusing the same code we have already written, and it saves a lot of time. For reference, check out how I was doing the same kind of process in the first prototype of this idea:

bins-two

Using MC allowed me to fold this process many times onto itself, meaning I don't have to repeat myself as much as I did in the original prototype of this band separating tool (which also had other issues stemming from conceptual misunderstandings I had back then).
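Stripped of the MC machinery, the per-channel compare-and-multiply idea can be sketched in plain Python like this (the 0-4, 4-8, ... ranges follow the pattern described above; the names and the fixed band width are assumptions for illustration, not the patch's exact values):

```python
def band_mask(bin_index, channel, bins_per_band=4):
    """Return 1.0 if this bin falls inside this channel's range, else 0.0.

    Mirrors the mc.>~ / mc.<~ comparison: channel 0 passes bins 0-4,
    channel 1 passes bins 4-8, and so on.
    """
    low = channel * bins_per_band
    high = (channel + 1) * bins_per_band
    return 1.0 if (bin_index >= low and bin_index < high) else 0.0

def filter_frame(real, imag, channel):
    """Multiply the analysis data against the mask, as the patch does with mc.*~."""
    real_out = [r * band_mask(i, channel) for i, r in enumerate(real)]
    imag_out = [m * band_mask(i, channel) for i, m in enumerate(imag)]
    return real_out, imag_out

# Only bins 4-7 of this (made-up) analysis frame survive on channel 1.
real = [1.0] * 16
imag = [0.5] * 16
print(filter_frame(real, imag, channel=1)[0])
```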

c. The rest of this patch

So once the ranges have been checked, that means that each channel only has FFT analysis data that corresponds to the range that we want. By using mc.unpack~ 8, we're able to separate this multi-channel signal into its 8 different components. I did this so that I could then pack them into a 2-channel MC signal that I would then output from this patch for re-synthesis later on.

This patch also features some calculations which were performed using uzi – these calculations are written into a coll and then read from the coll to be assigned to the mc.sig~ objects.

So this in general covers the band separation part of this patch. Let's cover a new section – writing into the buffers.

Working with buffers

mc-gen

In the original prototype version of this patch, I had 8 separate buffers, a separate buffer per band. I wasn't a big fan of this idea, but I didn't see any alternative solution. However, upon working with MC, and with mc.gen~ in particular, I saw an easy alternative for writing into 8 different channels of a single buffer at once. Now, in terms of memory, I suppose it might be about the same to have a single 8-channel buffer as to have 8 single-channel buffers. However, in terms of patching, it's much simpler to have a single buffer to make reference to. This way, I only have to keep track of one buffer name, and there's less clutter in the patch.

We must instantiate the buffer operator to make reference to a buffer within gen~. Its first argument lets us refer to a buffer in the containing Max patcher. To record to the external buffer – in this case named "---loop" – buffer-related operators within gen~ simply make reference to the buffer operator inside gen~, which I also named "loop", but without the "---". It sounds a bit confusing, but it's essentially a bridging operator for buffers between Max and gen~.

Recording

We're using the poke operator to record into our buffer. We're using the “in 1” operator to determine the contents to be recorded. Because we're using MC, the contents of that input will be determined by the channel that a particular instance of that gen~ object is operating on. So, when it's working on channel 5, that in 1 corresponds to whatever signal is on channel 5.

The poke operator uses specific sample references in order to record a value onto a buffer. Think of a buffer as a table (like in a spreadsheet, for instance) – one column is the index and the other column is the sample value. We're using the counter operator – a sample-rate counter – to determine the position in the buffer where a particular incoming sample value will be written. The counter operator serves as an index position generator of sorts. We also use the mc_channel operator to assign the channel in our buffer onto which this particular signal should be written. The mc_channel operator is 1-indexed, whereas buffers are 0-indexed, which is why there's a subtraction by 1 in that connection. Poke also has a very convenient feedback input, which we're controlling with our "feedback" parameter.

The convenient thing about all of this is that by using the mc_channel operator, we can simultaneously address multiple channels of a buffer that we're recording into. I really like this, as it can help me condense my code substantially. I don't need 8 repetitions of code to be able to record onto 8 channels – I simply need a single instance of MC. Multichannel recording in Max 8 is incredibly simple.
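Conceptually, the recording side behaves something like this single-channel Python sketch of a poke-style write with feedback (a simplified model, not the gen~ code itself):

```python
class LoopBuffer:
    """A very small model of writing into a loop buffer with feedback."""

    def __init__(self, length_samples):
        self.data = [0.0] * length_samples
        self.write_index = 0          # the role the counter operator plays

    def record(self, input_sample, feedback=0.5):
        """Write the input, mixed with whatever was already at this position."""
        old = self.data[self.write_index]
        self.data[self.write_index] = input_sample + feedback * old
        self.write_index = (self.write_index + 1) % len(self.data)  # wrap around the loop

# One of these per band; in the patch it's one multichannel buffer
# addressed by channel number instead of eight separate objects.
loop = LoopBuffer(8)
for sample in [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]:
    loop.record(sample, feedback=0.5)
print(loop.data)  # the last two writes have folded back onto the start of the loop
```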

Reading

Reading is just as simple as recording, for many of the same reasons.

The poke operator has a related operator called peek, which also works with specific sample references for indexing. Instead of working with sample indexing, I chose to work with the wave operator, which takes a phase input from 0 to 1. I used this because when working with ranges from 0 to 1, we can use the rate operator to change the way in which this linear ramp behaves. We can think of the counter operator as generating a phasor between 0 and 44100. If we divide this signal by 44100, as is the case in this patch, we get a phasor between 0 and 1. This allows us to read from wave with the same counter we're writing with, but also to manipulate this phasor with rate, which lets us reverse, speed up, and slow down our signal. It also allows us to read from our envelope buffer, which has a different number of samples than our recording buffer.
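The reading side can be modeled the same way – a single-channel Python sketch of phase-based reading with a rate control (the linear interpolation and the names here are my own simplifications, not the internals of the wave operator):

```python
def read_at_phase(buffer_data, phase):
    """Read a buffer with a normalized 0-1 phase, using linear interpolation."""
    position = phase * (len(buffer_data) - 1)
    low = int(position)
    high = min(low + 1, len(buffer_data) - 1)
    frac = position - low
    return buffer_data[low] * (1.0 - frac) + buffer_data[high] * frac

def play(buffer_data, num_samples, rate=1.0):
    """Generate output by scaling the normalized write phasor with a rate value.

    rate=1.0 plays at normal speed, 0.5 at half speed, -1.0 in reverse.
    """
    out = []
    for n in range(num_samples):
        phase = (n * rate / num_samples) % 1.0   # wraps, so negative rates read backwards
        out.append(read_at_phase(buffer_data, phase))
    return out

loop = [0.0, 0.25, 0.5, 0.75, 1.0, 0.75, 0.5, 0.25]
print(play(loop, 8, rate=1.0))   # forward at normal speed
print(play(loop, 8, rate=-1.0))  # the same loop read in reverse
```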

Summary

This article was intended as a way of giving an overview of the different technologies within Max 8 that the pixel looper makes use of – specifically MC, pfft~, gen~, and mc.gen~. Hopefully this provided some insight into how flexible these technologies can be, and hopefully some of this code can prove useful as you develop your own variations on the ideas demonstrated here.