hammertoe

A coffee-loving software developer.

So... umm... yeah... I guess I might be addicted to Vietnamese coffee at the moment. First it was having it hot, then iced... now even colder, I've gone for making ice cream.

This post is part of my coffee series of posts, a full list of which is available here.

It has been a gorgeously sunny, hot week here in the UK. And so of course as soon as I make this, I wake up to rain the next morning. Sorry. But not sorry. I have yummy ice cream.

And now, so can you!... here we go.

You need:

  • Some Vietnamese coffee (or failing that, some instant espresso)
  • ½ can of condensed milk
  • 300ml of double cream
  • ½ a cinnamon stick
  • 3 cardamom pods
  • An electric whisk
  • A freezer
  • A container to put it in (about 500ml / 1 pint)

Time:

  • 10 mins to make
  • 2 hours to infuse the coffee
  • Overnight to freeze

To make:

  • Bash the cardamom pods with a spoon to break them open and put them into a mug along with the cinnamon stick.
  • If using Vietnamese coffee grounds, put a couple of tablespoons of grounds in a phin (Vietnamese coffee filter) and set it on top of the mug. Pour in some boiling water... it might take quite some time to filter through and you might need to keep topping it up. You want about 50–100ml (1–2 double espressos' worth) or so at the end. If you are using instant coffee, then put 2 tablespoons of that in the mug and pour over 50–100ml of boiling water.
  • Let stand for 2 hours for the coffee to absorb the cinnamon and cardamom.
  • Pour the double cream and condensed milk into a large bowl. Strain the coffee into the bowl with a sieve, discard the spices.
  • Beat the mixture with the electric whisk for 5 minutes so it gets thicker and a bit fluffy with air.
  • Pour / scoop it into a container and put in the freezer overnight.
  • Enjoy!

So, I have written about Vietnamese Coffee before. But despite being a big fan of it, I've never before had it iced. Well, today it is a nice hot (for the UK) 25-degree day. Not quite as warm as Hanoi, which is currently about 35 degrees. But still... hot enough for my mind to wander towards iced coffee.

This post is part of my coffee series of posts, a full list of which is available here.

I'm someone who generally likes their coffee black. I occasionally have a flat white or cappuccino, but never drink just 'regular' coffee with milk or sugar.

In the summer I occasionally have an iced coffee in a cafe, but never really considered making one at home. No idea why, it is simple enough, just never thought of it.

Vietnamese coffee is generally done in one of two ways, either hot (how I generally do it) or over ice. Given the temperature in Vietnam, the iced version has obvious appeal. So I thought I'd make the iced version today... and wow! It is amazing. Normally coffee goes very bitter when cold and needs sugar to balance it. But the condensed milk in this works really well.

I am also using Trung Nguyên Vietnamese coffee. My local Asian supermarket stocks it, and whilst I'm generally all for freshly roasted beans and grinding them myself, this pre-ground coffee works really well here.

I have a traditional small metal filter called a 'phin' that I bought on eBay for a few pounds.

Process:

  • Put the phin on top of a mug, and put a tablespoon of ground coffee in the top
  • Place the filter screen on top
  • Pour in a small amount of boiling water and let stand 20 seconds for the coffee grounds to bloom.
  • Pour in water to the top.
  • Put on the lid and let stand for a few minutes to trickle through.
  • Pour the coffee into a glass, add ice cubes and 2 tbsp of condensed milk
  • Stir well
  • Enjoy!

This is the second part of a series looking at IBM Cloud Functions. In this session we looked at using Docker images to provide a richer, custom environment for Cloud Functions. The previous session, an intro to IBM Cloud Functions, is available here.

A recording of the live coding session on Twitch is below:

https://www.cinnamon.video/watch?v=328051630442284961

Session recap

To start with you need an account on IBM Cloud. You can get a free account by signing up here.

You can either manage the Cloud functions through the IBM Cloud web portal, or you can use the command line interface (CLI). To install the CLI and Cloud Functions plugin, details are here: https://cloud.ibm.com/functions/learn/cli

In this case I wanted to be able to use the audio processing library librosa and the video processing library ffmpeg. Those libraries require some operating system level dependencies, and so I needed to create a custom Docker image to run them in. The image I created, librosa_ml, is on Docker hub.

How to create and invoke a Cloud Function with a custom Docker image:

https://gist.github.com/hammertoe/8c495efd9e475394c585868073549dc8
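For context, an action that runs on a custom Docker image is still just an ordinary Python main() function; the image only needs to supply the OS-level dependencies (such as ffmpeg) and the Python packages (such as librosa). Below is a small hypothetical sketch of that shape, not the actual session code; the url parameter and the duration calculation are purely my own illustration.

```python
import tempfile
import urllib.request

import librosa  # available inside the custom librosa_ml image


def main(params):
    """Entry point for an IBM Cloud Functions / OpenWhisk Python action.

    Expects a hypothetical 'url' parameter pointing at an audio file,
    downloads it, and returns the audio's duration in seconds.
    """
    url = params.get("url")
    if not url:
        return {"error": "no 'url' parameter supplied"}

    # Save to a temp file so librosa (via ffmpeg in the image) can decode it
    with tempfile.NamedTemporaryFile(suffix=".mp3") as tmp:
        urllib.request.urlretrieve(url, tmp.name)
        y, sr = librosa.load(tmp.name, sr=None)

    return {"duration_seconds": float(librosa.get_duration(y=y, sr=sr))}
```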

The code for the calculate_alignment.py file is based upon the code we looked at in a previous session to do with aligning audio files:

https://gist.github.com/hammertoe/afa404c36a65a5d048aa27e65ad7b5ce

As usual all the code and examples here can be found in my Github repository for the show:

https://github.com/IBMDeveloperUK/ML-For-Everyone

I hope you enjoyed the video, if you want to catch them live, I stream each week at 2pm UK time on the IBM Developer Twitch channel:

https://developer.ibm.com/livestream/

IBM Cloud Functions are a “serverless” technology based on Apache OpenWhisk and hosted by IBM. They allow you to write code and deploy it without having to worry about hosting or operating systems.

Below is the first part of a series of live coding sessions on Twitch I've done exploring IBM Cloud Functions and showing some simple examples.

https://www.cinnamon.video/watch?v=329436575563777372

Session recap

To start with you need an account on IBM Cloud. You can get a free account by signing up here.

You can either manage the Cloud functions through the IBM Cloud web portal, or you can use the command line interface (CLI). To install the CLI and Cloud Functions plugin, details are here: https://cloud.ibm.com/functions/learn/cli

There are four main constructs in Cloud Functions:

  • Actions
  • Sequences
  • Triggers
  • Rules

In this session, we mainly covered actions, which is where the code lives. IBM Cloud Functions is based on Apache OpenWhisk and offers a number of different runtimes: JS/Node.js, Swift, Java, Python, Ruby, Go, PHP, .NET, or any language via Docker.

Some simple examples of creating, updating, invoking and deleting an action in IBM Cloud Functions:

https://gist.github.com/hammertoe/4f3409dad274162c0b0180d140387d11
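If you just want a feel for the shape of an action before opening the gist: a Python action is a single file exposing a main() function that takes a dict of parameters and returns a dict, which becomes the JSON result of the invocation. The greeting example below is an illustrative stand-in rather than code from the session.

```python
# hello.py -- a minimal Python action for IBM Cloud Functions / OpenWhisk
def main(params):
    # Parameters arrive as a dict, whether set on the CLI or sent in a web request
    name = params.get("name", "world")
    # The returned dict becomes the JSON result of the invocation
    return {"greeting": f"Hello, {name}!"}
```

You would then create and invoke it with the ibmcloud fn CLI plugin mentioned above; the exact commands we used in the session are in the gist.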

As usual all the code and examples here can be found in my Github repository for the show:

https://github.com/IBMDeveloperUK/ML-For-Everyone

I hope you enjoyed the video, if you want to catch them live, I stream each week at 2pm UK time on the IBM Developer Twitch channel:

https://developer.ibm.com/livestream/

As part of the Call for Code, I am working on a project called Choirless with a couple of colleagues at IBM. It allows remote singers to sing together as a choir.

A recording of the livestream of this coding session is below:

https://www.cinnamon.video/watch?v=318119613646767615

The way Choirless works is that it allows each singer to sing their part separately and then the parts are all mixed together to create a performance. One of the main technical issues is dealing with the variable latencies in recording the audio for each part.

The image below shows two waveforms: the reference piece sung by the choir leader (blue) and the recorded piece sung by my daughter, Sarah (orange). As you can see, the orange waveform is slightly delayed behind the blue reference waveform.

Down below is the Python notebook I wrote as part of the livestream. If you scroll down to cell 7, you can hit the play button and hear how Sarah's recording is slightly delayed from the reference. This is nothing to do with her singing ability; rather, there appears to be a slight delay in the capture process, so even though she sings in time to the reference piece when she records, her audio is actually captured slightly behind.

The way we tackle this is by using a library called librosa to detect the 'onset strength' of the signal. We then find any peaks that are twice as high as normal and use those as reference points to align the two signals. In order to provide a level of fuzzy matching, we add an artificial build-up and decay to the peaks, which we can use to measure how good a fit we get when we line them up.

We can then shift Sarah's signal forwards and backwards an amount and measure the 'fit' at each point. The best fit will be where the error (MSE) between the two signals is the lowest:

In this case the best fit is with Sarah's audio shifted 25 frames forward which, given the frame (hop) length and the audio's sample rate, equates to her audio being about 300 milliseconds behind. So if we cut the first 300ms off Sarah's audio she will then be in time with the reference track.
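As a rough sketch of this shift-and-score idea (a simplified version that skips the peak emphasis and artificial build-up/decay described above, and not the actual Choirless code in the notebook below; the function and parameter names are my own):

```python
import numpy as np
import librosa


def best_offset(reference_path, recording_path, max_shift_frames=200):
    """Estimate how many onset-strength frames the recording lags the reference."""
    ref, sr = librosa.load(reference_path, sr=None)
    rec, _ = librosa.load(recording_path, sr=sr)

    hop = 512  # librosa's default hop length
    ref_env = librosa.onset.onset_strength(y=ref, sr=sr, hop_length=hop)
    rec_env = librosa.onset.onset_strength(y=rec, sr=sr, hop_length=hop)

    # Assumes both clips are comfortably longer than the search window
    n = min(len(ref_env), len(rec_env)) - max_shift_frames
    errors = []
    for shift in range(max_shift_frames):
        # Pull the recording back by `shift` frames and measure the fit (MSE)
        mse = np.mean((ref_env[:n] - rec_env[shift:shift + n]) ** 2)
        errors.append(mse)

    best = int(np.argmin(errors))   # best-fitting offset, in frames
    return best, best * hop / sr    # and the same offset in seconds
```

Trimming that many seconds off the start of the recording then brings it back in line with the reference.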

You can scroll down to cell 13 in the notebook below and hear what it sounds like when the two audio tracks have been synchronised.

Here is a video of some of the output from Choirless showing 4 singers, and a total of 9 audio tracks that have been synchronised together using this algorithm.

As always, if you want to have a go at any of this yourself, you can sign up for a free IBM Cloud account, create your own Watson Studio instance and have a play.

Still catching up here, as I'm a week behind. I said to myself that I would make sure I keep up with the weeknotes. At first I was worried that I wouldn't have enough to write about. Now I'm having the opposite problem... too much! So this will be a very quick rundown of last week, as I've already got a lot more to put in this week's.

Meetups

This week was the PyData UK May meetup. PyData is a regional event, and I have attended and spoken at PyData Bristol previously. This month, with COVID-19, saw the various UK regions (PyData Bristol, PyData Manchester, PyData Edinburgh, etc.) all join together to do a single PyData UK meetup.

It worked really well. Probably one of the “best produced” online meetups I've attended recently. It was streamed live on YouTube, I'm guessing from something like OBS, with each speaker remote and brought in via video conference. The management of it was very slick, and the handover between speakers very smooth.

Speakers:

  • John Sandell – 5 Python modules. Some great little tidbits in here, even for an experienced Python developer. Black looks pretty cool, and interesting to see it used directly in a Jupyter notebook. Something I might try on my Twitch stream.

  • Paige Bailey – TensorFlow Probability. A great talk... much of it over my head, but still very interesting to learn about TensorFlow Probability. Again, something I should take a look at on a future live stream.

  • Ian Ozsvald – Making Pandas Fly. Some good tips about being a bit more efficient with large datasets in Pandas. Mostly about choosing more appropriate data types for your columns to reduce memory usage.

Mentoring

I had a great opportunity to mentor at the Beginner's Data Workshop for Minorities as part of PyCon 2020's online conference this year. It was very well attended, with around 16 'tables', each one containing a mentor and 5-6 mentees.

Although it took quite a bit of time at the start to get everyone organised into their respective 'tables', the end result worked very well. We had a separate Zoom 'room' each, and a channel in Slack to communicate. We used a collaborative Jupyter notebook system called CoCalc. The interface is quite cluttered and clunky, but it allows the mentor to jump into a student's notebook and do Google-Docs-like collaborative editing.

Last week I did a session on what an “Autoencoder” is and how to build one.

This week I took the process further and looked at one particular use for an autoencoder – to find similar images. The video of the complete coding session is on Cinnamon:

https://www.cinnamon.video/watch?v=314194811739441102

As I mentioned before, an autoencoder takes some kind of input data and compresses it down to a smaller representation of itself. In the process the neural network learns which elements of the data are “important” in terms of containing the information needed to recreate or describe the image.

As an example, imagine we played a game where you had a card with a photo on it in front of you, and you could use a maximum of only 3 words/phrases to describe it, such that I could try and draw what is on your card. You might say “Male; 5'9"; dark hair”. Or “Car; red; coupe”.

That is what an autoencoder is trying to do. It is trying to learn the most important bits of information that allow it to recreate or describe the input data.

So once we have that descriptive information we can then search for other things that have similar descriptions. So, for example, I'm browsing a shopping catalogue online and I am looking at a shirt, and I want to “find all other items that look like this”. The autoencoder could have broken the image down to: “Has arms, has collar; is red” and then it could look for all other items that match that description and show them to me.

In the case of the autoencoder we built in the live stream, we are compressing the image down to just 8 x 4-byte floats (32 bytes). So the original 28 x 28px (784 bytes) image is being compressed down to a representation roughly 25 times smaller.

If we then take that 8-float encoding and treat it as a vector in an 8-dimensional space, we can use Cosine Similarity as a measure of how similar one encoding is to another.
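As a small illustration of that last step (assuming the encodings are already stacked in a NumPy array with one 8-float row per image; the function names here are mine, not from the notebook):

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two encoding vectors (1.0 = same direction)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))


def most_similar(codes, index, top_k=5):
    """Indices of the images whose encodings are closest to codes[index]."""
    query = codes[index]
    sims = np.array([cosine_similarity(query, c) for c in codes])
    sims[index] = -np.inf  # don't return the query image itself
    return np.argsort(sims)[::-1][:top_k]
```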

The result worked surprisingly well! For each row, the image on the left is the original image we start with, and the other images in the row are the ones whose encodings are “most similar” to the original.

You can see the complete notebook below, and you can sign up for IBM Cloud for a free account to run this yourself here.

Chaos

So this week started with chaos. At the end of last week I was talking to someone else at IBM and they said, “Hey, your email address has that '1' at the end of it, you know you could change it to something better”.

“Better?” I thought. Hrmm... yeah, my email address is a bit ugly, and as a developer advocate, it would be nice to have one a bit more memorable than that dowdy automatic IBM one. What could I use instead? How about “hammertoe”? It is a phonetic corruption of my surname “Hamilton” that came about due to a Unix system upgrade 25 years ago. Strangely, it never seems to be taken on any platform 🤣. I'm hammertoe on Coil, Twitter, Skype, Gmail... pretty much every other site.

So I went to the form and filled it in to set up an email alias for hammertoe@ibm.com.

The next day... I couldn't log in to anything. “Uh-oh”, I thought.

So yeah... turns out something went wrong... still not sure what, whether it was my mistake or something went wrong in a process somewhere, but the end result was that my primary ID changed from matthew.hamilton1@ibm.com to hammertoe@ibm.com. Anyone who has worked in a large enterprise or in IT will be able to appreciate the sheer chaos this caused. I use about a dozen different systems on a daily basis that all know me by the original ID. Suddenly everything started to vanish. I could no longer log in... Box suddenly had none of my files. Various internal compliance systems were sounding alerts because wtf is this “hammertoe” account that just suddenly appeared out of nowhere?

So that was why the last weeknote was so late, as I was busy trying to sort all that mess out. I'm nearly back to where I should be... however I can't access my email through the more modern web-based email we have, and have to use IBM Notes. I first encountered Lotus Notes (as it was then) around 1995, and it was pretty revolutionary then. This was in the days of Windows 3.11 and 9600 baud modems. Its killer feature was the way it could synchronise offline databases for “road warriors”.

Catching up with friends

I caught up with two friends I'd not chatted to in quite some time. One, Calvin, I know through the Plone community. He and his family live in Indiana, US, and by coincidence he has been linking up with the IBM Developer Advocates local to them. They have been running a local Python conference, IndyPy, for over a decade now. I'd generally see Calvin at least once a year at the annual Plone Conference or one of the Python conferences. It reminds me that the last EuroPython conference I went to was in Birmingham, UK in 2009, where I met the guy who would later hire me and be my boss at IBM a decade later.

I actually spoke at that conference, giving a talk called “Lipstick on a Pig” about dynamically re-skinning a .NET portal site using Python... a bit of a blast from the past:

With all the online conferences happening, I will try to “attend” some of the Python conferences this year, and hopefully speak at some of them.

The other friend I caught up with was a former co-director of mine at my old company, Netsight. We were talking about how to get the best out of social media, in terms of whether to mix personal and professional topics.

I have been pretty open and free-flowing on Twitter. I talk a lot about my political views on there. I know I likely alienate some people with them, but it is my personal account and I say what I feel on there. Did I have concerns mixing my personal tweets with my work ones? Yes, a bit. Especially when I became a Developer Advocate, and I now have my boss, boss's boss, and boss's boss's boss following me on Twitter. But I stick to my honesty on there. I do like vigorous debates, and often find myself in arguments with people over political views or about cryptocurrencies. But at the same time I have learned a lot from others, and I hopefully have imparted knowledge to others too. That is the ultimate goal.

Anyway, this weeknote is already late, so I'll finish it here. I have lots to cover in the next one ;)

This is a write-up of the Twitch stream I did last week on the IBM Developer Twitch channel on autoencoders. You can watch the full video of the session below:

https://www.cinnamon.video/watch?v=309145882496861783

An autoencoder is a system of neural networks that can 'compress' data down to a more compact representation.

This is done by passing the input data through a series of layers that output less and less information, then expanding it back out again. In doing so, the layers in the network have to learn the 'important' parts of the data.

Below you can see some input images from a dataset of clothing images, and the output from the autoencoder:

In this case the input image is 28px x 28px = 784 bytes and the intermediate coding is 4 x 4 x 4 bytes = 64 bytes. So we have managed to compress our images by a factor of 12x and still retain a substantial amount of information about the images.
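For anyone curious what a network of that shape looks like, here is a minimal convolutional autoencoder in Keras that squeezes a 28 x 28 greyscale image down to a 4 x 4 x 4 encoding. It follows the standard Keras tutorial pattern rather than being the exact model from the notebook below.

```python
from tensorflow.keras import layers, models

# Encoder: 28x28x1 image squeezed down to a 4x4x4 (64-value) code
inputs = layers.Input(shape=(28, 28, 1))
x = layers.Conv2D(16, (3, 3), activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D((2, 2), padding="same")(x)              # 14x14
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = layers.MaxPooling2D((2, 2), padding="same")(x)              # 7x7
x = layers.Conv2D(4, (3, 3), activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D((2, 2), padding="same")(x)        # 4x4x4 bottleneck

# Decoder: expand the code back out to a 28x28x1 image
x = layers.Conv2D(4, (3, 3), activation="relu", padding="same")(encoded)
x = layers.UpSampling2D((2, 2))(x)                              # 8x8
x = layers.Conv2D(8, (3, 3), activation="relu", padding="same")(x)
x = layers.UpSampling2D((2, 2))(x)                              # 16x16
x = layers.Conv2D(16, (3, 3), activation="relu")(x)             # 14x14 (no padding)
x = layers.UpSampling2D((2, 2))(x)                              # 28x28
decoded = layers.Conv2D(1, (3, 3), activation="sigmoid", padding="same")(x)

autoencoder = models.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="binary_crossentropy")
# autoencoder.fit(x_train, x_train, epochs=10, validation_data=(x_test, x_test))
```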

What uses are there for an autoencoder? Well, you could use the intermediate code and store that if you just needed a summary representation of the image. You could also use it for matching images in a search system, e.g. all pairs of trousers should have similar intermediate codes.

The complete notebook from this stream is below. If you want to have a go yourself, click here to sign up for a free account on IBM Cloud and try it out in Watson Studio.

[slightly delayed for reasons you'll see in the next post]

Understanding AI Conference

Last week we ran a conference called “Understanding AI”. This was the first “virtual conference” our team had organised completely ourselves. And it went fantastically. In the end we had over 400 unique viewers of the conference and a peak of over 220 simultaneous viewers.

The conference was delivered on the “BigMarker” platform, which overall worked extremely well. We actually had a hidden producer (as in a human one) behind the scenes from a professional videography company who was sort of 'stage managing' the whole thing and cutting between people and arranging the videos and slides to all come up at the right time. This allowed the speakers to focus on their talks themselves.

We actually had loads of questions by the end, but ran out of time to go through all of them. Next time we should schedule more time for Q&A as there were far more than we expected.

If you want to watch the event, there is a playback of it here:

https://www.bigmarker.com/ibm-developer-uki/Understanding-AI

We had a good mix of speakers and topics, ranging from ethics and bias to applications. There was a talk from one of the US Developer Advocates, Maureen, on fairness and bias in AI, which is a very big problem that we as developers need to be aware of when developing AI solutions:

There was also a showcase from one of my colleagues, Liam, on his Call for Code project from last year. The project, Project Reunite, uses machine learning to help people find lost family members and loved ones after natural disasters. Typically people have to manually trawl through books or websites of photos. If there are 50,000 photos then on average you will need to look through 25,000 of them to find the person you are looking for. With Project Reunite's algorithm the user would need to search through, on average, just 40 photos to find a match.

I have to say, I slept better than I have done for ages that night. It wasn't so much the relief of the conference being wrapped up (I was pretty confident in that), but more a feeling of finding a bit more of a “purpose” right now with things all being a bit haywire in the world. This conference was a project that was born in lockdown and completely executed in our new way of working. I can see us doing more of these in the future.

One thing we are finding now is that with the breakdown of physical barriers for conferences, it is easier to pull in diverse and remote speakers, since everyone is remote. So I'm hoping this brings about more collaboration between the various developer advocacy teams at IBM across the globe.

Twitch Stream

I did another Twitch stream this week on creating an “Autoencoder” in Python. I'll write it up fully in another post like I have done with previous Twitch streams. Annoyingly, I still had some video lag in this stream. Afterwards I quit Firefox and restarted it and the lag went away, so for future reference that is what I need to try first. My hunch is that some other process had had the webcam open and put it in a strange state, causing it to lag.

I'm getting pretty used to doing the Twitch streams, and hopefully should be doing one each week from now on. I'm slowly working through the technical issues. Now that I have the lighting rig and backgrounds, it certainly has a more professional look to it.