TILs and Life

Yuri Oliveira

So, we played a little bit with the ETS and DETS match/2, match_object/2 and select/2 functions a while ago. We also mentioned that Erlang's Efficiency Guide suggests using select/2 instead of match/2 and match_object/2. Let's see when that actually makes sense and gather some data with Benchee.

If you don't know it yet, Benchee is a nice tool for (micro) benchmarking in Elixir. It's simple to use, so we'll create a basic mix project for it:

mix new benchee_ets

And we'll add some dependencies to help us play with it:

# in mix.exs
defp deps do
  [
    {:benchee, "~> 1.0"}, # benchee itself
    {:benchee_html, "~> 1.0"}, # to generate nice visual reports
    {:faker, "~> 0.17"} # to fake ETS data
  ]
end

Then we install those dependencies:

mix deps.get

And we create our ets_test.exs script file, based on our address book example with a small dataset (100k contacts because we're popular, right?):

# setup ETS
table = :ets.new(:address_book, [:set, :protected, :named_table])

# Setup ETS data
phones = Enum.map(1..100_000, fn _n -> Faker.Phone.PtBr.phone() end)

Enum.each(phones, fn phone ->
  :ets.insert(
    table,
    {phone,
     %{
       name: Faker.Person.first_name(),
       address: Faker.Address.street_address(),
       country_code: Enum.random([55, 34]),
       number: Enum.random(1..909)
     }}
  )
end)

So far we've set our project up and inserted data into our ETS table. Let's run a test where we want to get all contacts with number == 101:

# scenario 1: without keys
Benchee.run(
  %{
    "match_object" => fn -> :ets.match_object(table, {:_, %{number: 101}}) end,
    "select" => fn ->
      :ets.select(table, [{{:_, %{number: :"$1"}}, [{:==, :"$1", 101}], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/1_without_keys.html"},
    Benchee.Formatters.Console
  ]
)

We created two functions, "match_object" and "select", that Benchee will compare against each other. To run the benchmark, we just go with

mix run ets_test.exs

and wait for the output.

...

WAT?!

mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2

Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s

Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/1_without_keys.html
Generated benchmarks/output/1_without_keys_comparison.html
Generated benchmarks/output/1_without_keys_match_object.html
Generated benchmarks/output/1_without_keys_select.html
Opened report using xdg-open

Name                   ips        average  deviation         median         99th %
match_object         72.44       13.80 ms    ±11.02%       13.31 ms       19.17 ms
select               55.54       18.00 ms    ±12.92%       18.89 ms       23.20 ms

Comparison: 
match_object         72.44
select               55.54 - 1.30x slower +4.20 ms

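Why does match_object/2 execute faster than select/2?! Well, for a fixed pattern without any guards (or even a key), match_object/2 most likely has less work to do: it only compares each object against the pattern, while select/2 has to bind :"$1" and then evaluate a guard for every object. And in this run match_object/2 wasn't only faster on average, it also had a slightly lower deviation and 99th percentile.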
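Before trusting the numbers, it's worth checking that both scenario 1 calls really return the same objects, so we're comparing like with like. Here's a minimal sketch on a tiny, hand-built table (the phones and names are made up, not the Faker data):

```elixir
# Tiny hypothetical dataset instead of the 100k Faker contacts.
table = :ets.new(:address_book_sketch, [:set, :protected])

:ets.insert(table, [
  {"55-111", %{name: "Ana", number: 101}},
  {"55-222", %{name: "Bia", number: 202}},
  {"34-333", %{name: "Caio", number: 101}}
])

# Scenario 1 as match_object/2: the literal 101 sits inside the pattern.
via_match_object = :ets.match_object(table, {:_, %{number: 101}})

# Scenario 1 as select/2: bind the number to :"$1", then compare it in a guard.
via_select = :ets.select(table, [{{:_, %{number: :"$1"}}, [{:==, :"$1", 101}], [:"$_"]}])

# A set table guarantees no order, so sort before comparing.
IO.inspect(Enum.sort(via_match_object) == Enum.sort(via_select))
```

Both calls should return Ana's and Caio's contacts; only the cost of getting there differs.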

Let's try another scenario where we know the keys we're looking for:

# scenario 2: with keys
random_phone = Enum.random(phones)

Benchee.run(
  %{
    "match_object" => fn -> :ets.match_object(table, {random_phone, :_}) end,
    "select" => fn ->
      :ets.select(table, [{{random_phone, :_}, [], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/2_with_keys.html"},
    Benchee.Formatters.Console
  ]
)

And wait for the results:

mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2

Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s

Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/2_with_keys.html
Generated benchmarks/output/2_with_keys_comparison.html
Generated benchmarks/output/2_with_keys_match_object.html
Generated benchmarks/output/2_with_keys_select.html
Opened report using xdg-open

Name                   ips        average  deviation         median         99th %
match_object        1.33 M      751.22 ns  ±2795.49%         623 ns        1556 ns
select              1.27 M      786.07 ns  ±4908.25%         610 ns        1469 ns

Comparison: 
match_object        1.33 M
select              1.27 M - 1.05x slower +34.85 ns

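That was close! select/2 was just a little bit slower than match_object/2 on average, and given the huge deviations (select/2 even had a slightly lower median), the difference is within the noise. Since the key is bound in both patterns, neither function has to scan the whole table: ETS can go straight to that key.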
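When the key is known, :ets.lookup/2 gives the same answer as both calls above. A minimal sketch on a made-up two-contact table (not the benchmark data) showing that the three calls agree:

```elixir
# Hypothetical two-contact table.
table = :ets.new(:address_book_sketch, [:set, :protected])
:ets.insert(table, [{"55-111", %{name: "Ana", number: 101}}, {"55-222", %{name: "Bia", number: 202}}])

phone = "55-111"

# All three calls use the bound key, so ETS doesn't need to scan the table.
by_lookup = :ets.lookup(table, phone)
by_match_object = :ets.match_object(table, {phone, :_})
by_select = :ets.select(table, [{{phone, :_}, [], [:"$_"]}])

IO.inspect({by_lookup == by_match_object, by_match_object == by_select})
# prints {true, true}
```

We haven't benchmarked :ets.lookup/2 here, so treat it as another candidate to measure rather than a proven winner.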

Now let's make things complicated for match_object/2 — otherwise why would we need a select/2? We'd just go with the simpler match, right?

Imagine we want the contacts from our address book that have a number >= 101. That's when the select function starts to shine, because the match functions only support simple patterns (match_pattern()), not a match_spec() like the select functions.

This scenario forces us to fetch every object that has a number with match_object/2 and do the filtering in the application layer, while select/2 does everything inside ETS:

# scenario 3: without keys but with some more complex condition
Benchee.run(
  %{
    "match_object" => fn ->
      table
      |> :ets.match_object({:_, %{number: :"$1"}})
      |> Enum.filter(fn {_key, value} -> value.number >= 101 end)
    end,
    "select" => fn ->
      :ets.select(table, [{{:_, %{number: :"$1"}}, [{:>=, :"$1", 101}], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/3_no_keys_but_complex.html"},
    Benchee.Formatters.Console
  ]
)

... and

mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2

Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s

Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/3_no_keys_but_complex.html
Generated benchmarks/output/3_no_keys_but_complex_comparison.html
Generated benchmarks/output/3_no_keys_but_complex_match_object.html
Generated benchmarks/output/3_no_keys_but_complex_select.html
Opened report using xdg-open

Name                   ips        average  deviation         median         99th %
select               22.39       44.67 ms    ±29.85%       40.37 ms      134.88 ms
match_object         17.32       57.73 ms    ±31.74%       55.06 ms      178.56 ms

Comparison: 
select               22.39
match_object         17.32 - 1.29x slower +13.07 ms

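That's the beauty of the :ets.select/2 function! One elegant call and we benefit from the match_spec grammar. Part of the gain likely comes from select/2 copying only the matching objects out of the table, while match_object/2 copies every object that has a number key into our process before Enum.filter/2 discards the ones below 101.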
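To double-check that scenario 3 compares equivalent work, here's a small sketch with hypothetical contacts (one of them has no number key at all) showing that the match_object/2 plus Enum.filter/2 pipeline and the select/2 call return the same objects:

```elixir
# Hypothetical contacts, not the benchmark data.
table = :ets.new(:address_book_sketch, [:set, :protected])

:ets.insert(table, [
  {"55-111", %{name: "Ana", number: 101}},
  {"55-222", %{name: "Bia", number: 99}},
  {"34-333", %{name: "Caio", number: 500}},
  {"34-444", %{name: "Duda"}}
])

# match_object/2 copies every object with a :number key into our process,
# then Enum.filter/2 throws away the ones below 101.
via_match_object =
  table
  |> :ets.match_object({:_, %{number: :"$1"}})
  |> Enum.filter(fn {_key, value} -> value.number >= 101 end)

# select/2 evaluates the guard inside ETS and only copies the matches out.
via_select = :ets.select(table, [{{:_, %{number: :"$1"}}, [{:>=, :"$1", 101}], [:"$_"]}])

IO.inspect(Enum.sort(via_match_object) == Enum.sort(via_select))
```

Duda is skipped by both approaches because her map has no :number key, so the pattern never matches.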

So, wrapping up:

  • match_object/2 is not meaningfully slower than select/2 for simple matches (it was even faster in our first scenario), at least with a dataset of this size
  • select/2 is more powerful (and faster) when dealing with complex matches due to the match_spec nature

If you want to play more with this case, the source code is available at https://github.com/yuriploc/benchee_ets.

#elixir #otp

Imagine you manage a fleet of Raspberry Pis deployed in shipping containers (the real ones, not the docker ones) that control all the environment variables of your hydroponic lettuce farm spread across the world.

Now imagine how noisy those Pi logs can be.

Let's say one of your Pis is not properly authorizing workers inside the container. Every time a worker types a password and passes the biometric check, the Pi should emit and log an event. But it also logs sensor data and the timesheets of workers on that site.

Because of that, logging into the Pi and watching it spit out logs is not very human friendly. That's where Erlang :logger filters can come in handy.

Erlang introduced the new :logger in OTP 21, and Elixir added Logger.put_module_level/2 in version 1.11.0.

Considering we're using the Logger macros to generate those, like Logger.info/2 or any other level we have as of today,

[:emergency, :alert, :critical, :error, :warning, :notice, :info, :debug]

We have the guarantee that some metadata is present, like the module that generated the log (it's in the :mfa metadata). We can also leverage the fact that Elixir modules are atoms to dynamically choose which modules' logs to silence.

So, what we'll do is create a module to filter the noisy modules:

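defmodule HydroLettyce.LoggerHandler do
  @moduledoc """
  Let's filter some logs!
  """
  @noisy_modules [HydroLettyce.Sensors.Temperature, HydroLettyce.Auth.Biometry]

  # ... format or other functions you want to

  # Primary filter: drop events from noisy modules while the
  # HydroLettyce.NoisyLogs "flag" has its module level set to :none.
  @spec filter(map(), keyword()) :: :stop | :ignore | map()
  def filter(%{meta: %{mfa: {module, _, _}}} = _log_event, _opts)
      when module in @noisy_modules do
    case Logger.get_module_level(HydroLettyce.NoisyLogs) do
      [{HydroLettyce.NoisyLogs, :none}] -> :stop
      _ -> :ignore
    end
  end

  # Every other event (other modules, or no :mfa metadata) is left alone.
  # Without this clause the filter would crash and :logger would remove it.
  def filter(_log_event, _opts), do: :ignore
end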

and we should also tell our Application.start/2 that our logger has a new global filter:

# HydroLettyce.Application module, application.ex file
@impl true
@spec start(atom(), keyword()) :: {:ok, pid()} | {:error, any()}
def start(_type, _args) do
  # ...
  :logger.add_primary_filter(:noise_filter, {&HydroLettyce.LoggerHandler.filter/2, []})
  # ...
end

With that in place, we can just open IEx (or make an RPC call) and run:

iex> Logger.put_module_level(HydroLettyce.NoisyLogs, :none)
:ok

What just happened here?

When we call Logger.put_module_level(HydroLettyce.NoisyLogs, :none), we ask :logger to record in its own ETS table that the atom :'Elixir.HydroLettyce.NoisyLogs' has the :none level set.

Then, since the application started with HydroLettyce.LoggerHandler.filter/2 registered as a primary filter, every log event coming from a noisy module triggers the Logger.get_module_level(HydroLettyce.NoisyLogs) call. It returns the :none level, so our filter returns :stop and the event is dropped before it reaches any handler. For everything else our filter returns :ignore, which means it takes no decision and the event moves on to the remaining filters and handlers. The add_primary_filter/2 function doesn't replace any existing primary filter; it adds ours to the list of global filters.
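Since the filter is just a function, we can poke at it from a plain script, no Pi needed. This sketch repeats the module from above (with a catch-all clause so events from other modules don't crash it), feeds it hand-built log events (the modules are hypothetical; the maps are shaped like :logger's log_event()), and toggles the flag with Logger.put_module_level/2 and Logger.delete_module_level/1:

```elixir
# Hypothetical modules from the post; only their names (atoms) matter here.
defmodule HydroLettyce.LoggerHandler do
  @noisy_modules [HydroLettyce.Sensors.Temperature, HydroLettyce.Auth.Biometry]

  def filter(%{meta: %{mfa: {module, _, _}}}, _opts) when module in @noisy_modules do
    case Logger.get_module_level(HydroLettyce.NoisyLogs) do
      [{HydroLettyce.NoisyLogs, :none}] -> :stop
      _ -> :ignore
    end
  end

  # Any other event is left alone.
  def filter(_log_event, _opts), do: :ignore
end

# Hand-built events, not produced by a real Logger call.
noisy = %{level: :info, msg: {:string, "23.5C"}, meta: %{mfa: {HydroLettyce.Sensors.Temperature, :read, 0}}}
other = %{level: :info, msg: {:string, "door open"}, meta: %{mfa: {HydroLettyce.Doors, :open, 1}}}

# Flag not set yet: the noisy event passes through.
before = HydroLettyce.LoggerHandler.filter(noisy, [])

# Set the flag: noisy events are stopped, others are untouched.
:ok = Logger.put_module_level(HydroLettyce.NoisyLogs, :none)
silenced = HydroLettyce.LoggerHandler.filter(noisy, [])
untouched = HydroLettyce.LoggerHandler.filter(other, [])

# Remove the flag: noisy events pass through again.
:ok = Logger.delete_module_level(HydroLettyce.NoisyLogs)
restored = HydroLettyce.LoggerHandler.filter(noisy, [])

IO.inspect({before, silenced, untouched, restored})
# prints {:ignore, :stop, :ignore, :ignore}
```

Logger.delete_module_level/1 is also how we'd turn the noisy logs back on from IEx.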

One important (and usually useful) tip: log filters run in the same process as the code that emits the log and, because of that, have no backpressure strategy. Keep the filter function cheap (avoid network calls, for example), since it runs on every log call and can easily become a bottleneck for your application overall.

For reference and more information for deep dives, see:

#elixir #otp

ETS / DETS match patterns and specifications

You may have used ETS or DETS in the past. They share an almost identical API and today I'm reminding myself (and you, maybe) that we need to carefully read the docs.

I'll use ETS for the snippets, but everything here applies to DETS the same way. Also, the set, bag and duplicate_bag table types behave the same.

So, imagine we have the following table:

iex> table = :ets.new(:address_book, [:set, :protected, :named_table])
:address_book

iex> :ets.insert(table, {"55-123456789", %{name: "Yuri", address: "Newbie Street", number: 101, country_code: 55}})
true

iex> :ets.insert(table, {"55-987654321", %{name: "Meru", address: "Mountain Ave", number: 901, country_code: 55}})
true

iex> :ets.insert(table, {"55-223456789", %{name: "Satya", address: "Newbie Street", country_code: 55}})
true

We want to search for objects whose number key is equal to 901, whatever their table key is.

match/2

:ets.match/2 requires a match_pattern() as a second argument. That is, an atom() or a tuple(). Let's try that:

iex> :ets.match table, {:_, %{number: 901}}
[[]]

WAT?

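Well, match/2 only returns the values bound to pattern variables (:"$1", :"$2" and so on, in Erlang terms). Our pattern did match Meru's object, but since it binds nothing, we get back a single empty list of bindings: [[]]. We need at least one pattern variable to get something useful out of it.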

iex> :ets.match table, {:_, %{number: :"$1"}}
[[901], 'e']

OK, now we have all the bindings we wanted (the 'e' is just [101]: IEx prints a list of printable integers as a charlist), with the downside of having to filter for 901 in our application code. We can't add any guards or extra logic here.
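Here's what that application-side filtering could look like: a minimal sketch that rebuilds the three contacts from above in an unnamed table, binds both the key and the number, and keeps only the 901s in our own code:

```elixir
table = :ets.new(:address_book_sketch, [:set, :protected])

:ets.insert(table, [
  {"55-123456789", %{name: "Yuri", address: "Newbie Street", number: 101, country_code: 55}},
  {"55-987654321", %{name: "Meru", address: "Mountain Ave", number: 901, country_code: 55}},
  {"55-223456789", %{name: "Satya", address: "Newbie Street", country_code: 55}}
])

# Bind the key to :"$1" and the number to :"$2"; match/2 returns [[key, number], ...].
bindings = :ets.match(table, {:"$1", %{number: :"$2"}})

# The "== 901" part has to happen here, in our own code:
# the comprehension skips every binding that doesn't match [key, 901].
keys = for [key, 901] <- bindings, do: key

IO.inspect(keys)
# prints ["55-987654321"]
```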

match_object/2

:ets.match_object/2 also takes a match_pattern() as its second argument. Rather than returning only the bound values, match_object/2 returns the whole matching objects, key included. Let's check this out:

iex> :ets.match_object table, {:_, %{number: 901}}
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}}
]

Okay, so now we can match exactly on what we want!

Also note that if we try binding the same way we did with match/2, we'd get all objects with a map and a number key.

iex> :ets.match_object table, {:_, %{number: :"$1"}}
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}},
  {"55-123456789",
   %{address: "Newbie Street", country_code: 55, name: "Yuri", number: 101}}
]

select/2

:ets.select/2 accepts a more complex second argument, a match_spec(). A match specification is a superset of the match patterns we used with match/2 and match_object/2: it adds guards and a MatchBody, which defines what each match returns.

Roughly, match_spec is [{ match_tuple(), [guard_tuple()], [body] }].

iex> :ets.select table, {:_, %{number: :"$1"}}
** (ArgumentError) errors were found at the given arguments:

  * 2nd argument: not a valid match specification

    (stdlib 4.0.1) :ets.select(:address_book, {:_, %{number: :"$1"}})

So select/2 won't take a bare match pattern, but it accepts a more complex and refined one, a match spec (a list of {head, guards, body} tuples):

iex> :ets.select table, [{{:_, %{number: :"$1"}}, [{:"==", :"$1", 901}], [:"$_"]}]
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}}
]

This specification [{{:_, %{number: :"$1"}}, [{:"==", :"$1", 901}], [:"$_"]}] means:

  • {:_, %{number: :"$1"}} Match objects with any key whose value is a map with a number key, and bind that number to :"$1"
  • [{:"==", :"$1", 901}] Guard: keep only the matches where :"$1" is equal to 901
  • [:"$_"] Return the whole matching object (key included)
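The body doesn't have to be :"$_". As a sketch on the same three contacts, here's a spec that returns only {key, name} pairs. Note that a tuple in the body has to be wrapped in an extra pair of braces ({{...}}), which is how match specs build tuples:

```elixir
table = :ets.new(:address_book_sketch, [:set, :protected])

:ets.insert(table, [
  {"55-123456789", %{name: "Yuri", address: "Newbie Street", number: 101, country_code: 55}},
  {"55-987654321", %{name: "Meru", address: "Mountain Ave", number: 901, country_code: 55}},
  {"55-223456789", %{name: "Satya", address: "Newbie Street", country_code: 55}}
])

spec = [
  {
    # head: bind the key, the number and the name
    {:"$1", %{number: :"$2", name: :"$3"}},
    # guards: number must equal 901
    [{:==, :"$2", 901}],
    # body: build a {key, name} tuple for each match
    [{{:"$1", :"$3"}}]
  }
]

result = :ets.select(table, spec)
IO.inspect(result)
# prints [{"55-987654321", "Meru"}]
```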

What if

Instead of looking for number keys to be equal to 901 we wanted objects where number keys are greater than 100?

Well, neither match/2 nor match_object/2 would be enough. We'd have to return all objects with number keys and filter the ones we want in our application code.

Also, match/2 and match_object/2 don't support Match Specification arguments, only simpler patterns.

But with select/2 that's fairly easy:

iex> :ets.select table, [{{:_, %{number: :"$1"}}, [{:">", :"$1", 100}], [:"$_"]}]
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}},
  {"55-123456789",
   %{address: "Newbie Street", country_code: 55, name: "Yuri", number: 101}}
]
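Guards can do more than a single comparison. A sketch on the same three contacts: :ets.select_count/2 counts the objects for which the body returns true, and listing several guards requires all of them to hold (here, numbers between 100 and 500):

```elixir
table = :ets.new(:address_book_sketch, [:set, :protected])

:ets.insert(table, [
  {"55-123456789", %{name: "Yuri", address: "Newbie Street", number: 101, country_code: 55}},
  {"55-987654321", %{name: "Meru", address: "Mountain Ave", number: 901, country_code: 55}},
  {"55-223456789", %{name: "Satya", address: "Newbie Street", country_code: 55}}
])

# Count contacts with number > 100: the body must return true for an object to count.
count = :ets.select_count(table, [{{:_, %{number: :"$1"}}, [{:>, :"$1", 100}], [true]}])

# Two guards in the list: both must be true. Return only the keys (:"$1").
in_range =
  :ets.select(table, [
    {{:"$1", %{number: :"$2"}}, [{:>, :"$2", 100}, {:<, :"$2", 500}], [:"$1"]}
  ])

IO.inspect({count, in_range})
# prints {2, ["55-123456789"]}
```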

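If you have even more complex match cases to try with select/2, :ets.fun2ms/1 is the way to go to get a more readable match specification. Keep in mind that it relies on an Erlang parse transform (or on being called from the Erlang shell), so calling it from Elixir code won't work; there, the ex2ms library offers a similar macro.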

Conclusions

  • match/2 and match_object/2 support simple match patterns
  • select/2 supports more complex matches based on Erlang Match Specification
  • Erlang's Efficiency Guide recommends using select/2 instead of match/2 and match_object/2 for performance reasons

If you had issues following this post, check out these materials:

#elixir #otp

A spec(ification) is nothing more than a business or engineering requirement that software engineers use to implement features or solve bugs.

Is that so?

In our current software development process at Savi Solutions we get more than that. Because we're a startup, it's no surprise when the business requirements are not clear or don't reflect what customers really want. That happens more often than we'd like and is usually due to the customers' lack of technical knowledge, which is expected.

The engineering requirements aren't always clear either. Often the person writing them lacks domain knowledge of the project at hand or isn't sure what the best solution is, and figuring that out is part of our job as software engineers.

So, over the past 1.5 years I've observed that a spec is not a dead document, nor something where someone above you has decided your next task and you should blindly follow it.

The best engineers I work with always question requirements, context and needs, and always think before they start writing actual code. That's what makes them problem solvers and the go-to people for challenging questions on the engineering team.

With that in mind, I started seeing the spec as a living document: not just a way to be accountable for what I've been working on, a standup message ready to post every morning, or a set of tasks written in stone.

I've found that the spec doc writing process can have many faces, like:

  • a moment to plan ahead which specific tasks should be done and in which order (should I update a schema before pushing the migration?);
  • where you bring context to your mind by checking important files or functions that could be changed;
  • a powerful way of documenting and tracking your ongoing work;
  • notes from your present self (on Friday) for your future self (on Monday) so you can clear your head during the weekend and start the week knowing what should be done;
  • a way of clearing your head of the problem at hand so you can start thinking about other possible side effects you should consider when implementing the feature.

Because of all that, a detailed spec document can take some time to write, but it pays off by reducing implementation time, future bug fixes and code review rounds. That not only improves our productivity as engineers but also the productivity of the entire engineering organization.

#software

I'm an average human being.

I'm an average Software Engineer.

I was always an average student. I'm an average adult.

I go to places and people don't notice me. I get to talk to random people and they don't recall me. I'm almost invisible.

I'm visible to the ones really close to me, though. I'm someone's child, I'm someone's husband, I'm someone's father, I'm a dog owner. But I get in and get out of places and no one knows me.

I'm an average surfer, will never be able to compare myself to the GOAT. Will never fly over the waves.

I'm an average dog trainer. Will never be in an IPO or a Mondioring competition. I can't teach advanced tricks to my dog.

I'm an average software engineer. I've always needed help, I struggle so much on my own. I've given up so many times I can't count anymore. As an average, I can't excel in my career either, I'm stuck with easy-to-mid-level tasks and will probably never do anything meaningful on it. I'll always be the “maintenance engineer” hired to take care of things the clever engineers built.

I'm an average father. I can't make my child sleep at night without the mother's help. I can't brush her teeth on a daily basis because I keep forgetting. I would never be able to take care of a child on my own. I'm barely helpful in my home.

I once dreamed I was noticed. People noticed me. That made me happy. But then I realized it was a dream and nobody was noticing me. What I do, who I am, my capabilities. I've always been too old to become a gymnast. I'm too old to do anything meaningful with my life.

The only thing I hope is that the demand for average people continues to exist so I can pay my bills and provide for my family.

#life

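It's not unusual for database queries to avoid select * and fetch only the columns they need: less data travels from the DB and, when those columns are covered by an index (say, :id), the database may even answer from the index alone.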

So, instead of writing:

import Ecto.Query

query = from p in Post
Repo.all(query)

and getting back more data than we care to use, we can explicitly tell Ecto (and the DB) which columns we want it to return:

query = from p in Post, select: %{id: p.id, title: p.title, category_id: p.category_id}
Repo.all(query)

But why do we have to be so explicit and duplicate keys and values? Isn't there a better way?

It turns out Ecto.Query already solved this for us with the map/2 query function (Ecto.Query.API.map/2). So this:

query = from p in Post, select: %{id: p.id, title: p.title, category_id: p.category_id}
Repo.all(query)

Becomes:

query = from p in Post, select: map(p, [:id, :title, :category_id])
Repo.all(query)

Or, in Pipeland:

# before: keys and values spelled out
Post
|> select([p], %{id: p.id, title: p.title, category_id: p.category_id})
|> Repo.all()

# after: map/2 picks the fields for us
Post
|> select([p], map(p, [:id, :title, :category_id]))
|> Repo.all()

And we can even have dynamic fields when using it in a function, like:

def filter_posts_by_id(post_ids, fields \\ [:id, :title, :category_id]) do
  Post
  |> where([p], p.id in ^post_ids)
  |> select([p], map(p, ^fields))
  |> Repo.all()
end

Enjoy Ecto!

Thank you to the Groxio Mentoring folks for the support

#elixir #ecto