TILs and Life

elixir

So, we played a little bit with ETS and DETS match/2, match_object/2 and select/2 functions a while ago. We also mentioned that Erlang's Efficiency Guide suggests we use select/2 instead of match/2 and match_object/2. But let's see when it makes sense to use it and see if we can gather some data with Benchee.

If you don't know it yet, Benchee is a nice tool for (micro) benchmarking in Elixir. It's simple to use, and we'll create a basic mix project to try it:

mix new benchee_ets

And we'll add some dependencies to help us play with it:

# in mix.exs
defp deps do
  [
    {:benchee, "~> 1.0"}, # benchee itself
    {:benchee_html, "~> 1.0"}, # to generate nice visual reports
    {:faker, "~> 0.17"} # to fake ETS data
  ]
end

Then we install those dependencies,

mix deps.get

And we create our ets_test.exs script file, based on our address book example with a small dataset (100k contacts because we're popular, right?):

# setup ETS
table = :ets.new(:address_book, [:set, :protected, :named_table])

# Setup ETS data
phones = Enum.map(1..100_000, fn _n -> Faker.Phone.PtBr.phone() end)

Enum.each(phones, fn phone ->
  :ets.insert(
    table,
    {phone,
     %{
       name: Faker.Person.first_name(),
       address: Faker.Address.street_address(),
       country_code: Enum.random([55, 34]),
       number: Enum.random(1..909)
     }}
  )
end)

So far we've set our project up and inserted data into our ETS table. Let's run a test where we want to get all contacts with number == 101:

# scenario 1: without keys
Benchee.run(
  %{
    "match_object" => fn -> :ets.match_object(table, {:_, %{number: 101}}) end,
    "select" => fn ->
      :ets.select(table, [{{:_, %{number: :"$1"}}, [{:==, :"$1", 101}], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/1_without_keys.html"},
    Benchee.Formatters.Console
  ]
)

We created two functions that'll be compared to one another, "match_object" and "select". To run it we just go with

mix run ets_test.exs

and wait for the output.

...

WAT?!

mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2

Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s

Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/1_without_keys.html
Generated benchmarks/output/1_without_keys_comparison.html
Generated benchmarks/output/1_without_keys_match_object.html
Generated benchmarks/output/1_without_keys_select.html
Opened report using xdg-open

Name                   ips        average  deviation         median         99th %
match_object         72.44       13.80 ms    ±11.02%       13.31 ms       19.17 ms
select               55.54       18.00 ms    ±12.92%       18.89 ms       23.20 ms

Comparison: 
match_object         72.44
select               55.54 - 1.30x slower +4.20 ms

Why does match_object/2 execute faster than select/2?! Well, for a fixed pattern match without any guards or even a key, match_object goes through fewer operations than select. In the console output above it also shows the lower deviation (±11.02% vs. ±12.92%) and the lower 99th percentile of the two.

Let's try another scenario where we know the keys we're looking for:

# scenario 2: with keys
random_phone = Enum.random(phones)

Benchee.run(
  %{
    "match_object" => fn -> :ets.match_object(table, {random_phone, :_}) end,
    "select" => fn ->
      :ets.select(table, [{{random_phone, :_}, [], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/2_with_keys.html"},
    Benchee.Formatters.Console
  ]
)

And wait for the results:

mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2

Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s

Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/2_with_keys.html
Generated benchmarks/output/2_with_keys_comparison.html
Generated benchmarks/output/2_with_keys_match_object.html
Generated benchmarks/output/2_with_keys_select.html
Opened report using xdg-open

Name                   ips        average  deviation         median         99th %
match_object        1.33 M      751.22 ns  ±2795.49%         623 ns        1556 ns
select              1.27 M      786.07 ns  ±4908.25%         610 ns        1469 ns

Comparison: 
match_object        1.33 M
select              1.27 M - 1.05x slower +34.85 ns

That was close! select/2 was just a little bit slower than match_object/2.

Now let's make things complicated for match_object/2 — otherwise why would we need a select/2? We'd just go with the simpler match, right?

Imagine we want the contacts from our address book that have a number >= 101. That's when the select function starts to shine, because the match functions only support simple matches (match_pattern()) and not a match_spec() like the select functions do.

This scenario forces us to fetch every object that has a number with the match function and filter in the application layer, while the select function does everything in a single call:

# scenario 3: without keys but with some more complex condition
Benchee.run(
  %{
    "match_object" => fn ->
      table
      |> :ets.match_object({:_, %{number: :"$1"}})
      |> Enum.filter(fn {_key, value} -> value.number >= 101 end)
    end,
    "select" => fn ->
      :ets.select(table, [{{:_, %{number: :"$1"}}, [{:>=, :"$1", 101}], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/3_no_keys_but_complex.html"},
    Benchee.Formatters.Console
  ]
)

... and

mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2

Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s

Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/3_no_keys_but_complex.html
Generated benchmarks/output/3_no_keys_but_complex_comparison.html
Generated benchmarks/output/3_no_keys_but_complex_match_object.html
Generated benchmarks/output/3_no_keys_but_complex_select.html
Opened report using xdg-open

Name                   ips        average  deviation         median         99th %
select               22.39       44.67 ms    ±29.85%       40.37 ms      134.88 ms
match_object         17.32       57.73 ms    ±31.74%       55.06 ms      178.56 ms

Comparison: 
select               22.39
match_object         17.32 - 1.29x slower +13.07 ms

That's the beauty of the :ets.select/2 function! One elegant call and we benefit from the match_spec grammar.
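Before trusting a comparison like this one, it's worth checking that both candidates really return the same rows. Here is a minimal sketch of that check, assuming a tiny hand-made table (the table name and the four rows are made up for illustration; they are not the Faker data from the benchmark):

```elixir
# Hypothetical mini table, not the 100k-contact table from the benchmark.
table = :ets.new(:address_book_check, [:set, :protected])

rows = [
  {"55-1", %{name: "A", number: 100}},
  {"55-2", %{name: "B", number: 101}},
  {"55-3", %{name: "C", number: 500}},
  # no :number key at all, so neither approach should return it
  {"55-4", %{name: "D"}}
]

:ets.insert(table, rows)

# match_object/2 can only bind; the >= 101 test happens in Elixir code.
via_match_object =
  table
  |> :ets.match_object({:_, %{number: :"$1"}})
  |> Enum.filter(fn {_key, value} -> value.number >= 101 end)

# select/2 evaluates the guard inside ETS and returns whole objects (:"$_").
via_select =
  :ets.select(table, [{{:_, %{number: :"$1"}}, [{:>=, :"$1", 101}], [:"$_"]}])

# A :set table has no defined traversal order, so sort before comparing.
same_rows? = Enum.sort(via_match_object) == Enum.sort(via_select)
IO.inspect(same_rows?, label: "same rows")
```

If the two lists ever differ, the benchmark is comparing different work and the timings say very little.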

So, wrapping up:

  • match_object/2 is not slower than select/2 for simple matches (and small datasets); in the runs above it was even slightly faster
  • select/2 is more powerful (and faster) when dealing with complex matches due to the match_spec nature

If you want to play more with this case, the source code is available on https://github.com/yuriploc/benchee_ets.

#elixir #otp

Imagine you manage a fleet of Raspberry Pis deployed in shipping containers (the real ones, not the docker ones) that control all the environment variables of your hydroponic lettuce farm spread across the world.

Now imagine how noisy those Pi logs can be.

Let's say one of your Pis is not properly authorizing workers inside the container. Every time a worker types a password and verifies biometry the Pi should emit and log an event. But it also logs sensor data and timesheet of workers on that site.

Because of that, logging into the Pi and watching it spit out logs is not very human friendly. That's where Erlang :logger filters can come in handy.

Erlang introduced the new :logger in OTP 21 and Elixir added Logger.put_module_level/2 in version 1.11.0.

Considering we're using the Logger macros to generate those, like Logger.info/2 or any other level we have as of today,

[:emergency, :alert, :critical, :error, :warning, :notice, :info, :debug]

We have the guarantee that some metadata is present, like which module generated the logs. We can also leverage that Elixir modules are atoms and use that to dynamically choose to silence some module logs.
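To make those two points concrete, here is a small sketch. The log event map is a hand-written, reduced stand-in for what :logger hands to a filter (real events carry more metadata), and the :read function name is an assumption for illustration only:

```elixir
# HydroLettyce.NoisyLogs does not need to be a defined module: an alias is
# just an atom with an "Elixir." prefix.
alias_atom = HydroLettyce.NoisyLogs

atom? = is_atom(alias_atom)
same_atom? = alias_atom == :"Elixir.HydroLettyce.NoisyLogs"

# Hypothetical log event, reduced to the metadata the filter cares about.
log_event = %{
  level: :info,
  msg: {:string, "temperature: 21C"},
  meta: %{mfa: {HydroLettyce.Sensors.Temperature, :read, 0}}
}

# The emitting module is the first element of the :mfa tuple.
%{meta: %{mfa: {module, _function, _arity}}} = log_event

noisy? = module in [HydroLettyce.Sensors.Temperature, HydroLettyce.Auth.Biometry]
```

Because the module is a plain atom, checking it against a list of noisy modules is just an atom comparison.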

So, what we'll do is create a module to filter the noisy modules:

defmodule HydroLettyce.LoggerHandler do
  @moduledoc """
  Let's filter some logs!
  """
  @noisy_modules [HydroLettyce.Sensors.Temperature, HydroLettyce.Auth.Biometry]

  # ... format or other functions you want to

  @spec filter(map(), keyword()) :: :stop | :ignore | map()
  def filter(%{meta: %{mfa: {module, _, _}}} = _log_event, _opts)
      when module in @noisy_modules do
    # Drop noisy logs only while HydroLettyce.NoisyLogs is set to :none
    case Logger.get_module_level(HydroLettyce.NoisyLogs) do
      [{HydroLettyce.NoisyLogs, :none}] -> :stop
      _ -> :ignore
    end
  end

  # Any other event: no opinion, let the next filter or handler decide
  def filter(_log_event, _opts), do: :ignore
end

and we should also tell our Application.start/2 that our logger has a new global filter:

# HydroLettyce.Application module, application.ex file
@impl true
@spec start(atom(), keyword()) :: {:ok, pid()} | {:error, any()}
def start(_type, _args) do
  # ...
  :logger.add_primary_filter(:noise_filter, {&HydroLettyce.LoggerHandler.filter/2, []})
  # ...
end

That said, we can just open an iex session (or make an RPC call) and run:

iex> Logger.put_module_level(HydroLettyce.NoisyLogs, :none)
:ok

What just happened here?

When calling Logger.put_module_level(HydroLettyce.NoisyLogs, :none) we ask logger to add to its own ETS table that the atom :'Elixir.HydroLettyce.NoisyLogs' has :none level set.

Then, as the application has already started with HydroLettyce.LoggerHandler.filter/2 installed and the filter calls Logger.get_module_level(HydroLettyce.NoisyLogs), it gets the :none level and returns :stop, so the log event is not passed forward. Everything else is :ignored by our filter and forwarded to the remaining filters and handlers. The add_primary_filter/2 function doesn't replace the default log filter but adds our custom one to the list of global filters.
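The flow can be exercised without installing anything into :logger, by calling a filter function directly with hand-built events. This is only a sketch: FilterSketch, the event maps and the :read / :start function names are hypothetical stand-ins, and a catch-all clause is included so that events from other modules get :ignore:

```elixir
defmodule FilterSketch do
  # Hypothetical stand-ins for the modules named in the post.
  @noisy_modules [HydroLettyce.Sensors.Temperature, HydroLettyce.Auth.Biometry]
  @switch HydroLettyce.NoisyLogs

  # Noisy module: drop the event only while the switch is set to :none.
  def filter(%{meta: %{mfa: {module, _, _}}}, _opts) when module in @noisy_modules do
    case Logger.get_module_level(@switch) do
      [{@switch, :none}] -> :stop
      _ -> :ignore
    end
  end

  # Anything else: no opinion, let the next filter or handler decide.
  def filter(_log_event, _opts), do: :ignore
end

noisy = %{
  level: :info,
  msg: {:string, "21C"},
  meta: %{mfa: {HydroLettyce.Sensors.Temperature, :read, 0}}
}

other = %{
  level: :info,
  msg: {:string, "boot"},
  meta: %{mfa: {HydroLettyce.Application, :start, 2}}
}

# Before the switch is flipped, noisy events pass through.
before_switch = FilterSketch.filter(noisy, [])

:ok = Logger.put_module_level(HydroLettyce.NoisyLogs, :none)

# After the switch is flipped, noisy events are stopped; others are untouched.
after_switch = FilterSketch.filter(noisy, [])
other_result = FilterSketch.filter(other, [])

# Clean up so the switch does not leak into the rest of the session.
:ok = Logger.delete_module_level(HydroLettyce.NoisyLogs)
```

Keeping the filter a plain function of the event and its options is what makes it this easy to poke at from iex.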

One important (and usually useful) tip here is that log filters execute in the same process as the log caller and, because of that, have no backpressure strategy. Also, mind the filter function's performance (avoid network calls, for instance), because a slow filter can easily become a bottleneck for the whole application.

#elixir #otp

ETS / DETS match patterns and specifications

You may have used ETS or DETS in the past. They share an almost identical API and today I'm reminding myself (and you, maybe) that we need to carefully read the docs.

ETS will be used for snippets but everything applies to DETS the same way here. Also, set, bag and duplicate_bag types behave the same.

So, imagine we have the following table:

iex> table = :ets.new(:address_book, [:set, :protected, :named_table])
:address_book

iex> :ets.insert(table, {"55-123456789", %{name: "Yuri", address: "Newbie Street", number: 101, country_code: 55}})
true

iex> :ets.insert(table, {"55-987654321", %{name: "Meru", address: "Mountain Ave", number: 901, country_code: 55}})
true

iex> :ets.insert(table, {"55-223456789", %{name: "Satya", address: "Newbie Street", country_code: 55}})            
true

And we want to search for objects whose number is equal to 901, whatever their table key is.

match/2

:ets.match/2 requires a match_pattern() as a second argument. That is, an atom() or a tuple(). Let's try that:

iex> :ets.match table, {:_, %{number: 901}}
[[]]

WAT?

Well, match/2 only returns bindings, so we need at least one pattern variable (in Erlang terms) to get something back.

iex> :ets.match table, {:_, %{number: :"$1"}}
[[901], 'e']

Ok, now we have all the bindings we wanted (iex prints the list [101] as the charlist 'e'), with the downside of having to filter for 901 in our application code. We can't add any guards or extra logic here.
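Here is a small sketch of what that looks like outside iex, assuming a throwaway table with reduced versions of the three contacts (the table name is made up for this example):

```elixir
table = :ets.new(:address_book_bindings, [:set, :protected])

:ets.insert(table, {"55-123456789", %{name: "Yuri", number: 101}})
:ets.insert(table, {"55-987654321", %{name: "Meru", number: 901}})
:ets.insert(table, {"55-223456789", %{name: "Satya"}})

# One inner list per matching object, holding the value bound to :"$1".
bindings = :ets.match(table, {:_, %{number: :"$1"}})

# [101] and 'e' are the same term; IEx just renders printable lists as charlists.
charlist_is_list? = [101] == ~c"e"

# Filtering for 901 has to happen on our side.
only_901 = Enum.filter(bindings, fn [number] -> number == 901 end)

# Print the raw integers instead of charlists.
IO.inspect(bindings, charlists: :as_lists)
```

The contact without a number key simply doesn't match the pattern, so it contributes no binding at all.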

match_object/2

:ets.match_object/2 also takes a match_pattern(). Rather than returning only the variables we bind, match_object returns each matching object in full, key included. Let's check this out:

iex> :ets.match_object table, {:_, %{number: 901}}
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}}
]

Okay, so now we can match exactly on what we want!

Also note that if we try binding the same way we did with match/2, we'd get all objects with a map and a number key.

iex> :ets.match_object table, {:_, %{number: :"$1"}}                                                   
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}},
  {"55-123456789",
   %{address: "Newbie Street", country_code: 55, name: "Yuri", number: 101}}
]

select/2

:ets.select/2 accepts a more complex second argument, a match_spec(). A match specification is a superset of the match pattern we saw in both match/2 and match_object/2: it also accepts guards and a MatchBody, which defines what each match returns.

Roughly, match_spec is [{ match_tuple(), [guard_tuple()], [body] }].

iex> :ets.select table, {:_, %{number: :"$1"}}
** (ArgumentError) errors were found at the given arguments:

  * 2nd argument: not a valid match specification

    (stdlib 4.0.1) :ets.select(:address_book, {:_, %{number: :"$1"}})

So we can't pass a bare match pattern to select/2, but we can have a more complex and refined pattern (a match spec):

iex> :ets.select table, [{{:_, %{number: :"$1"}}, [{:"==", :"$1", 901}], [:"$_"]}]
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}}
]

This specification [{{:_, %{number: :"$1"}}, [{:"==", :"$1", 901}], [:"$_"]}] means:

  • {:_, %{number: :"$1"}} Look in all keys for an object where there's a map with a number key and bind its value to :"$1"
  • [{:"==", :"$1", 901}] Guard where :"$1" is equal to 901
  • [:"$_"] Return everything that matches (key and object)
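The three parts can also be built separately and tried against single tuples with :ets.test_ms/2, which runs a match specification without needing a table. A minimal sketch, assuming reduced versions of the contacts above (the variable names are just for illustration):

```elixir
# The three parts of one match spec clause.
match_head = {:_, %{number: :"$1"}}
guards = [{:==, :"$1", 901}]
body = [:"$_"]

match_spec = [{match_head, guards, body}]

meru = {"55-987654321", %{name: "Meru", number: 901}}
yuri = {"55-123456789", %{name: "Yuri", number: 101}}

# :ets.test_ms/2 runs a match spec against a single tuple, no table needed.
meru_result = :ets.test_ms(meru, match_spec)
yuri_result = :ets.test_ms(yuri, match_spec)

# Same head and guard, but the body returns only the bound number.
number_only_spec = [{match_head, guards, [:"$1"]}]
number_only_result = :ets.test_ms(meru, number_only_spec)
```

Swapping only the body changes what comes back, while the head and guards keep deciding which objects match.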

What if

Instead of looking for number keys to be equal to 901 we wanted objects where number keys are greater than 100?

Well, both match/2 and match_object/2 functions wouldn't be enough. We'd have to return all objects with number keys and filter the ones we want in our application code.

Also, match/2 and match_object/2 don't support Match Specification arguments, only simpler patterns.

But with select/2 that's fairly easy:

iex> :ets.select table, [{{:_, %{number: :"$1"}}, [{:">", :"$1", 100}], [:"$_"]}]                      
[
  {"55-987654321",
   %{address: "Mountain Ave", country_code: 55, name: "Meru", number: 901}},
  {"55-123456789",
   %{address: "Newbie Street", country_code: 55, name: "Yuri", number: 101}}
]

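If you have even more complex match cases to try with select/2, :ets.fun2ms/1 would be the way to go in order to get a more readable match specification. Keep in mind that it is built on an Erlang parse transform, so in Elixir code a library such as Ex2ms is the usual way to get the same convenience.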

Conclusions

  • match/2 and match_object/2 support simple match patterns
  • select/2 supports more complex matches based on Erlang Match Specification
  • Erlang's Efficiency Guide recommends using select/2 instead of match/2 and match_object/2 for performance reasons

#elixir #otp

It's not unusual to avoid select * in database queries: when we only need a few columns (especially indexed ones, say :id), fetching just those can give us a performance gain.

So, instead of writing:

query = from p in Post
Repo.all(query)

And getting back more data than we would care to use, we can explicitly tell Ecto (and the DB) which columns we want it to return:

query = from p in Post, select: %{id: p.id, title: p.title, category_id: p.category_id}
Repo.all(query)

But why do we have to be so explicit and duplicate keys and values? Isn't there a better way?

It turns out Ecto.Query already solved this for us with the map/2 function. So this:

query = from p in Post, select: %{id: p.id, title: p.title, category_id: p.category_id}
Repo.all(query)

Becomes:

query = from p in Post, select: map(p, [:id, :title, :category_id])
Repo.all(query)

Or, in Pipeland:

Post
|> select([p], %{id: p.id, title: p.title, category_id: p.category_id})
|> Repo.all()

# becomes
Post
|> select([p], map(p, [:id, :title, :category_id]))
|> Repo.all()

And we can even have dynamic fields when using it in a function, like:

def filter_posts_by_id(post_ids, fields \\ [:id, :title, :category_id]) do
  Post
  |> where([p], p.id in ^post_ids)
  |> select([p], map(p, ^fields))
  |> Repo.all()
end

Enjoy Ecto!

Thank you to the Groxio Mentoring folks for the support

#elixir #ecto