Benchee and ETS functions
So, we played a little bit with ETS and DETS match/2, match_object/2 and select/2 functions a while ago. We also mentioned that Erlang's Efficiency Guide suggests we use select/2 instead of match/2 and match_object/2. But let's see when it makes sense to use it and see if we can gather some data with Benchee.
If you don't know it yet, Benchee is a nice tool for (micro) benchmarking in Elixir. It's simple to use, and we'll create a basic Mix project to try it out:
mix new benchee_ets
And we'll add some dependencies to help us play with it:
# in mix.exs
defp deps do
  [
    {:benchee, "~> 1.0"}, # benchee itself
    {:benchee_html, "~> 1.0"}, # to generate nice visual reports
    {:faker, "~> 0.17"} # to fake ETS data
  ]
end
Then we install those dependencies:
mix deps.get
And we create our ets_test.exs script file, based on our address book example with a small dataset (100k contacts because we're popular, right?):
# Setup ETS
table = :ets.new(:address_book, [:set, :protected, :named_table])
# Setup ETS data
phones = Enum.map(1..100_000, fn _n -> Faker.Phone.PtBr.phone() end)
Enum.each(phones, fn phone ->
  :ets.insert(
    table,
    {phone,
     %{
       name: Faker.Person.first_name(),
       address: Faker.Address.street_address(),
       country_code: Enum.random([55, 34]),
       number: Enum.random(1..909)
     }}
  )
end)
So far we've set our project up and inserted data into our ETS table. Let's run a test where we want to get all contacts with number == 101:
# scenario 1: without keys
Benchee.run(
  %{
    "match_object" => fn -> :ets.match_object(table, {:_, %{number: 101}}) end,
    "select" => fn ->
      :ets.select(table, [{{:_, %{number: :"$1"}}, [{:==, :"$1", 101}], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/1_without_keys.html"},
    Benchee.Formatters.Console
  ]
)
We created two functions that'll be compared to one another, "match_object" and "select". To run it we just go with mix run ets_test.exs and wait for the output.
...
WAT?!
mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2
Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s
Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/1_without_keys.html
Generated benchmarks/output/1_without_keys_comparison.html
Generated benchmarks/output/1_without_keys_match_object.html
Generated benchmarks/output/1_without_keys_select.html
Opened report using xdg-open
Name                  ips       average   deviation      median      99th %
match_object        72.44      13.80 ms     ±11.02%    13.31 ms    19.17 ms
select              55.54      18.00 ms     ±12.92%    18.89 ms    23.20 ms
Comparison:
match_object        72.44
select              55.54 - 1.30x slower +4.20 ms
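Why does match_object/2 execute faster than select/2?! Well, for a fixed pattern without any guards or even a key, match_object/2 goes through fewer operations than select/2: the literal 101 sits right in the pattern, while our match spec has to bind :"$1" and then evaluate a guard for every row. Both runs also show a deviation above 10% and a few slower outliers, so don't read too much into the exact gap.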
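To see that both calls really answer the same question, here is a minimal sketch on a tiny throwaway table. The table name and the contacts are made up for illustration; only the :ets calls come from the benchmark above.

```elixir
# Small in-memory table; names and numbers are invented for this sketch.
table = :ets.new(:demo_book, [:set, :protected])

:ets.insert(table, [
  {"111", %{name: "Ana", number: 101}},
  {"222", %{name: "Bia", number: 202}},
  {"333", %{name: "Caio", number: 101}}
])

# Literal in the pattern: nothing to bind, no guard to evaluate.
by_pattern = :ets.match_object(table, {:_, %{number: 101}})

# Same query as in the benchmark: bind :"$1", then check the guard per row.
by_guard =
  :ets.select(table, [{{:_, %{number: :"$1"}}, [{:==, :"$1", 101}], [:"$_"]}])

# A match spec can also carry the literal in its head and skip the guard.
by_literal_spec = :ets.select(table, [{{:_, %{number: 101}}, [], [:"$_"]}])

# Row order in a :set table is not guaranteed, so sort before comparing.
IO.inspect(Enum.sort(by_pattern) == Enum.sort(by_guard))
IO.inspect(Enum.sort(by_pattern) == Enum.sort(by_literal_spec))
```

All three queries should return the same two contacts. The third variant might be a fairer opponent for match_object/2 in this scenario, but that is a guess we have not benchmarked here.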
Let's try another scenario where we know the keys we're looking for:
# scenario 2: with keys
random_phone = Enum.random(phones)
Benchee.run(
  %{
    "match_object" => fn -> :ets.match_object(table, {random_phone, :_}) end,
    "select" => fn ->
      :ets.select(table, [{{random_phone, :_}, [], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/2_with_keys.html"},
    Benchee.Formatters.Console
  ]
)
And wait for the results:
mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2
Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s
Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/2_with_keys.html
Generated benchmarks/output/2_with_keys_comparison.html
Generated benchmarks/output/2_with_keys_match_object.html
Generated benchmarks/output/2_with_keys_select.html
Opened report using xdg-open
Name                  ips       average   deviation      median      99th %
match_object       1.33 M     751.22 ns   ±2795.49%      623 ns     1556 ns
select             1.27 M     786.07 ns   ±4908.25%      610 ns     1469 ns
Comparison:
match_object       1.33 M
select             1.27 M - 1.05x slower +34.85 ns
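That was close! On average select/2 was just a little bit slower than match_object/2, but its median was actually slightly lower, and with deviations this large the two are effectively tied.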
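Both calls dropped from milliseconds to under a microsecond because the key is fully bound: in a :set table ETS can go straight to the entry instead of scanning every row. The sketch below, on a made-up two-row table, only illustrates that with a bound key both calls return what a plain :ets.lookup/2 would.

```elixir
# Throwaway table with invented contacts, just for this comparison.
table = :ets.new(:demo_book, [:set, :protected])
:ets.insert(table, [{"111", %{number: 101}}, {"222", %{number: 202}}])

key = "222"

# Plain key lookup.
by_lookup = :ets.lookup(table, key)

# Pattern and match spec with the key bound, as in scenario 2.
by_match = :ets.match_object(table, {key, :_})
by_select = :ets.select(table, [{{key, :_}, [], [:"$_"]}])

IO.inspect(by_lookup == by_match and by_match == by_select)
```

If all you have is the key, :ets.lookup/2 is the simplest option; it was not part of the benchmark above, so its timing here is an open question.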
Now let's make things complicated for match_object/2. Otherwise, why would we need select/2? We'd just go with the simpler match, right?
Imagine we want the contacts from our address book that have a number >= 101. That's when the select functions start to shine, because the match functions only support simple patterns (match_pattern()) and not a full match_spec() like the select functions do.
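To make the difference concrete, here is a small sketch of how a match_spec() is put together: a list of {head, guards, body} tuples. The table and contacts are invented for the example, and the variable names head, guards and body are just labels chosen here.

```elixir
# Throwaway table with invented contacts.
table = :ets.new(:demo_book, [:set, :protected])

:ets.insert(table, [
  {"111", %{name: "Ana", number: 100}},
  {"222", %{name: "Bia", number: 101}},
  {"333", %{name: "Caio", number: 500}}
])

# Head: the pattern, binding the key to $1 and the number to $2.
head = {:"$1", %{number: :"$2"}}

# Guards: conditions on the bound variables; a plain pattern cannot express >=.
guards = [{:>=, :"$2", 101}]

# Body: what to return per row. The double braces build a {key, number} tuple.
body = [{{:"$1", :"$2"}}]

result = :ets.select(table, [{head, guards, body}])

# Sorted because row order in a :set table is not guaranteed.
IO.inspect(Enum.sort(result))
```

Besides the guard, the body lets us return only the fields we care about, which is another thing match_object/2 cannot do since it always returns whole objects.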
This scenario forces us to fetch every contact with match_object/2 and filter in the application layer, while select/2 does all the filtering inside ETS:
# scenario 3: without keys but with some more complex condition
Benchee.run(
  %{
    "match_object" => fn ->
      table
      |> :ets.match_object({:_, %{number: :"$1"}})
      |> Enum.filter(fn {_key, value} -> value.number >= 101 end)
    end,
    "select" => fn ->
      :ets.select(table, [{{:_, %{number: :"$1"}}, [{:>=, :"$1", 101}], [:"$_"]}])
    end
  },
  warmup: 0,
  time: 5,
  formatters: [
    {Benchee.Formatters.HTML, file: "benchmarks/output/3_no_keys_but_complex.html"},
    Benchee.Formatters.Console
  ]
)
... and
mix run ets_test.exs
Operating System: Linux
CPU Information: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Number of Available Cores: 8
Available memory: 15.25 GB
Elixir 1.13.4
Erlang 25.0.2
Benchmark suite executing with the following configuration:
warmup: 0 ns
time: 5 s
memory time: 0 ns
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 10 s
Benchmarking match_object ...
Benchmarking select ...
Generated benchmarks/output/3_no_keys_but_complex.html
Generated benchmarks/output/3_no_keys_but_complex_comparison.html
Generated benchmarks/output/3_no_keys_but_complex_match_object.html
Generated benchmarks/output/3_no_keys_but_complex_select.html
Opened report using xdg-open
Name                  ips       average   deviation      median      99th %
select              22.39      44.67 ms     ±29.85%    40.37 ms   134.88 ms
match_object        17.32      57.73 ms     ±31.74%    55.06 ms   178.56 ms
Comparison:
select              22.39
match_object        17.32 - 1.29x slower +13.07 ms
That's the beauty of the :ets.select/2 function! One elegant call and we benefit from the match_spec grammar.
So, wrapping up:
- match_object/2 is not slower than select/2 for simple matches (and small datasets); in our runs it was even slightly faster
- select/2 is more powerful (and faster) when dealing with complex matches, thanks to the match_spec grammar
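The "small datasets" caveat can be checked with Benchee's inputs option, which runs every function once per input. The following is only a sketch under a few assumptions: it runs as a standalone script with Mix.install instead of inside the project, build_table is a hypothetical helper, and the keys and numbers are generated without Faker.

```elixir
# Assumption: standalone script (e.g. `elixir sizes.exs`); Mix.install fetches Benchee.
Mix.install([{:benchee, "~> 1.0"}])

# Hypothetical helper: builds a table with `size` contacts and integer keys.
build_table = fn size ->
  table = :ets.new(:sized_book, [:set, :protected])

  Enum.each(1..size, fn n ->
    # Numbers cycle through 1..909, like the range used in the post.
    :ets.insert(table, {n, %{number: rem(n, 909) + 1}})
  end)

  table
end

spec = [{{:_, %{number: :"$1"}}, [{:>=, :"$1", 101}], [:"$_"]}]

Benchee.run(
  %{
    # Each function receives the current input (a table) as its argument.
    "match_object + filter" => fn table ->
      table
      |> :ets.match_object({:_, %{number: :_}})
      |> Enum.filter(fn {_key, value} -> value.number >= 101 end)
    end,
    "select" => fn table -> :ets.select(table, spec) end
  },
  inputs: %{
    "1k contacts" => build_table.(1_000),
    "100k contacts" => build_table.(100_000)
  },
  warmup: 1,
  time: 2
)
```

Benchee prints one comparison per input, so you can see whether the gap between the two approaches changes with table size on your own machine.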
If you want to play more with this case, the source code is available on https://github.com/yuriploc/benchee_ets.