FM Algorithm Theory Explanation

Why the FM algorithm?

Every day, more than 2.5 quintillion bytes of data are created on the internet. This data keeps growing in variety, velocity, and volume, hence the name big data. To analyze it, one has to collect it, store it safely, clean it, and then perform the analysis. One of the major problems big data engineers face is dealing with useless or redundant data: a great deal of time and memory is spent storing and analyzing extra data that turns out to be fruitless in the end. Removing duplicate data is therefore essential to cut analysis costs and reduce redundancy.

Data cleaning can be done with various techniques, but before cleaning the data it is necessary to know how much useful data the dataset contains. Therefore, before removing duplicates from a data stream or database, one needs to know how many distinct (unique) elements are present. One way to estimate this is to hash the elements of the universal set using the Flajolet-Martin algorithm. The FM algorithm is used in database queries, big data analytics, spectrum sensing in cognitive radio sensor networks, and many other areas, and it performs better than many other methods for counting distinct elements in a stream of data.


Flajolet-Martin Algorithm:

The Flajolet-Martin algorithm, also known as the FM algorithm, approximates the number of unique elements in a data stream or database in a single pass. The highlight of this algorithm is that it uses very little memory while executing.

Pseudo Code - Stepwise Solution (a runnable sketch follows the list):

1. Select a hash function h that maps each element in the set to a bit string of at least log2(n) bits, where n is the number of possible elements.

2. For each element x, let r(x) = number of trailing zeros in h(x).

3. Let R = max(r(x)) over all elements seen.

=> Estimated number of distinct elements = 2^R
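
Below is a minimal sketch of these steps in Python, written for illustration only. The choice of hash function (hashlib's MD5, truncated to 32 bits) and the names trailing_zeros and fm_estimate are assumptions made for this sketch, not part of the original text.

import hashlib

def trailing_zeros(h, width=32):
    # Count the trailing zero bits of h; treat h == 0 as all zeros.
    if h == 0:
        return width
    count = 0
    while h & 1 == 0:
        count += 1
        h >>= 1
    return count

def fm_estimate(stream):
    # Step 1: hash each element to a bit string (here, a 32-bit integer).
    # Step 2: r(x) = number of trailing zeros in h(x).
    # Step 3: R = max r(x); the estimated number of distinct elements is 2**R.
    R = 0
    for x in stream:
        h = int(hashlib.md5(str(x).encode()).hexdigest(), 16) & 0xFFFFFFFF
        R = max(R, trailing_zeros(h))
    return 2 ** R

A single hash function gives only a rough estimate; the point of the sketch is the memory footprint, since only the running maximum R is stored.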

For small values of m (where m is the number of unique elements), a brute-force approach can work, but for large datasets or data streams, where m is very large, a lot of space is required, and in some cases the program cannot run at all within the available memory. This is where the Flajolet-Martin algorithm can be used: not only does it occupy less memory, it also gives better results in terms of running time.
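
For comparison, a brute-force exact count has to remember every distinct value it has seen, so its memory grows with m, while the FM sketch above keeps only the single integer R. A minimal illustration (the function name is an assumption):

def exact_distinct_count(stream):
    # Exact counting: memory grows with the number of distinct elements (O(m) space).
    seen = set()
    for x in stream:
        seen.add(x)
    return len(seen)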

This approach is used to maintain an approximate count of the distinct values seen so far when the total number of values is very large, for example, estimating the number of distinct URLs visited by a person on the web. Many companies want to check how many unique users logged in to their website the previous day to judge whether their advertising was successful. Here, the FM algorithm is an excellent solution for these companies.
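
A hypothetical usage example for this scenario, reusing the fm_estimate sketch from above on a made-up log of user IDs:

import random

# Simulate a day's login log with many repeated user IDs (illustrative data only).
user_log = [f"user{random.randint(1, 1000)}" for _ in range(100_000)]

print("Approximate unique users:", fm_estimate(user_log))
print("Exact unique users:", len(set(user_log)))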
