How Excel Works From Data Structures Point of View
How Excel Works From Data Structures Point of View
1 Introduction
We would be using two data structures: Linked List and Dependency Tree
Each cell will have either of the above two mentioned data structure.
->If a cell i has a constant number then it will have a linked list, first
node of list will be i itself and other nodes in list will be the cells which
are dependent on the constant value of cell i.
->If a cell i contains a formula, then it will have a dependency tree with
the cell i being its root node and the child nodes will be the cells on which
i depends. At the bottom would be the leaf nodes that will be the cells
having constant value.
->Algorithm-1 deals with the creation of Dependency Tree and Linked List.
1
IMPORTANT CONSIDERATIONS:
a. Cyclic Dependency are not entertained as they result in Deadlocks.
b. We are using the concept of caching and Multi-Threading, as they are
an existing feature of Operating System and helps enhance searching. They
would be explained in Algorithm-2.
c. Time-Space Complexity: We would find Worst-Case(O) Complexity.
APPROACH:
We would be following three step approach and we have one algorithm for
each of them:
Step-1 : Creation of Dependency Trees and Linked Lists
Step-2 : Finding Dependency Trees that needs to be updated, due to changes
in constant value of cell on which it depends
Step-3 : Update the Dependency Trees
1. Time Complexity :
Step-4 : For loop iterates for m cells. -> Total of m Steps
Step-14 : For each child node it checks that whether Dependency Tree
will form for it or Linked List will be formed.
No. of Trees + No. of Linked List = Atmost of size m. -> m Steps
Step-20 : A Kind of Recursive Function that Goes to Step-10.
Recursion steps << m steps . Thus, atmost m steps for Recursion.
-> Thus, total of m Steps
As above three points take place together, thus we multiply m steps
three times.
Time-Complexity = O(m3 )
2. Space Complexity :
Step-4 : There are total of m cells. -> m units of space required
Steps-9 to Steps-13 : We are Creating Dependency Tree. At maxi-
mum, a Dependency tree can depend on all m-1 cells (excluding itself)
and thus at maximum it will have m nodes in tree (including itself).
2
-> m units of space required to store all nodes of a Dependency Tree.
Steps-16 and 17 : We are Creating Linked List. At maximum, for
a given cell all other cells can depend on it and thus at max it will
have m nodes in linked list (including itself). ->Thus, m units of space
3
3 Algorithm-2 : Finding Dependency Trees that
needs to be updated
In Algo-1 we created Dependency Tree and Linked List for the input File.
Now if any of the constant value of the cell (say of cell io o ) is changed
then we have to update the Dependency Trees that depend on this value
and to do so we have to first find those Dependency Trees and thats what
we do in this Algorithm.
Algorithm 2 Locate Node whose value is changed and find all Dependency
Trees that needs to be updated due to change in constant value of cell io o
1: function Locating-Updates(LinkedList, DependencyT rees)
2: A cell io o is updated
3: if Link List of cell io o is in Cache Memory then
4: Load it in Main Memory
5: else
6: Determine from OS, how many threads are free. Create one
thread for each free processor
7: Distribute all Linked List equally to each thread
8: Each thread parallely searches for Linked List starting with node
io o
9: On finding the Link List with header as io o , load the List.
10: end if . Ending If-Else of Step-3
11: Increament cache counter for node io o . Updating Cache Log
12: for Each node m0 in Linked List of io o do . Find all Dependecy
Trees that contains io o node, as they needs to be updated.
13: Create required number of threads.
14: Distribute all Dependency Trees equally among Threads
15: Use Depth First Search on each tree . To check whether node
m0 is present in tree or not. DFS reaches leaf nodes faster w.r.t. others.
16: if Tree has node m0 then
17: Flag this Dependency Tree as Active . Dependent on io o
18: else
19: Flag this Dependency Tree as P assive . Independent of io o
20: end if . Ending If-Else of Step-16
21: end for . Ending for of Step-12
22: end function . Ending Locating-Updates function of Step-1
4
In Algorithm-2 we have used the concept of Caching. Initially size of
cache memory would be 1/10 of total number of Linked List. This capac-
ity of cache can be changed dynamically on run time as per the need.
Understanding Time and Space Complexity for Algorithm-2
1. Time Complexity :
Step-8 : At most m Linked List, searching takes at most of m Steps
Step-12 : At most m nodes in Linked List. For loop iterates for m times.
Step-15 : DFS takes O(m + edge), Let, edge=m, where =+ve value
Time = Linked List Search(m) + DFS search in tree for all nodes(m2 )
Time-Complexity = O(m2 )
2. Space Complexity :
General : To maintain cache, need atmost m units of space.
Step-4 or Step-9 : At maximum a Linked List can have m nodes if all other
cells are dependent on it. Hence, m nodes reqire m units of space.
Step-15 : At most a Dependency Tree has m nodes if it depends on all other
cells. Thus, m nodes in Tree require m units of space.
Total Space : Cache(m) + Dependency Trees for all Nodes of io o (m2 )
Space Complexity = O(m2 )
5
Output: Modified and Updated Dependency Trees. All the nodes/cells
that dependent on cell io o have been updated.
Time-Complexity = O(m2 )
2. Space Complexity :
Step-4 :At most there are m Active Dependency Trees, if all the cells have
dependency on cell io o .
Also, if a given cell depends on all other cells, then its Dependecy Tree will
have atmost m nodes.
Space = Combining both, we require m m units of space.
the table we see that the Algorithm maintains square units of space require-
ment but may posses cubic time complexity. Also, frequency of usage of
Algorithm-2 and Algorithm-3 would be more then that of Algorithm-1 as
modification and updation takes place more frequently.
Thus, average Time-complexity can be said to be O(m2 )
We have listed Worst-Case Time complexity and have not considered the
advantages of Caching and Multi-Threading while performing search oper-
ations, which we have extensively used in Algorithm-2 and Algorithm-3. In
real-time these two features would greatly reduce Time-Complexity.
At end, Our Algorithm updates only required nodes.