Testing A New Object-Relational Mapping Language For Database-Website Communication
Testing A New Object-Relational Mapping Language For Database-Website Communication
Testing A New Object-Relational Mapping Language For Database-Website Communication
THESIS SUBMITTED IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE OF ARTIFICIAL INTELLIGENCE AT THE FACULTY OF HUMANITIES AND SCIENCES OF MAASTRICHT UNIVERSITY
Thesis committee: Dr.ir. J.W.H.M. Uiterwijk D. Delfosse, IGLg Dr. M.H.M. Winands A. Waagmeester, M. Sc.
Maastricht University Department of Knowledge Engineering Maastricht, The Netherlands June 2009
ii
PREFACE
This thesis is the result of my research done at the Faculty of Humanities and Sciences of the Maastricht University. It serves to partially fulfill the requirements for the degree of Master of Science in Artificial Intelligence (AI). The subject of research is: testing a new object-relational mapping language for database-website communication. In my thesis, I test LINQ and compare it to previously used technologies.
The problem statement for this thesis is: To what extent can a new object-relational mapping language enhance an application and its development? . I developed some performance- comparison applications to be able to give some results. In terms of development and maintenance time and effort, Ive done an application in a company to be able to measure it.
Many people supported and helped me during the process and there are a few I would like to address especially. First of all, I want to thank my supervisor, dr.ir. Jos Uiterwijk. He believed in my work and his contribution to this thesis has been a great benefit. Futhermore, I am grateful to my parents for their ongoing support. Finally, I want to make a special thanks to my friends and specially to my fiance for their support and help.
iii
iv
ABSTRACT
In our era, databases are totally integrated with computer software. The need to simplify and improve the connection between the software and the databases becomes more and more important. The databases contain a lot of information, it can now contain pictures, files, etc. Databases are used in many computer fields like data mining, web development, etc. To match this new need, database developers have created several new models: relational, hierarchical and network models. To enhance the communication between the database and the software a mapping between objects is obvious. This thesis will deal mainly with LINQ, the new querying language from Microsoft. We will especially focus on LINQ to SQL which is a provider that allows LINQ to be used to query SQL Server databases. In fact, it converts a LINQ query to a SQL query which is sent to the SQL Server for processing. A lot of other providers are available like LINQ to xml which converts to an xml document. Some of these providers will be used in this thesis. The problem statement of this thesis is: To what extent can a new object-relational mapping language enhance an application and its development? . The supporting research questions are: 1. To what extent do new technologies improve the quality of the applications? 2. To what extent do new data languages help the easiness of the data queries? In this thesis LINQ is compared to ADO.NET and a comparison is made to see which is easier and faster. LINQ is easier because some functionalities coming from the development environment are available. In terms of performance, 6 different comparisons are made. The results are promising for reading and inserting data into the database. In fact LINQ is better in the fields where it needs to be better but for example not in datasets. With LINQ, datasets are not as useful as before. LINQ will be replaced by the ADO.NET entity framework. This will be discussed in the future work section. In fact, this framework improves the LINQ to SQL framework. One of the biggest improvements is that the entity framework supports not only SQL Server but also Oracle, MySQL, DB2, etc. LINQ to SQL is a more simple framework and consequently easier to use. In conclusion, LINQ to SQL is not dead.
vi
CONTENTS
Preface ...............................................................................................................................................iii Abstract ...............................................................................................................................................v Contents ............................................................................................................................................ vii List of figures ...................................................................................................................................... ix 1. Introduction ................................................................................................................................ 1 1.1 1.2 1.3 2. The domain......................................................................................................................... 1 Problem statement and research questions ...................................................................... 1 Outline of the thesis ........................................................................................................... 2
Related work: The .NET framework ............................................................................................ 5 2.1 2.2 Data retrieval...................................................................................................................... 5 Previous database queries in .NET ..................................................................................... 5
3.
Solution approach and methodology: LINQ ............................................................................... 9 3.1 3.2 3.3 3.3.1 3.3.2 3.3.3 3.3.4 3.3.5 3.4 What is it? .......................................................................................................................... 9 What is it replacing exactly? ............................................................................................ 10 What are the improvements? .......................................................................................... 10 Performance ................................................................................................................. 10 Security ......................................................................................................................... 12 Development time and effort ...................................................................................... 15 Maintenance time and effort ....................................................................................... 26 Stability ......................................................................................................................... 28 What are the negative aspects of LINQ? ......................................................................... 31
4.
Results ....................................................................................................................................... 35 4.1 4.2 Research: tests and results .............................................................................................. 35 Analysis of the results ...................................................................................................... 42
5.
Conclusions and future work ................................................................................................... 45 5.1 5.2 Conclusions ...................................................................................................................... 45 Future work ...................................................................................................................... 46
vii
viii
LIST OF FIGURES
Figure 1 : Example of a 3-tier model for a web application ............................................................... 6 Figure 2 : LINQ .................................................................................................................................... 9 Figure 3 : The thorough search from the previous application ....................................................... 16 Figure 4 : The administration side of the previous application ....................................................... 17 Figure 5 : The thorough search from the new application .............................................................. 17 Figure 6 : The administration side of the new application .............................................................. 18 Figure 7 : Establishing a connection between the database and the application ........................... 20 Figure 8 : Retrieving data from the database .................................................................................. 20 Figure 9 : Inserting data into the database ...................................................................................... 21 Figure 10 : Updating the database ................................................................................................... 22 Figure 11 : Deleting records from the database .............................................................................. 23 Figure 12 : Executing a stored procedure which returns records .................................................... 23 Figure 13 : The DBML file ................................................................................................................. 24 Figure 14 : Types of maintenance .................................................................................................... 26 Figure 15 : Changing tracking in stateless environments ................................................................ 32 Figure 16 : No disconnected client side tracking ............................................................................. 33 Figure 17 : Inserting data using the stored procedure InsertIntoCustomer ................................. 36 Figure 18 : Directly inserting data in the customer table ................................................................ 37 Figure 19 : Reading data from the customer table .......................................................................... 38 Figure 20 : Reading an xml file ......................................................................................................... 39 Figure 21 : Using LINQ and C# with objects ..................................................................................... 40 Figure 22 : Dataset operations ......................................................................................................... 41 Figure 23 : A gridview ....................................................................................................................... 42
ix
Chapter 1
1. INTRODUCTION
In this Chapter, the domain will be explained. The problem statement and the research questions will be presented. The last section will give a short outline of the thesis.
1.1
THE DOMAIN
Since the mid-1990s, web development has been one of the fastest growing industries in the world. In 2005, there were over 30,000 web development companies in the United States. This industry is very promising with an expected growth of 20% by 2010. This is possible thanks to large businesses. They wish to sell products and services to their customers and to automate business workflow with the web developments (1). There are different types of websites. The two types are: static and dynamic websites. A static website is one that has web pages stored on the server in the same form as the user will view them. This type of websites usually displays the same information to all visitors. In fact, the users are not able to control the information they get. There is no real interaction between the user and the website. A dynamic website is one that does not have web pages stored on the server in the same form as the user will view them. In fact, the pages change depending on certain criteria. These criteria are predefined or based on a user input. A dynamic web page is generated on the fly by piecing together certain blocks of code, procedures or routines. A dynamic-generated page takes the information from the database and puts everything together in a predefined format to present to the visitor a web page. There is an interaction with the user. It means that the web site adapts its content depending on the visitor (2).
1.2
As evident from the previous section, the layer between the web page and the database is very important. If this layer is faster and works better, the influence on the application is substantial. The problem statement is then obvious:
To what extent can a new object-relational mapping language enhance an application and its development?
Testing a new object-relational mapping language for database-website communication Of course, it is totally impossible to test every object-relational mapping language. This thesis will then focus on one specific object-relational mapping language: LINQ. To evaluate LINQ, many tests will be made. These tests will provide an answer to the problem statement. The following two research questions will provide partial answers to the problem statement.
RQ1: To what extent do new technologies improve the quality of the applications?
This first question will determine if the quality of the application is enhanced thanks to the new object-relational mapping language. To quantify the quality of the application, we will first evaluate the performances. Second, we will evaluate the security. Finally, we will evaluate the stability of LINQ.
RQ2: To what extent do new data languages help the easiness of the data queries?
The second research question will focus on the easiness. A comparison will be made with a previous object-relational mapping language. First, the development time and effort will be evaluated. Second, the maintenance time and effort will also be evaluated. The research questions focus on the development of an application done for a company. This thesis will help the company to decide whether they will use the new object-relational mapping language. This report will explain: To what extent is the new framework easier? To what extent is it faster in terms of performance (speed of the application)? To what extent is the new development software a better support for the developers? To what extent is it possible to develop a framework in a faster way? What is the importance of the new .net components?
1.3
This section will present the reader with a short overview of this thesis.
Chapter 2: In this chapter, an introduction to object-relational mapping is made. The topic is so huge that it is necessary to explain the story of data retrieval and database queries in .NET. Chapter 3: In chapter 3, an introduction to LINQ and especially to LINQ to SQL is presented. Then, an explanation is given about what LINQ is replacing exactly. Finally, an enumeration of the improvements and bad aspects of LINQ is made. Chapter 4: In the fourth chapter, the results of the experiments are presented. To show these results some comparison graphics were made with a previous object-relational mapping language. Chapter 5: In this chapter, conclusions are made. Moreover, suggestions for future research will be given.
Chapter 2
2. RELATED WORK: THE .NET FRAMEWORK
The Microsoft .NET framework is a software framework available with several Microsoft Windows operating systems. In 2002, the web application framework, ASP.NET, was developed by Microsoft. With the grows of web applications and the importance of them, connecting a database with a web application became very important (3).
2.1
DATA RETRIEVAL
In previous applications, data retrieval was simply done by querying the data source. In the early 90s, object oriented programming became part of the programmers mind. The problem was that the type systems from the databases or other data sources were different from the object oriented languages. A type system defines how a programming language classifies values and expressions into types, how it can manipulate those types and how they interact. Therefore, it was needed to build a data-access layer to map the database objects onto the program objects. These object mapping frameworks were called ORMs (object relational mappings). ORM is a method for designing and querying database models, at a business level which non-technical users can understand. In an E-R model, we never introduce calculated attributes to respect modeling rules, for example a bonus for an employee calculated on base of the length of service. Entity-relationship modeling is a database modeling method, used to produce a type of conceptual schema or semantic data model of a system and its requirements in a top-down fashion. Using an object relational model provides a new layer between the database and the application. This layer permits to add calculating fields like the bonus field explained before. The ORM helps to convert a database type into an object language type such as c#.net or vb.net.
2.2
In this section, an overview about applications without LINQ is presented. An explanation will be given about what kind of models are typically used for classic companys applications. Without this short explanation, It will be difficult to understand LINQ.
In a commonly used model for web application there are 3 different tiers. This is the 3-tier model, the most well-known model to build applications. This model is not much different from a winform application. The differences are in the presentation tier. Because this thesis is about LINQ in ASP.NET 3.5, we will only focus on the web model (See Figure 1). The business and data tiers are the most important tiers for this thesis. In the business tier, there are 2 main parts: the business objects and the data-access layer. Here, LINQ replaces
both of them, it creates automatically business objects. There is no need to think about it
anymore. In previous ASP.NET web applications, it takes many time to create business objects, ever if it is not an obligation. Using business objects, it is easier for possible modifications of an application. In the last tier, we see different types of data storage like XML files and databases. LINQ was built to use every type of data storage in the same way. Previously, when modifications were made on the data storage type then modifications on the code were necessary. LINQ works also with stored procedures and Extensible Stylesheet Language Transformations (XSLT). XSLT is an XML-based language used for the transformation of XML documents into other XML or "human-
Testing a new object-relational mapping language for database-website communication readable" documents. In the past years, the use of stored procedures was spread in the company world. While using them, the security is increasing and LINQ still allows to use them (4).
Chapter 3
3. SOLUTION APPROACH AND METHODOLOGY : LINQ
This chapter is the main part of the thesis. An explanation about LINQ, LINQ to SQL and the utility of both will be given. LINQ to SQL was developed to improve the speed and easiness of database queries. This chapter will explain in which fields LINQ to SQL is really making improvements and in which not.
3.1
WHAT IS IT?
LINQ is a querying language. LINQ uses SQL queries to get the data from the database. Then, possible improvements in terms of performance can be made on the generation of the SQL code. It replaces previous tools (some libraries) with only one tool. In previous applications, it was needed to use different tools to retrieve data from xml files, databases, etc. Now with LINQ querying a database or a xml file is not much different (5). In Figure 2, an overview of LINQ is showed. From the figure, it is easier to understand what LINQ really is. It is the layer between the data and the code. One can use LINQ with several computer languages, obviously only .NET languages. In fact LINQ does not change anything to the .NET framework, it is only a new tool.
C#
VB
Others
LINQ To Objects
LINQ To Datasets
LINQ To SQL
LINQ To Entities
LINQ To Xml
Objects
FIGURE 2 : LINQ
Relational
XML
Testing a new object-relational mapping language for database-website communication This thesis will focus on LINQ to SQL because we will mainly talk about relational databases and compare it to previous technologies and ORMs.
3.2
To show exactly what LINQ to SQL is doing, It is easier to show an example. using (NorthwindDataContext dc = new NorthwindDataContext()) { dc.Log = Console.Out; var categories = from c in dc.Categories where c.CategoryName.StartsWith("B") select c; foreach (var c in categories) { Console.WriteLine(c.CategoryName); } } This is C# 3.0 code where a simple query is made on a database table (Categories) searching for every category name starting with the letter B and then showing the results in the console. LINQ then automatically will generate the following SQL statement : SELECT [t0].[CategoryID], [t0].[CategoryName], [t0].[Description], [t0].[Picture] FROM [dbo].[Categories] AS [t0] WHERE [t0].[CategoryName] LIKE @p0
3.3
3.3.1 PERFORMANCE
Stored procedures
10
Testing a new object-relational mapping language for database-website communication First of all, in this section, there we be a small explanation about the utility of stored procedures. Four different reasons to use them will be given. At the end of this section, the use of stored procedures with LINQ to SQL will be discussed. Especially, an emphasis on the performance will be presented. First, for maintenance, stored procedures are easier to troubleshoot because there is no need to delve into the GUI code. Second, to tune a stored procedure, there is also no need to delve into the GUI code. Furthermore, when opening the database management software, it is possible to tune it in a easy way. Third, It is easier to call a stored procedure from the GUI code than to type the SQL code. Moreover, the code will be more readable because the code is separated in different tiers. This helps the database administrator working in a big development team so he can focus on the database. Finally, at a security level, stored procedures are also better but this point will not be discussed. However, consider the following SQL code that might appear in 2 separate stored procedures:
INSERT INTO [Categories] ([CategoryName], [Description] ,[Picture]) VALUES (@CategoryName, @Description, @Picture) And the 2nd sproc: INSERT INTO [Products] ([ProductName],[SupplierID],...,[Discontinued]) VALUES (@ProductName,@SupplierID,...,@Discontinued,)
Each time a new category is added, 15 new products are also added.. Then 16 separate database calls are being required. In industrial applications, this will slow down the applications queries. In LINQ, this will be batched into 1 single database call. Lazy loading First of all, lazy loading is a design pattern. It is used to defer initialization of an object until the point at which it is needed. In LINQ, this means that properties of an object can be loaded on demand from the database when they are used. It simplifies the coding but obviously for performance this is not good at all.
11
Testing a new object-relational mapping language for database-website communication Consider the following code: using(NorthwindDataContext dc = new NorthwindDataContext()) { var categories = from c in dc.Categories select c; decimal categoryTotal; foreach (var category in categories) { categoryTotal = 0; foreach (var product in category.Products) { if (product.UnitPrice.HasValue) categoryTotal += product.UnitPrice.Value; } Console.WriteLine(categoryTotal); } }
In this case, there will be a call to the database for every single category. For example if there are 12 categories, this means 13 calls, one to retrieve the categories, then one again per category to retrieve its products. The advantage of LINQ is that it is possible to enable or disable the lazy loading.
3.3.2 SECURITY
Stored procedure
Jeff Atwood said something interesting about stored procedures (6): Stored procedures can be individually secured within the database. A client can be granted permissions to execute a stored procedure without having any permissions on the underlying tables.
12
Testing a new object-relational mapping language for database-website communication Stored procedures are also very powerful against SQL injection. It is a code injection technique that exploits a security vulnerability occurring in the database layer of an application.
SQL connection
In this case the security is not different from ADO.net. Here is an example:
In this example, the Integrated security is used so there is no need to use the password in the connection string. Using the password in the connection string should be avoided because it increases the security risks (7).
Application architecture security concerns In LINQ to SQL, a lot of code is generated. It is important to know how it works with respect to security. For example with the AdventureWorks database, provided by Microsoft, the product class generated by the LINQ to SQL designer will generate this code after the class definition (8):
#region Extensibility Method Definitions partial void OnLoaded(); partial void OnValidate(System.Data.Linq.ChangeAction action); partial void OnCreated(); partial void OnProductIDChanging(int value); partial void OnProductIDChanged(); partial void OnProductNameChanging(string value); partial void OnProductNameChanged(); partial void OnSupplierIDChanging(System.Nullable<int> value); partial void OnSupplierIDChanged(); partial void OnCategoryIDChanging(System.Nullable<int> value); partial void OnCategoryIDChanged(); partial void OnQuantityPerUnitChanging(string value); partial void OnQuantityPerUnitChanged(); partial void OnUnitPriceChanging(System.Nullable<decimal> value); partial void OnUnitPriceChanged();
13
Testing a new object-relational mapping language for database-website communication partial void OnUnitsInStockChanging(System.Nullable<short> value); partial void OnUnitsInStockChanged(); partial void OnUnitsOnOrderChanging(System.Nullable<short> value); partial void OnUnitsOnOrderChanged(); partial void OnReorderLevelChanging(System.Nullable<short> value); partial void OnReorderLevelChanged(); partial void OnDiscontinuedChanging(bool value); partial void OnDiscontinuedChanged(); #endregion
These are the partial methods. They enable class designers to provide method hooks. They are similar to event handlers, that developers may decide to implement or not. They will be called when it is necessary. The datacontext is the source of all entities mapped over a database connection. It has various methods defined:
#region Extensibility GetHealthy Definitions partial void OnCreated(); partial void InsertCategory(Category instance); partial void UpdateCategory(Category instance); partial void DeleteCategory(Category instance); partial void InsertProduct(Product instance); partial void UpdateProduct(Product instance); partial void DeleteProduct(Product instance); partial void InsertCustomer(Customer instance); partial void UpdateCustomer(Customer instance); partial void DeleteCustomer(Customer instance); partial void InsertOrder(Order instance); partial void UpdateOrder(Order instance); partial void DeleteOrder(Order instance); partial void InsertOrder_Detail(Order_Detail instance); partial void UpdateOrder_Detail(Order_Detail instance); partial void DeleteOrder_Detail(Order_Detail instance); #endregion
To make the code working, it is needed to put the code into partial classes which exist in the same project. Below we give an example for the product class:
14
namespace WhyLinQ { public partial class Product { partial void OnValidate(System.Data.Linq.ChangeAction action) { //do some validation logic here } } }
What does it mean? It means simply that the validation code will be in the same layer as the database access code.
During my company internship an applicants manager program was developed. We decided to use LINQ, simply to test it and to investigate if it is faster to develop a 3 tiers application using LINQ or not. The application consist among others of the following components: The list of applicants with a search bar A competence search A thorough search (explained later) Availability search Administration area Multi-languages
15
For the database access, the most difficult part is the thorough search because it is a very hard search, an adjustable one. An explanation of what was done with LINQ and ASP.NET will be given later. Another difficulty was that the application needs to be a multi-language one. As a consequence the competences, status, title, country need to be translated in every language used in the website. A previous application was built in 15 weeks in ASP.NET. The application uses ADO.NET for the database access. Not all the requirements of the first specification were taken into account when this application was developed. Some functionalities were not built. The thorough search is very hard. In the previous application, it was very difficult to understand how to use it. It was also very hard for the developer to program it. Moreover, in the previous application, this search is not working 100%. Company employees estimated that the search works only with simple requests without using the real potential of it (see Figure 3).
Furthermore, in the administration side, it is possible to add status, titles, countries, competence, etc. For a fully multi-language application, it is needed to manage the fact that
16
Testing a new object-relational mapping language for database-website communication the company can add new available languages. In the previous application, only French and Dutch were available (see Figure 4).
The idea about the new application was to totally redevelop every functionality, but now using LINQ. The thorough search and multi-language functionality were totally built in another way. The application is now faster and also has a better look and feel (see Figure 5 and Figure 6).
17
For the thorough search, a new look and feel was developed to make it easier for the user. On the Rechercher button event, there are some LINQ queries. Here is an example:
18
Testing a new object-relational mapping language for database-website communication Dim query = From qualif In cxt.Candidats_Qualifications _ Join cand In cxt.Candidats On cand.CandidatID Equals qualif.CandidatID _ Join stat In cxt.Statuts On cand.StatutID Equals stat.StatutID _ Where qualif.QualificationID.ToString() = DDL_Qualifications.SelectedValue _ And stat.CodeLangue = Session("CodeLangue").ToString() _ Select New With {.StatutID = cand.StatutID, .Compl_Disponibilite = cand.Compl_Disponibilite, .CandidatID = cand.CandidatID, .Statut = stat.Statut1, .Nom = cand.Nom, .Prnom = cand.Prnom, .Localite = cand.Localite}
It looks like a typical SQL query but using objects. A query in LINQ is very easy because it is easy to understand, easy to write and easy to use objects. The third point is very important. Now, it is possible to use a session variable CodeLangue. In websites development, it simplifies everything. If there is a need to compare a cookie value or http cache variable to a variable of a query, it can be done by writing the database query with LINQ. 3.3.3.2 Q UERIES COMPARISON BETWEEN LINQ AND ADO.NET
The next section will discuss the time and effort while using LINQ compared to ADO.NET. The examples used to compare are given in Figures 7-12. The given code snippets will be compared in term of time and effort. Time Mapping a connection (see Figure 7) is easier because it is not necessary to give the whole connection string. It is possible to store the connection string in the web.config file for the ADO.NET. But, there is still the need to use a function to get it.
19
When inserting data (see Figure 8), it is always needed to make a connection with the database. In ADO.NET when getting the data, it is needed to open the connection, to create a SQL command with the SQL query as a parameter, to create a reader object, to create an object to store the data and finally to load the data into the object. With LINQ you can use the intellisence, being Microsoft's implementation of autocompletion. It helps to get the tables and columns in an easier way. Before when a SQL query was made with ADO.NET, there wasnt any intellisence.
20
In ADO.NET, inserting data is easier (see Figure 9). The insertion is as follows: open the connection, create a command object with your SQL query as a parameter and execute it. With LINQ, it is needed to create an object (here customer) and set the properties of the object. After the object is created, the insertion and commitment need to be done.
Generally in ADO.NET, when data from a database is updated (see Figure 10), it is needed to play with quotes to update for example a date: UPDATE SET ModifiedDATE = + DateTime.now.ToString() + (In VB.NET) This kind of queries takes time. In LINQ, it is possible to update an element by getting it and then modifying the properties and submitting the element to the database.
21
In term of time, deleting data (see Figure 11) is approximately taking the same time. In ADO.NET, it is needed to create the SQL query and select which data need to be deleted. With LINQ to delete data, it is needed to get it and then call the function DeleteOnSubmit and then submit the changes.
22
Without the intellisence, in ADO.NET, it is a bit annoying to call stored procedures. 6 lines are needed to call a stored procedure (see Figure 12). In comparison, LINQ only needs one line to call the procedure and to give the parameters.
23
Effort In term of effort, the connection is very easy with LINQ (see Figure 7). The connection string is registered in the web.config file when the dbml file is created with the designer. In this file, it is possible to add the database tables wanted to be used in the project. In the designer mode, it is possible to manage the stored procedures. The dbml file is like in Figure 13.
With LINQ, it is possible to use a var type (see Figure 8). The var keyword tells the compiler to infer the type of the variable from the static type of the expression used to initialize the variable. It is then easy to use the var type for LINQ queries. In the figure, the ADO.NET insertion looks easier (see Figure 9). But some insertions can be very complex, like this:
24
SELECT CASE WHEN SOURCE='S1' THEN VALUE ELSE NULL END AS "COL1", CASE WHEN SOURCE='S2' THEN VALUE ELSE NULL END AS "COL2", CASE WHEN SOURCE='S3' THEN VALUE ELSE NULL END AS "COL3" FROM ( SELECT 'S1' AS "SOURCE",ID,name,value FROM S1 WHERE value>=10 union SELECT 'S2' AS "SOURCE",ID,name,value FROM S2 WHERE value>=200 union SELECT 'S3' AS "SOURCE",ID,name,value FROM S3 WHERE value>=3000 );
This kind of insertion can become more complex. For example, adding a data conversion or adding 2 months to a date. With LINQ, it is possible to use the C# or VB.NET code to do a such thing in a very easy way. As in the previous figure, updates can be very complex in SQL (see Figure 10). If somebody does not really know the functions of the used database like the todays date function, it takes some effort to learn them. With LINQ, you use what you know, being the developer of the application means to know the developing language. In the figure, to get the todays date with C# is done by simply doing this: DateTime.Now. Writing a query in the presentation tier can be very hard, for example when 20 tables and 400 columns with their types are not known by the developer. If it is needed to use them, it is annoying to change each time of windows to watch the types and columns names. With intellisence, there is no need any more to do that because it is possible to easily find the tables and the columns with their type without switching between Visual Studio and SQL Server Management Studio. What is told in the section for Figure 10 is also true for the 2 previous figures (see Figure 11). With LINQ, it is easy to understand how easy it is to call a stored procedure (see Figure 12). In term of effort, it is really pleasant. When the dbml file is well done, LINQ is really fast to code.
25
Corrective maintenance
Preventive maintenance
Adaptive maintenance
Perfective maintenance
Corrective maintenance Corrective maintenance is the modification of the software to correct discovered problems.
26
Testing a new object-relational mapping language for database-website communication In Visual Studio, it is possible to use the debugger, also with your LINQ code. Then it is easier to correct the bugs found. With ADO.NET, if a problem is coming from a query then it is impossible to debug it in Visual Studio. You need to copy/paste the query to SQL server management studio and to test what is wrong in the query. This is also more difficult for programmers who are not expert with SQL because finding errors in a SQL query can be really complex. With LINQ, it is possible to watch the values of the objects and then it is easier to find the error. Adaptive maintenance This type of maintenance is used to change software and adapt it to a new environment (a changed or changing one). With LINQ there is a separation between the data, the business and the presentation layer. Then, it is easier to adapt some part of the application to new requisites. For example, if it is needed to store not only the data of a modification but also the time, it can be done by modifying the column in the database. There is no need to update the business layer because it is done automatically by LINQ when reloading the table. Something else needed is to save the time where the data is saved. When a software is divided in tiers, it is faster and easier to adapt it. If it is a big application, where each tier will be developed by a group of persons, then a small modification takes less time. Perfective maintenance As explained in the above section, gaining time is really important. With really huge applications using huge databases some queries can take much time. Is it easy to improve after delivering the software? This section presents the perfective maintenance which means to perfect a software after delivery to improve performance or maintainability. To improve the database is easy due to the fact that the application is divided in layers. If there is a need to modify a query to make it faster, LINQ allows to do it easily. The only need is to find the query in the code and to modify it. Obviously, it is possible to use the intellisence like when we are building a new query. Indeed, if changes are made on a query and gets for example less data, it is also needed to change the webforms to match the data.
27
Testing a new object-relational mapping language for database-website communication Preventive maintenance Prevention means to prevent something to happen. How is it possible with maintenance? Preventive maintenance is the modification of the software after delivery to detect and correct latent faults in the software before they become effective faults. How is it possible to detect latent faults? A method is to perform regular checks on the product to test it also after delivery. Then, it is possible to find these faults and to correct them before the customer discovers these faults. How is it possible to correct them with LINQ? Simply like in the previous sections. The need is to find these faults in the code to modify the query for example if it is coming from the query. Something also very important when making some modifications on the code is to perform after other tests to see if the whole software is not suffering from other faults stemming from correcting one. In conclusion for this maintenance part, maintenance is really easy and fast with LINQ due to the layers fact. Cutting an application in layers is also possible without LINQ but LINQ automatically creates the layers. This is done by LINQ because the business layer is generated with the dbml file.
3.3.5 STABILITY
In this section, the stability of LINQ will be discussed. This part will focus on LINQ and its future, the produced SQL and the product in its integrality.
LINQ and its future The future of LINQ is not very brilliant. During the October PDC (Professional Developers Conference), Visual Studio 10 and the .NET framework 4.0 were announced. They talked about the investment they made on the Entity Framework in the .NET 3.5 SP1. With the Entity Framework, developers were able to build more advanced scenarios and to use LINQ against any database, this includes: SQL Server, Oracle, DB2, MySQL
28
Testing a new object-relational mapping language for database-website communication They recommend also the Entity Framework as the recommended dataaccess solution for LINQ to relational scenarios. They said that they will only evolve LINQ to SQL based on the feedback they receive from the community.
The produced SQL In this section, to understand better what is the produced SQL, examples will given. The produced SQL comes from the code given in the Development Time and Effort section. The first example is the SQL generated by the selection made in LINQ: SELECT [t0].[CustomerID], [t0].[PersonID], [t0].[StoreID], [t0].[TerritoryID], [t0].[AccountNumber], [t0].[rowguid], [t0].[ModifiedDate] FROM [Sales].[Customer] AS [t0] -- Context: 3.5.30729.1 SqlProvider(Sql2008) Model: AttributedMetaModel Build:
A second example is the SQL generated by the insertion with a stored procedure: EXEC @RETURN_VALUE = [Sales].[uspInsertIntoCustomer] -- @RETURN_VALUE: Output Int (Size = 0; Prec = 0; Scale = 0) [Null] -- Context: 3.5.30729.1 SqlProvider(Sql2008) Model: AttributedMetaModel Build:
In both examples, the SQL generated is very simple. It also looks at the context to know the version of the SQL Server database and depending on it, it uses the possibility given by the version.
The product in its integrality Is the product stable? This depends a lot of the graph consistency. In mathematics when nodes are connected together, the network made by these connections is called a graph. Here, we have exactly the same with the
29
Testing a new object-relational mapping language for database-website communication classes when they are connected. The health of the graph is vitally important to the stability of the application. LINQ to SQL does not automatically manage consistency of relationship references but it provides several mechanisms to make work easier. It also makes a pattern to follow to make sure that we are managing our references correctly. Entity classes generated by the code generation tool will automatically implement the correct patterns. Here is an example:
public class Customer() { this._Orders = new EntitySet<Order>( new Action<Order>(this.attach_Orders), new Action<Order>(this.detach_Orders));
); }
This is how a customer property on an order instance is automatically changed when an order is added to the customers collection.
[Association(Name="FK_Orders_Customers", Storage="_Customer", ThisKey="CustomerID")] public Customer Customer { get { return this._Customer.Entity; } set { Customer v = this._Customer.Entity; if (v != value) { if (v != null) { this._Customer.Entity = null; v.Orders.Remove(this); } this._Customer.Entity = value; if (value != null) { value.Orders.Add(this); } } } }
30
Testing a new object-relational mapping language for database-website communication When the customer property is being changed, the order instance is first removed from the current customers order collection and it is added later. To conclude this part, it is really important to maintain the consistency of the graph. If the in-memory graph is inconsistent, then a run-time exception is generated when the SubmitChanges method is called, submiting the change to the database. Using the generation tool, it can be easier to maintain the consistency of the graph. Sometimes to maintain the application it is necessary to change the dbml file manually without using the generation tool. This can cause inconsistencies and then generate exceptions.
3.4
In this section, the negative aspects of LINQ are presented. These negative aspects will be divided into 2 parts: bad and ugly (10).
The bad o Changing tracking in stateless environments In LINQ, the datacontext is in charge of the change tracking. Change tracking in SQL Server 2008 enables applications to obtain only changes that have been made to the user tables, along with the information about those changes. Change tracking should be stateless (see Figure 15).
31
Understanding when queries are executed and how LINQ does many things automatically. It makes it difficult to understand how it works exactly. But during execution and thanks to the debugger, it is possible to understand when the query is executed and how.
Lazy vs. Eager Loading Eager Loading is the opposite of lazy loading (see 3.3.1). With LINQ, eager loading is done only at the datacontext and this affects all queries.
32
Testing a new object-relational mapping language for database-website communication The ugly o No disconnected client side tracking! If it is needed to have a disconnected client side tracking, it is possible to build one. This is a real disadvantage for multi-tiers architectures (see Figure 16).
Works only with SQL Server LINQ works only with SQL Server. Obviously, if needed, it is possible to write a provider for another database but it is hard and takes many time.
33
Testing a new object-relational mapping language for database-website communication o Competition with ADO.NET Entity Framework The ADO.NET Entity Framework is part of the ASP.NET 3.5 SP1 framework and many persons say that it is the LINQ killer (11). Why? These are the 3 main points : Will have other providers like Oracle Mapping single class to multiple tables Bigger choice as datasources The ADO.NET Entity Framework will be discussed in the Future Work (see Section 5.2).
34
Chapter 4
4. RESULTS
This is the most important part of the thesis. Here, some graphics and some numbers to explain the improvements made by LINQ in the data-retrieval field will be given. Each graphic will be explained in the next section. The results are analyzed to get some conclusions from them.
4.1
In this section, all tests were made with the AdventureWorks database given as a sample database from Microsoft. The mostly used table for the tests is the customer table which contains 19185 rows.
Figures 17-22 show the performances of LINQ. For every figure, the code is executed 500 times to get the best performance values. The x-axis shows the 500 executions. The y-axis shows the number of ticks. The number of ticks depends on the computer processor. In this case, 14,318,180 ticks is one second. Then, the best performance is when the number of ticks is the lowest. In Figure 17, it is very interesting to see that LINQ is faster than ADO.NET using stored procedures when inserting data. For ADO.NET, the mean is 273.582 and for LINQ, it is only 95.76, so 2.86 times faster. There are some high peaks. This is due to the processor of the computer which is not important in our case. Probably, on a database server, these peaks will disappear.
35
Like in Figure 17, in Figure 18, LINQ also outperforms ADO.NET for direct inserting. What does direct insert mean? Simply using direct SQL statements. In LINQ, it can be done like this:
db.ExecuteCommand("insert values(10)"); into sales.customer(TerritoryID)
In this example, a SQL query is directly executed like in ADO.NET. In Figure 18, LINQ is still faster. The mean for LINQ is 96.04 and for ADO.NET, it is 249.67, so LINQ is 2.56 times faster.
36
301 ADO
401
In Figure 19, ADO.NET is doing better than LINQ for reading data but on average the difference is only 1.30%. Reading data with LINQ is like using datasets with ADO.NET. The mean for LINQ is 20,853 and for ADO.NET, it is 20,583.
37
301 ADO
401
To read an XML file with LINQ, it is like reading a table of a database. In F igure 20, the performance is great for LINQ after some reading. Naturally in larger applications we dont read an xml files 500 times but many times more. In that case, these performances are great for LINQ. The change of performance in the figure is not explainable. Here is an example to show how simple the code is:
XDocument xmlReferance = new XDocument(); xmlReferance = XDocument.Load("c:\\LinQInsertDirect.xml", LoadOptions.PreserveWhitespace); var linQresult = from xmlRecords in xmlReferance.Descendants("LinQInsertDirect") select new { NewPass = Convert.ToInt32(xmlRecords.Element("Pass").Value.Trim()), NewValue = xmlRecords.Element("Value").Value };
38
Testing a new object-relational mapping language for database-website communication In terms of performance, the mean for LINQ is 381.76 and for C#, it is 315.81, so C# is 1.2 times faster.
301 C#
401
In Figure 21, we compare two LINQ methods with C# using objects. LINQ 1 reads values from an array using plain LINQ. It creates a squared subset of even numbers and then displays the sum of the subset. LINQ 2 does exactly the same but uses the C# 3.0 aggregates. The aggregate function applies an accumulator function over a sequence. Here is the code of the C# objects:
int[] nums = new int[1000]; for (int i = 0; i < 1000; i++) { nums[i] = i + 1; } int[] numsEven = new int[500]; int evencounter = 0; double resultaddition = 0; for (int j = 0; j < 1000; j++) { if (nums[j] % 2 == 0) { numsEven[evencounter] = nums[j] * nums[j];
39
In Figure 21, the mean for C# is 2.15 and for LINQ 1, it is 3.03, so C# is 1.41 times faster. Surprisingly for LINQ 2, the mean is 6.588. We cannot explain this phenomenon. On internet, many tests show that LINQ2 is faster than C#, as expected.
C# vs LINQ: Objects
50 45 40
35
30 25
20
15 10 5
0
1 101 C#
FIGURE 21 : USING LINQ AND C# WITH OBJECTS
201 LINQ 1
301 LINQ 2
401
In Figure 22, ADO.NET is always above 30,000 while in Figure 19, LINQ never was above 30,000. With the datacontext of the LINQ reading query, it is possible to query the whole database without the need to make a new connection with the database. With LINQ, putting all the data in a dataset is not very useful. In fact, datasets are used for gridviews, etc., to show the data. Instead of using the dataset, it is possible to use directly the function ToList() which is much easier. Knowing this these results are less important. In
40
Testing a new object-relational mapping language for database-website communication this case, the mean for LINQ is 35,716 and for ADO.NET, it is 32,327, so ADO.NET is 1.1 times faster in this test (see Figure 22).
45000
40000
35000
30000
301 ADO
401
Here an example of what a gridview is and how easily it is to use the datacontext to get the data (see Figure 23).
GridView1.DataSource = cxt.Filter(Nothing, Session("CodeLangue").ToString()).ToList() ; GridView1.DataBind() ;
41
FIGURE 23 : A GRIDVIEW
4.2
In this section, the results are analyzed. The analysis will be short and only from a performance point of view. In the previous section, many comparisons were made to test the performance of LINQ. To insert data, LINQ is 3 times faster than ADO.NET while using stored procedures. In industrial applications, we generally use stored procedures, so it is very significant. In fact because the applications are divided in tiers, the database developer uses a lot of stored procedures. For the direct insert, LINQ is still approximately 3 times faster. This result is less important than the previous one. Direct insert is less used. From these two results, we know that the improvement is made not for the SQL generation but from the connection between the database and the application. To read data, LINQ loses 7% against ADO.NET but the reading cannot be really compared like this. In fact, in the datacontext when the read query is made, all tables from the dbml file are loaded. This means that if we need to get data from another table after the reading of a certain table, it is possible to do this without creating a new connection. For applications where we read data from many tables, one table after another, this is a huge performance improvement. Reading XML files with LINQ is really easier. This is due to the fact that there is no need to learn a new computer language. LINQ to XML in many cases overlaps the functionalities of the previous used computer languages for XML files (Xpath and XSLT). It aims to provide a superior developer experience, but existing code will continue to work (12).
42
Testing a new object-relational mapping language for database-website communication Obviously, using C# with objects is more powerful than using LINQ. However , the results are very interesting. Without using aggregates, LINQ is only 33% slower than C#. On internet, it is possible to find some comparisons with aggregates. In these comparisons LINQ with aggregates always outperforms plain LINQ. In this thesis, LINQ with aggregates is 300% slower than ADO.NET. Probably, it reacts differently on computers. Then, we can conclude that on servers, LINQ with aggregates is faster (13). The last figure of the results is the dataset operations. Datasets are not useful when using LINQ. It is possible to directly use the data from the datacontext instead of loading everything in the dataset and then using it.
43
44
Chapter 5
5. CONCLUSIONS AND FUTURE WORK
In this chapter, the answers to the research questions and the problem statement will be provided.
5.1
CONCLUSIONS
RQ1: To what extent do new technologies improve the quality of the applications?
Section 3.3.1 and chapter 4 show that in terms of quality the applications are improved. In term of performance, this means that the applications gain time when accessing the database. This influences directly the quality of the application. With the speed improved, the employee using the application needs less time to do something. The second research question was formulated as follows:
RQ2: To what extent do new data languages help the easiness of the data queries?
Section 3.3.3 and section 3.3.4 show that in terms of easiness for data queries, this new data language revolutionizes a lot. Developing an application costs much time, effort and money. Then, with the improvement of the easiness when building data queries, an improvement is made for these 3 things. As stated in section 1.2, the problem statement is as follows:
To what extent can a new object-relational mapping language enhance an application and its development?
45
As explained for the research questions, this new object-relational mapping language improves the quality of the applications and also the easiness of the data queries. Improving the quality of an application means that the whole application is enhanced. And, thanks to the improvement on the easiness for data queries, the development is faster and easier.
5.2
FUTURE WORK
In the field of object relation mappings a lot of research can still be done. There is a future work for ADO.net entity framework. In comparison with LINQ to SQL, this framework is designed to support many databases. DB2, MySql, Sybase and Oracle are supported. The mapping is also made by single class to multiple tables instead of single table for LINQ to SQL. The data sources are expanded to replication, reporting services, business intelligence, integration services, etc (11) (14).
46
BIBLIOGRAPHY
1. Connolly, Dan. A Little History of the World Wide Web. W3C. [Online] 2000. http://www.w3.org/History.html. 2. Okin, J. R. The Information Revolution. s.l. : Ironbound Press, 2005. ISBN 0976385740. 3. MacDonald, Matthew and Szpuszta, Mario. Pro ASP.NET 2.0 in C# 2005. s.l. : Apress, 2005. ISBN 1-59059-496-7. 4. Soldati, Yannick. Developing the Content Management website of Sylis. Seraing : s.n., 2007. 5. Microsoft. LINQ. .NET Framework Developer Center. [Online] http://msdn.microsoft.com/enus/netframework/aa904594.aspx. 6. Atwood, Jeff. Who Needs Stored Procedures, Anyways? Coding Horror. [Online] October 28, 2004. http://www.codinghorror.com/blog/archives/000117.html. 7. LINQ to SQL connection string? Microsoft Developer Network. [Online] April 2009. http://social.msdn.microsoft.com/Forums/en/linqprojectgeneral/thread/2cd12d33-3291-4a869bef-43f96ec17690. 8. Giesenow, Hilton. Why Use LinQ To SQL. Hilton Giesenow's Jumbled Mind. [Online] 2008. http://dotnet.org.za/hiltong/archive/2008/02/01/why-use-linq-to-sql-part-1-performanceconsiderations.aspx. 9. Page-Jones, Meilir. The Practical Guide to Structured Systems Design. New York : Yourdon Press, 1980. ISBN 0-917072-17-0. 10. ronnies(username).LINQ to SQL: the good, the bad and the ugly. Swiss MSDN Team Blog. [Online] 2007. http://blogs.msdn.com/swiss_dpe_team/archive/2007/12/21/linq-to-sql-the-goodthe-bad-and-the-ugly.aspx. 11. Microsoft. The ADO.NET Entity Framework Overview. Visual Studio 2005 Developer Center. [Online] http://msdn.microsoft.com/en-us/library/aa697427(VS.80).aspx. 12. Sarmiento, Edwin. Reading XML documents using LINQ to XML . MSSQLTips.com. [Online] 2008. http://www.mssqltips.com/tip.asp?tip=1524. 13. KshitijPandey(username). Performance comparisons LinQ to SQL ,ADO ,C#. Code Project. [Online] 2008. http://www.codeproject.com/KB/dotnet/LinQ_Performance_net3_5.aspx. 14. Pizzo, Michael. ADO.NET Entity Framework: What? How? Why? Channel 9. [Online] 2006. http://channel9.msdn.com/shows/Going+Deep/ADONET-Entity-Framework-What-How-Why/.
47
Testing a new object-relational mapping language for database-website communication 15. Northrup, Tony, Wildermuth, Shawan and Ryan, Bill. Microsoft .NET framework 2.0 Application Development Foundation. s.l. : Microsoft Press. 16. Snell, Mike, et al. Designing and Developing Web-Based Applications Using Microsoft .NET framework. s.l. : Microsoft Press. 17. Johnson, Glenn and Northrup, Tony. Microsoft .NET framework 2.0 Web-Based Client Development. s.l. : Microsoft Press. 18. Klein, Scott. Professional LINQ. s.l. : Wiley Publishing, Inc. 19. Esposito, Dino. Programming Microsoft ASP.NET 3.5. s.l. : Microsoft Press. 20. Leblanc, Grard. C# et .NET. s.l. : Editions Eyrolles, 2002. 21. Delmal, Pierre. SQL2 - SQL3. s.l. : Editions De Boeck. 22. Mariani, Rico. DLinq (Linq to SQL) Performance. Rico Mariani's Performance Tidbits. [Online] 2007. http://blogs.msdn.com/ricom/archive/2007/06/22/dlinq-linq-to-sql-performance-part1.aspx. 23. Beller, Maximilian. Performance comparison between Linq, NHibernate and ADO.NET / Stored Procedures. Maximilian Beller's Blog . [Online] 2007. http://www.mbeller.de/2007/12/performance-comparison-between-linq.html. 24. Flasko, Elisa. Introducing LINQ to Relational Data. .NET Framework Developer Center. [Online] January 2008. http://msdn.microsoft.com/en-us/library/cc161164.aspx#_Toc188851308. 25. Ramesh, Gopalaswamy and Bhattiprolu, Ramesh. Software maintenance : effective practices for geographically distributed environments. New Delhi : Tata McGraw-Hill, 2006. ISBN 9780070483453.
48
INDEX
3 tiers, 15 AdventureWorks, 13, 37 aggregates, 41, 45 ASP.NET, 5, 6, 16, 36, 49, 50 business, 1, 5, 6, 29, 30, 48 C#, ix, 10, 27, 41, 42, 45, 49, 50 consistency, 31, 32, 33 datacontext, 14, 34, 35, 42, 43, 44, 45 DB2, v, 30, 48 dbml, 25, 27, 30, 33, 44 effort, vii, 15, 25, 31 Entity Framework, 30, 31, 36, 49, 50 intellisence, 20, 24, 27, 29 lazy loading, 11 LINQ to SQL, v, 3, 9, 10, 11, 13, 31, 32, 48, 49 maintenance, iii, ix, 2, 11, 28, 29, 30, 50 Microsoft, v, 5, 13, 20, 37, 49, 50 MySQL, v, 30 object-relational mapping, i, iii, v, 1, 2, 3, 47, 48 Oracle, v, 30, 36, 48 ORM, 5, 15 partial classes, 14 partial methods, 14 produced SQL, 30, 31 security, vii, 12, 13, 16 SQL Server, v, 27, 30, 31, 34, 36 stored procedures, 6, 10, 11, 12, 13, 24, 25, 37, 44 time, vii, 15, 19, 31 tracking, ix, 34, 35 var, 26 variable, 19, 26 Visual Studio, 27, 29, 30, 49 web development, v, 1 XML, 6, 40, 44, 49 XSLT, 6, 44
49