Databases: An Overview of
Technology and Problems*
L J Haravu
Information System Consultant
69 Krishnapuri Colony, West Marredpally, Secunderabad-500026
Databases have been the main vehicle through which organizations of all kinds have made structured textual and other data available for data processing, data analysis, as data archives, and for the storage and retrieval of specific data and information needed by and relevant to the organization.
The need to create and maintain databases has spawned the development of software and paradigms such as the relational data model, relational database management software (RDBMS), object-oriented programming, client-server architecture, open database connectivity (ODBC), open systems, ActiveX components, and so on. Softwares for database development are available for all hardware platforms _ from the stand alone PC or Macintosh to servers on local area and wide area networks, and now for web servers. Some softwares are available for use across software platforms _ Unix, Windows, Linux _ while others are platform-specific.
The rapid spread of Internet and the world wide web (WWW) during the 1980's and 1990's has opened up new opportunities to make databases accessible via the web. Several advantages are seen for web-enabled databases:
Global access to data that is in public domain, e.g., bibliographic, statistical, full text, image and multimedia databases;
Wide access to data that is not in public domain but needs to be shared between different locations of a given enterprise, e.g., a multinational bank with branches all over the world, or between an organization and its vendors;
Platform-independent access to data and information;
Dynamic updating of data ensuring that live data is available globally for use by decision makers, customers, vendors, etc. This is becoming particularly important in the rapidly growing e-commerce sectors,
The fact that a single interface, viz., the Internet Browser is all that is needed on the client machine to access databases across the Internet, and
The possibility for developing and delivering interactive solutions and for the collection of valuable feedback, reactions, analyses, and user preferences for use in business and other decisions.
This paper briefly reviews the technology available for web-enabled databases.
Static-vs-Dynamic Content on the Web
When the WWW first emerged, it contained almost exclusively static HTML pages. This meant that when a user selected a URL (Universal Resource Locator), the web server returned the contents of a static HTML page corresponding to the URL. The HTML page could contain hyperlinks to other web pages on the same or other web servers, enabling the user to `surf the net'. From a server administrator's (or webmaster's) point of view he simply needed to save HTML files in a logically mapped drive structure. This paradigm was adequate for delivering static information, but it could not be used for anything interactive or for the delivery of web content that was built on-the-fly, i.e., on-demand.
The answer to enabling dynamic and interactive web content was found in Dynamic HTML or DHTML.
The central concept in DHTML is that the server does the processing before passing information back to the client. The server no longer returns the contents of a static file. Instead, it processes custom code on the server to determine what to return to the client based on inputs that is collected from the user.
Two common methods used on web servers to process customized code are: i) via an executable file, or ii) via a dynamic link library (DLL) both on the server. Each of these methods creates HTML pages on-demand (e.g., based on the search of a database) which are then returned to the client browser along with header information.
The use of executables on the web server developed using scripting languages such as PERL is the technology that was first to be used on servers to deliver dynamic content. The technology uses the so-called Common Gateway Interface (CGI) Scripting languages to create the customized code.
Typically in this method, the user fills out a HTML form which contains place holders, for instance, for search values or terms. Once the user fills-in the form, it is submitted to the web server via a Post or Get operation of the HTML Form. The values or terms entered by the user are passed to the CGI executable on the server as environmental variables (e.g., as header information). The executable then uses the input received and invokes custom processes and returns the results (e.g., as a list of hits dynamically selected from the database) to the browser. The use of CGI scripts on the server to deliver dynamic content is probably the most widespread method in use on web servers primarily because this is supported on all platforms.
The chief disadvantage of using CGI executables is the fact that each time the executable is invoked via a user's call to it (via the URL), a new process on the web server is spawned. The executable file is loaded into memory each time it is invoked. Also, the process is created in a different memory space on the web server. CGI executables are, therefore, also known as out-of-process servers. It is quite conceivable that when several users request the executable, the server becomes overwhelmed and performance may be seriously degraded. Memory corruption may also happen at times bringing the server down.
Unlike CGI scripts, dynamic link libraries (DLL) are in-process servers, i.e., a DLL runs in the same process space as the web server. Also, functions within the DLL need to be loaded only once into memory when the function is first invoked. Once this is done, all processes that might need them share the function. DLLs use the concept of multi-threading. In this, more than one procedure can be executing at the same time. One disadvantage of using DLLs is that thread safe code is difficult to write and test. However, once a DLL has been thoroughly tested, it is a far more efficient method as compared to CGI scripts for delivery of dynamic content in web applications.
Microsoft and Netscape, providers of the two most popular browsers provide specifications for DLLs, viz., ISAPI DLL and NSAPI DLL that hook into the Microsoft Internet Server and Netscape Internet Server respectively. It is possible to develop applications compatible with one or both the above mentioned servers using application development tools such as Visual Basic or Delphi.
Microsoft, which has the largest market share of the PC operating system and office productivity applications, understandably has been active in developing products to web-enable databases. Two of its popular RDBMS software, viz., MS-ACCESS and SQL Server come with web publishing wizards to enable databases under these two platforms to be quickly published to an Intranet or the Internet.
The Internet Database Connector (IDC) and Advanced Database Connector (ADC) are two methods that Microsoft has made available along with the ActiveX Data Objects (ADO) to web-enable Access and SQL Server databases. ADO is a successor to earlier Microsoft data access models. The ADO Object model provides a high level interface to ODBC data sources. The ODBC model itself is to be supplemented by another application programmer interface (API) called OLE DB. The ODBC specification was meant to provide access primarily to relational or SQL compliant databases. OLE DB on the other hand is a technology that will provide access not only to ODBC-compliant data sources but also to other sources, e.g., e-mail systems, CAD/CAM, etc. IDC applications are limited to data access and are supported only by Microsoft compatible web servers.
A new technology that has been on the market for a little under two years, viz., Active Server Pages (ASP) is beginning to make waves as one that is not limited only to data access but to a host of other applications including those in the burgeoning e-commerce area. ASP and its potentials are briefly described in the next section.
Active Server Pages
An Active Server Page is a file with an .asp extension that consists of regular HTML along with embedded scripting statements. The ASP Scripting engine on the server interprets the script statements, processes them and returns HTML to the web browser. Because the browser sees only HTML, ASP works with any browser.
Using ASP for web application development has several advantages:
Application development with ASP is far more efficient as compared to working with languages such as Visual C++. This is because ASP is compile-free. Unlike with earlier development tools, it is not necessary for an asp file to be compiled before it is tested. This reduces development effort considerably because in the previous method each time a change was made to a program, it had to be compiled and placed on the web server before it could be tested.
The ASP environment is fully extensible because it not only ships with several built-in components, but can use several third party components available freely and commercially.
ASP protects proprietary business logic and algorithms. Because ASP files run on the server and not on the client, it is possible to hide proprietary business rules and logic from clients. This feature is becoming important in three-tier and multi-tier applications where there is an even greater division of labour than in the two-tier, client-server architecture.
ASP provides ways to overcome the inherent limitations of the HTTP protocol, viz., as a stateless protocol. The Application and Session Objects supported by ASP permit the capture of session information to maintain a degree of continuity in the interactions within a given user session.
With the introduction of Visual InterDev (VID) by Microsoft in 1998, development of web database applications with Microsoft technologies took a major step forward. VID is a Rapid Application Development tool in the same genre as Visual Basic, Delphi and the like and permits high throughput in development, testing, debugging and deploying web applications.
Cold Fusion, a product developed by Allaire Corpn., since its release, has become a popular product for web database applications. Cold Fusion is based on a markup language on the server that exposes powerful functionality to the web developer. However, it does not provide the degree of freedom that is offered by ASP in mixing and matching components using other languages such as Visual Basic, C++, Java, Delphi and so on. Third party tools are now available for converting Cold Fusion created files to ASP files.
IntraBuilder is another popular web database application development tool from Borland (Now Inprise Corpn). This is another product that uses the well-known Integrated Development Environment that was pioneered by Borland, making it a very developer-friendly tool.
Visual Café Pro for Java (Database Edition) is a product from Symantec for web database development. Here, the emphasis is on using Java as the programming language and the Java Database Connectivity (JDBC) tools for web database applications.
File-Maker Pro, a low-cost user-friendly product which made its appearance only two or so years ago enables relational databases to be web-enabled without a web server. Once the software is installed and a database is defined, it can be published on an Intranet or Internet. In effect, the software has a built-in web server. This is considered to be an ideal product for small applications.
When applications increase in size and complexity in terms of number of records, size of records, complexity of business logic and frequency of changes in business logic, they must be scaled to be more robust and efficient to use and maintain. Problems of this kind are addressed by the concept of N-tier architecture as opposed to the two-tier client-server architecture. In this, the application is partitioned into at least 3 tiers with the business logic being hosted in a Middle tier (also called Middleware or Application Server) between the User Services and Data Services tiers. The User services tier takes care of the presentation of data at the client, the Middleware houses proprietary business logic and algorithms, and the data services tier provides the database logic and services. The advantage of this architecture is that when business logic changes, only the middle tier needs to change. This makes for more efficient development and maintenance effort and ensures that business logic is protected. On the other hand, it increases the cost of development and investments needed in skills.
Some web-based database applications require sophisticated statistical analysis tools. Commonly available database softwares (Access, SQL Server) are not in a position to provide more than elementary statistical functions. In such a situation, it may become necessary to introduce a statistics-rich system along with a database software.
However, higher-end database solution providers such as Oracle, Sybase and Informix provide the so-called data warehousing solutions, which include not only RDBMS functionality but also data mining and decision support capabilities. Data mining is a descriptive statistical analysis tool for discovering patterns and trends in the data while decision support is an inferential tool for supporting decisions based on data patterns and trends.
An irksome problem faced by most web application developers is the lack of standards among browser manufacturers. Capabilities vary widely across browsers and their versions.
Web developers have to either code for the lowest common denominator which means that they cannot take advantage of the features of later versions of browsers, or they have to build their applications to first detect the users' browser and then present content to suit a given browser. This adds an overhead to the development and maintenance effort.
* Paper presented at ITT 99: Towards Information Content for Global Competitiveness, organized by NISSAT, and held during November 16-19, 1999 at IICT, Hyderabad.
Information Today & Tomorrow, Vol. 19, No. 1, March 2000, p.7-p.10