CGI Scripts: Indian Institute of Technology Kharagpur


Indian Institute of Technology Kharagpur

CGI Scripts

Prof. Indranil Sen Gupta


Dept. of Computer Science & Engg.
I.I.T. Kharagpur, INDIA

Lecture 19: CGI Scripts


On completion, the student will be able to:
• Explain the basic structure of CGI scripts and how they work.
• Explain the different ways in which form data can be passed to a CGI script.
• Illustrate the URL encoding/decoding issues.
• State the standard environment variables used in CGI scripts.
• Explain how the response is sent back to the web client after the CGI script finishes execution.

Introduction

• CGI stands for Common Gateway Interface.
  - Allows interactive web pages to be written.
    - Pages are created dynamically, based on the user's request.
  - CGI programs are called “scripts” because the first CGI programs were written as UNIX shell scripts and in Perl.
    - They can be written in almost any language.
  - They usually reside in a special directory on the web server (typically “cgi-bin”).

• Apache Directory Structure: a case study
  - cgi-bin
    - Most of the interactive programs reside here. They may be written in Perl, Java, or any other programming language.
  - conf
    - Contains the configuration files.
  - htdocs
    - Contains the actual HTML documents, and will typically have many subdirectories. This directory is known as the DocumentRoot.
  - icons
    - Contains the icons that Apache uses when displaying information or error messages.
  - images
    - Contains the image files used in the web site.
  - logs
    - Contains the log files: access_log and error_log.

Structure of CGI Script

• When a CGI script is invoked by the server, the server passes information to the script in one of two ways:
  a) GET
  b) POST
• The request method used is passed to the script via the environment variable REQUEST_METHOD.

“GET” Request Method

• The GET method sends request information as parameters appended to the end of the URL, after a ‘?’:
  http://myserver.edu/cgi-bin/myprog.pl?name=niloy&rollno=7312&age=24
• The parameters are passed to the CGI program via the environment variable QUERY_STRING.
  - For the above example, QUERY_STRING will contain
    name=niloy&rollno=7312&age=24

“POST” Request Method

• The data is passed from the server to the CGI script through STDIN.
• The environment variable CONTENT_LENGTH indicates the size in bytes of the incoming data.
• The format of the POST-ed data is:
  var1=value1&var2=value2&…
• The script must examine the REQUEST_METHOD environment variable to know whether or not to read from STDIN.

To Summarize

• For GET
  - Data are read from the QUERY_STRING environment variable.
• For POST
  - Data are read from STDIN.
  - The number of bytes to be read is obtained from CONTENT_LENGTH.
• In both cases the data are available in the same format:
  var1=value1&var2=value2&…
  For example: name=niloy&rollno=7312&age=24
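
The two cases summarized above can be sketched as a small shell function. This is only an illustration under stated assumptions: the function name read_form_data is made up for this example, and the data it prints is still URL-encoded.

```sh
# Sketch: print the raw (still URL-encoded) form data of the current request.
# read_form_data is a hypothetical helper name, not part of any CGI library.
read_form_data() {
    case "${REQUEST_METHOD:-}" in
        GET)
            # GET: the data arrives in the QUERY_STRING environment variable.
            printf '%s' "${QUERY_STRING:-}"
            ;;
        POST)
            # POST: read exactly CONTENT_LENGTH bytes from standard input.
            if [ "${CONTENT_LENGTH:-0}" -gt 0 ]; then
                dd bs=1 count="$CONTENT_LENGTH" 2>/dev/null
            fi
            ;;
        *)
            echo "unsupported request method: ${REQUEST_METHOD:-none}" >&2
            return 1
            ;;
    esac
}
```

Reading one byte at a time with dd is slow but stops after exactly CONTENT_LENGTH bytes, which is the point being illustrated.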

URL Encoding

• For platform independence, all data passed to the server are URL-encoded.
  - Variables are separated by ‘&’.
  - Special characters (including ‘&’) are escaped as ‘%’ followed by a 2-digit hex number, e.g.
    %25 → ‘%’
    %20 → ‘ ’
  - A ‘+’ sign is interpreted as a space character.

• The process of decoding back:
  - Separate out the variables.
  - Replace all ‘+’ signs by spaces.
  - Replace every %## with the corresponding ASCII character.
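
The three decoding steps can be sketched in portable shell as follows. The helper names urldecode and parse_form are assumptions made for this example. Note the order: the string is split on ‘&’ and ‘=’ before decoding, so that an encoded ‘&’ (%26) inside a value is not mistaken for a separator.

```sh
# urldecode and parse_form are hypothetical helper names used for illustration.

# Decode one token: '+' becomes a space, %XX becomes the character 0xXX.
urldecode() {
    in=$(printf '%s' "$1" | tr '+' ' ')
    out=''
    while [ -n "$in" ]; do
        case $in in
            %[0-9A-Fa-f][0-9A-Fa-f]*)
                hex=${in#?}                # drop the leading '%'
                hex=${hex%"${hex#??}"}     # keep only the two hex digits
                # printf %b expands \0NNN (octal). The trailing '.' keeps a
                # decoded newline from being stripped by $( ... ).
                byte=$(printf '%b.' "\\0$(printf '%o' "0x$hex")")
                out=$out${byte%.}
                in=${in#???}               # consume "%XX"
                ;;
            *)
                out=$out${in%"${in#?}"}    # copy one ordinary character
                in=${in#?}
                ;;
        esac
    done
    printf '%s' "$out"
}

# Split "var1=value1&var2=value2" and print one decoded name=value per line.
parse_form() {
    rest=$1
    while [ -n "$rest" ]; do
        pair=${rest%%'&'*}
        case $rest in
            *'&'*) rest=${rest#*'&'} ;;
            *)     rest='' ;;
        esac
        [ -n "$pair" ] || continue
        name=${pair%%=*}
        case $pair in
            *=*) value=${pair#*=} ;;
            *)   value='' ;;
        esac
        printf '%s=%s\n' "$(urldecode "$name")" "$(urldecode "$value")"
    done
}
```

For instance, parse_form 'name=niloy&rollno=7312&age=24' prints the three variables on separate lines.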

• Which characters are encoded?
  - Control characters: 0x00 through 0x1F, and 0x7F.
  - 8-bit characters: 0x80 through 0xFF.
  - Characters given special importance within URLs: ; / ? : @ & = + $ ,
  - Characters often used to delimit URLs: < > # % "
  - Characters considered unsafe as they may have special meaning for other protocols: { } | \ ^ [ ] `
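
A simplified encoder is sketched below. It is deliberately stricter than the list above: it assumes ASCII input and escapes every character except letters, digits and - . _ ~, writing a space as ‘+’. The name urlencode is hypothetical.

```sh
# urlencode is a hypothetical helper; it assumes ASCII input.
urlencode() {
    in=$1
    out=''
    while [ -n "$in" ]; do
        c=${in%"${in#?}"}     # first character of the remaining input
        in=${in#?}
        case $c in
            [A-Za-z0-9._~-]) out=$out$c ;;      # safe: copy unchanged
            ' ')             out=$out+ ;;       # space is written as '+'
            *)
                # "'$c" makes printf use the character's numeric code,
                # which is then printed as % plus two hex digits.
                out=$out$(printf '%%%02X' "'$c")
                ;;
        esac
    done
    printf '%s\n' "$out"
}
```

With this sketch, urlencode 'a b&c=d%' prints a+b%26c%3Dd%25.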

• A point to note:
  - When the server passes data using the POST method, the script checks the environment variable CONTENT_TYPE.
  - If the value of CONTENT_TYPE is
    application/x-www-form-urlencoded
    the data needs to be decoded before use.

Basic Structure of CGI Script

• Step 1: Initialization
  - Check REQUEST_METHOD.
  - Parse the input string and extract the variables, depending on “GET” or “POST”.
  - Check CONTENT_TYPE to find out whether the string is URL-encoded.
• Step 2: Processing
  - Process the input data.
  - Output the results (MIME-type header, and the contents).
• Step 3: Termination
  - Release the system resources.
  - Terminate the program.
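
The three steps might be laid out as in the following skeleton. It is a sketch, not a complete script: cgi_main is a hypothetical name, the "processing" merely reports how much data arrived, and a real script would call cgi_main on its last line.

```sh
# cgi_main is a hypothetical name for the body of a CGI script.
cgi_main() {
    # Step 1: Initialization. Check the method and fetch the raw input.
    case "${REQUEST_METHOD:-}" in
        GET)  raw=${QUERY_STRING:-} ;;
        POST) raw=$(dd bs=1 count="${CONTENT_LENGTH:-0}" 2>/dev/null) ;;
        *)    raw='' ;;
    esac
    # GET data is always URL-encoded; for POST, CONTENT_TYPE decides.
    encoded=no
    case "${REQUEST_METHOD:-}:${CONTENT_TYPE:-}" in
        GET:*|POST:application/x-www-form-urlencoded*) encoded=yes ;;
    esac

    # Step 2: Processing. Emit the header, a blank line, then the contents.
    printf 'Content-Type: text/plain\n\n'
    printf 'Received %d characters of form data (URL-encoded: %s)\n' \
        "${#raw}" "$encoded"

    # Step 3: Termination. Nothing was allocated here; temporary files or
    # locks, if any, would be removed at this point.
    return 0
}
```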

Environment Variables Used

• CONTENT_LENGTH
  - Length of the URL-encoded data in bytes.
• CONTENT_TYPE
  - Specifies the MIME type of the data.
• QUERY_STRING
  - Information at the end of the URL after ‘?’.
• REMOTE_ADDR
  - IP address of the client making the request.
• REMOTE_HOST
  - Resolved host name of the client.
• REQUEST_METHOD
  - “GET” or “POST”.
• SERVER_NAME
  - Web server’s host name, or IP address.
• SERVER_PROTOCOL
  - For example, HTTP/1.0
• SERVER_PORT
  - Port number on the server that received the HTTP request.
• SCRIPT_NAME
  - Name of the CGI script being run.
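
A quick way to see these variables is a script that prints them back to the browser. The sketch below assumes nothing beyond the variable names listed above; show_cgi_env is a hypothetical helper name, and variables the server did not set are printed as empty.

```sh
# show_cgi_env is a hypothetical helper name.
show_cgi_env() {
    printf 'Content-Type: text/plain\n\n'
    for var in CONTENT_LENGTH CONTENT_TYPE QUERY_STRING REMOTE_ADDR \
               REMOTE_HOST REQUEST_METHOD SERVER_NAME SERVER_PROTOCOL \
               SERVER_PORT SCRIPT_NAME; do
        # eval performs the indirect lookup. This is safe only because $var
        # takes the fixed names listed above, never user input.
        eval "value=\${$var-}"
        printf '%s=%s\n' "$var" "$value"
    done
}
```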

Response Header

• The most common response header is Content-Type, which is based on MIME types.
• Typical values are:

  Content-Type: text/plain
  Content-Type: text/html
  Content-Type: image/gif
  Content-Type: video/avi

• A complete MIME header looks like this:

  Content-Type: text/plain; charset=US-ASCII
  Content-Transfer-Encoding: 7bit
  Content-Description: Postscript
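
In a script, the header is simply the first thing written to standard output, followed by a blank line that separates it from the contents. A minimal sketch, with send_response as a hypothetical helper name:

```sh
# send_response is a hypothetical helper: $1 is the MIME type, $2 the body.
send_response() {
    printf 'Content-Type: %s\n' "$1"   # header line
    printf '\n'                        # blank line ends the header section
    printf '%s\n' "$2"                 # contents
}
```

For example, send_response text/html '<P>Hello</P>' emits an HTML response.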

CGI Real-life Examples

• Search Engine
• Page-hit Counter
• Student Registration
• On-line Booking of Tickets
• On-line Purchase of Items
• E-mail Gateways
• Feedback Scripts
• Web-based Games

Security with CGI Scripts

• A CGI script is a program that anyone in the world can run on your machine.
• Do not trust the user input.
  - In particular, do not put user data in a shell command without verifying the data carefully.
  - An example follows.

• An example
  - Suppose that you have a CGI script that lets users run the “finger” command on your host.
  - In Perl, there can be a line:
    system "finger $username";
  - A malicious user may enter
    isg; rm -r /
    as the username.
  - The result: the shell runs “rm -r /” as a second command after finger, which attempts to delete every file the web server's account is allowed to remove.

[Figure: a form with an “Enter UserId” field in which the user has typed: isg; rm -r /]
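
One way to verify the data before use is to accept only user names made of known-safe characters and to pass the value as a single quoted argument. The sketch below shows the idea in shell rather than Perl; safe_finger is a hypothetical wrapper, and the allowed character set is an assumption that a real site would adjust.

```sh
# safe_finger is a hypothetical wrapper around the finger command.
# Assumed policy: letters, digits, '_', '.' and '-' only, no leading '-'.
safe_finger() {
    username=$1
    case $username in
        ''|-*|*[!A-Za-z0-9_.-]*)
            echo "invalid user name" >&2
            return 1
            ;;
    esac
    # The quotes keep the value a single argument, so the shell never
    # interprets characters such as ';' inside it.
    finger "$username"
}
```

With this check, the input isg; rm -r / is rejected because it contains a space, a semicolon and a slash.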

An Example CGI Program

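• Using a shell script:

#!/bin/sh
# Send the sorted contents of the file named in $1 back to the browser.
CAT=/bin/cat
echo "Content-type: text/plain"
echo ""                          # blank line ends the header
if [ -x "$CAT" ]                 # is cat present and executable?
then
    "$CAT" "$1" | sort
else
    echo "Cannot find command on this system."
fi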

• What does this program do?
  - Sends the contents of a file residing on the server, sorted line by line, back to the browser.
• How to invoke?
  <A HREF="/cgi-bin/test1.sh?/home/user1/public_html/text-file.txt">Click here to activate</A>
  - The file name after the ‘?’ reaches the script as its first argument, $1.

Another Example

#!/bin/sh
echo "Content-type: text/html"
echo ""

/bin/cat << EOM
<HTML>
<HEAD>
<TITLE>File Output: /home/user1/public_html/text-file.txt
</TITLE>
</HEAD>
<BODY bgcolor="#cccccc" text="#000000">
<HR SIZE=5>
<H1>File Output: /home/user1/public_html/text-file.txt </H1>
<HR SIZE=5> <P>
<SMALL>
<PRE>
EOM

/bin/cat /home/user1/public_html/text-file.txt

/bin/cat << EOM
</PRE>
</SMALL> <P>
</BODY>
</HTML>
EOM

• What does this program do?
  - Outputs the contents of the file “text-file.txt” as an HTML page.
• How to invoke?
  - Through a dummy HTML form.
  - Through the following link:
    <A HREF="/cgi-bin/test2.sh">Click here</A>

E-mail Gateways: an Example

• E-mail gateways are very popular on the web.
• They allow users to send and receive mail without having to worry about managing a mail server.
• They can be designed using CGI scripts, or any other similar technology.
• Popular e-mail gateways: Yahoo, Rediffmail, Hotmail, Gmail, etc.

[Figure: block diagram of the Browser, the Email Gateway and the Mail Server, with the Email Gateway between the other two]

Writing CGI Scripts using Perl

• To be discussed later, after covering the syntax and semantics of Perl.
  - We will see how the form data can be extracted and processed.
    - This requires string manipulation.

SOLUTIONS TO QUIZ QUESTIONS ON LECTURE 18

Quiz Solutions on Lecture 18

1. What is a hot spot?
   A hot spot is a defined region on an image map which, when clicked, hyperlinks to a specified URL.
2. What is the essential difference between client-side and server-side image maps?
   In a server-side image map, the mouse click is processed on the server. In a client-side image map, all the information is in the HTML file, so the click can be processed locally by the browser.

3. What information does the image map configuration file contain?
   The default URL, an optional base URL, and the geometries of the hot spots.
4. What is the purpose of the default URL in case of a server-side image map?
   It specifies the URL to which the user will be taken on clicking a region which is not a hot spot.

5. Why is a client-side image map faster, and why does it put less load on the server?
   Because all processing is done locally in the browser.
6. Why is the ISMAP attribute used?
   To indicate that the included image is a clickable (server-side) map.
7. Why is the USEMAP attribute used?
   To link an image to its client-side image map.

8. Show a client-side image map configuration specification where there are four triangular shaped areas joined together to form a square shaped structure.

   [Figure: a square with corners (0,0), (50,0), (0,50) and (50,50), divided into four triangles labelled TOP, LEFT, RIGHT and BOTTOM that meet at the centre (25,25)]

<MAP NAME="demo_map">
<AREA SHAPE=POLY COORDS="0,0,0,50,25,25" HREF="left.html">
<AREA SHAPE=POLY COORDS="50,0,50,50,25,25" HREF="right.html">
<AREA SHAPE=POLY COORDS="0,0,50,0,25,25" HREF="top.html">
<AREA SHAPE=POLY COORDS="0,50,50,50,25,25" HREF="bottom.html">
</MAP>

QUIZ QUESTIONS ON LECTURE 19

Quiz Questions on Lecture 19

1. What does the REQUEST_METHOD environment variable specify?
2. How does the form data get accessed in GET, and in what form?
3. How does the form data get accessed in POST?
4. Why is the POST method more desirable as compared to GET in general?
5. Perform URL encoding on the following string:
   http://xyz.com?name=Subir Das

6. How does the CGI script know that the form data as received has been URL encoded?
7. What is the function of the UNIX command “finger”?
8. Write a CGI program using shell script which will send back the message “THANK YOU FOR SUBMITTING” every time a form is submitted to it.
