AI all notes
BTCOC701
Submitted by: Prof. S. B. Mehta (Asst. Prof., NCER, BATU University, Lonere)
Department of Computer Science & Engineering
Nutan College of Engineering & Research, Talegaon Dabhade, Pune-410507
Artificial Intelligence
Unit 1: Introduction of AI
Artificial Intelligence:
In today's world, technology is growing very fast, and we come in touch with new technologies day by day. One of the booming technologies of computer science is Artificial Intelligence, which is ready to create a new revolution in the world by making intelligent machines. Artificial Intelligence is now all around us. It is currently working across a variety of subfields, ranging from general to specific, such as self-driving cars, playing chess, proving theorems, playing music, painting, etc.
One of the greatest innovators in the field was John McCarthy, widely recognized as the "Father of Artificial Intelligence".
In the mid-1950s, McCarthy coined the term "Artificial Intelligence" and defined it as "the science of making intelligent machines".
According to the father of Artificial Intelligence, John McCarthy, it is “The science and
engineering of making intelligent machines, especially intelligent computer programs”.
Artificial Intelligence is a way of making a computer, a computer-controlled robot, or a piece of software think intelligently, in a manner similar to how intelligent humans think.
AI is accomplished by studying how the human brain thinks, and how humans learn, decide, and work while trying to solve a problem, and then using the outcomes of this study as a basis for developing intelligent software and systems.
Artificial Intelligence suggests that machines can mimic humans in:
Talking
Thinking
Learning
Planning
Understanding
Artificial Intelligence is composed of two words, Artificial and Intelligence, where Artificial means "man-made" and Intelligence means "thinking power"; hence AI means "a man-made thinking power."
It is a branch of computer science by which we can create intelligent machines which can behave like humans, think like humans, and are able to make decisions.
With the help of AI, you can create software or devices which can solve real-world problems easily and accurately, such as health issues, marketing, traffic issues, etc.
With the help of AI, you can create your own personal virtual assistant, such as Cortana, Google Assistant, Siri, etc.
With the help of AI, you can build robots which can work in environments where human survival is at risk.
AI opens a path for other new technologies, new devices, and new opportunities.
Types of Artificial Intelligence (based on functionality):
1. Reactive Machines
Purely reactive machines are the most basic type of Artificial Intelligence.
This type of machine reacts based on the present moment only.
Such AI systems do not store memories or past experiences for future actions.
These machines focus only on current scenarios and react to them with the best possible action.
IBM's Deep Blue system is an example of reactive machines.
Google's AlphaGo is also an example of reactive machines
Example: a chess-playing machine. This type of machine is great at playing games and has the ability to recognize which move would be best.
2. Limited Memory
• Limited memory machines can store past experiences or some data for a short period of time.
• These machines can use stored data for a limited time period only.
• Self-driving cars are one of the best examples of Limited Memory systems. These cars can store the recent speed of nearby cars, the distance of other cars, speed limits, and other information to navigate the road. Other examples: traffic lights, chatbots.
3. Theory of Mind
Theory of Mind AI should understand human emotions, people, and beliefs, and be able to interact socially like humans.
This type of AI machine has not been developed yet, but researchers are making lots of effort and progress toward developing such machines.
This type of AI machine is more intelligent, as it reacts according to its understanding of people's thoughts and emotions; such machines can adjust to the people around them, build social interactions, predict how people expect to be treated, and behave according to these expectations.
Example: the robot Sophia. Cameras in Sophia's eyes, combined with computer algorithms, allow her to see, sustain eye contact, recognize individuals, and follow faces.
4. Self-Awareness
Self-aware AI is the future of Artificial Intelligence. These machines will be super intelligent and will have their own consciousness, sentiments, and self-awareness.
These machines will be smarter than the human mind, and more aware of their own needs and internal states than human beings.
Self-aware AI does not exist in reality yet; it is a hypothetical concept. Such systems would understand their internal traits, states, and conditions and perceive human emotions.
This type of AI will not only be able to understand and evoke emotions in those it interacts with, but will also have emotions, needs, and beliefs of its own.
History of Artificial Intelligence
Maturation of Artificial Intelligence (1943-1952)
o Year 1943: The first work which is now recognized as AI was done by Warren McCulloch
and Walter Pitts in 1943. They proposed a model of artificial neurons.
o Year 1949: Donald Hebb demonstrated an updating rule for modifying the connection
strength between neurons. His rule is now called Hebbian learning.
o Year 1950: Alan Turing, an English mathematician who pioneered machine learning, published "Computing Machinery and Intelligence" in 1950, in which he proposed a test. The test can check a machine's ability to exhibit intelligent behavior equivalent to human intelligence, and is now called the Turing test.
o Year 1955: Allen Newell and Herbert A. Simon created the "first artificial intelligence program", named the "Logic Theorist". This program proved 38 of 52 mathematical theorems, and found new and more elegant proofs for some of them.
o Year 1956: The term "Artificial Intelligence" was first adopted by American computer scientist John McCarthy at the Dartmouth Conference. For the first time, AI was coined as an academic field.
Around that time, high-level computer languages such as FORTRAN, LISP, and COBOL were invented, and enthusiasm for AI was very high.
o Year 1966: Researchers emphasized developing algorithms which can solve mathematical problems. Joseph Weizenbaum created the first chatbot in 1966, which was named ELIZA.
o Year 1972: The first intelligent humanoid robot was built in Japan, named WABOT-1.
o The duration between 1974 and 1980 was the first AI winter. AI winter refers to a period when computer scientists dealt with a severe shortage of government funding for AI research.
o During AI winters, public interest in artificial intelligence decreased.
A boom of AI (1980-1987)
o Year 1980: After the AI winter, AI came back with "Expert Systems". Expert systems were programs that emulate the decision-making ability of a human expert.
o In the year 1980, the first national conference of the American Association of Artificial Intelligence was held at Stanford University.
o The duration between the years 1987 and 1993 was the second AI winter.
o Again, investors and governments stopped funding AI research due to high costs and inefficient results. Expert systems such as XCON had initially been very cost-effective, but became expensive to maintain.
o Year 1997: In the year 1997, IBM's Deep Blue beat world chess champion Garry Kasparov, becoming the first computer to beat a reigning world chess champion.
o Year 2002: for the first time, AI entered the home in the form of Roomba, a vacuum
cleaner.
o Year 2006: By the year 2006, AI had entered the business world. Companies like Facebook, Twitter, and Netflix also started using AI.
o Year 2011: In the year 2011, IBM's Watson won Jeopardy!, a quiz show where it had to solve complex questions as well as riddles. Watson proved that it could understand natural language and solve tricky questions quickly.
o Year 2012: Google launched an Android app feature, "Google Now", which was able to provide information to the user as predictions.
o Year 2014: In the year 2014, the chatbot "Eugene Goostman" won a competition in the famous "Turing test."
o Year 2018: The "Project Debater" from IBM debated complex topics with two master debaters and performed extremely well.
o Google demonstrated an AI program, "Duplex", a virtual assistant which booked a hairdresser appointment over a phone call, and the lady on the other side didn't notice that she was talking with a machine.
Application of AI
Artificial Intelligence has various applications in today's society. It is becoming essential for our times because it can solve complex problems in an efficient way across multiple industries, such as healthcare, entertainment, finance, education, etc. AI is making our daily life more comfortable and fast. Following are some sectors which have applications of Artificial Intelligence:
1. AI in Astronomy
Artificial Intelligence can be very useful for solving complex problems about the universe. AI technology can be helpful for understanding the universe, such as how it works, its origin, etc.
2. AI in Healthcare
In the last five to ten years, AI has become more advantageous for the healthcare industry and is going to have a significant impact on it.
Healthcare industries are applying AI to make better and faster diagnoses than humans. AI can help doctors with diagnoses and can warn when a patient is worsening, so that medical help can reach the patient before hospitalization.
3. AI in Gaming
AI can be used for gaming purposes. AI machines can play strategic games like chess, where the machine needs to think about a large number of possible positions.
4. AI in Finance
AI and the finance industry are the best match for each other. The finance industry is implementing automation, chatbots, adaptive intelligence, algorithmic trading, and machine learning into financial processes.
5. AI in Data Security
The security of data is crucial for every company, and cyber-attacks are growing rapidly in the digital world. AI can be used to make your data more safe and secure. Some examples, such as the AEG bot and the AI2 platform, are used to detect software bugs and cyber-attacks in a better way.
6. AI in Social Media
Social Media sites such as Facebook, Twitter, and Snapchat contain billions of user profiles, which need to
be stored and managed in a very efficient way. AI can organize and manage massive amounts of data. AI
can analyze lots of data to identify the latest trends, hashtag, and requirement of different users.
7. AI in Travel & Transport
AI is in high demand in the travel industry. AI is capable of doing various travel-related work, from making travel arrangements to suggesting hotels, flights, and the best routes to customers. Travel industries are using AI-powered chatbots which can carry out human-like interactions with customers for better and faster responses.
8. AI in Automotive Industry
Some automotive industries are using AI to provide virtual assistants to their users for better performance. For example, Tesla has introduced TeslaBot, an intelligent virtual assistant.
Various industries are currently working on developing self-driving cars which can make your journey safer and more secure.
9. AI in Robotics:
Artificial Intelligence has a remarkable role in robotics. Usually, general robots are programmed to perform some repetitive task, but with the help of AI, we can create intelligent robots which can perform tasks based on their own experience without being pre-programmed.
Humanoid robots are the best examples of AI in robotics; recently, the intelligent humanoid robots named Erica and Sophia have been developed, which can talk and behave like humans.
10. AI in Entertainment
We are currently using some AI based applications in our daily life with some entertainment services such
as Netflix or Amazon. With the help of ML/AI algorithms, these services show the recommendations for
programs or shows.
11. AI in Agriculture
Agriculture is an area which requires various resources, labor, money, and time for best results. Nowadays agriculture is becoming digital, and AI is emerging in this field. Agriculture is applying AI for agricultural robotics, soil and crop monitoring, and predictive analysis. AI in agriculture can be very helpful for farmers.
12. AI in E-commerce
AI is providing a competitive edge to the e-commerce industry, and it is becoming more demanding in the
e-commerce business. AI is helping shoppers to discover associated products with recommended size,
color, or even brand.
13. AI in education:
AI can automate grading so that the tutor can have more time to teach. An AI chatbot can communicate with students as a teaching assistant.
In the future, AI could work as a personal virtual tutor for students, easily accessible at any time and any place.
AI Examples
Artificial Intelligence Samples:
Self Driving Cars
E-Payment
Google Maps
Text Autocorrect
Automated Translation
Chatbots
Social Media
Face Detection
Search Algorithms
Robots
Automated Investment
NLP - Natural Language Processing
Flying Drones
Dr. Watson
Apple Siri
Microsoft Cortana
Amazon Alexa
Advantages of Artificial Intelligence
High accuracy with fewer errors: AI machines or systems are prone to fewer errors and higher accuracy, as they take decisions based on prior experience or information.
High speed: AI systems can be very fast in decision-making; because of this, AI systems can beat a chess champion in the game of chess.
High reliability: AI machines are highly reliable and can perform the same action multiple times with high accuracy.
Useful for risky areas: AI machines can be helpful in situations such as defusing a bomb or exploring the ocean floor, where employing a human can be risky.
Digital assistance: AI can be very useful for providing digital assistance to users; for example, AI technology is currently used by various e-commerce websites to show products as per customer requirements.
Useful as a public utility: AI can be very useful for public utilities, such as self-driving cars which can make our journeys safer and hassle-free, facial recognition for security purposes, natural language processing to communicate with humans in human language, etc.
What is an Agent?
An agent can be anything that perceive its environment through sensors and act upon that
environment through actuators. An Agent runs in the cycle of perceiving, thinking, and acting. An
agent can be: An agent is anything that can perceive its environment through sensors and acts upon
that environment through effectors.
Human-Agent: A human agent has eyes, ears, and other organs which work as sensors, and hands, legs, and the vocal tract which work as actuators.
Robotic Agent: A robotic agent can have cameras, infrared range finders, and NLP for sensors, and various motors as actuators.
Software Agent: A software agent can have keystrokes and file contents as sensory input, act on those inputs, and display output on the screen. A software agent has encoded bit strings as its programs and actions.
Hence the world around us is full of agents, such as thermostats, cellphones, and cameras; even we ourselves are agents.
Before moving forward, we should first know about sensors, effectors, and actuators.
Sensor: Sensor is a device which detects the change in the environment and sends the
information to other electronic devices. An agent observes its environment through sensors.
Actuators: Actuators are the component of machines that converts energy into motion. The
actuators are only responsible for moving and controlling a system. An actuator can be an electric
motor, gears, rails, etc.
Effectors: Effectors are the devices which affect the environment. Effectors can be legs, wheels,
arms, fingers, wings, fins, and display screen.
Agent Terminology
Performance Measure of Agent − It is the criteria, which determines how successful an agent is.
Behavior of Agent − It is the action that agent performs after any given sequence of percepts.
Percept Sequence − It is the history of all that an agent has perceived till date.
Intelligent Agents:
An intelligent agent is an autonomous entity which acts upon an environment using sensors and actuators to achieve goals. An intelligent agent may learn from the environment to achieve its goals. A thermostat is an example of an intelligent agent.
Following are the main four rules for an AI agent:
Rule 1: An AI agent must have the ability to perceive the environment.
Rule 2: The observation must be used to make decisions.
Rule 3: Decision should result in an action.
Rule 4: The action taken by an AI agent must be a rational action.
Rational Agent:
A rational agent is an agent which has clear preferences, models uncertainty, and acts in a way that maximizes its performance measure over all possible actions.
A rational agent is said to perform the right things. AI is about creating rational agents, using game theory and decision theory for various real-world scenarios.
For an AI agent, rational action is most important because in AI reinforcement learning algorithms, the agent gets a positive reward for each best possible action and a negative reward for each wrong action.
An ideal rational agent is the one, which is capable of doing expected actions to maximize its
performance measure, on the basis of −
Its percept sequence
Its built-in knowledge base
A rational agent always performs the right action, where the right action means the action that causes the agent to be most successful for the given percept sequence. The problem the agent solves is characterized by its Performance measure, Environment, Actuators, and Sensors (PEAS).
Agent Environment in AI
An environment is everything in the world which surrounds the agent, but it is not a part of an agent
itself. An environment can be described as a situation in which an agent is present.
The environment is where the agent lives and operates, and it provides the agent with something to sense and act upon. An environment is mostly said to be non-deterministic.
Type (Features) of Environment
An environment in artificial intelligence is the surrounding of the agent. The agent takes input from the environment through sensors and delivers output to the environment through actuators. There are several types of environments:
1. Fully Observable vs Partially Observable:
If an agent's sensors can access the complete state of the environment at each point in time, then it is a fully observable environment; otherwise it is partially observable.
Examples:
Chess – the board is fully observable, so are the opponent’s moves.
Driving – the environment is partially observable because what’s around the corner is not known.
2. Deterministic vs Stochastic:
If an agent's current state and selected action can completely determine the next state of the environment, then such an environment is called a deterministic environment.
In a deterministic, fully observable environment, the agent does not need to worry about uncertainty.
A stochastic environment is random in nature; the next state is not unique and cannot be completely determined by the agent.
Examples:
Chess – there are only a few possible moves for a piece in the current state, and these moves can be determined.
Self-driving cars – the actions of a self-driving car are not unique; they vary from time to time.
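The distinction can be sketched with two toy transition functions; the states, actions, and the noise model below are invented purely for illustration:

```python
import random

# Deterministic: the next state is fully determined by (state, action).
def deterministic_step(state, action):
    return state + action               # e.g. position after a fixed move

# Stochastic: the same (state, action) pair can yield different next states,
# because the environment adds randomness the agent cannot predict.
def stochastic_step(state, action):
    noise = random.choice([-1, 0, 1])   # environment randomness (assumed model)
    return state + action + noise

assert deterministic_step(5, 2) == 7            # always the same outcome
assert stochastic_step(5, 2) in {6, 7, 8}       # one of several possible outcomes
```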
3. Episodic vs Sequential:
In an episodic environment, there is a series of one-shot actions, and only the current percept is required for
the action.
In an episodic task environment, each of the agent's actions is divided into atomic incidents or episodes. There is no dependency between the current and previous incidents. In each incident, the agent receives input from the environment and then performs the corresponding action.
Example: Consider a pick-and-place robot, which is used to detect defective parts on a conveyor belt. Every time, the robot (agent) makes a decision based on the current part, i.e. there is no dependency between the current and previous decisions.
However, in Sequential environment, an agent requires memory of past actions to determine the next best
actions.
In a sequential environment, a previous decision can affect all future decisions. The next action of the agent depends on what actions it has taken previously and what actions it is supposed to take in the future.
Example:
Checkers- Where previous move can affect all the following moves.
4. Single-agent vs Multi-agent
If only one agent is involved in an environment, and operating by itself then such an environment is called
single agent environment.
However, if multiple agents are operating in an environment, then such an environment is called a multi-agent
environment.
The agent design problems in a multi-agent environment are different from those in a single-agent environment.
A person left alone in a maze is an example of the single-agent system.
An environment involving more than one agent is a multi-agent environment.
The game of football is multi-agent as it involves 11 players in each team.
5. Static vs Dynamic:
If the environment can change itself while an agent is deliberating then such environment is called a dynamic
environment else it is called a static environment.
Static environments are easy to deal with, because an agent does not need to keep looking at the world while deciding on an action.
However, in a dynamic environment, agents need to keep looking at the world before each action.
Taxi driving is an example of a dynamic environment whereas Crossword puzzles are an example of a
static environment.
An environment that keeps constantly changing itself when the agent is up with some action is said to be
dynamic.
A roller coaster ride is dynamic as it is set in motion and the environment keeps changing every instant.
An idle environment with no change in its state is called a static environment.
An empty house is static as there’s no change in the surroundings when an agent enters.
6. Discrete vs Continuous:
If in an environment there are a finite number of percepts and actions that can be performed within it, then
such an environment is called a discrete environment else it is called continuous environment.
A chess game comes under discrete environment as there is a finite number of moves that can be performed.
A self-driving car is an example of a continuous environment.
If an environment consists of a finite number of actions that can be deliberated in the environment to obtain
the output, it is said to be a discrete environment.
The game of chess is discrete as it has only a finite number of moves. The number of moves might vary with
every game, but still, it’s finite.
An environment in which the actions performed cannot be numbered, i.e. is not discrete, is said to be continuous.
Self-driving cars are an example of continuous environments as their actions are driving, parking, etc.
which cannot be numbered
7. Known vs Unknown
Known and unknown are not actually a feature of an environment, but it is an agent's state of knowledge
to perform an action.
In a known environment, the results of all actions are known to the agent, while in an unknown environment, the agent needs to learn how it works in order to perform an action.
It is quite possible for a known environment to be partially observable, and for an unknown environment to be fully observable.
8. Accessible vs Inaccessible
If an agent can obtain complete and accurate information about the environment's state, then such an environment is called an accessible environment; otherwise it is called inaccessible.
An empty room whose state can be defined by its temperature is an example of an accessible environment.
Information about an event on earth is an example of Inaccessible environment.
The task of AI is to design an agent program which implements the agent function. The structure of an intelligent agent is a combination of architecture and agent program. It can be viewed as:
Agent = Architecture + Agent Program
Following are the main three terms involved in the structure of an AI agent:
Architecture: It is the machinery (the hardware with sensors and actuators) on which the agent executes.
Agent function: It is a map from the percept sequence to an action: f : P* → A.
Agent program: The agent program is an implementation of the agent function. An agent program executes on the physical architecture to produce the function f.
Types of AI Agents
Agents can be grouped into five classes based on their degree of perceived intelligence and capability. All these agents can improve their performance and generate better actions over time.
These are given below:
1. Simple Reflex Agents
These agents select an action on the basis of the current percept only, ignoring the rest of the percept history, using condition-action rules.
2. Model-based Reflex Agents
These agents maintain an internal state (a model of the world) to keep track of the part of the environment they cannot see at the moment.
3. Goal-based agents
The knowledge of the current state of the environment is not always sufficient for an agent to decide what to do.
The agent needs to know its goal which describes desirable situations.
Goal-based agents expand the capabilities of the model-based agent by having the
"goal" information.
They choose an action, so that they can achieve the goal.
These agents may have to consider a long sequence of possible actions before deciding whether the goal is achieved or not. Such consideration of different scenarios is called searching and planning, which makes an agent proactive.
4. Utility-based agents
These agents are similar to the goal-based agent but provide an extra component of
utility measurement which makes them different by providing a measure of success at a
given state.
Utility-based agents act based not only on goals but also on the best way to achieve the goal.
The Utility-based agent is useful when there are multiple possible alternatives, and an
agent has to choose in order to perform the best action.
The utility function maps each state to a real number to check how efficiently each
action achieves the goals.
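The utility function described above can be sketched as follows; the states, actions, and utility values here are invented purely for illustration:

```python
# Utility-based choice: the utility function maps each resulting state to
# a real number, and the agent picks the action whose outcome scores highest.

def utility(state):
    """Map a state to a real number (higher = better). Values are assumed."""
    return {"slow_route": 0.4, "fast_route": 0.9, "toll_route": 0.6}[state]

def result(action):
    """Hypothetical transition model: each action leads to one state."""
    return {"walk": "slow_route", "highway": "fast_route", "bridge": "toll_route"}[action]

def choose_action(actions):
    # pick the action whose resulting state has the highest utility
    return max(actions, key=lambda a: utility(result(a)))

best = choose_action(["walk", "highway", "bridge"])
```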
5. Learning Agents
A learning agent in AI is a type of agent which can learn from its past experiences; it has learning capabilities.
It starts by acting with basic knowledge and is then able to act and adapt automatically through learning.
A learning agent has mainly four conceptual components, which are:
o Learning element: It is responsible for making improvements by learning from the environment.
o Critic: The learning element takes feedback from the critic, which describes how well the agent is doing with respect to a fixed performance standard.
o Performance element: It is responsible for selecting external actions.
o Problem generator: This component is responsible for suggesting actions that will lead to new and informative experiences.
Hence, learning agents are able to learn, analyse performance, and look for new ways to
improve the performance.
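The four components can be sketched together in one toy agent; the learning task (adjusting a decision threshold) and all numbers below are assumptions made for illustration, not a standard algorithm:

```python
class LearningAgent:
    """Toy learning agent: learns a decision threshold (invented task)."""

    def __init__(self):
        self.threshold = 0.0            # knowledge updated by the learning element

    def performance_element(self, x):
        """Select the external action from current knowledge."""
        return "accept" if x >= self.threshold else "reject"

    def critic(self, x, correct_action):
        """Judge behaviour against a fixed performance standard."""
        return self.performance_element(x) == correct_action

    def learning_element(self, x, correct_action):
        """Improve using the critic's feedback."""
        if not self.critic(x, correct_action):
            # move the threshold so this example is now handled correctly
            self.threshold = x if correct_action == "accept" else x + 1.0

    def problem_generator(self):
        """Suggest a new, informative experience near the boundary."""
        return self.threshold + 0.5

agent = LearningAgent()
agent.learning_element(2.0, "reject")   # critic flags a mistake; threshold rises
```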
(Subject In-charge)
(Prof. S. B. Mehta)
NUTAN MAHARASHTRA VIDYA PRASARAK MANDAL’S
NUTAN COLLEGE OF ENGINEERING & RESEARCH (NCER)
Department of Computer Science & Engineering
--------------------------------------------------------------------------------------------------------------------------------------
BTCOE603(B)
Lecture 2: Problem-Solving Agents
Lecture 3: Example Problems
Submitted by: Prof. S. B. Mehta
Artificial Intelligence
Problem Solving :
• In computer science, problem-solving refers to artificial intelligence techniques, including various
techniques such as forming efficient algorithms, heuristics, and performing root cause analysis to
find desirable solutions.
• In artificial intelligence, problems can be solved by using searching algorithms, evolutionary
computations, knowledge representations, etc.
• The process of problem-solving using searching consists of the following steps.
1. Define the problem
2. Analyze the problem
3. Identification of possible solutions
4. Choosing the optimal solution
5. Implementation
In today’s fast-paced digitized world, artificial intelligence techniques are widely used to automate systems so that they use resources and time efficiently. Some of the well-known problems experienced in everyday life are games and puzzles:
• Travelling Salesman Problem
• Tower of Hanoi Problem
• Water-Jug Problem
• N-Queen Problem
• Chess
• Sudoku
• Crypt-arithmetic Problems
• Magic Squares
• Logical Puzzles and so on.
1. Goal Formulation: It is the first and simplest step in problem-solving. It organizes the steps/sequence required to formulate a target/goal which requires some action to achieve it. Today, the formulation of the goal is based on AI agents.
2. Problem Formulation: It is the most important step of problem-solving, which decides what actions should be taken to achieve the formulated goal. There are five components involved in problem formulation: the initial state, actions, the transition model, the goal test, and the path cost.
Components to formulate the associated problem:
• Initial State: It is the state from which the AI agent starts towards the specified goal.
3. Search: It identifies the best possible sequences of actions to reach the goal state from the current state. It takes a problem as input and returns a solution as output.
4. Solution: It finds the best algorithm among various algorithms, which may be proven to give the optimal solution.
5. Execution: It executes the best optimal solution found by the searching algorithm, to reach the goal state from the current state.
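The components of problem formulation can be sketched as a small class; the example problem here (counting from 0 up to a goal number by steps of +1 or +2) is invented purely to keep the sketch concrete:

```python
class CountingProblem:
    """Toy problem formulation: initial state, actions, transition model,
    goal test, and path cost, as listed above."""

    def __init__(self, goal):
        self.initial_state = 0          # where the agent starts
        self.goal = goal

    def actions(self, state):
        return [1, 2]                   # actions available in any state

    def transition(self, state, action):
        return state + action           # transition model: result of an action

    def goal_test(self, state):
        return state == self.goal       # has the goal been reached?

    def path_cost(self, path):
        return len(path) - 1            # one unit of cost per step taken

problem = CountingProblem(goal=5)
```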
Example Problems
Basically, there are two types of problem approaches:
Toy Problem: It is a concise and exact description of the problem which is used by the researchers to
compare the performance of algorithms.
Real-world Problem: These are real-world-based problems which require solutions. Unlike a toy problem, they do not depend on a concise description, but we can have a general formulation of the problem.
Note: The 8-puzzle problem is a type of sliding-block problem which is used for testing new search
algorithms in artificial intelligence.
8-queens problem: The aim of this problem is to place eight queens on a chessboard in an arrangement where no queen attacks another. A queen can attack other queens diagonally or in the same row or column.
From the following figure, we can understand the problem as well as its correct solution.
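The attack rule can be sketched as a helper function; representing queens as (row, column) pairs is an assumption made here for illustration:

```python
def attacks(q1, q2):
    """True if two queens attack each other: same row, column, or diagonal."""
    r1, c1 = q1
    r2, c2 = q2
    return (r1 == r2 or                     # same row
            c1 == c2 or                     # same column
            abs(r1 - r2) == abs(c1 - c2))   # same diagonal

def is_valid(queens):
    """True if no pair of placed queens attacks another."""
    return all(not attacks(queens[i], queens[j])
               for i in range(len(queens))
               for j in range(i + 1, len(queens)))

# One known solution, written as one column per row:
solution = [(row, col) for row, col in enumerate([0, 4, 7, 5, 2, 6, 1, 3])]
```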
o Search: Searching is a step-by-step procedure to solve a search problem in a given search space. A search problem can have three main factors:
a. Search Space: Search space represents a set of possible solutions, which a system may have.
b. Start State: It is a state from where agent begins the search.
c. Goal test: It is a function which observes the current state and returns whether the goal state is achieved or not.
o Search tree: A tree representation of a search problem is called a search tree. The root of the search tree is the root node, which corresponds to the initial state.
o Actions: It gives a description of all the actions available to the agent.
o Transition model: A description of what each action does; it can be represented as a transition model.
o Path Cost: It is a function which assigns a numeric cost to each path.
Based on the search problems we can classify the search algorithms into uninformed (Blind search) search
and informed search (Heuristic search) algorithms.
Depth-first Search
● Depth-first search is a recursive algorithm for traversing a tree or graph data structure.
● It is called the depth-first search because it starts from the root node and follows each path to its
greatest depth node before moving to the next path.
Note: Backtracking is an algorithm technique for finding all possible solutions using recursion.
Advantages:
● DFS requires very little memory, as it only needs to store a stack of the nodes on the path from the root node to the current node.
● It takes less time to reach the goal node than the BFS algorithm (if it traverses the right path).
Disadvantages:
● There is the possibility that many states keep re-occurring, and there is no guarantee of finding a solution.
● The DFS algorithm goes for deep-down searching, and it may sometimes go into an infinite loop.
Example:
In the below search tree, we have shown the flow of depth-first search, and it will follow this order:
It will start searching from root node S and traverse A, then B, then D and E. After traversing E, it will backtrack up the tree, as E has no other successors and the goal node has still not been found. After backtracking, it will traverse node C and then G, where it will terminate, as it has found the goal node.
Completeness: DFS is complete within a finite state space, as it will expand every node within a finite
search tree.
Time Complexity: The time complexity of DFS is equivalent to the number of nodes traversed by the
algorithm. It is given by:
T(n) = 1 + b + b^2 + ... + b^m = O(b^m)
where m = maximum depth of any node, and this can be much larger than d (the depth of the shallowest
solution).
Space Complexity: DFS needs to store only a single path from the root node, hence its space complexity
is O(bm).
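The DFS behaviour described above can be sketched in a few lines of Python. This is a minimal illustrative sketch, not taken from the source: the graph is assumed to be an adjacency dictionary, and the stack holds whole paths so the found path can be returned.

```python
def dfs(graph, start, goal):
    """Iterative depth-first search using an explicit stack.

    `graph` is an adjacency mapping {node: [successors]}.
    Returns the path from `start` to `goal`, or None if no path exists.
    """
    stack = [[start]]          # stack of paths, not just nodes (LIFO)
    visited = set()
    while stack:
        path = stack.pop()     # always extend the deepest (most recent) path
        node = path[-1]
        if node == goal:
            return path
        if node in visited:
            continue
        visited.add(node)
        # Push successors in reverse so the leftmost child is explored first.
        for succ in reversed(graph.get(node, [])):
            stack.append(path + [succ])
    return None

# The tree from the example: S -> A, C; A -> B; B -> D, E; C -> G
tree = {'S': ['A', 'C'], 'A': ['B'], 'B': ['D', 'E'], 'C': ['G']}
print(dfs(tree, 'S', 'G'))  # ['S', 'C', 'G']
```

Tracing the calls reproduces the visiting order from the example (S, A, B, D, E, then backtrack to C, G); the returned path is the branch on which the goal was finally found.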
Breadth-first Search:
● Breadth-first search is the most common search strategy for traversing a tree or graph. Because this
algorithm searches breadthwise, it is called breadth-first search.
● The BFS algorithm starts searching from the root node of the tree and expands all successor nodes at
the current level before moving to nodes of the next level.
● It starts from the root node, explores the neighbouring nodes first, and then moves towards the
next-level neighbours. It can be implemented using a FIFO queue data structure. This method provides
the shortest path (in number of steps) to the solution.
Advantages:
● If there is more than one solution for a given problem, then BFS will provide the minimal solution,
i.e. the one requiring the least number of steps.
Disadvantages:
● It requires lots of memory since each level of the tree must be saved into memory to expand the next
level.
● BFS needs lots of time if the solution is far away from the root node.
Example:
In the below tree structure, we show the traversal of the tree using the BFS algorithm from root node
S to goal node K. BFS traverses in layers, so it will follow the path shown by the dotted arrow, and the
traversed path will be:
1. S---> A--->B---->C--->D---->G--->H--->E---->F---->I---->K
Time Complexity: The time complexity of BFS is O(b^d), where b is the branching factor and d is the
depth of the shallowest solution.
Space Complexity: The space complexity of BFS is given by the memory size of the frontier, which is
O(b^d).
Completeness: BFS is complete, which means if the shallowest goal node is at some finite depth, then BFS
will find a solution.
Optimality: BFS is optimal if path cost is a non-decreasing function of the depth of the node.
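A minimal BFS sketch follows, assuming the same adjacency-dictionary representation as before (names are illustrative). The FIFO queue is what guarantees the shallowest path is found first, which is why BFS is optimal when all steps cost the same.

```python
from collections import deque

def bfs(graph, start, goal):
    """Breadth-first search with a FIFO queue; returns the shallowest path."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()     # FIFO: expand the oldest (shallowest) path
        node = path[-1]
        if node == goal:
            return path
        for succ in graph.get(node, []):
            if succ not in visited:
                visited.add(succ)
                queue.append(path + [succ])
    return None

# A graph with a short and a long route to G: BFS finds the short one.
graph = {'S': ['A', 'G'], 'A': ['B'], 'B': ['G']}
print(bfs(graph, 'S', 'G'))  # ['S', 'G']
```

On this graph a depth-first exploration of the A branch would find the longer route S-A-B-G first, while BFS returns the two-node path.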
BFS vs DFS:
● Full form: BFS stands for Breadth-First Search; DFS stands for Depth-First Search.
● Data structure: A queue data structure is used for BFS traversal; a stack data structure is used for
DFS traversal.
● Backtracking: BFS does not use the backtracking concept; DFS uses backtracking to traverse all the
unvisited nodes.
● Number of edges: BFS finds the shortest path, having the minimum number of edges from the source
to the destination vertex; in DFS, a greater number of edges may be traversed from the source vertex
to the destination vertex.
● Optimality: BFS traversal is optimal for vertices which are close to the source vertex; DFS traversal
works well for graphs in which solutions are far away from the source vertex.
● Suitability for decision trees: BFS is not suitable for decision trees because it requires exploring
all the neighbouring nodes first; DFS is suitable for decision trees, since based on a decision it
explores a path fully and stops its traversal when the goal is found.
Heuristic function:
A heuristic is a function used in informed search to find the most promising path. It takes the current
state of the agent as input and produces an estimate of how close the agent is to the goal.
The heuristic method might not always give the best solution, but it is guaranteed to find a good
solution in reasonable time.
A heuristic function estimates how close a state is to the goal. It is represented by h(n), and it
estimates the cost of an optimal path from the state to the goal. The value of the heuristic function is
always non-negative. An admissible heuristic satisfies:
h(n) <= h*(n)
Here h(n) is the heuristic (estimated) cost, and h*(n) is the actual cost of the optimal path to the
goal. Hence the heuristic cost should be less than or equal to the actual cost.
In informed search we will discuss two main algorithms: greedy best-first search and A* search.
Greedy Best-first Search
Greedy best-first search always selects the path which appears best at that moment.
It combines aspects of the depth-first and breadth-first search algorithms: it uses a heuristic function
to guide the search, which allows it to take advantage of both algorithms.
With the help of best-first search, at each step we can choose the most promising node. In the best-first
search algorithm, we expand the node which is closest to the goal node, where closeness is estimated by
the heuristic function h(n).
Advantages:
o Best-first search can switch between BFS-like and DFS-like behaviour, gaining the advantages of both
algorithms.
o This algorithm can be more efficient than the BFS and DFS algorithms.
Disadvantages:
o In the worst case it can behave like an unguided depth-first search.
o It can get stuck in a loop, as DFS can.
o This algorithm is not optimal.
Example:
Consider the below search problem, which we will traverse using greedy best-first search. At each
iteration, the node to expand is chosen using the evaluation function f(n) = h(n), whose values are given
in the table below.
In this search example, we use two lists, OPEN and CLOSED. Following are the iterations for traversing
the above example.
Time Complexity: The worst-case time complexity of greedy best-first search is O(b^m).
Space Complexity: The worst-case space complexity of greedy best-first search is O(b^m), where m is the
maximum depth of the search space.
Complete: Greedy best-first search is incomplete, even if the given state space is finite.
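The OPEN/CLOSED bookkeeping above can be sketched with a priority queue ordered by h(n). The graph and heuristic values below are illustrative assumptions, not the example from the text.

```python
import heapq

def greedy_best_first(graph, h, start, goal):
    """Greedy best-first search: always expand the node with the smallest h(n).

    `graph` maps node -> list of successors; `h` maps node -> heuristic value.
    The heap plays the role of the OPEN list, `closed` of the CLOSED list.
    """
    open_list = [(h[start], start, [start])]
    closed = set()
    while open_list:
        _, node, path = heapq.heappop(open_list)   # most promising node first
        if node == goal:
            return path
        if node in closed:
            continue
        closed.add(node)
        for succ in graph.get(node, []):
            if succ not in closed:
                heapq.heappush(open_list, (h[succ], succ, path + [succ]))
    return None

# Hypothetical example: lower h looks closer to the goal G.
graph = {'S': ['A', 'B'], 'A': ['C'], 'B': ['D'], 'C': ['G'], 'D': ['G']}
h = {'S': 5, 'A': 3, 'B': 4, 'C': 1, 'D': 2, 'G': 0}
print(greedy_best_first(graph, h, 'S', 'G'))  # ['S', 'A', 'C', 'G']
```

Note that only h(n) guides the choice; the actual path cost g(n) is ignored, which is exactly why greedy best-first search is neither complete nor optimal in general.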
A* Search Algorithm:
A* search expands, at each point in the search space, only the node with the lowest value of the
evaluation function f(n) = g(n) + h(n), where g(n) is the cost to reach the node and h(n) is the
estimated cost from the node to the goal. The algorithm terminates when the goal node is found.
Advantages:
o A* search is complete and, with an admissible heuristic, optimal.
o It is one of the best general-purpose search algorithms and can solve very complex problems.
Disadvantages:
o It does not always produce the shortest path, as it relies heavily on heuristics and approximation.
o The A* search algorithm has some complexity issues.
o The main drawback of A* is its memory requirement: it keeps all generated nodes in memory, so it is
not practical for many large-scale problems.
Example:
In this example, we will traverse the given graph using the A* algorithm. The heuristic value of each
state is given in the table below, so we will calculate f(n) for each state using the formula
f(n) = g(n) + h(n), where g(n) is the cost to reach the node from the start state.
Iteration3: {(S--> A-->C--->G, 6), (S--> A-->C--->D, 11), (S--> A-->B, 7), (S-->G, 10)}
Iteration 4 gives the final result: S--->A--->C--->G, which is the optimal path, with cost 6.
Points to remember:
Admissible: The first condition required for optimality is that h(n) should be an admissible heuristic
for A* tree search. An admissible heuristic never overestimates the true cost, i.e. it is optimistic in
nature.
Consistency: The second condition, required only for A* graph search, is consistency.
If the heuristic function is admissible, then A* tree search will always find the least-cost path.
Time Complexity: The time complexity of the A* search algorithm depends on the heuristic function; in
the worst case the number of nodes expanded is exponential in the depth of the solution d, so the time
complexity is O(b^d), where b is the branching factor.
Space Complexity: The space complexity of the A* search algorithm is likewise O(b^d), since all
generated nodes are kept in memory.
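The f(n) = g(n) + h(n) bookkeeping can be sketched as follows. The edge costs and heuristic values are assumptions chosen to be consistent with the iterations shown in the text (optimal path S→A→C→G with cost 6); they are not guaranteed to be the exact figures from the original diagram.

```python
import heapq

def a_star(graph, h, start, goal):
    """A* search: expand the node with the lowest f(n) = g(n) + h(n).

    `graph` maps node -> {successor: step_cost}; `h` is the heuristic table.
    Returns (path, cost) or (None, inf) if the goal is unreachable.
    """
    open_list = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)
        if node == goal:
            return path, g
        for succ, cost in graph.get(node, {}).items():
            g2 = g + cost
            if g2 < best_g.get(succ, float('inf')):  # found a cheaper route
                best_g[succ] = g2
                heapq.heappush(open_list,
                               (g2 + h[succ], g2, succ, path + [succ]))
    return None, float('inf')

# Assumed costs/heuristics matching the worked iterations in the text.
graph = {'S': {'A': 1, 'G': 10}, 'A': {'B': 2, 'C': 1},
         'B': {'D': 5}, 'C': {'D': 3, 'G': 4}}
h = {'S': 5, 'A': 3, 'B': 4, 'C': 2, 'D': 6, 'G': 0}
print(a_star(graph, h, 'S', 'G'))  # (['S', 'A', 'C', 'G'], 6)
```

Unlike greedy best-first search, the g(n) term makes A* prefer the cheap route through A and C over the direct but expensive edge S→G.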
Informed Search vs Uninformed Search:
● Utilizing knowledge: Informed search uses knowledge during the process of searching; uninformed
search does not use any knowledge during the process of searching.
● Speed: Informed search finds the solution more quickly; uninformed search is comparatively much
slower.
● Cost incurred: With informed search the expenses are much lower; with uninformed search the expenses
are comparatively higher.
● Suggestion/Direction: In informed search, the AI gets suggestions regarding how and where to find a
solution; in uninformed search the AI gets no suggestions about what solution to find or where to find
it, and whatever knowledge it has comes only from the information provided in the problem.
● Examples: Informed search examples include graph search and greedy search; uninformed search
examples include breadth-first search (BFS) and depth-first search (DFS).
Constraint Satisfaction Problems (CSP)
Topics: Constraint Types, Constraint Propagation, Backtracking Search, Local Search
Assignment
Each state in a CSP is defined by an assignment of values to some of the variables, {Xi=vi, Xj=vj, …}.
An assignment of values to variables can be classified in three ways:
Consistent (Legal) Assignment: An assignment which does not violate any constraint or rule.
Complete Assignment: An assignment in which every variable is assigned a value and the solution to the
CSP remains consistent.
Partial Assignment: An assignment which assigns values to only some of the variables.
The variables may use the following types of domains:
Finite (Discrete) Domain: A domain with a finite set of values for each variable, e.g. map-coloring
problems and the 8-queens problem.
Infinite Domain: A domain with infinitely many values, e.g. the set of integers or strings (as in
scheduling with time limits). With infinite domains, a constraint language must be used to describe
constraints instead of enumerating all allowed combinations of values. Continuous domains (e.g. time)
are a common special case of infinite domains.
With respect to the variables, there are basically the following types of constraints:
Unary Constraints: The simplest type of constraint, which restricts the value of a single variable.
Binary Constraints: A constraint which relates two variables (e.g. SA≠NSW). A binary CSP is one with
only binary constraints; it can be represented as a constraint graph.
Global Constraints: A constraint which involves an arbitrary number of variables (it need not involve
all the variables in a problem). One of the most common global constraints is Alldiff, which says that
all of the variables involved in the constraint must have different values.
Linear Constraints: Constraints commonly used in linear programming, where each variable appears only
in linear form.
Non-linear Constraints: Constraints used in non-linear programming, where variables appear in
non-linear form.
To formulate the map-coloring problem as a CSP:
Define the variables to be the regions: X = {WA, NT, Q, NSW, V, SA, T}.
The domain of each variable is the set Di = {red, green, blue}.
The constraints are C = {SA≠WA, SA≠NT, SA≠Q, SA≠NSW, SA≠V, WA≠NT, NT≠Q, Q≠NSW,
NSW≠V}. ( SA≠WA is a shortcut for <(SA,WA),SA≠WA>. )
Constraint graph: The nodes of the graph correspond to the variables of the problem, and a link connects
any two variables that participate in a constraint.
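The CSP formulation above translates directly into code. This is an illustrative encoding (the variable and function names are my own), together with a consistency check for partial assignments as defined earlier.

```python
# Encoding of the Australia map-coloring CSP from the text.
variables = ['WA', 'NT', 'Q', 'NSW', 'V', 'SA', 'T']
domains = {v: ['red', 'green', 'blue'] for v in variables}
constraints = [('SA', 'WA'), ('SA', 'NT'), ('SA', 'Q'), ('SA', 'NSW'),
               ('SA', 'V'), ('WA', 'NT'), ('NT', 'Q'), ('Q', 'NSW'),
               ('NSW', 'V')]      # each pair must receive different colors

def consistent(assignment):
    """A (possibly partial) assignment is consistent if no constraint
    between two already-assigned variables is violated."""
    return all(assignment[x] != assignment[y]
               for x, y in constraints
               if x in assignment and y in assignment)

print(consistent({'SA': 'blue', 'WA': 'red'}))   # True  (legal assignment)
print(consistent({'SA': 'blue', 'WA': 'blue'}))  # False (SA≠WA violated)
```

A complete assignment would give every variable in `variables` a value; the same `consistent` check then decides whether it is a solution.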
Constraint propagation is a special type of inference which helps in reducing the number of legal values
for the variables. The idea behind constraint propagation is local consistency.
If we treat each variable as a node in a graph and each binary constraint as an arc, then the process of
enforcing local consistency in each part of the graph causes inconsistent values to be eliminated
throughout the graph.
The following local consistencies are discussed below:
Node Consistency: A single variable (a node in the CSP network) is node-consistent if all the values in
the variable's domain satisfy the variable's unary constraints.
Arc Consistency: A variable is arc-consistent if every value in its domain satisfies the variable's
binary constraints. Formally, Xi is arc-consistent with respect to another variable Xj if for every
value in the current domain Di there is some value in the domain Dj that satisfies the binary constraint
on the arc (Xi, Xj). Arc consistency tightens the domains (unary constraints) using the arcs (binary
constraints).
Path Consistency: A two-variable set {Xi, Xj} is path-consistent with respect to a third variable Xm if,
for every consistent assignment {Xi=a, Xj=b}, there is a value for Xm that satisfies the binary
constraints on (Xi, Xm) and (Xm, Xj). It generalizes arc consistency.
k-consistency: This type of consistency defines stronger forms of propagation: a CSP is k-consistent if,
for any set of k-1 variables and any consistent assignment to those variables, a consistent value can
always be assigned to any k-th variable.
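Arc consistency is usually enforced with the AC-3 algorithm. The sketch below is an illustrative implementation, assuming symmetric neighbor lists and a `constraint(xi, a, xj, b)` predicate; it is not from the source text.

```python
from collections import deque

def ac3(domains, neighbors, constraint):
    """AC-3: repeatedly remove values that have no support in a neighbor's
    domain. Mutates `domains`; returns False if some domain becomes empty."""
    queue = deque((xi, xj) for xi in domains for xj in neighbors[xi])
    while queue:
        xi, xj = queue.popleft()
        # Values of Xi with no supporting value b in Dj are inconsistent.
        removed = [a for a in domains[xi]
                   if not any(constraint(xi, a, xj, b) for b in domains[xj])]
        if removed:
            domains[xi] = [a for a in domains[xi] if a not in removed]
            if not domains[xi]:
                return False                 # domain wiped out: no solution
            for xk in neighbors[xi]:         # Xi shrank, so recheck its arcs
                if xk != xj:
                    queue.append((xk, xi))
    return True

# Tiny example: X != Y, and Y's domain is already fixed to {1}.
doms = {'X': [1, 2], 'Y': [1]}
nbrs = {'X': ['Y'], 'Y': ['X']}
ac3(doms, nbrs, lambda xi, a, xj, b: a != b)
print(doms['X'])  # [2] -- the unsupported value 1 was pruned
```

In the map-coloring CSP the same routine prunes colors: once SA is fixed to blue, arc consistency removes blue from every neighbor's domain.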
CSP Problems
Constraint satisfaction covers problems which contain constraints that must be respected while solving
the problem. CSPs include the following problems:
Graph Coloring: The constraint is that no two adjacent regions (vertices) can have the same color.
Sudoku: The constraint is that no number from 1-9 can be repeated in the same row, column, or 3x3 box.
N-queens problem: The constraint is that no two queens may attack each other, i.e. no two queens may
share a row, column, or diagonal.
Crossword: The constraint is that the words must be formed correctly and be meaningful.
Example,
Here,
Green is the start point, blue is the intermediate point, red are points with no feasible solution, dark green is
end solution.
Here, the algorithm proceeds to an end point and checks whether it is a solution; if it is, it returns
the solution, otherwise it backtracks to the point one step behind and tries the next path towards a
solution.
Algorithm
Step 1 − If current_position is the goal, return success.
Step 2 − Else, if current_position is an end point, return failure.
Step 3 − Else, explore from current_position and repeat the above steps.
Let’s use this backtracking problem to find the solution to N-Queen Problem.
In the N-Queens problem, we are given an NxN chessboard and we have to place N queens on the board in
such a way that no two queens attack each other. A queen attacks another queen if the other queen lies
on the same row, column, or diagonal. Here, we will solve the 4-Queens problem.
Here, the solution is the binary board below, with 1s marking the positions where queens are placed:
{0 , 1 , 0 , 0}
{0 , 0 , 0 , 1}
{1 , 0 , 0 , 0}
{0 , 0 , 1 , 0}
Algorithm
Step 1 − Start from the 1st row.
Step 2 − Place a queen in the row and check. Do,
Step 2.1 − After placing the queen, mark the position as part of the solution and then recursively check
whether this leads to a solution.
Step 2.2 − If placing the queen doesn't lead to a solution, backtrack, go to Step 2, and try the queen
in other positions.
Step 2.3 − If placing the queen leads to a solution, return TRUE.
Step 3 − If all queens are placed, return TRUE.
Step 4 − If all positions are tried and no solution is found, return FALSE.
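The steps above can be sketched as a recursive backtracking function. This is an illustrative implementation (one queen per row, trying columns left to right); with this ordering it reproduces the 4-Queens board shown earlier.

```python
def solve_n_queens(n):
    """Backtracking N-Queens: place one queen per row, backtrack on conflict.

    Returns a board of 0/1 rows like the one shown in the text, or None.
    """
    cols = []                      # cols[r] = column of the queen in row r

    def safe(row, col):
        for r, c in enumerate(cols[:row]):
            if c == col or abs(c - col) == abs(r - row):
                return False       # same column or same diagonal
        return True

    def place(row):
        if row == n:
            return True            # all queens placed: solution found
        for col in range(n):
            if safe(row, col):
                cols.append(col)   # tentative placement (part of solution)
                if place(row + 1):
                    return True
                cols.pop()         # backtrack: undo and try the next column
        return False

    if not place(0):
        return None
    return [[1 if c == col else 0 for c in range(n)] for col in cols]

for row in solve_n_queens(4):
    print(row)
# [0, 1, 0, 0]
# [0, 0, 0, 1]
# [1, 0, 0, 0]
# [0, 0, 1, 0]
```

Each failed `safe` check corresponds to a red "no feasible solution" point in the figure; `cols.pop()` is the backtracking step.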
Elevation: It is defined by the value of the objective function or heuristic cost function.
The local search algorithm explores the above landscape by finding one of the following two points:
Global Minimum: If elevation corresponds to cost, then the task is to find the lowest valley, which is
known as the global minimum.
Global Maximum: If elevation corresponds to an objective function, then the task is to find the highest
peak, which is called the global maximum.
Simulated Annealing
Note: Local search algorithms do not need to remember all the nodes in memory; they operate on
complete-state formulations.
Global Maximum: It is the highest point on the hill, which is the goal state.
Local Maximum: It is a peak higher than its neighbouring states but lower than the global maximum.
Flat local maximum: It is a flat area of the hill where the neighbouring states have the same value,
with no uphill or downhill move available.
Note: Both simple, as well as steepest-ascent hill climbing search, fails when there is no closer node.
2. If the CURRENT node=GOAL node, return GOAL and terminate the search.
Stochastic hill climbing does not examine all the neighbours. It selects one neighbour at random and
decides whether to move to it or to examine another.
The random-restart algorithm is based on a try-and-try strategy: it runs hill climbing repeatedly from
random starting points and keeps the best result until the goal is found. Success depends largely on the
shape of the landscape: if there are few plateaus, local maxima, and ridges, it becomes easy to reach
the destination.
Local Maxima: A peak that is higher than all its neighbouring states but lower than the global maximum.
It is not the goal peak, because there is another peak higher than it.
Ridges: A sequence of local maxima of similar height, which is challenging to navigate: every available
move appears to go downhill even though a higher point exists nearby, so the search gets stuck.
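Simple hill climbing can be sketched in a few lines. The landscape below is an invented 1-D example chosen to show both failure modes discussed above: started at the left, the search gets stuck on a local maximum; started further right, it reaches the global maximum.

```python
def hill_climbing(f, neighbors, start):
    """Steepest-ascent hill climbing: move to the best neighbor until no
    neighbor improves on the current state (a local or global maximum)."""
    current = start
    while True:
        best = max(neighbors(current), key=f, default=current)
        if f(best) <= f(current):
            return current          # no uphill move left: stop here
        current = best

# Toy elevation profile: local maximum at x=2, global maximum at x=8.
landscape = [0, 1, 3, 1, 0, 2, 4, 6, 9, 5]
f = lambda x: landscape[x]
nbrs = lambda x: [i for i in (x - 1, x + 1) if 0 <= i < len(landscape)]
print(hill_climbing(f, nbrs, 0))  # 2 (stuck on the local maximum)
print(hill_climbing(f, nbrs, 5))  # 8 (reaches the global maximum)
```

Random-restart hill climbing would simply call this function from several random `start` values and keep the best result.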
Steps:
2. Then choose the number of iterations. An ML model is then built and its predictive performance
(otherwise called the objective function) is calculated.
3. A small percentage of features is randomly included/excluded from the model, just to 'perturb' the
feature set. The predictive performance is then calculated once again for this new set of features.
1. Repeated annealing with a schedule is very slow, especially if the cost function is expensive to
compute.
2. For problems where the energy landscape is smooth, or there are few local minima, SA is overkill:
simpler, faster methods (e.g., gradient descent) will work better. But we usually do not know what the
energy landscape looks like in advance.
3. Heuristic methods, which are problem-specific or take advantage of extra information about the
system, will often be better than general methods, but SA is often comparable to such heuristics.
4. The method cannot tell whether it has found an optimal solution. Some other method (e.g. branch and
bound) is required to establish this.
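The core of simulated annealing is its acceptance rule: always accept improvements, accept worse moves with probability exp(-ΔE/T), and cool T over time so such escapes become rarer. The following is a minimal sketch with an invented toy objective; the schedule parameters are illustrative, not prescriptive.

```python
import math
import random

def simulated_annealing(f, neighbor, start, t0=1.0, cooling=0.995, steps=5000):
    """Simulated annealing (minimization). Accepts worse candidates with
    probability exp(-delta / T) and geometrically decays the temperature."""
    current, t = start, t0
    best = current
    for _ in range(steps):
        candidate = neighbor(current)
        delta = f(candidate) - f(current)
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate            # accept (always if improving)
        if f(current) < f(best):
            best = current                 # remember the best state seen
        t *= cooling                       # cooling schedule
    return best

# Toy example: minimize (x - 3)^2 over the integers by +/-1 moves.
random.seed(0)
f = lambda x: (x - 3) ** 2
neighbor = lambda x: x + random.choice((-1, 1))
print(simulated_annealing(f, neighbor, start=20))
```

As T approaches zero the acceptance rule degenerates into plain hill climbing, which is why the cooling schedule (point 1 above) dominates both the runtime and the solution quality.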
This search can suffer from a lack of diversity among the k states.
Note: A variant of Local Beam Search is Stochastic Beam Search which selects k successors at random
rather than choosing the best k successors.
Adversarial Search:
• Adversarial search examines the problems which arise when we try to plan ahead in a world where other
agents are planning against us.
• AI adversarial search: Adversarial search is a game-playing technique in which the agents are
surrounded by a competitive environment.
• Conflicting goals are given to the agents (multi-agent setting). These agents compete with one
another and try to defeat one another in order to win the game. Such conflicting goals give rise to
adversarial search. Here, game playing means games in which human intelligence and logic are the
deciding factors, excluding factors such as luck. Tic-tac-toe, chess, checkers, etc., are games where
no luck factor works; only the mind works.
• Mathematically, this search is based on the concept of game theory. According to game theory, a game
is played between two players; to complete the game, one has to win and the other automatically loses.
Games
We have studied search strategies that involve only a single agent aiming to find a solution, often
expressed in the form of a sequence of actions.
But there may be situations where more than one agent is searching for a solution in the same search
space; this situation usually occurs in game playing.
Searches in which two or more players with conflicting goals try to explore the same search space for a
solution are called adversarial searches, often known as games.
Games are modelled as a search problem together with a heuristic evaluation function; these are the two
main factors which help to model and solve games in AI.
o Perfect information: A game with perfect information is one in which agents can see the complete
board. Agents have all the information about the game and can also see each other's moves. Examples are
chess, checkers, Go, tic-tac-toe, etc.
o Imperfect information: If the agents do not have all the information about the game and are not aware
of everything that is going on, it is a game with imperfect information, such as Battleship, blind
tic-tac-toe, Bridge, etc.
o Deterministic games: Deterministic games are those games which follow a strict pattern and set of
rules for the games, and there is no randomness associated with them. Examples are chess, Checkers,
Go, tic-tac-toe, etc.
o Non-deterministic games: Non-deterministic games have various unpredictable events and a factor of
chance or luck, introduced by dice or cards. These are random, and each action's outcome is not fixed.
Such games are also called stochastic games.
Example: backgammon, Monopoly, poker, etc.
In a game, each player must consider:
● what to do;
● how to decide the move;
● the opponent as well, since the opponent is also deciding what to do.
Each player tries to find out the response of the opponent to their actions. This requires embedded
thinking or backward reasoning to solve game problems in AI.
A game can be defined as a type of search in AI which can be formalized with the following elements:
The following figure is showing part of the game-tree for tic-tac-toe game. Following are some key
points of the game:
PLAYER(s): There are two players, MAX and MIN. MAX begins the game by picking one best move and placing
an X in an empty square.
RESULT(s, a): The moves made by MIN and MAX decide the outcome of the game.
TERMINAL-TEST(s): When all the empty boxes are filled (or a player has three in a row), a terminal state
of the game has been reached.
UTILITY(s): At the end, we get to know who wins, MAX or MIN, and the corresponding payoff is assigned
to them.
Example Explanation:
From the initial state, MAX has 9 possible moves, as he starts first. MAX places x and MIN places o, and
the players move alternately until we reach a leaf node where one player has three in a row or all
squares are filled.
Both players compute, for each node, the minimax value: the best achievable utility against an optimal
adversary.
Suppose both the players are well aware of the tic-tac-toe and playing the best play. Each player is
doing his best to prevent another one from winning. MIN is acting against Max in the game.
So in the game tree we have a layer for MAX and a layer for MIN, and each layer is called a ply. MAX
places x, then MIN places o to prevent MAX from winning, and the game continues until a terminal node
is reached.
In this either MIN wins, MAX wins, or it's a draw. This game-tree is the whole search space of
possibilities that MIN and MAX are playing tic-tac-toe and taking turns alternately.
It aims to find the optimal strategy for MAX to win the game.
It follows the approach of depth-first search.
In the game tree, the optimal leaf node could appear at any depth of the tree.
Minimax values are propagated up the tree from the terminal nodes once they are discovered.
In a given game tree, the optimal strategy can be determined from the minimax value of each node, which
can be written as MINIMAX(n). MAX prefers to move to a state of maximum value and MIN prefers to move
to a state of minimum value, so:
MINIMAX(s) =
  UTILITY(s)                                   if TERMINAL-TEST(s) is true
  max over actions a of MINIMAX(RESULT(s, a))  if PLAYER(s) = MAX
  min over actions a of MINIMAX(RESULT(s, a))  if PLAYER(s) = MIN
There is always a need to choose algorithms which provide the best solution in a limited time, so we use
the following techniques to fulfil this requirement:
Pruning: A technique which allows us to ignore portions of the search tree that make no difference to
the final result.
Heuristic Evaluation Function: It allows us to approximate the value of nodes at each level of the
search tree before reaching the goal node.
o In this example there are two players, one called the Maximizer and the other the Minimizer.
o The Maximizer tries to get the maximum possible score, and the Minimizer tries to get the minimum
possible score.
o This algorithm applies DFS, so in this game tree we have to go all the way down to the leaves to reach
the terminal nodes.
o At the terminal nodes the terminal values are given, so we compare those values and back the results
up the tree until the initial state is reached.
Following are the main steps involved in solving the two-player game tree:
Step 1: In the first step, the algorithm generates the entire game tree and applies the utility function
to obtain the utility values for the terminal states. In the tree diagram below, let A be the initial
state of the tree. Suppose the maximizer takes the first turn, with worst-case initial value -∞, and the
minimizer takes the next turn, with worst-case initial value +∞.
Step 3: In the next step, it is the minimizer's turn, so it will compare all node values with +∞ and
determine the values of the 3rd-layer nodes.
That was the complete workflow of the minimax two player game.
o Complete: The minimax algorithm is complete; it will definitely find a solution (if one exists) in a
finite search tree.
o Time complexity: As it performs DFS over the game tree, the time complexity of the minimax algorithm
is O(b^m), where b is the branching factor of the game tree and m is the maximum depth of the tree.
o Space complexity: The space complexity of the minimax algorithm is similar to DFS, which is O(bm).
The main drawback of the minimax algorithm is that it becomes very slow for complex games such as chess
and Go. These games have huge branching factors, giving the player many choices to consider. This
limitation of the minimax algorithm can be mitigated by alpha-beta pruning, which we discuss in the next
topic.
Alpha-Beta Pruning
o Alpha-beta pruning is a modified version of the minimax algorithm. It is an optimization technique
for the minimax algorithm.
The main condition required for alpha-beta pruning is: α >= β (when this holds, the remaining successors
of the node can be pruned).
Key points about alpha-beta pruning:
Step 1: The MAX player starts with the first move from node A, where α = -∞ and β = +∞. These values of
alpha and beta are passed down to node B, where again α = -∞ and β = +∞, and node B passes the same
values to its child D.
Step 3: The algorithm now backtracks to node B, where the value of β changes, as this is MIN's turn:
β = min(+∞, 3) = 3. Hence at node B, α = -∞ and β = 3.
In the next step, the algorithm traverses the next successor of node B, node E, and the values α = -∞
and β = 3 are passed down as well.
Step 5: The algorithm again backtracks, from node B to node A. At node A, alpha changes to the maximum
available value, 3, as max(-∞, 3) = 3, and β = +∞. These two values are now passed to the right
successor of A, node C.
At node C, α = 3 and β = +∞, and the same values are passed on to node F.
Step 6: At node F, the value of α is compared with the left child, 0: max(3, 0) = 3. It is then compared
with the right child, 1: max(3, 1) = 3, so α remains 3, but the node value of F becomes 1.
Step 8: C returns the value 1 to A, where the best value for A is max(3, 1) = 3. The final game tree
shows which nodes were computed and which were never computed (pruned). The optimal value for the
maximizer in this example is 3.
o Worst ordering: In some cases, the alpha-beta pruning algorithm does not prune any leaves of the tree
and works exactly like the minimax algorithm. It then also consumes more time, because of the alpha-beta
bookkeeping; such an ordering is called worst ordering. It occurs when the best moves lie on the right
side of the tree. The time complexity for such an ordering is O(b^m).
o Ideal ordering: The ideal ordering for alpha-beta pruning occurs when a lot of pruning happens, with
the best moves on the left side of the tree. Since we apply DFS, the left side is searched first, and in
the ideal case alpha-beta can search roughly twice as deep as minimax in the same amount of time. The
complexity under ideal ordering is O(b^(m/2)).
o Order the nodes in the tree so that the best nodes are checked first.
o Use domain knowledge when ordering moves. E.g. in chess, try captures first, then threats, then
forward moves, then backward moves.
o We can bookkeep (cache) the states, as there is a possibility that states may repeat.
Knowledge-Based Agent:
o An intelligent agent needs knowledge about the real world for taking decisions and reasoning to
act efficiently.
o Knowledge-based agents are agents that can maintain an internal state of knowledge, reason over that
knowledge, update their knowledge after observations, and take actions. These agents can represent the
world with some formal representation and act intelligently.
o A knowledge-based agent has two main components:
o Knowledge base, and
o Inference system.
A knowledge-based agent can be viewed at different levels which are given below:
1. Knowledge level
The knowledge level is the first level of a knowledge-based agent. At this level, we specify what the
agent knows and what the agent's goals are; with these specifications, we can fix its behaviour. For
example, suppose an automated taxi agent needs to go from station A to station B, and it knows the way
from A to B; this belongs to the knowledge level.
2. Logical level:
At this level, we understand how the knowledge is represented and stored: sentences are encoded into
different logics, i.e. an encoding of knowledge into logical sentences occurs. At the logical level, we
can expect the automated taxi agent to reach destination B.
3. Implementation level:
This is the physical representation of logic and knowledge. At the implementation level, the agent
performs actions as per the logical and knowledge levels. At this level, the automated taxi agent
actually implements its knowledge and logic so that it can reach the destination.
o Knowledge representation and reasoning (KR, KRR) is the part of artificial intelligence concerned with
how AI agents think and how thinking contributes to their intelligent behaviour.
What to Represent:
Following are the kinds of knowledge which need to be represented in AI systems:
o Objects: All the facts about objects in our world domain. E.g., guitars contain strings; trumpets are
brass instruments.
o Events: Events are the actions which occur in our world.
o Performance: It describes behaviour which involves knowledge about how to do things.
o Meta-knowledge: It is knowledge about what we know.
o Facts: Facts are the truths about the real world and what we represent.
o Knowledge base: The central component of a knowledge-based agent is the knowledge base, represented as
KB. The knowledge base is a group of sentences (here, "sentence" is used as a technical term; it is not
identical to a sentence in the English language).
Knowledge: Knowledge is awareness or familiarity gained by experiences of facts, data, and situations.
Following are the types of knowledge in artificial intelligence:
1. Declarative Knowledge:
2. Procedural Knowledge
3. Meta-knowledge:
4. Heuristic knowledge:
5. Structural knowledge:
There are mainly four ways of knowledge representation which are given as follows:
1. Logical Representation
2. Semantic Network Representation
3. Frame Representation
4. Production Rules
1. Logical Representation
Logical representation is a language with some concrete rules which deals with propositions and has no
ambiguity in representation. Logical representation means drawing conclusions based on various
conditions. This representation lays down some important communication rules. It consists of precisely
defined syntax and semantics:
Syntax:
o Syntax rules decide how we can construct legal sentences in the logic.
o They determine which symbols we can use in knowledge representation, and how to write those symbols.
Semantics:
o Semantics are the rules by which we can interpret a sentence in the logic.
o Semantics also involve assigning a meaning to each sentence.
Logical representation can be categorized into mainly two logics:
a. Propositional logic
b. Predicate logic
Disadvantages of logical representation:
1. Logical representations have some restrictions and are challenging to work with.
2. The logical representation technique may not be very natural, and inference may not be very efficient.
2. Semantic Network Representation
Semantic networks are an alternative to predicate logic for knowledge representation. In semantic
networks, we can represent our knowledge in the form of graphical networks. Such a network consists of
nodes representing objects and arcs which describe the relationships between those objects. Semantic
networks can categorize objects in different forms and can also link those objects. Semantic networks are
easy to understand and can be easily extended.
Statements:
a. Jerry is a cat.
b. Jerry is a mammal.
c. Jerry is owned by Priya.
d. Jerry is brown colored.
e. All mammals are animals.
In the above diagram, we have represented the different types of knowledge in the form of nodes and arcs.
Each object is connected with another object by some relation.
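As a rough sketch, the Jerry statements can be stored as (subject, relation, object) triples in Python, with a tiny query that follows "is-a" arcs to inherit category membership. The relation names and the helper function are illustrative assumptions, not part of any standard library:

```python
# A minimal semantic-network sketch: knowledge as (subject, relation, object)
# triples, with a query that walks "is-a" arcs to inherit category membership.

TRIPLES = [
    ("Jerry", "is-a", "cat"),
    ("Jerry", "is-a", "mammal"),
    ("Jerry", "owned-by", "Priya"),
    ("Jerry", "has-color", "brown"),
    ("mammal", "is-a", "animal"),
]

def is_a(node, category):
    """Return True if `node` reaches `category` by following is-a arcs."""
    if node == category:
        return True
    return any(
        is_a(obj, category)
        for subj, rel, obj in TRIPLES
        if subj == node and rel == "is-a"
    )

print(is_a("Jerry", "animal"))  # True: Jerry is-a mammal, mammal is-a animal
```

Note that answering the query requires traversing the network, which illustrates the runtime cost discussed below.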
Drawbacks in semantic representation:
1. Semantic networks take more computational time at runtime, as we need to traverse the complete
network tree to answer a question. In the worst case, we may traverse the entire tree only to find
that the solution does not exist in the network.
2. Semantic networks try to model human-like memory (which has about 10^15 neurons and links) to
store information, but in practice it is not possible to build such a vast semantic network.
3. These types of representations are inadequate as they have no equivalent quantifiers, e.g.,
for all, for some, none, etc.
4. Semantic networks do not have any standard definition for link names.
5. These networks are not intelligent and depend on the creator of the system.
3. Frame Representation
A frame is a record-like structure which consists of a collection of attributes and their values to describe an
entity in the world. Frames are an AI data structure which divides knowledge into substructures by
representing stereotyped situations. A frame consists of a collection of slots and slot values. These slots may
be of any type and size. Slots have names and values, which are called facets.
Facets: The various aspects of a slot are known as facets. Facets are features of frames which enable us
to put constraints on frames. Example: IF-NEEDED facets are called when the data of a particular slot
is needed. A frame may consist of any number of slots, a slot may include any number of facets, and a
facet may have any number of values. A frame is also known as slot-filler knowledge representation in
artificial intelligence.
Frames are derived from semantic networks and later evolved into our modern-day classes and objects. A
single frame is not of much use. A frame system consists of a collection of connected frames. In a frame,
knowledge about an object or event can be stored together in the knowledge base. The frame is a type of
technology which is widely used in various applications, including natural language processing and
machine vision.
Example: 1
A frame describing a book:
Slots     Fillers
Year      1996
Page      1152
Example: 2
Let's suppose we are taking an entity, Peter. Peter is a doctor by profession, his age is 25, and he lives
in the city London, in the country England. The following is the frame representation for this:
Slots        Fillers
Name         Peter
Profession   Doctor
Age          25
Weight       78
City         London
Country      England
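As a loose sketch in Python, the Peter frame can be modelled as a dictionary of slots and fillers, with a default facet consulted when a slot is missing. The `DEFAULTS` facet and the helper name are assumptions for illustration; the City and Country slots come from the prose description of Peter:

```python
# A frame as a dictionary of slots and fillers, with a simple default facet.

peter = {
    "Name": "Peter",
    "Profession": "Doctor",
    "Age": 25,
    "Weight": 78,
    "City": "London",     # from the prose: Peter lives in London
    "Country": "England",
}

# Default facet: consulted when a frame does not fill a slot itself.
DEFAULTS = {"Species": "Human"}

def get_slot(frame, slot):
    """Look up a slot, falling back to a default filler if absent."""
    return frame.get(slot, DEFAULTS.get(slot))

print(get_slot(peter, "Profession"))  # Doctor
print(get_slot(peter, "Species"))     # Human (default value)
```

The default-lookup behaviour mirrors the point below that frames make it easy to include default data and search for missing values.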
Advantages of frame representation:
1. The frame knowledge representation makes programming easier by grouping related data.
2. The frame representation is comparably flexible and is used by many applications in AI.
3. It is very easy to add slots for new attributes and relations.
4. It is easy to include default data and to search for missing values.
5. Frame representation is easy to understand and visualize.
4. Production Rules
A production rule system consists of (condition, action) pairs, which mean "If condition then action". It has
mainly three parts:
o The set of production rules
o Working memory
o The recognize-act cycle
The working memory contains the description of the current state of problem-solving, and a rule can write
knowledge to the working memory. This knowledge may then match and fire other rules.
If a new situation (state) is generated, then multiple production rules may be triggered together; this set of
candidate rules is called the conflict set. In this situation, the agent needs to select one rule from the set,
and this selection is called conflict resolution.
Example:
o IF (at bus stop AND bus arrives) THEN action (get into the bus)
o IF (on the bus AND paid AND empty seat) THEN action (sit down).
o IF (on bus AND unpaid) THEN action (pay charges).
o IF (bus arrives at destination) THEN action (get down from the bus).
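The bus rules above can be sketched as a toy production system in Python. The "first matching rule fires" strategy used here is one simple conflict-resolution choice among many, and the rule encoding is an illustrative assumption:

```python
# A toy production-rule interpreter for the bus example.  Each rule is a
# (conditions, action) pair; the first rule whose conditions are all present
# in working memory fires (a simple first-match conflict-resolution policy).

RULES = [
    ({"at bus stop", "bus arrives"}, "get into the bus"),
    ({"on the bus", "paid", "empty seat"}, "sit down"),
    ({"on the bus", "unpaid"}, "pay charges"),
    ({"bus arrives at destination"}, "get down from the bus"),
]

def fire(working_memory):
    """Return the action of the first rule whose conditions all hold."""
    for conditions, action in RULES:
        if conditions <= working_memory:  # subset test: all conditions hold
            return action
    return None

print(fire({"on the bus", "unpaid"}))  # pay charges
```

Note that the interpreter stores no result for later reuse, which illustrates the lack of learning capability mentioned below.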
Disadvantages of production rules:
1. A production rule system does not exhibit any learning capability, as it does not store the result of
a problem for future use.
2. During the execution of the program many rules may be active; hence rule-based production
systems are inefficient.
Propositional Logic:
Propositional logic (PL) is the simplest form of logic, where all statements are made by propositions. A
proposition is a declarative statement which is either true or false.
Example:
"5 is a prime number."
In propositional logic, we use symbolic variables to represent the logic, and we can use any
symbol to represent a proposition, such as A, B, C, P, Q, R, etc.
The propositions and connectives are the basic elements of the propositional logic.
A proposition formula which is always true is called a tautology; it is also called a valid
sentence. A proposition formula which is always false is called a contradiction.
A proposition formula which has both true and false values is called a contingency.
Statements which are questions, commands, or opinions, such as "Where is
Rohini?", "How are you?", "What is your name?", are not propositions.
There are two types of propositions:
a. Atomic propositions
b. Compound propositions
o Atomic Proposition:
Atomic propositions are simple propositions consisting of a single proposition symbol.
These are sentences which must be either true or false.
Example: "2 + 2 is 4" is an atomic proposition, as it is a true fact.
o Compound proposition:
Compound propositions are constructed by combining simpler or atomic propositions, using
parentheses and logical connectives.
Example: "It is raining today, and the street is wet."
Logical Connectives:
Logical connectives are used to connect two simpler propositions or to represent a sentence logically.
We can create compound propositions with the help of logical connectives. There are mainly five
connectives, which are given as follows:
1. Negation: A sentence such as ¬P is called the negation of P. A literal can be either a positive literal
or a negative literal.
2. Conjunction: A sentence which has the ∧ connective, such as P ∧ Q, is called a conjunction.
3. Disjunction: A sentence which has the ∨ connective, such as P ∨ Q, is called a disjunction.
4. Implication: A sentence such as P → Q is called an implication (if P, then Q).
5. Biconditional: A sentence such as P ⇔ Q is called a biconditional (P if and only if Q).
Precedence of connectives:
Just like arithmetic operators, there is a precedence order for propositional connectives or logical
operators. This order should be followed while evaluating a propositional problem. The following is the
precedence order for the operators:
Precedence   Operators
First        Parentheses
Second       Negation (¬)
Third        Conjunction (AND)
Fourth       Disjunction (OR)
Fifth        Implication
Sixth        Biconditional
Logical equivalence:
Logical equivalence is one of the features of propositional logic. Two propositions are said to be
logically equivalent if and only if their columns in the truth table are identical.
Let's take two propositions A and B; for logical equivalence, we write A⇔B. In the truth table below we
can see that the columns for ¬A∨B and A→B are identical, hence the two formulas are equivalent:
A      B      ¬A     ¬A∨B   A→B
T      T      F      T      T
T      F      F      F      F
F      T      T      T      T
F      F      T      T      T
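Such an equivalence can also be verified mechanically by enumerating all truth assignments, as in this short Python check (De Morgan's law, listed among the properties below, is checked the same way):

```python
# Verify A→B ≡ ¬A ∨ B and De Morgan's law over all four truth assignments.
from itertools import product

# Truth table of implication, listed explicitly rather than derived.
IMPLIES = {
    (True, True): True,
    (True, False): False,
    (False, True): True,
    (False, False): True,
}

for a, b in product([True, False], repeat=2):
    assert IMPLIES[(a, b)] == ((not a) or b)         # A→B ≡ ¬A ∨ B
    assert (not (a and b)) == ((not a) or (not b))   # De Morgan's law

print("equivalences verified")
```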
Properties of Operators:
o Commutativity:
o P∧ Q= Q ∧ P, or
o P ∨ Q = Q ∨ P.
o Associativity:
o (P ∧ Q) ∧ R= P ∧ (Q ∧ R),
o (P ∨ Q) ∨ R= P ∨ (Q ∨ R)
o Identity element:
o P ∧ True = P,
o P ∨ True= True.
o Distributive:
o P∧ (Q ∨ R) = (P ∧ Q) ∨ (P ∧ R).
o P ∨ (Q ∧ R) = (P ∨ Q) ∧ (P ∨ R).
o DE Morgan's Law:
o ¬ (P ∧ Q) = (¬P) ∨ (¬Q)
o ¬ (P ∨ Q) = (¬ P) ∧ (¬Q).
o Double-negation elimination:
o ¬ (¬P) = P.
o We cannot represent relations like all, some, or none with propositional logic. Example: "All the
girls are intelligent."
To represent such statements, propositional logic is not sufficient, so we require a more powerful logic,
such as first-order logic.
First-Order logic:
o First-order logic is another way of knowledge representation in artificial intelligence. It is an
extension to propositional logic.
o FOL is sufficiently expressive to represent the natural language statements in a concise way.
o First-order logic is also known as Predicate logic or First-order predicate logic. First-order
logic is a powerful language that expresses information about objects in a more natural way and
can also express the relationships between those objects.
o First-order logic (like natural language) does not only assume that the world contains facts like
propositional logic but also assumes the following things in the world:
o Objects: A, B, people, numbers, colors, wars, theories, squares, pits, wumpus, ......
o Relations: It can be a unary relation such as: red, round, is adjacent, or an n-ary relation
such as: the sister of, brother of, has color, comes between
o Function: Father of, best friend, third inning of, end of, ......
o Like natural language, first-order logic also has two main parts:
a. Syntax
b. Semantics
The basic syntactic elements of first-order logic are symbols:
Constants    1, 2, A, John, Mumbai, cat, ...
Variables    x, y, z, a, b, ...
Predicates   Brother, Father, >, ...
Functions    sqrt, LeftLegOf, ...
Connectives  ∧, ∨, ¬, ⇒, ⇔
Equality     ==
Quantifiers  ∀, ∃
Atomic sentences:
o Atomic sentences are the most basic sentences of first-order logic. These sentences are formed
from a predicate symbol followed by a parenthesis with a sequence of terms.
o We can represent atomic sentences as Predicate (term1, term2, ......, term n).
Complex Sentences:
o Complex sentences are made by combining atomic sentences using connectives.
Universal Quantifier:
Universal quantifier is a symbol of logical representation, which specifies that the statement within
its range is true for everything or every instance of a particular thing.
The Universal quantifier is represented by a symbol ∀, which resembles an inverted A.
Note: In universal quantifier we use implication "→".
If x is a variable, then ∀x is read as:
o For all x
o For each x
o For every x.
Example:
All men drink coffee.
Let x be a variable which refers to a man; then the statement can be represented in the UOD as:
∀x man(x) → drink(x, coffee)
Existential Quantifier:
Existential quantifiers are the type of quantifiers, which express that the statement within its scope is
true for at least one instance of something.
It is denoted by the logical operator ∃, which resembles an inverted E. When it is used with a predicate
variable, it is called an existential quantifier.
Note: In Existential quantifier we always use AND or Conjunction symbol (∧).
If x is a variable, then existential quantifier will be ∃x or ∃(x). And it will be read as:
o There exists a 'x.'
o For some 'x.'
o For at least one 'x.'
Example:
Some boys are intelligent.
In FOL: ∃x boys(x) ∧ intelligent(x)
Points to remember:
The main connective for universal quantifier ∀ is implication →.
The main connective for existential quantifier ∃ is and ∧.
Properties of Quantifiers:
In universal quantifier, ∀x∀y is similar to ∀y∀x.
In Existential quantifier, ∃x∃y is similar to ∃y∃x.
∃x∀y is not similar to ∀y∃x.
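Over a finite universe of discourse, ∀ behaves like Python's all() and ∃ like any(). The small domain below is a made-up illustration; note the implication used with ∀ versus the conjunction used with ∃, exactly as in the points above:

```python
# A finite universe of discourse: a few people with made-up properties.
universe = [
    {"name": "Ajay", "man": True,  "drinks_coffee": True},
    {"name": "Ravi", "man": True,  "drinks_coffee": True},
    {"name": "Mina", "man": False, "drinks_coffee": False},
]

# ∀x man(x) → drink(x, coffee): implication, encoded as (not man) or coffee.
all_men_drink = all((not x["man"]) or x["drinks_coffee"] for x in universe)

# ∃x man(x) ∧ drink(x, coffee): conjunction with the existential quantifier.
some_man_drinks = any(x["man"] and x["drinks_coffee"] for x in universe)

print(all_men_drink, some_man_drinks)  # True True
```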
In this topic, we will understand the knowledge engineering process in an electronic circuit domain,
which is already familiar. This approach is mainly suitable for creating special-purpose knowledge
bases.
At the first level or highest level, we will examine the functionality of the circuit:
At the second level, we will examine the circuit structure details such as:
For gate input, we will use the function In(1, X1) for denoting the first input terminal of the gate, and for
output terminal we will use Out (1, X1).
The function Arity(c, i, j) is used to denote that circuit c has i inputs and j outputs.
The connectivity between gates can be represented by predicate Connect(Out(1, X1), In(1, X1)).
We use a unary predicate On (t), which is true if the signal at a terminal is on.
o If two terminals are connected, then they have the same signal; this can be represented as:
∀ t1, t2 Terminal (t1) ∧ Terminal (t2) ∧ Connect (t1, t2) → Signal (t1) = Signal (t2).
o The signal at every terminal will have either the value 0 or 1; this will be represented as:
∀ t Terminal (t) → Signal (t) = 1 ∨ Signal (t) = 0.
For the given circuit C1, we can encode the problem instance in atomic sentences as below:
1. Since the circuit contains two XOR gates, two AND gates, and one OR gate, the atomic sentences
for these gates are:
For XOR gates: Type(X1) = XOR, Type(X2) = XOR
2. For AND gates: Type(A1) = AND, Type(A2) = AND
3. For OR gate: Type(O1) = OR.
What should be the combination of input which would generate the first output of circuit C1, as 0 and a
second output to be 1?
1. ∃ i1, i2, i3 Signal (In(1, C1))=i1 ∧ Signal (In(2, C1))=i2 ∧ Signal (In(3, C1))= i3
2. ∧ Signal (Out(1, C1)) =0 ∧ Signal (Out(2, C1))=1
Substitution:
Substitution is a fundamental operation performed on terms and formulas. It occurs in all inference
systems in first-order logic. The substitution is complex in the presence of quantifiers in FOL. If we
write F[a/x], so it refers to substitute a constant "a" in place of variable "x".
Note: First-order logic is capable of expressing facts about some or all objects in the universe
Equality:
First-Order logic does not only use predicate and terms for making atomic sentences but also uses another
way, which is equality in FOL. For this, we can use equality symbols which specify that the two terms
refer to the same object.
Example: Brother (John) = Smith.
In the above example, the object referred to by Brother (John) is the same as the object referred to
by Smith. The equality symbol can also be used with negation to represent that two terms are not the
same object.
1. Universal Generalization:
o Universal generalization is a valid inference rule which states that if premise P(c) is true for any
arbitrary element c in the universe of discourse, then we can have a conclusion as ∀ x P(x).
Example: Let's represent, P(c): "A byte contains 8 bits", so for ∀ x P(x) "All bytes contain 8 bits.", it
will also be true.
2. Universal Instantiation:
o Universal instantiation, also called universal elimination or UI, is a valid inference rule. It can
be applied multiple times to add new sentences.
o The new KB is logically equivalent to the previous KB.
o As per UI, we can infer any sentence obtained by substituting a ground term for the variable.
o The UI rule states that we can infer any sentence P(c), obtained by substituting a ground term c (a
constant within the domain of x) into ∀ x P(x), for any object in the universe of discourse.
Example:
"All kings who are greedy are Evil." Let our knowledge base contain this detail in the form of FOL:
∀x King(x) ∧ Greedy(x) → Evil(x)
From this information, we can infer any of the following statements using Universal Instantiation:
King(John) ∧ Greedy(John) → Evil(John)
King(Richard) ∧ Greedy(Richard) → Evil(Richard)
3. Existential Instantiation:
o Existential instantiation is also called as Existential Elimination, which is a valid inference rule in
first-order logic.
o It can be applied only once to replace the existential sentence.
o The new KB is not logically equivalent to old KB, but it will be satisfiable if old KB was
satisfiable.
o This rule states that one can infer P(c) from the formula given in the form of ∃x P(x) for a new
constant symbol c.
o The restriction with this rule is that the c used in the rule must be a new term for which P(c) is true.
Example:
From ∃x Crown(x) ∧ OnHead(x, John),
we can infer: Crown(K) ∧ OnHead(K, John), as long as K does not appear elsewhere in the knowledge
base.
4. Existential Introduction:
o This rule, also known as existential generalization, states that if there is some element c in the
universe of discourse which has property P, then we can infer that there exists something in the
universe which has property P, i.e., from P(c) we can infer ∃x P(x).
Probabilistic reasoning:
Uncertainty:
Till now, we have learned knowledge representation using first-order logic and propositional logic with
certainty, which means we were sure about the predicates. With this knowledge representation we might
write A→B, which means if A is true then B is true; but consider a situation where we are not sure whether
A is true or not. Then we cannot express this statement, and this situation is called uncertainty.
So to represent uncertain knowledge, where we are not sure about the predicates, we need uncertain
reasoning or probabilistic reasoning.
Causes of uncertainty:
Uncertainty in the real world commonly arises from unreliable information sources, experimental errors,
equipment faults, and environmental variation.
Probabilistic reasoning:
Probabilistic reasoning is a way of knowledge representation where we apply the concept of probability
to indicate the uncertainty in knowledge. In probabilistic reasoning, we combine probability theory with
logic to handle the uncertainty.
We use probability in probabilistic reasoning because it provides a way to handle the uncertainty that is
the result of someone's laziness and ignorance.
In the real world there are many scenarios where the certainty of something is not confirmed, such as
"It will rain today", "the behavior of someone in some situation", or "a match between two teams or two
players". These are probable sentences, for which we can assume that the event will occur but are not
sure about it, so here we use probabilistic reasoning.
In probabilistic reasoning, there are two ways to solve problems with uncertain knowledge:
o Bayes' rule
o Bayesian Statistics
As probabilistic reasoning uses probability and related terms, so before understanding probabilistic
reasoning, let's understand some common terms:
Probability: Probability can be defined as a chance that an uncertain event will occur. It is the
numerical measure of the likelihood that an event will occur. The value of a probability always lies
between 0 and 1:
0 ≤ P(A) ≤ 1, where P(A) is the probability of an event A.
We can find the probability of an uncertain event by using the formula:
P(A) = (Number of favourable outcomes) / (Total number of outcomes)
Sample space: The collection of all possible events is called sample space.
Random variables: Random variables are used to represent the events and objects in the real world.
Prior probability: The prior probability of an event is probability computed before observing new
information.
Posterior Probability: The probability that is calculated after all evidence or information has been taken
into account. It is a combination of prior probability and new information.
Conditional probability is the probability of an event occurring when another event has already happened.
Let's suppose we want to calculate the probability of event A when event B has already occurred, "the
probability of A under the condition B"; it can be written as:
P(A|B) = P(A⋀B) / P(B)
If the probability of A is given and we need to find the probability of B, then it will be given as:
P(B|A) = P(A⋀B) / P(A)
It can be explained using a Venn diagram: when B has occurred, the sample space is reduced to the set B,
and we can calculate event A given that event B has occurred by dividing the probability of P(A⋀B)
by P(B).
Example:
In a class, 70% of the students like English and 40% of the students like both English and mathematics.
What percentage of the students who like English also like mathematics?
Solution:
Let A be the event that a student likes mathematics and B the event that a student likes English. Then
P(A|B) = P(A⋀B) / P(B) = 0.4 / 0.7 = 57.14%
Hence, 57.14% of the students who like English also like mathematics.
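This conditional-probability arithmetic can be checked in a couple of lines of Python (the helper name `conditional` is just for illustration):

```python
# P(Maths | English) = P(English ∧ Maths) / P(English) = 0.40 / 0.70

def conditional(p_a_and_b, p_b):
    """Conditional probability P(A|B) = P(A ∧ B) / P(B)."""
    return p_a_and_b / p_b

p = conditional(0.40, 0.70)
print(f"{p:.2%}")  # 57.14%
```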
Bayes' theorem:
Bayes' theorem is also known as Bayes' rule, Bayes' law, or Bayesian reasoning, which determines the
probability of an event with uncertain knowledge.
In probability theory, it relates the conditional probability and marginal probabilities of two random
events.
Bayes' theorem was named after the British mathematician Thomas Bayes. The Bayesian inference is
an application of Bayes' theorem, which is fundamental to Bayesian statistics.
Bayes' theorem allows updating the probability prediction of an event by observing new information from
the real world.
Example: If cancer corresponds to one's age then by using Bayes' theorem, we can determine the
probability of cancer more accurately with the help of age.
Bayes' theorem can be derived using the product rule and the conditional probability of event A with known
event B:
From the product rule we can write: P(A⋀B) = P(A|B) P(B), and
similarly, with known event A: P(A⋀B) = P(B|A) P(A).
Equating the two right-hand sides and dividing by P(B), we get:
P(A|B) = P(B|A) P(A) / P(B) ........ (a)
The above equation (a) is called Bayes' rule or Bayes' theorem. This equation is the basis of most
modern AI systems for probabilistic inference.
It shows the simple relationship between joint and conditional probabilities. Here,
P(A|B) is known as the posterior, which we need to calculate; it is read as the probability of
hypothesis A given that evidence B has occurred.
P(B|A) is called the likelihood: assuming the hypothesis is true, we calculate the probability of the
evidence.
P(A) is called the prior probability: the probability of the hypothesis before considering the evidence.
P(B) is called the marginal probability: the probability of the evidence.
In equation (a), in general, we can write P(B) = Σi P(Ai) P(B|Ai); hence Bayes' rule can be written
as:
P(Ai|B) = P(B|Ai) P(Ai) / Σk P(Ak) P(B|Ak)
where A1, A2, A3, ........, An is a set of mutually exclusive and exhaustive events.
Example-2:
Question: From a standard deck of playing cards, a single card is drawn. The probability that the
card is a king is 4/52. Calculate the posterior probability P(King|Face), i.e., the probability that the
drawn card is a king, given that it is a face card.
Solution:
P(King|Face) = P(Face|King) P(King) / P(Face)
P(Face|King) = 1, since every king is a face card.
P(King) = 4/52 = 1/13, and P(Face) = 12/52 = 3/13, since each of the four suits has three face cards
(J, Q, K).
Hence P(King|Face) = (1 × 1/13) / (3/13) = 1/3.
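The same computation can be done with exact fractions using Python's standard `fractions` module:

```python
from fractions import Fraction

# Bayes' rule on the card example:
# P(King|Face) = P(Face|King) · P(King) / P(Face)
p_king = Fraction(4, 52)         # 4 kings in a 52-card deck
p_face = Fraction(12, 52)        # 12 face cards: J, Q, K in each of 4 suits
p_face_given_king = Fraction(1)  # every king is a face card

p_king_given_face = p_face_given_king * p_king / p_face
print(p_king_given_face)  # 1/3
```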
Applications of Bayes' theorem in AI:
o It is used to calculate the next step of a robot when the already executed step is given.
o Bayes' theorem is helpful in weather forecasting.
o It can solve the Monty Hall problem.
Bayesian Belief Network:
A Bayesian network is a probabilistic graphical model which represents a set of variables and their
conditional dependencies using a directed acyclic graph. It is also called a Bayes network, belief network,
decision network, or Bayesian model.
Bayesian networks are probabilistic, because these networks are built from a probability distribution,
and also use probability theory for prediction and anomaly detection.
Real world applications are probabilistic in nature, and to represent the relationship between multiple
events, we need a Bayesian network. It can also be used in various tasks including prediction, anomaly
detection, diagnostics, automated insight, reasoning, time series prediction, and decision making
under uncertainty.
A Bayesian network can be used for building models from data and experts' opinions, and it consists of two
parts:
o Directed Acyclic Graph (DAG)
o Table of conditional probabilities
The generalized form of Bayesian network that represents and solve decision problems under uncertain
knowledge is known as an Influence diagram.
A Bayesian network graph is made up of nodes and Arcs (directed links), where:
o Each node corresponds to the random variables, and a variable can be continuous or discrete.
o Arc or directed arrows represent the causal relationship or conditional probabilities between
random variables. These directed links or arrows connect the pair of nodes in the graph.
These links represent that one node directly influences the other node; if there is no directed
link, then the nodes are independent of each other.
o In the above diagram, A, B, C, and D are random variables represented by the nodes
of the network graph.
o If we are considering node B, which is connected with node A by a directed arrow,
then node A is called the parent of Node B.
o Node C is independent of node A.
The Bayesian network has mainly two components:
o Causal component
o Actual numbers
Each node in the Bayesian network has a conditional probability distribution P(Xi | Parents(Xi)), which
determines the effect of the parents on that node.
Bayesian network is based on Joint probability distribution and conditional probability. So let's first
understand the joint probability distribution:
If we have variables x1, x2, x3, ....., xn, then the probabilities of the different combinations of x1, x2,
x3, ..., xn are known as the joint probability distribution.
The joint probability P[x1, x2, x3, ....., xn] can be written as follows in terms of conditional probabilities:
P[x1, x2, ..., xn] = P[x1 | x2, ..., xn] P[x2 | x3, ..., xn] ... P[xn-1 | xn] P[xn]
In general, for each variable Xi in a Bayesian network, we can write the equation as:
P(X1, X2, ..., Xn) = ∏i P(Xi | Parents(Xi))
Example: Harry installed a new burglar alarm at his home to detect burglary. The alarm reliably responds
to a burglary, but it also responds to minor earthquakes. Harry has two neighbors, David and Sophia, who
have taken responsibility for informing Harry at work when they hear the alarm. David always
calls Harry when he hears the alarm, but sometimes he gets confused with the phone ringing and calls
then as well. On the other hand, Sophia likes to listen to loud music, so sometimes she misses the
alarm. Here we would like to compute the probability of the burglar alarm sounding.
Problem:
Calculate the probability that alarm has sounded, but there is neither a burglary, nor an earthquake
occurred, and David and Sophia both called the Harry.
Solution:
o Burglary (B)
o Earthquake(E)
o Alarm(A)
o David Calls(D)
o Sophia calls(S)
We can write the events of the problem statement in the form of probability as P[D, S, A, B, E], and
rewrite the above probability statement using the joint probability distribution:
P[D, S, A, ¬B, ¬E] = P[D|A] P[S|A] P[A|¬B, ¬E] P[¬B] P[¬E]
P(E = False) = 0.999, which is the probability that an earthquake did not occur.
The conditional probability that Sophia calls depends on its parent node "Alarm".
From the formula of the joint distribution, we can write the problem statement in the form of a probability
distribution:
P(S, D, A, ¬B, ¬E) = P(S|A) P(D|A) P(A|¬B ∧ ¬E) P(¬B) P(¬E)
= 0.75 × 0.91 × 0.001 × 0.998 × 0.999
= 0.00068045.
Hence, a Bayesian network can answer any query about the domain by using Joint distribution.
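The joint-probability computation for this query can be reproduced in a few lines of Python. The conditional-probability values below are assumptions in the style of the usual textbook tables for this example, since the tables themselves are not reproduced here:

```python
# Joint probability P(D, S, A, ¬B, ¬E) for the alarm network; it factorises
# along the graph as P(D|A)·P(S|A)·P(A|¬B,¬E)·P(¬B)·P(¬E).
# The numeric values are assumed, textbook-style CPT entries.

p_d_given_a     = 0.91   # David calls, given the alarm sounds
p_s_given_a     = 0.75   # Sophia calls, given the alarm sounds
p_a_given_nb_ne = 0.001  # alarm sounds, given no burglary and no earthquake
p_not_b         = 0.998  # no burglary
p_not_e         = 0.999  # no earthquake

p = p_d_given_a * p_s_given_a * p_a_given_nb_ne * p_not_b * p_not_e
print(round(p, 8))  # 0.00068045
```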
There are two ways to understand the semantics of a Bayesian network, which are given below:
1. To understand the network as a representation of the joint probability distribution (helpful when
constructing the network).
2. To understand the network as an encoding of a collection of conditional independence statements
(helpful when designing inference procedures).
Fuzzy Logic:
The word 'fuzzy' means things that are not clear or are vague. Sometimes we cannot decide in real
life whether a given problem or statement is true or false. At that time, this concept provides many
values between true and false and gives the flexibility to find the best solution to the problem.
Fuzzy logic contains multiple logical values, and these values are the truth values of a variable or
problem between 0 and 1. The concept was introduced by Lotfi Zadeh in 1965, based on his Fuzzy Set
Theory. It provides possibilities which are not given by ordinary computers but are similar to the range
of possibilities generated by humans.
In the Boolean system, only two possibilities (0 and 1) exist, where 1 denotes the absolute truth value and
0 denotes the absolute false value. But in the fuzzy system, there are multiple possibilities present between
the 0 and 1, which are partially false and partially true.
Fuzzy logic can be implemented in systems such as micro-controllers, workstation-based systems, or large
network-based systems for achieving a definite output. It can also be implemented in both hardware and
software.
Characteristics of Fuzzy Logic
1. This concept is flexible, and we can easily understand and implement it.
2. It helps to minimize the logic created by humans.
3. It is the best method for finding solutions to problems which are suitable for approximate
or uncertain reasoning.
4. It always offers two values, which denote the two possible solutions for a problem or statement.
5. It allows users to build or create non-linear functions of arbitrary complexity.
6. In fuzzy logic, everything is a matter of degree.
7. In fuzzy logic, any logical system can be easily fuzzified.
8. It is based on natural language processing.
The architecture of a fuzzy logic system has four main parts:
1. Rule Base
2. Fuzzification
3. Inference Engine
4. Defuzzification
1. Rule Base
Rule Base is a component used for storing the set of rules and the If-Then conditions given by experts,
which are used for controlling the decision-making system. There have recently been many updates in
fuzzy theory which offer effective methods for designing and tuning fuzzy controllers. These
developments have decreased the number of fuzzy rules required.
2. Fuzzification
Fuzzification is a module or component for transforming the system inputs, i.e., it converts crisp
numbers into fuzzy sets. The crisp numbers are the inputs measured by sensors, which fuzzification
passes into the control system for further processing. This component divides the input signal into the
following five states in any fuzzy logic system:
o Large Positive (LP)
o Medium Positive (MP)
o Small (S)
o Medium Negative (MN)
o Large Negative (LN)
3. Inference Engine
This component is the main component of any Fuzzy Logic System (FLS), because all the information is
processed in the inference engine. It finds the matching degree between the current fuzzy input and the
rules. Based on the matching degree, the system determines which rules are to be fired for the given
input. When all rules have fired, they are combined to develop the control actions.
4. Defuzzification
Defuzzification is a module or component which takes the fuzzy set inputs generated by the inference
engine and transforms them into a crisp value. It is the last step in the process of a fuzzy logic
system. The crisp value is the type of value that is acceptable to the user. Various techniques exist
to do this, and the user has to select the one that best reduces the errors.
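As an illustrative sketch of the fuzzification step, a crisp sensor reading can be mapped to membership degrees in overlapping fuzzy sets using triangular membership functions. The set names and breakpoints below are made-up assumptions, not part of any standard:

```python
# Fuzzification sketch: map a crisp temperature reading to membership
# degrees in overlapping fuzzy sets via triangular membership functions.

def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peaking at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)   # rising edge
    return (c - x) / (c - b)       # falling edge

def fuzzify(temp_c):
    """Return membership degrees of a crisp temperature in each fuzzy set."""
    return {
        "cold": triangular(temp_c, -10, 0, 20),
        "warm": triangular(temp_c, 10, 25, 40),
        "hot":  triangular(temp_c, 30, 45, 60),
    }

print(fuzzify(15))  # 15 °C is partially cold and partially warm
```

A reading of 15 degrees is thus a matter of degree in both "cold" and "warm", which is exactly the partial-truth idea described above.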
Set
A set is a collection of unordered or ordered distinct elements, for example, the set of vowels
{a, e, i, o, u} or the set of even numbers.
Types of Set:
There are following various categories of set:
1. Finite
2. Empty
3. Infinite
4. Proper
5. Universal
6. Subset
Following are some limitations of fuzzy logic systems:
1. The run time of fuzzy logic systems is slow, and they take a long time to produce outputs.
2. Users can understand them easily only if they are simple.
3. The possibilities produced by a fuzzy logic system are not always accurate.
4. Many researchers give various ways of solving a given statement using this technique, which leads
to ambiguity.
5. Fuzzy logic is not suitable for problems that require high accuracy.
6. Fuzzy logic systems need a lot of testing for verification and validation.
Classical Set
It is a type of set which collects distinct objects into a group. Sets with crisp boundaries are
classical sets. In any set, each single entity is called an element or member of that set.
Any set can be easily denoted in the following two different ways:
1. Roster Form: This is also called tabular form. In this form, the set is represented by listing its
elements, enclosed within braces and separated by commas.
Following are two examples which describe a set in roster (tabular) form:
Example 1:
Example 2:
Set of Prime Numbers less than 50: X={2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47}.
2. Set Builder Form: In this form, the set is defined by stating a property p(x) that its members satisfy:
A = {x : p(x)}
For example, the set {2, 4, 6, 8, 10, 12, 14, 16, 18} is written as:
B = {x : 2 ≤ x < 20 and (x % 2) = 0}
Operations on Classical Set
Following are the various operations which are performed on the classical sets:
1. Union Operation
2. Intersection Operation
3. Difference Operation
4. Complement Operation
1. Union:
This operation is denoted by (A U B). A U B is the set of those elements which exist in set A or
in set B (or in both). This operation combines all the elements from both sets to make a new
set. It is also called a logical OR operation.
A ∪ B = { x | x ∈ A OR x ∈ B }.
Example:
Set A = {10, 11, 12, 13}, Set B = {11, 12, 13, 14, 15}, then A ∪ B = {10, 11, 12, 13, 14, 15}
2. Intersection
This operation is denoted by (A ∩ B). A ∩ B is the set of those elements which are common to
both set A and set B. It is also called a logical AND operation.
A ∩ B = { x | x ∈ A AND x ∈ B }.
Example:
Set A = {10, 11, 12, 13}, Set B = {11, 12, 14} then A ∩ B = {11, 12}
3. Difference: This operation is denoted by (A - B). A - B is the set of only those elements which exist
in set A but not in set B.
A - B = { x | x ∈ A AND x ∉ B }.
4. Complement: The complement of set A, denoted A′, is the set of all elements of the universal set
which are not in A.
A′ = { x | x ∉ A }.
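These classical set operations map directly onto Python's built-in set type. The sets A and B are taken from the union example above; the universal set U is an assumption needed for the complement:

```python
# Classical set operations with Python's built-in set type.
A = {10, 11, 12, 13}
B = {11, 12, 13, 14, 15}
U = set(range(10, 20))  # assumed universal set for the complement

print(sorted(A | B))  # union:        [10, 11, 12, 13, 14, 15]
print(sorted(A & B))  # intersection: [11, 12, 13]
print(sorted(A - B))  # difference:   [10]
print(sorted(U - A))  # complement:   [14, 15, 16, 17, 18, 19]
```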
Fuzzy Set
Classical set theory is a subset of fuzzy set theory. Fuzzy logic is based on fuzzy set theory,
which is a generalisation of the classical theory of sets (i.e., crisp sets) introduced by Zadeh in 1965.
A fuzzy set is a collection of elements whose degrees of membership lie between 0 and 1. Fuzzy sets are
denoted or represented by the tilde (~) character. Fuzzy set theory was introduced in 1965 by Lotfi
A. Zadeh and Dieter Klaua. In a fuzzy set, partial membership also exists. The theory was
released as an extension of classical set theory.
1. Fuzzy set is a set having degrees of membership between 1 and 0. Fuzzy sets are
represented with tilde character(~). For example, Number of cars following traffic signals at a
particular time out of all cars present will have membership value between [0,1].
2. Partial membership exists when member of one fuzzy set can also be a part of other fuzzy
sets in the same universe.
3. The degree of membership or truth is not the same as probability; fuzzy truth represents
membership in vaguely defined sets.
Mathematically, a fuzzy set (Ã) is a pair (U, M), where U is the Universe of discourse and M is the
membership function, which takes on values in the interval [0, 1]. The universe of discourse (U) is also
denoted by Ω or X.
Given à and B are the two fuzzy sets, and X be the universe of discourse with the following respective
member functions:
Example:
then,
For X1
μA∪B(X1) = 0.6
For X2
μA∪B(X2) = 0.8
μA∪B(X3) = 1
For X4
μA∪B(X4) = 0.9
2. Intersection Operation:
The intersection is defined by μA∩B(x) = min(μA(x), μB(x)).
Example:
Ã = {(X1, 0.3), (X2, 0.8), (X3, 0.5), (X4, 0.1)}
B̃ = {(X1, 0.8), (X2, 0.2), (X3, 0.4), (X4, 0.9)}
then,
For X1: μA∩B(X1) = min(0.3, 0.8) = 0.3
For X2: μA∩B(X2) = min(0.8, 0.2) = 0.2
For X3: μA∩B(X3) = min(0.5, 0.4) = 0.4
For X4: μA∩B(X4) = min(0.1, 0.9) = 0.1
3. Complement Operation:
The complement of a fuzzy set Ã is defined by μĀ(x) = 1 - μA(x).
Example:
Ã = {(X1, 0.3), (X2, 0.8), (X3, 0.5), (X4, 0.1)}
then,
For X1
μĀ(X1) = 1-μA(X1)
μĀ(X1) = 1 - 0.3
μĀ(X1) = 0.7
For X2
μĀ(X2) = 1-μA(X2)
μĀ(X2) = 1 - 0.8
μĀ(X2) = 0.2
For X3
μĀ(X3) = 1-μA(X3)
μĀ(X3) = 1 - 0.5
μĀ(X3) = 0.5
For X4
μĀ(X4) = 1-μA(X4)
μĀ(X4) = 1 - 0.1
μĀ(X4) = 0.9
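The three fuzzy operations above can be sketched in code. The values of Ã are taken from the complement example (0.3, 0.8, 0.5, 0.1); the values of B̃ are an assumption chosen so that the union matches the worked union example (the text's intersection example appears to use a different pair of sets, so only the union and complement results are checked against the text):

```python
# Standard fuzzy-set operations over a common universe of discourse:
#   union:        μ_{A∪B}(x) = max(μ_A(x), μ_B(x))
#   intersection: μ_{A∩B}(x) = min(μ_A(x), μ_B(x))
#   complement:   μ_{Ā}(x)   = 1 - μ_A(x)
# A's values come from the complement example; B's values are assumptions.
A = {"X1": 0.3, "X2": 0.8, "X3": 0.5, "X4": 0.1}
B = {"X1": 0.6, "X2": 0.2, "X3": 1.0, "X4": 0.9}

def fuzzy_union(a, b):
    return {x: max(a[x], b[x]) for x in a}

def fuzzy_intersection(a, b):
    return {x: min(a[x], b[x]) for x in a}

def fuzzy_complement(a):
    # round() guards against floating-point residue like 0.19999999...
    return {x: round(1 - a[x], 10) for x in a}

print(fuzzy_union(A, B))        # matches the union example: 0.6, 0.8, 1.0, 0.9
print(fuzzy_complement(A))      # matches the complement example: 0.7, 0.2, 0.5, 0.9
```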
Classical (Crisp) Set Theory                        Fuzzy Set Theory
1. This theory is a class of sets having            1. This theory is a class of sets having
   sharp boundaries.                                   un-sharp boundaries.
2. This set theory is defined by exact              2. This set theory is defined by ambiguous
   boundaries, only 0 and 1.                           boundaries.
4. This theory is widely used in the                4. It is mainly used for fuzzy controllers.
   design of digital systems.
Advantages of Fuzzy Logic
This system can work with any type of input, whether it is imprecise, distorted, or noisy.
The construction of Fuzzy Logic Systems is easy and understandable.
Fuzzy logic comes with mathematical concepts of set theory and the reasoning of that is quite
simple.
It provides a very efficient solution to complex problems in all fields of life as it resembles human
reasoning and decision-making.
The algorithms can be described with little data, so little memory is required.
Disadvantages of Fuzzy Logic
Many researchers have proposed different ways to solve a given problem through fuzzy logic, which leads to ambiguity; there is no systematic approach to solving a given problem through fuzzy logic.
Proof of its characteristics is difficult or impossible in most cases because every time we do not
get a mathematical description of our approach.
As fuzzy logic works on precise as well as imprecise data, accuracy is often compromised.
Applications of Fuzzy Logic
It is used in the aerospace field for altitude control of spacecraft and satellites.
It has been used in the automotive system for speed control, traffic control.
It is used for decision-making support systems and personal evaluation in the large company
business.
It has application in the chemical industry for controlling the pH, drying, chemical distillation
process.
Fuzzy logic is used in Natural language processing and various intensive applications in Artificial
Intelligence.
Fuzzy logic is extensively used in modern control systems such as expert systems.
Fuzzy Logic is used with Neural Networks as it mimics how a person makes decisions, only much faster. This is done by aggregating data and turning it into more meaningful information in the form of partial truths, i.e., fuzzy sets. Fuzzy logic is applied in areas such as:
Automotive
Business
Defense
Finance
Industrial Sector
Manufacturing
Marine
Medical
Securities
Transportation
In Pattern Recognition and Classification, fuzzy logic is used in the following areas −
Psychology
Planning
Artificial intelligence is an important technology of the future. Whether it is intelligent robots, self-driving cars, or smart cities, they will all use different aspects of artificial intelligence, and planning is essential to any such AI project.
Planning is an important part of Artificial Intelligence that deals with the tasks and domains of a particular problem. Everything we humans do has a definite goal in mind, and all our actions are oriented towards achieving that goal; similarly, planning is done for artificial intelligence systems. That is why planning is considered the logical side of acting. In other words, planning is about deciding the tasks to be performed by the artificial intelligence system and how the system should function under domain-independent conditions.
What is a Plan?
We require a domain description, a task specification, and a goal description for any planning system. A plan is a sequence of actions; each action has preconditions that must be satisfied before it can be executed, and effects that can be positive or negative.
So, we have Forward State Space Planning (FSSP) and Backward State Space Planning (BSSP) at the
basic level.
Execution of the plan is about choosing a sequence of tasks with a high probability of accomplishing a
specific task.
The start position and target position are shown in the following diagram.
o Choose the best rule to apply next, based on the best available guess.
o Apply the chosen rule to calculate the new problem condition.
o Find out when a solution has been found.
o Detect dead ends so they can be discarded and direct system effort in more useful directions.
o Find out when a near-perfect solution is found.
1. Start by pushing the original goal onto the stack. Repeat the following until the stack is empty: if the stack top is a compound goal, push its unsatisfied sub-goals onto the stack.
2. If the stack top is a single unsatisfied goal, replace it with an action and push the action's preconditions onto the stack.
Non-linear Planning
This planning method uses a goal stack whose search space includes all possible sub-goal orderings. It handles goal interactions by the interleaving method, and it takes a larger search space, since all possible goal orderings are considered.
Algorithm
We start at the goal state and we try fulfilling the preconditions required to achieve the initial state. These
preconditions in turn have their own set of preconditions, which are required to be satisfied first. We keep
solving these “goals” and “sub-goals” until we finally arrive at the Initial State. We make use of a stack
to hold these goals that need to be fulfilled, as well as the actions that we need to perform for them.
Apart from the “Initial State” and the “Goal State”, we maintain a “World State” configuration as well.
Goal Stack uses this world state to work its way from Goal State to Initial State. World State on the other
hand starts off as the Initial State and ends up being transformed into the Goal state.
At the end of this algorithm we are left with an empty stack and a sequence of actions which helps us navigate from the Initial State to the Goal State.
Step-1: Push the original goal on the stack. Repeat until the stack is empty.
Step-2: If the stack top is a compound goal, push its unsatisfied subgoals on the stack.
Step-3: If the stack top is a single unsatisfied goal, replace it with an action that satisfies it and push the action's preconditions on the stack.
Step-4: If stack top is an action, pop it from the stack, execute it and change the knowledge base by the
action’s effects.
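Steps 1 to 4 above can be sketched in code. This is an illustrative simplification, not a full goal-stack planner: states and goals are sets of predicate strings, operators are (name, preconditions, add-effects) triples, and delete-effects are omitted for brevity. The example operators and world state are assumptions:

```python
# Minimal goal-stack planner sketch following Steps 1-4 above.
# Simplification: operators carry only add-effects (no delete lists).
def goal_stack_plan(goal, world, operators):
    stack = [("GOAL", goal)]
    plan = []
    while stack:
        kind, item = stack.pop()
        if kind == "GOAL":
            unsatisfied = item - world
            if len(unsatisfied) > 1:            # Step-2: compound goal
                stack.append(("GOAL", item))    # re-check it after subgoals
                for sub in unsatisfied:
                    stack.append(("GOAL", {sub}))
            elif unsatisfied:                   # Step-3: single unsatisfied goal
                (pred,) = unsatisfied
                op = next(o for o in operators if pred in o[2])
                stack.append(("ACTION", op))    # the action that achieves it
                stack.append(("GOAL", op[1]))   # its preconditions go first
        else:                                   # Step-4: execute, apply effects
            name, _pre, add = item
            world |= add
            plan.append(name)
    return plan

# Illustrative blocks-world fragment (assumed operators and state).
world = {"ONTABLE(A)", "ONTABLE(B)", "CLEAR(A)", "CLEAR(B)", "ARMEMPTY"}
operators = [
    ("PICKUP(A)",  {"ONTABLE(A)", "CLEAR(A)", "ARMEMPTY"}, {"HOLDING(A)"}),
    ("STACK(A,B)", {"HOLDING(A)", "CLEAR(B)"},             {"ON(A,B)", "ARMEMPTY"}),
]
print(goal_stack_plan({"ON(A,B)"}, world, operators))
# ['PICKUP(A)', 'STACK(A,B)']
```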
Predicates can be thought of as a statement which helps us convey the information about a configuration
in Blocks World.
Given below are the list of predicates as well as their intended meaning
1. ON(A,B) : Block A is on B
2. ONTABLE(A) : A is on table
3. CLEAR(A) : Nothing is on top of A
4. HOLDING(A) : Arm is holding A.
5. ARMEMPTY : Arm is holding nothing
Using these predicates, we represent the Initial State and the Goal State in our example like this:
Initial State
Thus a configuration can be thought of as a list of predicates describing the current scenario.
All the four operations have certain preconditions which need to be satisfied to perform the same. These
preconditions are represented in the form of predicates.
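The predicate representation above can be sketched directly: a state is a set of predicate strings, and an action is applicable only when all of its precondition predicates hold. The particular block configuration and operator instances below are illustrative assumptions:

```python
# A blocks-world state as a set of predicates (assumed configuration:
# A on B, B and C on the table, arm empty).
initial_state = {"ON(A,B)", "ONTABLE(B)", "ONTABLE(C)",
                 "CLEAR(A)", "CLEAR(C)", "ARMEMPTY"}

# Preconditions of the classic operators, expressed as predicate sets.
preconditions = {
    "UNSTACK(A,B)": {"ON(A,B)", "CLEAR(A)", "ARMEMPTY"},
    "PICKUP(C)":    {"ONTABLE(C)", "CLEAR(C)", "ARMEMPTY"},
    "STACK(A,B)":   {"HOLDING(A)", "CLEAR(B)"},
    "PUTDOWN(A)":   {"HOLDING(A)"},
}

def applicable(action, state):
    """An action can be performed only if all its preconditions hold."""
    return preconditions[action] <= state   # subset test

print(applicable("UNSTACK(A,B)", initial_state))  # True
print(applicable("STACK(A,B)", initial_state))    # False — arm is not holding A
```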
Hierarchical Planning:
Hierarchical Planning is a method used in Artificial Intelligence (AI) that breaks down complex problems
into smaller, more manageable sub-problems. This approach allows AI systems to efficiently plan and
execute actions to achieve their goals.
At its core, Hierarchical Planning involves two main levels: the high-level planner and the low-level
planner. The high-level planner is responsible for setting the overall goal and breaking it down into smaller
sub-goals. It then passes these sub-goals to the low-level planner, which generates a sequence of actions
to achieve each sub-goal.
This process continues until all sub-goals have been accomplished, and the overall goal has been achieved.
The high-level planner may also monitor the progress of the low-level planner and make adjustments as
necessary.
For complicated projects with several subtasks, hierarchical planning is especially helpful since it enables
the AI system to divide the issue into more manageable chunks. It is also helpful in scenarios when the
activities required to accomplish a goal may alter depending on the setting or circumstances.
Overall, Hierarchical Planning is an essential technique in the field of AI, as it allows AI systems to
efficiently plan and execute actions to achieve their goals.
For example, in autonomous robots, hierarchical planning can be used to plan high-level goals, such as
navigating to a specific location, and decompose them into lower-level tasks, such as obstacle avoidance,
path planning, and motion control.
Hierarchical planning in artificial intelligence (AI) typically involves several key components, including:
High-level goals: The overall objectives or tasks that the AI system aims to achieve.
Task decomposition: Breaking down high-level goals into lower-level tasks or subgoals.
Planning hierarchy: The organization of tasks or subgoals into a hierarchical structure, such as
a tree or a directed acyclic graph (DAG).
Plan generation at different levels: Reasoning and planning at different levels of the hierarchy,
with plans generated for achieving subgoals or actions.
Plan synthesis: Combining the plans for achieving subgoals or actions into a cohesive plan for
execution.
Plan execution: Carrying out the actions or subgoals in the plan in the correct order.
Plan adaptation: Revising plans at different levels of abstraction to accommodate changes in the
environment or goals.
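The task-decomposition idea above can be sketched as a toy hierarchical planner: compound tasks are expanded by methods into subtasks until only primitive actions remain. The task names and decompositions below are illustrative assumptions, loosely following the delivery-robot example:

```python
# Toy hierarchical (HTN-style) decomposition. Each method maps a compound
# task to an ordered list of subtasks; anything without a method is primitive.
methods = {
    "deliver_package":     ["navigate_to_pickup", "grasp_package",
                            "navigate_to_dropoff", "release_package"],
    "navigate_to_pickup":  ["plan_path", "follow_path"],
    "navigate_to_dropoff": ["plan_path", "follow_path"],
}

def decompose(task):
    """Recursively expand a task into a flat sequence of primitive actions."""
    if task not in methods:          # primitive: no further decomposition
        return [task]
    plan = []
    for sub in methods[task]:
        plan.extend(decompose(sub))
    return plan

print(decompose("deliver_package"))
# ['plan_path', 'follow_path', 'grasp_package', 'plan_path', 'follow_path', 'release_package']
```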
Scalability: Hierarchical planning allows for reasoning and planning at different levels of
abstraction, enabling efficient handling of complex tasks and environments.
Flexibility: Hierarchical planning provides the flexibility to adapt plans to changes in the
environment or goals, making them more robust and adaptable.
Abstraction and reuse: The use of a hierarchy of tasks or subgoals allows for the abstraction and
reuse of plans, making planning more efficient and reducing the need for redundant planning.
Hierarchical planning in artificial intelligence (AI) involves the use of various techniques to effectively
decompose tasks, abstract them at different levels, allocate tasks to appropriate agents or resources, and
integrate execution plans. Here's a brief overview of these techniques:
Robotics: Hierarchical planning is commonly used in robotics to plan and execute complex tasks involving multiple levels of abstraction, from high-level goals such as navigating to a specific location down to lower-level tasks such as obstacle avoidance, path planning, and motion control. It is also used in robot task planning.
What is NLP?
NLP stands for Natural Language Processing, which is a part of Computer Science, Human language, and
Artificial Intelligence. It is the technology that is used by machines to understand, analyse, manipulate,
and interpret human languages. It helps developers to organize knowledge for performing tasks such as
translation, automatic summarization, Named Entity Recognition (NER), speech recognition, relationship
extraction, and topic segmentation.
Processing of natural language is required when you want an intelligent system, like a robot, to perform as per your instructions, or when you want to hear a decision from a dialogue-based clinical expert system, etc.
The field of NLP involves making computers perform useful tasks with the natural languages humans use. The input and output of an NLP system can be
Speech
Written Text
Natural Language Understanding (NLU) helps the machine to understand and analyse human language
by extracting the metadata from content such as concepts, entities, keywords, emotion, relations, and
semantic roles.
NLU is mainly used in business applications to understand the customer's problem in both spoken and written language.
First, the computer must comprehend the meaning of each word. It tries to figure out whether the word is
a noun or a verb, whether it’s in the past or present tense, and so on. This is called Part-of-Speech tagging
(POS).
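Part-of-Speech tagging can be illustrated with a toy lexicon-based tagger. Real NLP systems use statistical or neural taggers over far richer lexicons; the tiny lexicon and the default-to-NOUN fallback below are illustrative assumptions:

```python
# Toy POS tagger: look each word up in a small hand-built lexicon.
# The lexicon entries are illustrative assumptions, not a real resource.
LEXICON = {
    "the": "DET", "a": "DET",
    "dog": "NOUN", "cat": "NOUN", "park": "NOUN",
    "runs": "VERB", "sees": "VERB",
    "in": "PREP",
}

def pos_tag(sentence):
    """Tag each word; unknown words default to NOUN (a common fallback)."""
    return [(w, LEXICON.get(w.lower(), "NOUN")) for w in sentence.split()]

print(pos_tag("The dog runs in the park"))
# [('The', 'DET'), ('dog', 'NOUN'), ('runs', 'VERB'), ('in', 'PREP'), ('the', 'DET'), ('park', 'NOUN')]
```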
A lexicon (a vocabulary) and a set of grammatical rules are also built into NLP systems. The most difficult
part of NLP is understanding.
The machine should be able to grasp what you said by the conclusion of the process. There are several
challenges in accomplishing this when considering problems such as words having several meanings
(polysemy) or different words having similar meanings (synonymy), but developers encode rules into their
NLU systems and train them to learn to apply the rules correctly.
Natural Language Generation (NLG) acts as a translator that converts the computerized data into natural
language representation. It mainly involves Text planning, Sentence planning, and Text Realization.
NLG is much simpler to accomplish. NLG converts a computer’s machine-readable language into text and can also
convert that text into audible speech using text-to-speech technology.
First, the NLP system identifies what data should be converted to text. If you asked the computer a question
about the weather, it most likely did an online search to find your answer, and from there it decides that the
temperature, wind, and humidity are the factors that should be read aloud to you.
Then, it organizes the structure of how it's going to say it. This is similar to NLU, except in reverse. An NLG system can construct full sentences using a lexicon and a set of grammar rules.
Finally, text-to-speech takes over. The text-to-speech engine uses a prosody model to evaluate the text and
identify breaks, duration, and pitch. The engine then combines all the recorded phonemes into one cohesive
string of speech using a speech database.
NLU                                                NLG
NLU is the process of reading and                  NLG is the process of writing or generating
interpreting language.                             language.
Advantages of NLP
o NLP helps users to ask questions about any subject and get a direct response within seconds.
o NLP offers exact answers to the question means it does not offer unnecessary and unwanted information.
o NLP helps computers to communicate with humans in their languages.
o It is very time efficient.
o Most of the companies use NLP to improve the efficiency of documentation processes, accuracy of
documentation, and identify the information from large databases.
Disadvantages of NLP
ML engineer: Designing and deploying various machine learning models, including NLP.
Phases of NLP
1. Lexical Analysis
The first phase of NLP is Lexical Analysis. This phase scans the text as a stream of characters and converts it into meaningful lexemes. It divides the whole text into paragraphs, sentences, and words, and involves identifying and analysing the structure of words. The lexicon of a language is the collection of words and phrases in that language.
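The lexical-analysis step of dividing text into sentences and words can be sketched with the standard library's re module. This is a simplification of real tokenizers, which handle abbreviations, contractions, and punctuation far more carefully:

```python
# Sketch of lexical analysis: split raw text into sentences, then into words.
import re

def lexical_analysis(text):
    # Split on sentence-ending punctuation, then extract word tokens.
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    words = [re.findall(r"[A-Za-z']+", s) for s in sentences]
    return sentences, words

text = "NLP is fun. It has many phases!"
sentences, words = lexical_analysis(text)
print(sentences)  # ['NLP is fun', 'It has many phases']
print(words)      # [['NLP', 'is', 'fun'], ['It', 'has', 'many', 'phases']]
```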
2. Syntactic Analysis
Syntactic Analysis is used to check grammar and word arrangement, and shows the relationships among the words. It involves analysing the words in a sentence for grammar and arranging them in a manner that shows those relationships. A sentence such as “The school goes to boy” is rejected by an English syntactic analyser.
A sentence like “Agra goes to the Poonam” does not make any sense in the real world, so it is rejected by the syntactic analyser.
3. Semantic Analysis
Semantic analysis is concerned with the meaning representation. It mainly focuses on the literal meaning
of words, phrases, and sentences. It draws the exact meaning or the dictionary meaning from the text. The
text is checked for meaningfulness. It is done by mapping syntactic structures and objects in the task
domain. The semantic analyzer disregards sentence such as “hot ice-cream”.
4. Discourse Integration
Discourse Integration depends upon the sentences that precede a given sentence and also invokes the meaning of the sentences that follow it. The meaning of any sentence depends upon the meaning of the sentence just before it, and it also influences the meaning of the immediately succeeding sentence.
5. Pragmatic Analysis
Pragmatic Analysis is the fifth and last phase of NLP. It helps you discover the intended effect by applying a set of rules that characterize cooperative dialogues. During this phase, what was said is re-interpreted as what was actually meant. It involves deriving those aspects of language which require real-world knowledge.
Applications of NLP
There are the following applications of NLP -
1. Question Answering
Question Answering focuses on building systems that automatically answer the questions asked by
humans in a natural language.
2. Spam Detection
Spam detection is used to detect unwanted e-mails reaching a user's inbox.
3. Sentiment Analysis
Sentiment Analysis is also known as opinion mining. It is used on the web to analyse the attitude, behaviour, and emotional state of the sender. This application is implemented through a combination of NLP (Natural Language Processing) and statistics by assigning values to the text (positive, negative, or neutral) and identifying the mood of the context (happy, sad, angry, etc.).
4. Machine Translation
Machine translation is used to translate text or speech from one natural language to another natural
language.
5. Spelling Correction
Microsoft Corporation provides word-processor software such as MS Word and PowerPoint with spelling correction.
6. Speech Recognition
Speech recognition is used for converting spoken words into text. It is used in applications, such as mobile,
home automation, video recovery, dictating to Microsoft Word, voice biometrics, voice user interface,
and so on.
7. Chatbot
Implementing chatbots is one of the important applications of NLP. Chatbots are used by many companies to provide customer chat services.
8. Information extraction
Information extraction is one of the most important applications of NLP. It is used for extracting structured
information from unstructured or semi-structured machine-readable documents.
It converts a large set of text into more formal representations, such as first-order logic structures, that are easier for computer programs to manipulate.
Form of Learning
1. Supervised Learning:
In supervised learning, the algorithm is trained on a labeled dataset, which means that it learns
from input-output pairs. It tries to map inputs to correct outputs.
Common applications include classification and regression tasks, such as image recognition and
predicting house prices.
2. Unsupervised Learning:
Unsupervised learning involves training an algorithm on unlabeled data, and the goal is to find
patterns or structures within the data.
Clustering and dimensionality reduction are typical unsupervised learning tasks, as seen in
customer segmentation and feature extraction.
3. Semi-Supervised Learning:
Semi-supervised learning combines elements of both supervised and unsupervised learning. It uses
a small amount of labeled data and a large amount of unlabeled data for training.
It's useful when labeling data is expensive or time-consuming.
4. Reinforcement Learning:
Reinforcement learning involves an agent that learns to interact with an environment to maximize
a reward. It learns by trial and error, receiving feedback in the form of rewards or penalties.
Applications include game playing, robotics, and autonomous systems.
5. Self-Supervised Learning:
Self-supervised learning is a type of unsupervised learning where the algorithm creates its own
labels or tasks from the data.
It's often used for pre-training deep neural networks, where the model learns to predict parts of the
input data from other parts.
6. Transfer Learning:
Transfer learning leverages knowledge gained from one task to improve performance on another
related task.
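The supervised-learning idea of mapping labelled inputs to outputs can be sketched with a minimal 1-nearest-neighbour classifier. The tiny dataset, feature values, and labels below are illustrative assumptions:

```python
# Minimal supervised learning: a 1-nearest-neighbour classifier that
# "learns" from labelled (features, label) pairs and maps new inputs
# to the label of the closest training point.
def nearest_neighbour(train, x):
    """train: list of ((f1, f2), label); returns label of the closest point."""
    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(train, key=lambda pair: sq_dist(pair[0], x))[1]

# Illustrative labelled dataset (two clusters).
train = [((1.0, 1.0), "small"), ((1.2, 0.8), "small"),
         ((5.0, 5.0), "large"), ((5.5, 4.5), "large")]

print(nearest_neighbour(train, (1.1, 0.9)))  # 'small'
print(nearest_neighbour(train, (5.2, 5.1)))  # 'large'
```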
Inductive Learning
Inductive Learning Algorithm (ILA) is an iterative and inductive machine learning algorithm that is used
for generating a set of classification rules, which produces rules of the form “IF-THEN”, for a set of
examples, producing rules at each iteration and appending to the set of rules.
There are basically two methods for knowledge extraction: from domain experts, and through machine learning. For very large amounts of data, domain experts are not very useful or reliable, so we move towards the machine learning approach. One method is to replicate the expert's logic in the form of algorithms, but this is tedious, time-consuming, and expensive. So we move towards inductive algorithms, which generate the strategy for performing a task without needing separate instruction at each step.
Inductive learning is a machine learning technique in which a model is trained to make predictions based on examples or observations. During inductive learning, the model picks up knowledge from particular examples or instances and generalizes it so that it can predict outcomes for brand-new data.
When using inductive learning, a rule or method is not explicitly programmed into the model. Instead, the
model is trained to spot trends and connections in the input data and then utilize this knowledge to predict
outcomes from fresh data. Making a model that can precisely anticipate the result of subsequent instances
is the aim of inductive learning.
In supervised learning situations, where the model is trained using labeled data, inductive learning is
frequently utilized. A series of samples with the proper output labels are used to train the model. The model
then creates a mapping between the input data and the output data using this training data. The output for
fresh instances may be predicted using the model after it has been trained.
Inductive learning is used by a number of well-known machine learning algorithms, such as decision trees,
k-nearest neighbors, and neural networks. Because it enables the development of models that can accurately
anticipate new data, even when the underlying patterns and relationships are complicated and poorly
understood, inductive learning is an essential method for machine learning.
Advantages
Disadvantages
May overfit to particular data − Inductive learning models that have overfit to specific training data, or that
have learned the noise in the data rather than the underlying patterns, may perform badly on fresh data.
May be computationally costly − The employment of inductive learning models in real-time applications may be constrained by their computationally costly nature, especially for complex datasets.
Limited interpretability − Inductive learning models may be difficult to understand, making it difficult to
understand how they arrive at their predictions, in applications where the decision-making process must be
transparent and explicable.
Inductive learning models are only as good as the data they are trained on, therefore if the data is inaccurate
or inadequate, the model may not perform effectively.
o Decision Trees usually mimic human thinking ability while making a decision, so it is easy to
understand.
o The logic behind the decision tree can be easily understood because it shows a tree-like structure.
For the next node, the algorithm again compares the attribute value with the other sub-nodes and moves further. It continues this process until it reaches a leaf node of the tree. The complete process can be better understood using the algorithm below:
o Step-1: Begin the tree with the root node, says S, which contains the complete dataset.
o Step-2: Find the best attribute in the dataset using Attribute Selection Measure (ASM).
o Step-3: Divide S into subsets that contain possible values for the best attributes.
o Step-4: Generate the decision tree node, which contains the best attribute.
o Step-5: Recursively make new decision trees using the subsets of the dataset created in Step-3. Continue this process until a stage is reached where you cannot further classify the nodes; the final node is called a leaf node.
Example: Suppose there is a candidate who has a job offer and wants to decide whether he should accept
the offer or Not. So, to solve this problem, the decision tree starts with the root node (Salary attribute by
ASM). The root node splits further into the next decision node (distance from the office) and one leaf
node based on the corresponding labels. The next decision node further gets split into one decision node
(Cab facility) and one leaf node. Finally, the decision node splits into two leaf nodes (Accepted offers and
Declined offer). Consider the below diagram:
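The job-offer tree from the example can be written as nested conditions. The decision nodes (salary, distance from the office, cab facility) follow the example; the numeric thresholds are illustrative assumptions, since the text does not give them:

```python
# The job-offer decision tree as nested if/else branches.
# Thresholds (50000, 10 km) are assumed for illustration.
def decide(salary, distance_km, cab_facility):
    if salary < 50000:               # root node: Salary attribute
        return "Declined offer"
    if distance_km <= 10:            # decision node: distance from office
        return "Accepted offer"
    if cab_facility:                 # decision node: cab facility
        return "Accepted offer"
    return "Declined offer"         # leaf node

print(decide(60000, 5, False))   # 'Accepted offer'
print(decide(60000, 25, False))  # 'Declined offer'
print(decide(40000, 5, True))    # 'Declined offer'
```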
Explanation-based learning can learn from just one training event. Rather than taking multiple examples, explanation-based learning emphasises learning from a single, specific example, such as the game of Ludo.
Explanation-based learning (EBL) is a form of learning in artificial intelligence (AI) that involves
understanding the underlying domain or problem space and generating explanations for specific instances.
Unlike other machine learning approaches that rely on numerous examples, EBL focuses on a deep
understanding of individual examples, generating explanations based on underlying principles, and
generalizing those explanations to cover other related instances.
Here’s a more detailed overview of EBL:
1. Analyzing the Problem: EBL starts by analyzing a particular instance of a problem, applying domain
knowledge to understand why a specific solution works.
2. Generating an Explanation: Based on the analysis, EBL generates an explanation of why the solution is
correct. This explanation leverages domain knowledge, logical reasoning, and underlying principles to
articulate why the solution makes sense.
3. Generalizing the Explanation: The explanation is then generalized to create a rule or a pattern that can
be applied to other similar instances. This generalization allows the system to learn from a single example
and apply the learning to many related cases.
4. Storing the Generalized Rule: The generalized rule is stored and used to solve similar problems in the
future. By having this rule, the system can quickly solve related problems without having to analyze each
one individually.
5. Integration with Other Learning Techniques: EBL can be integrated with other learning techniques,
such as inductive learning and analogical reasoning, to form more comprehensive and robust learning
models.
The advantage of EBL is that it can provide a deep understanding of specific problems, allowing for intelligent
generalization to related instances. However, it requires significant domain knowledge and logical reasoning
capabilities, making it a more complex approach than many other machine learning techniques.
Neural Network
The term "Artificial Neural Network" is derived from Biological neural networks that develop the
structure of a human brain. Similar to the human brain that has neurons interconnected to one another,
artificial neural networks also have neurons that are interconnected to one another in various layers of the
networks. These neurons are known as nodes.
To understand the architecture of an artificial neural network, we have to understand what a neural network consists of: a large number of artificial neurons, termed units, arranged in a sequence of layers. Let us look at the various types of layers available in an artificial neural network.
Input Layer:
As the name suggests, it accepts inputs in several different formats provided by the programmer.
Hidden Layer:
The hidden layer sits between the input and output layers. It performs all the calculations needed to find hidden features and patterns.
Output Layer:
The input goes through a series of transformations via the hidden layers, which finally results in the output.
The artificial neural network takes input and computes the weighted sum of the inputs and includes a bias.
This computation is represented in the form of a transfer function.
The weighted total is then passed as an input to an activation function to produce the output. Activation functions determine whether a node should fire or not; only those that fire make it to the output layer. There are distinctive activation functions available that can be applied depending on the sort of task we are performing.
Afterward, each input is multiplied by its corresponding weight (these weights are the details used by the artificial neural network to solve a specific problem). In general terms, these weights represent the strength of the interconnections between neurons inside the artificial neural network. All the weighted inputs are summed inside the computing unit.
If the weighted sum is zero, a bias is added to make the output non-zero, or to otherwise scale up the system's response. The bias has a fixed input of 1 with its own associated weight.
The activation function refers to the set of transfer functions used to achieve the desired output. There are different kinds of activation functions, primarily linear and non-linear sets of functions. Some of the commonly used activation functions are the binary, linear, and tan hyperbolic sigmoidal activation functions.
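The computation described above, a weighted sum of inputs plus a bias passed through an activation function, can be sketched for a single neuron. The sigmoid is used here as one common non-linear activation; the particular weights, bias, and inputs are illustrative assumptions:

```python
# A single artificial neuron: weighted sum of inputs plus bias (the
# transfer function), passed through a sigmoid activation.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # transfer function
    return sigmoid(z)                                       # activation

# Illustrative inputs and parameters: z = 1.0*0.4 + 0.5*(-0.2) + 0.1 = 0.4
out = neuron([1.0, 0.5], [0.4, -0.2], bias=0.1)
print(round(out, 4))  # 0.5987
```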
There are various types of Artificial Neural Networks (ANN), modelled on the neurons and functions of the human brain, which perform tasks in a similar way. The majority of artificial neural networks have some similarities with their more complex biological counterpart and are very effective at their expected tasks, for example segmentation or classification.
Feedback ANN:
In this type of ANN, the output is fed back into the network to internally arrive at the best-evolved results. According to the University of Massachusetts Lowell Centre for Atmospheric Research, feedback networks feed information back into themselves and are well suited to solving optimization problems. Internal system error corrections utilize feedback ANNs.
Feed-Forward ANN:
A feed-forward network is a basic neural network comprising an input layer, an output layer, and at least one layer of neurons. By assessing its output against its input, the intensity of the network can be observed based on the group behaviour of the associated neurons, and the output is decided. The primary advantage of this network is that it learns to evaluate and recognize input patterns.
Genetic learning
Genetic Algorithms (GAs) are adaptive heuristic search algorithms that belong to the larger class of evolutionary algorithms. Genetic algorithms are based on the ideas of natural selection and genetics.
These are intelligent exploitation of random search provided with historical data to direct the search into
the region of better performance in solution space. They are commonly used to generate high-quality
solutions for optimization problems and search problems.
Genetic algorithms simulate the process of natural selection, which means those species that can adapt to changes in their environment survive, reproduce, and go on to the next generation. In simple words, they simulate “survival of the fittest” among the individuals of consecutive generations to solve a problem. Each generation consists of a population of individuals, and each individual represents a point in the search space and a possible solution.
Genetic Algorithms are being widely used in different real-world applications, for example, Designing
electronic circuits, code-breaking, image processing, and artificial creativity.
o Population: Population is the subset of all possible or probable solutions, which can solve the
given problem.
o Chromosomes: A chromosome is one of the solutions in the population for the given problem; a collection of genes generates a chromosome.
o Gene: A gene is an element of the chromosome; a chromosome is divided into different genes.
o Allele: An allele is the value given to a gene within a particular chromosome.
o Fitness Function: The fitness function is used to determine the individual's fitness level in the
population. It means the ability of an individual to compete with other individuals. In every
iteration, individuals are evaluated based on their fitness function.
o Genetic Operators: In a genetic algorithm, the best individuals mate to produce offspring better than their parents. Genetic operators change the genetic composition of the next generation; the main operators are selection, crossover, and mutation.
It basically involves five phases to solve the complex optimization problems, which are given as below:
o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
1. Initialization
The process of a genetic algorithm starts by generating a set of individuals, called the population. Each individual is a candidate solution to the given problem. An individual is characterized by a set of parameters called genes. Genes are combined into a string to form a chromosome, which represents a solution to the problem. One of the most popular initialization techniques is the use of random binary strings.
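The random-binary-string initialization described above can be sketched in Python; the function name and parameters are illustrative:

```python
import random

def init_population(pop_size, chromosome_length, seed=None):
    # Each individual (chromosome) is a random binary string;
    # each bit is one gene.
    rng = random.Random(seed)
    return [[rng.randint(0, 1) for _ in range(chromosome_length)]
            for _ in range(pop_size)]

# 6 individuals, each a chromosome of 8 genes (bits)
population = init_population(pop_size=6, chromosome_length=8, seed=42)
```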
2. Fitness Assignment
The fitness function evaluates each individual and assigns it a fitness score, which reflects how good a solution the individual represents. The fitness score determines the probability that an individual is selected for reproduction.
3. Selection
The selection phase involves selecting individuals for the reproduction of offspring. The selected individuals are arranged in pairs, and they pass their genes on to the next generation.
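One common way to implement this phase is fitness-proportionate (roulette-wheel) selection, sketched below; tournament selection is another popular choice. The helper names and the `sum` fitness function are illustrative assumptions:

```python
import random

def roulette_selection(population, fitness_fn, rng=None):
    # Roulette-wheel selection: the probability of being picked is
    # proportional to an individual's fitness score.
    rng = rng or random.Random()
    total = sum(fitness_fn(ind) for ind in population)
    pick = rng.uniform(0, total)
    running = 0.0
    for ind in population:
        running += fitness_fn(ind)
        if running >= pick:
            return ind
    return population[-1]

# Fitness here is simply the number of 1-bits in the chromosome.
pop = [[1, 1, 1], [1, 0, 0], [0, 0, 0]]
parent = roulette_selection(pop, fitness_fn=sum)
```

Fitter chromosomes occupy a larger slice of the "wheel", so they are chosen more often, but weaker individuals still have a chance, which preserves diversity.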
4. Reproduction
After the selection process, the creation of a child occurs in the reproduction step. In this step, the genetic
algorithm uses two variation operators that are applied to the parent population. The two operators
involved in the reproduction phase are given below:
o Crossover: Crossover plays the most significant role in the reproduction phase of the genetic algorithm. In this process, a crossover point is selected at random within the genes. The crossover operator then swaps the genetic information of two parents from the current generation to produce a new individual representing the offspring.
The genes of the parents are exchanged up to the crossover point, and the newly generated offspring are added to the population. This process is also called recombination. Types of crossover available:
o One-point crossover
o Two-point crossover
o Uniform crossover
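One-point crossover can be sketched as follows; the function name is illustrative:

```python
import random

def one_point_crossover(parent_a, parent_b, point=None):
    # Swap the tails of the two parents after a (randomly chosen) point.
    if point is None:
        point = random.randint(1, len(parent_a) - 1)
    child1 = parent_a[:point] + parent_b[point:]
    child2 = parent_b[:point] + parent_a[point:]
    return child1, child2

c1, c2 = one_point_crossover([1, 1, 1, 1, 1, 1],
                             [0, 0, 0, 0, 0, 0], point=2)
# c1 == [1, 1, 0, 0, 0, 0] and c2 == [0, 0, 1, 1, 1, 1]
```

Two-point crossover works the same way but swaps only the segment between two cut points.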
o Mutation: The mutation operator inserts random genes into the offspring (new child) to maintain diversity in the population. It can be done by flipping some bits in the chromosome. Mutation helps solve the problem of premature convergence and enhances diversification. Types of mutation available:
o Flip bit mutation
o Gaussian mutation
o Exchange/Swap mutation
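Flip-bit mutation on a binary chromosome can be sketched as follows; the function name and the default mutation rate are illustrative:

```python
import random

def flip_bit_mutation(chromosome, mutation_rate=0.1, rng=None):
    # Flip each bit independently with probability mutation_rate
    # to keep diversity in the population.
    rng = rng or random.Random()
    return [1 - bit if rng.random() < mutation_rate else bit
            for bit in chromosome]

mutated = flip_bit_mutation([0, 1, 1, 0, 1], mutation_rate=0.1)
```

A rate of 0.0 leaves the chromosome unchanged, while a rate of 1.0 flips every bit; practical rates are usually small (a few percent per gene).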
5. Termination
After the reproduction phase, a stopping criterion is applied as the basis for termination. The algorithm terminates once an individual with the threshold fitness is found (or another stopping condition, such as a maximum number of generations, is met), and the best individual in the population is identified as the final solution.
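The five phases above can be combined into one toy genetic algorithm. The sketch below maximizes the number of 1-bits in a chromosome (the classic "OneMax" problem); all names, parameter values, and the choice of tournament selection are illustrative assumptions:

```python
import random

def genetic_algorithm(chrom_len=10, pop_size=20, generations=50,
                      mutation_rate=0.05, seed=0):
    """Toy GA maximizing the number of 1-bits (the OneMax problem)."""
    rng = random.Random(seed)
    fitness = sum  # fitness score = count of 1-bits in the chromosome

    # 1. Initialization: random binary strings
    pop = [[rng.randint(0, 1) for _ in range(chrom_len)]
           for _ in range(pop_size)]

    for _ in range(generations):
        # 5. Termination: stop early once a perfect individual appears
        if max(fitness(ind) for ind in pop) == chrom_len:
            break

        # 2 + 3. Fitness assignment and (tournament) selection
        def select():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b

        next_pop = []
        while len(next_pop) < pop_size:
            p1, p2 = select(), select()
            # 4. Reproduction: one-point crossover ...
            point = rng.randint(1, chrom_len - 1)
            child = p1[:point] + p2[point:]
            # ... followed by flip-bit mutation
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_pop.append(child)
        pop = next_pop

    return max(pop, key=fitness)

best = genetic_algorithm()
```

Each pass through the loop is one generation; the population gradually drifts toward chromosomes of all 1s because fitter parents are selected more often.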
Expert System
An expert system solves the most complex issues as an expert would, by extracting the knowledge stored in its knowledge base. The system helps in decision-making for complex problems using both facts and heuristics, like a human expert. It is called an expert system because it contains the expert knowledge of a specific domain and can solve any complex problem of that particular domain. These systems are designed for a specific domain, such as medicine or science.
Characteristics of an expert system:
o High Performance: The expert system provides high performance for solving any type of complex problem of a specific domain with high efficiency and accuracy.
o Understandable: It responds in a way that can be easily understandable by the user. It can take
input in human language and provides the output in the same way.
o Reliable: It is highly reliable for generating efficient and accurate output.
o Highly responsive: ES provides the result for any complex query within a very short period of
time.
Components of an expert system:
o User Interface
o Inference Engine
o Knowledge Base
The knowledge base contains two types of knowledge:
o Factual Knowledge: Knowledge that is based on facts and accepted by knowledge engineers comes under factual knowledge.
o Heuristic Knowledge: This knowledge is based on practice, the ability to guess, evaluation, and
experiences.
Knowledge Representation: It is used to formalize the knowledge stored in the knowledge base, typically as IF-THEN-ELSE rules.
Knowledge Acquisition: It is the process of extracting, organizing, and structuring the domain knowledge, specifying the rules to acquire knowledge from various experts, and storing that knowledge in the knowledge base.
Knowledge Acquisition
The success of any expert system majorly depends on the quality, completeness, and accuracy of the
information stored in the knowledge base.
The knowledge base is built from the contributions of various experts, scholars, and knowledge engineers.
The knowledge engineer is a person with empathy, quick learning, and case-analysis skills. The knowledge engineer acquires information from subject experts by recording, interviewing, and observing them at work, then categorizes and organizes the information in a meaningful way, in the form of IF-THEN-ELSE rules, to be used by the inference engine. The knowledge engineer also monitors the development of the ES.
Advantages of Expert System
o These systems are highly reproducible.
o They can be used for risky places where the human presence is not safe.
o Error possibilities are less if the KB contains correct knowledge.
o The performance of these systems remains steady as it is not affected by emotions, tension, or
fatigue.
o They provide a very high speed to respond to a particular query.
Limitations of Expert System
o The response of the expert system may be wrong if the knowledge base contains wrong information.
Beyond the above components, some ES shells provide facilities for database connectivity through an interpreter, web integration, and natural language processing (NLP) features. The knowledge base can be connected to an external database (like MySQL), since the knowledge base is not optimal for storing extensive data. The knowledge base cannot directly access the database; such access is mediated through an interpreter.
Some ES shells have built-in knowledge-base editors, which let the knowledge engineer easily update and check the knowledge base. The knowledge engineer collects expert knowledge in a specific domain and models it to populate the knowledge base.
The inference engine, the most important part of an expert system, accesses the knowledge base and solves the problem by either backward chaining or forward chaining over the facts and rules in the knowledge base. In ES shells, the inference engine is also a built-in component, usually programmed in Prolog.
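Forward chaining can be illustrated with a minimal sketch: start from known facts and repeatedly fire any rule whose conditions are all satisfied, adding its conclusion, until nothing new can be derived. The rule format and the medical-domain rules below are hypothetical, for illustration only:

```python
def forward_chain(facts, rules):
    # Forward chaining: fire every rule whose conditions are all known
    # facts, add its conclusion, and repeat until no new fact appears.
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in facts and all(c in facts for c in conditions):
                facts.add(conclusion)
                changed = True
    return facts

# Hypothetical IF-THEN rules: (set of conditions, conclusion)
rules = [
    ({"fever", "cough"}, "flu_suspected"),
    ({"flu_suspected", "high_temp"}, "see_doctor"),
]
derived = forward_chain({"fever", "cough", "high_temp"}, rules)
```

Backward chaining works in the opposite direction: it starts from a goal (e.g. "see_doctor") and recursively checks whether some rule's conditions can be established from the known facts.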
Most ES shells are composed of another component called ‘Explanation System’ which provides the user
with reasons and explanations to provide a certain answer, by considering the ‘case specification data’
available.
In an expert system shell, the design of the user interface and other software components is programmed by the software engineer. An expert system is therefore a collaborative design of three major parties: the expert, the knowledge engineer, and the software engineer. (Depending on the size of the project, these parties may vary from individuals to large teams.)
The advantage of expert system shells was that they were somewhat easier for nonprogrammers to use.
The advantage of Prolog environments was that they were not focused only on if-then rules; Prolog
environments provided a much better realization of a complete first-order logic environment.
(Subject In-charge)
Prof. S. B. Mehta