30.09.2019 Parsing HTML in PHP using Native Classes | CoralNodes
https://www.coralnodes.com/parsing-html-in-php/

Parsing HTML in PHP using Native Classes

Updated on March 19, 2019 by Abhinav

Disclosure: Affiliate links used

As you might already know, PHP is a popular backend language that powers many well-known CMSs, including WordPress. If you are stepping into WordPress or PHP development, you will find this article helpful.

You might already know how to parse HTML using JavaScript or jQuery if you have ever dealt with DOM (Document Object Model) manipulation on the front-end.

Related: Should you learn JQuery in 2019?

Since JavaScript runs on the client-side, it can interact with the browser DOM.

But what if we want to process HTML data on the server? In this post, let us look at some of the useful PHP classes that enable us to process HTML on the server-side.

Table of Contents
1. What is Parsing & What are its Uses?
2. Important DOM classes in PHP
3. DOMDocument, Nodes & Elements
4. Practical Examples
   4.1. Selecting by ID
   4.2. Selecting a Tag by Its Name
   4.3. Find elements with a particular class
   4.4. Extract links from a page
   4.5. Modifying & Saving HTML
      4.5.1. Inserting new HTML element into the document
      4.5.2. Deleting an element from the document
   4.6. Manipulating Attributes
5. Conclusion

What is Parsing & What are its Uses?

“Parsing (in this case) is the process of extracting or modifying useful information from an HTML or XML string. A parser gives us easy ways to query raw data instead of using regex.”


Suppose you want to get all the links on a web page. PHP DOM parsing classes can help you.

The Table of Contents you see above is another simple application of PHP DOM parsing classes. The plugin that generates it extracts all the headings from the page, sorts them, builds a new element from them, and inserts it back into the page content.
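The plugin's own code isn't shown here, but the idea can be sketched with the classes covered below. This is a minimal sketch that assumes the headings of interest are `h2` and `h3` elements and that the list should go just before the first heading. The helper name `build_toc` and the generated `toc-N` ids are hypothetical. Unlike the plugin described above, it keeps the headings in document order rather than sorting them.

```php
<?php
// Hypothetical helper: number the h2/h3 headings, give each an id,
// and insert a linked list of them just before the first heading
function build_toc(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect parser warnings quietly
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $headings = $xpath->query('//h2 | //h3'); // results come back in document order
    if ($headings->length === 0) {
        return $dom->saveHTML();
    }

    $list = $dom->createElement('ul');
    $n = 0;
    foreach ($headings as $heading) {
        $id = 'toc-' . ++$n;
        $heading->setAttribute('id', $id);

        $link = $dom->createElement('a');
        $link->setAttribute('href', '#' . $id);
        // A text node is escaped on output; createElement()'s value argument does not escape "&"
        $link->appendChild($dom->createTextNode(trim($heading->textContent)));

        $item = $dom->createElement('li');
        $item->appendChild($link);
        $list->appendChild($item);
    }

    $first = $headings->item(0);
    $first->parentNode->insertBefore($list, $first);

    return $dom->saveHTML();
}

echo build_toc('<h2>Intro</h2><p>Text</p><h3>Details</h3><h2>Conclusion</h2>');
```

A real plugin would also need to avoid duplicate ids and skip headings that already have one. This sketch handles neither case.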

Important DOM classes in PHP

There are around nineteen DOM-related classes in PHP. Some of the important ones are:

DOMDocument (extends the DOMNode class)
DOMNode
DOMNodeList
DOMXPath
DOMElement (extends the DOMNode class)
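A quick way to see how these classes relate is to load a small document and check what each call returns. This is a minimal sketch, and the sample markup is made up for illustration.

```php
<?php
$dom = new DOMDocument();
@$dom->loadHTML('<div id="box"><p>One</p><p>Two</p></div>');

$box = $dom->getElementById('box');             // a DOMElement
$paragraphs = $dom->getElementsByTagName('p');  // a DOMNodeList
$xpath = new DOMXPath($dom);                    // DOMXPath wraps a document
$found = $xpath->query('//p');                  // also a DOMNodeList

// DOMDocument and DOMElement both extend DOMNode
var_dump($dom instanceof DOMNode);    // bool(true)
var_dump($box instanceof DOMElement); // bool(true)
var_dump($box instanceof DOMNode);    // bool(true)
var_dump($paragraphs->length);        // int(2)
var_dump($found->length);             // int(2)
```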

DOMDocument, Nodes & Elements

DOMDocument is the first class to mention here. A DOMDocument object loads HTML or XML, from either a string or a file, and gives access to the DOM elements in it. The class defines several methods, such as getElementById(), that resemble their JavaScript counterparts.

$dom = new DOMDocument();

// examples: call only the method that matches your input

// methods to load HTML
$dom->loadHTML($html_string);             // from a string
$dom->loadHTMLFile('path/to/file.html');  // from a file (placeholder path)

// methods to load XML
$dom->load('path/to/xml_file.xml');       // from a file (placeholder path)
$dom->loadXML($xml_string);               // from a string

$documentElement = $dom->documentElement;
// object of DOMElement (the root element of the document)

In this post, we will mainly focus on manipulating HTML rather than XML.
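The examples below silence parser warnings with the `@` operator. If you would rather inspect those warnings than hide them, one alternative is `libxml_use_internal_errors()`. It makes libxml collect errors so you can read them with `libxml_get_errors()`. This is a minimal sketch with made-up markup. The HTML5 `<section>` tag is included only because libxml's HTML parser may report it as an error, depending on the libxml version.

```php
<?php
// Made-up sample markup
$html = '<section><h1 id="title">Hello</h1><p>World</p></section>';

$dom = new DOMDocument();
// Collect parser errors instead of emitting warnings; remember the old setting
$previous = libxml_use_internal_errors(true);
$dom->loadHTML($html);
// Each entry is a LibXMLError; keep the messages for logging or inspection
$messages = array_map(function ($error) {
    return trim($error->message);
}, libxml_get_errors());
libxml_clear_errors();
libxml_use_internal_errors($previous);

$root = $dom->documentElement; // the <html> element the parser adds around the fragment
echo $root->nodeName, PHP_EOL;                            // html
echo $dom->getElementById('title')->textContent, PHP_EOL; // Hello
```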

Nodes

The DOM built from HTML is a tree-like structure made up of individual nodes. A node can be of any type: an element, text, a comment, an attribute, etc. DOMNode is the base class from which all the node classes inherit.

Elements

The DOMElement class extends the DOMNode class and represents the elements in your HTML markup. A DOMElement object can be any element, such as an image, div, span, or table.
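To see these node types in practice, the sketch below walks the children of a small, made-up `<div>` and labels each one by its `nodeType`. Only element children are `DOMElement` objects. Text and comments come back as `DOMText` and `DOMComment`, and all three extend `DOMNode`.

```php
<?php
$dom = new DOMDocument();
@$dom->loadHTML('<div id="box">Hello <b>bold</b><!-- note --></div>');
$box = $dom->getElementById('box');

$types = [];
foreach ($box->childNodes as $child) {
    // nodeType is one of the XML_*_NODE constants defined by the DOM extension
    switch ($child->nodeType) {
        case XML_ELEMENT_NODE:
            $label = 'element <' . $child->nodeName . '>';
            break;
        case XML_TEXT_NODE:
            $label = 'text "' . $child->nodeValue . '"';
            break;
        case XML_COMMENT_NODE:
            $label = 'comment';
            break;
        default:
            $label = 'other';
    }
    $types[] = get_class($child) . ': ' . $label;
}
print_r($types);

// Attributes are nodes too (DOMAttr)
echo get_class($box->getAttributeNode('id')), PHP_EOL; // DOMAttr
```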


Practical Examples

Without going further into theory, let us dive into some practical examples. First of all, we need some HTML data. For that, let us use one of the posts on this blog about image optimization.

We will do the following jobs with our sample HTML:

Select an element by ID
Get elements by their tag name
Find elements by class
Find all the links on a page
Insert an HTML element
Delete an element
Manipulate attributes

Here is the cURL request that fetches the page:


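header('Content-Type: text/html; charset=utf-8'); // render the echoed output as HTML (assumed value)
$url = "https://www.c..."; // URL of the sample post (truncated in the source)

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the page instead of printing it
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects (assumed; this line is truncated in the source)

$res = curl_exec($ch);

curl_close($ch);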

The variable $res contains the whole HTML of the web page.
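The request above does not check whether it succeeded. `curl_exec()` returns `false` on a transport error, and an error page such as a 404 still comes back as HTML. Below is a slightly more defensive variant. It assumes that any status other than 200 should count as a failure, and `fetch_html` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: fetch a page and return its HTML, or throw on failure
function fetch_html(string $url): string
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 15);          // give up after 15 seconds

    $body = curl_exec($ch);
    $error = curl_error($ch);
    $status = curl_getinfo($ch, CURLINFO_RESPONSE_CODE);
    curl_close($ch);

    if ($body === false) {
        throw new RuntimeException('Request failed: ' . $error);
    }
    if ($status !== 200) {
        throw new RuntimeException('Unexpected HTTP status: ' . $status);
    }
    return $body;
}
```

With this helper, the fetch becomes `$res = fetch_html($url);`, and a failed request surfaces as an exception instead of an empty or misleading `$res`.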

Selecting by ID

If you look at our sample page, you can see that it contains two tables. Suppose I want to find the number of rows in the first table. Using Chrome DevTools, I found that this table has the ID tablepress-3.

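$dom = new DOMDocument();
// @ hides the warnings libxml emits for markup it does not fully accept
@ $dom->loadHTML($res);

$table = $dom->getElementById('tablepress-3');
$child_elements = $table->getElementsByTagName('tr'); // all rows of the table
$row_count = $child_elements->length;

echo "No. of rows in the table: " . $row_count;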



The above code gives the output:

No. of rows in the t
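`getElementById()` returns `null` when no element has the given ID. The code above would therefore stop with an error if the table were renamed or removed. Here is a minimal defensive variant. It uses made-up markup in place of `$res`, and `count_table_rows` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: count the <tr> rows of the table with the given id,
// or return null if there is no such element
function count_table_rows(string $html, string $table_id): ?int
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    $table = $dom->getElementById($table_id);
    if ($table === null) {
        return null; // no element with that id
    }
    return $table->getElementsByTagName('tr')->length;
}

$html = '<table id="tablepress-3"><tr><th>A</th></tr><tr><td>1</td></tr><tr><td>2</td></tr></table>';
var_dump(count_table_rows($html, 'tablepress-3')); // int(3)
var_dump(count_table_rows($html, 'missing'));      // NULL
```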

Selecting a Tag by Its Name

Both the DOMDocument and DOMElement classes have the method getElementsByTagName(), which allows us to select elements by their tag name. For example, to get all the h2 headings from a page, we can use this method.

$dom = new DOMDocument();
@ $dom->loadHTML($res);

$h2s = $dom->getElementsByTagName('h2');
foreach( $h2s as $h2 ) {
    echo $h2->textContent . '<br>'; // one heading per line
}

The result:

Test Images
Results after Compre
ShortPixel


reSmush.it
Imagify
TinyPNG Compress JPE
Kraken.IO
EWWW Image Optimizer
WP Smush
Do you actually need
Consclusion
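Because `DOMElement` also has `getElementsByTagName()`, a search can be limited to one part of the page. In this sketch with made-up markup, only the links inside the `#menu` element are collected.

```php
<?php
$html = '<div id="menu"><a href="/a">A</a><a href="/b">B</a></div>'
      . '<p><a href="/c">C</a></p>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

$all = $dom->getElementsByTagName('a');      // the whole document
$menu = $dom->getElementById('menu');
$in_menu = $menu->getElementsByTagName('a'); // descendants of #menu only

$texts = [];
foreach ($in_menu as $a) {
    $texts[] = $a->textContent;
}
echo $all->length, ' ', $in_menu->length, PHP_EOL; // 3 2
echo implode(', ', $texts), PHP_EOL;              // A, B
```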

Find elements with a particular class

In JavaScript, the querySelectorAll() method makes it easy to select elements using a CSS selector. In PHP, it is not that straightforward. Instead, we have to use the DOMXPath class to query and traverse the DOM tree.

Example: Select all the tables with the class tablepress.

$dom = new DOMDocument();
@ $dom->loadHTML($res);

$xpath = new DOMXPath($dom);

// <table> elements whose class list contains the token "tablepress"
$tables = $xpath->query('//table[contains(concat(" ", normalize-space(@class), " "), " tablepress ")]');
$count = $tables->length;

echo "No. of tables " . $count;

Just like getElementsByTagName(), the query() method of DOMXPath returns a DOMNodeList. It takes an XPath expression as its argument, and XPath is versatile enough to express almost any type of query.
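When matching classes with XPath, note that `contains(@class, "tablepress")` is a plain substring test, so it also matches `class="tablepress-id-3"`. A common idiom pads the normalized class attribute with spaces and looks for the whole token. Here is a sketch with a hypothetical helper, `query_by_class`, and made-up markup.

```php
<?php
// Hypothetical helper: elements whose class list contains $class as a whole token
function query_by_class(DOMXPath $xpath, string $class): DOMNodeList
{
    // Assumes $class contains no double quotes, since it is pasted into the expression
    $expr = '//*[contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")]';
    return $xpath->query($expr);
}

$dom = new DOMDocument();
@$dom->loadHTML('<table class="tablepress big"></table><table class="tablepress-id-3"></table><div class="table"></div>');
$xpath = new DOMXPath($dom);

echo query_by_class($xpath, 'tablepress')->length, PHP_EOL;                 // 1
echo $xpath->query('//*[contains(@class, "tablepress")]')->length, PHP_EOL; // 2
echo query_by_class($xpath, 'table')->length, PHP_EOL;                      // 1
```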

If you are new to XPath, this cheatsheet from Devhints.io contains a wide list of CSS & JS selectors and their corresponding XPath expressions. It will help you find the appropriate expression for the query you want to perform.
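Here are a few common translations, written as a small lookup table and run against made-up markup. The pairs are standard XPath 1.0 equivalents, which is the XPath version `DOMXPath` supports. The table is illustrative, not exhaustive.

```php
<?php
$dom = new DOMDocument();
@$dom->loadHTML(
    '<div id="main"><p>Intro</p><ul><li>One</li><li>Two</li></ul>'
    . '<a href="https://wordpress.org/">WP</a><a href="/about">About</a></div>'
);
$xpath = new DOMXPath($dom);

// CSS selector => equivalent XPath 1.0 expression
$pairs = [
    '#main'             => '//*[@id="main"]',
    'div > p'           => '//div/p',
    'ul li'             => '//ul//li',
    'li:first-of-type'  => '//li[1]',
    'a[href^="https"]'  => '//a[starts-with(@href, "https")]',
];

foreach ($pairs as $css => $expr) {
    printf("%-18s %-38s %d match(es)\n", $css, $expr, $xpath->query($expr)->length);
}
```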

Extract links from a page

Parsing opens up a number of opportunities. Extracting the links from a web page is one such use; that's how crawlers crawl the World Wide Web.

Suppose I want to find all the external links to a particular website on a web page. In our sample page, I would like to find all the outbound links from the blog post to the wordpress.org website. This is how I did it.

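$dom = new DOMDocument();
@ $dom->loadHTML($res);

$links = $dom->getElementsByTagName('a');
$urls = [];
foreach( $links as $link ) {
    $url = $link->getAttribute('href');
    $parsed_url = parse_url($url);
    // keep only absolute links whose host is wordpress.org
    if( isset($parsed_url['host']) && $parsed_url['host'] === 'wordpress.org' ) {
        $urls[] = $url;
    }
}
var_dump($urls);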
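The same loop can be generalized to list every outbound link. This sketch assumes a link is "external" when it has a host different from the page's own host. The helper name `external_links_by_host`, the `$own_host` value, and the sample links are made up. Duplicates are removed, and relative links (which have no host) are skipped.

```php
<?php
// Hypothetical helper: absolute links whose host differs from $own_host, grouped by host
function external_links_by_host(string $html, string $own_host): array
{
    $dom = new DOMDocument();
    @$dom->loadHTML($html);

    $by_host = [];
    foreach ($dom->getElementsByTagName('a') as $link) {
        $url = $link->getAttribute('href');
        $host = parse_url($url, PHP_URL_HOST); // null for relative links, false if malformed
        if (!is_string($host) || strcasecmp($host, $own_host) === 0) {
            continue;
        }
        $by_host[strtolower($host)][$url] = true; // array keys drop duplicate URLs
    }
    // Turn each set of URL keys back into a list
    return array_map('array_keys', $by_host);
}

$html = '<a href="https://wordpress.org/plugins/">A</a>'
      . '<a href="https://wordpress.org/plugins/">A again</a>'
      . '<a href="https://example.com/x">B</a>'
      . '<a href="/contact">Internal</a>'
      . '<a href="https://www.coralnodes.com/about/">Own site</a>';
print_r(external_links_by_host($html, 'www.coralnodes.com'));
```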

Modifying & Saving HTML

So far, we have seen how to extract or select the required data from HTML. Now, let us see how we can modify it by adding or deleting elements and attributes.

Inserting new HTML element into the document

In this example, we will see how to add an image with a link after the first paragraph. This is one way to insert banner ads into post content.

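$dom = new DOMDocument();
@ $dom->loadHTML($html);

$ps = $dom->getElementsByTagName('p');
$first_para = $ps->item(0);

// hypothetical banner markup (the original string is truncated in the source)
$html_to_add = '<div><a href="https://example.com/"><img src="banner.jpg" alt="Banner"></a></div>';
$dom_to_add = new DOMDocument();
@ $dom_to_add->loadHTML($html_to_add);
$new_element = $dom_to_add->getElementsByTagName('div')->item(0);

// a node must belong to $dom before it can be inserted, so import a deep copy
$imported_element = $dom->importNode($new_element, true);
// insert after the first paragraph (appends if it is the last child)
$first_para->parentNode->insertBefore($imported_element, $first_para->nextSibling);

$output = @ $dom->saveHTML();
echo $output;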

Note that the saveHTML() method returns the manipulated HTML as a string.
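Parsing a second document and importing its node works for any markup. For a small, fixed element, you can also build the nodes directly with `createElement()` and `setAttribute()`, which avoids the second parser run. Here is a sketch with made-up content and a hypothetical banner URL and image path.

```php
<?php
$html = '<p>First paragraph</p><p>Second paragraph</p>';

$dom = new DOMDocument();
@$dom->loadHTML($html);
$first_para = $dom->getElementsByTagName('p')->item(0);

// Build <div><a href="..."><img src="..." alt="..."></a></div> node by node
$img = $dom->createElement('img');
$img->setAttribute('src', 'banner.jpg');             // hypothetical image path
$img->setAttribute('alt', 'Banner');
$link = $dom->createElement('a');
$link->setAttribute('href', 'https://example.com/'); // hypothetical target
$link->appendChild($img);
$div = $dom->createElement('div');
$div->appendChild($link);

// Nodes created by $dom already belong to it, so importNode() is not needed
$first_para->parentNode->insertBefore($div, $first_para->nextSibling);

echo $dom->saveHTML();
```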

Deleting an element from the document

To delete an element from our HTML, we can use the removeChild() method, which DOMElement inherits from the DOMNode class. It is called on the parent of the node being removed.

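$html = '<p>This is our first paragraph</p>
<div class="del">Delete this</div>
<p>This is our second paragraph</p>
<p>This is our third paragraph</p>
<div class="del">Delete this too</div>';

$dom = new DOMDocument();
@ $dom->loadHTML($html);
$documentElement = $dom->documentElement;
echo $dom->saveHTML();

$xpath = new DOMXPath($dom);
// all elements whose class list contains "del"
$elems = $xpath->query('//*[contains(concat(" ", normalize-space(@class), " "), " del ")]');

// remove each matched node through its parent
foreach( $elems as $elem ) {
    $elem->parentNode->removeChild($elem);
}
echo '<br><br>-------after deletion-------<br><br>';
echo $dom->saveHTML();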

Here we have performed an XPath query to find all the elements with the class del. Then we remove each node from the document by iterating over the DOMNodeList object using a foreach loop.


This is our first paragraph

Delete this
This is our second paragraph
This is our third paragraph
Delete this too

-------after deletion-------

This is our first paragraph

This is our second paragraph
This is our third paragraph
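The `foreach` in the deletion example is safe because `DOMXPath::query()` returns a fixed list. `getElementsByTagName()`, by contrast, returns a live list that shrinks as nodes are removed, so removing nodes inside a `foreach` over it can skip elements. Below is a sketch with made-up markup that shows the usual workaround: iterate backwards by index.

```php
<?php
$html = '<div><span>1</span><span>2</span><span>3</span><span>4</span></div>';

$dom = new DOMDocument();
@$dom->loadHTML($html);

$spans = $dom->getElementsByTagName('span'); // live list: reflects later removals
// Walk from the end so removing an item does not shift the ones not yet visited
for ($i = $spans->length - 1; $i >= 0; $i--) {
    $span = $spans->item($i);
    $span->parentNode->removeChild($span);
}

echo $spans->length, PHP_EOL; // 0
```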


Manipulating Attributes

Classes and IDs are not the only attributes we can access with PHP DOM. The DOMElement class has several methods that get, set, or remove attributes from an element. These methods look similar to their JavaScript counterparts, so you will find them easy to understand.

getAttribute($attribute_name) – get the value of an attribute
setAttribute($attribute_name, $attribute_value) – set the value of an attribute
hasAttribute($attribute_name) – check whether an element has a certain attribute; returns true or false

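// hypothetical sample markup (the original string is truncated in the source)
$html = '<span class="old-class">Some text</span>';

$dom = new DOMDocument();
@ $dom->loadHTML($html);
$elem = $dom->getElementsByTagName('span')->item(0);

if( $elem->hasAttribute('class') ) {
    echo 'attribute value: ' . $elem->getAttribute('class');
    $elem->setAttribute('class', 'new-class');
    echo '<br>updated value: ' . $elem->getAttribute('class');
}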
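The method list above does not include a way to remove an attribute. `DOMElement` also provides `removeAttribute()` for that, and every element exposes its attributes through the `attributes` property. Here is a short sketch using made-up markup.

```php
<?php
$dom = new DOMDocument();
@$dom->loadHTML('<img src="photo.jpg" alt="A photo" width="800" style="border:0">');
$img = $dom->getElementsByTagName('img')->item(0);

$img->removeAttribute('style'); // the style attribute is removed

// $img->attributes is a DOMNamedNodeMap of DOMAttr nodes
$attrs = [];
foreach ($img->attributes as $attr) {
    $attrs[$attr->name] = $attr->value;
}
print_r($attrs);                       // src, alt and width remain
var_dump($img->hasAttribute('style')); // bool(false)
```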

Conclusion

So far, we have looked at some of the important DOM APIs in PHP. I hope they help you get started with parsing HTML and XML data. If anything is unclear, do ask in the comments.

About the author
Abhinav R (Vishnu) is a blogger with a keen interest in learning web trends and exploring the world of WordPress. Apart from that, he also has a passion for nature photography and travel.

Posted in Guides & Tips Tagged Web Development


Copyright © 2011-2019 CoralNodes.Com Hosted with Cloudways on DigitalOcean
