Mar 30, 2015

In this blog, we explain how to scrape data from external websites. If you do not want to go deeper into this article, you can simply hire a web scraper at a competitive price.

Simple HTML DOM Parser is a PHP 5+ class for manipulating HTML elements. It works with both valid HTML and HTML that does not pass W3C validation. You can find elements by id, class, tag name and more, and you can add, delete or alter DOM elements. The one thing you need to watch for is memory leaks, and these can be avoided.

Get Started with PHP Simple HTML DOM Parser

After uploading the class file and including it in your script, create an instance of the simple_html_dom class.

There are three ways to load HTML into the DOM object:

  1. Load HTML from a file
  2. Load HTML from a URL
  3. Load HTML from a string
<?php
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from an HTML file
$html->load_file('path-to-file/example.html');
// Load HTML from a URL
$html->load_file('http://www.yourdomainname.com/');
// Load HTML from a string
$html->load('<html><body>All the Besttttt!</body></html>');
?>

If you want more control over the HTTP request, use cURL to fetch the HTML into a string and then load the DOM object from that string.
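As a rough sketch of that approach, the helper below fetches a page with cURL and hands the body to the parser. The function name fetch_html, the timeout, the user agent string and the simple_html_dom.php path are all assumptions made for illustration; they are not part of the library.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the class file

// Hypothetical helper (not part of Simple HTML DOM): fetch a URL with cURL
// and return the response body, or null if the request fails.
function fetch_html($url, $timeout = 10)
{
    $ch = curl_init($url);
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_TIMEOUT        => $timeout, // give up after this many seconds
        CURLOPT_USERAGENT      => 'ExampleScraper/1.0', // assumed value
    ));
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and HTTP error codes as failures
    if ($body === false || $status >= 400) {
        return null;
    }
    return $body;
}

$body = fetch_html('http://www.yourdomainname.com/');
if ($body !== null) {
    $html = new simple_html_dom();
    $html->load($body);   // load the DOM object from the fetched string
    echo count($html->find('a')), " links found\n";
    $html->clear();
}
?>
```

Fetching the page yourself like this lets you set headers, timeouts or cookies before the parser ever sees the HTML.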

Find HTML Elements using PHP Simple HTML DOM Parser

Use the find() function to locate HTML DOM elements. It returns an array of element objects, or a single object (or null) when you pass an index as the second argument.

Examples:

<?php
// Find elements by tag name, for example all <p> tags. This returns an array of element objects.
$p = $html->find('p');
// Find the element whose id equals a particular value, for example the div with id="header"
$main = $html->find('div[id=header]',0);
// Find the (N)th element, counting from 0. Returns an object, or null if no element is found.
$a = $html->find('a', 0);
// Find all elements that have an id attribute
$divs = $html->find('[id]');
// Find elements of a given tag that have an id attribute, for example all divs with an id
$divs = $html->find('div[id]');
?>
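Because find() returns an array in one case and an object or null in the other, it helps to handle both explicitly. Here is a small sketch using made-up markup; the simple_html_dom.php path is an assumption.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the class file

// Made-up markup for illustration
$html = str_get_html('<div id="header"><a href="/home">Home</a><a href="/blog">Blog</a></div>');

// Without an index, find() returns an array, so loop over it
$hrefs = array();
foreach ($html->find('a') as $a) {
    $hrefs[] = $a->href;
}

// With an index, find() returns one element or null, so check before using it
$footer = $html->find('div[id=footer]', 0);
$footer_text = ($footer !== null) ? $footer->plaintext : '(no footer found)';

echo implode(', ', $hrefs), "\n";
echo $footer_text, "\n";
?>
```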

Use “selectors” to find DOM Elements:

<?php
// Find all elements where id=header. Note that two elements with the same id are not valid HTML.
$result = $html->find('#header');
// Find all elements where class=container
$result = $html->find('.container');
// Find elements matching any of several tag names, here <b> and <p>
$result = $html->find('b, p');
// Find elements by tag name where a certain attribute exists, for example all anchors and images with a title attribute
$result = $html->find('a[title], img[title]');
?>

Parent, child and sibling elements selection using built-in functions:

<?php
// $result is assumed to be a single element here, e.g. $html->find('div', 0)
// Returns the parent of a DOM element
$result->parent();
// Returns the element's children in an array
$result->children();
// Returns the specified child
$result->children(0);
// Returns the first child of an element, or null if there is none
$result->first_child();
// Returns the last child of an element, or null if there is none
$result->last_child();
// Returns the previous sibling of an element
$result->prev_sibling();
// Returns the next sibling of an element
$result->next_sibling();
?>
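To see these traversal functions working together, here is a short sketch that walks a small list. The markup and the simple_html_dom.php path are assumptions for illustration only.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the class file

// Made-up markup for illustration
$html = str_get_html('<ul id="menu"><li>Home</li><li>Blog</li><li>Contact</li></ul>');

$ul     = $html->find('ul', 0);
$first  = $ul->first_child();      // first <li>
$second = $first->next_sibling();  // the <li> after it
$last   = $ul->last_child();       // last <li>

// Walking back up from a child reaches the same <ul>
$parent_id = $second->parent()->id;

// children() with no argument returns every child element
$labels = array();
foreach ($ul->children() as $li) {
    $labels[] = trim($li->plaintext);
}

echo $parent_id, ': ', implode(' | ', $labels), "\n";
?>
```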

Attribute Operators

Attribute selectors support a few operators, similar to simple regular expressions:

  • [attribute] – elements that have the specified attribute
  • [attribute=value] – elements that have the specified attribute with the given value
  • [attribute!=value] – elements that do not have the specified attribute set to the given value
  • [attribute*=value] – elements with the specified attribute whose value contains the given value
  • [attribute$=value] – elements with the specified attribute whose value ends with the given value
  • [attribute^=value] – elements with the specified attribute whose value begins with the given value
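The operators above can be tried out on a small piece of markup. This is only a sketch: the HTML string, the attribute values and the simple_html_dom.php path are made up for illustration.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the class file

// Made-up markup for illustration
$html = str_get_html(
    '<a href="https://example.com/report.pdf" title="Report">Report</a>' .
    '<a href="/about.html" title="About">About</a>' .
    '<img src="/img/logo-small.png" alt="Logo">'
);

$selectors = array(
    'a[title]',          // anchors that have a title attribute
    'a[title=Report]',   // title is exactly "Report"
    'a[title!=Report]',  // title is something other than "Report"
    'img[src*=logo]',    // src contains "logo"
    'a[href$=.pdf]',     // href ends with ".pdf"
    'a[href^=https]',    // href begins with "https"
);

// Count how many elements each selector matches
$counts = array();
foreach ($selectors as $selector) {
    $counts[$selector] = count($html->find($selector));
    echo $selector, ' matched ', $counts[$selector], " element(s)\n";
}
?>
```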

Accessing DOM Element Attributes with PHP Simple HTML DOM Parser

Element attributes are available as object properties:

<?php
$link = $html->find('a',0)->href;
?>

Each element object also has four special properties:

  1. tag – returns the tag name
  2. innertext – returns inner HTML of an element
  3. outertext – returns outer HTML of an element
  4. plaintext – returns plain text (without HTML tags)
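The difference between these four is easiest to see side by side. Here is a minimal sketch, assuming the class file is available as simple_html_dom.php; the markup is made up.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the class file

// Made-up markup for illustration
$html = str_get_html('<div id="intro"><p>Hello <b>world</b></p></div>');
$div  = $html->find('div[id=intro]', 0);

echo $div->tag, "\n";        // the tag name
echo $div->innertext, "\n";  // the HTML inside the element
echo $div->outertext, "\n";  // the element's own tags plus everything inside
echo $div->plaintext, "\n";  // the text content with HTML tags removed
?>
```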

Editing HTML Elements with PHP Simple HTML DOM Parser

Editing an attribute is similar to reading its value.

<?php
// Change or set attribute value
$a->href = 'http://www.yourdomainname.com';
// Remove an attribute.
$a->href = null;
// Check if attribute exists
if (isset($a->href)) {
    // do something here
}
?>

There are no dedicated functions for appending or removing elements, but you can get the same effect by rewriting an element's outertext:

<?php
// Wrap an element
$result->outertext = '<div class="wrap">' . $result->outertext . '</div>';
// Remove an element
$result->outertext = '';
// Append an element
$result->outertext = $result->outertext . '<div>header</div>';
// Insert an element
$result->outertext = '<div>header</div>' . $result->outertext;
?>

To output the edited document, use the DOM object as a string, for example by echoing it:

<?php
$doc = $html;
// Display the page
echo $doc;
?>
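If you would rather keep the result than print it, the class also has a save() method that returns the document as a string and can write it to a file. A short sketch follows; the markup, the result.html file name and the simple_html_dom.php path are assumptions.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the class file

// Made-up markup for illustration
$html = str_get_html('<div id="ad">Ad</div><a href="/old">Link</a>');

// Edit the document: change a link and remove the ad block
$html->find('a', 0)->href = '/new';
$html->find('div[id=ad]', 0)->outertext = '';

// save() with no argument returns the edited HTML as a string
$output = $html->save();

// Passing a path also writes the HTML to that file (hypothetical file name)
$html->save('result.html');

echo $output, "\n";
?>
```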

Prevent PHP Simple HTML DOM Parser Memory Leak

Always watch for memory leaks, because they can slow down your website. Call clear() on the DOM object once you have finished with it to release its memory:

<?php
$html->clear();
?>
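This matters most when you parse many pages in one script. The loop below is a sketch of one way to do it: copy out the plain values you need, then clear and unset the DOM object before moving on. The in-memory strings stand in for fetched pages, and the simple_html_dom.php path is an assumption.

```php
<?php
require_once 'simple_html_dom.php'; // assumed location of the class file

// In-memory stand-ins for pages you would normally fetch
$pages = array('<p>one</p>', '<p>two</p>', '<p>three</p>');

$texts = array();
foreach ($pages as $markup) {
    $html = str_get_html($markup);

    // Copy out plain strings so nothing keeps pointing at the DOM object
    $texts[] = trim($html->find('p', 0)->plaintext);

    // Release the DOM object's internal references, then drop the variable
    $html->clear();
    unset($html);
}

echo implode(', ', $texts), "\n";
echo 'Memory in use: ', memory_get_usage(), " bytes\n";
?>
```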

Happy Coding!!

If data scraping is your requirement and programming is not your core expertise, you can get data scraping services at a competitive rate by clicking here.

Ravi Makhija

A writer, an Entrepreneur. Curious about the internet of everything. Interested in the cutting edge landscape of mobile apps and SAAS products. Blogs for Guru Technolabs - A Mobile App Development Company.
