PHP Simple HTML DOM Parser Script

After uploading the class file (simple_html_dom.php) and including it in your script, create an instance of the simple_html_dom class. There are three ways to load HTML into it:

  • Load HTML from a file
  • Load HTML from a URL
  • Load HTML from a string
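// Include the library file (adjust the path to where you uploaded it)
require_once 'simple_html_dom.php';
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from an HTML file
$html->load_file('path-to-file/example.html');
// Load HTML from a URL (load_file() reads it with file_get_contents(), so allow_url_fopen must be enabled)
$html->load_file('http://www.yourdomainname.com/');
// Load HTML from a string
$html->load('<html><body>All the Best!</body></html>');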

Loading from a URL gives you little control over the HTTP request. If you need more control (custom headers, timeouts, redirects), fetch the page with cURL into a string first, and then load that string into the DOM object with load().
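A minimal cURL sketch for that approach follows. The function name fetch_html() and the option values (timeouts, redirect limit, user-agent string) are illustrative assumptions, not part of the library; adjust them to your needs. It returns the response body, or null on a transport error or a non-2xx status.

```php
<?php
// Hypothetical helper: fetch a page with cURL so the request (timeouts,
// redirects, user agent) can be controlled before parsing the HTML.
function fetch_html(string $url, int $timeout = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,   // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects
        CURLOPT_MAXREDIRS      => 5,      // illustrative redirect limit
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_TIMEOUT        => $timeout,
        // Illustrative user-agent string; replace with your own.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleScraper/1.0)',
    ]);
    $body   = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // Treat transport errors and non-2xx responses as failures.
    if ($body === false || $status < 200 || $status >= 300) {
        return null;
    }
    return $body;
}
```

After checking that the result is not null, pass it to $html->load() just like the string in the loading example.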

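Use the find() method to locate HTML elements on the page. Called with just a selector, it returns an array of element objects; called with an index as the second argument, it returns a single element object, or null if there is no element at that index.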

Examples:

// Find elements by tag name, e.g. all <p> tags. This returns an array of element objects.
$p = $html->find('p');
// Find the element whose id has a particular value, e.g. the div with id="header"
$main = $html->find('div[id=header]', 0);
// Find the Nth element (counting from 0); returns the element object, or null if it is not found
$a = $html->find('a', 0);
// Find all elements that have an id attribute
$elements = $html->find('[id]');
// Find elements of one tag that have an id attribute, e.g. divs with an id
$divs = $html->find('div[id]');

Use CSS-style selectors to find DOM elements:

// Find all elements with id="header". Duplicate ids are not valid HTML, so normally only one matches.
$result = $html->find('#header');
// Find all elements with the class "container"
$result = $html->find('.container');
// Find all <b> and <p> elements (a comma separates multiple selectors)
$result = $html->find('b, p');
// Find elements by tag name that have a given attribute, e.g. all anchors and images with a title attribute
$result = $html->find('a[title], img[title]');

Select parent, child and sibling elements with the built-in traversal properties and methods:

// These work on a single element, for example:
$result = $html->find('div', 0);
// the parent of the element
$result->parent;
// the element's children, as an array
$result->children;
// returns the Nth child (counting from 0), or null if it does not exist
$result->children(0);
// returns the first child of the element, or null if there is none
$result->first_child();
// returns the last child of the element, or null if there is none
$result->last_child();
// returns the previous sibling of the element, or null if there is none
$result->prev_sibling();
// returns the next sibling of the element, or null if there is none
$result->next_sibling();

Attribute Operators:

Attribute selectors support the following operators (some of them are matched internally with simple regular expressions):

  • [attribute] – elements that have the specified attribute
  • [attribute=value] – elements whose specified attribute has exactly the given value
  • [attribute!=value] – elements whose specified attribute does not have the given value
  • [attribute*=value] – elements whose specified attribute value contains the given value
  • [attribute$=value] – elements whose specified attribute value ends with the given value
  • [attribute^=value] – elements whose specified attribute value begins with the given value
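To make the operators concrete, here is a hypothetical plain-PHP helper, attr_matches(), that performs the comparison each operator describes. It is a simplified, case-sensitive sketch that needs PHP 8 (for match, str_contains(), str_starts_with() and str_ends_with()); it is not the library's own matching code, which may differ in details such as case sensitivity.

```php
<?php
// Hypothetical helper showing what each attribute operator compares.
// $actual is the element's attribute value (null if the attribute is missing);
// $expected is the value written in the selector.
function attr_matches(string $op, ?string $actual, string $expected = ''): bool
{
    // Assumption: an element without the attribute never matches.
    if ($actual === null) {
        return false;
    }
    return match ($op) {
        ''   => true,                                  // [attribute]
        '='  => $actual === $expected,                 // [attribute=value]
        '!=' => $actual !== $expected,                 // [attribute!=value]
        '*=' => str_contains($actual, $expected),      // [attribute*=value]
        '^=' => str_starts_with($actual, $expected),   // [attribute^=value]
        '$=' => str_ends_with($actual, $expected),     // [attribute$=value]
        default => throw new InvalidArgumentException("Unknown operator: $op"),
    };
}
```

In an actual selector the same operators read, for example, $html->find('a[href^=https]') for links whose href starts with "https".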

Attributes can be read directly as object properties. For example, to get the href of the first link:

$link = $html->find('a',0)->href;

Each element object also has four special properties:

  • tag – returns the tag name
  • innertext – returns inner HTML of an element
  • outertext – returns outer HTML of an element
  • plaintext – returns plain text (without HTML tags)

Editing HTML Elements with PHP Simple HTML DOM Parser

Editing an attribute works the same way as reading it:

// Change or set attribute value
$a->href = 'http://www.yourdomainname.com';
// Remove an attribute.
$a->href = null;
// Check if attribute exists
if(isset($a->href)) {
	//do something here
}
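Property-style access such as $a->href works because the element class routes unknown property reads and writes through PHP's magic methods (__get, __set, __isset). The hypothetical FakeNode class below is a simplified illustration of that mechanism, not the library's implementation; the real class may, for instance, keep an attribute set to null internally and simply leave it out of the generated HTML.

```php
<?php
// Hypothetical, simplified element that stores attributes in an array
// and exposes them as object properties via PHP's magic methods.
class FakeNode
{
    private array $attr = [];

    // Called for $node->name when no such declared property is accessible.
    public function __get(string $name)
    {
        // Missing attributes read as false (an illustrative choice).
        return $this->attr[$name] ?? false;
    }

    // Called for $node->name = $value.
    public function __set(string $name, $value): void
    {
        if ($value === null) {
            unset($this->attr[$name]);   // assigning null removes the attribute
        } else {
            $this->attr[$name] = $value;
        }
    }

    // Called for isset($node->name).
    public function __isset(string $name): bool
    {
        return isset($this->attr[$name]);
    }
}
```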

There are no dedicated functions for adding or removing elements, but you can get the same effect by assigning new HTML to an element's outertext property:

// Wrap an element in a div
$result->outertext = '<div class="wrap">' . $result->outertext . '</div>';
// Remove an element
$result->outertext = '';
// Add a new element after this one
$result->outertext = $result->outertext . '<div>header</div>';
// Add a new element before this one
$result->outertext = '<div>header</div>' . $result->outertext;
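Because these edits are plain string concatenation, it is easy to leave a tag unclosed or let an attribute value break the markup. A small hypothetical helper, wrap_html(), builds the wrapper with a matching closing tag and escapes the class name with htmlspecialchars():

```php
<?php
// Hypothetical helper that builds the string assigned to ->outertext.
// The opening <div> always gets a matching </div>, and the class name is
// escaped so it cannot break out of the attribute.
function wrap_html(string $html, string $class): string
{
    return '<div class="' . htmlspecialchars($class, ENT_QUOTES) . '">' . $html . '</div>';
}
```

Usage with an element: $result->outertext = wrap_html($result->outertext, 'wrap');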

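To get the finished document, convert the DOM object to a string, or call its save() method, which returns the HTML and also writes it to a file if you pass a path:

// Get the whole document as a string
$doc = $html->save();
// Write the document to a file (the path is a placeholder)
$html->save('path-to-file/result.html');
// Display the page
echo $html;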

Prevent PHP Simple HTML DOM Parser Memory Leak

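Be careful about memory leaks, especially in scripts that parse many pages: DOM objects that are not released keep using memory and can slow down your site. When you are done with a DOM object, free it like this:

$html->clear();
unset($html);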
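The leak is usually attributed to circular references: each node keeps a reference to its parent, and the parent keeps references to its children, so reference counting alone cannot free the tree. The hypothetical TreeNode class below (not the library's code) shows the idea behind clear(): breaking the parent/child links so the nodes can be released as soon as the last variable pointing to them is unset.

```php
<?php
// Hypothetical minimal tree node: each child references its parent and
// each parent references its children, which forms reference cycles.
class TreeNode
{
    public ?TreeNode $parent = null;
    /** @var TreeNode[] */
    public array $children = [];

    public function addChild(TreeNode $child): void
    {
        $child->parent = $this;
        $this->children[] = $child;
    }

    // Break the cycles explicitly (depth-first) so plain reference
    // counting can free the nodes without waiting for the cycle collector.
    public function clear(): void
    {
        foreach ($this->children as $child) {
            $child->clear();
            $child->parent = null;
        }
        $this->children = [];
    }
}
```

When parsing many pages in a loop, the same pattern applies: call $html->clear() and unset($html) at the end of each iteration.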

Happy Coding!!