Retrieve all tags in PHP

0

I used the following regex:

$regex = '/<a href=\"([^\"]*)\">(.*)<\/a>/iU';

but it always fails to retrieve the tags that I want.

It always misses tags such as the following:

<a href="http://site.com/folder/img1.jpg" name="test">

and it also retrieves ones that I do not want, such as:

<a href="mailto:helloworld@hotmail.com">

and

<a href="http://site.com/folder/index.html">

How do I modify my regex so that it retrieves every <a href="....jpg" tag, and so that, given the following:

<a href="http://site.com/folder/img1.jpg" name="test">

it simply displays

<a href="http://site.com/folder/img1.jpg">

and so that it does not retrieve the following:

<a href="mailto:helloworld@hotmail.com">

and

<a href="http://site.com/folder/index.html">

Thank you.

I would also appreciate it if someone could recommend freeware that helps generate regexes.

2012-04-04 01:13
by Jack
HTML, regex, etc. Have you tried an HTML parser? - deceze 2012-04-04 01:19


2

Try the regex

$regex = '/(<a href="([^"]+)\.jpg")[^>]*>/iU';

And replace with '\1>'.

Notes:

  • Removed the escape in front of the "; it is not necessary (although you can leave it in if you want, it doesn't make a difference)
  • Added an explicit \.jpg just before the last " to only match links ending with .jpg. You might consider \.jpe?g to allow '.jpeg' as well as '.jpg' (although the former is not that common)
  • Added a [^>]* before the > of the first <a href=...> to allow for optional extra attributes like name="asdf"
  • Added capturing brackets around the (<a href="xxx") bit so that I can replace with \1> (hence stripping out all the extra attributes).
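
As a rough illustration of the notes above, here is a minimal sketch that collects the matching tags with preg_match_all instead of doing a replacement. The $html sample and the $tags variable are assumptions made up for this example (the URLs mirror the ones in the question); only the pattern itself comes from the answer.

```php
<?php
// Hypothetical sample input; the URLs mirror the ones in the question.
$html = <<<HTML
<a href="http://site.com/folder/img1.jpg" name="test">one</a>
<a href="mailto:helloworld@hotmail.com">mail</a>
<a href="http://site.com/folder/index.html">index</a>
<a href="http://site.com/folder/img2.jpg">two</a>
HTML;

// Same pattern as in the answer: group 1 is the opening tag up to
// and including the closing quote of the href attribute.
$regex = '/(<a href="([^"]+)\.jpg")[^>]*>/iU';

// With the default PREG_PATTERN_ORDER, $matches[1] lists every group-1 match.
preg_match_all($regex, $html, $matches);

// Rebuild each tag without its extra attributes by appending ">".
$tags = array_map(function ($prefix) {
    return $prefix . '>';
}, $matches[1]);

print_r($tags);
```

The mailto: and index.html links are skipped because [^"]+ cannot cross the closing quote, so \.jpg" has to sit at the very end of the href value.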

Re a regex generating tool, I don't know of any that generate regex. I think your best bet is to learn regex yourself and then use an interactive tester to quickly develop it.

I highly recommend regexr.com.

If you follow that link you'll see exactly the regex I entered in and some test data to play around with it.

Then you can play around with the regex and see the results in real-time -- it's very helpful for fast development of regexes.

(Although regexr.com does not offer the ungreedy 'U' modifier; just convert all + to +? and * to *? in the regex to simulate it.)
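
To make that conversion concrete, here is a small sketch comparing the pattern with the 'U' modifier against a hand-converted version that uses lazy quantifiers. The variable names ($withU, $withoutU, $tag, $a, $b) are assumptions for this example only.

```php
<?php
// Pattern from the answer, relying on the ungreedy 'U' modifier.
$withU = '/(<a href="([^"]+)\.jpg")[^>]*>/iU';

// Hand-converted version for testers without 'U': + becomes +?, * becomes *?.
$withoutU = '/(<a href="([^"]+?)\.jpg")[^>]*?>/i';

// Hypothetical test tag, taken from the question.
$tag = '<a href="http://site.com/folder/img1.jpg" name="test">';

// Replace with group 1 plus ">" to drop the extra attributes.
$a = preg_replace($withU, '\1>', $tag);
$b = preg_replace($withoutU, '\1>', $tag);

echo $a, "\n";
var_dump($a === $b);
```

For this input both patterns should strip name="test" and give the same result, so the converted form is a reasonable stand-in while experimenting in the tester.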

2012-04-04 01:28
by mathematical.coffee


1

I don't know exactly what you are using this regex for, but I think this should work for you:

$your_string = '<a href="http://site.com/folder/img1.jpg" name="test">';
preg_match('@<a href="(.*?)".*?>(.*<\/a>)?@', $your_string, $matches);

print_r($matches); // $matches[0] is the whole tag; $matches[1] is http://site.com/folder/img1.jpg
2012-04-04 01:23
by Igor Escobar


1

Check out http://gskinner.com/RegExr/.

I love that thing.

It will teach you how to construct your own patterns.

Regex (regular expressions) is an invaluable programming skill that is applicable in many server-side and client-side programming languages.

2012-04-04 01:24
by iambriansreed


1

This will do what you want, perhaps differently from how you were expecting to do it...

<?php
// set up to parse our input
$dom = new DOMDocument();
$dom->loadHTMLFile("input.html");
$xpath = new DOMXPath($dom);

$anchors = $xpath->query("//a[contains(@href, 'http') and contains(@href, '.jpg')]");

foreach ($anchors as $anchor) {
  echo $anchor->C14N() . "\n";
}
?>
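
If the goal is also to drop the extra attributes, as the question asks, one possible variation is to read the href from each anchor and rebuild the tag by hand. This is only a sketch under some assumptions: it loads a made-up $html string instead of input.html, it needs PHP's DOM extension, and it checks that the href ends with .jpg rather than merely containing it.

```php
<?php
// Hypothetical sample markup standing in for input.html.
$html = '<html><body>'
      . '<a href="http://site.com/folder/img1.jpg" name="test">one</a>'
      . '<a href="mailto:helloworld@hotmail.com">mail</a>'
      . '<a href="http://site.com/folder/index.html">index</a>'
      . '</body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$tags = [];
foreach ($xpath->query('//a[@href]') as $anchor) {
    $href = $anchor->getAttribute('href');
    // Keep only links whose href ends with ".jpg" (case-insensitive).
    if (preg_match('/\.jpg$/i', $href)) {
        // Rebuild the tag with the href attribute only.
        $tags[] = '<a href="' . htmlspecialchars($href) . '">';
    }
}

print_r($tags);
```

Because the tag is rebuilt from the href value alone, attributes such as name="test" never make it into the output.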
2012-04-04 11:05
by dldnh