From eccd3dad0c81cd958ae186a309e02abb53de8675 Mon Sep 17 00:00:00 2001
From: Jakub Zalas
Date: Fri, 13 Sep 2013 21:42:44 +0100
Subject: [PATCH 1/4] [DomCrawler] Documented xml namespace autodiscovery.

---
 components/dom_crawler.rst | 46 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/components/dom_crawler.rst b/components/dom_crawler.rst
index 9d4f4b701fa..554405ff72d 100644
--- a/components/dom_crawler.rst
+++ b/components/dom_crawler.rst
@@ -95,6 +95,52 @@ To remove a node the anonymous function must return false.
 All filter methods return a new :class:`Symfony\\Component\\DomCrawler\\Crawler`
 instance with filtered content.
 
+Both :method:`Symfony\\Component\\DomCrawler\\Crawler::filterXPath` and
+:method:`Symfony\\Component\\DomCrawler\\Crawler::filter` methods work with
+XML namespaces, which are automatically registered.
+
+.. versionadded:: 2.4
+    Auto discovery of namespaces was introduced in Symfony 2.4.
+
+Consider an XML below:
+
+
+
+        tag:youtube.com,2008:video:kgZRZmEc9j4
+
+
+
+            Chordates - CrashCourse Biology #24
+            widescreen
+
+
+It can be filtered with ``DomCrawler`` without a need to register namespace
+aliases both with :method:`Symfony\\Component\\DomCrawler\\Crawler::filterXPath`::
+
+    $crawler = $crawler->filterXPath('//default:entry/media:group//yt:aspectRatio');
+
+and :method:`Symfony\\Component\\DomCrawler\\Crawler::filter`::
+
+    use Symfony\Component\CssSelector\CssSelector;
+
+    CssSelector::disableHtmlExtension();
+    $crawler = $crawler->filter('default|entry media|group yt|aspectRatio');
+
+.. note::
+
+    The default namespace is registered with a name "default".
+
+.. caution::
+
+    To query an XML with a CSS selector, the HTML extension needs to be disabled with
+    :method:`Symfony\\Component\\CssSelector\\CssSelector::disableHtmlExtension`
+    to avoid converting the selector to lowercase.
+
+
 Node Traversing
 ~~~~~~~~~~~~~~~
 
From 174904cba4c387e0e2dcb15fdc5e8e0aae5a6e85 Mon Sep 17 00:00:00 2001
From: Jakub Zalas
Date: Wed, 18 Sep 2013 13:22:31 +0100
Subject: [PATCH 2/4] [DomCrawler] Documented changing of default namespace prefix.

---
 components/dom_crawler.rst | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/components/dom_crawler.rst b/components/dom_crawler.rst
index 554405ff72d..609eb576fa0 100644
--- a/components/dom_crawler.rst
+++ b/components/dom_crawler.rst
@@ -132,7 +132,9 @@ and :method:`Symfony\\Component\\DomCrawler\\Crawler::filter`::
 
 .. note::
 
-    The default namespace is registered with a name "default".
+    The default namespace is registered with a prefix "default". It can be
+    changed with the
+    :method:`Symfony\\Component\\DomCrawler\\Crawler::setDefaultNamespacePrefix`.
 
 .. caution::
 
From f7a16c2255d540faa7ee93a31d9ab3f8e69b24d0 Mon Sep 17 00:00:00 2001
From: Jakub Zalas
Date: Sun, 22 Sep 2013 23:41:41 +0100
Subject: [PATCH 3/4] [DomCrawler] Documented explicit namespace registration.

---
 components/dom_crawler.rst | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/components/dom_crawler.rst b/components/dom_crawler.rst
index 609eb576fa0..354c2f109ea 100644
--- a/components/dom_crawler.rst
+++ b/components/dom_crawler.rst
@@ -97,10 +97,12 @@ To remove a node the anonymous function must return false.
 
 Both :method:`Symfony\\Component\\DomCrawler\\Crawler::filterXPath` and
 :method:`Symfony\\Component\\DomCrawler\\Crawler::filter` methods work with
-XML namespaces, which are automatically registered.
+XML namespaces, which can be either automatically discovered or registered
+explicitly.
 
 .. versionadded:: 2.4
-    Auto discovery of namespaces was introduced in Symfony 2.4.
+    Auto discovery and explicit registration of namespaces was introduced
+    in Symfony 2.4.
 
 Consider an XML below:
 
@@ -136,13 +138,18 @@ and :method:`Symfony\\Component\\DomCrawler\\Crawler::filter`::
     changed with the
     :method:`Symfony\\Component\\DomCrawler\\Crawler::setDefaultNamespacePrefix`.
 
+Namespaces can be explicitly registered with the
+:method:`Symfony\\Component\\DomCrawler\\Crawler::registerNamespace`::
+
+    $crawler->registerNamespace('m', 'http://search.yahoo.com/mrss/');
+    $crawler = $crawler->filterXPath('//m:group//yt:aspectRatio');
+
 .. caution::
 
     To query an XML with a CSS selector, the HTML extension needs to be disabled with
     :method:`Symfony\\Component\\CssSelector\\CssSelector::disableHtmlExtension`
     to avoid converting the selector to lowercase.
 
-
 Node Traversing
 ~~~~~~~~~~~~~~~
 
From 0924b1a6bf5e27badc48a4c9b80d752f01373dec Mon Sep 17 00:00:00 2001
From: Jakub Zalas
Date: Mon, 16 Dec 2013 22:25:11 +0000
Subject: [PATCH 4/4] [DomCrawler] Added a note about removing the default namespace.

---
 components/dom_crawler.rst | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/components/dom_crawler.rst b/components/dom_crawler.rst
index 354c2f109ea..1224a1845d1 100644
--- a/components/dom_crawler.rst
+++ b/components/dom_crawler.rst
@@ -138,6 +138,9 @@ and :method:`Symfony\\Component\\DomCrawler\\Crawler::filter`::
     changed with the
     :method:`Symfony\\Component\\DomCrawler\\Crawler::setDefaultNamespacePrefix`.
 
+    The default namespace is removed when loading the content if it's the only
+    namespace in the document. It's done to simplify the xpath queries.
+
 Namespaces can be explicitly registered with the
 :method:`Symfony\\Component\\DomCrawler\\Crawler::registerNamespace`::
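
For comparison with the namespace handling this series documents, the following is a minimal sketch of binding prefixes by hand with PHP's built-in ``DOMDocument`` and ``DOMXPath`` classes. It does not use the Crawler and is not a description of how the component is implemented. The sample feed is an assumption made for illustration: only the ``http://search.yahoo.com/mrss/`` URI appears in the patches, while the Atom and ``yt`` namespace URIs and the element layout are guesses chosen to match the queries shown above.

```php
<?php
// Hypothetical sample feed. Only the "media" namespace URI comes from the
// patch series; the other URIs and the element layout are assumptions.
$xml = <<<'XML'
<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:media="http://search.yahoo.com/mrss/"
       xmlns:yt="http://gdata.youtube.com/schemas/2007">
    <id>tag:youtube.com,2008:video:kgZRZmEc9j4</id>
    <media:group>
        <media:title>Chordates - CrashCourse Biology #24</media:title>
        <yt:aspectRatio>widescreen</yt:aspectRatio>
    </media:group>
</entry>
XML;

$document = new DOMDocument();
$document->loadXML($xml);

$xpath = new DOMXPath($document);

// Bind each prefix used by the query to its namespace URI. The default
// namespace has no prefix in the document, so it gets the name "default"
// here; "m" is a custom alias for the media namespace.
$xpath->registerNamespace('default', 'http://www.w3.org/2005/Atom');
$xpath->registerNamespace('m', 'http://search.yahoo.com/mrss/');
$xpath->registerNamespace('yt', 'http://gdata.youtube.com/schemas/2007');

$nodes = $xpath->query('//default:entry/m:group//yt:aspectRatio');
$aspectRatio = $nodes->item(0)->textContent;

echo $aspectRatio, PHP_EOL;
```

The ``default`` and ``m`` prefixes mirror the names used in the documented Crawler queries, so the XPath expression here lines up with the ``filterXPath()`` examples in the patches.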