#010100: RSS import doesn't work with <dc:> tags

Description:

Hi,

RSS v1 allows dc tags to be added (<dc:creator>, <dc:date>, <dc:identifier>, <dc:title>, ...).

eZ Publish doesn't distinguish between <dc:title> and <title>; it therefore considers that there is more than one title and doesn't import the feed properly.

The problem seems to be that


    /*!
      \returns The first element that is named \a $name.
               If multiple elements with that name is found \c false is returned.

      \note This will only make sense for element nodes.
      \sa elementsByName
    */
    function &elementByName( $name )

in lib/ezxml/classes/ezdomnode.php

as used by the cronjobs/rssimport.php


$title =& $item->elementByName( 'title' ) 

doesn't work, as your XML parser considers <dc:title> and <title> to be identical.

I was involved in RSS 1 during the (painful) standardisation process, and I remember that having both title and dc:title was considered valid.
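To make the collision concrete, here is a minimal sketch that uses PHP's bundled DOM extension as a stand-in for eZ Publish's own parser (an assumption; it is not the eZDOMNode code). The sample item and its URLs are hypothetical. It shows that a comparison on the bare element name matches both <title> and <dc:title>:

```php
<?php
// Hypothetical RSS 1.0 item carrying both <title> and <dc:title>.
$xml = <<<XML
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <item rdf:about="http://example.org/article/1">
    <title>Plain title</title>
    <dc:title>Dublin Core title</dc:title>
    <link>http://example.org/article/1</link>
  </item>
</rdf:RDF>
XML;

$doc = new DOMDocument();
$doc->loadXML( $xml );
$item = $doc->getElementsByTagNameNS( 'http://purl.org/rss/1.0/', 'item' )->item( 0 );

// Collect every child whose local name is "title", ignoring the prefix.
// This mirrors a lookup that compares the element name only.
$matches = array();
foreach ( $item->childNodes as $child )
{
    if ( $child instanceof DOMElement && $child->localName === 'title' )
    {
        $matches[] = $child->nodeName;
    }
}

// Both elements are collected, so a "return false on multiple matches"
// rule would reject this item.
print_r( $matches );
```

The two elements differ only in their prefix and namespace, so any lookup that has to tell them apart needs to check one of those.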

X+


Environment:

Operating System:
PHP Version: (please be specific, like '4.4.3' or '5.1.5')
Database and version:
Browser (and version):


Steps to Reproduce:

Create an RSS import with the following feed:
http://eurpub.oxfordjournals.org/rss/current.xmlurl

Suggested fix: if there is more than one "title" child, take the first one.
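The suggested fix could look roughly like the sketch below. It again uses PHP's bundled DOM extension rather than eZDOMNode, and firstUnprefixedChild() is a hypothetical helper name, not an eZ Publish function. It returns the first child with the requested name and no prefix, so <dc:title> is skipped even when it comes first:

```php
<?php
// Hypothetical helper, not part of eZ Publish: return the first child element
// named $name that carries no namespace prefix, or false if there is none.
function firstUnprefixedChild( DOMNode $parent, $name )
{
    foreach ( $parent->childNodes as $child )
    {
        if ( $child instanceof DOMElement
             && $child->localName === $name
             && $child->prefix == '' )
        {
            return $child;
        }
    }
    return false;
}

// Hypothetical item: the prefixed title comes first, followed by two plain titles.
$doc = new DOMDocument();
$doc->loadXML(
    '<item xmlns:dc="http://purl.org/dc/elements/1.1/">' .
    '<dc:title>Dublin Core title</dc:title>' .
    '<title>Plain title</title>' .
    '<title>Second plain title</title>' .
    '</item>'
);

$title = firstUnprefixedChild( $doc->documentElement, 'title' );
$titleText = $title ? $title->textContent : '';
echo $titleText, "\n";
```

Returning false when nothing matches keeps the same "no element" convention as elementByName(), so callers that test the result with is_object() would not need to change.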


- Attachments

No attachments for this issue.


- Comments

Here is a patch that fixes the problem as you suggested. For me, though, it does not seem to make much difference with the example feed you provided.

Can you please test it and report whether it solves the problem?


Index: cronjobs/rssimport.php
===================================================================
--- cronjobs/rssimport.php      (revision 18058)
+++ cronjobs/rssimport.php      (working copy)
@@ -154,7 +154,7 @@

     // Get all items in rss feed
     $itemArray = $root->elementsByName( 'item' );
-    $channel = $root->elementByName( 'channel' );
+    $channel = $root->firstElementByName( 'channel' );

     // Loop through all items in RSS feed
     foreach ( $itemArray as $item )
@@ -183,7 +183,7 @@
     $addCount = 0;

     // Get all items in rss feed
-    $channel =& $root->elementByName( 'channel' );
+    $channel =& $root->firstElementByName( 'channel' );

     // Loop through all items in RSS feed
     foreach ( $channel->elementsByName( 'item' ) as $item )
@@ -225,9 +225,9 @@
     }

     $parentContentObject =& $parentContentObjectTreeNode->attribute( 'object' ); // Get parent content object
-    $titleElement = $item->elementByName( 'title' );
+    $titleElement = $item->firstElementByName( 'title' );
     $title = is_object( $titleElement ) ? $titleElement->textContent() . getCDATA( $titleElement ) : '';
-    $link = $item->elementByName( 'link' );
+    $link = $item->firstElementByName( 'link' );
     $linkURL = $link->textContent() . getCDATA( $link );
     $md5Sum = md5( $linkURL );

@@ -433,7 +433,7 @@
         {
             if ( count( $importDescriptionArray ) == 1 )
             {
-                $element = $xmlDomNode->elementByName( $importDescriptionArray[0] );
+                $element = $xmlDomNode->firstElementByName( $importDescriptionArray[0] );
                 // We should check if text contains CDATA content
                 $resultText = is_object( $element ) ? $element->textContent() . getCDATA( $element ) : false;
                 return $resultText;
@@ -442,7 +442,7 @@
             {
                 $elementName = $importDescriptionArray[0];
                 array_shift( $importDescriptionArray );
-                return recursiveFindRSSElementValue( $importDescriptionArray, $xmlDomNode->elementByName( $elementName ) );
+                return recursiveFindRSSElementValue( $importDescriptionArray, $xmlDomNode->firstElementByName( $elementName ) );
             }
         }

Index: lib/ezxml/classes/ezdomnode.php
===================================================================
--- lib/ezxml/classes/ezdomnode.php     (revision 18057)
+++ lib/ezxml/classes/ezdomnode.php     (working copy)
@@ -363,6 +363,7 @@
     }

     /*!
+      \deprecated Use firstElementByName() instead.
       \returns The first element that is named \a $name.
                If multiple elements with that name is found \c false is returned.

@@ -388,6 +389,27 @@
         return $element;
     }

+    /*!
+      \returns The first non-prefixed element that is named \a $name,
+               or \c false if no such element is found.
+
+      \note This will only make sense for element nodes.
+    */
+    function &firstElementByName( $name )
+    {
+        $element = false;
+        foreach ( array_keys( $this->Children ) as $key )
+        {
+            $child =& $this->Children[$key];
+            if ( $child->name() == $name && !$child->prefix() )
+            {
+                $element =& $child;
+                break;
+            }
+        }
+        return $element;
+    }
+
     /*!
      Alias for libxml compatibility
     */

#251534 by Kirill Subbotin on January 31st, 2007

I added the posted patch, and added the following code to

lib/ezxml/classes/ezdomnode.php:



    /*!
      \returns The first element that is named \a $name.

      \note This will only make sense for element nodes.
      \note Modified version of elementByName, returns first element if multiples are found.
      \note Modification to support RSS
      \sa elementsByName
    */
    function &firstElementByName( $name )
    {
        $element = false;
        foreach ( array_keys( $this->Children ) as $key )
        {
            $child =& $this->Children[$key];
            if ( $child->name() == $name )
            {
                $element =& $child;
                return $element;
            }
        }
        return $element;
    }

The problem remained.

The feed is:

http://content.nejm.org/rss/current.xmlurl

And the node name is blank.

I also tried using <title|alt_title> to name the node, where alt_title was the item pub date. That didn't help either.

eZ 3.8.6

#251550 by Betsy Gamrat on February 2nd, 2007

But why have you added another firstElementByName() function? It is wrong!

#251560 by Kirill Subbotin on February 5th, 2007

Sorry, I've been a bit slow on this one; I had difficulty patching the install.

Which version are you using? Trunk 3.9?

I'll have to install one that is in line with what you have.

X+

PS. Another thing is that I've already patched quite a bit to solve various other RSS bugs...

#251566 by Xavier Dutoit on February 5th, 2007

It was trunk (3.10.0) but I think this patch should work on any 3.9 as well.

#251785 by Kirill Subbotin on March 1st, 2007

This problem exists in 4.0.x too

#258522 by Geoff Bentley on October 12th, 2008

Duplicate of #011930: Importing <dc:xxxx> xml fields with RSS

#266753 by Alexandru Stanoi on August 2nd, 2010

I am moving this from Code Review to Closed since it has been solved in another issue.

Geir Arne Waaler
eZ Documentation

#271048 by Geir Arne Waaler on October 6th, 2011

- History
Properties
Type: Bug
Priority: Medium
Component: Misc
Affects: 3.7.9
Fix Version: -
Reporter: Xavier Dutoit
Responsible: Alexandru Stanoi
Status: Closed
Resolution: Duplicate
Created: January 25th, 2007
Updated: October 6th, 2011
Resolved: August 2nd, 2010