1. Parsing Microformats

    ...any way you like

    Ryan King Engineer Technorati
  2. http://theryanking.com/presentations/2007/www2007-microformats-parsing/
  3. Scope

    Two Kinds of Microformats

    compound vs. elemental microformats

  4. Parsing?

  5. Overlaid Trees

    
          <>
            <>
              <>
                <>...</>
              </>
            </>
          </>
        
  6. General Rules

  7. fragment identifiers

  8. root class names

  9. properties

  10. cardinality

  11. sub-properties

    
          <p class="vcard">
            <span class="fn n">
              <span class="given-name">Ryan</span>
              <span class="family-name">King</span>
            </span>
          </p>
        
  12. Extracting Properties

  13. text

    
          <span class="vcard">
            <span class="fn n">
              <span class="given-name">Ryan</span>
              <span class="family-name">King</span>
            </span>
          </span>
        

    This hCard's fn is "Ryan King".

    (remember whitespace collapsing rules)
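
    A minimal sketch of text extraction with whitespace collapsing, assuming the markup is well-formed enough for Python's xml.etree.ElementTree. The helper names has_class and text_value are illustrative and are not part of any microformats library.

```python
import xml.etree.ElementTree as ET

HCARD = """<span class="vcard">
  <span class="fn n">
    <span class="given-name">Ryan</span>
    <span class="family-name">King</span>
  </span>
</span>"""

def has_class(el, name):
    # class is a space-separated token list, so compare whole tokens
    return name in (el.get("class") or "").split()

def text_value(el):
    # Join all descendant text, then collapse runs of whitespace to one space
    return " ".join("".join(el.itertext()).split())

root = ET.fromstring(HCARD)
fn = next(el for el in root.iter() if has_class(el, "fn"))
print(text_value(fn))  # Ryan King
```

    Without the collapsing step the value would keep the newlines and indentation of the source markup.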

  14. value-excerpting

    
          <span class="vcard">
            <span class="fn n">
              ...
            </span>
            <span class="title">I'm an
              <span class="value">Engineer</span> 
              at Technorati.
            </span>
          </span>
        

    title is just "Engineer"
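
    Value-excerpting could be sketched as follows, again assuming well-formed markup. The function name property_value is hypothetical, and concatenating several "value" elements without a separator is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

TITLE = """<span class="title">I'm an
  <span class="value">Engineer</span>
  at Technorati.
</span>"""

def has_class(el, name):
    return name in (el.get("class") or "").split()

def collapse(text):
    return " ".join(text.split())

def property_value(el):
    # Prefer descendants marked class="value"; otherwise use the whole text.
    # Nested "value" elements are not handled specially in this sketch.
    values = [d for d in el.iter() if d is not el and has_class(d, "value")]
    if values:
        return "".join(collapse("".join(v.itertext())) for v in values)
    return collapse("".join(el.itertext()))

title = ET.fromstring(TITLE)
print(property_value(title))  # Engineer
```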

  15. semantic exceptions

    Applying HTML semantics to make publishing smoother.

  16. mailto links

    
          <span class="vcard">
            <span class="fn n">
              ...
            </span>
            <a class="email" href="mailto:ryan@theryanking.com?Subject=nice+presentation">
              email me!
            </a>
          </span>
        

    email is just "ryan@theryanking.com"
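
    One way to get from the href to the bare address, sketched with Python's urllib.parse. The function name email_value is illustrative, and returning non-mailto values unchanged is an assumption.

```python
from urllib.parse import urlsplit

def email_value(href):
    # For mailto: links keep only the address; drop the scheme and ?query
    parts = urlsplit(href)
    if parts.scheme == "mailto":
        return parts.path
    return href

href = "mailto:ryan@theryanking.com?Subject=nice+presentation"
print(email_value(href))  # ryan@theryanking.com
```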

  17. URL fields

    elements   attribute
    a, area    href
    img        src
    object     data
  18. URL Fields

    Examples:

    <a      class="url" href="foo">bar</a>
    <area   class="url" href="foo" alt="bar" />
    <img    class="url" src="foo" alt="bar" />
    <object class="url" data="foo">bar</object>

    the value is "foo"

  19. non-URLs

    element   attribute
    img       alt
    area      alt
    abbr      title
  20. non-URL Fields

    Examples:

    <img   class="fn" alt="foo" src="bar" />
    <area  class="fn" alt="foo" href="bar" />
    <abbr  class="fn" title="foo">bar</abbr>

    the value is "foo"
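
    The two element/attribute tables (URL fields and non-URL fields) can be read as lookup tables. A minimal sketch, assuming lowercase tag names and well-formed markup. The names URL_ATTRS, TEXT_ATTRS and field_value are hypothetical, and falling back to the element text when the attribute is missing is an assumption.

```python
import xml.etree.ElementTree as ET

# Element -> attribute carrying the value, per the slides' tables
URL_ATTRS = {"a": "href", "area": "href", "img": "src", "object": "data"}
TEXT_ATTRS = {"img": "alt", "area": "alt", "abbr": "title"}

def field_value(el, is_url):
    table = URL_ATTRS if is_url else TEXT_ATTRS
    attr = table.get(el.tag)
    if attr is not None and el.get(attr) is not None:
        return el.get(attr)
    # No special attribute applies: use the collapsed text content
    return " ".join("".join(el.itertext()).split())

img = ET.fromstring('<img class="url" src="foo" alt="bar" />')
abbr = ET.fromstring('<abbr class="fn" title="foo">bar</abbr>')
print(field_value(img, is_url=True))    # foo
print(field_value(abbr, is_url=False))  # foo
```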

  21. table headers

    rationale/history

  22. table headers

    example

    
          <table>
            <tr>
              <th>Time</th>
              <th id="location"><span class="location">Coleman</span></th>
            </tr>
            <tr>
              <th id="s1115">
                <abbr class="dtstart" title="2007-05-12T11:15-0600">11:15</abbr>
              </th>
              <td headers="location s1115" class="vevent">
              <span class="summary">Parsing Microformats...</span>
              </td>
            </tr>
          </table>
        
  23. table headers

    Processing
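
    A rough sketch of one way to process the example: follow each id in the cell's headers attribute and treat the referenced header cells as additional places to look for the event's properties. The property list, the helper names and the "first match wins" rule are assumptions of this sketch, not rules stated on the slide.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
  <tr>
    <th>Time</th>
    <th id="location"><span class="location">Coleman</span></th>
  </tr>
  <tr>
    <th id="s1115">
      <abbr class="dtstart" title="2007-05-12T11:15-0600">11:15</abbr>
    </th>
    <td headers="location s1115" class="vevent">
      <span class="summary">Parsing Microformats...</span>
    </td>
  </tr>
</table>"""

def has_class(el, name):
    return name in (el.get("class") or "").split()

def value_of(el):
    # abbr carries its value in title; everything else uses collapsed text
    if el.tag == "abbr" and el.get("title") is not None:
        return el.get("title")
    return " ".join("".join(el.itertext()).split())

def collect(scope, names, found):
    for el in scope.iter():
        for name in names:
            if has_class(el, name):
                found.setdefault(name, value_of(el))  # first match wins

root = ET.fromstring(TABLE)
by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
cell = next(el for el in root.iter("td") if has_class(el, "vevent"))

# Search the cell itself first, then every header cell it points at
scopes = [cell] + [by_id[i] for i in (cell.get("headers") or "").split() if i in by_id]
event = {}
for scope in scopes:
    collect(scope, ("summary", "location", "dtstart"), event)
print(event)
```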

  24. include pattern

    rationale

  25. include pattern

    example

    
          <span class="vcard">
            <span id="name" class="fn n">...</span>
          </span>
    
          ....
          
          <span class="vcard">
            <a class="include" href="#name"></a>
            ...
          </span>
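
    Resolving an include could be sketched as replacing the include link with a copy of the element it points to before extracting properties. The sample document below fills in placeholder content for illustration, and resolve_includes is a hypothetical helper. Nested includes and references to missing ids are not handled.

```python
import copy
import xml.etree.ElementTree as ET

DOC = """<div>
  <span class="vcard">
    <span id="name" class="fn n">Ryan King</span>
  </span>
  <span class="vcard">
    <a class="include" href="#name"></a>
    <span class="title">Engineer</span>
  </span>
</div>"""

def has_class(el, name):
    return name in (el.get("class") or "").split()

def resolve_includes(root):
    by_id = {el.get("id"): el for el in root.iter() if el.get("id")}
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            href = child.get("href") or ""
            if child.tag == "a" and has_class(child, "include") and href.startswith("#"):
                target = by_id.get(href[1:])
                if target is not None:
                    clone = copy.deepcopy(target)
                    clone.attrib.pop("id", None)  # avoid a duplicate id
                    clone.tail = child.tail       # keep surrounding whitespace
                    parent[index] = clone         # swap the link for the copy

root = ET.fromstring(DOC)
resolve_includes(root)
cards = [el for el in root.iter() if has_class(el, "vcard")]
names = [next(e.text for e in card.iter() if has_class(e, "fn")) for card in cards]
print(names)  # ['Ryan King', 'Ryan King']
```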
    
        
  26. include pattern

    notes

  27. Approaches

  28. XPath

  29. XPath

  30. XPath

    challenge 1

    
        <xsl:if test="contains(
          concat(
            ' ',
            normalize-space(@class),
            ' '
          ),
          ' vcard '
         )" >
        

    Tokenized attributes are not easy.
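
    For comparison, the same space-padding trick written in Python next to a plain token split. Both function names are illustrative. Python's split() recognizes a few more whitespace characters than XPath's normalize-space(), so the two are only roughly equivalent.

```python
def has_class_padded(class_attr, name):
    # Mirrors the XPath test: pad both sides with spaces so that
    # "vcard" cannot match inside a longer token such as "my-vcard"
    normalized = " ".join(class_attr.split())  # roughly normalize-space()
    return f" {name} " in f" {normalized} "

def has_class_tokens(class_attr, name):
    # With real tokenization the test is a simple membership check
    return name in class_attr.split()

for value in ("vcard", "fn  vcard", "my-vcard", "vcardx"):
    print(repr(value), has_class_padded(value, "vcard"), has_class_tokens(value, "vcard"))
```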

  31. XPath

    challenge 2

  32. XPath

    See also

  33. Selectors

    Advantages

  34. Selectors

    examples

  35. Selectors

    Challenges

  36. Event-based (like SAX)

    Advantages

  37. Event-based

    Disadvantages

  38. Test Suite

  39. Test Suite

    Future Work

  40. Resources

  41. ?