Parse an HTML Table into a ListView

What?
A quick article on which AutoHotkey methods you could use, given a webpage coded in HTML, to pull the HTML tables out and load them into a ListView.

Why?
I want a snippet of code that replicates any HTML table.

How?
I've been trying various ways, so I'm posting them here. My opinion of them changes with the weather, so until I do some benchmarking I won't know which one is best:

Method #1
This method replicates the table in a ListView format:
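    ; NOTE: the HTML tag literals and comparison operators were stripped from this
    ; listing when the page was rendered. "</tr>", "</td>", "<.*?>", ">" and "<"
    ; below are assumed reconstructions, not confirmed originals.

    ; Replace each row terminator with a pipe, trim, then split into rows
    StringReplace,ReturnedHTMLTableRows,ReturnedHTMLTable,</tr>,|,A
    ReturnedHTMLTableRows=%ReturnedHTMLTableRows%
    StringSplit,ReturnedHTMLTableRows,ReturnedHTMLTableRows,|
    RowIndex:=0
    Loop, %ReturnedHTMLTableRows0%
    {
        ColIndex:=1
        RowIndex++
        ReturnedHTMLTableColString:=ReturnedHTMLTableRows%RowIndex%

        ; Once the first (heading) row has been read, split the headings
        If RowIndex=2
            StringSplit,HeadingsArray,HeadingsArrayString,|

        ; Replace each cell terminator with a pipe, trim, then split into cells
        StringReplace,ReturnedHTMLTableColString,ReturnedHTMLTableColString,</td>,|,A
        ReturnedHTMLTableColString=%ReturnedHTMLTableColString%
        StringSplit,ReturnedHTMLTableCols,ReturnedHTMLTableColString,|
        Loop, %ReturnedHTMLTableCols0%
        {
            ; Strip any remaining tags from the cell
            ThisValue := RegExReplace( ReturnedHTMLTableCols%ColIndex%, "<.*?>" , "")
            HeadingIndex := ColIndex + 2
            TypeLabel := HeadingsArray%HeadingIndex%

            If RowIndex=1
                HeadingsArrayString:=HeadingsArrayString "|" ThisValue

            If ColIndex=1
                ThisLabel := ThisValue
            Else
                If RowIndex>1
                    If ColIndex<%ReturnedHTMLTableCols0%
                        LV_Add("", ThisLabel " (" TypeLabel ")" , ThisValue )

            ColIndex++
        }
    }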
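
The description above talks about replicating the table itself, so here is an alternative sketch (not the code above, and not benchmarked) that builds one ListView column per HTML column. It assumes AutoHotkey v1.1, a table whose first row holds the headings, and no nested tables. The sample TableHTML string and the helper names ParseTableRows and JoinWithPipe are made up for illustration.

```autohotkey
#NoEnv

; Hypothetical sample input: the inner HTML of one table
TableHTML := "<tr><th>Name</th><th>Qty</th></tr><tr><td>Apple</td><td>3</td></tr><tr><td>Pear</td><td>5</td></tr>"

Rows := ParseTableRows( TableHTML )

; The first row supplies the column headings, e.g. "Name|Qty"
Gui, Add, ListView, r10 w300, % JoinWithPipe( Rows[1] )

; Every remaining row becomes one ListView row
Loop, % Rows.Length() - 1
{
    Cells := Rows[ A_Index + 1 ]
    LV_Add( "", Cells* )
}
LV_ModifyCol()    ; auto-size every column to fit its contents
Gui, Show
Return

GuiClose:
ExitApp

; Returns an array of rows, each row being an array of cell texts
ParseTableRows( TableHTML ) {
    Rows := []
    RowPos := 1
    ; Match each <tr>...</tr> in turn
    While ( RowPos := RegExMatch( TableHTML, "is)<tr\b[^>]*>(.*?)</tr>", Row, RowPos ) )
    {
        Cells := []
        CellPos := 1
        ; Match each <td> or <th> cell inside the current row
        While ( CellPos := RegExMatch( Row1, "is)<t[dh]\b[^>]*>(.*?)</t[dh]>", Cell, CellPos ) )
        {
            ; Strip any tags left inside the cell, then trim whitespace
            Cells.Push( Trim( RegExReplace( Cell1, "<.*?>" ) ) )
            CellPos += StrLen( Cell )
        }
        Rows.Push( Cells )
        RowPos += StrLen( Row )
    }
    Return Rows
}

; Joins array items with "|", the separator ListView headings expect
JoinWithPipe( Items ) {
    Joined := ""
    For Index, Item in Items
        Joined .= ( Index > 1 ? "|" : "" ) Item
    Return Joined
}
```

Regular expressions are a blunt tool for HTML, so treat this as good enough for simple, well-formed tables only.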

Method #2
This method uses only two columns, with the label being a concatenation of the first column's value and the column heading.
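    ExtractText( Haystack, Needle1a, Needle1b, Needle2a, NeedleMarker ){

        ; Find the start marker, searching from the current offset
        Needle1 := InStr( Haystack, Needle1a, false, NeedleMarker )
        ; Move to just after the refining string that follows it
        Needle1 := InStr( Haystack, Needle1b, false, Needle1 ) + StrLen( Needle1b )
        ; Find the closing marker and work out the length of the extract
        Needle2 := InStr( Haystack, Needle2a, false, Needle1 )
        NeedleLen := Needle2 - Needle1
        NeedleMarker := Needle2
        ThisValue := SubStr( Haystack, Needle1, NeedleLen )
        ; Legacy assignment trims leading/trailing spaces and tabs (AutoTrim is on by default)
        ThisValue=%ThisValue%

        ; Return the extract together with the offset for the next search
        Return [ThisValue, NeedleMarker]

    }


    ; Usage
    ; get table HTML
    ReturnedValues := ExtractText( Haystack, Needle1a, Needle1b, Needle2a, NeedleMarker )
    TheHTMLTable := ReturnedValues[1]
    NeedleMarker := ReturnedValues[2]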


Snippets
Assume that "Haystack" is the string of code you want to parse (all content).

The function ExtractText (listed in full under Method #2 above) extracts the text between two markers. It takes a unique string marking the start of the extract (Needle1a), a second string (Needle1b) that refines the starting position, and a third string (Needle2a) that marks the closing position. NeedleMarker is the offset at which the search begins: when the function is used for several tables, passing back the NeedleMarker it returned tells it to continue from where it last found a table and find the next one. It returns an array holding the extracted text and the updated NeedleMarker.
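
To show the NeedleMarker hand-off in practice, here is a minimal sketch that collects every table on a page by calling ExtractText in a loop. It assumes AutoHotkey v1.1; the sample Haystack and the needles "<table", ">" and "</table>" are my own choices for illustration, and the two guard checks are additions because ExtractText itself does not handle a missing marker.

```autohotkey
#NoEnv

; Hypothetical sample page containing two tables
Haystack := "<p>Intro</p><table><tr><td>1</td></tr></table><p>Gap</p><table><tr><td>2</td></tr></table>"

Tables := []
NeedleMarker := 1
Loop
{
    ; Stop once there is no further opening tag at or after the marker
    If ( !InStr( Haystack, "<table", false, NeedleMarker ) )
        Break

    ReturnedValues := ExtractText( Haystack, "<table", ">", "</table>", NeedleMarker )
    NeedleMarker := ReturnedValues[2]

    ; A marker of 0 means the closing tag was never found, so give up
    If ( NeedleMarker < 1 )
        Break

    Tables.Push( ReturnedValues[1] )
}

; Show the inner HTML of each table that was found
For Index, TableHTML in Tables
    MsgBox, % "Table " Index ":`n" TableHTML
ExitApp

; Same function as in the article, repeated so the sketch runs on its own
ExtractText( Haystack, Needle1a, Needle1b, Needle2a, NeedleMarker ){
    Needle1 := InStr( Haystack, Needle1a, false, NeedleMarker )
    Needle1 := InStr( Haystack, Needle1b, false, Needle1 ) + StrLen( Needle1b )
    Needle2 := InStr( Haystack, Needle2a, false, Needle1 )
    NeedleLen := Needle2 - Needle1
    NeedleMarker := Needle2
    ThisValue := SubStr( Haystack, Needle1, NeedleLen )
    ThisValue=%ThisValue%
    Return [ThisValue, NeedleMarker]
}
```

Because the returned marker points at the closing tag of the table just read, the next search for "<table" naturally skips past it and lands on the following table.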



Kind Regards,

Joel Lipman
www.joellipman.com
