Timestamp: 04/11/14 12:12:52
Author: pharms
Message: adapted parsing of HTML files to have more power in specifying replacements
File: 1 edited

  • trunk/autoquest-plugin-html/src/main/resources/manuals/parseDirHTML

--- trunk/autoquest-plugin-html/src/main/resources/manuals/parseDirHTML (r1354)
+++ trunk/autoquest-plugin-html/src/main/resources/manuals/parseDirHTML (r1496)
@@ -1,14 +1,36 @@
 Treats all files in a directory structure as HTML log files and parses them into event sequences and a GUI model. Also sub directories are parsed.
 
-The parsing process can be parameterized. This allows to replace or ignore ids or indexes of GUI elements in the log files. If they are replaced or ignored, the GUI model is more harmonized and GUI elements are considered equal although they are not. This may be helpful, e.g., if you have a table where each row is semantically the same. Without replacing or ignoring indexes or ids of the rows, each row is treated separately. But with replaced or ignored indexes or ids, all rows are considered the same.
+The parsing process can be parameterized. This allows replacing ids or ignoring indexes of GUI elements in the log files. If they are replaced or ignored, the GUI model is more harmonized and GUI elements are considered equal although they are not. This may be helpful, e.g., if you have a table where each row is semantically the same. Without ignoring indexes or ids of the rows, each row is treated separately. But with ignored or replaced indexes or ids, all rows are considered the same.
 
-To ignore the indexes, add -clearIndex=<path to GUI element> as parameter to the command call. To ignore ids, add -clearId=<path to GUI element> to the command call. The path to the GUI element is written using the HTML tag names and either their index or their id as identification. E.g., to denote all rows in a table where the table has the id "table_1" you can specify "table(htmlId=table_1)/tbody/tr". To denote e.g. all divs being the child of a div with an index 1, you specify "div[1]/div".
+The parameterization is done in a separate properties file. The keys in the file specify the tags for which either the id shall be replaced or the index shall be ignored. A simple tag is specified by its name alone, by its name and index, or by its name and id:
 
-To replace ids, a separate files with mappings must be created. The path to this file must be provided using the idReplacements parameter. The file follows a typical properties format. The key is the path denoting the GUI element of which the id shall be set. The value is the actual id. The key may contain the # character to denote a wildcard in html ids. This allows matching several GUI elements with similar ids at once and to give them the same id. An example entry of this file is:
+tagName
+tagName[index]
+tagName(htmlId\=id)
 
-div(htmlId\=id_number_#)=div_number_X
+Furthermore, tags can be specified as paths through the DOM by concatenating several tag specifications with /. An example with three specified tags (tag1 with index 5, tag2, and tag3 with id "id") is the following:
 
-This line would give all divs with an id "id_number_#" where # denotes any character the new id "div_number_X". Please note that for specifying the keys, it is required to escape any = sign in the key specification. This is usually required if the path to the denoted GUI elements denotes elements by their id as shown in the example.
+tag1[5]/tag2/tag3(htmlId\=id)
 
+The specification of a tag id may contain the # character to denote a wildcard. This allows matching several GUI elements with similar ids at once and giving them the same id. An example entry of this is:
+
+div(htmlId\=id_number_#)
+
+This line would match all divs with an id starting with "id_number_" where # denotes any character.
+
+It is also possible to specify the document in which the tag path should match. A document is specified by giving a part of the document's path in the URL. After the document specification, the full path to the specified tag must be given. An example is the following:
+
+document(path\=accounts)/html/body/div[0]/ul/li(htmlId\=breadcrumb1)/a
+
+Please note that for specifying the keys, it is required to escape any = sign in the key specification. This is usually required if the path to the denoted GUI elements denotes elements by their id as shown in the example.
+
+To remove the id of a specified tag, the value must be empty. To set the id, the value must be the id the tag shall have. To clear the index of the specified tag, the value must be CLEAR_INDEX. Here are some further example entries:
+
+body/div/div/div/form=
+body/p/small/a=imprint-link
+document(path\=accounts/login)/html/body/div[0]/div[1]/div[0]/form/p/a=password-reset-link
+document(path\=accounts/login)/html/body/div[0]/div[1]/div[0]/form/div/button=CLEAR_INDEX
+body/div[5]=date-chooser
+div(htmlId\=date-chooser)/div[0]=date-chooser_day
 
 
@@ -19,13 +41,8 @@
 [<sequenceNames>]
     array of sequences into which the parsed events shall be stored
-{-idReplacements=path/to/replacementfile}
+{-parseParams=path/to/replacementfile}
     used to define id replacements as described in a separate file
-{-clearId=path/to[0]/gui(htmlId=element)}
-    used to define GUI elements of which the ids shall be ignored
-{-clearIndex=path/to[0]/gui(htmlId=element)}
-    used to define GUI elements of which the indexes shall be ignored
 
 Example(s):
 parseDirHTML /path/to/directory
-parseDirHTML /path/to/directory sequences -clearId=table(htmlId=overview)/tbody[0]/tr
-parseDirHTML /path/to/directory sequences -idReplacements=idReplacements.txt -clearId=body
+parseDirHTML /path/to/directory sequences -parseParams=idReplacements.txt
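
The entry format described in the r1496 manual text can be illustrated with a small sketch. It is written in Python purely for illustration and is not AutoQUEST's own parser. The function names split_entry and describe_action are hypothetical. It assumes that key and value are separated by the first = that is not escaped with a backslash, and that values are interpreted as the manual states (empty value, CLEAR_INDEX, or a new id).

```python
import re

def split_entry(line):
    """Split one entry at the first '=' that is not preceded by a backslash."""
    match = re.search(r"(?<!\\)=", line)
    if match is None:
        raise ValueError("no unescaped '=' in entry: " + line)
    # Un-escape '\=' inside the key so it reads like a plain tag path.
    key = line[:match.start()].replace("\\=", "=")
    value = line[match.end():].strip()
    return key, value

def describe_action(value):
    """Map an entry value to the action the manual text describes."""
    if value == "":
        return "remove id"
    if value == "CLEAR_INDEX":
        return "clear index"
    return "set id to " + value

# Example entries taken from the manual text in the diff.
entries = [
    r"body/div/div/div/form=",
    r"body/p/small/a=imprint-link",
    r"div(htmlId\=date-chooser)/div[0]=date-chooser_day",
]
results = [(key, describe_action(value)) for key, value in map(split_entry, entries)]

for key, action in results:
    print(key, "->", action)
```

The negative lookbehind in the regular expression is what keeps the escaped = inside htmlId\=... from being treated as the key/value separator, which is the reason the manual asks for the escaping.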
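
The # wildcard in tag ids can be sketched in a similar way. The manual text only says that # "denotes any character" and that div(htmlId\=id_number_#) matches ids starting with "id_number_", so the exact semantics are not fully specified. The sketch below assumes that # stands for one or more arbitrary characters. The function name id_matches is hypothetical, and the code is an illustration in Python, not the tool's implementation.

```python
import re

def id_matches(pattern, html_id):
    """Check whether html_id matches an id pattern that may contain '#'.

    Assumption: '#' stands for one or more arbitrary characters.
    """
    # Escape the literal parts and join them with a "one or more characters" regex.
    parts = [re.escape(part) for part in pattern.split("#")]
    regex = ".+".join(parts)
    return re.fullmatch(regex, html_id) is not None

for candidate in ["id_number_1", "id_number_42", "other_id"]:
    print(candidate, id_matches("id_number_#", candidate))
```

If # were meant to stand for exactly one character instead, replacing ".+" with "." in the join would model that reading.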