parseHTML @ 1953

Last change on this file since 1953 was 1894, checked in by pharms, 11 years ago
added support for mapping document paths and removing their queries
File size: 3.3 KB

Parses an HTML log file and converts it into an event sequence and a GUI model.

The parsing process can be parameterized. This makes it possible to replace ids or ignore indexes of GUI elements in the log files. If they are replaced or ignored, the GUI model is more harmonized, and GUI elements are considered equal although they are technically distinct. This may be helpful, e.g., if you have a table where each row is semantically the same. Without ignoring indexes or ids of the rows, each row is treated separately. With ignored or replaced indexes or ids, all rows are considered the same.

The parameterization is done in a separate properties file. The keys in the file specify the tags for which either the id shall be replaced or the index shall be ignored. A simple tag can be specified by its name, by its name and index, or by its name and id:

tagName
tagName[index]
tagName(htmlId\=id)

Furthermore, tags can be specified as paths through the DOM by giving several tag specifications concatenated with /. An example with three specified tags (tag1 with index 5, tag2, and tag3 with id "id") is the following:

tag1[5]/tag2/tag3(htmlId\=id)

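To illustrate how such a key decomposes, here is a minimal Python sketch that splits a tag path into its elements. It is not the parser used by the command itself. The names TagSpec and parse_tag_path are hypothetical, and the sketch assumes that ids contain neither / nor ) and that no document(...) element is present.

```python
import re
from typing import List, NamedTuple, Optional


class TagSpec(NamedTuple):
    """One element of a tag path (hypothetical helper type)."""
    name: str
    index: Optional[int]
    html_id: Optional[str]


# A tag name, optionally followed by [index] or (htmlId\=id).
# The backslash before '=' is optional so unescaped keys are accepted too.
_TAG = re.compile(
    r"^(?P<name>[^\[\](/]+)"
    r"(?:\[(?P<index>\d+)\]|\(htmlId\\?=(?P<id>[^)]*)\))?$"
)


def parse_tag_path(path: str) -> List[TagSpec]:
    """Split a key such as tag1[5]/tag2/tag3(htmlId\\=id) into TagSpec items."""
    specs = []
    for part in path.split("/"):
        match = _TAG.match(part)
        if match is None:
            raise ValueError("unrecognized tag specification: %r" % part)
        raw_index = match.group("index")
        index = int(raw_index) if raw_index is not None else None
        specs.append(TagSpec(match.group("name"), index, match.group("id")))
    return specs
```

Calling parse_tag_path(r"tag1[5]/tag2/tag3(htmlId\=id)") yields three elements: tag1 with index 5, tag2 without index or id, and tag3 with id "id".
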
The specification of a tag id may contain the # character to denote a wildcard. This allows matching several GUI elements with similar ids at once and giving them the same id. An example entry is:

div(htmlId\=id_number_#)

This line would match all divs with an id starting with "id_number_", where # denotes any character.

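The wildcard matching can be pictured with the following Python sketch. It is an illustration only, and the function names are hypothetical. It assumes that # stands for any run of characters, possibly empty. The text above does not say whether # covers exactly one character or several, so this reading may differ from the real behavior.

```python
import re


def id_pattern_to_regex(pattern):
    """Translate an id pattern with '#' wildcards into a compiled regex.

    Assumption: '#' matches any run of characters, including none.
    """
    # Escape the literal pieces so characters like '.' or '-' match themselves.
    literal_parts = [re.escape(part) for part in pattern.split("#")]
    return re.compile(".*".join(literal_parts))


def id_matches(pattern, html_id):
    """Return True if html_id matches the pattern as a whole."""
    return id_pattern_to_regex(pattern).fullmatch(html_id) is not None
```

Under this assumption, id_matches("id_number_#", "id_number_42") is true, while an id such as "other_7" does not match.
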
It is also possible to specify the document in which the tag path should match. A document is specified by giving a part of the document's path in the URL. After the document specification, the full path to the specified tag must be given. An example is the following:

document(path\=accounts)/html/body/div[0]/ul/li(htmlId\=breadcrumb1)/a

Please note that any = sign in a key specification must be escaped. This is usually necessary if the path to the GUI elements identifies elements by their id, as shown in the example.

To remove the id of a specified tag, the value must be empty. To set the id, the value must be the id the tag shall have. To clear the index of the specified tag, the value must be CLEAR_INDEX. Here are some further example entries:

body/div/div/div/form=
body/p/small/a=imprint-link
document(path\=accounts/login)/html/body/div[0]/div[1]/div[0]/form/p/a=password-reset-link
document(path\=accounts/login)/html/body/div[0]/div[1]/div[0]/form/div/button=CLEAR_INDEX
body/div[5]=date-chooser
div(htmlId\=date-chooser)/div[0]=date-chooser_day

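The following Python sketch shows why the = sign inside a key has to be escaped and how the three kinds of values can be told apart. It is a simplified illustration with hypothetical function names. A real properties file has further rules, such as other separators and escape sequences, which are ignored here.

```python
def split_entry(line):
    """Split 'key=value' at the first '=' not preceded by a backslash."""
    i = 0
    while i < len(line):
        if line[i] == "\\":
            i += 2  # skip the backslash and the character it escapes
            continue
        if line[i] == "=":
            key = line[:i].replace("\\=", "=")  # unescape '=' in the key
            return key, line[i + 1:]
        i += 1
    raise ValueError("no unescaped '=' found: %r" % line)


def describe_action(value):
    """Classify the value of a tag entry as described in the text above."""
    if value == "":
        return "remove id"
    if value == "CLEAR_INDEX":
        return "clear index"
    return "set id to " + value
```

For the entry div(htmlId\=date-chooser)/div[0]=date-chooser_day, split_entry returns the key div(htmlId=date-chooser)/div[0] and the value date-chooser_day. Without the backslash, the split would happen inside the parentheses.
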
It is also possible to set a new path for a document and to clear queries belonging to the paths of the documents. Examples for this are:

document(path\=/pathToPage/)=CLEAR_QUERY
document(path\=/pathToOtherPath)=new/path
document(path\=/pathTo/Page/withQuery/)=CLEAR_QUERY,new/path (clears the query and changes the path at once)

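The value of a document entry can be read as sketched below in Python. This is an illustration only, and parse_document_value is a hypothetical name. It assumes that the value is a comma-separated list in which CLEAR_QUERY is a keyword and any other token is the new path. The examples above only show CLEAR_QUERY in first position, so accepting it in any position is an assumption.

```python
def parse_document_value(value):
    """Return (clear_query, new_path) for the value of a document entry."""
    clear_query = False
    new_path = None
    for token in value.split(","):
        token = token.strip()
        if token == "CLEAR_QUERY":
            clear_query = True
        elif token:
            # Any other non-empty token is taken as the replacement path.
            new_path = token
    return clear_query, new_path
```

With this reading, the value CLEAR_QUERY,new/path results in both a cleared query and the new path new/path.
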
Wildcards are also allowed in path specifications. To denote the end of a path, the $ sign can be used.


$USAGE$

<file>
path to the file to be parsed
[<sequenceNames>]
array of sequences into which the parsed events shall be stored
{-parseParams=path/to/replacementfile}
path to a separate file that defines the id replacements

Example(s):
parseHTML /path/to/file.log
parseHTML /path/to/file.log sequences -parseParams=idReplacements.txt
