PGXML TODO List
===============

- Some of these items still require much more thought! The data model
- for XML documents and the parsing model of expat don't really fit so
- well with a standard SQL model.
+ Some of these items still require much more thought! Since the first
+ release, the XPath support has improved (because I'm no longer using a
+ homemade algorithm!).

- 1. Generalised XML parsing support
+ 1. Performance considerations

- Allow a user to specify handlers (in any PL) to be used by the parser.
- This must permit distinct sets of parser settings; a user may want some
- documents in a database to be parsed with one set of handlers, others
- with a different set.
+ At present each document is parsed to produce the DOM tree on every query.

- i.e. the pgxml_parse function would take as parameters (document,
- parsername) where parsername was the identifier for a collection of
- handler etc. settings.
+ Pros:
+ Easy
+ No persistent memory or storage allocation for parsed trees
+ (libxml docs suggest representation of a document might
+ be 4 times the size of the text)

- "Stub" handlers in the pgxml code would invoke the functions through
- the standard fmgr interface. The parser interface would define the
- prototype for these functions. How does the handler function know
- which document/context has resulted in it being called?
+ Cons:
+ Slow and CPU-intensive to parse.
+ Makes it difficult for PLs to apply libxml manipulations to create
+ new documents or amend existing ones.

- Mechanism for defining a collection of parser settings (in a table?
- But maybe copied for efficiency into a structure when first required
- by a query?)

- 2. Support for other parsers
+ 2. XQuery

- Expat may not be the best choice as a parser because a new parser
- instance is needed for each document, i.e. all the handlers must be set
- again for each document. Another parser may have a more efficient way
- of parsing a set of documents identically.
+ I'm not sure if the addition of XQuery would be best as a function or
+ as a new front-end parser. This is one to think about, but with a
+ decent implementation of XPath, one of the prerequisites is covered.

- 3. XPath support
+ 3. DOM Interfaces

- Proper XPath support. I really need to sit down and plough
- through the specification...
+ Expose more aspects of the DOM to user functions/PLs. This would
+ allow a procedure in a PL to run some queries and then use exposed
+ interfaces to libxml to create an XML document out of the query
+ results. I accept the argument that this might be more properly
+ performed on the client side.
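To illustrate what such an interface would let a PL procedure do, here is a minimal Python sketch that turns query results into an XML document with `xml.etree.ElementTree`. It assumes the results arrive as a list of dicts, one per row. The function name and the element names `results` and `row` are hypothetical. Column names are assumed to be valid XML names.

```python
import xml.etree.ElementTree as ET


def rows_to_xml(rows, root_tag="results", row_tag="row"):
    """Build one XML document from query rows (each row a dict of column -> value)."""
    root = ET.Element(root_tag)
    for row in rows:
        row_el = ET.SubElement(root, row_tag)
        for column, value in row.items():
            # Column names become element names, so they must be valid XML names.
            cell = ET.SubElement(row_el, column)
            if value is not None:  # SQL NULL becomes an empty element
                cell.text = str(value)
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode")
```

For example, `rows_to_xml([{"docid": 1, "location": "Ashby"}])` returns `<results><row><docid>1</docid><location>Ashby</location></row></results>`. The serializer escapes characters such as `&` and `<`. Avoiding that escaping work is one reason to build documents through DOM-style calls rather than by string concatenation.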

- The very simple text comparison system currently used is too
- basic. Need to convert the path to an ordered list of nodes. Each node
- is an element qualifier, and may have a list of attribute
- qualifications attached. This probably requires a lex/yacc combination.
- (James Clark has written a yacc grammar for XPath.) Not all the
- features of XPath are necessarily relevant.
+ 4. Returning sets of documents from XPath queries.

- An option to return subdocuments (i.e. subelements AND cdata, not just
- cdata). This should maybe be the default.
-
- 4. Multiple occurrences of elements.
-
- This section is all very sketchy, and has various weaknesses.
+ Although the current implementation allows you to amalgamate the
+ returned results into a single document, it's quite possible that
+ you'd like to use the returned set of nodes as a source for FROM.
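Using the returned node set as a FROM source amounts to turning each matching node into a row. Here is a minimal Python sketch of that shape, modelled loosely on the (key, value) rows that `pgxml_xpaths` produces in the queries below. The in-memory `docstore` dict and the `xpath_rows` name are hypothetical. ElementTree's limited path syntax stands in for full XPath.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, Tuple


def xpath_rows(table: Dict[int, str], path: str) -> Iterator[Tuple[int, str]]:
    """Yield one (key, value) row per node matched by `path` in each document.

    `table` maps a key (like docid) to document text. A document with several
    matching nodes contributes several rows; one with none contributes nothing.
    """
    for key, document in table.items():
        root = ET.fromstring(document)
        for node in root.findall(path):
            yield key, node.text or ""


# Hypothetical stand-in for the docstore table: docid -> document text.
docstore = {
    1: "<site><feature><type>Limekiln</type></feature>"
       "<feature><type>Barn</type></feature></site>",
    2: "<site><feature><type>Barn</type></feature></site>",
}

# Rough equivalent of: ... from docstore d, pgxml_xpaths(...) ft
#                      where ft.key = d.docid and ft.value = 'Limekiln'
matches = sorted({key for key, value in xpath_rows(docstore, ".//feature/type")
                  if value == "Limekiln"})
```

Because each node becomes its own row, documents with multiple element occurrences are handled naturally. Document 1 above yields two rows, and only its `Limekiln` row survives the filter.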

Is there a good way to optimise/index the results of certain XPath
operations to make them faster? For example:

- select docid, pgxml_xpath(document,'/site/location',1) as location
- from docstore
- where pgxml_xpath(document,'/site/name',1) = 'Church Farm';
+ select docid, pgxml_xpath(document,'//site/location/text()','','') as location
+ from docstore
+ where pgxml_xpath(document,'//site/name/text()','','') = 'Church Farm';

and with multiple element occurrences in a document?

- select d.docid, pgxml_xpath(d.document,'/site/location',1)
+ select d.docid, pgxml_xpath(d.document,'//site/location/text()','','')
from docstore d,
- pgxml_xpaths('docstore','document','feature/type','docid') ft
+ pgxml_xpaths('docstore','document','//feature/type/text()','docid') ft
where ft.key = d.docid and ft.value = 'Limekiln';

pgxml_xpaths params are relname, attrname, xpath, returnkey. It would
@@ -71,10 +61,15 @@ defined by relname and attrname.

The pgxml_xpaths function could be the basis of a functional index,
which could speed up the above query very substantially, working
- through the normal query planner mechanism. The syntax above is fragile
- because it uses names rather than OIDs.
+ through the normal query planner mechanism.
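The idea behind such an index can be sketched outside the database. Evaluate the path once per document, store value → keys, and answer the `'Limekiln'` query by lookup instead of re-parsing every document. This Python sketch is hypothetical: the `build_path_index` name and the dict-based store are not part of pgxml. It also leaves out the maintenance a real index needs when documents are inserted, updated or deleted.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, Set


def build_path_index(table: Dict[int, str], path: str) -> Dict[str, Set[int]]:
    """Map each value found at `path` to the set of keys whose document contains it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for key, document in table.items():
        for node in ET.fromstring(document).findall(path):
            index[node.text or ""].add(key)
    return dict(index)


# Hypothetical stand-in for the docstore table: docid -> document text.
docstore = {
    1: "<site><feature><type>Limekiln</type></feature>"
       "<feature><type>Barn</type></feature></site>",
    2: "<site><feature><type>Barn</type></feature></site>",
}

feature_index = build_path_index(docstore, ".//feature/type")
# The parse cost is paid once at build time; queries become dictionary lookups.
limekiln_docs = feature_index.get("Limekiln", set())
```

Because the index maps each value to a set of keys, multiple element occurrences in one document still need only a single entry per value.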
+
+ 5. Return type support.
+
+ Better support for returning e.g. numeric or boolean values. I need to
+ get to grips with the returned data from libxml first.
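One way to decide how numeric and boolean results should come back is to follow XPath 1.0's own conversion rules for `number()` and `boolean()`. The Python sketch below models those rules without using libxml's result structures. It assumes each result arrives as a Python bool, a number, a string, or a list of node string-values (in document order) standing in for a node-set.

```python
import math
import re

# XPath 1.0 Number production with an optional '-' and surrounding XPath
# whitespace (space, tab, CR, LF); any other string converts to NaN.
_NUMBER = re.compile(r"[ \t\r\n]*-?(?:\d+(?:\.\d*)?|\.\d+)[ \t\r\n]*")


def xpath_number(value):
    """Convert a result to a number following the XPath 1.0 number() rules."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return 1.0 if value else 0.0
    if isinstance(value, (int, float)):
        return float(value)
    if isinstance(value, list):  # node-set: use the first node's string-value
        value = value[0] if value else ""
    if _NUMBER.fullmatch(value):
        return float(value.strip(" \t\r\n"))
    return math.nan


def xpath_boolean(value):
    """Convert a result to a boolean following the XPath 1.0 boolean() rules."""
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        return not (value == 0 or math.isnan(value))
    # Strings and node-sets (lists) are true exactly when non-empty.
    return len(value) > 0
```

Following these rules has one surprising consequence: the string `"false"` converts to true, because only the empty string is false. Any SQL boolean mapping would need to decide whether to keep or override that behaviour.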
+

- John Gray <jgray@azuli.co.uk>
+ John Gray <jgray@azuli.co.uk> 16 August 2001