XML Compression Bibliography

Please email me at tomasz.muldner AT acadiau.ca if you know of an additional paper that is not on this list, or if you find an error in the current list.

Last update: October 22, 2008

See also: Greg Leighton's page

Notes: there are five categories of papers:

1) XML Compression papers (which do not appear on GL's page)

2) General Compression papers (related to XML compression)

3) Grammar papers (related to Grammar XML Compression)

4) XPath-related papers

5) Auxiliary papers


1) XML Compression papers (which do not appear on GL's page)

G. Antoshenkov. Order-preserving Key Compression. DEC 1994.

G. Antoshenkov. Dictionary-based order-preserving string compression. The VLDB Journal: The International Journal on Very Large Data Bases, Volume 6, Issue 1, February 1997. Publisher: Springer-Verlag New York, Inc.

Stefan Böttcher, Rita Steinmetz, Niklas Klein. XML Index Compression by DTD Subtraction

Stefan Böttcher, Rita Steinmetz. Data Management for Mobile Ajax Web 2.0 Applications, DEXA 2007

S.J. Davies and I.S. Burnett. Exchanging XML Multimedia Containers Using a Binary XML Protocol.

J. Fang, A. Martinez-Smith, B. Ghandi. A Method to Compress Schema-Based XML Metadata for Mobile Environments. 2007 IEEE International Conference on Multimedia and Expo, 2-5 July 2007, pp. 719-722.

K. Swanson, J. Judt. CompreX: further developments in XML compression. 2006 IEEE Aerospace Conference, 4-11 March 2006.

D. Winkowski, M. Cokus.
XML Sizing and Compression Study For Military Wireless Data

XSBC:
XML Schema-based Binary Compression

Raymond K. Wong, Franky Lam, William M. Shui. Querying and Maintaining a Compact XML Storage. WWW 2007




2) General Compression papers (related to XML compression)

J. Katajainen and E. Makinen. Tree compression and optimization with applications. Int. Journal Comput. Science, 1(4):425-447, December 1990.

J.C. Kieffer, E-H. Yang, G. Nelson and P. Cosman, Universal lossless compression via multilevel pattern matching, IEEE Trans. Inform. Theory, Vol. IT-46, No. 4, pp. 1227--1245, July 2000.

J. C. Kieffer and E.-H. Yang, Structured grammar-based codes for universal lossless data compression, Communications in Information and Systems, Vol. 2, No. 1, pp. 29-52, June 2002.


J. C. Kieffer and E.-H. Yang, Grammar based codes: A new class of universal lossless source codes, IEEE Trans. Inform. Theory, Vol. IT-46, No. 3, pp. 737--754, May 2000.

 

E.-H. Yang and J. C. Kieffer, "Efficient universal lossless compression algorithms based on a greedy sequential grammar transform--Part one: Without context models," IEEE Trans. Inform. Theory, Vol. IT-46, No. 3, pp. 755--777, May 2000.

 

E-H. Yang and Z. Zhang, "The redundancy of source coding with a fidelity criterion--Part II: Coding at a fixed rate level with unknown statistics," accepted for publication in IEEE Trans. Inform. Theory, June 2000.


B. Zhu, E-H. Yang, and A.H. Tewfik, "Arithmetic coding with dual symbol sets and its performance analysis," IEEE Trans. Image Processing, Vol. 8, No. 12, pp. 1667--1676, December 1999.
 



3) Grammar papers (related to Grammar XML Compression)

Claus Brabrand, Anders Møller, and Michael I. Schwartzbach. Dual Syntax for XML Languages

Boris Chidlovskii.
Schema Extraction from XML data: A Grammatical Inference Approach.

H. Comon and Max Dauchet and R. Gilleron and D. Lugiez and S. Tison and M. Tommasi.
Tree Automata Techniques and Applications

J.M. Lake. Prediction by Grammatical match.
Data Compression Conference, 2000. Proceedings. DCC 2000 Publication Date: 2000; pp. 153-162

D. Lee, W. Chu. Comparative Analysis of Six XML Schema Languages. SIGMOD Record (ACM Special Interest Group on Management of Data), Volume 29, Issue 3 (September 2000), pp. 76-87.

E. Lehman, A. Shelat.
Approximation algorithms for grammar-based compression, SODA 2002.

W. Martens, F. Neven, and T. Schwentick. Simple off the shelf abstractions for XML Schema. SIGMOD Record, 36(3), pp. 15-22, 2007.

M. Murata, D. Lee, M. Mani, K. Kawaguchi. Taxonomy of XML Schema Languages Using Formal Language Theory, November 2005.

C.G. Nevill-Manning and I.H. Witten, Compression and Explanation Using Hierarchical Grammars, Computer J., vol. 40, nos. 2/3, pp. 103-113, 1997.

Padovani, L., Zacchiroli, S., Vitali, F. Stream Processing of XML Documents Made Easy with LALR(1) Parser Generators, September 2007.

J. Tarhio. On Compression of Parse Trees.
Eighth Symposium on String Processing and Information Retrieval (SPIRE'01),  2001

K. Yamagata, T. Uchida, T. Shoudai and Y. Nakamura, "An Effective Grammar-Based Compression Algorithm for Tree Structured Data", Proc. 13th International Conference on Inductive Logic Programming (ILP 2003), pp. 383--400, Lecture Notes in Artificial Intelligence 2835, Springer, 2003.



4) XPath-related papers

Arpan Desai. Introduction to Sequential XPath

Pavel Zezula, Federica Mandreoli, and Riccardo Martoglia. Tree Signatures and Unordered XML Pattern Matching



5) Auxiliary papers

A. S. E. Campos.
Finite Context Modeling.

J. G. Cleary, W.J. Teahan. Unbounded Length Contexts for PPM.
In Proc. Data Compression Conference '95 (DCC'95), pages 52--61. IEEE Computer Society, 1995.

Information Entropy. Wikipedia

Beda Christoph Hammerschmidt, Christian Werner, Ylva Brandt, Volker Linnemann, Sven Groppe, and Stefan Fischer. Incremental Validation of String-Based XML Data in Databases, File Systems, and Streams

Mark Johnson, Prakash Ishwar, Vinod M. Prabhakaran, Daniel Schonberg, and Kannan Ramchandran. On Compressing Encrypted Data. IEEE Transactions on Signal Processing, vol. 52, pp. 2992-3006, Oct. 2004.

Mark Johnson, David Wagner, and Kannan Ramchandran.
On Compressing Encrypted Data Without the Encryption Key.


A. Schmidt, D. Florescu, M. J. Carey, I. Manolescu, R. Busse.
Why And How To Benchmark XML Databases.

C. M. Sperberg-McQueen. Context-sensitive rules in XML Schema

H. Wang, Laks V.S. Lakshmanan, Efficient Secure Query Evaluation over Encrypted XML Databases.

XFLAT 

XMark — An XML Benchmark Project.