A. Agam, S. Argamon, O. Frieder, D. Grossman, and D. Lewis, Content-based document image retrieval in complex document collections, Document Recognition and Retrieval XIV (Part of the IS&T/SPIE Electronic Imaging Symposium), pp.65000S1-65000S12. SPIE, 2007.

ABBYY, OCR, ICR, OMR and linguistic software.

A. Antonacopoulos, C. Clausner, C. Papadopoulos, and S. Pletschacher, Historical Document Layout Analysis Competition, 2011 International Conference on Document Analysis and Recognition, pp.1516-1520, 2011.
DOI : 10.1109/ICDAR.2011.301

M. Agrawal and D. Doermann, Voronoi++: A Dynamic Page Segmentation Approach Based on Voronoi and Docstrum Features, 2009 10th International Conference on Document Analysis and Recognition, pp.1011-1015, 2009.
DOI : 10.1109/ICDAR.2009.270

A. Amin and S. Fischer, A Document Skew Detection Method Using the Hough Transform, Pattern Analysis & Applications, vol.3, issue.3, pp.243-253, 2000.
DOI : 10.1007/s100440070009

O. Augereau, N. Journet, and J. Domenger, Classification d'images de documents avec retour de pertinence : Application aux documents de type ressources humaines, 23ème Colloque sur le traitement du signal et des images. GRETSI, 2011.

O. Augereau, N. Journet, and J. Domenger, Document images indexing with relevance feedback: an application to industrial context, Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp.1190-1194, 2011.

O. Augereau, N. Journet, and J. Domenger, Reconnaissance et extraction de pièces d'identité, Actes du Douzième Colloque International Francophone sur l'Écrit et le Document (CIFED), pp.179-194, 2012.

H. S. M. Al-Khaffaf, A. Z. Talib, and R. A. Salam, Removing salt-and-pepper noise from binary images of engineering drawings, 2008 19th International Conference on Pattern Recognition, pp.1-4, 2008.
DOI : 10.1109/ICPR.2008.4761425

S. Adam, J. M. Ogier, C. Cariou, R. Mullot, J. Gardes et al., Utilisation de la transformée de Fourier-Mellin pour la reconnaissance de formes multi-orientées et multi-échelles : application à l'analyse automatique de documents techniques, p.17, 2001.

A. Antonacopoulos, S. Pletschacher, D. Bridson, and C. Papadopoulos, ICDAR 2009 page segmentation competition, Document Analysis and Recognition (ICDAR), 2009 International Conference on, pp.1370-1374, 2009.

J. Arlandis, J. Perez-Cortes, and E. Ungria, Identification of very similar filled-in forms with a reject option, Proceedings of the 2009 10th International Conference on Document Analysis and Recognition, pp.246-250, 2009.

C. C. Aggarwal, J. L. Wolf, P. S. Yu, C. Procopiuc, and J. S. Park, Fast algorithms for projected clustering, ACM SIGMOD Record, vol.28, issue.2, pp.61-72, 1999.

C. C. Aggarwal and P. S. Yu, Finding generalized projected clusters in high dimensional spaces, ACM SIGMOD Record, vol.29, issue.2, pp.70-81, 2000.

R. Bekkerman and J. Allan, Using bigrams in text categorization, p.1003, 2004.

S. S. Bukhari, M. I. A. Al Azawi, F. Shafait, and T. M. Breuel, Document image segmentation using discriminative learning over connected components, Proceedings of the 9th IAPR International Workshop on Document Analysis Systems, pp.183-190, 2010.

H. Baird, Background structure in document images, Advances in Structural and Syntactic Pattern Recognition, pp.253-269, 1992.

A. Bartoli, G. Davanzo, E. Medvet, and E. Sorio, Improving Features Extraction for Supervised Invoice Classification, Artificial Intelligence and Applications, 2010.
DOI : 10.2316/P.2010.674-040

H. Bay, A. Ess, T. Tuytelaars, and L. Van Gool, Speeded-up robust features (SURF), Computer Vision and Image Understanding, pp.346-359, 2008.

M. Benjlaiel, S. Kanoun, and A. Alimi, Une méthode de segmentation d'Images de Documents Composites, Actes du Neuvième Colloque International Francophone sur l'Écrit et le Document (CIFED), pp.121-126, 2006.

M. Brown and D. Lowe, Automatic Panoramic Image Stitching using Invariant Features, International Journal of Computer Vision, vol.74, issue.1, pp.59-73, 2007.
DOI : 10.1007/s11263-006-0002-3

H. Baird, M. Moll, C. An, and M. R. Casey, Document image content inventories, Document Recognition and Retrieval XIV, pp.65000-65001, 2007.
DOI : 10.1117/12.705094

C. Boutsidis, M. Mahoney, and P. Drineas, Unsupervised feature selection for the k-means clustering problem, Advances in Neural Information Processing Systems, vol.22, pp.153-161, 2009.

T. Breuel, Robust least square baseline finding using a branch and bound algorithm, Document Recognition and Retrieval VIII (Part of the IS&T/SPIE Electronic Imaging Symposium), pp.20-27, 2002.

O. Boiman, E. Shechtman, and M. Irani, In defense of Nearest-Neighbor based image classification, 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8, 2008.
DOI : 10.1109/CVPR.2008.4587598

A. Bagdanov, First order Gaussian graphs for efficient structure classification, Pattern Recognition, vol.36, issue.6, pp.1311-1324, 2003.
DOI : 10.1016/S0031-3203(02)00227-3

F. Cao, A theory of shape identification, vol.1948, 2008.

N. Chen and D. Blostein, A survey of document image classification: problem statement, classifier architecture and performance evaluation, International Journal of Document Analysis and Recognition, vol.10, issue.1, pp.1-16, 2007.
DOI : 10.1007/s10032-006-0020-2

R. Cattoni, T. Coianiz, S. Messelodi, and C. M. Modena, Geometric layout analysis techniques for document image understanding: a review, p.9703, 1998.

G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray, Visual categorization with bags of keypoints, Workshop on statistical learning in computer vision, ECCV, p.22, 2004.

F. Cesarini, M. Gori, S. Marinai, and G. Soda, Structured document segmentation and representation by the modified XY tree, Document Analysis and Recognition (ICDAR), 1999 International Conference on, pp.563-566, 1999.

V. Chaoji, M. Hasan, S. Salem, and M. J. Zaki, SPARCL: Efficient and effective shape-based clustering, Eighth IEEE International Conference on Data Mining ICDM'08, pp.93-102, 2008.
DOI : 10.1109/icdm.2008.73

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.150.9571

Y. Chiang and C. Knoblock, Recognition of multi-oriented, multi-sized, and curved text, Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp.1399-1403, 2011.

F. Cesarini, M. Lastri, S. Marinai, and G. Soda, Encoding of modified XY trees for document classification, Document Analysis and Recognition (ICDAR), 2001 International Conference on, p.1131, 2001.

A. Cornuéjols, Une nouvelle méthode d'apprentissage : Les SVM. Séparateurs à vaste marge, Bulletin de l'AFIA, pp.14-23, 2002.

B. Coüasnon, DMOS: A generic document recognition method, application to an automatic generator of musical scores, mathematical formulae and table structures recognition systems, Document Analysis and Recognition (ICDAR), 2001 International Conference on, pp.215-220, 2001.

B. Coüasnon, DMOS, a generic document recognition method: application to table structure analysis in a general and in a specific way.

B. Coüasnon, Fusion des connaissances en analyse de documents, Actes du Douzième Colloque International Francophone sur l'Écrit et le Document (CIFED), p.3, 2012.

C. Clausner, S. Pletschacher, and A. Antonacopoulos, Scenario driven in-depth performance evaluation of document layout analysis methods, Document Analysis and Recognition (ICDAR), 2011 International Conference on, pp.1404-1408, 2011.

Corry and E. Swain, Optical character recognition (OCR) testing, British Geological Survey report IR, p.66, 2006.

P. Duygulu and V. Atalay, A hierarchical representation of form documents for identification and retrieval, International Journal on Document Analysis and Recognition, vol.5, issue.1, pp.17-27, 2002.
DOI : 10.1007/s100320100077

M. Diligenti, P. Frasconi, and M. Gori, Hidden tree Markov models for document image classification, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol.25, issue.4, pp.519-523, 2003.
DOI : 10.1109/TPAMI.2003.1190578

T. Deselaers, D. Keysers, and H. Ney, Features for image retrieval: an experimental comparison, Information Retrieval, vol.11, issue.2, pp.77-107, 2008.
DOI : 10.1007/s10791-007-9039-3

R. Datta, J. Li, and J. Z. Wang, Content-based image retrieval: approaches and trends of the new age, Proceedings of the 7th ACM SIGMM international workshop on Multimedia information retrieval, pp.10-11, 2005.

H. Déjean and J. Meunier, Versatile page numbering analysis, Document Recognition and Retrieval XV (Part of the IS&T/SPIE Electronic Imaging Symposium), pp.68150-68151, 2008.

C. Domeniconi, D. Papadopoulos, D. Gunopulos, and S. Ma, Subspace Clustering of High Dimensional Data, Proceedings of the 2004 SIAM International Conference on Data Mining, pp.517-521, 2004.
DOI : 10.1137/1.9781611972740.58

B. Efron, T. Hastie, I. Johnstone, and R. Tibshirani, Least angle regression, Annals of Statistics, vol.32, issue.2, pp.407-451, 2004.

M. Ester, H. P. Kriegel, J. Sander, and X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, Proceedings of the 2nd International Conference on Knowledge Discovery and Data mining, pp.226-231, 1996.

M. Fischler and R. Bolles, Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography, Communications of the ACM, vol.24, issue.6, pp.381-395, 1981.
DOI : 10.1145/358669.358692

L. Fei-Fei, R. Fergus, and P. Perona, Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories, Computer Vision and Image Understanding, pp.59-70, 2007.
DOI : 10.1016/j.cviu.2005.09.012

S. Fisher and D. Hinds, Pattern Recognition (ICPR), 1990 International Conference on, 1990.

C. Fraley and A. E. Raftery, MCLUST version 3 for R: Normal mixture modeling and model-based clustering, Technical report, 2006.

N. Gorski, V. Anisimov, E. Augustin, O. Baret, S. Maximov et al., Industrial bank check processing: the A2iA CheckReader, pp.196-206, 2001.

G. Griffin, A. Holub, and P. Perona, Caltech-256 object category dataset.

S. Guha, R. Rastogi, and K. Shim, CURE: an efficient clustering algorithm for large databases, SIGMOD '98: Proceedings of the 1998 ACM SIGMOD international conference on Management of data, 1998.

S. Guha, R. Rastogi, and K. Shim, ROCK: A robust clustering algorithm for categorical attributes, International Conference on Data Engineering (ICDE).

A. Gordo and E. Valveny, A rotation invariant page layout descriptor for document classification and retrieval, Document Analysis and Recognition (ICDAR), 2009 International Conference on, 2009.

D. Gaceb, Contributions au tri automatique de documents et de courrier d'entreprises, Thèse de doctorat en informatique, Institut National de Sciences Appliquées de Lyon, 2009.

H. Hamza, Y. Belaïd, A. Belaïd, and B. B. Chaudhuri, An end-to-end administrative document analysis system, Document Analysis Systems (DAS), The Eighth IAPR International Workshop on.

H. Hamza, Y. Belaïd, A. Belaïd, and B. B. Chaudhuri, Incremental classification of invoice documents, Pattern Recognition (ICPR), 19th International Conference on, pp.1-4, 2008.

P. Héroux, S. Diana, A. Ribert, and E. Trupin, Classification method study for automatic form class identification, Pattern Recognition, Fourteenth International Conference on, 1998.

J. J. Hull, B. Erol, J. Graham, Q. Ke, H. Kishi, J. Moraleda, and D. G. Van Olst, Paper-based augmented reality, 17th International Conference on Artificial Reality and Telexistence, pp.205-209, 2007.

H. Harzallah, F. Jurie, and C. Schmid, Combining efficient object localization and image classification, Computer Vision, IEEE 12th International Conference on, 2009.

A. Hinneburg and D. A. Keim, An efficient approach to clustering in large multimedia databases with noise, Knowledge Discovery and Data Mining, 1998.

E. Han, K. Kim, H. K. Yang, and K. Jung, pp.872-881.

A. Hamou et al., La classification non supervisée (clustering) de documents textuels par les automates cellulaires, International Conference on Information Technology and its Applications (CIIA-09), Saïda/Algeria, 2009.

C. Harris and M. Stephens, A combined corner and edge detector, Alvey vision conference, 1988.

J. J. Hull, Document image similarity and equivalence detection.

M. Halkidi, M. Vazirgiannis, and Y. Batistakis, Quality scheme assessment in the clustering process, Principles of Data Mining and Knowledge Discovery.

A. K. Jain and S. Bhattacharjee, Text segmentation using Gabor filters for automatic document processing, Machine Vision and Applications.

Hou, Relevance feedback learning with feature selection in region-based image retrieval, Proc. of IEEE Int. Conf. Acoustics, Speech and Signal Processing.

L. Juan and O. Gwun, A comparison of SIFT, PCA-SIFT and SURF, International Journal of Image Processing (IJIP), vol.3, issue.4, pp.143-152, 2009.

N. Journet, R. Mullot, V. Eglin, and J. Y. Ramel, Analyse d'images de documents anciens : une approche texture, pp.461-479.

A. K. Jain, M. N. Murty, and P. J. Flynn, Data clustering: a review, ACM Computing Surveys (CSUR).

T. Joachims, Text categorization with support vector machines: Learning with many relevant features, Machine Learning: ECML-98, 1998.

A. K. Jain and Y. Zhong, Page segmentation using texture analysis.

T. Kanungo, R. M. Haralick, and I. Phillips, Global and local document degradation models, Document Analysis and Recognition, Proceedings of the Second International Conference on, 1993.

T. Kasar, J. Kumar, and A. G. Ramakrishnan, Font and background color independent text binarization, Second International Workshop on Camera-Based Document Analysis and Recognition.

Y. Ko, A study of term weighting schemes using class information for text classification, Proceedings of the 35th international ACM SIGIR conference on Research and development in information retrieval, 2012.

L. Kaufman and P. J. Rousseeuw, Finding groups in data: an introduction to cluster analysis, 1990.
DOI : 10.1002/9780470316801

M. Kursa and W. Rudnicki, Feature selection with the Boruta package, Journal of Statistical Software, vol.36, issue.11, pp.1-13, 2010.

Y. Ke and R. Sukthankar, PCA-SIFT: A more distinctive representation for local image descriptors, 2004.

D. Keysers, F. Shafait, and T. M. Breuel, Document image zone classification - a simple high-performance approach, 2nd Int. Conf. on Computer Vision Theory and Applications, pp.44-51, 2007.

K. Kise, A. Sato, and M. Iwamura, Segmentation of Page Images Using the Area Voronoi Diagram, Computer Vision and Image Understanding, vol.70, issue.3, pp.370-382, 1998.
DOI : 10.1006/cviu.1998.0684

Kumar, K. Suneera, and P. V. Kumar, Content Based Image Retrieval - Extraction by Objects of User Interest, International Journal on Computer Science and Engineering, vol.3, issue.3, pp.1068-1074, 2011.

Y. Lu and C. L. Tan, A nearest-neighbor chain based approach to skew estimation in document images, Pattern Recognition Letters, vol.24, issue.14, pp.2315-2323, 2003.
DOI : 10.1016/S0167-8655(03)00057-6

Lecoq, L. Najman, O. Gibot, and E. Trupin, Benchmarking commercial OCR engines for technical drawings indexing, Proceedings of Sixth International Conference on Document Analysis and Recognition, pp.138-142, 2001.
DOI : 10.1109/ICDAR.2001.953770

D. Lowe, Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, vol.60, issue.2, pp.91-110, 2004.

S. Li, Q. Shen, and J. Sun, Skew detection using wavelet decomposition and projection profile analysis, Pattern Recognition Letters, pp.555-562, 2007.

J. Matas, O. Chum, M. Urban, and T. Pajdla, Robust wide-baseline stereo from maximally stable extremal regions, Image and Vision Computing, vol.22, issue.10, pp.761-767, 2004.
DOI : 10.1016/j.imavis.2004.02.006

R. Marée, P. Geurts, and L. Wehenkel, Content-based image retrieval by indexing random subwindows with randomized trees, Proceedings of the 8th Asian conference on Computer vision-Volume Part II, pp.611-620, 2007.

M. Muja and D. G. Lowe, Fast approximate nearest neighbors with automatic algorithm configuration, International Conference on Computer Vision Theory and Applications (VISAPP'09), pp.331-340, 2009.

S. McCann and D. G. Lowe, Local naive Bayes nearest neighbor for image classification, Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on, 2012.

S. Marinai, E. Marino, F. Cesarini, and G. Soda, A general system for the retrieval of document images from digital libraries, pp.150-173, 2004.

G. Moise and J. Sander, Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering, Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pp.533-541, 2008.

G. Moise, J. Sander, and M. Ester, P3C: A Robust Projected Clustering Algorithm, Sixth International Conference on Data Mining (ICDM'06), pp.414-425, 2006.
DOI : 10.1109/ICDM.2006.123

J. Morel and G. Yu, ASIFT: A New Framework for Fully Affine Invariant Image Comparison, SIAM Journal on Imaging Sciences, vol.2, issue.2, pp.438-469, 2009.
DOI : 10.1137/080732730

T. Nguyen, M. Coustaty, and J. M. Ogier, Document Analysis and Recognition (ICDAR), 2011 International Conference on, 2011.

R. Ng and J. Han, Efficient and effective clustering methods for spatial data mining, Proceedings of the International Conference on Very Large Data Bases, pp.144-155, 1994.

Na and P. Jinxiao, Fast and robust skew detection for scanned documents, Proceedings of 2011 International Conference on Electronic & Mechanical Engineering and Information Technology, pp.4170-4173, 2011.
DOI : 10.1109/EMEIT.2011.6023104

E. Nowak, F. Jurie, and B. Triggs, Sampling strategies for bag-of-features image classification, Computer Vision - ECCV 2006, pp.490-503, 2006.

T. Nakai, K. Kise, and M. Iwamura, Use of affine invariants in locally likely arrangement hashing for camera-based document image retrieval. Document Analysis Systems VII, pp.541-552, 2006.

R. Nilsson, J. Peña, J. Björkegren, and J. Tegnér, Consistent feature selection for pattern recognition in polynomial time, The Journal of Machine Learning Research, vol.8, pp.589-612, 2007.

G. Nagy and S. Seth, Hierarchical representation of optically scanned documents, Seventh International Conference on Pattern Recognition, p.347, 1984.

L. O'Gorman and R. Kasturi, Document image analysis, 1995.

A. Psyllos, C. Anagnostopoulos, and E. Kayafas, Vehicle logo recognition using a SIFT-based enhanced matching scheme, IEEE Transactions on Intelligent Transportation Systems, vol.11, issue.2, pp.322-328, 2010.

F. Paradis and J. Y. Nie, Contextual feature selection for text classification. Information processing & management, pp.344-352, 2007.

K. Pollard and M. Van Der Laan, A method to identify significant clusters in gene expression data, Invited Proceedings of Sci2002, pp.318-325, 2002.

M. Rusinol, D. Aldavert, R. Toledo, and J. Lladós, Browsing Heterogeneous Document Collections by a Segmentation-Free Word Spotting Method, 2011 International Conference on Document Analysis and Recognition, pp.63-67, 2011.
DOI : 10.1109/ICDAR.2011.22

J. Ramel, S. Busson, and M. Demonet, AGORA: the Interactive Document Image Analysis Tool of the BVH Project, Second International Conference on Document Image Analysis for Libraries (DIAL'06), pp.145-155, 2006.
DOI : 10.1109/DIAL.2006.2

URL : https://hal.archives-ouvertes.fr/hal-01026266

V. Rabeux, N. Journet, and J. Domenger, Ancient documents bleed-through evaluation and its application for predicting OCR error rates, 2011.

S. Rice, F. Jenkins, and T. A. Nartker, The fifth annual test of OCR accuracy, Information Science Research Institute, 1996.

M. Rusinol and J. Lladós, Logo spotting by a bag-of-words approach for document categorization, 10th International Conference on Document Analysis and Recognition, pp.111-115, 2009.

J. Ramel, S. Leriche, M. Demonet, and S. Busson, User-driven page layout analysis of historical printed books, International Journal of Document Analysis and Recognition (IJDAR), vol.9, issue.2-4, pp.243-261, 2007.
DOI : 10.1007/s10032-007-0040-6

URL : https://hal.archives-ouvertes.fr/hal-00150167

P. Rousseeuw, Silhouettes: a graphical aid to the interpretation and validation of cluster analysis, Journal of Computational and Applied Mathematics, vol.20, pp.53-65, 1987.

C. Rigaud, N. Tsopze, J. C. Burie, and J. M. Ogier, Extraction robuste des cases et du texte de bandes dessinées, pp.349-360, 2012.

S. Ramdane, B. Taconet, A. Zahour, and S. Kebairi, Apprentissage et reconnaissance automatique de types de formulaires par une méthode statistique, 17ème Colloque sur le traitement du signal et des images, 1999.

C. Silpa-Anan and R. Hartley, Optimised KD-trees for fast image descriptor matching, Computer Vision and Pattern Recognition, pp.1-8, 2008.

T. I. Simpson, J. D. Armstrong, and A. P. Jarman, Merged consensus clustering to assess and improve class discovery with microarray data, BMC Bioinformatics, vol.11, issue.1, p.590, 2010.

E. Saund, Scientific Challenges Underlying Production Document Processing, Proceedings of Document Recognition and Retrieval XVIII, p.787402, 2011.

C. Shin, D. Doermann, and A. Rosenfeld, Classification of document pages using structure-based features, International Journal on Document Analysis and Recognition, vol.3, issue.4, pp.232-247, 2001.
DOI : 10.1007/PL00013566

F. Sebastiani, Machine learning in automated text categorization, ACM Computing Surveys (CSUR), vol.34, issue.1, pp.1-47, 2002.

Z. Shi and V. Govindaraju, Line separation for complex document images using fuzzy runlength, Document Image Analysis for Libraries, Proceedings. First International Workshop on, pp.306-312, 2004.

M. Suzuki and S. Hirasawa, Text categorization based on the ratio of word frequency in each categories, 2007 IEEE International Conference on Systems, Man and Cybernetics, pp.3535-3540, 2007.
DOI : 10.1109/ICSMC.2007.4414216

T. Steinherz, N. Intrator, and E. Rivlin, Skew detection via principal components analysis, Document Analysis and Recognition (ICDAR), 1999 International Conference on, pp.153-156, 1999.

M. Steinbach, G. Karypis, and V. Kumar, A comparison of document clustering techniques, KDD, International Conference on Knowledge Discovery in Data, pp.525-526, 2000.

R. Smith, Hybrid Page Layout Analysis via Tab-Stop Detection, pp.241-245, 2009.

F. Sur, N. Noury, and M. Berger, Image point correspondences and repeated patterns, Research Report, vol.7693, 2011.
URL : https://hal.archives-ouvertes.fr/inria-00609998

W. Song, S. Uchida, and M. Liwicki, Look Inside the World of Parts of Handwritten Characters, 2011 International Conference on Document Analysis and Recognition, pp.784-788, 2011.
DOI : 10.1109/ICDAR.2011.161

S. Thomas, C. Chatelain, T. Paquet, and L. Heutte, Combinaison architecture profonde/HMM pour l'extraction de séquences dans des documents manuscrits, pp.7-22, 2012.

K. Takeda, K. Kise, and M. Iwamura, Real-time document image retrieval for a 10 million pages database with a memory efficient and stability improved LLAH, pp.1054-1058, 2011.

C. L. Tan, S. Sung, Z. Yu, and Y. Xu, Text retrieval from document images based on n-gram algorithm, Text and Web Mining Workshop, 6th Pacific Rim International Conference on Artificial Intelligence, 2000.

C. M. Tan, Y. Wang, and C. D. Lee, The use of bigrams to enhance text categorization, Information Processing & Management, pp.529-546, 2002.

H. Uchiyama and H. Saito, Augmenting text document by on-line learning of local arrangement of keypoints, Mixed and Augmented Reality, 8th IEEE International Symposium on, 2009.

J. van Beusekom, D. Keysers, F. Shafait, and T. M. Breuel, Distance Measures for Layout-Based Document Image Retrieval, Second International Conference on Document Image Analysis for Libraries (DIAL'06), p.11, 2006.
DOI : 10.1109/DIAL.2006.16

E. Valle and M. Cord, Advanced Techniques in CBIR: Local Descriptors, Visual Dictionaries and Bags of Features, 2009 Tutorials of the XXII Brazilian Symposium on Computer Graphics and Image Processing, pp.72-78, 2009.
DOI : 10.1109/SIBGRAPI-Tutorials.2009.14

P. Viola and M. Jones, Rapid object detection using a boosted cascade of simple features, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001, p.511, 2001.
DOI : 10.1109/CVPR.2001.990517

K. Y. Wong, R. G. Casey, and F. M. Wahl, Document Analysis System, IBM Journal of Research and Development, vol.26, issue.6, pp.647-656, 1982.
DOI : 10.1147/rd.266.0647

K. Weinberger, A. Dasgupta, J. Langford, A. Smola, and J. Attenberg, Feature hashing for large scale multitask learning, Proceedings of the 26th Annual International Conference on Machine Learning, pp.1113-1120, 2009.

K. Woo, J. Lee, M. H. Kim, and Y. J. Lee, FINDIT: a fast and intelligent subspace clustering algorithm using dimension voting, Information and Software Technology, vol.46, issue.4, pp.255-271, 2004.
DOI : 10.1016/j.infsof.2003.07.003

L. Wolf, T. Poggio, and P. Sinha, Human document classification using bags of words, Computer Science and Artificial Intelligence Laboratory, 2006.

D. Wang and S. Srihari, Classification of newspaper image blocks using texture analysis, Computer Vision, Graphics, and Image Processing, pp.327-352, 1989.
DOI : 10.1016/0734-189X(89)90116-3

J. Wang, J. Yang, K. Yu, F. Lv, T. Huang, and Y. Gong, Locality-constrained linear coding for image classification, Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp.3360-3367, 2010.

K. Yip, D. Cheung, and M. K. Ng, HARP: a practical projected clustering algorithm, IEEE Transactions on Knowledge and Data Engineering, vol.16, issue.11, pp.1387-1397, 2004.
DOI : 10.1109/TKDE.2004.74

J. Yang, Y. G. Jiang, A. G. Hauptmann, and C. W. Ngo, Evaluating bag-of-visual-words representations in scene classification, Proceedings of the international workshop on Workshop on multimedia information retrieval, pp.197-206, 2007.

I. Yalniz and R. Manmatha, A Fast Alignment Scheme for Automatic OCR Evaluation of Books, 2011 International Conference on Document Analysis and Recognition, pp.754-758, 2011.
DOI : 10.1109/ICDAR.2011.157

G. Yang, C. Stewart, M. Sofka, and C. L. Tsai, Alignment of challenging image pairs: Refinement and region growing starting from a single keypoint correspondence, IEEE Trans. Pattern Anal. Machine Intell, vol.29, issue.11, pp.1973-1989, 2007.

J. Yang, K. Yu, Y. Gong, and T. Huang, Linear spatial pyramid matching using sparse coding for image classification, Computer Vision and Pattern Recognition, CVPR 2009. IEEE Conference on, pp.1794-1801, 2009.

X. Zhou and T. Huang, Relevance feedback in image retrieval: A comprehensive review, Multimedia Systems, pp.536-544, 2003.

B. Zhang, M. Hsu, and U. Dayal, K-harmonic means - a data clustering algorithm, 1999.

Q. Zhao, V. Hautamaki, and P. Franti, Knee Point Detection in BIC for Detecting the Number of Clusters, Advanced Concepts for Intelligent Vision Systems, pp.664-673, 2008.
DOI : 10.1007/s100440070007

Zhang and W. Lenan, A fast document image denoising method based on packed binary format and source word accumulation, Journal of Convergence Information Technology, vol.6, issue.2, pp.131-137, 2011.