Leverage keywords of early Chinese periodicals’ images: towards a knowledge organization system, contextualization and visualization

Introduction
Big data and text analysis have been used increasingly in recent years, but the images embedded in texts have not yet been fully analyzed. Manually assigned subject terms or keywords therefore remain one of the most important means of enabling users to access and locate images. This study explores the potential of image keywords for digital humanities by constructing a knowledge organization system and a visual contextualization of the keywords.

The study selected 827 images from “The Crystal” (a Chinese magazine published in Shanghai between 1919 and 1940), available from Early Chinese Periodicals Online (ECPO), and analyzed the 1,958 keywords in their metadata records. The keywords, in both Chinese and English, were assigned by researchers of the ECPO project, a collaboration between the University of Heidelberg and Academia Sinica.

Method
The first step was to map these bilingual keywords to the Getty Research Institute’s Art & Architecture Thesaurus (AAT), producing a micro knowledge organization system, ECPO-KOS, with six facets and seventeen hierarchies. The second step was to visualize the networks of the images’ keywords and attributes with the visualization platform Gephi. Finally, the study interviewed three researchers who are experts on women in modern China, to explore whether the micro knowledge organization system and the network visualization tool could provide any help or insight.
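The two method steps can be pictured as a small faceted lookup table feeding a co-occurrence edge list. The sketch below is a minimal illustration under stated assumptions, not the ECPO-KOS itself: apart from female and male, which the study places in the Associated Concepts Facet, the keyword-to-facet and keyword-to-hierarchy assignments are hypothetical, no AAT identifiers are reproduced, and the names `KOSEntry`, `ECPO_KOS_SAMPLE`, `lookup`, `canonical`, `group_by_facet` and `to_gephi_edges` are invented for illustration. Two keywords are assumed to be linked when they are assigned to the same image, and the edge table uses the `Source`, `Target` and `Weight` column headers that Gephi's spreadsheet import recognizes.

```python
import csv
import io
from collections import Counter, defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class KOSEntry:
    """One keyword in a hypothetical ECPO-KOS-style micro thesaurus."""
    en: str         # English keyword as assigned in the metadata
    zh: str         # Chinese keyword as assigned in the metadata
    facet: str      # AAT facet the keyword is mapped to
    hierarchy: str  # AAT hierarchy within that facet


# Hypothetical sample: only the facet of female/male comes from the study.
ECPO_KOS_SAMPLE = {
    e.en: e
    for e in [
        KOSEntry("female", "女性", "Associated Concepts Facet", "Associated Concepts"),
        KOSEntry("male", "男性", "Associated Concepts Facet", "Associated Concepts"),
        KOSEntry("music", "音樂", "Activities Facet", "Disciplines"),
        KOSEntry("movie", "電影", "Objects Facet", "Visual and Verbal Communication"),
        KOSEntry("war", "戰爭", "Activities Facet", "Events"),
    ]
}


def lookup(keyword):
    """Resolve an English or Chinese keyword to its KOS entry, or None."""
    if keyword in ECPO_KOS_SAMPLE:
        return ECPO_KOS_SAMPLE[keyword]
    for entry in ECPO_KOS_SAMPLE.values():
        if entry.zh == keyword:
            return entry
    return None


def canonical(keyword):
    """Use the English form of mapped keywords so that 音樂 and music merge."""
    entry = lookup(keyword)
    return entry.en if entry else keyword


def group_by_facet(keywords):
    """Group one image's keywords by AAT facet; unmapped ones are kept apart."""
    groups = defaultdict(list)
    for kw in keywords:
        entry = lookup(kw)
        groups[entry.facet if entry else "(unmapped)"].append(canonical(kw))
    return dict(groups)


def to_gephi_edges(images):
    """Count keyword pairs that co-occur on the same image and return a CSV
    edge table (Source, Target, Weight) for Gephi's spreadsheet import."""
    counts = Counter()
    for keywords in images:
        for a, b in combinations(sorted({canonical(k) for k in keywords}), 2):
            counts[(a, b)] += 1
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(counts.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


if __name__ == "__main__":
    print(group_by_facet(["female", "音樂", "gossip"]))
    print(to_gephi_edges([["female", "music", "movie"], ["female", "音樂"]]), end="")
```

Normalizing both languages to one canonical label before counting is a design choice of this sketch: it keeps a Chinese keyword and its English counterpart from appearing as two separate nodes in the network.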

Preliminary Results
1. The keywords can be mapped to six facets of the AAT, all except the Styles and Periods Facet (Fig. 1). The AAT’s conceptual structure could help organize and interpret these keywords, for instance by supplying relationships between concepts and by arranging the originally flat list of keywords into hierarchies.
2. The study used modularity to detect classes (communities) in the keyword network and to examine their possible themes (Fig. 2). There are 32 classes; for instance, one class includes music, stars, gossip, movie, and art (green in Fig. 2) and might be labeled as amusement or popular culture. In addition, we used betweenness centrality to measure how often a keyword lies on the shortest paths between other keywords in the network. Larger nodes indicate keywords that might act as brokers across more themes.
3. We demonstrated how keywords in a specific facet relate to those in other facets (Fig. 3). For instance, among the keywords of the Associated Concepts Facet, female relates to topics of amusement, whereas male relates to more diverse topics, including amusement, war, the anti-Japan movement, and the military (Fig. 4).
4. The interviews showed that the scholars recognized the potential of the micro knowledge organization system and the visualizations to enhance their studies, especially if these became part of their personal research tools.
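The two network measures in the results can be computed directly on a keyword co-occurrence graph. The sketch below is illustrative only and makes several assumptions: the graph is treated as undirected and unweighted, the toy edges are invented, and the function names `build_graph`, `modularity` and `betweenness` are hypothetical. Gephi's own Modularity statistic finds the classes with the Louvain method; here the partition is supplied by hand and only scored with Newman's modularity, Q = Σ_c [L_c / m − (d_c / 2m)²], where m is the number of edges, L_c the number of edges inside class c, and d_c the total degree of its nodes. Betweenness centrality is computed with Brandes' algorithm.

```python
from collections import defaultdict, deque


def build_graph(edges):
    """Undirected, unweighted adjacency sets from (u, v) keyword pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def modularity(adj, community):
    """Newman modularity Q of a partition (community maps node -> class id):
    Q = sum over classes c of L_c / m - (d_c / 2m) ** 2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2
    inside = defaultdict(float)  # L_c: edges with both ends in class c
    degree = defaultdict(float)  # d_c: summed degree of the nodes in class c
    for u, nbrs in adj.items():
        degree[community[u]] += len(nbrs)
        for v in nbrs:
            if community[u] == community[v]:
                inside[community[u]] += 0.5  # each internal edge is seen twice
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def betweenness(adj):
    """Brandes' algorithm: for each node, sum over node pairs (s, t) of the
    share of shortest s-t paths passing through it. Because every unordered
    pair is counted from both ends, the totals are halved."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack, pred = [], defaultdict(list)
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in pred[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}


if __name__ == "__main__":
    # Toy graph: two keyword triangles joined by a single stars-war edge.
    edges = [("music", "movie"), ("movie", "stars"), ("stars", "music"),
             ("war", "military"), ("military", "anti-Japan movement"),
             ("anti-Japan movement", "war"), ("stars", "war")]
    adj = build_graph(edges)
    part = {"music": 0, "movie": 0, "stars": 0,
            "war": 1, "military": 1, "anti-Japan movement": 1}
    print(modularity(adj, part))
    print(betweenness(adj))
```

In this toy graph the two bridging keywords, stars and war, are the only nodes with non-zero betweenness, which mirrors the reading of larger nodes as brokers between themes; weighted co-occurrence counts would require a weighted variant of both measures.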

Main Contributions of this Study
1. Providing a methodology and an implementation in which an AAT-based knowledge organization system, grounded in facet analysis theory, allocates the images’ keywords to different facets, and indicating its potential for Linked Open Data applications;
2. Presenting different types of networks built with social network analysis tools, which could provide clues for contextualization and inspire researchers to pose further enquiries.

