Special Sessions

Aims and Scope

With the rapid development of video monitoring systems around cities, the demand for intelligent surveillance applications for pedestrians is growing quickly. Such applications include pedestrian detection and tracking, intelligent pedestrian analysis, pose recognition, and event detection. Person re-identification (ReID) aims to identify the same person across multiple cameras and plays an important role in various surveillance applications, such as pedestrian retrieval and public-security event detection. Pedestrian detection, tracking, and ReID in real-world applications are challenging due to varying body poses, camera views, illumination, and cluttered backgrounds.
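
As a hedged illustration of the ReID setting described above (not part of the call itself): at inference time, ReID is commonly cast as nearest-neighbor search, where a query pedestrian image is embedded into a feature vector and compared against a gallery of embeddings. The toy three-dimensional "embeddings" and identity names below are invented placeholders for the output of a real feature extractor.

```python
import math

def cosine_sim(a, b):
    # Cosine similarity between two equal-length feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def reid_rank(query, gallery):
    # Rank gallery identities from most to least similar to the query.
    scores = {pid: cosine_sim(query, feat) for pid, feat in gallery.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy hand-made "embeddings" standing in for learned pedestrian features.
gallery = {
    "person_A": [1.0, 0.1, 0.0],
    "person_B": [0.0, 1.0, 0.2],
    "person_C": [0.1, 0.0, 1.0],
}
query = [0.9, 0.2, 0.1]
print(reid_rank(query, gallery))  # person_A ranks first
```

In practice the gallery can contain millions of vectors, which is why the scope above stresses both performance and speed of the representation.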

Recently, with significant demand for efficient pedestrian representation in surveillance scenarios, the analysis of pedestrian visual content has become increasingly urgent. Low-level pedestrian representation for large-scale gallery ReID is particularly challenging because it involves both accuracy and speed. Progress is constrained mainly by the semantic gap between visual content and human understanding. Advances in machine learning and artificial intelligence have made large-scale pedestrian intelligence applications possible, attracting considerable interest from both academic and industrial research communities.

This special session seeks original contributions reporting the most recent progress in different research directions and methodologies for pedestrian visual content analysis and its wide range of intelligent applications. It targets a mixed audience of researchers and product developers from several communities, e.g., multimedia, machine learning, and computer vision. The topics of interest include, but are not limited to:

Different directions of large-scale pedestrian intelligence analysis:

  • Pedestrian detection
  • Pedestrian tracking
  • Person Re-identification
  • Pedestrian attribute recognition
  • Public security event detection
  • Person pose recognition
  • Person action detection
  • Person detection and body part proposal
  • Affective computing for pedestrian analysis

Machine learning methodologies for large-scale pedestrian intelligence analysis:

  • Weakly-supervised/unsupervised learning
  • Few-/one-/zero-shot learning
  • Deep learning and reinforcement learning
  • Metric learning
  • Multi-modal/multi-task learning
  • Object detection and tracking algorithms
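
As an illustrative sketch of the metric-learning direction listed above (using the standard triplet-loss formulation with hand-made toy embeddings), the objective encourages an anchor sample to lie closer to a positive sample of the same identity than to a negative sample of a different identity, by at least a margin:

```python
import math

def euclidean(a, b):
    # Euclidean distance between two equal-length feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    # Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

anchor   = [1.0, 0.0]
positive = [0.9, 0.1]   # same identity, nearby in the embedding space
negative = [0.0, 1.0]   # different identity, far away
print(triplet_loss(anchor, positive, negative))  # 0.0: the triplet is already satisfied
```

A training loop would backpropagate this loss through the embedding network; the sketch only evaluates the objective on fixed vectors.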

Important Dates

  • Regular Paper submission deadline: December 3, 2018
  • Regular Paper acceptance notification: March 11, 2019
  • Camera-Ready Regular Paper submission deadline: April 8, 2019

Paper Submission

Authors should prepare their manuscripts according to the Guide for Authors of ICME available at http://www.icme2019.org/author_info and submit them at the submission page http://www.icme2019.org/paper. All papers will be peer-reviewed following the ICME reviewing procedures. Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere.

Paper Review Process

All papers submitted to this special session will be reviewed by at least three invited reviewers who are competent and experienced in the subject areas of the papers. Referees are formal reviewers whose comments and opinions form the basis on which the special session chairs decide whether or not to accept a paper. The chairs' decisions are always based on all of the reviews received, though mixed reviews call for the exercise of the chairs' judgment. The review process ensures that all authors have equal opportunity for publication of their papers.

Guest Editorial Board

  • Prof. Guiguang Ding, Tsinghua University, China. E-mail: dinggg [AT] tsinghua [DOT] edu [DOT] cn
  • Dr. Sicheng Zhao, University of California, Berkeley, USA. E-mail: schzhao [AT] gmail [DOT] com
  • Prof. Jungong Han, Lancaster University, UK. E-mail: jungong.han [AT] lancaster [DOT] ac [DOT] uk

Session abstract

This special session is proposed in the context of the fast growth of retail stores. Today's retail store is undergoing a dramatic transformation: a sharpened focus on customers and their shopping experience is converting traditional stores into leaders of technological innovation. Multimedia technologies such as unmanned convenience stores and face-recognition payment have developed rapidly in the retail industry, while empowering retail experiences in turn promote the development of new technologies. New technologies will continue to be an important driving force for new retail change. In this special session we therefore aim to bring together the latest advances in this field to draw attention from both the academic and industrial communities.


In the empowering retail industry, a future trend is to use technological means to enhance the consumer experience. As consumers' purchasing power has grown, they expect a better shopping experience, which pushes the retail industry to upgrade its offerings to satisfy customer needs and maximize profits. To this end, multimedia technologies offer opportunities to enhance the shopping experience for a wide range of consumers. For example, face recognition enables fast payment without physically presenting a credit card, and object recognition lets the shop know which items are bought. Building on these technologies, retailers can also streamline internal operations, create new revenue streams, reduce costs, analyze consumption habits, and improve staff productivity.

Meeting retail challenges with multimedia technology solutions

Multimedia techniques are central to retailers' ability to understand and predict customer behavior, and "scan and go" programs are changing the way consumers pay. In general, technologies meet these challenges in three ways. First, an autonomous perception and learning system for biometric characteristics solves the problem of identifying consumers in open spaces; it involves face detection and recognition methods. Second, the object recognition and trading system is the core of this technology stack, with high requirements for safety, accuracy, and speed. Third is target detection and tracking: when continuously tracking consumers, posture recognition, which mainly relies on multi-channel cameras, may be more feasible than face recognition. In addition, NLP technology can help give consumers a good shopping experience, and the large-scale data accumulated from user purchase histories allows retailers to understand customer behavior and recommend appropriate items. The solution is a complete and continuously optimized system that combines computer vision, natural language processing, and speech recognition. We therefore believe this special session will facilitate a closer integration of multimedia technologies with industrial applications.
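
As a hedged end-to-end sketch of the "scan and go" trading step above (the recognizer output, the price table, and the item names are all hypothetical), once an object-recognition model has emitted labels for the items a verified customer carries out, billing reduces to mapping those labels to prices:

```python
# Hypothetical price table; in practice this would come from the store database.
PRICES = {"soda": 1.50, "chips": 2.25, "sandwich": 4.00}

def checkout(recognized_items):
    # recognized_items: labels emitted by an (assumed) object-recognition model.
    unknown = [item for item in recognized_items if item not in PRICES]
    if unknown:
        # Items the recognizer cannot price are flagged for manual review,
        # reflecting the safety and accuracy requirements noted above.
        raise ValueError(f"needs review: {unknown}")
    return round(sum(PRICES[item] for item in recognized_items), 2)

print(checkout(["soda", "chips", "soda"]))  # 5.25
```

The hard part of the real system is, of course, the recognition itself; this sketch only shows why its accuracy directly bounds the reliability of the trading step.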

Paper Submission

Authors should prepare their manuscripts according to the Guide for Authors of ICME available at http://www.icme2019.org/author_info and submit them at the submission page http://www.icme2019.org/paper. All papers will be peer-reviewed following the ICME reviewing procedures. Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere.


Session Organizers

  • Wu Liu, Senior Researcher, JD AI Research, liuwu1 [AT] jd [DOT] com
  • Liang Zheng, Lecturer, Australian National University, liangzheng06 [AT] gmail [DOT] com
  • Lexing Xie, Associate Professor, Australian National University, lexing.xie [AT] anu [DOT] edu [DOT] au
  • Yi Yang, Professor, University of Technology Sydney, yi.yang [AT] uts [DOT] edu [DOT] au

Topic Summary

Multi-modal information is everywhere on the Internet, and these modalities often work together toward a common goal, as in cross-modal multimedia analysis and recommendation. The fast development of Internet technologies makes it easy for users to access and consume information at any time and place. With the flourishing of various online and mobile services, users create a great deal of content and leave many traces on the Web. For example, in e-commerce sites such as Amazon, we know which products a user purchases; in news portals such as Toutiao, we know which events or topics interest a user; in image-sharing sites such as Pinterest, we know which images a user likes; and in video-sharing sites such as YouTube, we know which videos attract a user. The prevalence of such user behaviors on multi-modal content, including but not limited to text, images, videos, and products, makes it possible to understand users comprehensively, which benefits many downstream applications such as search engines, recommendation, question answering, and dialog systems.
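
A minimal sketch of how such behavior traces can feed recommendation (the two-dimensional embeddings and item names are invented stand-ins for fused text/image features): average the embeddings of the items a user interacted with into a profile vector, then rank unseen items by similarity to that profile:

```python
import math

def cosine(a, b):
    # Cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def build_profile(interacted, item_vecs):
    # Mean of interacted-item embeddings as a simple user profile.
    dims = len(next(iter(item_vecs.values())))
    profile = [0.0] * dims
    for item in interacted:
        for i, v in enumerate(item_vecs[item]):
            profile[i] += v
    return [v / len(interacted) for v in profile]

def recommend(interacted, item_vecs, k=1):
    # Rank items the user has not seen by similarity to the profile.
    profile = build_profile(interacted, item_vecs)
    candidates = [i for i in item_vecs if i not in interacted]
    candidates.sort(key=lambda i: cosine(profile, item_vecs[i]), reverse=True)
    return candidates[:k]

# Invented 2-D embeddings standing in for fused multi-modal features.
items = {
    "news_video": [1.0, 0.1],
    "cooking_clip": [0.1, 1.0],
    "politics_article": [0.9, 0.2],
}
print(recommend(["news_video"], items))  # ['politics_article']
```

Real systems replace the mean with learned user encoders and the hand-made vectors with embeddings from multi-modal models, but the retrieval structure is the same.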

Indeed, user profiling research has recently emerged to meet this need by uncovering user profiles, including demographics, preferences, personality, and even health-related status. However, existing work has largely focused on understanding users' behaviors on data of a single modality, mostly text, leaving the much richer yet more complicated multimedia content, such as images and (micro-)videos, largely untapped.

This special session seeks the latest developments in multi-modal user profiling and multimedia recommendation. It serves as a forum that brings together active researchers from all over the world to share their recent advances in this exciting area. We solicit original contributions in three primary categories: (1) state-of-the-art theories and novel applications related to cross-modal user profiling and multimedia recommendation; (2) surveys of recent progress in this area; and (3) benchmark datasets.

The topics of interest for this special session include, but are not limited to:

  • Multimedia recommendation theory and application
  • Multi-modal modeling and recommendation
  • Knowledge graphs for recommendation
  • Multi-modal dataset collection methods for multimedia recommendation
  • User profiling on demographics with multi-modal data
  • User profiling on personalities with multi-modal data
  • User profiling on interests with multi-modal data
  • User profiling on self-defined tags with multi-modal data
  • Embedding-based user profiling with multi-modal data

Paper Submission

Submitted papers should present original, unpublished work relevant to one of the topics of the special session. All submitted papers will be evaluated by independent reviewers on the basis of relevance, significance of contribution, technical quality, scholarship, and quality of presentation. Authors should prepare their manuscripts according to the submission guidelines (http://www.icme2019.org/author_info). Manuscripts that are irrelevant to the topics will not be considered.

Special Session Organizers

Feng Xue, Hefei University of Technology, Hefei, China (feng.xue [AT] hfut [DOT] edu [DOT] cn)

Richang Hong, Hefei University of Technology, Hefei, China (hongrc.hfut [AT] gmail [DOT] com)

Hanwang Zhang, Nanyang Technological University, Singapore (hanwangzhang [AT] gmail [DOT] com)

Special Session Chairs

Junwei Han
Northwestern Polytechnical University, China
Enrico Magli
Politecnico di Torino, Italy