
*Appendix 1 - Data Mining and Analysis

On Weibo, we first searched four keywords under the topic item of Weibo search: ‘Single’, ‘Off Single’, ‘Marriage’ and ‘Blind Date’. This returned 2,443 related topics in total. We then searched ‘Single’ under the synthesis item and obtained the 731 hottest recent posts (from April 2018); for each post, we collected the top 10 comments. (Data collected with "Bazhuayu")

 

On Jiayuan.com, we filtered by two criteria: gender and location (Beijing, Shanghai and Guangdong). For each area, we first retrieved the top 2,000 male profiles and then the female profiles, ranked by attractiveness score. We collected the 1.2 million URLs of their personal pages. Then, for each URL, we extracted the personal information, habits and partner requirements (over 20 attributes in total) one attribute at a time with XPath. (Data collected with Knime)
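
To illustrate the XPath step, here is a minimal sketch in Python. The original extraction was done in Knime, so this is not the workflow we ran. The page markup, the class names and the field names below are all hypothetical, because the real structure of a Jiayuan.com profile page is not reproduced in this appendix.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified profile page. The real markup is an assumption.
SAMPLE_PAGE = """
<html><body>
  <div class="profile">
    <span class="age">28</span>
    <span class="city">Beijing</span>
    <span class="education">Bachelor</span>
  </div>
</body></html>
"""

# One XPath expression per attribute (illustrative names only).
FIELD_XPATHS = {
    "age": ".//span[@class='age']",
    "city": ".//span[@class='city']",
    "education": ".//span[@class='education']",
    "income": ".//span[@class='income']",  # absent from the sample on purpose
}


def extract_profile(page_source, field_xpaths):
    """Apply each XPath in turn and return one record per page."""
    root = ET.fromstring(page_source)
    record = {}
    for field, xpath in field_xpaths.items():
        node = root.find(xpath)
        # A missing attribute is stored as None instead of raising an error.
        has_text = node is not None and node.text
        record[field] = node.text.strip() if has_text else None
    return record


profile = extract_profile(SAMPLE_PAGE, FIELD_XPATHS)
print(profile)
```

Storing a missing attribute as `None` keeps every record the same shape, which makes it easier to combine the pages into one table afterwards. The standard-library parser used here only accepts well-formed markup and a subset of XPath, so real pages would likely need a more tolerant HTML parser.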

*Appendix 2 - Logic of the project

This project aims to describe the current state of China’s dating and marriage market: who is looking for blind dates on online social and match-making sites, and how likely they are to get “off single” this way.

 

First, in the Background part, our research starts from the hot topics related to “dating and marriage” in order to introduce the subject to our audience. We found that “off single”, “marriage” and “blind date” are topics that people cannot escape, which is why the analysis that follows is relevant.

 

Second, in the Data part, we use big data analysis to draw a macro-level “group portrait” of the people who use matchmaking services on online social and match-making sites. We focus on three areas: “Basic info”, “Finance & Work” and “Self-intro”.

 

Then, in the Story section, we make the data more tangible through vivid story narratives. Mary and Jack are typical figures from Beijing and Shanghai respectively: every item of their self-description and of their requirements for an ideal partner is the mode (the most frequent value) of the corresponding group on Jiayuan.com. Finally, we report the matching percentage to show the audience how likely such a match is in reality.
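
The mode-based profile can be sketched as follows. This is a minimal Python illustration rather than the exact procedure behind Mary and Jack. The records and field names are made up, and the definition of the matching percentage used here (the share of candidates who meet every stated requirement) is an assumption, since this appendix does not spell out the formula.

```python
from collections import Counter

# Made-up records for one group. The real data has over 20 attributes.
GROUP = [
    {"age": "26-30", "education": "Bachelor", "income": "5k-10k"},
    {"age": "26-30", "education": "Master", "income": "10k-20k"},
    {"age": "31-35", "education": "Bachelor", "income": "10k-20k"},
    {"age": "26-30", "education": "Bachelor", "income": "10k-20k"},
]


def modal_profile(records):
    """Build a 'typical' person from the most frequent value of each field."""
    fields = records[0].keys()
    return {
        field: Counter(r[field] for r in records).most_common(1)[0][0]
        for field in fields
    }


def match_percentage(requirements, candidates):
    """Assumed definition: percentage of candidates meeting all requirements."""
    matches = sum(
        all(c.get(key) == value for key, value in requirements.items())
        for c in candidates
    )
    return 100.0 * matches / len(candidates)


typical = modal_profile(GROUP)
share = match_percentage({"age": "26-30", "education": "Bachelor"}, GROUP)
print(typical)
print(share)
```

Because the mode is taken separately for each attribute, the resulting profile may not correspond to any single real user; it is a composite of the most common answers.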
