Gender specific preference in online dating epj data science full text



Gender-specific preference in online dating

In this paper, to reveal the differences of gender-specific preference and the factors affecting potential mate choice in online dating, we analyze the users’ behavioral data of a large online dating site in China. We find that for women, network measures of popularity and activity of the men they contact are significantly positively associated with their messaging behaviors, while for men only the network measures of popularity of the women they contact are significantly positively associated with their messaging behaviors. Secondly, when women send messages to men, they pay attention to not only whether men’s attributes meet their own requirements for mate choice, but also whether their own attributes meet men’s requirements, while when men send messages to women, they only pay attention to whether women’s attributes meet their own requirements. Thirdly, compared with men, women attach great importance to the socio-economic status of potential partners and their own socio-economic status will affect their enthusiasm for interaction with potential mates. Further, we use the ensemble learning classification methods to rank the importance of factors predicting messaging behaviors, and find that the centrality indices of users are the most important factors. Finally, by correlation analysis we find that men and women show different strategic behaviors when sending messages. Compared with men, for women sending messages, there is a stronger positive correlation between the centrality indices of women and men, and more women tend to send messages to people more popular than themselves. These results have implications for understanding gender-specific preference in online dating further and designing better recommendation engines for potential dates. The research also suggests new avenues for data-driven research on stable matching and strategic behavior combined with game theory.

1 Introduction

As a special type of social networking site [1,2,3], online dating sites have emerged as popular platforms for single people to seek potential romance. According to a recent survey, nearly 40 million single people (out of 54 million) in the U.S. have tried online dating, and about 20% of committed relationships began online [4]. Although some psychologists have questioned the reliability and effectiveness of online dating [5], recent empirical studies using tracking data and survival analysis found that for heterosexual couples, meeting partners through online dating sites can speed up marriage [6]. Besides, one survey found that marriages initiated through online channels are slightly less likely to break up than those initiated through traditional offline channels, and are associated with a slightly higher level of marital satisfaction among the respondents [7].

Mate choice and marital decisions, because of their importance to the formation and evolution of society, have drawn wide attention from scholars in different fields. Two hypotheses, potentials-attract and likes-attract, have been proposed to explain the preference and choice of long-term mates [8]. Potentials-attract means that people choose mates matched with their sex-specific traits indicating reproductive potential: men pay more attention than women to the youthfulness, health, and physical attractiveness of partners, which are the characteristics of fertile mates, while women pay more attention than men to the ambition, social status, financial wealth, and commitment of partners, which are the characteristics of good providers. In other words, men tend to seek young and physically attractive women, while women pay more attention to men’s socio-economic status [9, 10], which is consistent with the Chinese saying “lang cai nv mao” for the choice of long-term partners [8]. In fact, an analysis of gender differences in online identity reconstruction in an online social network revealed that men value personal achievements more while women value physical attractiveness more [11]. Likes-attract means that people choose mates who are similar to themselves in a variety of attributes, which is consistent with the Chinese saying “men dang hu dui”. From the perspective of evolutionary and social psychology [12], the difference in parental investment strategies determines the different mate selection strategies of the two sexes [13]. Empirical studies on offline dating showed that mate choice is very much in line with the evolutionary predictions of parental investment theory, on which the potentials-attract hypothesis is founded [14, 15], while one study of a Chinese online dating site showed that mate choice is more consistent with the likes-attract hypothesis [8].

From a sociological perspective, compared with the offline environment, online dating largely expands the search scope for potential mates [16, 17]. The Internet allows users to form relationships with strangers whom they did not know before, whether through online or offline channels. For individuals who find it difficult to meet potential partners through offline channels, such as homosexuals and middle-aged and elderly heterosexuals, the Internet provides an ideal platform for meeting partners. People's preferences in mate selection have been extensively studied [18,19,20,21], including preferences regarding education level [22], age [23] and race [24, 25]. The matching pattern, or the choice of potential mates, shows homophily [26, 27], that is, people prefer to choose mates who are similar to themselves. Three possible reasons lead to homophily. First, similar people are more likely to have the same hobbies and visit the same places, so it is easier for them to meet each other [17]. Second, relationships formed through the introduction of friends and relatives also exhibit homophily [28]. Finally, the similarity between partners can also be explained by individual preferences or cost/benefit calculation. By analyzing OkCupid data [21], Lewis found that although there is a similarity preference in partner selection, the preference is not always symmetrical for men and women. On some online dating platforms, users can browse the profiles of other users anonymously, without leaving any trace of the visit. A recent study on a major North American online dating site found that anonymous users viewed more profiles than nonanonymous ones; however, nonanonymity can achieve better matching results [29].

Economists usually study mate choice and the marriage problem from the perspective of game theory and strategic behavior [30,31,32,33,34,35]. Considering the difference in mate choice between the sexes in the marriage market, Becker regarded the marriage matching problem as a frictionless matching process, and by constructing a matching model he showed that mate choice is not random, but a careful personal choice of attributes [30, 31]; this was later extended to a bargaining matching model by Pollak et al. [32]. The marriage market is the first stage of a multi-stage game and corresponds with the Pareto efficiency of equilibrium. In the Internet age, Lee and Niederle launched a two-stage experiment in an online dating market using rose-for-proposal signals [36], and found that sending a preference signal can increase the acceptance rate. Some other scholars also studied mate preference from the economic perspective [37, 38]. For example, Fisman et al. found that male selectivity is invariant to the size of the female group, while female selectivity is strongly increasing in the size of the male group [37].

Computer scientists usually study online dating from the perspective of user behaviors [39,40,41] and recommendation systems [4, 42,43,44]. By analyzing online dating data, Xia et al. found that there are distinct differences between the preferences of men and women [41], and also between users’ stated and actual preferences. Xia et al. also proposed a reciprocal recommendation system for online dating based on similarity measures [4]. In general social networks, gender differences lead to obvious differences in behaviors and preferences between men and women. Research on an online-game society showed that females perform better economically and are less risk-taking than males, and that they also differ significantly from males in managing their social networks [45]. Another study found sex-related differences in communication patterns in a large dataset of mobile phone records and showed the existence of temporal homophily [46].

Although the research on mate choice, both offline and online, has been extended to many fields, the following problems still exist: (i) online dating sites are a special kind of social networking site, but most previous studies focus only on the users’ demographic attributes and have not considered users’ network centrality on dating sites, which can be a potentially important factor associated with users’ mate selection; (ii) most studies focus on male and female preferences in mate choice, but they do not properly examine the compatibility of the two parties’ preferences; (iii) with the advent of the big data era, machine learning methods, such as ensemble learning, have been widely applied in diverse fields to achieve good prediction performance, yet most of the existing literature still uses only econometric methods to study users’ mate choice.

To address these research gaps, in this paper, using empirical data from a large online dating site in China, we explore the users’ attribute preferences compared with random selection, and use logistic regression to study how the users’ demographic attributes, popularity, activity and compatibility scores are associated with messaging behaviors, which reveals the gender differences in potential mate selection. We also use ensemble learning classifiers to rank the importance of various potential factors predicting messaging behaviors. Finally, we use correlation analysis to study users’ strategic behavior.

2 Dataset

This study is based on a complete anonymized dataset, extracted in 2011 from a large online dating site in China, that covers heterosexual users only. The dating site provides many features common to other popular online dating platforms: it allows users to set up a profile, browse the profiles of potential mates, be browsed by potential mates, and send and receive messages. Specifically, when a registered member (user) A visits the dating site, the site recommends, at a specific position on his/her homepage, members that he/she may be interested in according to certain rules. At this point, A can only see the members’ avatar (real photo), nickname, location and age. After A enters a member’s homepage, he/she can browse that member’s detailed personal information without leaving a trace of the visit. After that, if A is very interested in some member, he/she can contact the member through the site’s internal messages. There are three data tables in the dataset: female profiles, male profiles and user behavior data. There are a total of 548,395 users in the dataset, including 344,552 male users and 203,843 female users. The users’ profiles include 35 attributes, such as user ID, gender, birthday, education level, mate requirements and so on. The dating site requires registered users to be at least 18 years old at the time of registration, so the minimum user age on the platform is 18.

3 Results

3.1 Attribute preference analysis

3.1.1 Attribute difference distribution

In online dating, there are significant gender differences in terms of attribute preference, self-presentation and interaction [47]. Users usually have a certain preference for mates’ age or height. For both men and women, when they send messages to their potential partners, we compute the age difference as age(receiver) − age(sender), and the height difference as height(receiver) − height(sender). Figures 1 and 2 show the age difference and height difference distributions, respectively. As a comparison, we also show the randomized results by assuming that female(male) users randomly send messages to male(female) users.
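The difference computation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the record layout (dicts with `age` and `height` keys), the helper names, and the toy data are all hypothetical, and the randomized baseline assumes each sender keeps their message count while receivers are drawn uniformly from the opposite-sex pool, which the text does not specify in detail.

```python
import random
from collections import Counter


def difference_distribution(pairs, attr):
    """Return {difference: fraction of messages} for attr(receiver) - attr(sender).

    `pairs` is a list of (sender, receiver) records, one per message.
    """
    diffs = [receiver[attr] - sender[attr] for sender, receiver in pairs]
    counts = Counter(diffs)
    total = len(diffs)
    return {d: c / total for d, c in sorted(counts.items())}


def randomized_pairs(pairs, receiver_pool, seed=0):
    """Assumed null model: keep every sender, draw the receiver uniformly at random."""
    rng = random.Random(seed)
    return [(sender, rng.choice(receiver_pool)) for sender, _ in pairs]


# Hypothetical toy data: female senders messaging male receivers (the FM case).
women = [{"age": 25, "height": 160}, {"age": 30, "height": 165}]
men = [
    {"age": 27, "height": 172},
    {"age": 35, "height": 175},
    {"age": 24, "height": 170},
]
fm_pairs = [(women[0], men[0]), (women[1], men[1]), (women[0], men[0])]

observed = difference_distribution(fm_pairs, "age")
baseline = difference_distribution(randomized_pairs(fm_pairs, men), "age")
```

Comparing `observed` with `baseline` for each difference value gives the kind of contrast with random selection that the figures display; the smoothing and confidence bands in the figures are not reproduced here.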

Figure 1

Age difference distribution. FM represents that female users send messages to male users and MF represents that male users send messages to female users. Solid lines represent the locally weighted polynomial regression fitting of their corresponding data points, and the gray interval represents a 95% confidence region

Figure 2

Height difference distribution. FM represents that female users send messages to male users and MF represents that male users send messages to female users. Solid lines represent the locally weighted polynomial regression fitting of their corresponding data points, and the gray interval represents a 95% confidence region

In most times and places, women usually marry older men [48, 49]. Figure 1 shows that in modern Chinese society, on average, men prefer women two years younger than them and women prefer men two years older than them. However, the range of age difference that women accept is smaller than that of men: the minimum age women accept is that men are 11 years younger than them and the maximum age they accept is that men are 23 years older than them, while the minimum age men accept is that women are 25 years younger than them and the maximum age they accept is that women are 28 years older than them. If only the age difference distributions are considered, in line with previous findings from a range of cultures and religions [50], we find that the range of ages that women are willing to message is narrower than the range of ages that men are willing to message. Male and female preferences are not random; they seek potential dates with a smaller age difference than predicted by random selection, which shows the characteristic of likes-attract.

Figure 2 shows that, when choosing potential mates, the height difference for women sending messages to men (most commonly 12 cm) is generally larger than that for men sending messages to women (most commonly 10 cm). In China, for men, the ideal height difference is that they are 10 cm taller than the person they message, while for women, the ideal height difference is that they are 12 cm shorter than the person they message. According to data from Yahoo! dating personal advertisements, for users in the U.S., height also matters for dating, especially for females [51]. In Fig. 2, the height difference range for women is smaller than that for men: the minimum height women accept is that men are 3 cm shorter than them and the maximum height they accept is that men are 30 cm taller than them, while the minimum height men accept is that women are 13 cm shorter than them and the maximum height they accept is that women are 32 cm taller than them. Females show the characteristic of likes-attract in terms of preference for height. As with age, users seek potential mates with a smaller height difference than predicted by random selection, although the effect is not as pronounced as for age difference.

It is noteworthy that on the dating site, users’ characteristics are all self-reported. For impression management considerations [52], users may exaggerate their personal characteristics [53]. For example, a recent study comparing online self-reported height against objectively measured data in young Australian adults revealed that self-reported height is significantly overestimated, by a mean of 1.79 cm for males and 1.29 cm for females [54]. Men lie more than women about their height, which is also found among online daters in New York City [55]. Users do not seem to have accurately reported their physical height on the dating site. In the dataset, the average heights of female and male users are 161.99 cm ( \(\mathit{SD}=4.18\) ) and 173.08 cm ( \(\mathit{SD}=4.68\) ), respectively. However, in the real world the average heights of adult females and males in China are 160.88 cm and 169.00 cm, respectively, which suggests that female and male users may exaggerate their height by an average of 1.11 cm and 4.08 cm, respectively. After correcting for this, the real height differences, \(10-(4.08-1.11) = 7.03\text{ cm}\) for men and \(12-(4.08-1.11) = 9.03\text{ cm}\) for women, would still be significant. However, we also notice that on the dating site, the average ages of male and female users are 28.73 and 28.58 years old, respectively, while in the overall adult population in China, the average ages of men and women are 40.56 and 41.01 years old, respectively, according to the population census data. The dating population is younger than the overall adult population and thus is likely taller, so users may not exaggerate their height by quite as much as calculated.
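The correction arithmetic in this paragraph can be reproduced with a short sketch. The numbers are those quoted in the text; the variable and function names are hypothetical, and the calculation inherits the text's assumption that the whole gap between site averages and population averages is exaggeration (which, as noted above, likely overstates it for a younger population).

```python
# Average heights in cm, as quoted in the text.
reported_mean = {"female": 161.99, "male": 173.08}    # self-reported, dating site
population_mean = {"female": 160.88, "male": 169.00}  # adult population in China

# Apparent exaggeration: self-reported site mean minus population mean.
exaggeration = {
    gender: round(reported_mean[gender] - population_mean[gender], 2)
    for gender in reported_mean
}

# A male-minus-female height gap is inflated by the difference of the two
# exaggerations, because men appear to overstate more than women do.
gap_inflation = round(exaggeration["male"] - exaggeration["female"], 2)


def corrected_gap(observed_gap_cm):
    """Subtract the assumed inflation from an observed height gap (cm)."""
    return round(observed_gap_cm - gap_inflation, 2)


men_gap = corrected_gap(10)    # typical gap when men message women
women_gap = corrected_gap(12)  # typical gap when women message men
```

Here `men_gap` and `women_gap` correspond to the corrected 7.03 cm and 9.03 cm figures given in the paragraph.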

3.1.2 Attribute preference

When a user sends a message to another user, his/her choice of recipient may not be random, but may instead reflect a preference for certain attributes, such as employment, education, income, and so on. To characterize the preference of senders with attribute i for receivers with attribute j, let \(m_{ij}\) be the number of messages sent from users with attribute i to users with attribute j, \(m_{i}\) be the total number of messages sent from users with attribute i, \(n_{j}\) be the number of receivers with attribute j, and n be the total number of receivers; then the attribute preference is \(p_{ij} = m_{ij}/m_{i} - n_{j}/n\). \(p_{ij}>0\) indicates that, compared with random selection, senders with attribute i have a preference for receivers with attribute j, \(p_{ij}=0\) indicates that there is no preference, and \(p_{ij}<0\) indicates that senders with attribute i message receivers with attribute j less often than random selection would predict.
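As a rough illustration of this measure, the sketch below computes, for every sender attribute i and receiver attribute j, the share of i's messages that go to j minus the share of receivers who have attribute j. The function name, the input layout, and the toy occupations are hypothetical; in particular, whether "receivers" means all opposite-sex users or only those who actually received messages is not stated in the text, so the receiver list is simply passed in as an argument.

```python
from collections import Counter


def attribute_preference(messages, receiver_attrs):
    """Return {(i, j): preference of attribute-i senders for attribute-j receivers}.

    messages       -- list of (sender_attr, receiver_attr), one entry per message
    receiver_attrs -- list with one attribute value per receiver in the pool
    """
    m_ij = Counter(messages)                    # messages from i to j
    m_i = Counter(i for i, _ in messages)       # all messages sent by i
    n_j = Counter(receiver_attrs)               # receivers with attribute j
    n = len(receiver_attrs)                     # all receivers
    return {
        (i, j): m_ij[(i, j)] / m_i[i] - n_j[j] / n
        for i in m_i
        for j in n_j
    }


# Hypothetical toy data: teachers send 3 of 4 messages to engineers,
# although engineers make up only 1 of 4 receivers.
messages = [("teacher", "engineer")] * 3 + [("teacher", "designer")]
receivers = ["engineer", "designer", "designer", "designer"]

preference = attribute_preference(messages, receivers)
```

In this toy case the value for (teacher, engineer) is 3/4 − 1/4 = 0.5 and for (teacher, designer) is 1/4 − 3/4 = −0.5. For a fixed sender attribute the values sum to zero across receiver attributes, provided every messaged attribute appears in the receiver pool, so a positive preference for one group is always offset by negative values elsewhere.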

Employment preferences are shown in Figs. 3 and 4 (see Tables 1 and 2 in Additional file 1 for the meanings of attributes and the number and proportion of men/women in each occupation). We find that, compared with males sending messages to females, female users sending messages to male users show a stronger preference regarding the occupations of their potential mates. In Fig. 3, we find that women who are students, accountants, educators or in other uncategorized occupations are not preferred by men, while women engaged in design are slightly more popular in terms of the relative number of messages received, especially among men in the aviation service industry. At the same time, we also find that in these data, men engaged in housekeeping only send messages to women in accounting and men engaged in the translation industry only send messages to women who are private owners, which may be due to the small sample size of user behavior with respect to these attributes.

Figure 3

Employment preference for male users sending messages to female users. The vertical axis indicates the male occupations and the horizontal axis indicates the female occupations. Preference values are represented by different colors