In this section we first discuss our dataset and the (network) measures for identifying key players. Next, we report and interpret results for the task of distinguishing vendors from non-vendors and predicting the levels of vendor success. Then, we explore to what extent the rankings induced by (network) measures can reduce the set of users for law enforcement to investigate, while still including the greatest share of successful vendors. Finally, we look at the set of top ranked users for the most promising network centrality measure and activity indicator at a specific point in time. We do so to establish how well represented key players are among these top ranked users.
Data
In this study we focus on the cryptomarket Evolution. Evolution was active from January 2014 until March 2015, when it closed due to an exit scam. At the time, it was one of the most popular cryptomarkets5. It combined a carding forum, where card information (e.g., credit, debit, and ID cards) is traded, with an underground drug market25.
We obtained raw data of the Evolution marketplace and forum from the dark net market archives26. From this, we extracted a structured dataset, established a method of linking the market and forum data, and subsequently extracted communication network(s). The extraction and linking process, the resulting dataset, and various statistics on the dataset and its completeness, are presented in Boekhout et al.27. Parameters of the network extraction procedure control respectively the bounds on when two posts constitute a social tie (\(\delta _o\) and \(\delta _t\)) and the strength of the social tie (\(\omega _{lower}\), \(t_{lim}\), and \(\omega _{first}\)). For the communication network(s) studied in this work, the same extraction procedure and parameters were used as those used in Boekhout et al.27, i.e., \(\delta _o = 10\), \(\delta _t = 1\) month, \(\omega _{lower} = 0.2\), \(t_{lim} = 7\) days, and \(\omega _{first} = 0.5\). We demonstrate the robustness of our findings for each of these parameters in Supplementary Material Section S1.
The cryptomarket Evolution underwent two notable changes in user and post activity. In the initial months up to May 2014, the cryptomarket grew steadily in terms of both post activity and the number of active users. However, monthly post activity stabilised from May until October (see Fig. 5). Notably, May saw a change in the vendor ranking system, which assigns textual labels to vendors that are visible on the marketplace to potential customers and imply a level of success and trustworthiness. Obtaining a label representing greater success and trustworthiness as a vendor now required sufficient positive feedback but, most importantly for us, the new ranking system also reported the exact number of sales a vendor had made up to that point. The second major change to the cryptomarket came in early November 2014, as a by-product of the closure of six cryptomarkets following the joint international law enforcement operation dubbed “Onymous”5. After this disruption, Evolution showed a significant increase in overall activity until its closure.
Both the communication networks and the current and future sales counts were extracted on a monthly basis using data up to the end of each month, including all data prior to the given month. As such, we obtained 15 network snapshots (starting from January 2014 up to March 2015). Note that we rely on all data prior to the given month and not only the most recent month(s), because a vendor’s reputation plays an important role in their success and is based not just on the most recent activity. In fact, built-up reputation is such a vital aspect that it is predominantly the successful vendors with a large number of sales and high reputation who choose to migrate and maintain their identity in new cryptomarkets after market closures28. Details on the network extraction process and the computation of monthly sales statistics are provided in the “Methods” section.
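The cumulative monthly snapshots described above can be illustrated with a minimal sketch. The `(user, topic, date)` post tuples, the function names, and the toy data are assumptions made for illustration only; the actual extraction procedure is the one described in the “Methods” section.

```python
from datetime import date

def month_range(first, last):
    """Return all (year, month) pairs from `first` to `last`, inclusive."""
    months = []
    year, month = first
    while (year, month) <= last:
        months.append((year, month))
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months

def cumulative_snapshots(posts, first, last):
    """Map each month to all posts dated on or before the end of that month."""
    snapshots = {}
    for ym in month_range(first, last):
        # Cumulative: keep everything up to and including month `ym`.
        snapshots[ym] = [p for p in posts if (p[2].year, p[2].month) <= ym]
    return snapshots

# Hypothetical toy posts: (user, topic, date).
posts = [
    ("alice", "t1", date(2014, 1, 15)),
    ("bob", "t1", date(2014, 2, 3)),
    ("carol", "t2", date(2014, 4, 20)),
]
snapshots = cumulative_snapshots(posts, (2014, 1), (2015, 3))
```

With January 2014 and March 2015 as bounds, this yields one snapshot per month, each containing all earlier data as well.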
Network measures & activity indicators
Each considered network measure captures a different role a user may play within the user-to-user communication network. To cover a wide range of user roles that may be important to vendor success, we report on four centrality measures: (1) in-degree; (2) bidirectional harmonic closeness centrality; (3) directed weighted betweenness centrality; and (4) directed weighted PageRank. The in-degree of a user indicates the number of different users that posted (shortly) after them on the same topic(s). It can thus serve as a proxy for how many users have seen one or more of their posts and, to some extent, for their level of name recognition. The bidirectional harmonic closeness centrality29 is a measure of a user’s ability to reach the entire network, following paths regardless of link direction. High harmonic closeness centrality indicates that a user should be relatively easy to reach and is therefore potentially visible to the entire user base. The directed weighted betweenness centrality30,31 computes how often a user lies on shortest paths connecting other nodes, taking into account both the direction and strength of social ties. High betweenness nodes often lie ‘between’ communities. As such, betweenness may be a good measure of how well a (potential) vendor reaches different, otherwise separated, communities of customers. Finally, the directed weighted PageRank32 computes the probability that a random walker that infinitely traverses a network ends up at a given node, taking into account both the direction and strength of social ties. High PageRank centrality is often an indicator of being well connected to other important users. Duxbury and Haynie24 found that buyers were more likely to continue ordering with vendors within the same community. As such, a close connection with other key players, as indicated by a high PageRank value, can be indicative of a high perceived trust, positively affecting sales.
Finally, we note that links in the communication network are temporally independent unless they rely on the same post(s), i.e., the link (a, b) is not dependent on the existence of link (b, c) unless they were formed based on the same post by user b. As such, were we to consider only time-respecting paths (e.g., as introduced by Kempe et al.33), which require temporally dependent links, we would not adequately capture the social aspect of the network, i.e., the desired concepts of familiarity and shared interest. Therefore, we focus on ‘static’ network measures.
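To make the two simplest of these static measures concrete, the following sketch computes in-degree and bidirectional harmonic closeness on a toy edge list. It assumes, purely for illustration, that a link `(u, v)` means that user `u` posted shortly after user `v`, and it ignores tie strength. Weighted betweenness and PageRank are omitted here; graph libraries such as NetworkX offer implementations of both.

```python
from collections import deque

def in_degree(edges):
    """Number of distinct users with a link pointing at each user."""
    sources = {}
    for u, v in edges:
        sources.setdefault(u, set())
        sources.setdefault(v, set()).add(u)
    return {node: len(s) for node, s in sources.items()}

def bidirectional_harmonic_closeness(edges):
    """Sum of 1/d(u, v) over all reachable v, ignoring link direction."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    scores = {}
    for start in neighbours:
        # Breadth-first search gives unweighted shortest-path distances.
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        scores[start] = sum(1.0 / d for d in dist.values() if d > 0)
    return scores

# Hypothetical toy network: (later poster, earlier poster).
edges = [("b", "a"), ("c", "a"), ("c", "b"), ("d", "c")]
degrees = in_degree(edges)
closeness = bidirectional_harmonic_closeness(edges)
```

In this toy network user `a` has the highest in-degree, while user `c` is closest to everyone else once link direction is ignored, which shows how the two measures can single out different users.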
To evaluate the network measures we compare them against three activity indicators, which serve as our baselines. These activity indicators can be computed directly from the forum data, i.e., without the aforementioned communication network extraction, are intuitively meaningful in the context of cryptomarket vendor success, and do not require knowledge of message content. We consider: (1) post activity; (2) topics started; and (3) topic engagement. Post activity refers to the number of posts a user has placed on the forum. It relies on the idea that greater activity means greater visibility, which in turn leads to greater name recognition. Topics started is the number of topics a user started, and topic engagement is the sum of all posts placed within those topics, regardless of who posted them. These measures rely on the premise that the more topics a user has started and the more engagement those topics received, the greater the likelihood that they are a (successful) vendor. This is supported by Armona10, who previously concluded that a similar measure of vendor forum sentiment could be indicative of higher demand for a vendor on the Agora cryptomarket. Whereas their measure relied entirely on forum post texts (and thread titles) for the selection of posts and computation of the sentiment, our activity indicators can be determined entirely independently of post content. Again, the increased visibility through starting topics also boosts name recognition.
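The three activity indicators can be sketched as simple counts over the forum posts. The input format, a chronological list of `(topic, user)` pairs in which the first post of a topic identifies its starter, is an assumption made for this illustration.

```python
from collections import Counter

def activity_indicators(posts):
    """Compute post activity, topics started, and topic engagement.

    `posts` is a chronological list of (topic, user) pairs; the first
    post seen in a topic is assumed to come from the topic starter.
    """
    post_activity = Counter()
    topic_size = Counter()
    topic_starter = {}
    for topic, user in posts:
        post_activity[user] += 1
        topic_size[topic] += 1
        topic_starter.setdefault(topic, user)
    topics_started = Counter(topic_starter.values())
    # Engagement counts every post in a user's topics, whoever wrote it.
    topic_engagement = Counter()
    for topic, starter in topic_starter.items():
        topic_engagement[starter] += topic_size[topic]
    return post_activity, topics_started, topic_engagement

# Hypothetical toy forum data.
posts = [
    ("t1", "alice"), ("t1", "bob"), ("t1", "alice"),
    ("t2", "bob"), ("t2", "carol"),
]
post_activity, topics_started, topic_engagement = activity_indicators(posts)
```

None of the three counts inspects the text of a post, in line with the content-independence noted above.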
Further details on the computation and interpretation of the measures are provided in the “Methods” section.
Distinguishing vendors and their level of success
To predict vendor success, we must determine if it is possible to distinguish between vendors and non-vendors, as well as between various levels of success. We look at the average network centralities and activity indicators for groups of users, in an attempt to distinguish groups with greater success. To this end, we divided, for each month, all active vendors, i.e., all users that are or will become vendors with at least one post already posted at that time, into five groups of success percentiles, each including respectively the top 0–20%, 20–40%, etc. of vendors in terms of sales. We refer to these groups as vendor percentiles. Separate vendor percentiles are formed for current and future success. We refer to the most and second most successful percentiles as the top and sub-top percentile, respectively. The non-vendors, consisting of regular forum users and those vendors with no recorded sales at all, form a separate sixth group.
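As an illustration of this grouping, the sketch below splits users into five vendor percentiles and a non-vendor group based on a sales count per user. The handling of ties (alphabetical by user name) and of group sizes that do not divide evenly are assumptions of this sketch, not details taken from the paper.

```python
def vendor_percentiles(sales, n_groups=5):
    """Split vendors into `n_groups` by descending sales.

    Users with zero sales are returned separately as non-vendors.
    Assumes at least one user has a positive sales count.
    """
    vendors = sorted(
        (u for u, s in sales.items() if s > 0),
        key=lambda u: (-sales[u], u),  # ties broken by name (assumption)
    )
    non_vendors = sorted(u for u, s in sales.items() if s == 0)
    groups = [[] for _ in range(n_groups)]
    for rank, user in enumerate(vendors):
        groups[rank * n_groups // len(vendors)].append(user)
    return groups, non_vendors

# Hypothetical sales counts: ten vendors and two users without sales.
sales = {f"v{i}": 100 - 10 * i for i in range(10)}
sales.update({"u1": 0, "u2": 0})
groups, non_vendors = vendor_percentiles(sales)
top_percentile, sub_top_percentile = groups[0], groups[1]
```

Running the same function once on current sales and once on future sales would give the two separate sets of vendor percentiles.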
First, we computed for each month the mean normalized value of each measure for the groups of all vendors and all non-vendors, using min-max normalization. From this, the relative and absolute difference scores between vendors and non-vendors were computed for each of the four network measures and three activity indicators (see “Methods” section for more details on their computation). The resulting scores are depicted in Fig. 1. In these plots, lines give a third-degree polynomial approximation of the trend based on the monthly centralities and activity indicators. Here, a third-degree polynomial is used to account for the two aforementioned events that took place in the Evolution cryptomarket27. Dashed lines are used for the network measures and dotted lines for the activity indicators.
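A minimal sketch of this computation follows. The exact definitions are given in the “Methods” section, so the formulas used here are assumptions: the absolute difference is taken as the difference between the two group means of the min-max normalized values, and the relative difference as that difference divided by the non-vendor mean.

```python
def min_max_normalise(values):
    """Rescale a {user: value} mapping linearly to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        return {u: 0.0 for u in values}
    return {u: (v - lo) / (hi - lo) for u, v in values.items()}

def difference_scores(values, vendors):
    """Return (absolute, relative) difference of vendors over non-vendors.

    Assumed definitions: absolute = mean_v - mean_n and
    relative = (mean_v - mean_n) / mean_n, on normalized values.
    """
    norm = min_max_normalise(values)
    vend = [x for u, x in norm.items() if u in vendors]
    rest = [x for u, x in norm.items() if u not in vendors]
    mean_v = sum(vend) / len(vend)
    mean_n = sum(rest) / len(rest)
    absolute = mean_v - mean_n
    relative = absolute / mean_n if mean_n > 0 else float("inf")
    return absolute, relative

# Hypothetical measure values for four users, two of them vendors.
values = {"a": 10, "b": 6, "c": 2, "d": 0}
absolute, relative = difference_scores(values, vendors={"a", "b"})
```

Under these assumed definitions, a measure with small values overall can produce a large relative score alongside a small absolute score, which is the pattern discussed for betweenness centrality.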
Figure 1 shows that, for all measures, vendors have higher network centralities and activity indicators than non-vendors. Furthermore, it shows that although the relative difference score for betweenness centrality of vendors over non-vendors is very large (600–1000%), the corresponding absolute difference score is the smallest of all these measures. This indicates that betweenness has relatively small values overall with some extremely high outliers. In contrast, harmonic closeness centrality has low relative difference scores but nominal absolute difference scores. Since these effects are expected to disappear when inducing a ranking from the actual values, it is less the size of the difference scores than the fact that they are positive that indicates (useful) predictive power. After all, the ranking induced by the centralities and activity indicators is more useful to law enforcement practitioners than the actual values. Thus, the exclusively positive values in Fig. 1 indicate the potential of all network measures and activity indicators to distinguish vendors from non-vendors.
Next, we investigate whether these measures can also distinguish between vendors’ levels of success. To assess this, we looked at the relative difference scores between the top percentile and all vendors (Fig. 2a, b) and between the top and sub-top percentile (Fig. 2c, d) for both current and future success. Figure 2a shows that for all measures the currently most successful vendors have on average higher network centralities and activity indicators. After the first month, and with the exception of July and August 2014 for betweenness centrality, Fig. 2c demonstrates that this also holds when comparing the top with the sub-top percentile. Interestingly, trend changes for most measures follow cryptomarket developments. For example, up until May the difference score increases monthly, similar to how the level of activity on the cryptomarket increased during this period. The following period, up to the November 2014 “Onymous” disruption5, shows stable but slightly decreasing difference scores for most measures. Finally, after this disruption, we see a small increase in difference scores again.
When we consider future success, Fig. 2b shows again positive difference scores between the top vendor percentile and all vendors. However, they are noticeably lower than for current success. Similarly, Fig. 2d shows mostly positive difference scores when comparing with the sub-top percentile, but with lower scores. Thus, for both current and future success the network centralities and activity indicators show the potential to distinguish vendors’ level of success.
Notably, betweenness centrality shows trends that differ from all other measures. In particular, for current success we see clearly higher difference scores in the last months. On the contrary, for future success the final months show lower difference scores than before. This behaviour is likely due to the delay between successful vendors establishing themselves in the network and reaping the benefits in terms of sales. In other words, high betweenness centrality is expected to be more a prelude to than a consequence of vendor success. Thus, these results show the potential of betweenness centrality as an early warning signal for future vendor success.
In short, for all measures vendors show positive difference scores over non-vendors and less successful vendors. Thus, rankings induced by these measures are expected to rank successful vendors (relatively) higher. Therefore, the induced rankings have the potential to assist law enforcement by allowing them to focus investigative efforts on higher ranked users. Furthermore, betweenness centrality was shown to have potential as an early warning signal, as high betweenness appears to precede vendor success. Finally, among the remaining network measures and activity indicators, topic engagement consistently showed the highest difference scores. This suggests that topic engagement may provide the best predictions of vendor success.
Detecting vendors in the user base
In their efforts to disrupt cryptomarkets, law enforcement has access to limited personnel and resources. One method employed by law enforcement to deal with this limitation is to reduce the set of users to investigate based on a ranking induced by some measure. Rankings that still include many users of interest after such a reduction are of course preferable. In the previous section, we established the predictive potential of the network measures and activity indicators for predicting (successful) vendors. Now, with the specific law enforcement perspective of aiming to find as many (hard to identify) vendors as possible, we want to explore how this predictive potential translates to the task of reducing the set of users to investigate. To do this, we consider what we call the vendor recall. The vendor recall is the percentage of users among the top vendor percentile (the top 20% of vendors) that is also among the top percentile of all users, i.e., among the top 20% of all users when ranked on a given network measure or activity indicator (see “Methods” section for further details). Note that we focus on the top percentile, instead of the absolute top vendors, as this aligns with the law enforcement intervention method of dissuading continued participation in the cryptomarket. Since this intervention method is known to be ineffective for the absolute top vendors and comes at a relatively low cost to law enforcement, it is more suited to targeting larger groups of vendors. Furthermore, although a vendor’s sales volume is merely a proxy for their trade volume, we may reasonably expect those with the largest trade volume to be among the top vendor percentile in terms of sales. For these reasons we also prioritize reporting the recall of vendors over sales. Monthly vendor recalls are plotted in Fig. 3 for current (a) and future (b) success, respectively. As before, lines in these plots are third-degree polynomial approximations of the trend.
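The vendor recall defined above can be sketched as follows. The rounding of the 20% cut-off and the tie-breaking by user name are assumptions of this illustration, and the toy scores and sales counts are hypothetical.

```python
def top_fraction(scores, fraction=0.2):
    """Users in the top `fraction` when ranked by descending score."""
    ranked = sorted(scores, key=lambda u: (-scores[u], u))
    k = max(1, round(len(ranked) * fraction))  # rounding is an assumption
    return set(ranked[:k])

def vendor_recall(measure, sales, fraction=0.2):
    """Share of the top vendor percentile that is also in the top
    percentile of all users when ranked on `measure`."""
    vendor_sales = {u: s for u, s in sales.items() if s > 0}
    top_vendors = top_fraction(vendor_sales, fraction)
    top_users = top_fraction(measure, fraction)
    return len(top_vendors & top_users) / len(top_vendors)

# Hypothetical data: 20 users ranked by some measure, 10 of them vendors.
measure = {f"u{i:02d}": 20 - i for i in range(20)}
sales = {u: 0 for u in measure}
sales.update({
    "u01": 100, "u10": 90, "u02": 80, "u03": 70, "u04": 60,
    "u05": 50, "u06": 40, "u07": 30, "u08": 20, "u09": 10,
})
recall = vendor_recall(measure, sales)
```

In this toy example the top vendor percentile consists of `u01` and `u10`, of which only `u01` is among the four top-ranked users, so half of the top vendors would be retained after reducing the user set.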
Figure 3 shows that, for both current and future success, degree and closeness centrality generally have a worse vendor recall than any of our activity indicators. From May onwards, PageRank outperforms post activity and performs on par with the topics started indicator. Meanwhile, from July onwards, betweenness centrality consistently outperforms both the post activity and topics started activity indicators and performs (nearly) on par with topic engagement. Overall, the topic engagement indicator most consistently achieves high performance in terms of vendor recall. These observations tell us two things. First, network centrality measures require the communication network to have developed and stabilised sufficiently before achieving reliable vendor recall. During the initial months the communication network and its structure are still undergoing significant changes. Consequently, we also see large fluctuations in vendor recall for the network measures between these months. Second, network measures do not strictly improve on our best activity indicator(s) in terms of vendor recall.
Despite achieving the best vendor recall, topic engagement is only able to detect up to two thirds of the most successful vendors for current success and even fewer for future success. Thus, there may still be a significant number of successful vendors that are not detected by the activity indicators but that may be included by network measures. To investigate this, we analyse the overlap of detected vendors between the network measures and activity indicators. Table 1 shows the average monthly overlap of each network measure with each individual activity indicator and with the union of vendors detected by all activity indicators. We see that PageRank and betweenness centrality detect the greatest share of vendors also found by the activity indicators, detecting on average approximately 80% of all current vendors and 75% of all future vendors found. However, respectively nearly 99% and 97% of all vendors detected by PageRank are also found by the activity indicators. As such, PageRank is not able to identify many new vendors. In contrast, the activity indicators find respectively only 94% and 90% of the vendors included by betweenness centrality. Notably, individual indicators find far fewer. Thus, betweenness centrality is able to detect the largest share of successful vendors not included by any of the activity indicators. Therefore, reducing the set of users for law enforcement to investigate using betweenness centrality may provide a fresh perspective.
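The two directions of overlap discussed here can be made explicit with a small sketch based on set operations. The vendor identifiers and the detected sets below are hypothetical and only serve to show which quantity is divided by which.

```python
def overlap_shares(network_detected, indicator_detected):
    """Compare vendors detected by one network measure with the union
    of vendors detected by all activity indicators.

    Returns (share of indicator-detected vendors that the network
    measure also finds, share of network-detected vendors that the
    indicators also find, vendors found only by the network measure).
    """
    union = set().union(*indicator_detected.values())
    common = network_detected & union
    share_of_indicators = len(common) / len(union)
    share_of_network = len(common) / len(network_detected)
    only_network = network_detected - union
    return share_of_indicators, share_of_network, only_network

# Hypothetical detected top-percentile vendors for one month.
network_detected = {"v1", "v2", "v3", "v4"}
indicator_detected = {
    "post_activity": {"v1", "v2"},
    "topics_started": {"v2", "v5"},
    "topic_engagement": {"v1", "v3", "v6"},
}
shares = overlap_shares(network_detected, indicator_detected)
```

A network measure adds new information only when the second share is clearly below one, i.e., when the set of vendors found exclusively by that measure is not empty.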
Despite finding additional vendors, the union of all successful vendors detected by betweenness centrality and all activity indicators only finds around 75% and 65% of the top percentile for current and future success, respectively. This means there is still a significant segment of the most successful vendors that would not be found by any of these measures. One possible explanation for scoring low on all of these measures is simply low posting activity. To assess whether this holds for the successful vendors that do not score high enough to be detected, we look at what we call the post activity recall of the top vendor percentile in Supplementary Material Section S2. The post activity recall is the percentage of the top vendor percentile’s total post activity, for a given month, that is associated with those vendors detected with vendor recall (see “Methods” section for further details). We find that for both current and future success, the vast majority of post activity is associated with the vendors with high network centrality and activity indicators. As such, low post activity can be considered the main reason for the relatively low vendor recalls we observe. After all, though over 30% of successful vendors are not found, they are responsible for less than 10% of the post activity of the entire group (in most cases even less). Indeed, Fig. 4a, c show that any vendor with activity above a certain threshold is always among the detected vendors, while most vendors with very few posts are not. Specifically, they demonstrate that for both topic engagement and betweenness centrality for September 2014, this threshold is below 100 posts (as confirmed by Fig. 4b, d). This also holds for the other centrality measures, as demonstrated in Supplementary Material Section S3. We note that vendors with low post activity are also much less likely to be found using other methodologies.
Therefore, applying the methods discussed in this paper is unlikely to miss vendors that other methodologies might have found. Thus, the relatively low vendor recall achieved by betweenness centrality and topic engagement should not discourage law enforcement practitioners from using them.
Figure 4 further indicates that vendors are overall more likely to be identified the greater their respective success. This is also demonstrated through sales recall, which measures what percentage of the sales of the entire top percentile the detected vendors are responsible for (see “Methods” section for further details), in Supplementary Material Section S2. There we show that the sales recall is generally between 10 and 20% higher than the corresponding vendor recall. This indicates that the detected vendors are, on average, the more successful vendors. In Supplementary Material Section S3 we show that this finding also holds for other months and network measures. Furthermore, from Fig. 4a, c it appears that the vendors found by topic engagement, and not betweenness centrality, are generally slightly more active and less successful compared to those found by betweenness centrality and not topic engagement. Indeed, Fig. 4b, d confirm that, for vendors with between 10 and 100 posts, those found exclusively by betweenness are generally more successful. Notably, the effect seems to be even stronger for future success and this is moreover confirmed to hold for other months in Supplementary Material Section S3. This observation once more highlights the potential of betweenness centrality as an early warning signal.
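Sales recall and post activity recall can both be read as weighted versions of vendor recall, in which each vendor in the top percentile counts in proportion to their sales or posts. The sketch below illustrates this reading; the function name and all numbers are hypothetical, and the precise definitions are those in the “Methods” section.

```python
def weighted_recall(top_vendors, detected, weight):
    """Share of the total `weight` of the top vendor percentile that
    belongs to vendors who are also in the detected set."""
    total = sum(weight[v] for v in top_vendors)
    found = sum(weight[v] for v in top_vendors if v in detected)
    return found / total if total else 0.0

# Hypothetical top vendor percentile and detected (top-ranked) users.
top_vendors = {"v1", "v2", "v3", "v4"}
detected = {"v1", "v2", "x"}
sales = {"v1": 500, "v2": 300, "v3": 150, "v4": 50}
posts = {"v1": 400, "v2": 500, "v3": 60, "v4": 40}

vendor_recall = len(top_vendors & detected) / len(top_vendors)
sales_recall = weighted_recall(top_vendors, detected, sales)
post_activity_recall = weighted_recall(top_vendors, detected, posts)
```

In this toy case half of the top vendors are detected, yet they account for most of the sales and posts, mirroring the pattern that undetected vendors tend to be the less active and less successful ones.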
Throughout this section we have considered a single threshold at which we cut off the rankings, namely 20% of all users. In Supplementary Material Section S4, we investigate the performance of the measures at different thresholds. We observe that for low false positive rates, up to around 20%, our findings hold. For higher false positive rates, however, topic engagement clearly outperforms all other measures. Nevertheless, given the limited resources of law enforcement, it is unlikely that such large user samples would ever be considered for investigation. After all, the resources required to investigate even 20% of users would likely exceed those available to law enforcement. Additionally, we find that topic engagement is the best measure for predicting vendors, regardless of their level of success.
To summarise, topic engagement provides the best single measure recall performance. Meanwhile, betweenness centrality identifies the greatest share of vendors that do not score high for any of the activity indicators. Additionally, betweenness centrality detects the most vendors of all network measures. As such, betweenness centrality is the network measure most likely to be of use to law enforcement for detecting vendors in the user base. Furthermore, betweenness centrality uniquely finds relatively more successful vendors among those with moderate activity. Notably, this effect is stronger for future success, further demonstrating its potential as an early warning signal.
Key player identification
In the previous section we determined that betweenness centrality and topic engagement are the measures with the greatest vendor recall performance. That is to say, they are likely to have the most successful vendors among the top ranked users when ranked on these measures. Here we look at the top scoring users to investigate to what extent the top scoring users are indeed key players in the cryptomarket. To this end, we report the top 25 users, their member title, and their current and future sales for September 2014 for these measures in Table 2.
We see that among the top 25 users in betweenness centrality and topic engagement there are ten (i.e., 40%) that occur in both rankings. Furthermore, we observe that for both measures over half of the top 25 users have current and/or future sales (56% and 64% respectively). The probabilities of this happening randomly are more than a million times smaller (\(3.47 \times 10^{-7}\) and \(3.44 \times 10^{-9}\) respectively). Note that not all users with sales have the corresponding “Vendor” member title. The reason for this is twofold: first, more important titles such as “Administrator” and “Moderator” supersede the “Vendor” title; and second, the “Vendor” title did not exist before September, so some older vendors with few future sales were never labelled as such. This also illustrates a potential pitfall of relying too much on forum member titles for key player identification.
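One common way to quantify how unlikely such a share of users with sales would be under random selection is a hypergeometric tail probability. The sketch below is only an illustration of that idea and is not necessarily the computation used to obtain the probabilities reported above; the population size and the number of users with sales are hypothetical placeholders.

```python
from math import comb

def prob_at_least(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws),
    i.e., drawing `draws` users without replacement from `population`
    users of whom `successes` have sales."""
    total = comb(population, draws)
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total

# Hypothetical setting: 10,000 users of whom 500 have sales; how likely
# is it that a random set of 25 users contains at least 14 with sales?
p_random = prob_at_least(population=10_000, successes=500, draws=25, observed=14)
```

The smaller this probability, the less plausible it is that the concentration of users with sales among the top 25 arose by chance.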
Of the users with sales, twelve are among the top percentile for current sales and eight are among the top percentile for future sales. Respectively three (kalashnikov, Yasuo, and Grandeur) and one (SkypeMan) of them are in fact in the top 10 for current and future sales. This suggests that these two measures are suitable for predicting potentially successful vendors. Notably, Trippyy, who is included in the top 25 for betweenness centrality, is the only user that is a member of the top percentile for future sales, but not a member of the top percentile for current sales. Note that Trippyy’s member title in September was still “Vendor”. Additionally, betweenness centrality appears to include a greater proportion of vendors for whom the majority of their sales are yet to come. On the other hand, we observe that the inclusion of kalashnikov and SkypeMan for topic engagement means that it captures a substantially greater total of future sales among the top 25 users. If our goal were to identify the absolute top vendors specifically, these results may be interpreted to imply that topic engagement is the better choice of measure. However, recall that sales volume does not equate to trade volume, but is merely a proxy for it. After all, the trade volume associated with a single sale can differ between listings, and we are not able to determine which sales came from which listings. Therefore, 100 sales could represent a larger total trade volume than 1000 sales. As such, we cannot conclusively say whether the inclusion of SkypeMan by topic engagement is indeed the better choice compared to the inclusion of Trippyy by betweenness centrality. This uncertainty is another reason why we put a greater emphasis on vendor recall than sales recall in this work, and why we focus on the top vendor percentile instead of the pure top vendors in terms of sales.
Regardless, the results in Table 2 are a concrete example of how these measures can potentially serve as early warning signals for future vendor success.
In addition to vendors, we also find users with other important positions on the forum, such as “Administrator” and “Moderator”, among the top 25 for both measures. In fact, betweenness centrality and topic engagement combined include three of the four users who have held the title “Administrator” among their top users. Furthermore, the only missing administrator became inactive within a month of the founding of the cryptomarket. Thus, we can say that all active administrators were found. Additionally, betweenness centrality identifies five of the nine users who have held the title of “Moderator” and who registered before the end of September 2014 (four out of seven if we exclude users who obtained the title after September, including d33poutside). The probability of this happening randomly is more than 250 million times smaller (\(2.07 \times 10^{-11}\) (\(2.27 \times 10^{-9}\))). On the other hand, topic engagement includes two out of nine (two out of seven), with probabilities of this occurring randomly that are just over 700 times smaller (\(3.10 \times 10^{-4}\) (\(1.82 \times 10^{-4}\))). Thus, these measures are suited to predicting key players beyond just successful vendors. Though neither measure perfectly identifies only key players, they provide an excellent way of identifying individuals to investigate further manually.