Using the GitHub APIs, we construct an unbiased dataset of more than 10 million GitHub users. The data was collected between Jul. 20 and Aug. 27, 2018, and covers 10,649,574 users, 118,602,740 commits, and 20,999,258 repositories. Each entry is stored in JSON format and represents one GitHub user: it contains the descriptive information from the user's profile page, together with information about the user's commit activity and the public repositories the user created or forked.
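As a rough illustration of working with one-user-per-entry JSON data, the sketch below reads records and tallies users, commits, and repositories. It assumes one JSON object per line, which the description does not confirm. The field names `login`, `commit_list`, and `repo_list` are hypothetical placeholders, not the dataset's documented schema, so check the actual files before adapting it.

```python
import io
import json
from typing import Iterator, TextIO


def iter_users(stream: TextIO) -> Iterator[dict]:
    """Yield one user record per non-empty line (assumes one JSON object per line)."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(stream: TextIO) -> dict:
    """Count users, commits, and repositories across all records in the stream."""
    totals = {"users": 0, "commits": 0, "repositories": 0}
    for user in iter_users(stream):
        totals["users"] += 1
        # "commit_list" and "repo_list" are placeholder keys, not the real schema.
        totals["commits"] += len(user.get("commit_list", []))
        totals["repositories"] += len(user.get("repo_list", []))
    return totals


# Tiny in-memory sample standing in for a dataset file.
sample = io.StringIO(
    '{"login": "alice", "commit_list": [{"sha": "a1"}, {"sha": "b2"}], "repo_list": [{"name": "x"}]}\n'
    '{"login": "bob", "commit_list": [], "repo_list": [{"name": "y"}, {"name": "z"}]}\n'
)
print(summarize(sample))  # {'users': 2, 'commits': 2, 'repositories': 3}
```

Streaming line by line keeps memory use flat, which matters for a collection of this size. A real file handle opened with `open(path, encoding="utf-8")` can be passed in place of the in-memory sample.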
Gong, Q., Zhang, J., Chen, Y., Li, Q., Xiao, Y., Wang, X. & Hui, P., Nov 2019, CIKM '19: Proceedings of the 28th ACM International Conference on Information and Knowledge Management. ACM, p. 1251-1260 (ACM International Conference on Information & Knowledge Management).
Research output: Chapter in Book/Report/Conference proceeding › Conference article in proceedings › Scientific › peer-review
Gong, Q. (Creator), Zhang, J. (Creator), Chen, Y. (Creator), Xiao, Y. (Contributor), Fu, X. (Creator), Hui, P. (Creator), Li, X. (Creator), Wang, X. (Creator) (1 Jan 2018). A Representative User-centric Dataset of 10 Million GitHub Developers. Harvard Dataverse. 10.7910/dvn/t6zrjt