A comprehensive study by researchers from Socket, Carnegie Mellon University, and North Carolina State University has found that repository popularity on GitHub is being artificially inflated on a large scale through fake “stars.” The investigation identified approximately 4.5 million suspected fake stars across the platform.
Key Findings:
– 4.5 million suspected fake stars identified
– 1.32 million suspicious accounts detected
– 22,915 repositories affected
– 15.8% of repositories with over 50 stars showed suspicious activity in July 2024
The Research Method:
The researchers developed “StarScout,” a tool that analyzed 20 TB of GitHub event data from GHArchive, covering:
– 6 billion GitHub events
– 60.5 million users
– 310 million repositories
– 610 million stars
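To give a sense of what this kind of data looks like, here is a minimal sketch of counting stars per repository from GHArchive-style records. GHArchive publishes newline-delimited JSON events, and GitHub records a star as a `WatchEvent`. The `count_stars` function and the sample records are illustrative assumptions, not part of StarScout.

```python
import json
from collections import Counter
from typing import Iterable


def count_stars(lines: Iterable[str]) -> Counter:
    """Count star events per repository from GHArchive-style JSON lines."""
    stars = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        event = json.loads(line)
        # GitHub records a star as a WatchEvent.
        if event.get("type") == "WatchEvent":
            stars[event["repo"]["name"]] += 1
    return stars


# Hypothetical sample records, trimmed to the fields used above.
sample = [
    '{"type": "WatchEvent", "actor": {"login": "alice"}, "repo": {"name": "org/tool"}}',
    '{"type": "PushEvent", "actor": {"login": "bob"}, "repo": {"name": "org/tool"}}',
    '{"type": "WatchEvent", "actor": {"login": "carol"}, "repo": {"name": "org/tool"}}',
]
star_counts = count_stars(sample)
```

At the scale described in the study, the same counting would have to run over compressed hourly archives or a data warehouse rather than an in-memory list.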
Detection Criteria:
– Minimal account activity
– Bot-like behavior patterns
– Coordinated starring actions
– Temporary account characteristics
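The first and third criteria above can be illustrated with a simplified heuristic: find accounts whose only activity is a handful of stars, then flag repositories that receive several such stars within a short time window. This is a sketch under stated assumptions, not StarScout's actual algorithm. The thresholds, function names, and event layout are all hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical thresholds, chosen only for illustration.
MAX_EVENTS = 2               # "minimal activity": at most this many events overall
WINDOW = timedelta(hours=1)  # "coordinated": stars landing within this window
MIN_CLUSTER = 3              # minimum low-activity accounts inside one window


def low_activity_accounts(events):
    """Return accounts with few events, all of which are stars."""
    by_actor = defaultdict(list)
    for e in events:
        by_actor[e["actor"]].append(e["type"])
    return {
        actor
        for actor, types in by_actor.items()
        if len(types) <= MAX_EVENTS and all(t == "WatchEvent" for t in types)
    }


def suspicious_repos(events):
    """Flag repos starred by MIN_CLUSTER or more low-activity accounts within WINDOW."""
    suspects = low_activity_accounts(events)
    stars = defaultdict(list)
    for e in events:
        if e["type"] == "WatchEvent" and e["actor"] in suspects:
            stars[e["repo"]].append(e["time"])
    flagged = set()
    for repo, times in stars.items():
        times.sort()
        # Slide over the sorted timestamps looking for a dense cluster.
        for i in range(len(times) - MIN_CLUSTER + 1):
            if times[i + MIN_CLUSTER - 1] - times[i] <= WINDOW:
                flagged.add(repo)
                break
    return flagged


# Hypothetical events: three star-only accounts hit one repo within minutes.
t0 = datetime(2024, 7, 1, 12, 0)
events = [
    {"actor": "u1", "type": "WatchEvent", "repo": "x/promo", "time": t0},
    {"actor": "u2", "type": "WatchEvent", "repo": "x/promo", "time": t0 + timedelta(minutes=5)},
    {"actor": "u3", "type": "WatchEvent", "repo": "x/promo", "time": t0 + timedelta(minutes=9)},
    {"actor": "dev", "type": "WatchEvent", "repo": "y/lib", "time": t0},
    {"actor": "dev", "type": "PushEvent", "repo": "dev/app", "time": t0 + timedelta(days=1)},
]
flagged = suspicious_repos(events)
```

In this sample only `x/promo` is flagged. The account `dev` also pushes code, so its star on `y/lib` is not treated as suspicious. A heuristic this simple would produce false positives on real data, which is why the thresholds here should be read as placeholders.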
Impact and Implications:
Star manipulation affects several aspects of GitHub:
– Global ranking system
– Content recommendation algorithm
– User trust
– Project visibility
Security Concerns:
– Malware distribution through artificially promoted repositories
– Scam projects gaining unwarranted visibility
– Exploitation in state-sponsored operations
Verification Results:
– 91% of identified suspicious repositories were deleted
– 62% of suspected inauthentic accounts were removed
– GitHub promptly removed suspicious repositories and accounts that were reported
User Recommendations:
Instead of relying solely on stars, users should:
– Evaluate repository activity
– Review documentation
– Examine code quality
– Assess contribution patterns
– Verify project legitimacy
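As a rough illustration of checking signals other than stars, the sketch below takes repository metadata shaped like the response of GitHub's REST API repository endpoint (`stargazers_count`, `forks_count`, `pushed_at`, `created_at`) and returns notes that may warrant a closer look. The `review_notes` function and every threshold in it are assumptions made for this example. They are not taken from the study and are no substitute for reading the code and documentation.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse a GitHub-style UTC timestamp such as 2024-07-01T08:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def review_notes(repo: dict, now: datetime) -> list[str]:
    """Return human-readable notes; all thresholds are hypothetical."""
    notes = []
    stars = repo["stargazers_count"]
    # Many stars with almost no forks can mean few people actually use the code.
    if stars >= 50 and repo["forks_count"] / stars < 0.01:
        notes.append("many stars but almost no forks")
    # A long gap since the last push suggests the project is inactive.
    if (now - parse_ts(repo["pushed_at"])).days > 365:
        notes.append("no pushes in over a year")
    # A brand-new repository with a large star count deserves a second look.
    if (now - parse_ts(repo["created_at"])).days < 30 and stars >= 1000:
        notes.append("very new repository with a large star count")
    return notes


# Hypothetical metadata for a repository under review.
repo = {
    "stargazers_count": 2400,
    "forks_count": 3,
    "pushed_at": "2024-07-10T08:00:00Z",
    "created_at": "2024-07-01T08:00:00Z",
}
notes = review_notes(repo, now=datetime(2024, 7, 20, tzinfo=timezone.utc))
```

Notes like these only indicate where to look more closely. A legitimate project can trigger them, and a malicious one can avoid them.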