Denis Rodionov

shrimpfriedrice-bricks

AI & ML interests

None yet

Recent Activity

reacted to nyuuzyou's post with 🔥 about 7 hours ago
๐Ÿ›๏ธ Google Code Archive Dataset - https://huggingface.co/datasets/nyuuzyou/google-code-archive Expanding beyond the modern code series, this release presents a massive historical snapshot from the Google Code Archive. This dataset captures the open-source landscape from 2006 to 2016, offering a unique time capsule of software development patterns during the era before GitHub's dominance. Key Stats: - 65,825,565 files from 488,618 repositories - 47 GB compressed Parquet storage - 454 programming languages (Heavily featuring Java, PHP, and C++) - Extensive quality filtering (excluding vendor code and build artifacts) - Rich historical metadata: original repo names, file paths, and era-specific licenses This is one of those releases that I'm most interested in getting feedback on. Would you like to see more old code datasets?

Organizations

None yet

shrimpfriedrice-bricks's datasets

None public yet