# Ground Truth Pilot Batch Metadata

Collection Date: 2025-11-19
Papers Collected: 20
Method: Manual WebFetch + MCP-DBLP validation
BibTeX File: pilot_batch_20.bib

## Paper Provenance

1. Chen2018 (conf/icsm/ChenDZGH18)
   Source: ICSME 2018
   Selection: Paper #5 from conference listing
   Title: DRLgencert: Deep Learning-Based Automated Testing of Certificate Verification in SSL/TLS Implementations
   Validation: fuzzy_title_search, similarity=0.89, BibTeX verified

2. Reddi2010 (conf/micro/ReddiKKCSWB10)
   Source: MICRO 2010
   Selection: Paper #7 from conference listing
   Title: Voltage Smoothing: Characterizing and Mitigating Voltage Noise in Production Processors via Software-Guided Thread Scheduling
   Validation: fuzzy_title_search, similarity=0.81, BibTeX verified

3. Beckmann2015 (conf/hpca/BeckmannS15)
   Source: HPCA 2015
   Selection: Paper #6 from conference listing
   Title: Talus: A simple way to remove cliffs in cache performance
   Validation: fuzzy_title_search, similarity=0.98, BibTeX verified

4. Axiotis2019 (conf/soda/AxiotisBJTW19)
   Source: SODA 2019
   Selection: Paper #6 from conference listing
   Title: Fast Modular Subset Sum using Linear Sketching
   Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

5. Arora2023 (journals/pvldb/AroraYENHTR23)
   Source: PVLDB Volume 17 (2023)
   Selection: Paper #10 from journal volume
   Title: Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes
   Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

6. Kleinjung2017 (conf/eurocrypt/KleinjungDLPS17)
   Source: EUROCRYPT 2017 Part I
   Selection: Paper #7 from conference listing
   Title: Computation of a 768-Bit Prime Field Discrete Logarithm
   Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

7. Choromanski2021 (conf/iclr/ChoromanskiLDSG21)
   Source: ICLR 2021
   Selection: Paper #12 from conference listing
   Title: Rethinking Attention with Performers
   Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

8. Arutyunova2021 (conf/stacs/Arutyunova021)
   Source: STACS 2021
   Selection: Paper #8 from conference listing
   Title: Achieving Anonymity via Weak Lower Bound Constraints for k-Median and k-Means
   Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

9. Babun2022 (conf/ndss/BabunSAU22)
   Source: NDSS 2022
   Selection: Paper #10 from conference listing
   Title: The Truth Shall Set Thee Free: Enabling Practical Forensic Capabilities in Smart Environments
   Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

10. Wong2024 (conf/icse/WongWLW24)
    Source: ICSE 2024
    Selection: Paper #7 from conference listing
    Title: BinAug: Enhancing Binary Similarity Analysis with Low-Cost Input Repairing
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

11. Zhong2024 (conf/chi/ZhongSMKCH24)
    Source: CHI 2024
    Selection: Paper #2 from conference listing
    Title: AI-Assisted Causal Pathway Diagram for Human-Centered Design
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

12. Liu2022 (conf/issta/LiuZWJL22)
    Source: ISSTA 2022
    Selection: Paper #3 from conference listing
    Title: TeLL: log level suggestions via modeling multi-level code block information
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

13. Chakraborty2021 (conf/ijcai/ChakrabortySS21)
    Source: IJCAI 2021
    Selection: Paper #12 from conference listing
    Title: Picking Sequences and Monotonicity in Weighted Fair Division
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

14. Perron2020 (conf/sigmod/PerronFDM20)
    Source: SIGMOD 2020
    Selection: Paper #10 from conference listing
    Title: Starling: A Scalable Query Engine on Cloud Functions
    Validation: fuzzy_title_search, similarity=0.98, BibTeX verified

15. Bjorklund2022 (conf/stoc/BjorklundHK22)
    Source: STOC 2022
    Selection: Paper #11 from conference listing
    Title: The shortest even cycle problem is tractable
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

16. Wang2011 (conf/kdd/WangDCV11)
    Source: KDD 2011
    Selection: Paper #7 from conference listing
    Title: Trading representability for scalability: adaptive multi-hyperplane machine for nonlinear classification
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

17. Belaid2020 (conf/crypto/BelaidCPRT20)
    Source: CRYPTO 2020 Part I
    Selection: Paper #12 from conference listing
    Title: Random Probing Security: Verification, Composition, Expansion and New Constructions
    Validation: fuzzy_title_search, similarity=0.98, BibTeX verified

18. Lee2020 (conf/www/LeeKGOL20)
    Source: WWW 2020
    Selection: Paper #15 from conference listing
    Title: Measurements, Analyses, and Insights on the Entire Ethereum Blockchain Network
    Validation: fuzzy_title_search, similarity=0.98, BibTeX verified

19. You2023 (conf/nsdi/YouCC23)
    Source: NSDI 2023
    Selection: Paper #9 from conference listing
    Title: Zeus: Understanding and Optimizing GPU Energy Consumption of DNN Training
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

20. Lin2021 (conf/sp/LinG21)
    Source: IEEE S&P 2021
    Selection: Paper #3 from conference listing
    Title: When Function Signature Recovery Meets Compiler Optimization
    Validation: fuzzy_title_search, similarity=0.99, BibTeX verified

## Collection Statistics

Venues represented: 20 unique venues (one paper per venue)
- Conferences: 19 (95%)
- Journals: 1 (5%)
- Note: PVLDB is published as a journal but is associated with the VLDB conference
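
The venue tally can be recomputed from the DBLP keys in the provenance list. The following is a minimal sketch, assuming the second path segment of each key identifies the venue and the first segment (`conf` or `journals`) identifies its type; the helper name `tally_venues` is hypothetical and not part of any collection tooling.

```python
from collections import Counter

# DBLP keys copied from the Paper Provenance list above.
DBLP_KEYS = [
    "conf/icsm/ChenDZGH18",
    "conf/micro/ReddiKKCSWB10",
    "conf/hpca/BeckmannS15",
    "conf/soda/AxiotisBJTW19",
    "journals/pvldb/AroraYENHTR23",
    "conf/eurocrypt/KleinjungDLPS17",
    "conf/iclr/ChoromanskiLDSG21",
    "conf/stacs/Arutyunova021",
    "conf/ndss/BabunSAU22",
    "conf/icse/WongWLW24",
    "conf/chi/ZhongSMKCH24",
    "conf/issta/LiuZWJL22",
    "conf/ijcai/ChakrabortySS21",
    "conf/sigmod/PerronFDM20",
    "conf/stoc/BjorklundHK22",
    "conf/kdd/WangDCV11",
    "conf/crypto/BelaidCPRT20",
    "conf/www/LeeKGOL20",
    "conf/nsdi/YouCC23",
    "conf/sp/LinG21",
]


def tally_venues(keys):
    """Return (number of unique venues, Counter of venue types).

    Assumes keys have the form "<type>/<venue>/<record>".
    """
    venues = set()
    for key in keys:
        kind, venue, _record = key.split("/", 2)
        venues.add((kind, venue))
    kinds = Counter(kind for kind, _venue in venues)
    return len(venues), kinds


unique_venues, kinds = tally_venues(DBLP_KEYS)
print(f"Venues represented: {unique_venues} unique venues")
for kind, count in sorted(kinds.items()):
    print(f"- {kind}: {count} ({count / unique_venues:.0%})")
```

Deriving the counts from the keys rather than counting by hand keeps the summary in step with the provenance list if entries are added or replaced.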

Temporal distribution:
- 2010-2014: 2 papers (10%)
- 2015-2019: 4 papers (20%)
- 2020-2024: 14 papers (70%)
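
The temporal bins can be checked against the citation keys in the provenance list. This is a small sketch, assuming the last four characters of each citation key give the publication year; the function name `bin_years` is illustrative only.

```python
from collections import Counter

# Citation keys copied from the Paper Provenance list above.
CITATION_KEYS = [
    "Chen2018", "Reddi2010", "Beckmann2015", "Axiotis2019", "Arora2023",
    "Kleinjung2017", "Choromanski2021", "Arutyunova2021", "Babun2022",
    "Wong2024", "Zhong2024", "Liu2022", "Chakraborty2021", "Perron2020",
    "Bjorklund2022", "Wang2011", "Belaid2020", "Lee2020", "You2023",
    "Lin2021",
]

# Inclusive year ranges used in the temporal distribution.
BINS = [(2010, 2014), (2015, 2019), (2020, 2024)]


def bin_years(keys, bins):
    """Count citation keys per inclusive (start, end) year range."""
    counts = Counter({b: 0 for b in bins})
    for key in keys:
        year = int(key[-4:])  # assumption: key ends with a 4-digit year
        for start, end in bins:
            if start <= year <= end:
                counts[(start, end)] += 1
                break
        else:
            raise ValueError(f"{key}: year {year} falls outside all bins")
    return counts


year_counts = bin_years(CITATION_KEYS, BINS)
total = len(CITATION_KEYS)
for (start, end), count in year_counts.items():
    print(f"- {start}-{end}: {count} papers ({count / total:.0%})")
```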

Subject areas:
- Systems/Architecture: 5 papers (25%)
- Security/Cryptography: 4 papers (20%)
- Software Engineering: 4 papers (20%)
- AI/ML: 3 papers (15%)
- Theory: 2 papers (10%)
- HCI: 1 paper (5%)
- Databases: 1 paper (5%)

BibTeX validation success rate: 20/23 attempts = 87%
- All 20 collected papers validated successfully
- 3 initially selected papers could not be found via fuzzy search and were replaced with different papers

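Average similarity score: 0.97 (range: 0.81-0.99)
- 18 of 20 papers scored 0.98 or higher; the two exceptions are Chen2018 (0.89) and Reddi2010 (0.81)
- Suggests fuzzy_title_search is reliable for validation on this sample, though 20 papers is too few to generalize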
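
The similarity summary can be recomputed from the per-paper `similarity=` values recorded in the provenance list. The sketch below assumes those recorded values are the only inputs; the 0.98 threshold for a "high" match is an illustrative choice, not a setting of `fuzzy_title_search`.

```python
from statistics import mean

# similarity= values copied from the Paper Provenance list, in entry order.
SIMILARITY = [
    0.89, 0.81, 0.98, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99, 0.99,
    0.99, 0.99, 0.99, 0.98, 0.99, 0.99, 0.98, 0.98, 0.99, 0.99,
]

HIGH_MATCH = 0.98  # illustrative threshold, chosen for this summary only

average = round(mean(SIMILARITY), 2)
lowest, highest = min(SIMILARITY), max(SIMILARITY)
high_matches = sum(score >= HIGH_MATCH for score in SIMILARITY)

print(f"Average similarity score: {average} (range: {lowest}-{highest})")
print(f"Scores >= {HIGH_MATCH}: {high_matches} of {len(SIMILARITY)}")
```

Reporting the count of high matches alongside the mean makes the effect of the two low-scoring titles visible instead of folding them into a single average.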

## Notes

This pilot batch demonstrates the feasibility of the collection methodology but reveals challenges with DBLP URL inconsistencies. See PILOT_BATCH_REPORT.md for detailed findings and recommendations for automation.
