Artifact Descriptions (AD) for ICPP must follow the structure below and cannot exceed two pages in US letter, ACM format. The AD Appendix will be auto-generated from author responses to a standard form embedded in the online submission system.
The description of a code artifact will cover the following aspects:
- Artifact Identification: Including an abstract describing the main contributions of the article and the role of the artifact in these contributions. The abstract may include a software architecture or data models and their descriptions to help the reader understand the artifact, as well as a clear statement of the extent to which the artifact contributes to the reproducibility of the experiments in the article.
- Artifact Dependencies and Requirements: Including (i) a description of the hardware resources required, (ii) a description of the operating systems required, (iii) the software libraries needed, (iv) the input dataset needed to execute the code, or a description of how the input data is generated, and (v) optionally, any other dependencies or requirements. As a best practice, to keep the description easy to follow, unnecessary dependencies and requirements should be omitted from the artifact.
- Artifact Installation and Deployment Process: Including (i) a description of the process to install and compile the libraries and the code, and (ii) a description of the process to deploy the code on the target resources. These descriptions should include estimates of the installation, compilation, and deployment times. When any of these times is unreasonably long, authors should provide some way to reduce the effort required of the artifact's recipients; for instance, capsules with pre-compiled code can be provided, or a simplified input dataset that shortens the overall experimental execution time. Conversely, best practices indicate that, whenever possible, the actual code of software dependencies (libraries) should not be included in the artifact; instead, scripts should be provided that download them from a repository and perform the installation.
- Reproducibility of Experiments: Including (i) a complete description of the experiment workflow that the code can execute, (ii) an estimate of the time needed to execute that workflow, (iii) a complete description of the expected results and an evaluation of them, and, most importantly, (iv) how the expected results of the experiment workflow relate to the results reported in the article. Best practices indicate that, to make the scope of the reproducibility easy to assess, the expected results from the artifact should be in the same format as those in the article. For instance, when the results in the article are depicted in a graph, the execution of the code should ideally produce a similar figure (open-source tools such as gnuplot can be used for this purpose). It is critical that authors devote effort to these aspects so that the time needed to understand and verify the reproducibility of the experiments is minimized.
- Other notes: Optionally, it might include other related aspects that could be important and were not addressed in the previous points.
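As a minimal sketch of the scripted-installation practice described under Artifact Installation and Deployment Process, the following Python script builds pip commands that fetch dependencies from a package repository rather than bundling their source in the artifact. The package names and versions are placeholders, not requirements of any actual artifact:

```python
import subprocess

# Hypothetical dependency list: these (package, version) pairs are
# placeholders, not the requirements of any specific artifact.
DEPENDENCIES = [("numpy", "1.26.4"), ("scipy", "1.13.0")]

def pip_install_commands(deps):
    """Build the pip commands that would fetch each dependency
    from the package repository instead of bundling its source."""
    return [["pip", "install", f"{name}=={version}"] for name, version in deps]

def install(deps, dry_run=True):
    """Print (dry run) or execute the installation commands."""
    cmds = pip_install_commands(deps)
    for cmd in cmds:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
    return cmds

if __name__ == "__main__":
    install(DEPENDENCIES)  # dry run: only prints the commands
```

A shell script achieving the same effect is equally acceptable; the point is that recipients obtain dependencies from their canonical repository rather than from copies frozen inside the artifact.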
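The figure-reproduction advice under Reproducibility of Experiments can be illustrated with a short script that writes measured results to a data file together with a gnuplot script rendering them in the article's graph format. The measurements, file names, and axis labels below are illustrative placeholders:

```python
# Sketch: emit experiment results plus a matching gnuplot script so the
# reproduced figure has the same format as the article's graph.
# All data values, file names, and labels are illustrative placeholders.

def write_plot_files(results, data_path="results.dat", script_path="plot.gp"):
    """Write (x, y) result pairs and a gnuplot script that renders
    them as a PNG figure comparable to the one in the article."""
    with open(data_path, "w") as f:
        for x, y in results:
            f.write(f"{x} {y}\n")
    script = (
        "set terminal png\n"
        "set output 'figure.png'\n"
        "set xlabel 'Number of processes'\n"
        "set ylabel 'Execution time (s)'\n"
        f"plot '{data_path}' with linespoints title 'measured'\n"
    )
    with open(script_path, "w") as f:
        f.write(script)
    return script

if __name__ == "__main__":
    # Placeholder measurements, not real experimental results.
    write_plot_files([(1, 10.0), (2, 5.3), (4, 2.9)])
```

Running `gnuplot plot.gp` afterwards would then regenerate the figure, letting reviewers compare it directly against the one printed in the article.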
The description of a dataset artifact might cover the following aspects:
- Artifact Identification: Including the owner of the artifact and its persistent identifier (Zenodo provides DOIs for datasets, but they can also be obtained from IEEE DataPort, etc.), as well as an abstract describing the main contributions of the article and the role of the artifact in these contributions. The abstract may include data models and their descriptions to help the reader understand the artifact, and a clear statement of the extent to which the artifact contributes to the reproducibility of the experiments in the article.
- Data Provenance: It would be beneficial to include information about the dataset's origin, such as its source, data collection methods, and the timeframe in which the data was collected. This metadata would enable users to assess the reliability and credibility of the dataset.
- Data Collection Methods: Detailed documentation on the procedures used to collect the data, including any instruments or tools employed, would allow users to evaluate the accuracy and potential biases present in the dataset.
- Sampling Techniques: If the dataset is a sample of a larger population, it would be helpful to describe the sampling methods used, such as random sampling, stratified sampling, or convenience sampling. This information would assist users in understanding the representativeness of the data.
- Data Preprocessing: Including a description of any preprocessing steps applied to the data, such as cleaning, filtering, or transformation, would help users understand how the raw data was processed and any potential impact on the final dataset.
- Variables and Definitions: Clear definitions and descriptions of the variables present in the dataset, including their units of measurement, data types, and possible values, would enable users to correctly interpret and analyze the data.
- Missing Data Handling: It would be beneficial to explain how missing or incomplete data were treated, whether through imputation, exclusion, or other methods. This metadata would allow users to understand any potential biases or limitations resulting from missing data.
- Data Quality Measures: Documentation of quality assessments performed on the dataset, including measures such as accuracy, completeness, consistency, and reliability, would help users gauge the overall quality and reliability of the data.
- Privacy and Ethical Considerations: Providing information regarding any privacy or ethical considerations associated with the dataset, such as anonymization techniques used, consent procedures, or adherence to data protection regulations, would ensure compliance with legal and ethical standards.
- Versioning and Updates: If the dataset undergoes changes or updates, it would be beneficial to include version information and a changelog to track modifications over time. This would enable users to understand the dataset's evolution and any potential impact on analysis or comparability.
- Licensing and Usage Terms: Clear indication of the dataset's licensing terms, including any restrictions on data usage, redistribution, or modifications, would help users understand their rights and obligations when working with the dataset.
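As an illustration of the Data Preprocessing point above, the preprocessing applied to a dataset can be documented as a small, reproducible pipeline in which each function corresponds to one described step (cleaning, filtering, transformation). The field names, valid range, and sample records below are hypothetical:

```python
# Sketch of documented preprocessing steps; each function matches one
# step the dataset documentation should describe. Field names, the
# valid range, and the sample records are illustrative placeholders.

def clean(records):
    """Cleaning step: drop records with an empty 'value' field."""
    return [r for r in records if r.get("value") not in (None, "")]

def filter_range(records, lo=0.0, hi=100.0):
    """Filtering step: keep only values inside the documented valid range."""
    return [r for r in records if lo <= float(r["value"]) <= hi]

def transform(records):
    """Transformation step: min-max normalize 'value' into [0, 1]."""
    vals = [float(r["value"]) for r in records]
    lo_v, span = min(vals), (max(vals) - min(vals)) or 1.0
    return [{**r, "value": (float(r["value"]) - lo_v) / span} for r in records]

# Hypothetical raw records showing each step's effect.
raw = [{"id": 1, "value": "12.5"}, {"id": 2, "value": ""}, {"id": 3, "value": "250"}]
processed = transform(filter_range(clean(raw)))
```

Shipping the pipeline alongside the dataset lets users regenerate the final dataset from the raw data and see exactly where each record was dropped or altered.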
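Similarly, the Missing Data Handling point can be made concrete by expressing the chosen strategy as code. The sketch below contrasts listwise exclusion with mean imputation, using `None` to mark a missing observation; the sample values are hypothetical:

```python
from statistics import mean

# Sketch of two documented missing-data strategies; None marks a
# missing observation. The sample values are illustrative placeholders.

def exclude_missing(values):
    """Listwise exclusion: drop missing observations entirely."""
    return [v for v in values if v is not None]

def impute_mean(values):
    """Mean imputation: replace each missing observation with the
    mean of the observed ones."""
    observed = exclude_missing(values)
    m = mean(observed)
    return [m if v is None else v for v in values]

sample = [4.0, None, 6.0, 8.0]
# exclude_missing(sample) -> [4.0, 6.0, 8.0]
# impute_mean(sample)     -> [4.0, 6.0, 6.0, 8.0]
```

Documenting the strategy this precisely lets users judge the biases each choice introduces, e.g. mean imputation shrinking the variance of the affected variable.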
Reproducibility Artifact Examples
Some journal articles with reproducibility badges are listed here and can be explored as guidance.